Requirements

PyMuPDF4LLM requires Python 3.8+. It is built on top of PyMuPDF, which is installed automatically as a dependency.
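If you are unsure which interpreter your environment uses, a quick check along the following lines may help before installing. This is a minimal sketch that only compares the running interpreter against the 3.8 minimum stated above; the helper name `python_is_supported` and the constant `MIN_PYTHON` are illustrative, not part of PyMuPDF4LLM.

```python
import sys

# Minimum interpreter version, taken from the requirement stated above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the minimum requirement.")
    else:
        print(f"Python {current} is too old; 3.8 or newer is required.")
```

Run it with the same interpreter you intend to use for `pip`, since a machine can have several Python installations.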

Basic Installation

Install PyMuPDF4LLM from PyPI using pip:
pip install pymupdf4llm
This gives you full access to Markdown, JSON, and plain text extraction from document files.

Optional Dependencies

OCR Support

Enables automatic Optical Character Recognition (OCR) for PDFs containing scanned or image-based content. Tesseract is included by default. Rapid OCR and Paddle OCR are also supported as optional OCR engines; install them if you need them.
OCR is triggered automatically only when PyMuPDF4LLM detects that a page requires it.

Verify Your Installation

import pymupdf4llm

print(pymupdf4llm.version)
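If the import above fails, it can be useful to check which distributions pip actually installed without importing them. The following is a hedged sketch using the standard library's `importlib.metadata`; the helper `installed_version` is a hypothetical name introduced here, and the distribution names are assumed to match the PyPI package names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names are assumed to be the PyPI names of the two packages.
for name in ("pymupdf4llm", "PyMuPDF"):
    found = installed_version(name)
    print(f"{name}: {found or 'not installed'}")
```

A "not installed" result usually means pip installed into a different environment than the one running the script.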

Next Steps

Quickstart

Convert your first PDF to Markdown in a few lines.

Supported Formats

See all supported input and output formats.