Installation

Requirements
Basic Installation
Optional Dependencies
OCR Support
Verify Your Installation
Next Steps

Requirements

PyMuPDF4LLM requires Python 3.8+. It is built on top of PyMuPDF, which is installed automatically as a dependency.
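Before installing, you may want to confirm that the interpreter you plan to use meets this minimum. The sketch below is one way to check. The helper name `meets_minimum` is hypothetical and not part of PyMuPDF4LLM, and the `(3, 8)` floor is simply the requirement stated above.

```python
import sys

# Minimum version taken from the requirement stated above (Python 3.8+).
MINIMUM_PYTHON = (3, 8)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) interpreter is new enough.

    Hypothetical helper for illustration; not part of PyMuPDF4LLM.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only as many components as the minimum specifies (major, minor).
    return tuple(version_info[: len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter that `pip` installs into, since a system may have several Python versions side by side.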

Basic Installation

Install PyMuPDF4LLM from PyPI using pip:

pip install pymupdf4llm

This gives you full access to Markdown, JSON, and plain text extraction from document files.

Optional Dependencies

OCR Support

OCR support enables automatic Optical Character Recognition for PDFs containing scanned or image-based content. Tesseract is included by default. Rapid OCR and Paddle OCR are also supported as optional OCR engines and should be installed if required.

OCR is triggered automatically only when PyMuPDF4LLM detects that a page requires it. See:

Hybrid OCR Strategy
How OCR is Triggered
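If you install one of the optional engines, a quick check that Python can locate it may save debugging time later. The following is a minimal sketch using only the standard library. The module names `rapidocr_onnxruntime` and `paddleocr` are assumptions made for illustration, not names confirmed by this page, so consult each engine's own documentation for the correct import name.

```python
from importlib.util import find_spec

# Assumed import names for the optional OCR engines; these are guesses for
# illustration, so check each engine's own documentation for the real name.
ASSUMED_OCR_MODULES = ("rapidocr_onnxruntime", "paddleocr")


def importable(module_names):
    """Map each top-level module name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    for name, found in importable(ASSUMED_OCR_MODULES).items():
        print(f"{name}: {'available' if found else 'not installed'}")
```

This only reports whether a module can be found on the import path. It does not tell you whether PyMuPDF4LLM will select that engine for a given page.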

Verify Your Installation

import pymupdf4llm

print(pymupdf4llm.version)
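If the import above fails, it can help to ask the packaging metadata what is installed in the current environment. The sketch below uses the standard library's `importlib.metadata` (available from Python 3.8). The helper `installed_version` is a hypothetical name, and `pymupdf` is assumed to be the distribution name of the PyMuPDF dependency.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pymupdf4llm" is the PyPI name used in the pip command above;
    # "pymupdf" is assumed to be the distribution name of its dependency.
    for name in ("pymupdf4llm", "pymupdf"):
        found = installed_version(name)
        print(f"{name}: {found or 'not installed'}")
```

A result of "not installed" usually means `pip` installed into a different environment than the one running the script.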

Next Steps

Quickstart

Convert your first PDF to Markdown in a few lines.

Supported Formats

See all supported input and output formats.
