Quickstart

Convert a PDF to Markdown
Save the Output to a File
Process Specific Pages
Extract as Page Chunks
What Happens Under the Hood
Next Steps

Convert a PDF to Markdown

import pymupdf4llm

md_text = pymupdf4llm.to_markdown("my-document.pdf")
print(md_text)

That’s it. PyMuPDF4LLM reads every page, extracts content in reading order, and returns a single Markdown string.

Save the Output to a File

To write the result to a .md file, use Path.write_text from Python’s standard-library pathlib module:

import pymupdf4llm
from pathlib import Path

md_text = pymupdf4llm.to_markdown("my-document.pdf")
Path("output.md").write_text(md_text, encoding="utf-8")

Without an explicit encoding, write_text falls back to the platform’s locale encoding, which is not always UTF-8. Passing encoding="utf-8" ensures special characters and symbols are preserved correctly.
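
When converting several documents, it can help to derive the output name from the input name. The following is a small sketch, not a PyMuPDF4LLM feature: save_markdown is a hypothetical helper, and the Markdown string is a placeholder for real to_markdown() output.

```python
from pathlib import Path
import tempfile


def save_markdown(md_text: str, pdf_path: str, out_dir: str) -> Path:
    """Write md_text to <out_dir>/<pdf stem>.md and return the new path.

    Hypothetical helper for illustration; not part of PyMuPDF4LLM.
    """
    target = Path(out_dir) / Path(pdf_path).with_suffix(".md").name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Pass the encoding explicitly so the result does not depend on
    # the platform default.
    target.write_text(md_text, encoding="utf-8")
    return target


# Placeholder text stands in for the string returned by to_markdown().
with tempfile.TemporaryDirectory() as tmp:
    saved = save_markdown("# Title\n\nCafé résumé\n", "reports/my-document.pdf", tmp)
    print(saved.name)  # my-document.md
```

Only the file name of the PDF is reused, so the output directory can differ from the directory that holds the source document.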

Process Specific Pages

To extract only a subset of pages, pass a list of zero-based page numbers:

md_text = pymupdf4llm.to_markdown("my-document.pdf", pages=[0, 1, 2])
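
Page numbers shown in PDF viewers usually start at 1, so a range such as "pages 3 to 5" has to be shifted before it is passed to pages. A minimal sketch of that conversion follows; zero_based_pages is a hypothetical helper and not part of PyMuPDF4LLM.

```python
def zero_based_pages(first: int, last: int) -> list[int]:
    """Convert an inclusive 1-based page range into zero-based page numbers.

    Hypothetical helper for illustration; not part of PyMuPDF4LLM.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return list(range(first - 1, last))


# Viewer pages 3 to 5 correspond to zero-based pages 2, 3 and 4.
pages = zero_based_pages(3, 5)
print(pages)  # [2, 3, 4]
```

The resulting list can then be passed as pages=pages in the to_markdown() call.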

Extract as Page Chunks

For RAG pipelines and LLM ingestion, pass page_chunks=True to get a list of dictionaries, one per page, each containing that page’s text and metadata:

chunks = pymupdf4llm.to_markdown("my-document.pdf", page_chunks=True)

for chunk in chunks:
    print(chunk["metadata"]["page"])  # page number
    print(chunk["text"])              # Markdown content

Each chunk includes bounding box data, page dimensions, and document metadata. See Chunk Schema for the full schema.
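
For ingestion, the per-page dictionaries are often flattened into smaller records before they are embedded or indexed. The sketch below assumes only the two keys used above, chunk["text"] and chunk["metadata"]["page"]. chunks_to_records is a hypothetical helper, and sample_chunks is hand-written data standing in for real to_markdown() output.

```python
import json


def chunks_to_records(chunks: list[dict]) -> list[dict]:
    """Keep only the page number and Markdown text of non-empty pages.

    Hypothetical helper for illustration; not part of PyMuPDF4LLM.
    """
    records = []
    for chunk in chunks:
        text = chunk["text"].strip()
        if not text:
            continue  # skip pages that produced no Markdown
        records.append({"page": chunk["metadata"]["page"], "text": text})
    return records


# Hand-written stand-in for the list returned with page_chunks=True.
sample_chunks = [
    {"metadata": {"page": 1}, "text": "# Title\n\nFirst page.\n"},
    {"metadata": {"page": 2}, "text": "   \n"},
    {"metadata": {"page": 3}, "text": "Third page.\n"},
]

records = chunks_to_records(sample_chunks)

# One JSON object per line (JSON Lines) is a common format for ingestion jobs.
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```

Keeping the page number alongside the text lets a retrieval step cite the page a passage came from.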

What Happens Under the Hood

When you call to_markdown(), PyMuPDF4LLM:

1. Opens the document with PyMuPDF.
2. Analyses the layout of each page, detecting columns, headings, tables, and images.
3. Reconstructs reading order from the visual structure.
4. Detects pages with no selectable text and triggers OCR automatically if OCR support is installed.
5. Returns the result as a Markdown string or a list of chunk dictionaries.
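
The check for pages without selectable text can be approximated directly with PyMuPDF. This is a rough sketch and not PyMuPDF4LLM's internal logic: pages_without_text is a hypothetical helper that treats a page as an OCR candidate when page.get_text() returns only whitespace.

```python
def pages_without_text(doc) -> list[int]:
    """Return zero-based numbers of pages whose text layer is empty.

    `doc` is any iterable of objects with a get_text() method,
    such as an opened PyMuPDF document.
    Hypothetical helper for illustration; not part of PyMuPDF4LLM.
    """
    return [
        number
        for number, page in enumerate(doc)
        if not page.get_text().strip()
    ]


# Usage with PyMuPDF (assumes a recent pymupdf package is installed):
#
#     import pymupdf
#     with pymupdf.open("my-document.pdf") as doc:
#         print(pages_without_text(doc))
```

A scanned page may still carry a few stray characters in its text layer, so a real pipeline might compare the amount of text against a threshold instead of testing for an empty string.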

Next Steps

Supported Formats

See every supported input and output format.

Saving Output

Write .md, .json, and .txt files with pathlib.
