Convert a PDF to Markdown
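A basic conversion can be sketched as follows. This is a minimal example, assuming the pymupdf4llm package is installed; the wrapper name pdf_to_markdown and the file name input.pdf are placeholders for illustration, not part of the library.

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a whole PDF into a single Markdown string."""
    # Imported lazily so the helper can be defined without the package present.
    import pymupdf4llm  # third-party: pip install pymupdf4llm

    # to_markdown() accepts a file path and returns Markdown text by default.
    return pymupdf4llm.to_markdown(pdf_path)


# Example call (hypothetical file name):
# md_text = pdf_to_markdown("input.pdf")
# print(md_text[:500])
```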
Save the Output to a File
To write the result to a .md file, pass the returned Markdown string to Path.write_text() from Python's built-in pathlib module.
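One way to do this is sketched below. The helper name save_markdown and the file names are hypothetical; only Path.write_text() from the standard library does the actual writing.

```python
from pathlib import Path


def save_markdown(md_text: str, out_path: str) -> Path:
    """Write a Markdown string to disk as UTF-8 and return the path."""
    path = Path(out_path)
    # Explicit UTF-8 avoids platform-dependent default encodings.
    path.write_text(md_text, encoding="utf-8")
    return path


# Example call (hypothetical file names, assumes pymupdf4llm is installed):
# import pymupdf4llm
# save_markdown(pymupdf4llm.to_markdown("input.pdf"), "output.md")
```

Passing the encoding explicitly keeps non-ASCII characters from the PDF intact on systems whose default encoding is not UTF-8.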
Process Specific Pages
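Page selection might look like the sketch below, assuming pymupdf4llm is installed. The helpers pages_to_markdown and zero_based are hypothetical names added for illustration; zero_based converts the 1-based page numbers shown in a PDF viewer into the zero-based indices the pages argument expects.

```python
def zero_based(first: int, last: int) -> list[int]:
    """Turn an inclusive 1-based page range into zero-based page indices."""
    return list(range(first - 1, last))


def pages_to_markdown(pdf_path: str, page_numbers: list[int]) -> str:
    """Convert only the given zero-based pages of a PDF to Markdown."""
    import pymupdf4llm  # third-party: pip install pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path, pages=page_numbers)


# Example call (hypothetical file name): viewer pages 1 to 3
# md_text = pages_to_markdown("input.pdf", zero_based(1, 3))
```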
To extract only a subset of pages, pass a list of zero-based page numbers in the pages argument.
Extract as Page Chunks
For RAG pipelines and LLM ingestion, page_chunks=True returns a list of dictionaries, one per page, with the text and metadata.
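A minimal sketch of working with page chunks follows, assuming pymupdf4llm is installed. The helper names are hypothetical, and only the "text" and "metadata" keys are used here; the exact contents of the metadata dictionary may differ between versions.

```python
def pdf_to_chunks(pdf_path: str) -> list[dict]:
    """Return one dictionary per page instead of a single Markdown string."""
    import pymupdf4llm  # third-party: pip install pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path, page_chunks=True)


def summarize_chunks(chunks: list[dict]) -> list[dict]:
    """Report the text length and available metadata keys for each chunk."""
    return [
        {
            "chars": len(chunk["text"]),
            "metadata_keys": sorted(chunk["metadata"]),
        }
        for chunk in chunks
    ]


# Example call (hypothetical file name):
# for row in summarize_chunks(pdf_to_chunks("input.pdf")):
#     print(row)
```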
Each chunk includes bounding box data, page dimensions, and document metadata. See Chunk Schema for the full schema.
What Happens Under the Hood
When you call to_markdown(), PyMuPDF4LLM:
- Opens the document with PyMuPDF
- Analyses the layout of each page — detecting columns, headings, tables, and images
- Reconstructs reading order from the visual structure
- Detects pages with no selectable text and triggers OCR automatically if an OCR engine is installed
- Returns the result as a Markdown string or list of chunk dictionaries
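To make the text-detection step concrete, the sketch below shows one simple way to find pages without selectable text using plain PyMuPDF. It is an illustration only, not PyMuPDF4LLM's actual internal logic: the helper name pages_without_text is hypothetical, and the check (an empty result from page.get_text()) is an assumed heuristic.

```python
def pages_without_text(doc) -> list[int]:
    """Return zero-based indices of pages whose extracted text is empty.

    `doc` is expected to be an opened PyMuPDF document, which yields page
    objects when iterated.
    """
    return [
        index
        for index, page in enumerate(doc)
        # get_text() returns the page's selectable text as a plain string.
        if not page.get_text().strip()
    ]


# Example call (hypothetical file name; older PyMuPDF releases import as `fitz`):
# import pymupdf
# with pymupdf.open("input.pdf") as doc:
#     print(pages_without_text(doc))
```

Pages flagged this way are typically scanned images, which is the case where OCR becomes relevant.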
Next Steps
Supported Formats
See every supported input and output format.
Saving Output
Write .md, .json, and .txt files with pathlib.