Signature
Parameters
doc
Path to the document file, or an already-opened pymupdf.Document instance. Supports PDF, XPS, eBooks, and, with PyMuPDF Pro, Office formats.
Additional parameters are shared with to_markdown(). See the to_markdown() API reference for details.
Returns
A list of page objects, one per extracted page. See JSON Schema for the full field reference. See Extract JSON for detailed block structure examples.
Raises
| Exception | Condition |
|---|---|
| FileNotFoundError | doc is a path string that does not exist |
| ValueError | An index in pages is out of range for the document |
| ImportError | ocr=True or force_ocr=True but the OCR dependency is not installed |
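
The exceptions in the table can be handled in one place. The following is a minimal sketch, not part of the library: `safe_extract` is a hypothetical helper, and `extract` stands in for the function documented on this page, passed in as a callable so the sketch does not assume its import path.

```python
def safe_extract(extract, doc, **kwargs):
    """Call extract(doc, **kwargs) and translate the documented errors.

    `extract` is a stand-in for the function documented on this page.
    Returns a (pages, error_message) tuple; exactly one element is None.
    """
    try:
        return extract(doc, **kwargs), None
    except FileNotFoundError:
        # doc was a path string that does not exist
        return None, f"document not found: {doc}"
    except ValueError as exc:
        # documented cause: an index in pages is out of range
        return None, f"invalid page selection: {exc}"
    except ImportError as exc:
        # documented cause: OCR requested without the OCR dependency
        return None, f"OCR dependency missing: {exc}"
```

Note that a broad `except ValueError` also catches value errors from other causes, so production code may want to validate the page indices before calling.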
Examples
Minimal
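
The original code sample for this example is not present in this section. As a hedged reconstruction, the sketch below assumes the function is exposed as `pymupdf4llm.to_json`; that import path and the file name `document.pdf` are assumptions, not taken from this page.

```python
from pathlib import Path


def extract_pages(path):
    """Return the list of page objects for the document at `path`."""
    # Assumption: the function documented here is available as
    # pymupdf4llm.to_json; this section does not state the import path.
    import pymupdf4llm

    return pymupdf4llm.to_json(str(path))


if __name__ == "__main__":
    sample = Path("document.pdf")  # placeholder file name
    if sample.exists():
        pages = extract_pages(sample)
        # Per the Returns section: one page object per extracted page.
        print(f"Extracted {len(pages)} page object(s)")
```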
Iterate over blocks
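
The original code sample for this example is also missing here. The sketch below shows one way to walk the returned page objects. The field names `blocks` and `type` are hypothetical placeholders, and `sample_pages` is hand-written stand-in data rather than real output; consult the JSON Schema reference for the actual field names.

```python
def iter_blocks(pages):
    """Yield (page_index, block) pairs from a list of page objects."""
    for page_index, page in enumerate(pages):
        # Assumption: each page object is a dict holding a "blocks" list.
        for block in page.get("blocks", []):
            yield page_index, block


# Hand-written stand-in for real output, using hypothetical field names.
sample_pages = [
    {"blocks": [{"type": "text"}, {"type": "table"}]},
    {"blocks": [{"type": "text"}]},
]

for page_index, block in iter_blocks(sample_pages):
    print(page_index, block.get("type"))
```

Keeping the traversal in a generator makes it easy to filter by block type later without loading a second copy of the data.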
See Also
Extract JSON Guide
Full walkthrough with bounding boxes, span flags, and pipeline examples.
JSON Schema
Complete field reference for every object in the JSON output.
to_markdown()
Markdown output for LLM ingestion and readable docs.
Tables Guide
Working with table blocks in the JSON output.