Signature
Parameters
Path to the document file, or an already-opened pymupdf.Document instance. Supports PDF, XPS, eBooks, and, with PyMuPDF Pro, Office formats.
Additional parameters are shared with to_markdown(). See the to_markdown() API reference for details.
Returns
When page_chunks=False (default): a single plain-text string containing all extracted pages.
When page_chunks=True: a list of dictionaries, one per extracted page, each with the following keys:

| Key | Type | Description |
|---|---|---|
| text | str | Plain text content of the page |
| metadata | dict | Page metadata |
Raises
| Exception | Condition |
|---|---|
| FileNotFoundError | doc is a path string that does not exist |
| ValueError | An index in pages is out of range for the document |
| ImportError | ocr=True or force_ocr=True, but the OCR dependency is not installed |
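The exceptions in the table above can be handled one by one. The following is only a sketch: `safe_extract` and its `extract` parameter are hypothetical names introduced for illustration, and the extraction function is passed in as a callable rather than imported, so the helper makes no assumption about the library beyond the three documented exception types.

```python
def safe_extract(extract, doc, **kwargs):
    """Call extract(doc, **kwargs) and return a (result, error_message) pair.

    Exactly one element of the pair is None. Only the three exception
    types listed in the Raises table are caught; anything else propagates.
    """
    try:
        return extract(doc, **kwargs), None
    except FileNotFoundError:
        # doc was a path string that does not exist
        return None, f"document not found: {doc}"
    except ValueError as exc:
        # an index in pages was out of range for the document
        return None, f"invalid page selection: {exc}"
    except ImportError as exc:
        # OCR was requested but its dependency is missing
        return None, f"OCR support is not installed: {exc}"
```

Returning a pair instead of re-raising keeps batch jobs running when a single document fails. Callers that prefer exceptions can skip the wrapper entirely.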
Examples
Minimal
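A minimal sketch, assuming the function documented on this page is `pymupdf4llm.to_text()`. That name is inferred from the sibling `to_markdown()` and `to_json()` entries and is not stated above. The wrapper `extract_plain_text` and the path `input.pdf` are placeholders.

```python
def extract_plain_text(path):
    """Return the whole document as a single plain-text string.

    Relies on the documented default page_chunks=False.
    """
    # Assumption: pymupdf4llm.to_text() is the function this page describes.
    import pymupdf4llm

    return pymupdf4llm.to_text(path)


# Usage ("input.pdf" is a placeholder path):
#   text = extract_plain_text("input.pdf")
#   print(text[:500])
```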
Page chunks
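With `page_chunks=True` the result is a list of dictionaries carrying the `text` and `metadata` keys described under Returns. The sketch below works only on that documented shape. `summarize_chunks` is a hypothetical helper, and the commented call again assumes the function is `pymupdf4llm.to_text()`.

```python
def summarize_chunks(chunks):
    """Return (position, character_count) for each page dictionary.

    position is the 1-based position in the returned list, which is not
    necessarily the page number in the source document.
    """
    return [(i, len(chunk["text"])) for i, chunk in enumerate(chunks, start=1)]


# Usage (assumed function name, placeholder path):
#   import pymupdf4llm
#   chunks = pymupdf4llm.to_text("input.pdf", page_chunks=True)
#   for position, size in summarize_chunks(chunks):
#       print(f"chunk {position}: {size} characters")
```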
Save to file
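Because the default return value is a plain string, writing it to disk needs nothing beyond the standard library. A sketch using `pathlib`, where `save_text` and the file names are hypothetical:

```python
from pathlib import Path


def save_text(text, out_path):
    """Write text to out_path as UTF-8 and return the resulting Path."""
    out = Path(out_path)
    out.write_text(text, encoding="utf-8")
    return out


# Usage (assumed function name, placeholder paths):
#   import pymupdf4llm
#   save_text(pymupdf4llm.to_text("input.pdf"), "output.txt")
```

Specifying the encoding explicitly avoids platform-dependent defaults when the extracted text contains non-ASCII characters.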
See Also
Extract Text Guide
Full guided overview.
to_markdown()
Markdown output preserving document structure.
to_json()
Structured output with bounding boxes and layout data.