to_json()

Signature

pymupdf4llm.to_json(
    doc: str | pymupdf.Document,
    **kwargs
) -> list[dict]

Parameters

doc

str | pymupdf.Document

required

Path to the document file, or an already-opened pymupdf.Document instance. Supports PDF, XPS, eBooks, and — with PyMuPDF Pro — Office formats.

**kwargs

various

All other parameters are shared with to_markdown() and apply to every extraction function. See the to_markdown() API reference for details.

Returns

list[dict]

list

A list of page objects, one per extracted page. See JSON Schema for the full field reference and Extract JSON for detailed block structure examples.

Raises

Exception	Condition
`FileNotFoundError`	`doc` is a path string that does not exist
`ValueError`	An index in `pages` is out of range for the document
`ImportError`	`ocr=True` or `force_ocr=True` but the `ocr` dependency is not installed
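The exceptions listed above can be handled in one place. The following is a minimal sketch, not part of the library: `extract_or_report` is a hypothetical helper, and the extraction function is passed in as an argument so the helper can be tried without pymupdf4llm installed. In real use that argument would be `pymupdf4llm.to_json`.

```python
def extract_or_report(extract, doc, **kwargs):
    """Call extract(doc, **kwargs) and map the documented errors to messages.

    `extract` is expected to be pymupdf4llm.to_json (hypothetical wiring).
    Returns a (result, error_message) pair; exactly one of the two is None.
    """
    try:
        return extract(doc, **kwargs), None
    except FileNotFoundError:
        # `doc` was a path string that does not exist
        return None, f"document not found: {doc!r}"
    except ValueError as exc:
        # for example, an index in `pages` is out of range
        return None, f"invalid argument: {exc}"
    except ImportError as exc:
        # OCR was requested but its dependency is not installed
        return None, f"OCR dependency missing: {exc}"


# Intended usage (requires pymupdf4llm and a real file):
# import pymupdf4llm
# data, error = extract_or_report(pymupdf4llm.to_json, "document.pdf", pages=[0, 1])
```

Returning an error message instead of re-raising is only one possible design; a pipeline that should stop on bad input would let the exception propagate.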

Examples

Minimal

import pymupdf4llm

data = pymupdf4llm.to_json("document.pdf")

Iterate over blocks

for page_num, page in enumerate(data):  # data is the list of page dicts
    for block in page.get("boxes", []):
        for line in block.get("textlines", []):
            for span in line.get("spans", []):
                bbox = span.get("bbox", []) # bounding box for this text span
                text = span.get("text", "") # text content of the span
                flags = span.get("flags", 0) # font style flags (bitmask)
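The same traversal can be wrapped in a small generator so that callers filter spans instead of nesting loops. This is a sketch under stated assumptions: `iter_spans` and `bold_text` are hypothetical helpers, the input is the list of page dicts described under Returns, and the `flags` bitmask is assumed to follow PyMuPDF's text-span convention, in which bit 4 (value 16) marks bold text. The sample data is hand-written for illustration and is not real extractor output.

```python
from typing import Iterator

# Assumption: span flags use PyMuPDF's span convention (bit 4 = bold).
BOLD = 1 << 4


def iter_spans(pages: list[dict]) -> Iterator[tuple[int, dict]]:
    """Yield (page_index, span) for every text span in the page list."""
    for page_index, page in enumerate(pages):
        for block in page.get("boxes", []):
            # Assumption: `textlines` may be missing or None for non-text blocks.
            for line in block.get("textlines") or []:
                for span in line.get("spans", []):
                    yield page_index, span


def bold_text(pages: list[dict]) -> list[str]:
    """Return the text of every span whose flags have the bold bit set."""
    return [
        span.get("text", "")
        for _, span in iter_spans(pages)
        if span.get("flags", 0) & BOLD
    ]


# Hand-written sample using the keys from the loop above.
sample = [
    {"boxes": [{"textlines": [{"spans": [
        {"text": "Title", "flags": 16, "bbox": [72, 72, 140, 90]},
        {"text": " body", "flags": 0, "bbox": [140, 72, 200, 90]},
    ]}]}]},
    {"boxes": [{"textlines": None}]},
]

print(bold_text(sample))  # prints ['Title']
```

See the JSON Schema page for the authoritative meaning of `flags` before relying on the bold bit.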

See Also

Extract JSON Guide

Full walkthrough with bounding boxes, span flags, and pipeline examples.

JSON Schema

Complete field reference for every object in the JSON output.

to_markdown()

Markdown output for LLM ingestion and readable docs.

Tables Guide

Working with table blocks in the JSON output.
