to_text()

Signature

pymupdf4llm.to_text(
    doc: str | pymupdf.Document,
    **kwargs
) -> str | list[dict]

Parameters

doc

str | pymupdf.Document

required

Path to the document file, or an already-opened pymupdf.Document instance. Supports PDF, XPS, and eBook formats and, with PyMuPDF Pro, Office formats.

**kwargs

various

Additional parameters are shared with to_markdown(). See the to_markdown() API reference for details; it applies to all extraction functions.

Returns

str

string

Returned when page_chunks=False (the default): a single plain-text string containing all extracted pages.

list[dict]

list

Returned when page_chunks=True: a list of dictionaries, one per extracted page, each with the following keys:

Key	Type	Description
`text`	`str`	Plain text content of the page
`metadata`	`dict`	Page metadata
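Because the return type depends on page_chunks, code that accepts either shape may want to normalize it first. The helper below is a minimal sketch, not part of pymupdf4llm: `page_texts` is a hypothetical name, and it relies only on the `text` key documented in the table above.

```python
from typing import Union


def page_texts(result: Union[str, list]) -> list:
    """Return a list of plain-text strings.

    Hypothetical helper (not part of pymupdf4llm). Accepts either shape
    that to_text() is documented to return.
    """
    if isinstance(result, str):
        # page_chunks=False: the whole document arrives as one string.
        return [result]
    # page_chunks=True: one dict per extracted page, each with a "text" key.
    return [chunk["text"] for chunk in result]


# Stand-in data shaped like the two documented return values.
print(page_texts("all pages as one string"))
print(page_texts([
    {"text": "page one", "metadata": {}},
    {"text": "page two", "metadata": {}},
]))
```

In real use the argument would be the value returned by pymupdf4llm.to_text(), with or without page_chunks=True.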

Raises

Exception	Condition
`FileNotFoundError`	`doc` is a path string that does not exist
`ValueError`	An index in `pages` is out of range for the document
`ImportError`	`ocr=True` or `force_ocr=True` but the `ocr` dependency is not installed

Examples

Minimal

import pymupdf4llm

text = pymupdf4llm.to_text("document.pdf")

Page chunks

import pymupdf4llm

chunks = pymupdf4llm.to_text("document.pdf", page_chunks=True)
for chunk in chunks:
    print(f"Page {chunk['metadata']['page']}: {len(chunk['text'])} chars")
    print(chunk['text'])

Save to file

from pathlib import Path

import pymupdf4llm

text = pymupdf4llm.to_text("document.pdf")
Path("output.txt").write_text(text, encoding="utf-8")
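When page_chunks=True, each page can be written to its own file instead. This is a minimal sketch that assumes only the documented `text` key; `write_pages` is a hypothetical helper, and files are numbered by list position rather than by any metadata field.

```python
import tempfile
from pathlib import Path


def write_pages(chunks: list, out_dir: str) -> list:
    """Write each chunk's text to page_NNNN.txt and return the paths.

    Hypothetical helper (not part of pymupdf4llm).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, chunk in enumerate(chunks, start=1):
        target = out / f"page_{index:04d}.txt"
        target.write_text(chunk["text"], encoding="utf-8")
        written.append(target)
    return written


# Demo with stand-in chunks, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_pages([{"text": "first page"}, {"text": "second page"}], tmp)
    print([p.name for p in paths])
```

In real use, `chunks` would be the list returned by `pymupdf4llm.to_text("document.pdf", page_chunks=True)`.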

See Also

Extract Text Guide

Full guided overview.

to_markdown()

Markdown output preserving document structure.

to_json()

Structured output with bounding boxes and layout data.
