Saving Output

Overview

PyMuPDF4LLM’s extraction functions return strings or Python objects; writing them to disk is left to standard Python. The recommended approach is pathlib.Path, which is cross-platform and part of the standard library, so it needs no additional dependencies.

Saving Markdown

import pymupdf4llm
from pathlib import Path

md_text = pymupdf4llm.to_markdown("document.pdf")
Path("output.md").write_text(md_text, encoding="utf-8")

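Always specify encoding="utf-8" when writing text files. Without it, write_text() uses the platform's default encoding, which on Windows is often not UTF-8, and special characters, symbols, and non-Latin scripts may be corrupted or fail to write.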
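
To illustrate the point about encoding, the following sketch writes a string containing accented, non-Latin, and symbol characters and reads it back. The sample text and the temporary file are placeholders standing in for real to_markdown() output.

```python
import tempfile
from pathlib import Path

# Placeholder text standing in for extracted Markdown (assumption for illustration).
sample = "# Résumé\n\nPrice: 25 € · 東京 · naïve café\n"

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.md"
    out_path.write_text(sample, encoding="utf-8")

    # Read back with the same encoding; the text survives unchanged.
    restored = out_path.read_text(encoding="utf-8")
    # UTF-8 uses more than one byte for each non-ASCII character.
    size_on_disk = out_path.stat().st_size

print(restored == sample)
print(size_on_disk > len(sample))
```

Reading the file back with the same explicit encoding is what makes the round trip reliable across platforms.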

Saving JSON

Use Python’s built-in json module to serialise the output before writing:

import pymupdf4llm
import json
from pathlib import Path

data = pymupdf4llm.to_json("document.pdf")

Path("output.json").write_text(
    json.dumps(data, indent=2, ensure_ascii=False),
    encoding="utf-8"
)

indent=2 produces human-readable JSON, and ensure_ascii=False writes non-ASCII characters as-is instead of \uXXXX escapes. For large documents where file size matters, omit indent to write compact single-line JSON:

Path("output.json").write_text(
    json.dumps(data, ensure_ascii=False),
    encoding="utf-8"
)
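
The two variants can be wrapped in one small helper. The sketch below is not part of PyMuPDF4LLM: save_json is a hypothetical name, and the sample dictionary is stand-in data. It also assumes that the extractor may hand back an already-serialised JSON string in some versions; in that case the string is parsed first so it is not encoded twice. Passing separators=(",", ":") removes the spaces that json.dumps() inserts by default, which makes the compact form slightly smaller still.

```python
import json
import tempfile
from pathlib import Path


def save_json(data, path, pretty=True):
    """Write data to path as UTF-8 JSON and return the number of characters written."""
    # Assumption: if the extractor already returned a JSON string,
    # parse it first so it is not encoded a second time.
    if isinstance(data, str):
        data = json.loads(data)
    if pretty:
        text = json.dumps(data, indent=2, ensure_ascii=False)
    else:
        # separators=(",", ":") drops the spaces json.dumps adds by default.
        text = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    Path(path).write_text(text, encoding="utf-8")
    return len(text)


# Stand-in data; real input would come from pymupdf4llm.to_json().
sample = {"page": 1, "text": "Übersicht", "bbox": [72.0, 90.5, 523.0, 110.0]}

with tempfile.TemporaryDirectory() as tmp:
    pretty_len = save_json(sample, Path(tmp) / "pretty.json", pretty=True)
    compact_len = save_json(json.dumps(sample), Path(tmp) / "compact.json", pretty=False)
    reloaded = json.loads((Path(tmp) / "compact.json").read_text(encoding="utf-8"))

print(pretty_len, compact_len)
```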

Saving Plain Text

import pymupdf4llm
from pathlib import Path

text = pymupdf4llm.to_text("document.pdf")
Path("output.txt").write_text(text, encoding="utf-8")

Saving Page Chunks

With page_chunks=True, to_markdown() returns a list with one dictionary per page instead of a single string. To save each page as a separate file, name the file after the page number in the chunk metadata:

import pymupdf4llm
from pathlib import Path

output_dir = Path("output/pages")
output_dir.mkdir(parents=True, exist_ok=True)

chunks = pymupdf4llm.to_markdown("document.pdf", page_chunks=True)

for chunk in chunks:
    page_num = chunk["metadata"]["page"]
    filepath = output_dir / f"page-{page_num}.md"
    filepath.write_text(chunk["text"], encoding="utf-8")
    print(f"Saved {filepath}")
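
For documents with ten or more pages, names such as page-10.md sort before page-2.md in most file listings. A possible variation, sketched here with stand-in chunks that follow the shape used in the loop above, zero-pads the page number to the width of the largest one.

```python
import tempfile
from pathlib import Path

# Stand-in for to_markdown(..., page_chunks=True) output; the shape
# (a "metadata" dict with a "page" key plus a "text" string) follows the loop above.
chunks = [
    {"metadata": {"page": n}, "text": f"# Page {n}\n"} for n in range(1, 13)
]

# Pad to the width of the largest page number so names sort in page order.
width = len(str(max(chunk["metadata"]["page"] for chunk in chunks)))

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "pages"
    output_dir.mkdir(parents=True, exist_ok=True)

    for chunk in chunks:
        page_num = chunk["metadata"]["page"]
        filepath = output_dir / f"page-{page_num:0{width}d}.md"
        filepath.write_text(chunk["text"], encoding="utf-8")

    saved = sorted(p.name for p in output_dir.iterdir())

print(saved[0], saved[-1])
```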

Saving with a Matching Filename

To derive the output filename from the input document automatically:

import pymupdf4llm
from pathlib import Path

input_path = Path("reports/annual-report-2025.pdf")

md_text = pymupdf4llm.to_markdown(str(input_path))

output_path = input_path.with_suffix(".md")
output_path.write_text(md_text, encoding="utf-8")

print(f"Saved to {output_path}")
# Saved to reports/annual-report-2025.md

Path.with_suffix() swaps the file extension cleanly, keeping the same directory and stem.
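
Two edge cases are worth knowing: with_suffix() replaces only the last suffix, and it appends one when the name has none. The file names below are made up for illustration, and PurePosixPath is used only so the printed separators are the same on every platform.

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the output identical on every platform;
# Path behaves the same way for these operations.
examples = [
    "reports/annual-report-2025.pdf",
    "reports/report.v2.pdf",   # only the last suffix is replaced
    "reports/scan",            # no suffix: one is appended
]

results = [str(PurePosixPath(name).with_suffix(".md")) for name in examples]
for source, target in zip(examples, results):
    print(f"{source} -> {target}")
```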

Saving to a Different Directory

To write output to a different folder while keeping the original filename:

import pymupdf4llm
from pathlib import Path

input_path = Path("source/document.pdf")
output_dir = Path("extracted")
output_dir.mkdir(parents=True, exist_ok=True)

md_text = pymupdf4llm.to_markdown(str(input_path))

output_path = output_dir / input_path.with_suffix(".md").name
output_path.write_text(md_text, encoding="utf-8")

print(f"Saved to {output_path}")
# Saved to extracted/document.md
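
If the same path logic is needed for several output formats, it can be factored into a function. target_path is a hypothetical helper name, not something PyMuPDF4LLM provides; it only computes the destination and leaves directory creation and writing to the caller.

```python
from pathlib import Path


def target_path(input_path, output_dir, suffix=".md"):
    """Return output_dir/<input stem><suffix>, ignoring the input's own folder."""
    input_path = Path(input_path)
    return Path(output_dir) / input_path.with_suffix(suffix).name


md_target = target_path("source/document.pdf", "extracted")
json_target = target_path("source/document.pdf", "extracted", suffix=".json")
print(md_target.as_posix(), json_target.as_posix())
```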

Processing Multiple Files

To extract and save output for an entire folder of PDFs:

import pymupdf4llm
from pathlib import Path

input_dir = Path("documents/")
output_dir = Path("extracted/")
output_dir.mkdir(parents=True, exist_ok=True)

pdf_files = list(input_dir.glob("*.pdf"))
print(f"Found {len(pdf_files)} PDF(s)")

for pdf_path in pdf_files:
    print(f"Processing {pdf_path.name}...")
    try:
        md_text = pymupdf4llm.to_markdown(str(pdf_path))
        output_path = output_dir / pdf_path.with_suffix(".md").name
        output_path.write_text(md_text, encoding="utf-8")
        print(f"  ✓ Saved to {output_path}")
    except Exception as e:
        print(f"  ✗ Failed: {e}")

print("Done.")
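
The loop above reads a single folder and writes every result into one flat directory, so two PDFs with the same name in different subfolders would overwrite each other. A possible extension, sketched below, walks subfolders with rglob() and mirrors the input structure in the output directory. mirrored_output_path is a hypothetical helper, and the empty placeholder files stand in for real PDFs so that the path logic can run on its own.

```python
import tempfile
from pathlib import Path


def mirrored_output_path(pdf_path, input_dir, output_dir, suffix=".md"):
    """Map input_dir/a/b/file.pdf to output_dir/a/b/file.md."""
    relative = pdf_path.relative_to(input_dir)
    return output_dir / relative.with_suffix(suffix)


with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp) / "documents"
    output_dir = Path(tmp) / "extracted"

    # Create empty placeholder files; no real PDFs are needed for the path logic.
    for name in ("2024/report.pdf", "2025/report.pdf", "notes.pdf"):
        pdf = input_dir / name
        pdf.parent.mkdir(parents=True, exist_ok=True)
        pdf.touch()

    targets = []
    for pdf_path in sorted(input_dir.rglob("*.pdf")):
        out = mirrored_output_path(pdf_path, input_dir, output_dir)
        out.parent.mkdir(parents=True, exist_ok=True)
        # A real run would write pymupdf4llm.to_markdown(str(pdf_path)) here.
        out.write_text("", encoding="utf-8")
        targets.append(out.relative_to(output_dir).as_posix())

print(targets)
```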

Saving Images Alongside Markdown

With write_images=True, to_markdown() writes the images to image_path during extraction and references them in the returned Markdown:

import pymupdf4llm
from pathlib import Path

image_dir = Path("output/images")
image_dir.mkdir(parents=True, exist_ok=True)

md_text = pymupdf4llm.to_markdown(
    "document.pdf",
    write_images=True,
    image_path=str(image_dir),
    image_format="png",
    dpi=150
)

Path("output/document.md").write_text(md_text, encoding="utf-8")

Image links in the Markdown output are relative paths, so they resolve against the location the .md file is opened from. Keep the Markdown file and the image directory in the same parent folder, and check one generated link after extraction to confirm that it resolves.
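
Whether the links resolve can be checked with a short script. The sketch below uses only the standard library: missing_images is a hypothetical helper, the regular expression covers the basic ![alt](target) form only, and the files it creates are placeholders. It reports every image target that does not exist relative to the folder containing the Markdown file.

```python
import re
import tempfile
from pathlib import Path

# Matches Markdown image syntax ![alt](target) and captures the target.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(md_path):
    """Return image targets in md_path that do not exist relative to its folder."""
    md_path = Path(md_path)
    text = md_path.read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_LINK.findall(text):
        if target.startswith(("http://", "https://", "data:")):
            continue  # remote or embedded images are not files on disk
        if not (md_path.parent / target).exists():
            missing.append(target)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "output"
    (out_dir / "images").mkdir(parents=True)
    (out_dir / "images" / "fig-1.png").write_bytes(b"")  # placeholder image file

    md_file = out_dir / "document.md"
    md_file.write_text(
        "![](images/fig-1.png)\n\n![](output/images/fig-2.png)\n",
        encoding="utf-8",
    )
    broken = missing_images(md_file)

print(broken)
```

An empty list means every local image link points at a file that exists next to the Markdown file.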

File Format Summary

| Output | Function | Extension | Write method |
| --- | --- | --- | --- |
| Markdown | `to_markdown()` | `.md` | `Path.write_text()` |
| JSON | `to_json()` | `.json` | `json.dumps()` + `Path.write_text()` |
| Plain text | `to_text()` | `.txt` | `Path.write_text()` |
| Page chunks | `to_markdown(page_chunks=True)` | `.md` per page | `Path.write_text()` in a loop |
| Images | `to_markdown(write_images=True)` | `.png` / `.jpeg` | Written automatically |

Next Steps

Extract Markdown

Full walkthrough of to_markdown() with all common options.

Extract JSON

Bounding boxes and layout data for custom pipelines.

Extract Text

Plain text extraction and whitespace handling.

Images & Graphics

Controlling image extraction, DPI, format, and output path.
