# PDF4LLM

## Docs

- [Core Document Links](https://docs.pdf4llm.com/core-docs.md): Links to the Read the Docs documentation guides for MuPDF software.
- [API](https://docs.pdf4llm.com/dotnet/api/index.md): Complete reference for all PDF4LLM methods and types.
- [FAQ](https://docs.pdf4llm.com/dotnet/getting-started/faq/index.md): Common questions about the `PDF4LLM` package for .NET.
- [Installation](https://docs.pdf4llm.com/dotnet/getting-started/installation/index.md): Install PDF4LLM via NuGet, understand the MuPDF.NET dependency, and resolve the common assembly conflict.
- [Quickstart](https://docs.pdf4llm.com/dotnet/getting-started/quickstart/index.md): Go from zero to a working PDF-to-Markdown conversion in under five minutes.
- [Supported Formats](https://docs.pdf4llm.com/dotnet/getting-started/supported-formats/index.md): Input formats MuPDF.NET can read, and output formats it can produce.
- [OCR](https://docs.pdf4llm.com/dotnet/guides/OCR/index.md): Use Tesseract OCR to extract text from scanned PDFs, image-based pages, and documents where native text selection returns nothing useful.
- [Tesseract Language Packs](https://docs.pdf4llm.com/dotnet/guides/OCR/tesseract-language-packs.md): How to install additional Tesseract language packs on Windows, macOS, and Linux for use with PDF4LLM OCR.
- [Extract JSON](https://docs.pdf4llm.com/dotnet/guides/extract-JSON/index.md): Use [ToJson()](/dotnet/api/PdfExtractor#tojson) to get bounding boxes, layout data, and structured page content for custom pipelines.
- [Extract Markdown](https://docs.pdf4llm.com/dotnet/guides/extract-Markdown/index.md): A full walkthrough of [ToMarkdown()](/dotnet/api/PdfExtractor#tomarkdown) with common options and use cases.
- [Extract Text](https://docs.pdf4llm.com/dotnet/guides/extract-Text/index.md): Use [ToText()](/dotnet/api/PdfExtractor#totext) to get clean, plain text output stripped of all Markdown formatting.
- [Images & Graphics](https://docs.pdf4llm.com/dotnet/guides/images-and-graphics/index.md): Extract embedded images and vector graphics from documents — controlling output path, format, and whether images are written to disk or embedded inline.
- [Page Selection](https://docs.pdf4llm.com/dotnet/guides/page-selection/index.md): Use the pages parameter to extract content from specific pages rather than processing an entire document.
- [Saving Output](https://docs.pdf4llm.com/dotnet/guides/saving-output/index.md): Write extracted Markdown, JSON, and plain text to disk using System.IO.
- [Tables](https://docs.pdf4llm.com/dotnet/guides/tables/index.md): How PDF4LLM detects, extracts, and renders tables as Markdown — and how to access raw table data for custom pipelines.
- [Azure OpenAI](https://docs.pdf4llm.com/dotnet/integrations/azure.md): Chunk PDFs with PDF4LLM and feed them into Azure OpenAI embeddings and chat completions — end-to-end patterns for .NET RAG pipelines.
- [JSON Schema](https://docs.pdf4llm.com/dotnet/reference/JSON-schema.md): Full field reference for the structured output returned by [ToJson()](/dotnet/api/PdfExtractor#tojson).
- [Changelog](https://docs.pdf4llm.com/dotnet/reference/changelog.md): Version history and release notes for PDF4LLM.NET.
- [Chunk Schema](https://docs.pdf4llm.com/dotnet/reference/chunk-schema.md): Full schema for each page chunk returned when `pageChunks=true` is passed to [ToMarkdown()](/dotnet/api/PdfExtractor#tomarkdown) or [ToText()](/dotnet/api/PdfExtractor#totext).
- [API](https://docs.pdf4llm.com/python/api/index.md): Complete reference for all PyMuPDF4LLM functions and classes.
- [FAQ](https://docs.pdf4llm.com/python/getting-started/faq/index.md): Common questions about the `pymupdf4llm` Python library.
- [Installation](https://docs.pdf4llm.com/python/getting-started/installation/index.md): Install PyMuPDF4LLM and its optional dependencies.
- [Quickstart](https://docs.pdf4llm.com/python/getting-started/quickstart/index.md): Convert a PDF to Markdown in a couple of lines of Python.
- [Supported Formats](https://docs.pdf4llm.com/python/getting-started/supported-formats/index.md): Input formats PyMuPDF4LLM can read, and output formats it can produce.
- [OCR](https://docs.pdf4llm.com/python/guides/OCR/index.md): How automatic OCR works in PyMuPDF4LLM, when to force it, and how to swap in a different OCR engine.
- [OCR Plugins](https://docs.pdf4llm.com/python/guides/OCR/plugins.md): How to use OCR engines other than Tesseract with PyMuPDF4LLM, and how to create your own custom OCR plugin.
- [Tesseract Language Packs](https://docs.pdf4llm.com/python/guides/OCR/tesseract-language-packs.md): How to install additional Tesseract language packs on macOS, Linux, and Windows.
- [Extract JSON](https://docs.pdf4llm.com/python/guides/extract-JSON/index.md): Use [to_json()](../api/to_json) to get bounding boxes, layout data, and structured page content for custom pipelines.
- [Extract Markdown](https://docs.pdf4llm.com/python/guides/extract-Markdown/index.md): A full walkthrough of [to_markdown()](../api/to_markdown) with common options and use cases.
- [Extract Text](https://docs.pdf4llm.com/python/guides/extract-Text/index.md): Use [to_text()](../api/to_text) to get clean, plain text output stripped of all Markdown formatting.
- [Images & Graphics](https://docs.pdf4llm.com/python/guides/images-and-graphics/index.md): Extract embedded images and vector graphics from documents — controlling output path, DPI, format, and whether images are written to disk or embedded inline.
- [Page Selection](https://docs.pdf4llm.com/python/guides/page-selection/index.md): Use the pages parameter to extract content from specific pages rather than processing an entire document.
- [Saving Output](https://docs.pdf4llm.com/python/guides/saving-output/index.md): Write extracted Markdown, JSON, and plain text to disk using pathlib.
- [Tables](https://docs.pdf4llm.com/python/guides/tables/index.md): How PyMuPDF4LLM detects, extracts, and renders tables as Markdown — and how to access raw table data for custom pipelines.
- [LangChain](https://docs.pdf4llm.com/python/integrations/LangChain.md): Use PyMuPDF4LLM as a LangChain document loader to feed PDF content into chains, agents, and retrieval pipelines.
- [PyMuPDF Pro](https://docs.pdf4llm.com/python/integrations/PyMuPDF-Pro.md): Unlock Office document support in PyMuPDF4LLM — extract content from `.doc`, `.ppt`, `.xls`, and more.
- [JSON Schema](https://docs.pdf4llm.com/python/reference/JSON-schema.md): Full field reference for the structured output returned by [to_json()](/python/api/to_json).
- [Changelog](https://docs.pdf4llm.com/python/reference/changelog.md): Version history and release notes for PyMuPDF4LLM.
- [Chunk Schema](https://docs.pdf4llm.com/python/reference/chunk-schema.md): Full dictionary schema for each page chunk returned when `page_chunks=True`.

## OpenAPI Specs

- [OpenAPI](https://docs.pdf4llm.com/api-reference/openapi.json)

## Optional

- [pdf4llm.com](https://pdf4llm.com)
- [PyPI](https://pypi.org/project/pymupdf4llm/)
- [NuGet](https://www.nuget.org/packages/PDF4LLM/)
- [WebViewer](https://www.pdf4llm.com/#webviewer)
- [Discord](https://pymupdf.pro/discord/4llm)
- [Forum](https://forum.mupdf.com)