# PDF4LLM

## Docs

- [Core Document Links](https://docs.pdf4llm.com/core-docs.md): This page cites the Read The Docs documentation guides for MuPDF software.
- [API](https://docs.pdf4llm.com/dotnet/api/index.md): Complete reference for all PDF4LLM methods and types.
- [FAQ](https://docs.pdf4llm.com/dotnet/getting-started/faq/index.md): Common questions about the `PDF4LLM` package for .NET.
- [Installation](https://docs.pdf4llm.com/dotnet/getting-started/installation/index.md): Install PDF4LLM via NuGet, understand the MuPDF.NET dependency, and resolve the common assembly conflict.
- [Quickstart](https://docs.pdf4llm.com/dotnet/getting-started/quickstart/index.md): Go from zero to a working PDF-to-Markdown conversion in under five minutes.
- [Supported Formats](https://docs.pdf4llm.com/dotnet/getting-started/supported-formats/index.md): Input formats MuPDF.NET can read, and output formats it can produce.
- [OCR](https://docs.pdf4llm.com/dotnet/guides/OCR/index.md): Use Tesseract OCR to extract text from scanned PDFs, image-based pages, and documents where native text selection returns nothing useful.
- [Tesseract Language Packs](https://docs.pdf4llm.com/dotnet/guides/OCR/tesseract-language-packs.md): How to install additional Tesseract language packs on Windows, macOS, and Linux for use with PDF4LLM OCR.
- [Extract JSON](https://docs.pdf4llm.com/dotnet/guides/extract-JSON/index.md): Use [ToJson()](/dotnet/api/PdfExtractor#tojson) to get bounding boxes, layout data, and structured page content for custom pipelines.
- [Extract Markdown](https://docs.pdf4llm.com/dotnet/guides/extract-Markdown/index.md): A full walkthrough of [ToMarkdown()](/dotnet/api/PdfExtractor#tomarkdown) with common options and use cases.
- [Extract Text](https://docs.pdf4llm.com/dotnet/guides/extract-Text/index.md): Use [ToText()](/dotnet/api/PdfExtractor#totext) to get clean, plain text output stripped of all Markdown formatting.
- [Images & Graphics](https://docs.pdf4llm.com/dotnet/guides/images-and-graphics/index.md): Extract embedded images and vector graphics from documents — controlling output path, format, and whether images are written to disk or embedded inline.
- [Page Selection](https://docs.pdf4llm.com/dotnet/guides/page-selection/index.md): Use the pages parameter to extract content from specific pages rather than processing an entire document.
- [Saving Output](https://docs.pdf4llm.com/dotnet/guides/saving-output/index.md): Write extracted Markdown, JSON, and plain text to disk using System.IO.
- [Tables](https://docs.pdf4llm.com/dotnet/guides/tables/index.md): How PDF4LLM detects, extracts, and renders tables as Markdown — and how to access raw table data for custom pipelines.
- [Azure OpenAI](https://docs.pdf4llm.com/dotnet/integrations/azure.md): Chunk PDFs with PDF4LLM and feed them into Azure OpenAI embeddings and chat completions — end-to-end patterns for .NET RAG pipelines.
- [JSON Schema](https://docs.pdf4llm.com/dotnet/reference/JSON-schema.md): Full field reference for the structured output returned by [ToJson()](/dotnet/api/PdfExtractor#tojson).
- [Changelog](https://docs.pdf4llm.com/dotnet/reference/changelog.md): Version history and release notes for PDF4LLM.NET.
- [Chunk Schema](https://docs.pdf4llm.com/dotnet/reference/chunk-schema.md): Full schema for each page chunk returned when `pageChunks=true` is passed to [ToMarkdown()](/dotnet/api/PdfExtractor#tomarkdown) or [ToText()](/dotnet/api/PdfExtractor#totext).
- [API](https://docs.pdf4llm.com/python/api/index.md): Complete reference for all PyMuPDF4LLM functions and classes.
- [FAQ](https://docs.pdf4llm.com/python/getting-started/faq/index.md): Common questions about the `pymupdf4llm` Python library.
- [Installation](https://docs.pdf4llm.com/python/getting-started/installation/index.md): Install PyMuPDF4LLM and its optional dependencies.
- [Quickstart](https://docs.pdf4llm.com/python/getting-started/quickstart/index.md): Convert a PDF to Markdown in a couple of lines of Python.
- [Supported Formats](https://docs.pdf4llm.com/python/getting-started/supported-formats/index.md): Input formats PyMuPDF4LLM can read, and output formats it can produce.
- [OCR](https://docs.pdf4llm.com/python/guides/OCR/index.md): How automatic OCR works in PyMuPDF4LLM, when to force it, and how to swap in a different OCR engine.
- [OCR Plugins](https://docs.pdf4llm.com/python/guides/OCR/plugins.md): How to use OCR engines other than Tesseract with PyMuPDF4LLM, and how to create your own custom OCR plugin.
- [Tesseract Language Packs](https://docs.pdf4llm.com/python/guides/OCR/tesseract-language-packs.md): How to install additional Tesseract language packs on macOS, Linux, and Windows.
- [Extract JSON](https://docs.pdf4llm.com/python/guides/extract-JSON/index.md): Use [to_json()](../api/to_json) to get bounding boxes, layout data, and structured page content for custom pipelines.
- [Extract Markdown](https://docs.pdf4llm.com/python/guides/extract-Markdown/index.md): A full walkthrough of [to_markdown()](../api/to_markdown) with common options and use cases.
- [Extract Text](https://docs.pdf4llm.com/python/guides/extract-Text/index.md): Use [to_text()](../api/to_text) to get clean, plain text output stripped of all Markdown formatting.
- [Images & Graphics](https://docs.pdf4llm.com/python/guides/images-and-graphics/index.md): Extract embedded images and vector graphics from documents — controlling output path, DPI, format, and whether images are written to disk or embedded inline.
- [Page Selection](https://docs.pdf4llm.com/python/guides/page-selection/index.md): Use the pages parameter to extract content from specific pages rather than processing an entire document.
- [Saving Output](https://docs.pdf4llm.com/python/guides/saving-output/index.md): Write extracted Markdown, JSON, and plain text to disk using pathlib.
- [Tables](https://docs.pdf4llm.com/python/guides/tables/index.md): How PyMuPDF4LLM detects, extracts, and renders tables as Markdown — and how to access raw table data for custom pipelines.
- [LangChain](https://docs.pdf4llm.com/python/integrations/LangChain.md): Use PyMuPDF4LLM as a LangChain document loader to feed PDF content into chains, agents, and retrieval pipelines.
- [PyMuPDF Pro](https://docs.pdf4llm.com/python/integrations/PyMuPDF-Pro.md): Unlock Office document support in PyMuPDF4LLM — extract content from `.doc`, `.ppt`, `.xls`, and more.
- [JSON Schema](https://docs.pdf4llm.com/python/reference/JSON-schema.md): Full field reference for the structured output returned by [to_json()](/python/api/to_json).
- [Changelog](https://docs.pdf4llm.com/python/reference/changelog.md): Version history and release notes for PyMuPDF4LLM.
- [Chunk Schema](https://docs.pdf4llm.com/python/reference/chunk-schema.md): Full dictionary schema for each page chunk returned when `page_chunks=True`.

## OpenAPI Specs

- [openapi](https://docs.pdf4llm.com/api-reference/openapi.json)

## Optional

- [pdf4llm.com](https://pdf4llm.com)
- [PyPI](https://pypi.org/project/pymupdf4llm/)
- [NuGet](https://www.nuget.org/packages/PDF4LLM/)
- [WebViewer](https://www.pdf4llm.com/#webviewer)
- [Discord](https://pymupdf.pro/discord/4llm)
- [Forum](https://forum.mupdf.com)