Installation
NuGet package
.csproj:
Dependencies
PDF4LLM depends on MuPDF.NET and lists it as a NuGet dependency, so it is installed automatically and you do not need to add MuPDF.NET separately. After installation, two native DLLs are required at runtime:

| File | Contents |
|---|---|
| mupdfcpp64.dll | The MuPDF C library with C++ bindings |
| mupdfcsharp.dll | The C# bindings for MuPDF |
PATH environment variable.
Supported targets
| Target framework | Supported |
|---|---|
| .NET 8.0 | ✓ |
| .NET 7.0 | ✓ |
| .NET 6.0 | ✓ |
| .NET 5.0 | ✓ |
| .NET Standard 2.0 | ✓ |
| .NET Framework 4.8 | ✓ |
| .NET Framework 4.7.2 | ✓ |
| .NET Framework 4.6.1 | ✓ |
Verify the installation
Add a using directive and call ToMarkdown on any PDF to confirm everything is wired up:
Resolving the assembly conflict
If you see this error at build or runtime:
Remove the PDF4LLM package reference and rely on the bundled version:
.csproj manually:
using PDF4LLM; and PdfExtractor.* work regardless of which package supplies the assembly.
A future release of MuPDF.NET will stop bundling PDF4LLM, allowing both packages to coexist without conflict. Until then, use one or the other.
Optional: OCR support
OCR is not required for most PDFs. Enable it only when working with scanned documents or pages that contain no selectable text. OCR requires Tesseract to be installed on the host system and available on the PATH. PDF4LLM does not bundle Tesseract.
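Before enabling OCR, it can help to confirm that a tesseract executable is actually reachable on the PATH. The following is a minimal POSIX shell sketch, not part of PDF4LLM; the function name check_tesseract is a hypothetical helper introduced here for illustration. It only tests whether the command can be resolved and does not validate the Tesseract version or its language data.

```shell
#!/bin/sh
# Hypothetical helper (not part of PDF4LLM): report whether a `tesseract`
# executable can be resolved through the current PATH.
check_tesseract() {
    if command -v tesseract >/dev/null 2>&1; then
        # Print the resolved location so a wrong or stale install is easy to spot.
        echo "tesseract found: $(command -v tesseract)"
        return 0
    fi
    echo "tesseract not found on PATH"
    return 1
}

# Run the check and print a hint if Tesseract is missing.
check_tesseract || echo "Install Tesseract and add it to PATH before enabling OCR."
```

On Windows, running where tesseract in a Command Prompt gives a comparable check.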
Windows — Download the installer from UB Mannheim Tesseract builds and add the install directory to your PATH.
macOS