Overview
PyMuPDF Pro extends PyMuPDF4LLM with support for Microsoft Office formats. Without Pro, PyMuPDF4LLM is limited to PDF, XPS, and eBook inputs. With Pro activated, you can pass Office files directly to any extraction function, with no conversion step required. Everything else stays the same: all standard options (page selection, layout analysis, OCR, page chunks, image extraction) work identically on Office documents.
Contact Sales
Need a Commercial Licence for PyMuPDF Pro? Contact the sales team to discuss options and pricing.
Supported Office Formats
| Format | Extensions | Notes |
|---|---|---|
| Word | .docx, .doc | Full text, tables, images, and headers |
| PowerPoint | .pptx, .ppt | Slide content, speaker notes, embedded images |
| Excel | .xlsx, .xls | Sheet data rendered as tables |
| Hangul | .hwpx, .hwp | Hangul Word Processor format |
Office documents are converted to PDF internally by PyMuPDF Pro before extraction. This means all PyMuPDF4LLM features work on Office files exactly as they do on PDFs.
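As a small illustration of the table above, the following sketch maps a file name to its Office format family. The OFFICE_EXTENSIONS mapping and the office_family helper are hypothetical names chosen for this example; they are not part of PyMuPDF Pro.

```python
from pathlib import Path

# Extensions copied from the table above, grouped by format family.
OFFICE_EXTENSIONS = {
    "Word": {".docx", ".doc"},
    "PowerPoint": {".pptx", ".ppt"},
    "Excel": {".xlsx", ".xls"},
    "Hangul": {".hwpx", ".hwp"},
}


def office_family(filename):
    """Return the Office format name for a file, or None if it is not an Office file."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    for family, extensions in OFFICE_EXTENSIONS.items():
        if suffix in extensions:
            return family
    return None
```

A check like this can be used to decide whether a given input needs Pro at all.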
Installation
Install the PyMuPDF Pro package. It requires a valid licence key: request a trial or purchase a licence from the PyMuPDF website.
Usage
Trial Keys
To obtain a trial licence key, fill out the form on this page. The trial key is then emailed to the address you submitted.
Trial keys are valid for 60 days and allow you to test the full functionality of PyMuPDF Pro on any document. This is ideal for evaluation and development purposes.
Activating Your Licence
Activate the licence explicitly at the start of your script. Call unlock() once before making any extraction calls. A good place to do this is at application startup or in your environment initialisation.
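A minimal sketch of activation at startup, assuming the key is kept in an environment variable rather than in source code. The variable name MY_PYMUPDF_PRO_KEY and both helper functions are hypothetical; only pymupdf.pro.unlock() comes from PyMuPDF Pro.

```python
import os


def load_licence_key(environ=None, name="MY_PYMUPDF_PRO_KEY"):
    """Read the licence key from a mapping (os.environ by default).

    The variable name is an assumption made for this example.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set {name} to your PyMuPDF Pro licence key")
    return key


def activate_pro():
    """Unlock PyMuPDF Pro. Call once, before any extraction call."""
    import pymupdf.pro  # imported here so load_licence_key works without Pro installed

    pymupdf.pro.unlock(load_licence_key())
```

Calling activate_pro() once at application startup matches the advice above; later extraction calls need no further licence handling.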
Commercial Licence Keys
Commercial licence keys are also supported. If you have a commercial key, pass it to unlock() instead of the trial key. Commercial keys do not have the time limit and may also include additional features or support options. Contact the PyMuPDF sales team for more information on commercial licences.
Extracting Office Documents
Once Pro is activated, pass Office files to any extraction function exactly as you would a PDF.
Word Documents
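A minimal sketch for a Word file, assuming pymupdf.pro.unlock() has already been called with a valid key. The helper names and the file name in the comment are hypothetical; pymupdf4llm.to_markdown() is the standard extraction function.

```python
def is_word_file(path):
    """True for the Word extensions listed in the supported formats table."""
    return str(path).lower().endswith((".docx", ".doc"))


def word_to_markdown(path):
    """Extract a Word document as one Markdown string (Pro must be unlocked)."""
    import pymupdf4llm  # imported lazily; needs PyMuPDF4LLM and Pro installed

    if not is_word_file(path):
        raise ValueError(f"Not a Word document: {path}")
    # Same call as for a PDF: the file path is the only required argument.
    return pymupdf4llm.to_markdown(str(path))


# Example (requires an unlocked Pro and a real file):
# markdown = word_to_markdown("report.docx")
```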
PowerPoint Presentations
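For presentations it can be convenient to keep the output separated per page. The sketch below uses the page_chunks option for that and assumes, without confirmation from this page, that each slide becomes one page after the internal conversion. The helper names are hypothetical.

```python
def number_slides(chunks):
    """Pair each page chunk's Markdown with a 1-based position.

    Expects the list of dicts returned by to_markdown(..., page_chunks=True),
    where each dict holds the page's Markdown under the "text" key.
    """
    return [(number, chunk["text"]) for number, chunk in enumerate(chunks, start=1)]


def presentation_to_slides(path):
    """Extract a presentation page by page (Pro must already be unlocked)."""
    import pymupdf4llm  # imported lazily; needs PyMuPDF4LLM and Pro installed

    chunks = pymupdf4llm.to_markdown(str(path), page_chunks=True)
    # Assumption: one page per slide, so positions double as slide numbers.
    return number_slides(chunks)
```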
Excel Spreadsheets
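Since sheet data is rendered as tables, the extracted Markdown can be post-processed into rows. The following sketch assumes the tables arrive as pipe-delimited Markdown tables; markdown_table_rows and spreadsheet_rows are hypothetical helpers, not library functions.

```python
def markdown_table_rows(markdown):
    """Return the cells of every pipe-table row, skipping separator rows."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # header separator such as |---|---|
        rows.append(cells)
    return rows


def spreadsheet_rows(path):
    """Extract a workbook and return its table rows (Pro must be unlocked)."""
    import pymupdf4llm  # imported lazily; needs PyMuPDF4LLM and Pro installed

    return markdown_table_rows(pymupdf4llm.to_markdown(str(path)))
```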
Hangul Documents
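A short sketch for Hangul files that saves the result next to the source. Writing the output as UTF-8 is a choice made for this example so that Korean text is stored without loss; the helper names are hypothetical.

```python
from pathlib import Path


def markdown_path_for(source):
    """Return the output path: same folder and name, with a .md extension."""
    return Path(source).with_suffix(".md")


def hangul_to_markdown_file(source):
    """Extract a .hwp or .hwpx file and save the Markdown (Pro must be unlocked)."""
    import pymupdf4llm  # imported lazily; needs PyMuPDF4LLM and Pro installed

    target = markdown_path_for(source)
    target.write_text(pymupdf4llm.to_markdown(str(source)), encoding="utf-8")
    return target
```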
Converting an Office document to PDF
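A minimal sketch of the conversion, assuming Pro has already been unlocked. It opens the Office file with pymupdf.open() and writes the bytes returned by Document.convert_to_pdf(); the helper names and the choice to place the PDF next to the source are assumptions of this example.

```python
from pathlib import Path


def pdf_target(source):
    """Return the .pdf path next to the source; reject files that are already PDFs."""
    source = Path(source)
    if source.suffix.lower() == ".pdf":
        raise ValueError(f"Already a PDF: {source}")
    return source.with_suffix(".pdf")


def office_to_pdf(source):
    """Convert an Office document to a PDF file and return the new path."""
    import pymupdf  # imported lazily; Pro must already be unlocked

    target = pdf_target(source)
    with pymupdf.open(str(source)) as doc:
        pdf_bytes = doc.convert_to_pdf()  # the whole document as PDF bytes
    target.write_bytes(pdf_bytes)
    return target
```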
With Pro activated, any supported Office document can be converted to PDF format.
Using All Standard Options
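The sketch below shows a few of the usual to_markdown() keyword arguments (pages, page_chunks, write_images, image_path) applied to an Office file. The extraction_options helper and its parameter names are hypothetical; it only assembles the keyword arguments.

```python
def extraction_options(first_pages=None, chunked=False, image_dir=None):
    """Build keyword arguments for pymupdf4llm.to_markdown()."""
    options = {}
    if first_pages is not None:
        options["pages"] = list(range(first_pages))  # 0-based page numbers
    if chunked:
        options["page_chunks"] = True  # one dict per page instead of one string
    if image_dir is not None:
        options["write_images"] = True  # save images as files
        options["image_path"] = str(image_dir)  # folder that receives them
    return options


def extract_with_options(path, **settings):
    """Run to_markdown() on any supported file (Pro must be unlocked for Office)."""
    import pymupdf4llm  # imported lazily; needs PyMuPDF4LLM and Pro installed

    return pymupdf4llm.to_markdown(str(path), **extraction_options(**settings))
```

For example, extract_with_options("slides.pptx", first_pages=2, image_dir="images") would limit extraction to the first two pages and write their images to the images folder.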
Because Office documents are converted to PDF internally, every standard PyMuPDF4LLM option works without modification.
Processing a Mixed Document Library
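A sketch of a single code path for a folder of mixed inputs, assuming Pro has been unlocked beforehand. The SUPPORTED set is limited to PDF plus the Office extensions from the formats table for brevity; it and the helper names are hypothetical.

```python
from pathlib import Path

# PDF plus the Office extensions from the supported formats table.
SUPPORTED = {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls", ".hwpx", ".hwp"}


def is_supported(path):
    """True if the file extension is one this sketch handles."""
    return Path(path).suffix.lower() in SUPPORTED


def extract_folder(folder):
    """Map each supported file name to its Markdown, one call for every format."""
    import pymupdf4llm  # imported lazily; needs PyMuPDF4LLM and Pro installed

    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and is_supported(path):
            # No branching on file type: PDFs and Office files use the same call.
            results[path.name] = pymupdf4llm.to_markdown(str(path))
    return results
```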
With Pro activated, you can process a folder containing a mix of PDFs and Office files using the same code path.
PyMuPDF Pro and Fonts
By default, pymupdf.pro.unlock() searches all installed font directories.
This can be controlled with keyword-only args:
fontpath: specific font directories, either as a list/tuple or os.sep-separated string. If None (the default), the value set in os.environ['PYMUPDFPRO_FONT_PATH'] is used.
fontpath_auto: whether to append system font directories. If None (the default), it is treated as True when os.environ['PYMUPDFPRO_FONT_PATH_AUTO'] is 1, and all system font directories are then appended.
pymupdf.pro.get_fontpath() returns a tuple of all font directories used by unlock().
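The font arguments described above can be sketched as follows. The existing_font_dirs filter and the unlock_with_fonts wrapper are hypothetical helpers; the calls to unlock() and get_fontpath() use only the keyword arguments named in this section.

```python
import os


def existing_font_dirs(candidates):
    """Keep only the candidate paths that are existing directories."""
    return [str(path) for path in candidates if os.path.isdir(path)]


def unlock_with_fonts(key, font_dirs, use_system_fonts=True):
    """Unlock Pro with explicit font directories and report what is used."""
    import pymupdf.pro  # imported lazily; needs PyMuPDF Pro installed

    pymupdf.pro.unlock(
        key,
        fontpath=existing_font_dirs(font_dirs),  # passed as a list
        fontpath_auto=use_system_fonts,  # also append system font directories
    )
    # Tuple of all font directories used by unlock().
    return pymupdf.pro.get_fontpath()
```

Filtering out missing directories first is a design choice of this sketch, not a requirement stated here.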
Next Steps
LangChain
Load Office documents into LangChain pipelines.
Supported Formats
Full list of supported input and output formats.
Extract Markdown
All to_markdown() options that work with Office files.