Overview
get_key_values() parses a PDF and extracts structured data from every form field (widget) it contains. It returns a list of dictionaries — one per field — each capturing the field name, its current value, and the pages on which it appears.
This method is only meaningful for Form PDFs — documents that contain interactive widgets. For non-form PDFs, it returns an empty list.
Signature
Parameters
Path to the document file, or an already-opened pymupdf.Document instance. Supports PDF, XPS, eBooks, and (with PyMuPDF Pro) Office formats.
Return Value
Returns a list of dictionaries, where each dictionary represents one form field.
Field Dictionary Properties
The fully-qualified name of the form field. For hierarchical forms, parent and child names are separated by dots (e.g. "section1.address.city").
The field’s current value, always represented as a string regardless of the original widget type (text, checkbox, radio button, etc.).
A list of zero-based page indices where this field is present. A field can appear on multiple pages (e.g. when a master field has multiple instances across pages).
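To make the shape concrete, here is a minimal sketch of what such a list could look like and how it might be consumed. The sample data is invented for illustration, and the key "name" for the fully-qualified field name is an assumption (only "value" and "pages" are named on this page), so check the actual keys in your installed version.

```python
# Hypothetical sample shaped like the return value described above.
# NOTE: the "name" key is an assumption made for this illustration.
fields = [
    {"name": "section1.address.city", "value": "Berlin", "pages": [0]},
    {"name": "section1.address.zip", "value": "", "pages": [0]},
    {"name": "applicant.signature_date", "value": "2024-05-01", "pages": [0, 1]},
]


def split_name(qualified_name: str) -> list[str]:
    """Split a fully-qualified field name into its hierarchy levels."""
    return qualified_name.split(".")


for field in fields:
    levels = split_name(field["name"])
    # Page indices are zero-based, so add 1 for human-readable page numbers.
    human_pages = [p + 1 for p in field["pages"]]
    print(levels[-1], repr(field["value"]), human_pages)
```

Splitting on dots recovers the parent/child hierarchy, which can be handy when grouping fields by form section.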
Usage
Basic Example
Example Output
Given a simple two-page application form, the output might look like:
Behaviour Notes
Non-form PDFs
If the document contains no widgets, get_key_values() returns an empty list []. It never raises an error in this case, so it is always safe to call.
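Because the no-widget case is just an empty list, calling code can rely on an ordinary truthiness check rather than exception handling. The helper below is a hypothetical sketch that works on any list shaped like the return value; the "name" key in the sample is an assumption.

```python
def describe_form(fields: list[dict]) -> str:
    """Summarize a get_key_values()-style result without error handling."""
    if not fields:  # an empty list means the document has no widgets
        return "no form fields found"
    return f"{len(fields)} form field(s) found"


print(describe_form([]))
# Hypothetical single-field result; the "name" key is an assumption.
print(describe_form([{"name": "email", "value": "", "pages": [0]}]))
```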
Field values are always strings
Regardless of the original widget type (text box, checkbox, radio group, dropdown, or signature), the value is always returned as a str. For empty fields, this is an empty string "".
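Since every value arrives as a string, any typed interpretation is up to the caller. The following is a minimal sketch of two hypothetical helpers, one for detecting blank values and one for parsing numeric input. It assumes commas are used only as thousands separators, which may not hold for your forms.

```python
from typing import Optional


def is_blank(value: str) -> bool:
    """Return True when a field value is empty or only whitespace."""
    return value.strip() == ""


def to_number(value: str) -> Optional[float]:
    """Parse a numeric field value; return None if blank or not numeric."""
    if is_blank(value):
        return None
    try:
        # Assumption: commas are thousands separators, e.g. "1,250.50".
        return float(value.replace(",", ""))
    except ValueError:
        return None


print(to_number("1,250.50"))
print(to_number(""))
print(to_number("n/a"))
```

Returning None for unparseable input keeps the distinction between "field left empty" and "field holds zero" visible to downstream code.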
Multi-page fields
A single logical field can appear on multiple pages. In this case the field appears once in the returned list, and pages contains all page indices where the field is rendered (e.g. [0, 2, 4]).
Common Use Cases
Form Data Extraction
Pull structured responses from filled PDF forms — employment applications, tax documents, surveys — without manual copying.
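As one possible approach, the extracted fields can be written straight to CSV for downstream tools. This sketch uses only the standard library and operates on hypothetical sample data; the "name" key and the choice to join page indices with ";" are assumptions made for illustration.

```python
import csv
import io


def fields_to_csv(fields: list[dict]) -> str:
    """Serialize a get_key_values()-style list to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "value", "pages"])
    for field in fields:
        # Join zero-based page indices into one cell, e.g. "0;1".
        pages = ";".join(str(p) for p in field["pages"])
        writer.writerow([field["name"], field["value"], pages])
    return buffer.getvalue()


# Hypothetical result for a filled employment application.
sample = [
    {"name": "applicant.full_name", "value": "Ada Lovelace", "pages": [0]},
    {"name": "applicant.start_date", "value": "2024-09-01", "pages": [0, 1]},
]
print(fields_to_csv(sample))
```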
RAG Pre-processing
Augment your Retrieval-Augmented Generation pipeline with clean, structured form data alongside the text content from to_markdown().
Data Validation
Check that required fields are filled before processing a submitted PDF form programmatically.
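A required-field check could be sketched as follows. The helper name, the set of required names, and the "name" key are all hypothetical; a field counts as missing when it is absent from the list or its value is blank.

```python
def missing_required(fields: list[dict], required: set[str]) -> list[str]:
    """Return required field names that are absent or have a blank value."""
    values = {f["name"]: f["value"] for f in fields}
    return sorted(n for n in required if values.get(n, "").strip() == "")


# Hypothetical submission: the email was left empty, the phone field is absent.
submitted = [
    {"name": "applicant.full_name", "value": "Ada Lovelace", "pages": [0]},
    {"name": "applicant.email", "value": "", "pages": [0]},
]
required_names = {"applicant.full_name", "applicant.email", "applicant.phone"}
print(missing_required(submitted, required_names))
```

An empty result means every required field is present and filled, so processing can continue.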
Form Auditing
Inventory all fields across a batch of PDF templates to confirm naming conventions and completeness.
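An audit along these lines might flag field names that break a naming convention. In this sketch the convention (lowercase snake_case segments joined by dots), the template file names, and the "name" key are assumptions. In practice the mapping would be built by calling get_key_values() on each template, but it is hard-coded here to keep the example self-contained.

```python
import re

# Assumed convention: lowercase snake_case segments separated by dots.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*")


def audit_names(templates: dict[str, list[dict]]) -> dict[str, list[str]]:
    """Map each template to the field names that violate the convention."""
    report = {}
    for template, fields in templates.items():
        bad = [f["name"] for f in fields if not NAME_PATTERN.fullmatch(f["name"])]
        if bad:
            report[template] = bad
    return report


# Hypothetical batch of two templates and their extracted fields.
batch = {
    "onboarding.pdf": [
        {"name": "applicant.email", "value": "", "pages": [0]},
        {"name": "Applicant Phone", "value": "", "pages": [0]},
    ],
    "survey.pdf": [
        {"name": "section1.city", "value": "", "pages": [1]},
    ],
}
print(audit_names(batch))
```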
See Also
TocHeaders
Detect table-of-contents style heading structure.
IdentifyHeaders
Detect and classify page headers and footers across a document for exclusion or analysis.
Extract Markdown
Practical guide to using margins in extraction.