get_key_values()

Overview

get_key_values() parses a PDF and extracts structured data from every form field (widget) it contains. It returns a dictionary with one entry per field, keyed by field name; each entry holds the field's current value and the pages on which it appears.

This method is only meaningful for Form PDFs, that is, documents that contain interactive widgets. For non-form PDFs, it returns an empty dictionary.

Signature

pymupdf4llm.get_key_values(doc: str | pymupdf.Document) -> dict

Parameters

doc

str | pymupdf.Document

required

Path to the document file, or an already-opened pymupdf.Document instance. Supports PDF, XPS, eBooks, and — with PyMuPDF Pro — Office formats.
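Since doc may be an already-opened document rather than a path, the call can be wrapped as in the sketch below. The helper name extract_from_open_document is hypothetical, and the imports are kept inside the function so the snippet loads even where PyMuPDF is not installed.

```python
def extract_from_open_document(path):
    """Open the file once, then pass the Document object instead of the path.

    Hypothetical helper, not part of pymupdf4llm.
    """
    import pymupdf
    import pymupdf4llm

    # pymupdf.open() returns a pymupdf.Document, usable as a context manager.
    with pymupdf.open(path) as doc:
        return pymupdf4llm.get_key_values(doc)
```

This form is convenient when the same open document is also passed to other calls, so the file is read only once.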

Return Value

Returns a dictionary keyed by field name, where each entry describes one form field:

{
    field_name: {           # Full field name; nested components separated by dots
        "value": str,       # The current field value, cast to string
        "pages": list[int], # 0-based page number(s) where the field appears
    },
    ...
}

Field Dictionary Properties

field_name

string

required

The fully-qualified name of the form field, used as the key of its entry in the returned dictionary. For hierarchical forms, parent and child names are separated by dots (e.g. "section1.address.city").

value

string

required

The field’s current value, always represented as a string regardless of the original widget type (text, checkbox, radio button, etc.).

pages

list[int]

required

A list of zero-based page indices where this field is present. A field can appear on multiple pages (e.g. when a master field has multiple instances across pages).

Usage

Basic Example

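import pymupdf4llm

# Extract every form field; the result is keyed by field name.
result = pymupdf4llm.get_key_values("my_form.pdf")

# Print each field's name, current value, and page indices.
for key, field in result.items():
    print(key, field["value"], field["pages"])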

Example Output

Given a simple two-page application form, the output might look like:

{
  "applicant.name":  {"value": "Jane Smith",        "pages": [0]},
  "applicant.email": {"value": "jane@example.com",  "pages": [0]},
  "terms_accepted":  {"value": "Yes",               "pages": [1]},
  "signature":       {"value": "",                  "pages": [1]}
}
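Because hierarchical field names are dot-separated, a flat result like the one above can be regrouped into nested dictionaries. The following is a minimal sketch: nest_fields is a hypothetical helper, not part of pymupdf4llm, and it assumes no name is both a parent and a leaf (for example "a" alongside "a.b").

```python
def nest_fields(result):
    """Regroup dot-separated field names into nested dicts of values."""
    nested = {}
    for name, field in result.items():
        # "applicant.name" -> parents ["applicant"], leaf "name"
        *parents, leaf = name.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = field["value"]
    return nested


# Sample data copied from the example output above.
sample = {
    "applicant.name": {"value": "Jane Smith", "pages": [0]},
    "applicant.email": {"value": "jane@example.com", "pages": [0]},
    "terms_accepted": {"value": "Yes", "pages": [1]},
    "signature": {"value": "", "pages": [1]},
}

nested = nest_fields(sample)
print(nested["applicant"]["email"])
```

The page information is dropped here on purpose; keep the original dictionary around if it is needed later.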

Behaviour Notes

Non-form PDFs

If the document contains no widgets, get_key_values() returns an empty dictionary {}. It does not raise an error in this case, so it is always safe to call.

Field values are always strings

Regardless of the original widget type — text box, checkbox, radio group, dropdown, or signature — the value is always returned as a str. For empty fields, this will be an empty string "".
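Since every value arrives as a string, any numeric interpretation is left to the caller. A minimal sketch of one way to do this is shown below; as_number is a hypothetical helper, and which fields should be treated as numeric is an assumption the caller has to make.

```python
def as_number(value, default=None):
    """Convert a field value string to int or float, else return default."""
    text = value.strip()
    if not text:
        # Empty fields are returned as "", so map them to the default.
        return default
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return default


quantity = as_number("42")
price = as_number("3.5")
blank = as_number("")
```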

Multi-page fields

A single logical field can appear on multiple pages. In this case the field has a single entry in the returned dictionary, and pages contains every page index where the field is rendered (e.g. [0, 2, 4]).
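When working page by page, it can help to invert the pages lists into a page-to-fields index. The sketch below assumes the documented result shape; fields_by_page and the sample field names are hypothetical.

```python
from collections import defaultdict


def fields_by_page(result):
    """Map each 0-based page index to the names of the fields on that page."""
    index = defaultdict(list)
    for name, field in result.items():
        for page in field["pages"]:
            index[page].append(name)
    # Return a plain dict with pages in ascending order.
    return dict(sorted(index.items()))


# Hypothetical data: one field repeated on three pages, one on a single page.
sample = {
    "header.case_id": {"value": "A-17", "pages": [0, 2, 4]},
    "signature": {"value": "", "pages": [4]},
}

page_index = fields_by_page(sample)
```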

Common Use Cases

Form Data Extraction

Pull structured responses from filled PDF forms — employment applications, tax documents, surveys — without manual copying.

RAG Pre-processing

Augment your Retrieval-Augmented Generation pipeline with clean, structured form data alongside the text content from to_markdown().

Data Validation

Check that required fields are filled before processing a submitted PDF form programmatically.

Form Auditing

Inventory all fields across a batch of PDF templates to confirm naming conventions and completeness.
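For the data-validation use case, a check for unfilled required fields could look like the sketch below. It assumes the documented result shape; missing_required and the list of required names are hypothetical and would come from your own form definition.

```python
def missing_required(result, required):
    """Return the required field names that are absent or have an empty value."""
    missing = []
    for name in required:
        field = result.get(name)
        # Empty fields are reported as ""; whitespace-only counts as empty too.
        if field is None or not field["value"].strip():
            missing.append(name)
    return missing


# Sample data in the documented shape.
sample = {
    "applicant.name": {"value": "Jane Smith", "pages": [0]},
    "applicant.email": {"value": "jane@example.com", "pages": [0]},
    "terms_accepted": {"value": "Yes", "pages": [1]},
    "signature": {"value": "", "pages": [1]},
}

missing = missing_required(sample, ["applicant.name", "signature", "applicant.phone"])
```

Here the unsigned signature field and the absent applicant.phone field are both reported, so a submission can be rejected before further processing.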

See Also

TocHeaders

Detect table-of-contents style heading structure.

IdentifyHeaders

Detect and classify page headers and footers across a document for exclusion or analysis.

Extract Markdown

Practical guide to using margins in extraction.
