# Tables

> How PyMuPDF4LLM detects, extracts, and renders tables as Markdown — and how to access raw table data for custom pipelines.

<div id="apiIndicatorBadge">
  <div class="inner pymupdf" />
</div>

## Overview

PyMuPDF4LLM includes automatic table detection. When a table is found on a page, it is extracted and rendered as a GitHub-flavoured Markdown table in `to_markdown()` output, or returned as a structured block in `to_json()` output.

Table extraction is enabled by default — no configuration required.

```python theme={null}
import pymupdf4llm

md_text = pymupdf4llm.to_markdown("document.pdf")
print(md_text)
```

A detected table will appear in the Markdown output like this:

```markdown theme={null}
| A | B | C | D |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 0 | 1 | 2 | 3 |
```
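
If only the Markdown string is available downstream, the tables can be recovered from it with a small parser. The following is a minimal sketch, not part of PyMuPDF4LLM: `find_markdown_tables` is a hypothetical helper that assumes every table row starts and ends with `|` and that cells contain no escaped pipes. A data row made up solely of dashes or colons would be mistaken for the delimiter row and skipped.

```python
def find_markdown_tables(md_text):
    """Return each run of consecutive pipe-delimited lines as a list of rows.

    Hypothetical helper: assumes rows start and end with "|" and that
    cells do not contain escaped pipes.
    """
    tables, current = [], []
    for line in md_text.splitlines():
        stripped = line.strip()
        if len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|"):
            cells = [cell.strip() for cell in stripped[1:-1].split("|")]
            # Skip the |---|---| delimiter row that follows the header
            if all(cell and set(cell) <= set("-:") for cell in cells):
                continue
            current.append(cells)
        elif current:
            # A non-table line closes the table collected so far
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


sample = (
    "Intro text\n\n"
    "| A | B | C | D |\n"
    "|---|---|---|---|\n"
    "| 0 | 1 | 2 | 3 |\n"
    "| 0 | 1 | 2 | 3 |\n\n"
    "Closing text\n"
)
tables = find_markdown_tables(sample)
print(tables)
```

For anything beyond quick inspection, the structured `to_json()` output is likely the more reliable source, since it avoids re-parsing rendered text.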

***

## How Table Detection Works

PyMuPDF4LLM detects tables by analysing the visual structure of the page — looking for ruled lines, column alignment, and consistent row spacing. It does not rely on tagged PDF structure, so it works on both tagged and untagged PDFs.

Detection handles:

* Tables with explicit borders (ruled lines on all sides)
* Tables with partial borders (header rule only, or row dividers only)
* Borderless tables detected through column alignment and whitespace
* Multi-line cell content
* Merged header cells

<Note>
  Tables that span multiple pages are not always detected correctly. If a table is not rendering as expected, see [Troubleshooting](#troubleshooting) below.
</Note>

***

## Accessing Raw Table Data

When using `to_json()`, detected tables are returned as `"table"` blocks with full cell-level data including bounding boxes:

```python theme={null}
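import json

import pymupdf4llm

# to_json() returns a JSON string, so parse it before inspecting it
json_str = pymupdf4llm.to_json("document.pdf")
data = json.loads(json_str)

for page_num, page in enumerate(data.get("pages", [])):
    print(f"\nPage {page_num}")  # zero-based page index

    # Each page lists its layout boxes; table boxes carry a "table" payload
    for block in page.get("boxes", []):
        if block.get("boxclass") == "table":
            print(f"Table details: {block['table']}")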
```

### Table Block Structure

```json theme={null}
{
  "boxclass": "table",
  "table": {
    "bbox": ["x0", "y0", "x1", "y1"],
    "row_count": 3,
    "col_count": 4,
    "cells": [],
    "extract": [
      ["A", "B", "C", "D"],
      ["A1", "B1", "C1", "D1"],
      ["A2", "B2", "C2", "D2"]
    ],
    "markdown": "|A|B|C|D|\n|---|---|---|---|\n|A1|B1|C1|D1|\n|A2|B2|C2|D2|\n\n"
  }
}
```
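
As a rough sketch of how the `extract` field might be consumed, the snippet below turns the rows into a list of dictionaries and a CSV string using only the standard library. It assumes the first row of `extract` is the header row, which the structure above suggests but does not guarantee. `table_to_records` and `table_to_csv` are hypothetical helper names, not part of PyMuPDF4LLM.

```python
import csv
import io

# Sample table payload mirroring the structure shown above (illustrative values)
table = {
    "row_count": 3,
    "col_count": 4,
    "extract": [
        ["A", "B", "C", "D"],
        ["A1", "B1", "C1", "D1"],
        ["A2", "B2", "C2", "D2"],
    ],
}


def table_to_records(table):
    """Return one dict per data row, keyed by the header row.

    Assumption: the first row of `extract` holds the column headers.
    """
    rows = table.get("extract") or []
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def table_to_csv(table):
    """Serialise the `extract` rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(table.get("extract") or [])
    return buffer.getvalue()


records = table_to_records(table)
csv_text = table_to_csv(table)
print(records)
print(csv_text)
```

Keying rows by header is convenient for lookups, but duplicate or empty header cells would collide, so the CSV form is the safer choice when headers are unreliable.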

***

## Troubleshooting

### Table Not Detected

If a table is being returned as plain text rather than a table block:

* The table may be borderless with inconsistent spacing. Enabling [`use_layout(True)`](/python/api/use_layout) can improve detection
* The table may be an image (scanned). Enable OCR and check whether cells are being recognised
* The table may be very small or have only one column

### Incorrect Column Splitting

If columns are being merged or split incorrectly, the table may have irregular spacing. Accessing the raw data via `to_json()` and post-processing it manually often gives better results than relying on the Markdown rendering.
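
One way such post-processing might look is sketched below: the rows from a table's `extract` field are padded to a common width, a wrongly split column pair is joined, and the result is rendered back to Markdown. All three functions are hypothetical helpers written for illustration, and the sample rows are invented; which columns to merge has to be decided per document.

```python
def normalise_rows(rows):
    """Pad every row to the widest row and clean each cell."""
    width = max((len(row) for row in rows), default=0)
    cleaned = []
    for row in rows:
        cells = ["" if cell is None else str(cell).strip() for cell in row]
        cells += [""] * (width - len(cells))
        cleaned.append(cells)
    return cleaned


def merge_columns(rows, index, sep=" "):
    """Join column `index` with the column to its right in every row."""
    merged = []
    for row in rows:
        joined = sep.join(part for part in row[index:index + 2] if part)
        merged.append(row[:index] + [joined] + row[index + 2:])
    return merged


def rows_to_markdown(rows):
    """Render rows as a Markdown table, treating the first row as the header."""
    if not rows:
        return ""

    def fmt(cells):
        # Escape pipes and flatten line breaks so each row stays on one line
        escaped = [c.replace("|", "\\|").replace("\n", "<br>") for c in cells]
        return "| " + " | ".join(escaped) + " |"

    header, *body = rows
    lines = [fmt(header), "|" + "---|" * len(header)]
    lines += [fmt(row) for row in body]
    return "\n".join(lines) + "\n"


# Invented example: "Product Name" was split into two columns, last row is short
raw_rows = [
    ["Product", "Name", "Qty"],
    ["Blue", "Widget", "3"],
    ["Red", "Gadget"],
]
fixed_rows = merge_columns(normalise_rows(raw_rows), 0)
fixed_markdown = rows_to_markdown(fixed_rows)
print(fixed_markdown)
```

Normalising before merging keeps every row the same length, so the column index refers to the same position throughout the table.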

***

<Note>
  For the full API signature, see the [`to_markdown()` API reference](/python/api/to_markdown) and [`to_json()` API reference](/python/api/to_json).
</Note>

***

## Next Steps

<CardGroup cols={2}>
  <Card title="OCR" icon="eye" href="/python/guides/OCR">
    Control automatic OCR behaviour and adaptors.
  </Card>

  <Card title="Extract JSON" icon="brackets-curly" href="/python/guides/extract-JSON">
    Full guide to working with the JSON output format.
  </Card>

  <Card title="Extract Markdown" icon="markdown" href="/python/guides/extract-Markdown">
    Markdown extraction with all common options.
  </Card>

  <Card title="JSON Schema" icon="file-code" href="/python/reference/JSON-schema">
    Complete field reference for the JSON output structure.
  </Card>
</CardGroup>
