Overview
PDF4LLM's extraction methods return plain .NET strings, so writing them to disk is handled by the standard library. The recommended approach is System.IO.File.WriteAllText(), which is cross-platform and needs no additional dependencies.
Saving Markdown
using System.IO;
using PDF4LLM;
string mdText = PdfExtractor.ToMarkdown("document.pdf");
File.WriteAllText("output.md", mdText, System.Text.Encoding.UTF8);
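Passing the encoding explicitly documents the intent and guarantees UTF-8 output, which preserves special characters, symbols, and non-Latin scripts on every platform. Note that the two-argument overload of File.WriteAllText also writes UTF-8, but without a byte order mark (BOM), whereas System.Text.Encoding.UTF8 prepends one. If a downstream tool rejects the BOM, pass new System.Text.UTF8Encoding(false) instead.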
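The effect of the encoding argument on the bytes written can be checked with a small sketch that uses only the .NET standard library. The scratch directory and file names below are hypothetical and exist only for the comparison; no PDF4LLM call is involved.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical scratch files used only for this comparison.
string demoDir = Path.Combine(Path.GetTempPath(), "pdf4llm-encoding-demo");
Directory.CreateDirectory(demoDir);
string withBom = Path.Combine(demoDir, "with-bom.md");
string withoutBom = Path.Combine(demoDir, "without-bom.md");

string sample = "Café, naïve, 日本語";

// Encoding.UTF8 writes the three-byte UTF-8 preamble (EF BB BF) first.
File.WriteAllText(withBom, sample, Encoding.UTF8);

// UTF8Encoding(false) writes the same UTF-8 bytes without the preamble.
File.WriteAllText(withoutBom, sample, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

byte[] bomBytes = File.ReadAllBytes(withBom);
byte[] plainBytes = File.ReadAllBytes(withoutBom);

Console.WriteLine($"With BOM:    {bomBytes.Length} bytes");
Console.WriteLine($"Without BOM: {plainBytes.Length} bytes");
```

Both files decode to the same text; they differ only in the leading preamble, which some parsers and diff tools treat as content.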
Saving JSON
ToJson() returns a JSON string directly, so no additional serialisation step is needed:
using System.IO;
using PDF4LLM;
string json = PdfExtractor.ToJson("document.pdf");
File.WriteAllText("output.json", json, System.Text.Encoding.UTF8);
The returned JSON is compact by default. To write human-readable indented JSON, round-trip it through System.Text.Json:
using System.IO;
using System.Text.Json;
using PDF4LLM;
string json = PdfExtractor.ToJson("document.pdf");
// Parse into a generic JSON value, then re-serialise it with indentation.
var parsed = JsonSerializer.Deserialize<object>(json);
string indented = JsonSerializer.Serialize(parsed, new JsonSerializerOptions { WriteIndented = true });
File.WriteAllText("output.json", indented, System.Text.Encoding.UTF8);
For large documents where file size matters, skip the indentation step and write the compact string directly.
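One side effect of re-serialising through System.Text.Json is that its default encoder escapes non-ASCII characters as \uXXXX sequences. The sketch below shows one way to keep such text readable in the indented output, using JavaScriptEncoder.UnsafeRelaxedJsonEscaping. The compact JSON literal is a hypothetical stand-in; the actual schema returned by ToJson() may differ.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical compact JSON standing in for extractor output;
// the real schema returned by ToJson() may differ.
string compact = "{\"page\":1,\"text\":\"Café\"}";

var options = new JsonSerializerOptions
{
    WriteIndented = true,
    // Keep non-ASCII characters readable instead of \uXXXX escapes.
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};

// JsonDocument parses the text without mapping it to a concrete type.
using JsonDocument doc = JsonDocument.Parse(compact);
string indented = JsonSerializer.Serialize(doc.RootElement, options);

Console.WriteLine(indented);
```

The relaxed encoder also leaves characters such as < and > unescaped, so it is best suited to files on disk rather than JSON that will be embedded in HTML.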
Saving plain text
using System.IO;
using PDF4LLM;
string text = PdfExtractor.ToText("document.pdf");
File.WriteAllText("output.txt", text, System.Text.Encoding.UTF8);
Saving per-page chunks
When using LlamaMarkdownReader, save each page as a separate file, named after the page number in the chunk metadata:
using System;
using System.IO;
using PDF4LLM;
string outputDir = "output/pages";
Directory.CreateDirectory(outputDir);
var reader = PdfExtractor.LlamaMarkdownReader();
var chunks = reader.LoadData("document.pdf");
foreach (var chunk in chunks)
{
    // The page number comes from the chunk metadata.
    int pageNum = (int)chunk.ExtraInfo["page"];
    string filePath = Path.Combine(outputDir, $"page-{pageNum}.md");
    File.WriteAllText(filePath, chunk.Text, System.Text.Encoding.UTF8);
    Console.WriteLine($"Saved {filePath}");
}
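With unpadded numbers, a plain alphabetical listing places page-10.md before page-2.md. A minimal sketch of one way around this is to zero-pad the page number to the width of the largest page. PageFileName is a hypothetical helper, not part of PDF4LLM, and it assumes the total page count is known (for example, from the number of chunks).

```csharp
using System;
using System.IO;

// Hypothetical helper: builds a zero-padded file name for a page number.
static string PageFileName(string outputDir, int pageNum, int totalPages)
{
    // Pad to the number of digits in the largest page number.
    int width = totalPages.ToString().Length;
    string padded = pageNum.ToString().PadLeft(width, '0');
    return Path.Combine(outputDir, $"page-{padded}.md");
}

// Page 7 of a 120-page document gets a three-digit number.
Console.WriteLine(Path.GetFileName(PageFileName("output/pages", 7, 120))); // page-007.md
```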
Saving with a matching filename
To derive the output filename from the input document automatically:
using System;
using System.IO;
using PDF4LLM;
string inputPath = "reports/annual-report-2025.pdf";
string mdText = PdfExtractor.ToMarkdown(inputPath);
string outputPath = Path.ChangeExtension(inputPath, ".md");
File.WriteAllText(outputPath, mdText, System.Text.Encoding.UTF8);
Console.WriteLine($"Saved to {outputPath}");
// Saved to reports/annual-report-2025.md
Path.ChangeExtension() swaps the file extension cleanly, keeping the same directory and stem.
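A short sketch of how Path.ChangeExtension() behaves on a few input shapes, using only the standard library. The file names are illustrative.

```csharp
using System;
using System.IO;

// Only the final extension is replaced; earlier dots in the stem are kept.
string a = Path.ChangeExtension("reports/annual-report-2025.pdf", ".md");
string b = Path.ChangeExtension("reports/summary.v1.2.pdf", ".md");

// A path with no extension gets the new one appended.
string c = Path.ChangeExtension("reports/README", ".md");

Console.WriteLine(a); // reports/annual-report-2025.md
Console.WriteLine(b); // reports/summary.v1.2.md
Console.WriteLine(c); // reports/README.md
```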
Saving to a different directory
To write output to a different folder while keeping the original filename:
using System;
using System.IO;
using PDF4LLM;
string inputPath = "source/document.pdf";
string outputDir = "extracted";
Directory.CreateDirectory(outputDir);
string mdText = PdfExtractor.ToMarkdown(inputPath);
// Keep the original file name, swapping only the extension.
string outputName = Path.ChangeExtension(Path.GetFileName(inputPath), ".md");
string outputPath = Path.Combine(outputDir, outputName);
File.WriteAllText(outputPath, mdText, System.Text.Encoding.UTF8);
Console.WriteLine($"Saved to {outputPath}");
// Saved to extracted/document.md
Processing multiple files
To extract and save output for an entire folder of PDFs:
using System;
using System.IO;
using PDF4LLM;
string inputDir = "documents/";
string outputDir = "extracted/";
Directory.CreateDirectory(outputDir);
string[] pdfFiles = Directory.GetFiles(inputDir, "*.pdf");
Console.WriteLine($"Found {pdfFiles.Length} PDF(s)");
foreach (string pdfPath in pdfFiles)
{
    Console.WriteLine($"Processing {Path.GetFileName(pdfPath)}...");
    try
    {
        string mdText = PdfExtractor.ToMarkdown(pdfPath);
        string outputName = Path.ChangeExtension(Path.GetFileName(pdfPath), ".md");
        string outputPath = Path.Combine(outputDir, outputName);
        File.WriteAllText(outputPath, mdText, System.Text.Encoding.UTF8);
        Console.WriteLine($"  ✓ Saved to {outputPath}");
    }
    catch (Exception ex)
    {
        // Report the failure and continue with the next file.
        Console.WriteLine($"  ✗ Failed: {ex.Message}");
    }
}
Console.WriteLine("Done.");
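Directory.GetFiles(inputDir, "*.pdf") only looks at the top level of the folder. If the PDFs sit in nested subfolders, one option is to walk the tree and mirror each file's relative location under the output folder, which also avoids name collisions between same-named files in different subfolders. The sketch below shows only the path handling; MapOutputPath is a hypothetical helper, and the extraction call is left as a comment.

```csharp
using System;
using System.IO;

// Hypothetical helper: maps an input PDF to an output .md path that
// keeps the PDF's location relative to inputDir.
static string MapOutputPath(string inputDir, string outputDir, string pdfPath)
{
    string relative = Path.GetRelativePath(inputDir, pdfPath);
    return Path.Combine(outputDir, Path.ChangeExtension(relative, ".md"));
}

string inputRoot = "documents";
string outputRoot = "extracted";

if (Directory.Exists(inputRoot))
{
    // SearchOption.AllDirectories walks nested folders too.
    foreach (string pdfPath in Directory.EnumerateFiles(inputRoot, "*.pdf", SearchOption.AllDirectories))
    {
        string outputPath = MapOutputPath(inputRoot, outputRoot, pdfPath);
        // Create the nested output folder before writing into it.
        Directory.CreateDirectory(Path.GetDirectoryName(outputPath) ?? outputRoot);
        Console.WriteLine($"{pdfPath} -> {outputPath}");
        // The extraction and File.WriteAllText calls from the loop above would go here.
    }
}
```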
Saving images alongside Markdown
When writeImages: true is used, image files are written to disk automatically during extraction. Create the image directory first, then save the Markdown file alongside it:
using System.IO;
using PDF4LLM;
string imageDir = "output/images";
Directory.CreateDirectory(imageDir);
string mdText = PdfExtractor.ToMarkdown(
    "document.pdf",
    writeImages: true,
    imagePath: imageDir,
    imageFormat: "png"
);
File.WriteAllText("output/document.md", mdText, System.Text.Encoding.UTF8);
Image paths in the Markdown output are relative to wherever the .md file is opened from. Keep your Markdown file and image directory in the same parent folder to ensure image links resolve correctly.
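One way to confirm that the links resolve is to scan the saved Markdown for image references and check each target on disk. The sketch below assumes the output uses standard Markdown image syntax, ![alt](target), and that relative targets are resolved against the folder containing the .md file; both are assumptions, not confirmed PDF4LLM behaviour. FindMissingImages is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: lists image targets in a Markdown file that do not
// exist on disk when resolved relative to the Markdown file's folder.
static List<string> FindMissingImages(string markdownPath)
{
    string baseDir = Path.GetDirectoryName(Path.GetFullPath(markdownPath)) ?? ".";
    string markdown = File.ReadAllText(markdownPath);
    var missing = new List<string>();

    // Matches the target of standard Markdown image syntax: ![alt](target)
    foreach (Match m in Regex.Matches(markdown, @"!\[[^\]]*\]\(([^)\s]+)"))
    {
        string target = m.Groups[1].Value;
        if (target.Contains("://")) continue; // skip remote URLs
        if (!File.Exists(Path.Combine(baseDir, target))) missing.Add(target);
    }
    return missing;
}

string mdPath = "output/document.md";
if (File.Exists(mdPath))
{
    foreach (string target in FindMissingImages(mdPath))
    {
        Console.WriteLine($"Missing image: {target}");
    }
}
```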
| Output | Method | Extension | How to write |
| --- | --- | --- | --- |
| Markdown | ToMarkdown() | .md | File.WriteAllText() |
| JSON | ToJson() | .json | File.WriteAllText() directly |
| Plain text | ToText() | .txt | File.WriteAllText() |
| Per-page chunks | LlamaMarkdownReader.LoadData() | .md per page | File.WriteAllText() in a loop |
| Images | ToMarkdown(writeImages: true) | .png / .jpg | Written automatically to imagePath |
Next steps
Extract Markdown: Full walkthrough of ToMarkdown() with all common options.
Extract JSON: Bounding boxes and layout data for custom pipelines.
Extract Text: Plain text extraction and whitespace handling.
Images & Graphics: Controlling image extraction, format, and output path.