This guide shows how to connect PDF4LLM’s extraction output to Azure OpenAI — specifically, how to embed extracted text using text-embedding-3-small (or equivalent) and how to pass document content to a chat completion endpoint for summarisation, Q&A, and RAG. The guide assumes you have an active Azure OpenAI resource with at least one deployment for an embedding model and one for a chat model (e.g. gpt-4o).
The most common use of PDF4LLM with Azure OpenAI is building a searchable index from document content: extract text, split into chunks, embed each chunk, and store embeddings for later retrieval.
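The embedding snippets below assume a cleanChunks list produced by that extract-and-split step. A minimal sketch, using one chunk per page via PdfExtractor.ToMarkdown (finer-grained, token-based splitting is covered in the Page Selection & Chunking guide; the 50-character filter is an illustrative threshold):

Document doc = new Document("product-manual.pdf");
var cleanChunks = new List<string>();

for (int i = 0; i < doc.PageCount; i++)
{
    // One chunk per page; drop near-empty pages such as separators
    string pageMarkdown = PdfExtractor.ToMarkdown(doc, pages: new List<int> { i }).Trim();
    if (pageMarkdown.Length >= 50)
        cleanChunks.Add(pageMarkdown);
}

doc.Close();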
var embeddingClient = client.GetEmbeddingClient(embeddingDeployment);
var embeddings = new List<(string Text, float[] Embedding)>();

foreach (string chunk in cleanChunks)
{
    EmbeddingGenerationOptions options = new() { Dimensions = 1536 };
    ClientResult<Embedding> result = await embeddingClient.GenerateEmbeddingAsync(chunk, options);
    float[] vector = result.Value.ToFloats().ToArray();
    embeddings.Add((chunk, vector));
}

Console.WriteLine($"Generated {embeddings.Count} embeddings");
Azure OpenAI rate limits vary by tier. For documents with many chunks, add a small delay between requests or use GenerateEmbeddingsAsync with a batch of inputs rather than one call per chunk.
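A fixed delay helps, but retrying with exponential backoff copes better with occasional throttling. A minimal sketch, assuming throttling surfaces as the RequestFailedException with status 429 described in the troubleshooting section below (the EmbedWithRetryAsync helper name is illustrative, not part of any SDK):

async Task<float[]> EmbedWithRetryAsync(string chunk, int maxAttempts = 5)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            var result = await embeddingClient.GenerateEmbeddingAsync(chunk);
            return result.Value.ToFloats().ToArray();
        }
        catch (Azure.RequestFailedException ex) when (ex.Status == 429 && attempt < maxAttempts)
        {
            // Back off 1 s, 2 s, 4 s, ... before the next attempt
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)));
        }
    }
}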
The GenerateEmbeddingsAsync overload accepts a list of inputs, reducing round-trips:
var embeddingClient = client.GetEmbeddingClient(embeddingDeployment);

// Embed in batches of 16 to stay within token limits per request
const int batchSize = 16;
var allEmbeddings = new List<(string Text, float[] Embedding)>();

for (int i = 0; i < cleanChunks.Count; i += batchSize)
{
    var batch = cleanChunks.Skip(i).Take(batchSize).ToList();
    var result = await embeddingClient.GenerateEmbeddingsAsync(batch);

    for (int j = 0; j < batch.Count; j++)
    {
        float[] vector = result.Value[j].ToFloats().ToArray();
        allEmbeddings.Add((batch[j], vector));
    }
}
A full RAG pipeline has three stages: ingest (embed and store), retrieve (find relevant chunks for a query), and generate (pass retrieved chunks to the LLM). PDF4LLM handles the extraction step in the ingest stage.
// In-memory store for this example.
// In production, replace with Azure AI Search, Qdrant, or another vector store.
var vectorStore = new List<(string Text, float[] Embedding, string Source, int Page)>();

var reader = PdfExtractor.LlamaMarkdownReader();
var pages = reader.LoadData("product-manual.pdf");
var embeddingClient = client.GetEmbeddingClient(embeddingDeployment);

foreach (var page in pages)
{
    string text = page.Text.Trim();
    int pageNum = (int)page.ExtraInfo["page"];
    string filePath = (string)page.ExtraInfo["file_path"];

    if (text.Length < 50) continue; // skip near-empty pages

    var result = await embeddingClient.GenerateEmbeddingAsync(text);
    float[] vector = result.Value.ToFloats().ToArray();

    vectorStore.Add((text, vector, filePath, pageNum));
}

Console.WriteLine($"Indexed {vectorStore.Count} pages");
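The ingest code covers only the first stage. The retrieve stage ranks stored entries by cosine similarity between the query embedding and each stored embedding. A minimal in-memory sketch, which the AskAsync function below builds on (the RetrieveAsync and CosineSimilarity helpers are illustrative and assume the vectorStore and embeddingClient from the previous snippet):

async Task<List<(string Text, string Source, int Page)>> RetrieveAsync(string question, int topK)
{
    // Embed the query with the same model used at ingest time
    var result = await embeddingClient.GenerateEmbeddingAsync(question);
    float[] query = result.Value.ToFloats().ToArray();

    // Rank every stored entry by similarity to the query and keep the top K
    return vectorStore
        .OrderByDescending(e => CosineSimilarity(query, e.Embedding))
        .Take(topK)
        .Select(e => (e.Text, e.Source, e.Page))
        .ToList();
}

static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}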
async Task<string> AskAsync(string question)
{
    var retrieved = await RetrieveAsync(question, topK: 5);

    // Build a context block from the top results
    string context = string.Join("\n\n---\n\n", retrieved.Select(r =>
        $"[Source: {Path.GetFileName(r.Source)}, Page {r.Page + 1}]\n{r.Text}"));

    string systemPrompt =
        "You are a helpful assistant. Answer questions using only the " +
        "provided context. If the answer is not in the context, say so. " +
        "Cite the source and page number for each factual claim.";

    string userPrompt = $"""
        Context:
        {context}

        Question: {question}
        """;

    var chatClient = client.GetChatClient(chatDeployment);
    ClientResult<ChatCompletion> result = await chatClient.CompleteChatAsync(
    [
        new SystemChatMessage(systemPrompt),
        new UserChatMessage(userPrompt)
    ]);

    return result.Value.Content[0].Text;
}

// Usage
string answer = await AskAsync("What is the maximum operating temperature?");
Console.WriteLine(answer);
For summarising a document or a set of pages, pass the extracted Markdown directly to a chat completion without embedding:
Document doc = new Document("executive-briefing.pdf");string markdown = PdfExtractor.ToMarkdown(doc, pages: new List<int> { 0, 1, 2 });doc.Close();var chatClient = client.GetChatClient(chatDeployment);string prompt = $""" Summarise the following document in three to five bullet points. Focus on key decisions, numbers, and action items. Do not include information not present in the document. Document: {markdown} """;ClientResult<ChatCompletion> result = await chatClient.CompleteChatAsync([ new SystemChatMessage("You are a precise document summariser."), new UserChatMessage(prompt)]);Console.WriteLine(result.Value.Content[0].Text);
For long documents that exceed the model’s context window, summarise page-by-page and then summarise the summaries:
Document doc = new Document("annual-report.pdf");var chatClient = client.GetChatClient(chatDeployment);var pageSummaries = new List<string>();for (int i = 0; i < doc.PageCount; i++){ string pageText = PdfExtractor.ToMarkdown(doc, pages: new List<int> { i }); if (pageText.Trim().Length < 100) continue; var result = await chatClient.CompleteChatAsync( [ new SystemChatMessage("Summarise the following page in two sentences."), new UserChatMessage(pageText) ]); pageSummaries.Add($"Page {i + 1}: {result.Value.Content[0].Text.Trim()}");}doc.Close();// Final roll-up summarystring rollup = string.Join("\n", pageSummaries);var finalResult = await chatClient.CompleteChatAsync([ new SystemChatMessage("Produce a five-sentence executive summary from the page summaries below."), new UserChatMessage(rollup)]);Console.WriteLine(finalResult.Value.Content[0].Text);
For documents where images carry meaningful information — technical diagrams, charts, infographics — embed images alongside text using gpt-4o’s vision capability:
Document doc = new Document("system-diagram.pdf");string markdown = PdfExtractor.ToMarkdown( doc, embedImages: true, // inline images as Base64 data URIs pages: new List<int> { 0 });doc.Close();// The markdown string contains both text and embedded images.// gpt-4o accepts markdown with inline data URIs as message content.var chatClient = client.GetChatClient(chatDeployment);ClientResult<ChatCompletion> result = await chatClient.CompleteChatAsync([ new SystemChatMessage( "You are a technical document analyst. " + "Describe both the text content and any diagrams or charts present."), new UserChatMessage(markdown)]);Console.WriteLine(result.Value.Content[0].Text);
Not all Azure OpenAI deployments support vision input. Confirm that your gpt-4o deployment has the vision capability enabled in Azure OpenAI Studio before using this pattern.
Pattern 5 — Form data extraction and LLM enrichment
Combine structured form field extraction with an LLM call to normalise, validate, or enrich the extracted values:
Document doc = new Document("insurance-claim.pdf");var fields = PdfExtractor.GetKeyValues(doc);doc.Close();var formData = fields.ToDictionary(f => f.Name, f => f.Value);string formJson = System.Text.Json.JsonSerializer.Serialize( formData, new System.Text.Json.JsonSerializerOptions { WriteIndented = true });var chatClient = client.GetChatClient(chatDeployment);string prompt = $""" The following JSON represents form fields extracted from an insurance claim PDF. Respond with a JSON object containing: - "valid": true/false — whether all required fields are present and plausible - "missing_fields": array of field names that are empty or absent - "anomalies": array of strings describing any values that look incorrect or unusual - "summary": a one-sentence plain-English description of the claim Respond with JSON only. No explanation or markdown fences. Form data: {formJson} """;ClientResult<ChatCompletion> result = await chatClient.CompleteChatAsync([ new SystemChatMessage("You are a document validation assistant. Respond only with JSON."), new UserChatMessage(prompt)]);string analysisJson = result.Value.Content[0].Text;Console.WriteLine(analysisJson);
Every pattern above passes text to an Azure OpenAI endpoint that has a token limit per request. Keep these constraints in mind:
| Model | Context window | Practical limit for RAG context |
|---|---|---|
| gpt-4o | 128 000 tokens | ~100 000 tokens (leave room for system prompt + response) |
| gpt-4o-mini | 128 000 tokens | ~100 000 tokens |
| text-embedding-3-small | 8 191 tokens | Chunk to ≤ 512 tokens for best embedding quality |
| text-embedding-ada-002 | 8 191 tokens | Chunk to ≤ 512 tokens |
A token is approximately 4 characters for English text. A typical A4 page of dense text is 400–600 tokens. To stay safely within limits, estimate chunk token counts before sending:
// Rough estimate — replace with SharpToken for accuracy
static int EstimateTokens(string text) => text.Length / 4;

var safeChunks = cleanChunks
    .Where(c => EstimateTokens(c) <= 512)
    .ToList();

// For chunks over the limit, split further
var oversized = cleanChunks.Where(c => EstimateTokens(c) > 512).ToList();
// ... apply token-based splitting from the Page Selection & Chunking guide
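The character heuristic is usually sufficient for chunk sizing. Where exact counts matter (for example, chunks sitting close to the 8 191-token embedding limit), the SharpToken package provides the same tokenisers the models use. A sketch, assuming the cl100k_base encoding used by both embedding models above:

using SharpToken;

// cl100k_base is the encoding behind text-embedding-3-small and text-embedding-ada-002
var encoding = GptEncoding.GetEncoding("cl100k_base");

var accuratelySizedChunks = cleanChunks
    .Where(c => encoding.Encode(c).Count <= 512)
    .ToList();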
For production deployments, prefer Managed Identity over API keys to avoid storing credentials:
using Azure.Identity;

// Works in Azure App Service, Azure Functions, AKS, and other managed environments
AzureOpenAIClient client = new(
    new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
    new DefaultAzureCredential());
Assign the Cognitive Services OpenAI User role to the managed identity in the Azure Portal, or via the Azure CLI:
az role assignment create \
  --role "Cognitive Services OpenAI User" \
  --assignee <managed-identity-object-id> \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<resource-name>
RequestFailedException with status 401
The API key is incorrect, expired, or the endpoint URL does not match the key’s resource. Verify both in the Azure Portal under Keys and Endpoint.
RequestFailedException with status 429 (Too Many Requests)
You have exceeded the tokens-per-minute or requests-per-minute quota for your deployment. Apply the retry-with-backoff pattern above, reduce batch sizes, or request a quota increase in Azure OpenAI Studio.
RequestFailedException with status 400 on embedding calls
The input text exceeds the embedding model’s token limit (8 191 tokens). Reduce chunk sizes — the text being embedded is too long for a single embedding call.
Content filter triggered (status 400 with a content filter error code)
Azure OpenAI applies content filtering by default. If document content triggers the filter, the request fails with a content filter error rather than a rate limit error. Check ex.ErrorCode to distinguish. For legitimate documents triggering false positives, content filter configuration can be adjusted in Azure OpenAI Studio under Content Filters.
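A sketch of telling the two 4xx-family failures apart in a catch block, assuming the call fails with the RequestFailedException described above (the "content_filter" error code string is an assumption; inspect ex.ErrorCode for your API version):

try
{
    var result = await chatClient.CompleteChatAsync(
    [
        new SystemChatMessage("You are a precise document summariser."),
        new UserChatMessage(markdown)
    ]);
    Console.WriteLine(result.Value.Content[0].Text);
}
catch (Azure.RequestFailedException ex) when (ex.Status == 429)
{
    // Throttled: back off and retry (see the retry sketch earlier in this guide)
}
catch (Azure.RequestFailedException ex) when (ex.Status == 400 && ex.ErrorCode == "content_filter")
{
    // Content filter hit: retrying will not help; log which document triggered it
    Console.WriteLine($"Content filter triggered: {ex.Message}");
}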
Empty or low-quality embedding results
Very short chunks (fewer than ~20 tokens) and very long chunks (over 512 tokens) both produce lower-quality embeddings. The short-chunk problem is common for page separators and headers extracted as standalone chunks — filter them with a minimum length check. The long-chunk problem requires splitting before embedding.
Managed Identity auth fails locally
DefaultAzureCredential works in managed environments but requires az login locally. Run az login in your terminal, or switch to AzureCliCredential explicitly for local development:
#if DEBUG
    AzureOpenAIClient client = new(new Uri(endpoint), new AzureCliCredential());
#else
    AzureOpenAIClient client = new(new Uri(endpoint), new DefaultAzureCredential());
#endif