Prompting guidance
Building effective fields requires making strategic decisions about field types and writing well-crafted prompts that extract accurate data from documents.
Choosing field types
Select field types based on document characteristics, data structure, and performance requirements.
Extraction vs. reasoning
Extraction field types provide more predictable results and more accurate source highlighting, especially for smaller documents.
Document reasoning fields might require more cleaning to format output consistently, and results are more variable. However, document reasoning fields perform better for large documents over 200 pages, even for straightforward fields like names or addresses. Document reasoning works particularly well when:
-
Relevant information appears in multiple locations throughout the document.
-
Contextual understanding or inference is required.
-
Documents are dense or have complex formatting.
Writing effective prompts
Follow these general best practices to create clear, specific prompts.
-
Keep prompts concise while including necessary detail.
-
Use direct commands that state exactly what you want extracted.
Extract the policy holder’s full name as it appears on the insurance declaration page.
-
Provide context to distinguish between similar data points or explain document structure.
Extract the effective date, not the application date, from the policy summary section.
-
Specify formatting requirements only when basic standardization is required. For complex formatting, extract first, then clean.
Extract the phone number in the format (XXX) XXX-XXXX.
-
For list extraction fields, use 10 or fewer attributes for the greatest accuracy. The maximum is 30.
-
For document reasoning fields:
-
Use the Enhance prompt option to optimize clarity and effectiveness automatically.
-
Specify a date and time if needed. Otherwise, the current date and time are used.
-
For chain-of-thought reasoning, use numbered steps when you want to see the model’s logic. For a clean final result, use declarative commands instead.
-
For derived fields:
-
Order fields so that referenced fields precede derived fields.
-
When referencing table or list extraction fields, normalize results as text using cleaning.
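The advice to extract first and clean afterwards for complex formatting can be sketched in Python. This is a minimal illustration and not part of the product: the helper name `format_us_phone` and the sample input are hypothetical, and the sketch assumes 10-digit US numbers with an optional leading country code of 1.

```python
import re
from typing import Optional

def format_us_phone(raw: str) -> Optional[str]:
    """Normalize a loosely formatted US phone number to (XXX) XXX-XXXX.

    Hypothetical helper for illustration. Returns None when the input
    does not contain exactly 10 digits after dropping a leading 1.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the US country code
    if len(digits) != 10:
        return None  # leave unparseable values for manual review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

print(format_us_phone("+1 415.555.0132"))  # prints (415) 555-0132
```

Keeping the prompt simple and applying a rule like this afterwards means a formatting failure shows up as a missing value instead of a silently malformed one.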
Extracting tables
Extract tables using either extraction or reasoning fields.
Follow these prompting best practices to extract tables from documents.
-
Enable table recognition in digitization settings.
To see recognized tables in a document, select the prediction icon in the header and enable Show detected objects for tables. Tables in the document are highlighted, and you can use the adjacent table icon to view, copy, or download a table.
-
Identify tables by descriptive characteristics rather than position or appearance.
Extract the “Claims Summary” table that contains columns for Date, Description, Amount, and Status.
-
Specify the sheet name for multi-sheet Excel files.
Extract the transaction table from the “Account Activity” or “Transaction History” sheet.
-
Apply filtering and manipulation through prompt instructions.
-
Extract transactions and return results with amounts greater than $1,000.
-
Extract transactions and return results for 01 April through 15 April.
-
Extract transactions and sort amounts from smallest to largest.
-
Specify output format when structured data is needed. Markdown is returned by default.
-
Extract transactions as JSON.
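Because table results are returned as Markdown by default, downstream code may need to parse them. The following Python sketch shows one way to do that. The sample result, column names, and helper names are invented for illustration, and the parser assumes a simple pipe-delimited table with no escaped pipe characters inside cells.

```python
from typing import Dict, List

def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the --- separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows

def to_number(amount: str) -> float:
    """Convert a string such as "$2,500.00" to 2500.0."""
    return float(amount.replace("$", "").replace(",", ""))

# Hypothetical field result, invented for this example.
SAMPLE_RESULT = """
| Date | Description | Amount |
| --- | --- | --- |
| 2024-04-02 | Wire transfer | $2,500.00 |
| 2024-04-09 | Card payment | $45.10 |
"""

rows = parse_markdown_table(SAMPLE_RESULT)
# Same filter as the "amounts greater than $1,000" prompt, applied after extraction.
large = [row for row in rows if to_number(row["Amount"]) > 1000]
print(large)
```

If you request JSON in the prompt instead, the standard `json.loads` function can replace the custom parser.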
Extracting checkboxes, signatures, and barcodes
Extract information from certain visual objects, including checkboxes, signatures, and barcodes. The type of information you can extract differs by object type.
Extract objects using either extraction or reasoning fields.
For most objects, descriptive field names provide adequate results:
-
Filing Status (for grouped checkboxes)
-
Signatory name, Signatory title, Signature date
-
Barcode value
For standalone objects or more specific requirements, use prompts or descriptions:
-
Is the filer claiming capital gains or losses? (standalone checkbox)
-
Extract all signatures or Return yes if this document is signed
-
Return all barcode values or Return yes if this document contains a barcode
Common patterns
These proven prompt patterns address frequent extraction scenarios typical of reasoning fields. Adapt them to your specific field requirements.
Chain-of-thought reasoning
Use numbered steps when you want to see the model’s reasoning process in the result.
For a clean final result without intermediate reasoning, use declarative commands instead.
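To make the contrast concrete, the sketch below builds both prompt styles as plain Python strings. The prompt wording, the helper `numbered_prompt`, and the insurance scenario are all hypothetical; adapt them to your own field.

```python
from typing import List

def numbered_prompt(steps: List[str]) -> str:
    """Join instructions into a numbered, one-step-per-line prompt."""
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))

# Hypothetical steps for a chain-of-thought prompt.
steps = [
    "Find the total premium on the declaration page.",
    "Find the number of installments in the payment schedule.",
    "Divide the total premium by the number of installments.",
    "Return the result as a dollar amount.",
]
chain_of_thought = numbered_prompt(steps)

# Hypothetical declarative version of the same request.
declarative = (
    "Return the installment amount, calculated as the total premium "
    "divided by the number of installments."
)

print(chain_of_thought)
print(declarative)
```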
Date-based selection with fallback logic
Extract values associated with the most recent date, with fallback options when dates aren’t available.
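The rule that such a prompt describes can be written out in Python, which is useful for spot-checking results against known documents. This is a sketch under stated assumptions: the prompt wording, the function name, and the fallback choice (use the last value listed when no entry has a date) are hypothetical, not product behavior.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical prompt wording for this pattern.
PROMPT = (
    "Extract the account balance with the most recent statement date. "
    "If no dates are present, return the last balance listed. "
    "If no balance is found, return 'Not found'."
)

def most_recent_value(entries: List[Tuple[Optional[date], str]]) -> Optional[str]:
    """Return the value with the latest date, with a positional fallback.

    Assumed fallback: when no entry has a date, use the last value listed.
    Returns None when there are no entries at all.
    """
    dated = [(d, v) for d, v in entries if d is not None]
    if dated:
        return max(dated, key=lambda pair: pair[0])[1]
    return entries[-1][1] if entries else None

entries = [
    (date(2024, 1, 5), "1,200.00"),
    (date(2024, 3, 9), "1,450.00"),
    (None, "990.00"),
]
print(most_recent_value(entries))  # prints 1,450.00
```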
Conditional extraction
Extract data only when specific criteria are met, with clear handling of edge cases.
Multi-source data consolidation
Search multiple document sections in priority order and return the first valid result.
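The priority-order logic can be sketched in Python as follows. The section names, the sample values, and the rule for what counts as a valid result (non-empty and not a placeholder such as N/A) are assumptions made for this example.

```python
from typing import Dict, List, Optional

def first_valid(sections: Dict[str, Optional[str]], priority: List[str]) -> Optional[str]:
    """Return the first non-empty, non-placeholder value in priority order."""
    for name in priority:
        value = (sections.get(name) or "").strip()
        if value and value.upper() not in {"N/A", "NONE"}:
            return value
    return None  # nothing valid found in any section

# Hypothetical values found in each document section.
candidates = {
    "Declarations": "",
    "Policy Summary": "POL-00123",
    "Cover Letter": "POL-00999",
}
priority = ["Declarations", "Policy Summary", "Cover Letter"]
print(first_valid(candidates, priority))  # prints POL-00123
```

Stating the same order and the same definition of "valid" in the prompt keeps results predictable when sections disagree.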
Categorization
Categorize values using a predefined set of categories with clear criteria.
Extraction with categorization
Extract numeric values and automatically categorize them into meaningful ranges.
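As an illustration of the range logic a prompt like this encodes, the Python sketch below assigns a label to an extracted amount. The thresholds, labels, and function name are hypothetical. Spell out your own boundaries in the prompt, including which range a boundary value belongs to.

```python
def categorize_amount(raw: str) -> str:
    """Attach a range label to an amount string such as "$12,500.00".

    Assumed ranges: Low is under 1,000, Medium is 1,000 up to but not
    including 10,000, and High is 10,000 or more.
    """
    amount = float(raw.replace("$", "").replace(",", ""))
    if amount < 1000:
        category = "Low"
    elif amount < 10000:
        category = "Medium"
    else:
        category = "High"
    return f"{raw} ({category})"

print(categorize_amount("$12,500.00"))  # prints $12,500.00 (High)
```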
Aggregation and calculation
Perform calculations across multiple values found throughout the document.
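When a calculated result matters, it can help to verify it independently. The Python sketch below totals a list of extracted amounts with `Decimal` to avoid floating-point rounding. The sample values and the function name are invented for illustration.

```python
import re
from decimal import Decimal
from typing import List

def total_amounts(values: List[str]) -> Decimal:
    """Sum currency strings such as "$1,200.50" exactly."""
    total = Decimal("0")
    for raw in values:
        cleaned = re.sub(r"[^\d.\-]", "", raw)  # drop currency symbols and commas
        total += Decimal(cleaned)
    return total

# Hypothetical amounts extracted from different pages of one document.
amounts = ["$1,200.50", "$300.25", "$99.25"]
print(total_amounts(amounts))  # prints 1600.00
```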
Presence detection
Determine whether specific information exists anywhere in the document.