AI Hub

Agent extract

Enterprise Single-tenant

The Agent extract step extracts structured data from documents using LLMs, according to the schema defined in a linked agent extract module.

Parameters

  • Extraction schema — Select the agent extract module containing your extraction schema JSON file.

    Schema files are JSON objects whose keys are document class names. Each class value has a description string and a fields array. Each field defines at least a name and a data_type. Most fields also include a description, telling the LLM what to extract.
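The structure described above can be checked before a schema file is linked to the step. The following is a minimal sketch, not part of AI Hub: the function name `check_schema_shape` is hypothetical, and it encodes only the rules stated on this page (class names as keys, a description string, a fields array, and a name and data_type on each field).

```python
import json


def check_schema_shape(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical helper: it mirrors the rules described in the text only.
    """
    problems = []
    for class_name, class_def in schema.items():
        if not isinstance(class_def, dict):
            problems.append(f"{class_name}: class value must be an object")
            continue
        if not isinstance(class_def.get("description"), str):
            problems.append(f"{class_name}: missing description string")
        fields = class_def.get("fields")
        if not isinstance(fields, list):
            problems.append(f"{class_name}: missing fields array")
            continue
        for i, field in enumerate(fields):
            if not isinstance(field, dict):
                problems.append(f"{class_name}.fields[{i}]: field must be an object")
                continue
            # Every field needs at least a name and a data_type.
            for key in ("name", "data_type"):
                if key not in field:
                    problems.append(f"{class_name}.fields[{i}]: missing {key}")
    return problems


example = json.loads("""
{
  "Invoice": {
    "description": "An invoice document requesting payment for goods or services",
    "fields": [
      {"name": "Invoice Number", "data_type": "TEXT",
       "description": "The unique invoice identifier or number"}
    ]
  }
}
""")
print(check_schema_shape(example))
```

Returning a list of messages rather than raising on the first problem makes it easier to fix several mistakes in one pass.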

Supported data types

Set data_type on each field in fields to one of TEXT, TEXT_LIST, OBJECT_LIST, or TABLE.

  • TEXT — A single text value.

  • TEXT_LIST — A list of text values.

  • OBJECT_LIST — A repeating group of sub-fields with a fixed shape, such as invoice line items that each have a description, quantity, and price. Add a prompt_schema array on the parent field. Each object in prompt_schema defines one sub-field with a name and a description only; data_type is not required on sub-fields.

  • TABLE — Tabular extraction.
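The data type rules above can be expressed as a small per-field check. This is a sketch under stated assumptions: `check_field_type` and `ALLOWED_DATA_TYPES` are hypothetical names, and the check covers only what this page states (the four allowed values, and a prompt_schema array on OBJECT_LIST fields). Whether sub-field descriptions are mandatory is not stated here, so only the sub-field name is checked.

```python
ALLOWED_DATA_TYPES = {"TEXT", "TEXT_LIST", "OBJECT_LIST", "TABLE"}


def check_field_type(field: dict) -> list[str]:
    """Return problems with a single field's data_type (hypothetical helper)."""
    problems = []
    name = field.get("name", "<unnamed>")
    data_type = field.get("data_type")
    if data_type not in ALLOWED_DATA_TYPES:
        problems.append(f"{name}: unsupported data_type {data_type!r}")
    if data_type == "OBJECT_LIST":
        sub_fields = field.get("prompt_schema")
        if not isinstance(sub_fields, list):
            problems.append(f"{name}: OBJECT_LIST needs a prompt_schema array")
        else:
            # Sub-fields carry name and description; data_type is not required.
            for i, sub in enumerate(sub_fields):
                if not isinstance(sub, dict) or "name" not in sub:
                    problems.append(f"{name}.prompt_schema[{i}]: missing name")
    return problems


line_items = {
    "name": "Line Items",
    "data_type": "OBJECT_LIST",
    "description": "Table of line items on the invoice",
    "prompt_schema": [
        {"name": "Quantity", "description": "Quantity of the item"},
        {"name": "Unit Price", "description": "Price per unit"},
    ],
}
print(check_field_type(line_items))
```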

Reasoning fields

Reasoning fields use model instructions you write in prompt instead of a static description. Set "prompt_type": "advanced" and include data_type as for any other field. Omit description for these entries (use prompt only).

{
  "name": "reasoning field",
  "prompt_type": "advanced",
  "data_type": "TEXT",
  "prompt": "calculate the sum of the deductions"
}
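When a schema contains several reasoning fields, generating the entries can help keep them consistent. The following is a minimal sketch: `make_reasoning_field` is a hypothetical helper, not an AI Hub function, and the field name "Total Deductions" is illustrative only. It sets prompt_type to advanced, keeps data_type, and leaves out description, as described above.

```python
import json


def make_reasoning_field(name: str, prompt: str, data_type: str = "TEXT") -> dict:
    """Build a field entry that uses prompt instead of description.

    Hypothetical helper; the keys match the reasoning field example in the text.
    """
    return {
        "name": name,
        "prompt_type": "advanced",
        "data_type": data_type,
        "prompt": prompt,
    }


# Illustrative field name; the prompt text is taken from the example in the text.
field = make_reasoning_field(
    "Total Deductions", "calculate the sum of the deductions"
)
print(json.dumps(field, indent=2))
```

The resulting object can be appended to the fields array of a document class alongside ordinary fields.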

Sample extraction schema

{
  "Invoice": {
    "description": "An invoice document requesting payment for goods or services",
    "fields": [
      {
        "name": "Invoice Number",
        "data_type": "TEXT",
        "description": "The unique invoice identifier or number"
      },
      {
        "name": "Invoice Date",
        "data_type": "TEXT",
        "description": "The date the invoice was issued"
      },
      {
        "name": "Total Amount",
        "data_type": "TEXT",
        "description": "The total amount due on the invoice including tax"
      },
      {
        "name": "Vendor Name",
        "data_type": "TEXT",
        "description": "The name of the vendor or supplier"
      },
      {
        "name": "Line Items",
        "data_type": "OBJECT_LIST",
        "description": "Table of line items on the invoice",
        "prompt_schema": [
          {
            "name": "Item Description",
            "description": "Description of the item or service"
          },
          {
            "name": "Quantity",
            "description": "Quantity of the item"
          },
          {
            "name": "Unit Price",
            "description": "Price per unit"
          },
          {
            "name": "Amount",
            "description": "Total amount for this line item"
          }
        ]
      }
    ]
  }
}
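To review a schema like the sample at a glance, it can be flattened into one line per field. This is a sketch only: `summarize` is a hypothetical helper, and `SAMPLE` is an abbreviated copy of the schema above with two of its fields.

```python
import json

# Abbreviated copy of the sample schema (two fields only).
SAMPLE = """
{
  "Invoice": {
    "description": "An invoice document requesting payment for goods or services",
    "fields": [
      {"name": "Invoice Number", "data_type": "TEXT",
       "description": "The unique invoice identifier or number"},
      {"name": "Line Items", "data_type": "OBJECT_LIST",
       "description": "Table of line items on the invoice",
       "prompt_schema": [
         {"name": "Item Description", "description": "Description of the item or service"},
         {"name": "Quantity", "description": "Quantity of the item"}
       ]}
    ]
  }
}
"""


def summarize(schema: dict) -> list[str]:
    """List each field as 'Class / Field: DATA_TYPE (sub-fields)'."""
    lines = []
    for class_name, class_def in schema.items():
        for field in class_def.get("fields", []):
            # Only OBJECT_LIST fields are expected to carry prompt_schema.
            sub_names = [sub["name"] for sub in field.get("prompt_schema", [])]
            suffix = f" ({', '.join(sub_names)})" if sub_names else ""
            lines.append(
                f"{class_name} / {field['name']}: {field['data_type']}{suffix}"
            )
    return lines


for line in summarize(json.loads(SAMPLE)):
    print(line)
```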