Agent extract

Enterprise Single-tenant

The Agent extract step extracts structured data from documents using LLMs, according to the schema defined in a linked agent extract module.

Parameters

  • Extraction schema — Select the agent extract module containing your extraction schema JSON file.

    Schema files are JSON objects whose keys are document class names. Each class value has a description string and a fields array. Each field defines at least a name and a data_type. Most fields also include a description, telling the LLM what to extract.
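As a rough illustration of the structure described above, the following Python sketch builds a minimal schema and serializes it to JSON. The `Receipt` class and its field names are hypothetical, chosen only for this example; the keys (`description`, `fields`, `name`, `data_type`) are the ones described above.

```python
import json

# Hypothetical schema with one document class ("Receipt") and two fields.
# Top-level keys are document class names; each class holds a description
# string and a fields array.
schema = {
    "Receipt": {
        "description": "A receipt confirming payment for a purchase",
        "fields": [
            {
                "name": "Merchant Name",
                "data_type": "TEXT",
                "description": "The name of the merchant that issued the receipt",
            },
            {
                "name": "Total Paid",
                "data_type": "TEXT",
                "description": "The total amount paid, including tax",
            },
        ],
    }
}

# Serialize to the JSON text that would go into the schema file.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Building the schema as a dictionary and serializing it avoids hand-written JSON syntax errors such as trailing commas or unbalanced braces.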

Supported data types

Set data_type on each field in fields to one of TEXT, TEXT_LIST, OBJECT_LIST, or TABLE.

  • TEXT — A single text value.

  • TEXT_LIST — A list of text values.

  • OBJECT_LIST — A repeating group of sub-fields with a fixed shape, such as the line items on an invoice. Add a prompt_schema array to the parent field. Each object in prompt_schema defines one sub-field with a name and a description only; data_type is not required on sub-fields.

  • TABLE — Tabular extraction.
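The rules above can be checked before a schema file is uploaded. The sketch below is a hypothetical local validator, not part of the product: the function name `validate_schema` and its messages are invented for illustration, and the requirement that an OBJECT_LIST field carries a non-empty prompt_schema is an assumption drawn from the description above.

```python
SUPPORTED_DATA_TYPES = {"TEXT", "TEXT_LIST", "OBJECT_LIST", "TABLE"}


def validate_schema(schema):
    """Return a list of problems found in an extraction schema dict."""
    problems = []
    for class_name, doc_class in schema.items():
        if not isinstance(doc_class.get("description"), str):
            problems.append(f"{class_name}: missing description string")
        fields = doc_class.get("fields")
        if not isinstance(fields, list):
            problems.append(f"{class_name}: missing fields array")
            continue
        for index, field in enumerate(fields):
            label = f"{class_name}.fields[{index}]"
            if "name" not in field:
                problems.append(f"{label}: missing name")
            data_type = field.get("data_type")
            if data_type not in SUPPORTED_DATA_TYPES:
                problems.append(f"{label}: unsupported data_type {data_type!r}")
            # Assumption: OBJECT_LIST fields need a non-empty prompt_schema.
            if data_type == "OBJECT_LIST":
                sub_fields = field.get("prompt_schema")
                if not isinstance(sub_fields, list) or not sub_fields:
                    problems.append(f"{label}: OBJECT_LIST needs a prompt_schema array")
    return problems


# Usage: an OBJECT_LIST field without prompt_schema is reported as a problem.
example = {
    "Invoice": {
        "description": "An invoice document",
        "fields": [
            {"name": "Invoice Number", "data_type": "TEXT"},
            {"name": "Line Items", "data_type": "OBJECT_LIST"},
        ],
    }
}
for problem in validate_schema(example):
    print(problem)
```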

Reasoning fields

A reasoning field takes its instructions from a prompt that you write, rather than from a static description. Set "prompt_type": "advanced", include data_type as for any other field, and omit description (use prompt only).

{
  "name": "reasoning field",
  "prompt_type": "advanced",
  "data_type": "TEXT",
  "prompt": "calculate the sum of the deductions"
}
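When a schema is generated programmatically, a small helper can keep reasoning fields consistent. The following is a minimal sketch; the helper name `reasoning_field` is hypothetical, and the default of TEXT for data_type is an assumption made for this example.

```python
import json


def reasoning_field(name, prompt, data_type="TEXT"):
    """Build a reasoning field entry: prompt_type is "advanced",
    the instructions go in prompt, and description is left out."""
    return {
        "name": name,
        "prompt_type": "advanced",
        "data_type": data_type,
        "prompt": prompt,
    }


field = reasoning_field("Total Deductions", "calculate the sum of the deductions")
print(json.dumps(field, indent=2))
```

The resulting dictionary can be appended to the fields array of any document class alongside ordinary fields.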

Sample extraction schema

{
  "Invoice": {
    "description": "An invoice document requesting payment for goods or services",
    "fields": [
      {
        "name": "Invoice Number",
        "data_type": "TEXT",
        "description": "The unique invoice identifier or number"
      },
      {
        "name": "Invoice Date",
        "data_type": "TEXT",
        "description": "The date the invoice was issued"
      },
      {
        "name": "Total Amount",
        "data_type": "TEXT",
        "description": "The total amount due on the invoice including tax"
      },
      {
        "name": "Vendor Name",
        "data_type": "TEXT",
        "description": "The name of the vendor or supplier"
      },
      {
        "name": "Line Items",
        "data_type": "OBJECT_LIST",
        "description": "Table of line items on the invoice",
        "prompt_schema": [
          {
            "name": "Item Description",
            "description": "Description of the item or service"
          },
          {
            "name": "Quantity",
            "description": "Quantity of the item"
          },
          {
            "name": "Unit Price",
            "description": "Price per unit"
          },
          {
            "name": "Amount",
            "description": "Total amount for this line item"
          }
        ]
      }
    ]
  }
}