Agent extract | Instabase AI Hub Documentation

Enterprise Single-tenant

The Agent extract step extracts structured data from documents using LLMs, according to the schema defined in a linked agent extract module.

Parameters

Extraction schema — Select the agent extract module containing your extraction schema JSON file.

Schema files are JSON objects whose keys are document class names. Each class value has a description string and a fields array. Each field defines at least a name and a data_type. Most fields also include a description, telling the LLM what to extract.
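The structural rules above can be checked before a schema file is uploaded. The following is a minimal sketch, not part of AI Hub: `validate_schema` is a hypothetical helper that only checks the shape described in this section (class names as keys, a description string, a fields array, and name plus data_type on each field). AI Hub may enforce additional rules that are not covered here.

```python
def validate_schema(schema):
    """Return a list of problems found in an extraction schema dict.

    Hypothetical helper: checks only the structure described in the text.
    """
    problems = []
    if not isinstance(schema, dict):
        return ["schema must be a JSON object keyed by class name"]
    for class_name, spec in schema.items():
        if not isinstance(spec, dict):
            problems.append(f"{class_name}: class value must be an object")
            continue
        # Each class value needs a description string.
        if not isinstance(spec.get("description"), str):
            problems.append(f"{class_name}: missing description string")
        # Each class value needs a fields array.
        fields = spec.get("fields")
        if not isinstance(fields, list):
            problems.append(f"{class_name}: missing fields array")
            continue
        # Each field needs at least name and data_type.
        for i, field in enumerate(fields):
            for key in ("name", "data_type"):
                if not isinstance(field, dict) or key not in field:
                    problems.append(f"{class_name}.fields[{i}]: missing {key}")
    return problems
```

An empty list means the schema has the expected shape. To check a file, load it with `json.load` and pass the resulting dict to the function.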

Supported data types

Set data_type on each field in fields to one of TEXT, TEXT_LIST, OBJECT_LIST, or TABLE.

TEXT — A single text value.
TEXT_LIST — A list of text values.
OBJECT_LIST — A repeating group of sub-fields with a fixed shape, such as the line items on an invoice. Add a prompt_schema array on the parent field. Each object in prompt_schema defines one sub-field with a name and a description; data_type is not required on sub-fields.
TABLE — Tabular extraction.
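The data type rules above can be expressed as a small per-field check. This is a sketch under assumptions: `check_field_type` and `SUPPORTED_DATA_TYPES` are hypothetical names, and the check covers only what this section states (the four data types, and a prompt_schema array with name and description on each sub-field for OBJECT_LIST).

```python
# The four data types listed in this section.
SUPPORTED_DATA_TYPES = {"TEXT", "TEXT_LIST", "OBJECT_LIST", "TABLE"}


def check_field_type(field):
    """Return a list of data_type problems for one field dict (hypothetical helper)."""
    problems = []
    name = field.get("name", "<unnamed>")
    data_type = field.get("data_type")
    if data_type not in SUPPORTED_DATA_TYPES:
        problems.append(f"{name}: unsupported data_type {data_type!r}")
    if data_type == "OBJECT_LIST":
        sub_fields = field.get("prompt_schema")
        if not isinstance(sub_fields, list):
            problems.append(f"{name}: OBJECT_LIST needs a prompt_schema array")
        else:
            # Sub-fields carry name and description; data_type is not required.
            for i, sub in enumerate(sub_fields):
                if not isinstance(sub, dict) or "name" not in sub or "description" not in sub:
                    problems.append(f"{name}.prompt_schema[{i}]: needs name and description")
    return problems
```

A field that passes returns an empty list, so the results for all fields in a class can simply be concatenated.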

Reasoning fields

Reasoning fields use model instructions you write in prompt instead of a static description. Set "prompt_type": "advanced" and include data_type as for any other field. Omit description for these entries (use prompt only).

{
  "name": "reasoning field",
  "prompt_type": "advanced",
  "data_type": "TEXT",
  "prompt": "calculate the sum of the deductions"
}
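When schemas are generated by a script, a small builder can keep reasoning fields consistent with the rules above. This is a minimal sketch: `make_reasoning_field` is a hypothetical helper, not an AI Hub API, and the default data_type of TEXT is an assumption chosen to match the example.

```python
import json


def make_reasoning_field(name, prompt, data_type="TEXT"):
    """Build a reasoning field dict (hypothetical helper).

    Sets prompt_type to "advanced", stores the instructions in prompt,
    and deliberately leaves out description.
    """
    return {
        "name": name,
        "prompt_type": "advanced",
        "data_type": data_type,
        "prompt": prompt,
    }


field = make_reasoning_field("reasoning field", "calculate the sum of the deductions")
# Serialize for pasting into the fields array of a schema file.
field_json = json.dumps(field, indent=2)
```

The resulting dict has the same four keys as the example entry and can be appended to a class's fields array before the schema is written out.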

Sample extraction schema

{
  "Invoice": {
    "description": "An invoice document requesting payment for goods or services",
    "fields": [
      {
        "name": "Invoice Number",
        "data_type": "TEXT",
        "description": "The unique invoice identifier or number"
      },
      {
        "name": "Invoice Date",
        "data_type": "TEXT",
        "description": "The date the invoice was issued"
      },
      {
        "name": "Total Amount",
        "data_type": "TEXT",
        "description": "The total amount due on the invoice including tax"
      },
      {
        "name": "Vendor Name",
        "data_type": "TEXT",
        "description": "The name of the vendor or supplier"
      },
      {
        "name": "Line Items",
        "data_type": "OBJECT_LIST",
        "description": "Table of line items on the invoice",
        "prompt_schema": [
          {
            "name": "Item Description",
            "description": "Description of the item or service"
          },
          {
            "name": "Quantity",
            "description": "Quantity of the item"
          },
          {
            "name": "Unit Price",
            "description": "Price per unit"
          },
          {
            "name": "Amount",
            "description": "Total amount for this line item"
          }
        ]
      }
    ]
  }
}
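For review purposes, it can help to print a one-line summary of each field in a schema like the sample. The sketch below is illustrative only: `summarize_schema` is a hypothetical helper, and `SAMPLE` is a shortened copy of the sample schema (two fields, two sub-fields) used to keep the example brief.

```python
import json


def summarize_schema(schema):
    """Return one summary line per field: class, name, data_type, and any sub-fields."""
    lines = []
    for class_name, spec in schema.items():
        for field in spec.get("fields", []):
            line = f"{class_name}.{field['name']} ({field['data_type']})"
            # OBJECT_LIST fields list their sub-fields in prompt_schema.
            sub_names = [sub["name"] for sub in field.get("prompt_schema", [])]
            if sub_names:
                line += ": " + ", ".join(sub_names)
            lines.append(line)
    return lines


# Shortened copy of the sample schema, for illustration.
SAMPLE = json.loads("""
{
  "Invoice": {
    "description": "An invoice document requesting payment for goods or services",
    "fields": [
      {
        "name": "Invoice Number",
        "data_type": "TEXT",
        "description": "The unique invoice identifier or number"
      },
      {
        "name": "Line Items",
        "data_type": "OBJECT_LIST",
        "description": "Table of line items on the invoice",
        "prompt_schema": [
          {"name": "Item Description", "description": "Description of the item or service"},
          {"name": "Quantity", "description": "Quantity of the item"}
        ]
      }
    ]
  }
}
""")

summary = summarize_schema(SAMPLE)
for line in summary:
    print(line)
```

Each printed line names the document class, the field, and its data type, followed by the sub-field names for OBJECT_LIST fields.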