Agent extract
The Agent extract step extracts structured data from documents using LLMs, according to the schema defined in a linked agent extract module.
Parameters
- Extraction schema: Select the agent extract module containing your extraction schema JSON file.
Schema files are JSON objects whose keys are document class names. Each class value has a description string and a fields array. Each field defines at least a name and a data_type. Most fields also include a description, which tells the LLM what to extract.
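As a rough illustration of the layout described above, the following Python sketch builds a one-class schema and serializes it to JSON. The class name (invoice), the field names, and the description texts are hypothetical placeholders; only the key layout (class name, description, fields, name, data_type) is taken from the text.

```python
import json

# Hypothetical class and field names; only the key layout follows the text above.
schema = {
    "invoice": {
        "description": "A bill issued by a vendor for goods or services.",
        "fields": [
            {
                "name": "invoice_number",
                "data_type": "TEXT",
                "description": "The unique identifier printed on the invoice.",
            },
            {
                "name": "line_item_names",
                "data_type": "TEXT_LIST",
                "description": "The name of each billed item.",
            },
        ],
    }
}

# Serialize to the JSON text that would be saved as the schema file.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Building the schema as a dictionary and serializing it with json.dumps is just one convenient way to avoid hand-written JSON syntax errors; writing the file directly works equally well.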
Supported data types
Set data_type on each field in fields to one of TEXT, TEXT_LIST, OBJECT_LIST, or TABLE.
- TEXT: A single text value.
- TEXT_LIST: A list of text values.
- OBJECT_LIST: A repeating group of sub-fields with a fixed shape, such as a list with defined properties. Add a prompt_schema array on the parent field. Each object in prompt_schema lists sub-fields with name and description only; data_type is not required.
- TABLE: Tabular extraction.
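The rules above can be checked locally before a schema file is uploaded. The sketch below is a hypothetical pre-check, not the product's own validation: the function name check_field and the example field are invented for illustration, and it assumes each object in prompt_schema is one sub-field carrying a name and a description.

```python
SUPPORTED_DATA_TYPES = {"TEXT", "TEXT_LIST", "OBJECT_LIST", "TABLE"}


def check_field(field):
    """Return a list of problems found in one field definition (empty if none)."""
    problems = []
    if not field.get("name"):
        problems.append("missing name")

    data_type = field.get("data_type")
    if data_type not in SUPPORTED_DATA_TYPES:
        problems.append(f"unsupported data_type: {data_type!r}")

    # Assumption: OBJECT_LIST fields carry their sub-fields in prompt_schema.
    if data_type == "OBJECT_LIST":
        sub_fields = field.get("prompt_schema")
        if not isinstance(sub_fields, list) or not sub_fields:
            problems.append("OBJECT_LIST needs a non-empty prompt_schema array")
        else:
            for i, sub in enumerate(sub_fields):
                if not isinstance(sub, dict) or not sub.get("name") or not sub.get("description"):
                    problems.append(f"prompt_schema[{i}] needs name and description")
    return problems


# Hypothetical OBJECT_LIST field: each line item has the same two properties.
line_items = {
    "name": "line_items",
    "data_type": "OBJECT_LIST",
    "description": "Every billed item on the invoice.",
    "prompt_schema": [
        {"name": "item_name", "description": "The name of the billed item."},
        {"name": "amount", "description": "The amount charged for the item."},
    ],
}

print(check_field(line_items))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a large schema in a single pass.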
Reasoning fields
Reasoning fields use model instructions you write in prompt instead of a static description. Set "prompt_type": "advanced" and include data_type as for any other field. Omit description for these entries (use prompt only).
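A reasoning field entry might look like the following sketch. The field name and the prompt wording are hypothetical, and it is an assumption that the entry sits in the same fields array as ordinary fields; the keys prompt_type, prompt, and data_type come from the text above.

```python
import json

# Hypothetical reasoning field: uses prompt instead of description.
reasoning_field = {
    "name": "is_overdue",
    "data_type": "TEXT",
    "prompt_type": "advanced",
    "prompt": (
        "Compare the due date on the document with the payment date. "
        "Answer 'yes' if payment was made after the due date, otherwise 'no'."
    ),
}

# Print the JSON fragment as it would appear inside a class's fields array.
print(json.dumps(reasoning_field, indent=2))
```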
