The Agent extract step uses LLMs to extract structured data from documents, following the schema defined in a linked agent extract module.
Extraction schema — Select the agent extract module containing your extraction schema JSON file.
Schema files are JSON objects whose keys are document class names. Each class value has a description string and a fields array. Each field defines at least a name and a data_type. Most fields also include a description, telling the LLM what to extract.
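As a rough illustration of the layout described above, the following Python sketch parses a small schema and checks its overall shape. The class name (invoice), the field names, and the check_schema_shape helper are invented for this example and are not part of the product; only the keys description, fields, name, and data_type come from the text above.

```python
import json

# Hypothetical schema file contents. The class and field names are invented.
SCHEMA_JSON = """
{
  "invoice": {
    "description": "A bill issued by a vendor for goods or services.",
    "fields": [
      {
        "name": "invoice_number",
        "data_type": "TEXT",
        "description": "The unique identifier printed on the invoice."
      },
      {
        "name": "vendor_name",
        "data_type": "TEXT",
        "description": "The name of the company that issued the invoice."
      }
    ]
  }
}
"""


def check_schema_shape(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for class_name, class_def in schema.items():
        if not isinstance(class_def, dict):
            problems.append(f"{class_name}: class value must be an object")
            continue
        if not isinstance(class_def.get("description"), str):
            problems.append(f"{class_name}: missing description string")
        fields = class_def.get("fields")
        if not isinstance(fields, list):
            problems.append(f"{class_name}: missing fields array")
            continue
        # Every field needs at least a name and a data_type.
        for i, field in enumerate(fields):
            for key in ("name", "data_type"):
                if key not in field:
                    problems.append(f"{class_name}.fields[{i}]: missing {key}")
    return problems


schema = json.loads(SCHEMA_JSON)
problems = check_schema_shape(schema)
print(problems or "schema shape OK")
```

The helper only covers the structural rules stated here; the product may enforce additional constraints when the schema is loaded.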
Set data_type on each field in fields to one of TEXT, TEXT_LIST, OBJECT_LIST, or TABLE.
TEXT — A single text value.
TEXT_LIST — A list of text values.
OBJECT_LIST — A repeating group of sub-fields with a fixed shape, such as a list of items that share the same properties. Add a prompt_schema array on the parent field. Each object in prompt_schema describes a sub-field with name and description only; data_type is not required.
TABLE — Tabular extraction.
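The sketch below shows what a fields array using each of the four data types might look like, written as Python dictionaries that mirror the JSON, along with a small check for the rules listed above. The field names, descriptions, and the check_field helper are hypothetical. It also assumes that each prompt_schema object describes exactly one sub-field, and that a TABLE field needs no keys beyond name, data_type, and description, which the text above does not confirm.

```python
ALLOWED_DATA_TYPES = {"TEXT", "TEXT_LIST", "OBJECT_LIST", "TABLE"}

# Hypothetical fields array; all names and descriptions are invented.
fields = [
    {
        "name": "invoice_number",
        "data_type": "TEXT",
        "description": "The unique identifier printed on the invoice.",
    },
    {
        "name": "purchase_order_numbers",
        "data_type": "TEXT_LIST",
        "description": "Every purchase order number referenced on the invoice.",
    },
    {
        "name": "signatories",
        "data_type": "OBJECT_LIST",
        "description": "Each person who signed the document.",
        # Sub-fields carry name and description only; no data_type.
        "prompt_schema": [
            {"name": "full_name", "description": "The signer's full name."},
            {"name": "title", "description": "The signer's job title."},
        ],
    },
    {
        "name": "line_items",
        "data_type": "TABLE",
        "description": "The table of billed items.",
    },
]


def check_field(field: dict) -> list[str]:
    """Return problems for one field, based only on the rules stated above."""
    problems = []
    name = field.get("name", "<unnamed>")
    data_type = field.get("data_type")
    if data_type not in ALLOWED_DATA_TYPES:
        problems.append(f"{name}: unsupported data_type {data_type!r}")
    if data_type == "OBJECT_LIST":
        sub_fields = field.get("prompt_schema")
        if not isinstance(sub_fields, list):
            problems.append(f"{name}: OBJECT_LIST needs a prompt_schema array")
        else:
            for i, sub in enumerate(sub_fields):
                for key in ("name", "description"):
                    if key not in sub:
                        problems.append(f"{name}.prompt_schema[{i}]: missing {key}")
    return problems


all_problems = [p for field in fields for p in check_field(field)]
print(all_problems or "all fields OK")
```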
Reasoning fields use model instructions you write in prompt instead of a static description. Set "prompt_type": "advanced" and include data_type as for any other field. Omit description for these entries (use prompt only).
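A reasoning field might look like the following sketch, shown as a Python dictionary that serializes to the JSON entry. The field name, the prompt wording, and the check_reasoning_field helper are invented for illustration; only the keys prompt_type, prompt, data_type, and description come from the text above.

```python
import json

# Hypothetical reasoning field: prompt replaces description.
reasoning_field = {
    "name": "payment_overdue",
    "data_type": "TEXT",
    "prompt_type": "advanced",
    "prompt": (
        "Compare the payment due date with the date the document was issued. "
        "Answer 'yes' if the due date has already passed, otherwise 'no'."
    ),
}


def check_reasoning_field(field: dict) -> list[str]:
    """Return problems for a field that is marked as a reasoning field."""
    problems = []
    name = field.get("name", "<unnamed>")
    if field.get("prompt_type") != "advanced":
        problems.append(f"{name}: prompt_type must be 'advanced'")
    if "prompt" not in field:
        problems.append(f"{name}: missing prompt")
    if "data_type" not in field:
        problems.append(f"{name}: missing data_type")
    if "description" in field:
        problems.append(f"{name}: omit description and use prompt only")
    return problems


reasoning_problems = check_reasoning_field(reasoning_field)
print(json.dumps(reasoning_field, indent=2))
print(reasoning_problems or "reasoning field OK")
```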