Extracting data from documents
To process documents, you must specify which data points, or fields, you want to extract. If your project includes different document types, like a mix of passports and driver's licenses, create a class for each document type and specify fields for each class.
You can create up to 250 classes per project, and up to 100 fields per class.
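As a rough illustration of these limits, the sketch below checks a planned schema before you build it in the project. The `schema` mapping (class name to list of field names) and the function name are hypothetical and not part of AI Hub; whether the default other class counts toward the 250-class limit is not stated here, so the sketch does not account for it.

```python
# Limits quoted in the text above.
MAX_CLASSES_PER_PROJECT = 250
MAX_FIELDS_PER_CLASS = 100


def check_schema_limits(schema: dict[str, list[str]]) -> list[str]:
    """Return human-readable limit violations (an empty list means none).

    `schema` is a hypothetical planning structure: class name -> field names.
    """
    problems = []
    if len(schema) > MAX_CLASSES_PER_PROJECT:
        problems.append(
            f"{len(schema)} classes exceeds the limit of {MAX_CLASSES_PER_PROJECT}"
        )
    for class_name, fields in schema.items():
        if len(fields) > MAX_FIELDS_PER_CLASS:
            problems.append(
                f"class '{class_name}' has {len(fields)} fields; "
                f"the limit is {MAX_FIELDS_PER_CLASS}"
            )
    return problems


planned = {
    "passport": ["full_name", "passport_number", "expiry_date"],
    "drivers_license": ["full_name", "license_number"],
}
print(check_schema_limits(planned))
```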
Creating classes
If your project includes different document types, start by creating a class for each document type. You can then specify a different set of fields for each class.
Organization members can import prebuilt classes from a library of established schemas, such as paystubs, invoices, bank statements, and utility bills.
In projects with classification, a default class called other is assigned to documents that can't be classified. You can't delete or modify this class.
- In the editing panel, click the Create classes icon, then select one of these options based on your AI Hub subscription and project requirements:
  - Create classes: Lets you create a custom class without any fields. If you select this option, enter a succinct name for your document type, then exit the class editing panel.
  - Browse prebuilt classes (Commercial & Enterprise): Lets you add common document types and their associated fields based on a library of available schemas. If you select this option, choose the prebuilt classes that you want to add and click Add to project.
- Use the Create classes icon to add more classes as needed.
- When you're done creating classes, click Classify documents.
  (Commercial & Enterprise) If your project includes multipage files, you're prompted to optionally enable splitting files. You can split by document, which lets the model determine where document breaks occur, or split each page, which creates a new document at every page break. After classifying your documents, page ranges indicate how files are split.
  Classes are assigned to your documents, and documents are grouped by class in the document list. Any documents that can't be classified are assigned the other class.
- Verify classification. If documents weren't classified as expected, edit classes to improve your results.
  - In a class that wasn't identified accurately, click the overflow icon, then select Edit class.
  - Enter a description to help the model more accurately identify documents in the class, then exit the class editing panel.
    Effective descriptions include unique identifying details about a document class. Use details related to text in the documents, rather than visual elements like color, which the model can't "see."
    In projects that don't use file splitting, you can reference file extensions to help classify documents. For example, the description for an Images class might be "Files with image file extensions, like JPEG, PNG, and TIF."
    As a best practice, limit class descriptions to 1,000 characters (4,000 maximum).
  - Use the overflow icon to edit more classes as needed.
  - When you're done editing classes, click Classify documents.
Visual tutorial: Creating classes
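To make the two file-splitting options from the classification step concrete, the sketch below computes the page ranges each one would produce. It is only a mental model: the function names are hypothetical, and for split by document the break positions are chosen by the model inside AI Hub, so here they are passed in as an assumed input.

```python
def split_each_page(page_count: int) -> list[tuple[int, int]]:
    """Split each page: every page becomes its own one-page document."""
    return [(page, page) for page in range(1, page_count + 1)]


def split_by_document(page_count: int, first_pages: list[int]) -> list[tuple[int, int]]:
    """Split by document: one range per detected document.

    `first_pages` is an assumed input: the 1-based pages where a new
    document starts. In the product, the model determines these breaks.
    """
    # Page 1 always starts a document; ignore out-of-range break positions.
    starts = sorted(set(first_pages) | {1})
    starts = [s for s in starts if 1 <= s <= page_count]
    # Each document ends on the page before the next one starts.
    ends = [s - 1 for s in starts[1:]] + [page_count]
    return list(zip(starts, ends))


print(split_each_page(3))
print(split_by_document(5, first_pages=[3]))
```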
Creating fields
Create fields for each of the data points you want to identify.
- In the editing panel, click Add field.
- Enter a field name or select a suggested field name, then press Enter.
  Data is extracted based on the field name alone, and the result is displayed.
- Do one of the following, based on whether your result is accurate:
  - Accurate result: Exit the field editing panel and continue adding fields.
  - Inaccurate result: Edit the field. When you're done editing, exit the field editing panel and continue adding fields.
Visual tutorial: Creating fields
Editing fields
If the field name alone doesn't return the results you expect, you can edit the field to provide more guidance.
Access the field editor for an existing field by hovering over the field and clicking the edit icon.
In the field editor, first choose the field type appropriate for the data you want to identify.
With a suitable field type selected, provide a more detailed description or prompt, if necessary, describing the information you're looking for. As a best practice, keep field and attribute names under 48 characters and use a description or prompt for longer content up to 1,000 characters (4,000 maximum). For best practices, see Writing effective prompts.
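The length guidance above can be turned into a simple pre-check when drafting field definitions outside the product. This is a minimal sketch: the function name and message wording are hypothetical, and only the three numeric thresholds come from the text.

```python
# Thresholds quoted in the guidance above.
RECOMMENDED_NAME_CHARS = 48            # names should stay under this length
RECOMMENDED_DESCRIPTION_CHARS = 1_000  # best-practice ceiling
MAX_DESCRIPTION_CHARS = 4_000          # hard maximum


def review_field_text(name: str, description: str = "") -> list[str]:
    """Return notes about a drafted field name and description (empty if fine)."""
    notes = []
    if len(name) >= RECOMMENDED_NAME_CHARS:
        notes.append(
            f"name is {len(name)} characters; keep it under "
            f"{RECOMMENDED_NAME_CHARS} and move detail into the description"
        )
    if len(description) > MAX_DESCRIPTION_CHARS:
        notes.append(
            f"description is {len(description)} characters; "
            f"the maximum is {MAX_DESCRIPTION_CHARS}"
        )
    elif len(description) > RECOMMENDED_DESCRIPTION_CHARS:
        notes.append(
            f"description is {len(description)} characters; "
            f"{RECOMMENDED_DESCRIPTION_CHARS} or fewer is recommended"
        )
    return notes


print(review_field_text("Invoice total", "Total amount due, including tax."))
```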
For most field types, you can change the model using the model selector dropdown.
- Use the standard model for straightforward fields that perform basic text extraction or calculations. The standard model tends to perform best on shorter documents of fewer than 50 pages. Its faster processing is suitable when speed is your priority.
- Use the advanced model for specialized fields that perform multistep reasoning or complex math. The advanced model performs better on longer documents and those with challenging formatting, and it's required for visual reasoning fields. Its more deliberate processing is suitable when accuracy is your priority.
For details about model capabilities, see Choosing a model.
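The guidance above can be summarized as a small decision rule. The sketch below is one possible reading of it, not product behavior: the function, its parameters, and the returned labels are hypothetical, and in practice you would weigh speed against accuracy rather than apply the rule mechanically.

```python
def suggest_model(
    page_count: int,
    needs_multistep_reasoning: bool = False,
    has_challenging_formatting: bool = False,
    is_visual_reasoning_field: bool = False,
) -> str:
    """Suggest "standard" or "advanced" based on the guidance above."""
    # The advanced model is required for visual reasoning fields.
    if is_visual_reasoning_field:
        return "advanced"
    # Multistep reasoning, complex math, difficult layouts, and longer
    # documents (50 pages or more) favor the advanced model.
    if needs_multistep_reasoning or has_challenging_formatting or page_count >= 50:
        return "advanced"
    # Otherwise the faster standard model is usually sufficient.
    return "standard"


print(suggest_model(page_count=12))
print(suggest_model(page_count=120))
```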
When you're done editing a field, click Run to see results and further refine your edits if needed.
Visual tutorial: Editing fields
Field types
Choose the field type appropriate for the data you want to identify.
For more guidance, see Choosing field types.
Custom function fields
(Commercial & Enterprise) The custom function field type lets you use a Python function to compute values or import data to your project schema.
For example, you might use a custom function to calculate total invoice amount using existing subtotal and tax rate fields:
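The original example is not reproduced in this copy of the page, so the following is a minimal sketch rather than the documented one. The function name, its parameter names, and the assumption that field values arrive as strings such as "$1,250.00" and "8.5%" are all hypothetical; see Writing custom functions for the signature AI Hub actually expects.

```python
from decimal import Decimal, ROUND_HALF_UP


def total_invoice_amount(subtotal: str, tax_rate: str) -> str:
    """Compute subtotal plus tax from two extracted field values.

    Assumes `subtotal` looks like "1,250.00" or "$1,250.00" and
    `tax_rate` is a percentage such as "8.5" or "8.5%".
    """
    # Strip currency and percent formatting before parsing.
    subtotal_value = Decimal(subtotal.replace("$", "").replace(",", "").strip())
    rate_percent = Decimal(tax_rate.replace("%", "").strip())
    total = subtotal_value * (Decimal(1) + rate_percent / Decimal(100))
    # Round to cents using conventional half-up rounding.
    return str(total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


print(total_invoice_amount("$1,250.00", "8.5%"))  # prints 1356.25
```

Using `Decimal` instead of `float` avoids binary rounding surprises in currency math.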
Custom function fields accept these parameters:
For additional guidance about custom functions, see Writing custom functions.
Viewing results across documents
To quickly scan or compare results, click the Results table icon in the Documents header.
The results table corresponds to the current view in the editing panel, so the results you see change depending on your current task.
Reordering fields
To change the order of fields in the field editor, use the up and down arrows that display when you hover over a field.
Reordering fields can be necessary when creating derived fields, which can reference fields that precede them in the field editor. Reordering fields can also speed up reviews or support downstream integrations, because fields are displayed in processed results in the same order as in the field editor.
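If you plan a schema outside the product, the ordering requirement for derived fields can be checked mechanically. The sketch below assumes that a derived field may reference only fields listed before it, which is how the paragraph above reads; the function and both of its inputs are hypothetical planning structures, not AI Hub objects.

```python
def find_order_violations(
    field_order: list[str],
    references: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Return (derived_field, referenced_field) pairs that break the ordering.

    `field_order` is the order of fields in the field editor.
    `references` maps each derived field to the fields it reads.
    """
    position = {name: index for index, name in enumerate(field_order)}
    violations = []
    for field in field_order:
        for referenced in references.get(field, []):
            # A reference is valid only if the referenced field exists
            # and appears earlier in the field editor.
            if referenced not in position or position[referenced] >= position[field]:
                violations.append((field, referenced))
    return violations


order = ["total", "subtotal", "tax_rate"]
refs = {"total": ["subtotal", "tax_rate"]}
print(find_order_violations(order, refs))
```

Moving total below subtotal and tax_rate in this example would clear both violations.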
Hiding fields
(Commercial & Enterprise) Hiding intermediate or computational fields can help simplify human review and downstream integration output.
Consider hiding fields that are used exclusively as input for derived fields or custom functions. For example, you might extract individual date components in separate hidden fields, then combine them into a final formatted date field that reviewers and downstream systems actually need.
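The date example might look like the sketch below if the final field is computed with a Python function. The function name, the three component inputs, and the ISO 8601 output format are assumptions chosen for illustration.

```python
from datetime import date


def combine_date(year: str, month: str, day: str) -> str:
    """Combine three extracted components into one YYYY-MM-DD string.

    The inputs stand in for values from three hidden fields. An invalid
    combination, such as month 13, raises ValueError.
    """
    return date(int(year), int(month), int(day)).isoformat()


print(combine_date("2024", "3", "7"))  # prints 2024-03-07
```

Reviewers and downstream systems would then see only the combined date, while the three component fields stay hidden.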
To mark a field as hidden, open the field editor and enable Hide field.
Hidden fields can't have validation rules, because validations on hidden fields could create confusing review scenarios. If you hide a field with an active validation rule, the rule is removed. If you later unhide the same field, any previous validation rules are restored.
Hidden fields use processing resources and count toward field limits, but their visibility varies across different AI Hub interfaces:

