Extracting data from packets
To process data across related document types, extend your project schema to manage packets.
Packets are sets of related documents processed as a unit, such as a loan application with supporting bank statements and tax documents. While each document type has specific data points, they also share common information like applicant name that you can consolidate at the packet level.
Packet-level data consolidation is achieved with cross-class fields, which leverage fields from existing classes to provide overarching results. Projects with cross-class fields become packet-processing apps. Conversely, removing all cross-class fields reverts the project to a standard automation app.
Understanding packets
When you add cross-class fields to a project, existing project files are automatically organized into packets based on upload batch. You can manually reorganize packets as needed.
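Conceptually, the automatic grouping is similar to sorting files into buckets by the batch they were uploaded in. The following Python is a hypothetical illustration only. The `batch_id` key and the `group_into_packets` helper are assumptions made for this sketch and are not part of the product's API.

```python
from collections import defaultdict


def group_into_packets(files):
    """Bucket file names by their (hypothetical) upload batch ID, preserving upload order."""
    packets = defaultdict(list)
    for f in files:
        packets[f["batch_id"]].append(f["name"])
    return dict(packets)


files = [
    {"name": "application.pdf", "batch_id": "upload-1"},
    {"name": "bank_statement.pdf", "batch_id": "upload-1"},
    {"name": "tax_return.pdf", "batch_id": "upload-2"},
]
print(group_into_packets(files))
```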
During project development, aim for five or so representative packets in your project so you can see how your app performs across a variety of packets.
In production, upload structure dictates packet composition. Each upload (whether a file, folder, or email) is treated as a packet and processed in its own run. Plan upstream deployment integrations with this constraint in mind.
Reorganizing packets
As you develop packet-processing apps, you can manually reorganize project files to reflect a different packet structure if needed.
- From the Cross-class tab, in the aggregated table view, select Reorganize packets.
- In the Reorganize packets window, drag documents to the appropriate packet or click Add packet to create new packets as needed, then click Save.
Saving automatically triggers re-extraction of cross-class fields to provide updated results that reflect the new packet organization.
Creating cross-class fields
Create cross-class fields for data points that need to be consolidated across classes.
Before you begin
You must create classes and fields before you can add cross-class fields, because cross-class fields build on class fields. For details, see Extracting data from documents.
- From the Cross-class tab, click Add cross-class field.
- Enter a field name and select a cross-class field type, then specify the required details for that field type.
- When you’re satisfied with the result, click ← to exit the cross-class field editing panel and continue adding fields.
Cross-class field types
Choose the cross-class field type that matches how you want to consolidate data across classes.
- Ranked: Selects the best result among multiple input fields based on the ranking logic you specify: first valid field, highest field confidence, or highest OCR confidence. If your ranking logic is First valid field, select input fields in priority order. For confidence-based ranking logic, the order of input fields doesn’t matter.
- Derived: Generates values based on class fields. In the Prompt, reference fields by field name: either type the field name or select it from the dropdown. For example: "Generate a risk score by combining Tax document: Annual income, Bank statement: Average balance, and Application: Loan amount using the formula (income + balance) / loan amount."
  When referencing table or list extraction fields, derived fields try to match the input format. For consistent results, especially when combining different field types, consider normalizing tables or lists as text first.
- Custom function: Computes values or imports third-party data with a custom Python function. For more details, see Cross-class custom function fields.
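The ranking options and the derived formula above reduce to simple logic. The following Python sketch approximates that logic to make the behavior concrete. The input dictionaries, their keys, and the helper names are hypothetical, and treating "valid" as "non-empty" is an assumption; the product may apply stricter checks.

```python
def is_valid(value):
    """Assumption: treat None and blank strings as invalid."""
    return value is not None and str(value).strip() != ""


def rank_first_valid(inputs):
    """First valid field: return the first valid value, in the order the inputs are listed."""
    for item in inputs:
        if is_valid(item["value"]):
            return item["value"]
    return None


def rank_highest(inputs, score_key):
    """Confidence-based ranking: highest score wins regardless of order (ties go to the first listed)."""
    candidates = [item for item in inputs if is_valid(item["value"])]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[score_key])["value"]


# Hypothetical inputs for an applicant-name ranked field, listed in priority order.
inputs = [
    {"value": "", "field_confidence": 0.99, "ocr_confidence": 0.99},             # Application
    {"value": "Jane Doe", "field_confidence": 0.80, "ocr_confidence": 0.95},     # Tax document
    {"value": "Jane A. Doe", "field_confidence": 0.92, "ocr_confidence": 0.90},  # Bank statement
]
print(rank_first_valid(inputs))                  # Jane Doe
print(rank_highest(inputs, "field_confidence"))  # Jane A. Doe
print(rank_highest(inputs, "ocr_confidence"))    # Jane Doe


def risk_score(annual_income, average_balance, loan_amount):
    """Formula from the example Derived prompt: (income + balance) / loan amount."""
    return (annual_income + average_balance) / loan_amount


print(risk_score(90_000, 10_000, 200_000))  # 0.5
```

The three strategies can disagree on the same inputs. First valid field skips the empty application value, while the two confidence strategies pick whichever non-empty value scores highest.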
Cross-class custom function fields
The cross-class custom function field type lets you use a Python function to consolidate or compute values across multiple document classes within a packet.
For example, you might use a cross-class custom function to prioritize applicant name based on multiple sources:
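As a hedged illustration, the sketch below assumes the function receives three candidate names as plain strings and returns a single string. The parameter names are hypothetical, and the parameters the platform actually passes may differ. Unlike a Ranked field, the function normalizes whitespace and picks the name most sources agree on, falling back to source priority on ties.

```python
from collections import Counter


def _clean(name):
    """Collapse runs of whitespace; return None for missing or blank names."""
    if name is None or not str(name).strip():
        return None
    return " ".join(str(name).split())


def prioritize_applicant_name(application_name, tax_name, bank_name):
    """Return the applicant name most sources agree on.

    Names are compared case-insensitively. Ties go to the highest-priority
    source: application form, then tax document, then bank statement.
    """
    cleaned = [c for c in map(_clean, (application_name, tax_name, bank_name)) if c]
    if not cleaned:
        return None
    votes = Counter(c.casefold() for c in cleaned)
    top = max(votes.values())
    # `cleaned` is already in priority order, so the first match wins ties.
    return next(c for c in cleaned if votes[c.casefold()] == top)


print(prioritize_applicant_name("jane doe", "Jane  Doe", "J. Doe"))  # jane doe
print(prioritize_applicant_name("", "Jane Doe", "J. Doe"))           # Jane Doe
```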
Cross-class custom function fields accept these parameters:
Return type
When defining a cross-class custom function field, you can set the return type to Text choices to limit valid returned values to a specified set. In the custom function field editor, use the return type dropdown to switch from Text to Text choices, then add the allowed values. Define the values as a comma-separated list, or use Import as CSV to upload a CSV file with one value per cell (up to 1,000 options).
The custom function must return one of those values; otherwise, a validation error is shown. In human review, fields with text choices display a dropdown of valid options from which reviewers can select.
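Checking the value before returning it can help keep a function's output inside the allowed set while you test. The sketch below is hypothetical. The choice values, the 0.5 threshold, and the function names are assumptions, and so is the idea that spaces around comma-separated choices are trimmed.

```python
def parse_choices(raw):
    """Split a comma-separated list of choices, trimming spaces and dropping blanks."""
    return [choice.strip() for choice in raw.split(",") if choice.strip()]


ALLOWED = parse_choices("Approved, Declined, Needs review")


def loan_decision(coverage_ratio):
    """Hypothetical Text choices function: map a ratio to one allowed value."""
    if coverage_ratio is None:
        decision = "Needs review"
    elif coverage_ratio >= 0.5:
        decision = "Approved"
    else:
        decision = "Declined"
    # Local guard mirroring the platform's validation, so a typo fails while testing.
    if decision not in ALLOWED:
        raise ValueError(f"{decision!r} is not one of {ALLOWED}")
    return decision
```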
Intra-class prioritization
When a packet contains multiple documents of the same class, intra-class prioritization chooses which document’s class field values feed cross-class calculations and class-level derived and custom function fields. You can configure which prioritization strategy is used.
- In the editing panel, select the Cross-class > Intra class tab. The intra-class aggregated view opens, showing field values for each class in each packet across your project.
- Choose how to prioritize documents within each class:
  - Field priority (default): Uses the result with the highest field confidence and greatest frequency across all documents in a class.
  - Document priority: Uses results from one document based on criteria you specify:
    - Fewest empty values: Selects the document with the fewest empty field values.
    - Highest average confidence score across all fields: Selects the document with the highest average confidence score across all fields.
    - User defined functions: Selects the document using a Python function you write. The editor opens with a short comment stub; see Writing intra-class prioritization functions for a reference template you can adapt.
- Click Apply changes to all files or, if you use the User defined functions strategy, click Run & apply changes to all files after testing your function.
Review prioritized results in the aggregated view. For each class in each packet, the table shows the field values used for cross-class and class-level calculations. When you use Document priority, the view also shows which document was selected for each class. Drill into a packet to view class-level results for every document in the packet, or select a document row to open that document’s class-level results.
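The two built-in Document priority criteria can be sketched in a few lines of Python, which may help when deciding between them. This sketch approximates the criteria and is not the product's implementation. The document structure (`fields` mapping each field name to `value` and `confidence`) and the first-wins tie-breaking are assumptions. In the sample data, the two criteria pick different paystubs.

```python
def _is_empty(value):
    return value is None or (isinstance(value, str) and not value.strip())


def fewest_empty_values(docs):
    """Index of the document with the fewest empty field values (first wins ties)."""
    def empty_count(doc):
        return sum(_is_empty(f["value"]) for f in doc["fields"].values())
    return min(range(len(docs)), key=lambda i: empty_count(docs[i]))


def highest_average_confidence(docs):
    """Index of the document with the highest mean confidence across its fields."""
    def average(doc):
        scores = [f["confidence"] for f in doc["fields"].values()]
        return sum(scores) / len(scores) if scores else 0.0
    return max(range(len(docs)), key=lambda i: average(docs[i]))


paystubs = [
    {"fields": {"Employee name": {"value": "Jane Doe", "confidence": 0.95},
                "Gross pay": {"value": "", "confidence": 0.60}}},
    {"fields": {"Employee name": {"value": "Jane Doe", "confidence": 0.70},
                "Gross pay": {"value": "4,200.00", "confidence": 0.75}}},
]
print(fewest_empty_values(paystubs))         # 1
print(highest_average_confidence(paystubs))  # 0
```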
Writing intra-class prioritization functions
Use an intra-class prioritization function when the built-in strategies don’t match your business rules. For example, you might need to choose the paystub with the highest gross pay, or apply different rules to different classes.
The function runs once per packet, selects one document per class, and applies that choice to cross-class calculations and class-level derived and custom function fields for that packet.
Requirements
- One function per project: A single function handles every class that has multiple documents in a packet.
- Return one document per class: Do not combine field values from different documents in the same class. Downstream logic, including ranked cross-class fields, expects all class field values to come from a single document.
Parameters
Return value
Return a JSON object that maps each class name to the index of the document to use for that class. You can omit classes with only one document.
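For example, in a packet with two paystubs and one application, returning a mapping such as {"Paystub": 1} selects one paystub and leaves the single application implicit. The hypothetical helper below checks a mapping against that shape before you return it. The `packet` structure, the `index` key on each document, and the helper name are assumptions made for illustration.

```python
import json


def check_selection(selection, packet):
    """Return a list of problems with a class-name -> document-index mapping.

    `packet` is a hypothetical dict of class name -> list of document dicts,
    each carrying its index under an "index" key.
    """
    problems = []
    for class_name, index in selection.items():
        docs = packet.get(class_name)
        if docs is None:
            problems.append(f"unknown class {class_name!r}")
        elif index not in {doc["index"] for doc in docs}:
            problems.append(f"{class_name!r} has no document with index {index!r}")
    return problems


packet = {
    "Paystub": [{"index": 0}, {"index": 1}],
    "Application": [{"index": 2}],  # single document, so it can be omitted
}
selection = {"Paystub": 1}
print(json.dumps(selection))               # {"Paystub": 1}
print(check_selection(selection, packet))  # []
```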
Quick start
Reference template
The following example matches the intra-class selection contract. For each class, return the selected document’s index value (the document’s id). Access field values with doc['fields']['<field-name>']['value'].
For a packet with two paystubs, you might read class_docs[n]['fields']['Gross pay']['value'] for each document and return the index of the document with the highest amount.
After you edit the function, click Run to test your function, or Run & apply changes to all files to save it and recalculate intra-class results for all files in the project. To view output for each packet, click Open logs on the packet accordion.
Using class-level functions alongside intra-class prioritization
Intra-class prioritization supplies one prioritized value per class field to derived and custom function fields. Sometimes you need every document’s value instead, for example, to return the highest gross pay across a packet’s paystubs rather than just the prioritized one.
After prioritization runs, each class field argument you add in the field editor includes an all_document_values list with every document’s extraction for that field. The other properties on the argument reflect the prioritized document.
To add class fields as arguments, see Custom function fields.
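As a rough sketch, a class-level custom function could return the highest gross pay by reading all_document_values. The argument's shape is an assumption: it is modeled as a plain dict with a value key (the prioritized document) and an all_document_values list of per-document dicts. The function name is hypothetical. Check Custom function fields for the actual argument structure.

```python
def _to_number(value):
    """Parse amounts such as "4,200.00" or "$4,200"; return None if unparseable."""
    if value is None:
        return None
    try:
        return float(str(value).replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def highest_gross_pay(gross_pay):
    """Return the highest gross pay across every paystub in the packet.

    `gross_pay` stands in for the class field argument (hypothetical dict shape).
    """
    amounts = []
    for extraction in gross_pay.get("all_document_values", []):
        amount = _to_number(extraction.get("value"))
        if amount is not None:
            amounts.append(amount)
    if amounts:
        return max(amounts)
    return _to_number(gross_pay.get("value"))  # fall back to the prioritized value


argument = {
    "value": "3,900.00",
    "all_document_values": [{"value": "3,900.00"}, {"value": "$4,200.00"}, {"value": ""}],
}
print(highest_gross_pay(argument))  # 4200.0
```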
Viewing cross-class results
With the Cross-class tab selected in the editing panel, use the Schema and Validations sub-tabs to review packet-level results.
On the Schema sub-tab, the cross-class aggregated view shows cross-class field results across packets. Drill into individual packets to view class-level results for documents within that packet.
On the Validations sub-tab, the aggregated view shows cross-class validation rules and their results.
