Managing ground truth datasets

Commercial & Enterprise

Ground truth datasets are sets of files and associated ground truth values that you use to test app accuracy.

Ground truth datasets are associated with a specific app. For a 1:1 comparison, you use the same set of documents to create the ground truth dataset and to test the app. You can create multiple ground truth datasets for an app to test different batches of input documents.

Several tasks are associated with managing ground truth datasets: creating and updating datasets, verifying and modifying ground truth values, and configuring dataset parameters. Each is described in the sections that follow.

Ground truth datasets have statuses that indicate readiness for accuracy testing.

  • Ready to use indicates that the dataset aligns with the corresponding project and the ground truth values are verified.

  • Review required indicates that ground truth values haven’t been verified for the dataset. To resolve this status, verify ground truth values.

  • Outdated indicates that the underlying app schema has changed or, for ground truth datasets based on project files, that files were added to the project. To resolve this status, update your dataset and verify ground truth values.

Creating ground truth datasets

Create a ground truth dataset to establish a new set of files to use for accuracy testing.

You can create ground truth datasets using project files, a previous app run, or files you upload. As a best practice, use project files for initial testing and iteration, then test with a separate dataset of new documents. This approach first verifies your production app’s consistency with your automation project, then assesses how effectively the app processes new data.

  1. From the Hub, open the app you want to test.

  2. Select the Accuracy tests tab.

  3. Click Create dataset.

  4. Specify details about your ground truth dataset, then click Next.

    • Name — If you’re uploading files, specify a unique name for the ground truth dataset. If you’re using project files, the dataset name is fixed as Project files. You can have only one project files dataset per app.

    • Workspace — Select the workspace where users can run and review accuracy tests for this dataset.

    • File source — Select whether to use project files, files uploaded from a connected or local drive, or files from a recently completed app run.

  5. If you’re uploading files, select files.

    If you’re using project files or files from an app run, all files are automatically included in the dataset. You can’t modify the file list in this step.

  6. Click Run app or, if you’re using a previous app run, click Create dataset.

    • If you’re using project files or uploaded files, an app run begins. When the run completes, click Review dataset to verify ground truth values in human review.

    • If you’re using a previous app run, the dataset is generated using existing results as ground truth values. You can modify ground truth values if needed.

Updating ground truth datasets

When you edit a project schema or add new project files, update associated ground truth datasets to ensure ground truth values are aligned with the new app version.

You can run accuracy tests against outdated datasets, but doing so typically lowers accuracy metrics, because results aren’t aligned with existing ground truth values.

When you update an outdated dataset, you must verify results for newly added files, classes, and fields. Existing ground truth values are preserved.

  1. From the Hub, open the app that you modified.

  2. Select the Accuracy tests tab.

  3. In the Ground truth datasets table, hover over a dataset with the Outdated status and click the update dataset icon (a circle with an exclamation point centered in it).

  4. Click Update dataset to run the app against the dataset files and generate new results.

    When the run completes, click Review dataset to verify ground truth values in human review.

Verifying ground truth values

After running files in a dataset through your app, verify results in human review. Verified results become the ground truth values for the dataset. Ground truth values reflect your desired end state results, including formatting.

  1. From an app’s Accuracy tests tab, in the Ground truth datasets table, hover over a dataset with the Review required status and click the edit icon (a pencil).

    All files in the dataset open in human review so you can verify results.

  2. For each file you’re reviewing, verify and correct data if needed, then mark the file as reviewed.

    • To correct mapping data where multipage files were incorrectly parsed into individual documents, open the documents grid. Select pages, then use the button controls to move or delete pages or to create additional documents.

    • To correct classification data, use one of these methods:

      • In the fields list, click the Edit classification icon near the assigned class. Select the correct class, then click Confirm.

      • In the documents grid, click the class name. Select the correct class, then click Confirm.

        When you change document classification, you can specify how the schema for the new class is applied. By default, the app reprocesses the document to identify field results for the new class. Reprocessing incurs usage charges at the same rates as regular app runs. To apply the schema for the new class without reprocessing, deselect Extract fields for the updated class.

    • To correct text fields, in the fields list, select a field. Enter a new value or, in the document viewer, select the area of the document that contains the information for that field. You can click to select text, or use your mouse to draw a box around the information.

      If validations apply to the selected document, all fields are automatically revalidated when you modify a value.

    • To correct tables, in the fields list, select a table to open it for editing. Select any table cell, then in the document viewer, select the area of the document that contains the information for that cell. You can click to select text, or use your mouse to draw a box around the information.

  3. Review any additional files in the file list.

  4. Click Finish review.

    This action marks everything in the dataset as reviewed and advances you to the dataset overview where you can run an accuracy test.

Modifying ground truth values

You can view ground truth values by opening a dataset and selecting the Ground truth values tab.

To see the values full-screen, hover over any cell and click the maximize icon (four arrows pointing outward toward the corners of a box). The full-screen comparison is helpful for reviewing tables and longer results.

If you need to modify any values, click Edit values to open the dataset in human review. Follow the steps for verifying ground truth values to make any changes.

Configuring dataset parameters

Tune how ground truth values are compared to run results during accuracy testing by configuring parameters for each ground truth dataset. You can configure global rules that apply across all fields, as well as field rules, which can override global rules for specific fields.

Dataset parameters are available on the Configuration tab of ground truth datasets, and in the accuracy testing workflow.

Configuration changes applied during accuracy testing are saved to the underlying ground truth dataset.
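
Conceptually, global rules act as defaults and field rules override them for specific fields. The sketch below is purely illustrative: the structure and parameter names are hypothetical, not the product’s actual configuration format.

```python
# Hypothetical illustration only: global rules as defaults, field rules as
# per-field overrides. All names here are invented for clarity.
dataset_parameters = {
    "global_rules": {
        "text": {"ignore_whitespace": True, "ignore_casing": True},
        "decimal": {"error_tolerance": 0.01},
    },
    "field_rules": {
        # Uses the global text rules; only the result type is set explicitly.
        "invoice_number": {"result_type": "text"},
        # Overrides the global decimal tolerance to require an exact match.
        "total_amount": {"result_type": "decimal", "error_tolerance": 0.0},
    },
}
```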

Global rules

Global rules apply across all fields in a dataset, unless overridden by field rules.

Document rules

  • Exempt misclassified documents — Excludes incorrectly classified documents from extraction accuracy calculations. This prevents misclassified documents from skewing field-level accuracy metrics (see the sketch below).
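
The following is a minimal sketch of what exempting misclassified documents means for field-level accuracy. The document structure and scoring here are hypothetical, not the product’s actual calculation.

```python
def field_accuracy(documents, exempt_misclassified=True):
    """Fraction of extracted field values that match their ground truth values."""
    matches = total = 0
    for doc in documents:
        # Skip documents the app assigned to the wrong class so that their
        # field results don't drag down extraction accuracy.
        if exempt_misclassified and doc["predicted_class"] != doc["true_class"]:
            continue
        for field, truth in doc["ground_truth"].items():
            total += 1
            matches += doc["results"].get(field) == truth
    return matches / total if total else 0.0
```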

Text rules

  • Ignore whitespace — Disregards spaces, tabs, and line breaks when comparing values.

  • Ignore special characters — Excludes non-alphanumeric characters from comparisons.

  • Ignore casing — Treats uppercase and lowercase letters as equivalent.

  • Allow Levenshtein distance — Permits a maximum number of character differences between strings. For example, a Maximum difference of 1 treats cat and bat as a match because they differ by 1 character (see the sketch after this list).
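
As a rough illustration of how the text rules might combine, the sketch below normalizes both values and then compares them using Levenshtein edit distance. It is illustrative only, not the product’s implementation.

```python
import re

def normalize(value, ignore_whitespace=True, ignore_special=True, ignore_casing=True):
    """Apply the text rules to a value before comparison."""
    if ignore_whitespace:
        value = re.sub(r"\s+", "", value)           # drop spaces, tabs, line breaks
    if ignore_special:
        value = re.sub(r"[^0-9A-Za-z]", "", value)  # keep only alphanumeric characters
    if ignore_casing:
        value = value.lower()
    return value

def levenshtein(a, b):
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def text_match(result, truth, max_difference=1, **rules):
    return levenshtein(normalize(result, **rules), normalize(truth, **rules)) <= max_difference

# "Cat " matches "bat": whitespace and casing are ignored, and the remaining
# strings differ by a single character, within the allowed distance of 1.
print(text_match("Cat ", "bat"))  # True
```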

Integer rules

  • Use European number format — Interprets thousand separators using European conventions, where periods separate thousands. For example, “1.234” is interpreted as one thousand two hundred thirty-four, not as a decimal value.

  • Allow integer error tolerance — Accepts integer values that fall within a specified range of the expected value. For example, a maximum difference of 2 would treat 1234 and 1236 as a match (see the sketch below).
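
A minimal sketch of the integer rules, assuming the European format only changes the thousands separator; this is illustrative, not the product’s parsing logic.

```python
def parse_int(value, european_format=False):
    """Strip thousands separators: "1.234" -> 1234 (European), "1,234" -> 1234 otherwise."""
    separator = "." if european_format else ","
    return int(value.replace(separator, ""))

def integer_match(result, truth, max_difference=0, european_format=False):
    """Accept integer results within max_difference of the ground truth value."""
    return abs(parse_int(result, european_format) - parse_int(truth, european_format)) <= max_difference

# "1.234" parses to 1234 under the European convention; with a tolerance of 2,
# it matches a ground truth value of 1236.
print(integer_match("1.234", "1236", max_difference=2, european_format=True))  # True
```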

Decimal rules

  • Use European number format — Interprets decimal separators using European conventions, where commas are used as decimal points. For example, “12,34” is interpreted as twelve and thirty-four hundredths.

  • Allow decimal error tolerance — Accepts numerical values that fall within a specified range of the expected value. For example, a tolerance of 0.01 would treat 12.34 and 12.35 as a match.

  • Round — Rounds decimal values up, down, or to the nearest whole number before comparison (illustrated in the sketch below).
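
Similarly, a minimal sketch of the decimal rules; the parsing, rounding, and tolerance behavior shown here is an assumption for illustration, not the product’s exact logic.

```python
import math

def parse_decimal(value, european_format=False):
    """Parse "12,34" as 12.34 under the European convention (comma as decimal point)."""
    if european_format:
        value = value.replace(".", "").replace(",", ".")  # drop thousands dots, swap decimal comma
    else:
        value = value.replace(",", "")                    # drop thousands commas
    return float(value)

def decimal_match(result, truth, tolerance=0.0, rounding=None, european_format=False):
    a = parse_decimal(result, european_format)
    b = parse_decimal(truth, european_format)
    if rounding:  # "up", "down", or "nearest"
        round_fn = {"up": math.ceil, "down": math.floor, "nearest": round}[rounding]
        a, b = round_fn(a), round_fn(b)
    return abs(a - b) <= tolerance

# "12,34" and "12,35" differ by 0.01, which is within the allowed tolerance.
print(decimal_match("12,34", "12,35", tolerance=0.01, european_format=True))  # True
```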

Field rules

For each field, you can specify the expected result type, which determines how extracted values are interpreted and validated. By default, global rules are applied to fields based on the selected result type. To specify different rules for text, integer, and decimal field types, enable Use custom rules for this field.

For structured result types (lists and tables), the content within lines or cells is always compared using global text rules.

Result types roughly correlate to field types in automation projects, with additional options for more precise handling. Select the result type that matches how you want to compare extracted data.

  • Use text comparison for exact string matching, including formatting.

  • Use integer or decimal for numeric results that need mathematical comparison.

  • Use structured types, like lists or tables, to compare data content while ignoring formatting differences. Use the text type if formatting must match exactly (see the example after the table).

| Result type | Equivalent project field type | Use for… |
| --- | --- | --- |
| Text | Text extraction | Character data, including numbers that shouldn’t be calculated or compared numerically, like ID numbers, phone numbers, or zip codes. |
| Integer | Text extraction | Whole numbers that need numerical comparison, like quantities, counts, or years. |
| Decimal | Text extraction | Numbers that might include decimal places, like prices, measurements, or percentages. |
| List of text | List extraction without attributes | Lists of single values, like item names in an invoice. |
| List of objects | List extraction with attributes | Lists where each item contains multiple fields, like line items containing both quantity and price. |
| Table | Table extraction | Grid data with consistent columns, like detailed transaction tables. |
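
For example, the same pair of values can match under a numeric comparison but not under a text comparison, because text comparison preserves formatting such as leading zeros:

```python
result, truth = "01234", "1234"
print(result == truth)            # False -- as text, the leading zero makes them differ
print(int(result) == int(truth))  # True  -- as integers, both parse to 1234
```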

By default, all fields are included in accuracy tests, but you can optionally exclude fields by deselecting Include in accuracy tests. Excluded fields aren’t reported in accuracy tests or counted in accuracy metrics.