Testing apps
Accuracy testing compares the results of an app run to verified values for a set of documents. By comparing actual results to verified values, you can see how accurate your app is and where you might want to make improvements.
Accuracy testing overview
Accuracy testing can help you refine apps to ensure they meet satisfactory thresholds for validity and accuracy.
Follow these high-level steps to implement accuracy testing.
-
Develop or iterate on an app in Build and create a new app version with the Production release state.
Your new app version is stored in the Hub.
Share the app to enable other organization members to access app versions with the Production release state.
-
Create ground truth datasets or update existing datasets associated with your app.
Verify ground truth values for any new or updated datasets.
You can run accuracy tests against outdated datasets, but doing so typically lowers accuracy metrics because results aren’t aligned with existing ground truth values.
-
Conduct accuracy testing on the new app version.
Review accuracy metrics and examine error patterns to identify areas for improvement.
-
Repeat the previous steps as needed, using accuracy test results to guide incremental improvements to your app.
When you’re satisfied with accuracy test results, your app is ready for production use.
Return to this process whenever your document processing needs change. Regularly tracking accuracy metrics over time helps ensure your app continually meets accuracy thresholds.
Managing ground truth datasets
Ground truth datasets are sets of files and associated ground truth values that you use to test app accuracy.
Ground truth datasets are associated with a specific app. For a 1:1 comparison, you use the same set of documents to create the ground truth dataset and to test the app. You can create multiple ground truth datasets for an app to test different batches of input documents.
Several tasks are associated with managing ground truth datasets:
-
Creating a dataset establishes the set of files used for the dataset and runs these files against your app to generate results.
-
Updating a dataset reruns the dataset files against a new app version to generate new results.
-
Verifying ground truth values uses human review to confirm or correct dataset values.
-
Configuring dataset parameters lets you tune rules for comparing ground truth values to run results during accuracy testing.
Ground truth datasets have statuses that indicate readiness for accuracy testing.
-
Ready to use indicates that the dataset aligns with the corresponding project and the ground truth values are verified.
-
Review required indicates that ground truth values haven’t been verified for the dataset. To resolve this status, verify ground truth values.
-
Outdated indicates that the underlying app schema has changed, or for ground truth datasets based on project files, more files were added to the project. Ideally, update your dataset and verify ground truth values.
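The three statuses can be summarized as a small decision function. This is a minimal sketch, not the product's implementation: the `DatasetStatus` enum, the `dataset_status` function, and its three boolean inputs are hypothetical names, and checking for Outdated before Review required is an assumption.

```python
from enum import Enum


class DatasetStatus(Enum):
    READY_TO_USE = "Ready to use"
    REVIEW_REQUIRED = "Review required"
    OUTDATED = "Outdated"


def dataset_status(*, schema_changed: bool, files_added: bool,
                   values_verified: bool) -> DatasetStatus:
    """Derive a status from three hypothetical flags.

    Assumption: an outdated dataset is reported as Outdated even if
    its existing values were verified earlier.
    """
    if schema_changed or files_added:
        return DatasetStatus.OUTDATED
    if not values_verified:
        return DatasetStatus.REVIEW_REQUIRED
    return DatasetStatus.READY_TO_USE
```

Under these assumptions, a dataset is Ready to use only when nothing has changed since its values were verified.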
Creating ground truth datasets
Create a ground truth dataset to establish a new set of files to use for accuracy testing.
You can create ground truth datasets using project files, a previous app run, or files you upload. As a best practice, use project files for initial testing and iteration, then test with a separate dataset of new documents. This approach first verifies your production app’s consistency with your Build project, then assesses how effectively the app processes new data.
-
From the Hub, open the app you want to test.
-
In the app sidebar, select Accuracy tests.
-
Click Manage datasets, then click Create dataset.
-
Specify details about your ground truth dataset, then click Next.
-
Name — If you’re uploading files, specify a unique name for the ground truth dataset. If you’re using project files, the dataset name is fixed as Project files. You can have only one project files dataset per app.
-
Workspace — Select the workspace where users can run and review accuracy tests for this dataset.
-
File source — Select whether to use project files from Build, files uploaded from a connected or local drive, or files from a recently completed app run.
-
If you’re uploading files, select files.
If you’re using project files or files from an app run, all files are automatically included in the dataset; you can’t modify the file list in this step.
-
Click Run app.
The app processes the selected files and generates results.
What's next
Click Close to return to the Ground truth datasets page. When the app run completes, verify ground truth values in human review. Verified results become the ground truth values for the dataset.
Updating ground truth datasets
When you edit a project schema or add new project files, update associated ground truth datasets to ensure ground truth values are aligned with the new app version.
When you update an outdated dataset, you must verify results for newly added files, classes, and fields. Existing ground truth values are preserved.
-
From the Hub, open the app that you modified.
-
In the app sidebar, select Accuracy tests.
-
Click Manage datasets.
-
From the Ground truth datasets page, hover over a dataset with the Outdated status and click the update dataset icon.
-
Click Update dataset to run the app against the dataset files and generate new results.
What's next
When the app run completes, verify ground truth values in human review. Verified results become the ground truth values for the dataset.
Verifying ground truth values
After running files in a dataset through your app, verify results in human review. Verified results become the ground truth values for the dataset. Ground truth values reflect your desired end state results, including formatting.
-
From the Ground truth datasets page, hover over a dataset with the Review required status and click the edit icon.
All files in the dataset open in human review so you can verify results.
-
For each file you’re reviewing, verify and correct data if needed, then mark the file as reviewed.
-
To correct mapping data where multipage files were incorrectly parsed into individual documents, select the documents grid. Select pages and use the button controls to move or delete pages or create additional documents.
-
To correct classification data, use one of these methods:
-
In the fields list, click the Edit classification icon near the assigned class. Select the correct class, then click Confirm.
-
In the documents grid, click the class name. Select the correct class, then click Confirm.
When you change document classification, you can specify how the schema for the new class is applied. By default, the app reprocesses the document to identify field results for the new class. Reprocessing incurs usage charges at the same rates as regular app runs. To apply the schema for the new class without reprocessing, deselect Extract fields for the updated class.
-
To correct text fields, in the fields list, select a field. Enter a new value or, in the document viewer, select the area of the document that contains the information for that field. You can click to select text, or use your mouse to draw a box around the information.
If validations apply to the selected document, all fields are automatically revalidated when you modify a value.
-
To correct tables, in the fields list, select a table to open it for editing. Select any table cell, then in the document viewer, select the area of the document that contains the information for that cell. You can click to select text, or use your mouse to draw a box around the information.
-
Review any additional files in the file list.
-
Click Finish review.
This action marks everything in the dataset as reviewed and advances you to the dataset overview where you can run an accuracy test.
Modifying ground truth values
You can view ground truth values by opening a dataset and selecting the Ground truth values tab.
To see the values full-screen, hover over any cell and click the maximize icon.
If you need to modify any values, click Edit values to open the dataset in human review. Follow the steps for verifying ground truth values to make any changes.
Configuring dataset parameters
Tune how ground truth values are compared to run results during accuracy testing by configuring parameters for each ground truth dataset. You can configure global rules that apply across all fields, as well as field rules, which override global rules for specific fields.
Dataset parameters are available on the Configuration tab of each ground truth dataset.
Global rules
Global rules apply across all fields in a dataset, unless overridden by field rules.
Document rules
- Exempt misclassified documents — Excludes incorrectly classified documents from extraction accuracy calculations. This option prevents skewed field-level accuracy calculations due to misclassified documents.
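To illustrate why this option matters, the sketch below computes extraction accuracy with and without the exemption. The document structure (`predicted_class`, `true_class`, and `fields` as pairs of run result and ground truth value) and the function names are hypothetical, chosen only for this example.

```python
def counted_fields(documents, *, exempt_misclassified):
    """Collect (run_result, ground_truth) pairs that count toward extraction accuracy."""
    pairs = []
    for doc in documents:
        misclassified = doc["predicted_class"] != doc["true_class"]
        if exempt_misclassified and misclassified:
            continue  # fields were extracted using the wrong class schema
        pairs.extend(doc["fields"])
    return pairs


def extraction_accuracy(documents, *, exempt_misclassified=True):
    """Fraction of counted fields whose run result equals the ground truth."""
    pairs = counted_fields(documents, exempt_misclassified=exempt_misclassified)
    if not pairs:
        return 0.0
    return sum(result == truth for result, truth in pairs) / len(pairs)


# Illustrative data: one correctly classified document, one misclassified.
documents = [
    {"predicted_class": "invoice", "true_class": "invoice",
     "fields": [("INV-1", "INV-1"), ("42.00", "42.00")]},
    {"predicted_class": "receipt", "true_class": "invoice",
     "fields": [("", "INV-2"), ("", "17.50")]},
]
```

With this sample data, `extraction_accuracy(documents)` is 1.0 with the exemption and 0.5 without it, because the misclassified document's empty fields would otherwise be counted as extraction errors.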
Text rules
-
Ignore whitespace — Disregards spaces, tabs, and line breaks when comparing values.
-
Ignore special characters — Excludes non-alphanumeric characters from comparisons.
-
Ignore casing — Treats uppercase and lowercase letters as equivalent.
-
Allow Levenshtein distance — Permits a maximum number of character differences between strings. For example, a Maximum difference of 1 treats cat and bat as a match because they differ by 1 character.
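The text rules above can be pictured as a normalization step followed by an edit-distance check. The following is a minimal sketch under stated assumptions, not the product's comparison logic: the function and parameter names are hypothetical, and it assumes that Ignore special characters leaves whitespace to the separate whitespace rule.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(value: str, *, ignore_whitespace: bool = False,
              ignore_special: bool = False, ignore_casing: bool = False) -> str:
    """Apply the hypothetical text-rule flags before comparison."""
    if ignore_whitespace:
        value = re.sub(r"\s+", "", value)  # spaces, tabs, line breaks
    if ignore_special:
        # Assumption: keep letters, digits, and whitespace; drop the rest.
        value = "".join(ch for ch in value if ch.isalnum() or ch.isspace())
    if ignore_casing:
        value = value.casefold()
    return value


def text_matches(result: str, truth: str, *, max_distance: int = 0, **rules) -> bool:
    """Compare normalized strings, allowing up to max_distance edits."""
    distance = levenshtein(normalize(result, **rules), normalize(truth, **rules))
    return distance <= max_distance
```

For example, `text_matches("cat", "bat", max_distance=1)` is true, mirroring the cat and bat example, while the same call with the default `max_distance=0` is false.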
Integer rules
-
Use European number format — Interprets thousand separators using European conventions, where periods separate thousands. For example, “1.234” is interpreted as one thousand two hundred thirty-four, not as a decimal value.
-
Allow integer error tolerance — Accepts integer values that fall within a specified range of the expected value. For example, a maximum difference of 2 would treat 1234 and 1236 as a match.
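As a rough illustration of the integer rules, the sketch below parses values under either separator convention and applies a tolerance. The function names are hypothetical, and it assumes that commas separate thousands when European format is off.

```python
def parse_integer(text: str, *, european: bool = False) -> int:
    """Parse an integer string after removing the thousands separator.

    Assumption: periods separate thousands in European format,
    commas otherwise.
    """
    separator = "." if european else ","
    return int(text.strip().replace(separator, ""))


def integers_match(result: str, truth: str, *, european: bool = False,
                   max_difference: int = 0) -> bool:
    """Treat values as matching when they differ by at most max_difference."""
    difference = abs(parse_integer(result, european=european)
                     - parse_integer(truth, european=european))
    return difference <= max_difference
```

Here `parse_integer("1.234", european=True)` yields 1234, and `integers_match("1234", "1236", max_difference=2)` is true, matching the examples in the rule descriptions.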
Decimal rules
-
Use European number format — Interprets decimal separators using European conventions, where commas are used as decimal points. For example, “12,34” is interpreted as twelve and thirty-four hundredths.
-
Allow decimal error tolerance — Accepts numerical values that fall within a specified range of the expected value. For example, a tolerance of 0.01 would treat 12.34 and 12.35 as a match.
-
Round — Rounds decimal values up, down, or to the nearest whole number before comparison.
Field rules
Use field rules to override global rules for specific fields.
For each field, you can specify the expected result type, which determines how extracted values are interpreted and validated. Result types roughly correlate to field types in Build, with additional options for more precise handling.
Select the result type that matches how you want to compare extracted data.
-
Use text comparison for exact string matching, including formatting.
-
Use integer or decimal for numeric results that need mathematical comparison.
-
Use structured types, like list or table, to compare data content while ignoring formatting differences. Use the text type if formatting must match exactly. For structured result types, the content within lines or cells is compared using global text rules, if configured.
Based on the selected result type, you can specify applicable parameters as described for global rules. For example, if you select Text as the result type, you can specify text rules.
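One way to picture how field rules override global rules is as a dictionary merge keyed by result type. The rule names, field names, and the `effective_rules` function below are hypothetical; the sketch only shows the override order described in this section.

```python
# Hypothetical global rules, grouped by result type.
GLOBAL_RULES = {
    "text": {"ignore_casing": True, "ignore_whitespace": True, "max_distance": 0},
    "decimal": {"european": False, "tolerance": "0.01"},
}

# Hypothetical field rules: a result type plus any overriding parameters.
FIELD_RULES = {
    "invoice_id": {"result_type": "text", "rules": {"ignore_casing": False}},
    "total": {"result_type": "decimal", "rules": {}},
}


def effective_rules(field_name, global_rules, field_rules, default_type="text"):
    """Return (result_type, rules), with field rules taking precedence."""
    field = field_rules.get(field_name, {})
    result_type = field.get("result_type", default_type)
    merged = dict(global_rules.get(result_type, {}))  # start from global rules
    merged.update(field.get("rules", {}))             # field rules win
    return result_type, merged
```

In this example, `invoice_id` keeps the global whitespace and distance settings but compares case-sensitively, while a field with no field rules, such as a hypothetical `vendor`, falls back entirely to the global text rules.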
Testing accuracy
Before you begin
You must have one or more ground truth datasets for the app you want to test.
-
From the Hub, open the app you want to test.
-
In the app sidebar, select Accuracy tests.
-
Click Test accuracy.
-
Select the app version and ground truth dataset you want to use for testing, then click Next.
-
In the Test accuracy window, click Run test.
The files in the selected ground truth dataset are run through the app version you selected and compared to the dataset ground truth values to generate accuracy metrics.
Viewing accuracy test results
The Accuracy test page for each app, accessible from the app sidebar, displays a list of accuracy tests run against all app versions. The page summarizes key accuracy metrics for each test.
To see a full accuracy report, click any completed accuracy test.
Accuracy reports include the testing metadata and key metrics shown on the accuracy test page. The report also includes a snapshot of the associated ground truth dataset at the time the accuracy test was run, providing visibility into the exact data used for the test.
Project overview metrics provide a high-level summary of classification and extraction validity and accuracy. Similarly, the Summary by class section reports classification and extraction validity and accuracy by class.
Extraction details breaks down accuracy metrics for individual fields within each class. Use this section to gain a better understanding of project and class summary metrics. You can expand the Ground truth table to drill into results for each field. The ground truth table reports classification and extraction results in each table cell. Validation results are indicated with icons and ground truth values are displayed when they differ from the run results.
-
Result failed validation, indicated by a red error icon.
-
Result passed validation, indicated by a green checkmark.
-
Result didn’t match ground truth value, indicated in red strikethrough text with the ground truth value reported below.
-
Result matched ground truth value, indicated in standard text.
To see a full-screen comparison of the ground truth value and the run result, hover over the result in any cell and click the view icon.
Accuracy metrics
Accuracy metrics measure how valid and how accurate app results are.
Automation rate indicates the percent of classes or fields that are processed without human intervention, either because they pass validation or have no validation rules. While higher automation rates generally indicate efficiency, they can be misleading in apps with few or no validations, because classes or fields without validation rules count as automated.
Validated accuracy indicates the percent of automated classes or fields that match the ground truth dataset. A higher validated accuracy means that results are both valid, as measured by validation rules, and accurate, as measured by ground truth values.
Raw accuracy indicates the percent of classes or fields that match corresponding ground truth values. This metric measures classification or extraction accuracy only, without factoring in validations. Raw accuracy is particularly helpful when it differs significantly in either direction from automation rate. For example, if automation rate is low but raw accuracy is high, this suggests accurate run results but overly strict validation rules.
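The three metrics can be worked through on a small set of field results. This is a sketch based on the definitions above, not the product's calculation; the `FieldResult` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldResult:
    matches_truth: bool                # run result equals the ground truth value
    passed_validation: Optional[bool]  # None means no validation rule applies


def is_automated(result: FieldResult) -> bool:
    """A field is automated if it passes validation or has no validation rules."""
    return result.passed_validation is None or result.passed_validation


def percent(numerator: int, denominator: int) -> float:
    return 100.0 * numerator / denominator if denominator else 0.0


def accuracy_metrics(results: list[FieldResult]) -> dict[str, float]:
    automated = [r for r in results if is_automated(r)]
    return {
        "automation_rate": percent(len(automated), len(results)),
        "validated_accuracy": percent(sum(r.matches_truth for r in automated),
                                      len(automated)),
        "raw_accuracy": percent(sum(r.matches_truth for r in results),
                                len(results)),
    }


# Illustrative data: 10 fields, 9 match ground truth, only 4 pass validation.
results = (
    [FieldResult(matches_truth=True, passed_validation=True)] * 4
    + [FieldResult(matches_truth=True, passed_validation=False)] * 5
    + [FieldResult(matches_truth=False, passed_validation=False)]
)
metrics = accuracy_metrics(results)
```

For this sample, the automation rate is 40%, validated accuracy is 100%, and raw accuracy is 90%. That is the pattern described above: accurate run results held back by validation rules that may be too strict.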