Testing apps

Commercial & Enterprise

Accuracy testing compares the results of an app run to verified values for a set of documents. By comparing actual results to verified values, you can see how accurate your app is and where you might want to make improvements.

Accuracy testing overview

Accuracy testing can help you refine apps to ensure they meet satisfactory thresholds for validity and accuracy.

Follow these high-level steps to implement accuracy testing.

  1. Develop or iterate on an app in Build and create a new app version with the Production release state.

    Your new app version is stored in the Hub.

    Share the app to enable other organization members to access app versions with the Production release state.

  2. Create ground truth datasets or update existing datasets associated with your app.

    Verify ground truth values for any new or updated datasets.

  3. Conduct accuracy testing on the new app version.

    Review accuracy metrics and examine error patterns to identify areas for improvement.

  4. Repeat the previous steps as needed, using accuracy test results to guide incremental improvements to your app.

    When you’re satisfied with accuracy test results, your app is ready for production use.

Return to this process whenever your document processing needs change. Regularly tracking accuracy metrics over time helps ensure your app continually meets accuracy thresholds.

Managing ground truth datasets

Ground truth datasets are sets of files and associated ground truth values that you use to test app accuracy.

Ground truth datasets are associated with a specific app. For a 1:1 comparison, you use the same set of documents to create the ground truth dataset and to test the app. You can create multiple ground truth datasets for an app to test different batches of input documents.

There are several tasks associated with managing ground truth datasets:

Creating ground truth datasets

Create a ground truth dataset to establish a new set of files to use for accuracy testing.

There are two methods for creating ground truth datasets: using project files or uploading new documents. As a best practice, use project files for initial testing and iteration, then test with a separate dataset of new documents. This approach first verifies your production app’s consistency with your Build project, then assesses your app’s generalizability to unseen data.

  1. From the Hub, open the app you want to test.

  2. In the app sidebar, select Accuracy tests.

  3. Click Manage datasets, then click Create dataset.

  4. Specify details about your ground truth dataset, then click Next.

    • Name – If you’re uploading files, specify a unique name for the ground truth dataset. If you’re using project files, the dataset name is fixed as Project files. You can have only one project files dataset per app.

    • Workspace – Select the workspace where users can run and review accuracy tests for this dataset.

    • File source – Select whether to use project files from Build, or new files from a connected or local drive.

  5. If you’re uploading files, select files to test with. If you’re using project files, all files are automatically included in the dataset; you can’t modify files in this step. Click Next.

    The files you selected are run through your app and results are generated.

What's next
Click Close to return to the Ground truth datasets page. When the app run completes, verify ground truth values in human review. Verified results become the ground truth values for the dataset.

Updating ground truth datasets

Any time you edit an app, you must update the ground truth datasets associated with the app. For example, if you add fields, modify prompts, or tune validation rules, the ground truth values in your datasets must be reverified.

When you update an outdated dataset, you must verify results for classes, as well as for any new files and fields. Existing ground truth values for fields are preserved.

  1. From the Hub, open the app that you modified.

  2. In the app sidebar, select Accuracy tests.

  3. Click Manage datasets.

  4. From the Ground truth datasets page, hover over a dataset with the Outdated status and click the update dataset icon (a circle with an exclamation point centered within it).

  5. Click Update dataset to run the app against the dataset files and generate new results.

What's next
When the app run completes, verify ground truth values in human review. Verified results become the ground truth values for the dataset.

Verifying ground truth values

After running files in a dataset through your app, verify results in human review. Verified results become the ground truth values for the dataset.

Ground truth datasets have statuses that indicate readiness for accuracy testing. The Review required status indicates that ground truth values haven’t yet been established for the dataset.

  1. From the Ground truth datasets page, hover over a dataset with the Review required status and click the edit icon (a pencil).

    All files in the dataset open in human review so you can verify results.

  2. For each file you’re reviewing, verify and correct data if needed, then mark the file as reviewed.

    • To correct mapping data where multipage files were incorrectly parsed into individual documents, select the documents grid. Select pages and use the button controls to move or delete pages or create additional documents.

    • To correct classification data, use one of these methods:

      • In the fields list, click the Edit classification icon near the assigned class. Select the correct class, then click Confirm.

      • In the documents grid, click the class name. Select the correct class, then click Confirm.

        When you change document classification, you can specify how the schema for the new class is applied. By default, the app reprocesses the document to identify field results for the new class. Reprocessing incurs usage charges at the same rates as regular app runs. To apply the schema for the new class without reprocessing, deselect Extract fields for the updated class.

    • To correct text fields, in the fields list, select a field. Enter a new value or, in the document viewer, select the area of the document that contains the information for that field. You can click to select text, or use your mouse to draw a box around the information.

      If validations apply to the selected document, all fields are automatically revalidated when you modify a value.

    • To correct tables, in the fields list, select a table to open it for editing. Select any table cell, then in the document viewer, select the area of the document that contains the information for that cell. You can click to select text, or use your mouse to draw a box around the information.

  3. Review any additional files in the file list.

  4. Click Finish review.

    This action marks everything in the dataset as reviewed and advances you to the dataset overview where you can run an accuracy test.

Modifying ground truth values

You can view ground truth values by opening a dataset and selecting the Ground truth values tab.

To see the values full-screen, hover over any cell and click the maximize icon (four arrows pointing toward the corners of a box). The full-screen comparison is helpful to review tables and longer-form results.

If you need to modify any values, you can open the dataset in human review by selecting More actions > Modify ground truth values. Follow the steps for verifying ground truth values to make any necessary changes.

Configuring dataset parameters

Tune how ground truth values are compared to run results during accuracy testing by configuring parameters for each ground truth dataset.

  1. From the Ground truth datasets page, select a dataset to open its overview page.

  2. Select the Configuration tab, modify parameters as needed, then click Save.

    • Exempt misclassified documents – Excludes incorrectly classified documents from extraction accuracy calculations. If enabled, this option prevents misclassified documents from skewing field-level accuracy calculations.

    • Ignore whitespace – Disregards spaces, tabs, and line breaks when comparing values.

    • Ignore special characters – Excludes non-alphanumeric characters from comparisons.

    • Ignore casing – Treats uppercase and lowercase letters as equivalent.

    • Allow Levenshtein distance – Permits a maximum number of character differences between strings. For example, a Maximum difference of 1 treats cat and bat as a match because they differ by 1 character.
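
To make these options concrete, here's a rough sketch in Python of how a comparison along these lines could work. It illustrates the general idea only, not the app's actual comparison logic, and the function and parameter names are hypothetical.

```python
# Illustrative sketch only: shows how normalization options and a Levenshtein
# threshold could combine when comparing a run result to a ground truth value.
# Function and parameter names are hypothetical, not the app's API.
import re

def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if equal)
            ))
        prev = curr
    return prev[-1]

def values_match(result: str, truth: str, *, ignore_whitespace=False,
                 ignore_special=False, ignore_casing=False,
                 max_levenshtein=0) -> bool:
    def normalize(s: str) -> str:
        if ignore_whitespace:
            s = re.sub(r"\s+", "", s)             # drop spaces, tabs, line breaks
        if ignore_special:
            s = re.sub(r"[^0-9A-Za-z\s]", "", s)  # drop non-alphanumeric characters
        if ignore_casing:
            s = s.lower()
        return s
    return levenshtein(normalize(result), normalize(truth)) <= max_levenshtein

# With a maximum difference of 1, "cat" and "bat" are treated as a match.
print(values_match("cat", "bat", max_levenshtein=1))  # True
```

In practice you set these options on the dataset's Configuration tab; the sketch only shows how they interact conceptually.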

Testing accuracy

Before you begin
You must have one or more ground truth datasets for the app you want to test.
  1. From the Hub, open the app you want to test.

  2. In the app sidebar, select Accuracy tests.

  3. Click Test accuracy.

  4. Select the app version and ground truth dataset you want to use for testing, then click Next.

  5. In the Test accuracy window, click Run test.

    The files in the selected ground truth dataset are run through the app version you selected and compared to the dataset’s ground truth values to generate accuracy metrics.

Viewing accuracy test results

The Accuracy tests page for each app, accessible from the app sidebar, displays a list of accuracy tests run against all app versions. The page summarizes key accuracy metrics for each test.

To see a full accuracy report, hover over a completed accuracy test and click the view accuracy report icon (a box with an arrow pointing out of the upper-right corner).

Accuracy reports reiterate the key metrics shown on the Accuracy tests page. These Project overview metrics provide a high-level summary of classification and extraction validity and accuracy. Similarly, the Summary by class section reports classification and extraction validity and accuracy by class.

The Extraction details section breaks down accuracy metrics for individual fields within each class. Use this section to gain a better understanding of project and class summary metrics. You can expand the Ground truth table to drill into results for each field. The ground truth table reports classification and extraction results in each table cell. Validation results are indicated with icons and ground truth values are displayed when they differ from the run results.

One row of the ground truth table, showing a variety of results.

  1. Result failed validation, indicated by a red error icon.

  2. Result passed validation, indicated by a green checkmark.

  3. Result didn’t match ground truth value, indicated in red strikethrough text with the ground truth value reported below.

  4. Result matched ground truth value, indicated in normal text.

To see a full-screen comparison of the ground truth value and the run result, hover over the result in any cell and click the view icon (four arrows pointing toward the corners of a box). This full-screen comparison is helpful to review tables and longer-form results. From this view, you can also set the run result as the new ground truth value if needed.

Accuracy metrics

Accuracy metrics measure how valid and how accurate app results are.

In projects without classification, classification metrics are reported at 100% because all documents are effectively assigned to the default class.

Automation rate indicates the percent of classes or fields that are processed without human intervention, either because they pass validation or because they have no validation rules. While higher automation rates generally indicate efficiency, they can be misleading in apps with few or no validations.

Validated accuracy indicates the percent of automated classes or fields that match the ground truth dataset. A higher validated accuracy means that results are both valid, as measured by validation rules, and accurate, as measured by ground truth values.

Raw accuracy indicates the percent of fields that match corresponding ground truth values. This metric measures extraction accuracy only, without factoring in validations. Raw accuracy is particularly helpful when it differs significantly in either direction from automation rate. For example, if automation rate is low but raw accuracy is high, this suggests accurate run results but overly strict validation rules.
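
To see how these metrics relate, here is a small worked example with hypothetical counts. The formulas follow from the definitions above (automation rate and raw accuracy are computed over all fields, validated accuracy over automated fields only); the numbers are illustrative, not from a real test.

```python
# Hypothetical worked example of the three metrics, using the definitions above.
# Counts are field-level and purely illustrative.
total_fields = 100           # all fields extracted in the test run
automated = 80               # passed validation or had no validation rules
automated_and_correct = 72   # automated fields that also match ground truth
correct = 90                 # all fields that match ground truth

automation_rate = automated / total_fields               # 80%
validated_accuracy = automated_and_correct / automated   # 90%
raw_accuracy = correct / total_fields                    # 90%

print(f"Automation rate:    {automation_rate:.0%}")      # 80%
print(f"Validated accuracy: {validated_accuracy:.0%}")   # 90%
print(f"Raw accuracy:       {raw_accuracy:.0%}")         # 90%
```

In this example, raw accuracy is higher than the automation rate, which is the pattern described above that can indicate accurate extractions held back by strict validation rules.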
