Testing apps

Accuracy testing helps you confirm that automation apps meet your accuracy requirements before deployment. By systematically comparing app results against verified ground truth values, you can measure performance, identify areas for improvement, and deploy apps with confidence that they deliver reliable results.

Users with tester permissions or higher can run accuracy tests and access related features. Any user can verify ground truth values in workspaces they have access to.

Accuracy testing isn’t supported for packet-processing apps with cross-class fields.

How accuracy testing works

Accuracy testing compares app run results to verified values for a set of documents, showing you exactly how accurate your app is and where improvements are needed.

Ground truth datasets are collections of documents with verified, correct values that serve as the benchmark for measuring app performance. These datasets allow you to test consistently as you iterate on your app.
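The comparison described above can be illustrated with a minimal sketch. This is not AI Hub's implementation or API: the `field_accuracy` function, the exact-match rule, and the invoice field names are all hypothetical, chosen only to show how per-field accuracy can be derived from app results and verified values.

```python
from typing import Dict, List

# One record per document: field name -> extracted (or verified) value.
Record = Dict[str, str]


def field_accuracy(predicted: List[Record], ground_truth: List[Record]) -> Dict[str, float]:
    """Return, per field, the fraction of documents whose value matches ground truth.

    Assumption: a field counts as correct only on an exact string match.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must cover the same documents")

    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pred, truth in zip(predicted, ground_truth):
        for field, expected in truth.items():
            total[field] = total.get(field, 0) + 1
            if pred.get(field) == expected:
                correct[field] = correct.get(field, 0) + 1

    return {field: correct.get(field, 0) / total[field] for field in total}


# Hypothetical app results and verified values for two documents.
app_results = [
    {"invoice_number": "INV-001", "total": "120.00"},
    {"invoice_number": "INV-002", "total": "89.50"},
]
verified_values = [
    {"invoice_number": "INV-001", "total": "120.00"},
    {"invoice_number": "INV-002", "total": "98.50"},
]

scores = field_accuracy(app_results, verified_values)
print(scores)
```

In this made-up example, `invoice_number` matches in both documents while `total` matches in only one, so the per-field scores point to `total` as the field that needs attention.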

The testing workflow

Follow these high-level steps to implement accuracy testing for automation apps.

  1. Create ground truth datasets or update existing datasets associated with your app.

    Verify ground truth values for any new or updated datasets.

    You can run accuracy tests against outdated datasets, but doing so typically lowers accuracy metrics, because the app's results no longer align with the dataset's existing ground truth values.
  2. Conduct accuracy testing on the new app version.

    View or compare accuracy tests and examine error patterns to identify areas for improvement.

  3. Review accuracy metrics and identify specific areas where your app needs improvement.

  4. Make changes to your app based on test results, then test again to measure improvements.

Repeat this cycle until your app meets your accuracy thresholds. At that point, the app is ready to deploy.
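As a rough illustration of the test-and-iterate cycle, the sketch below checks per-field accuracy against thresholds and measures the change between two app versions. The function names, the threshold values, and the accuracy figures are hypothetical and are not taken from AI Hub.

```python
from typing import Dict, List


def fields_below_threshold(
    accuracy: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = 0.95,
) -> List[str]:
    """List fields whose accuracy is under their threshold (or the default)."""
    return sorted(
        field for field, score in accuracy.items()
        if score < thresholds.get(field, default)
    )


def accuracy_delta(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Per-field change in accuracy between two test runs (positive = improved)."""
    return {
        field: round(current[field] - previous[field], 4)
        for field in current
        if field in previous
    }


# Hypothetical per-field accuracy from two test runs on the same dataset.
version_1 = {"invoice_number": 0.98, "total": 0.80}
version_2 = {"invoice_number": 0.97, "total": 0.93}
thresholds = {"total": 0.90}  # assumed target; other fields use the default

needs_work_v1 = fields_below_threshold(version_1, thresholds)
needs_work_v2 = fields_below_threshold(version_2, thresholds)
delta = accuracy_delta(version_1, version_2)

print(needs_work_v1, needs_work_v2, delta)
```

Tracking the per-field delta alongside the threshold check makes small regressions visible (here, `invoice_number` dips slightly) even when every field still meets its target.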

Best practices

Test early and often — Create ground truth datasets as soon as you have a working app, then test with each iteration to track improvements.

Use representative data — Ensure your ground truth datasets include documents similar to your production use case.

Separate development and validation data — Use project files for initial testing, then validate with completely new documents to assess real-world performance.
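One way to check that validation documents are genuinely new is to compare content hashes against the files used during development. The sketch below assumes documents are available as raw bytes keyed by file name. The helper names and sample data are hypothetical, and the check catches only byte-identical copies.

```python
import hashlib
from typing import Dict, Set


def content_hashes(documents: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file name to the SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in documents.items()}


def overlapping_documents(
    development: Dict[str, bytes],
    validation: Dict[str, bytes],
) -> Set[str]:
    """Return validation file names whose contents also appear in the development set."""
    dev_digests = set(content_hashes(development).values())
    return {
        name
        for name, digest in content_hashes(validation).items()
        if digest in dev_digests
    }


# Hypothetical in-memory documents; real files would be read in binary mode.
development_files = {"a.pdf": b"invoice A", "b.pdf": b"invoice B"}
validation_files = {"c.pdf": b"invoice C", "copy_of_a.pdf": b"invoice A"}

duplicates = overlapping_documents(development_files, validation_files)
print(duplicates)
```

Any file reported here was already seen during development, so leaving it in the validation set would overstate real-world accuracy.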