Glossary

A reference of terms used in AI Hub.

Core functionality

Core functionality includes the architecture and tools shared across AI Hub.

  • digitization — The process of converting uploaded files into machine-readable text. Digitization settings in automation projects affect the quality and accuracy of document text.

  • document — A logical unit of content processed by AI Hub. In automation apps with file splitting enabled, one file can contain multiple documents. For example, a multipage PDF file might contain multiple bank statement documents.

  • file — A digital object uploaded to AI Hub for processing. A file can contain one or more documents.

  • Hub — The central interface where users can discover, access, and run automation apps. The Hub displays all apps available to you, including prebuilt offerings, apps you’ve created, and items shared within organizations.

  • Marketplace — A filter option in the Hub that displays prebuilt automation apps created by Instabase to address common document processing use cases.

  • repo — See workspace.

  • workspace — An access-gated environment where members can create projects, manage data, configure deployments, and conduct reviews. Workspaces provide isolation of assets, data, and ongoing work. In the flow editor, workspaces are also called repos.

    • development workspace — Any workspace that’s not designated as a production workspace. Development workspaces are used for building, testing, and iterating on apps and deployments before promoting changes to production environments.

    • personal workspace — A private workspace for creating personal projects.

    • production workspace — A shared workspace designated for operational data and workflows. Production workspaces display a visual indicator and, in single-tenant environments, receive prioritized resource allocation. When a production workspace is linked to a development workspace, you can promote deployment changes from development to production.

    • shared workspace — A collaborative workspace within an organization where members can work together on projects, manage shared data, configure deployments, and conduct document reviews. Shared workspaces are membership-gated and support role-based permissions.

  • token (API) — A unique security credential that authenticates API requests to Instabase services. Also called an OAuth token. Enterprise-tier organizations can configure the use of external OAuth providers for issuing and managing API tokens.

  • token (LLM) — The fundamental unit by which LLMs process text. Each token represents a piece of text, such as a whole word, part of a word, or a character. For English text, one token encodes approximately four characters.
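
The four-characters-per-token ratio is handy for rough estimates of how much text a document sends to an LLM. The following Python sketch applies that ratio. It's a heuristic only: exact counts depend on the tokenizer of the specific model, and the `estimate_tokens` name is illustrative, not part of AI Hub.

```python
import math

# Rule of thumb from the glossary: about four characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the LLM token count for English text."""
    if not text:
        return 0
    # Round up so leftover characters count as a partial token.
    return math.ceil(len(text) / CHARS_PER_TOKEN)


sample = "The quick brown fox jumps over the lazy dog."  # 44 characters
print(estimate_tokens(sample))  # 11
```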

Automate

Automate features enable creating, testing, and deploying document processing applications.

Projects and apps

  • advanced app — An app created from a flow. Advanced apps can be created by Instabase to solve specific processing needs, or can be developed using the flow editor in advanced view. Advanced apps can be tested, run, and deployed like other apps.

  • AI runtime — The version-controlled software layer that processes prompts and returns results. AI runtime includes the LLM, prompt templates, and processing pipelines.

  • app run — Executing an app from start to finish for the purpose of processing documents.

  • attribute — Data point associated with each item in a list extraction field. For example, if extracting a list of items from a receipt, price and SKU might be specified as attributes.

  • automation app — A document processing application created from an automation project and published to the Hub. Automation apps can be run on-demand or configured as a deployment to run at scale.

  • automation project — A collection of files and artifacts used to create an automation app. Projects correspond to a unique document understanding workflow, with document types (classes) and data points (fields) that address a specific use case.

  • class — Document type within an automation project or app. Each class can include a different set of fields to identify.

    • classification function — Classification method that uses a custom Python function to classify documents into specific classes based on deterministic rules rather than relying on LLM classification.

  • cleaning — Reformatting or standardizing field results to meet specific formatting requirements.

    • cleaning function — Cleaning method that uses a custom Python function to refine or standardize field results.

  • cross-class field — Field that consolidates data extracted from standard fields within a packet. Cross-class field types determine the method for selecting or generating results.

    • custom function — Cross-class field type that uses a custom Python function to compute values, typically based on input fields.

    • derived — Cross-class field type used to generate values based on input fields.

    • ranked — Cross-class field type that selects a value from specified input fields based on prioritization criteria. Ranking can be ordered (based on priority you set) or unordered (based on confidence scores).

  • field — A data point to be identified from documents in an automation project or app. The field type determines the method for generating or extracting results from document content.

    • custom function — Field type that uses a custom Python function to compute values or import third-party data.

    • derived — Field type used to generate values based on preceding fields.

    • document reasoning — Field type used to generate results that aren’t explicitly found in the document but can be deduced, summarized, or calculated.

    • list extraction — Field type used to extract multiple like items, such as deposits on a banking summary or items on a receipt.

    • table extraction — Field type used to extract tables from documents.

    • text extraction — Field type used to extract a string of text or numbers, such as address, account balance, or filing status.

    • visual reasoning — Field type used to analyze visual and stylistic elements, including elements that OCR doesn’t capture, such as images, watermarks, layout, colors, text styling, and handwritten markup.

  • hidden field — A field that serves as a reference or input for other fields but isn’t strictly required in results. Hidden fields are excluded from most displays by default.

  • job — See run.

  • packet — A set of related documents processed together as a unit, such as a loan application with supporting bank statements and tax documents. In apps with cross-class fields, one run equals one packet.

  • processing mode — The method used to transform files into actionable insights within an automation project. Processing modes determine which AI runtime powers document analysis and which configuration options and features are available.

    • agent mode — A processing mode that uses agentic architecture and next-generation models.

    • legacy mode — A processing mode that uses conventional document understanding methods.

  • project files — The set of files used in an automation project to develop an automation app. Project files can also be used to run accuracy tests against the app.

  • results — Output generated when an automation app processes a document. When creating apps, a result is the value extracted or derived for a specific field. In app runs, results comprise the complete set of data extracted for a given document. Results can be reviewed, downloaded in structured formats (CSV/Excel), or transmitted to downstream systems with integrations.

  • run — The general term for a single processing execution in AI Hub, including app runs and deployment runs. When referring to flow runs, a run is also called a job.

  • UDF — User-defined function. See custom function.
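
Ranked cross-class fields choose one value from several input fields, either in a priority order you set or by confidence score. The Python sketch below illustrates both modes under stated assumptions: the `Candidate` structure, the field names, and `rank_select` are hypothetical and don't reflect how AI Hub stores or configures ranking.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    field_name: str        # hypothetical input field name
    value: Optional[str]   # None when the field produced no result
    confidence: float      # field confidence, from 0.0 to 1.0


def rank_select(
    candidates: Sequence[Candidate],
    priority: Optional[Sequence[str]] = None,
) -> Optional[Candidate]:
    """Select one candidate for a ranked cross-class field.

    Ordered ranking (priority given): return the first field in priority
    order that has a value. Unordered ranking (no priority): return the
    candidate with the highest confidence score.
    """
    present = [c for c in candidates if c.value is not None]
    if not present:
        return None
    if priority is None:
        return max(present, key=lambda c: c.confidence)
    by_name = {c.field_name: c for c in present}
    for name in priority:
        if name in by_name:
            return by_name[name]
    return None


inputs = [
    Candidate("w2_employer", "Acme Corp", 0.82),
    Candidate("paystub_employer", "ACME Corporation", 0.95),
]
print(rank_select(inputs, priority=["w2_employer", "paystub_employer"]).value)  # Acme Corp
print(rank_select(inputs).value)  # ACME Corporation
```

Ordered ranking lets a trusted source win even when its confidence is lower; unordered ranking defers entirely to confidence scores.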

Confidence and validation

  • confidence score — Percentage value that indicates the level of certainty in results. Higher percentages suggest greater confidence.

    • classification confidence — Indicates the model’s certainty in predicting the class of a document.

    • field confidence — Indicates the model’s certainty in the predicted value for a given field.

    • OCR confidence — Indicates the OCR processor’s certainty in digitization accuracy.

  • validation — Process of checking results against validation rules. If a validation rule doesn’t pass, it’s a validation failure. If the validation can’t be executed for some reason, it’s a validation error.

  • validation function — Validation method that uses a custom Python function to validate field results.

  • validation prompt — Natural language prompt that describes how to validate a field. The prompt is used to generate a custom validation rule. Validation prompts can be permanently converted to validation functions.

  • validation rules — Rules that determine if extracted data meets specified criteria. Types include confidence rules (based on confidence scores) and custom rules (based on validation prompts or functions).
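
The difference between a validation failure and a validation error can be sketched in Python as follows. This is a minimal illustration, assuming results are dictionaries with `value` and `confidence` keys; `evaluate`, `confidence_rule`, and `non_negative` are hypothetical names, not the signature AI Hub expects for validation functions.

```python
from enum import Enum
from typing import Any, Callable, Dict

Result = Dict[str, Any]  # hypothetical shape: {"value": ..., "confidence": ...}
Rule = Callable[[Result], bool]


class Outcome(Enum):
    PASS = "pass"
    FAILURE = "failure"  # the rule ran, and the result didn't meet it
    ERROR = "error"      # the rule couldn't be executed


def confidence_rule(threshold: float) -> Rule:
    """Build a confidence rule that passes when confidence meets the threshold."""
    return lambda result: result["confidence"] >= threshold


def non_negative(result: Result) -> bool:
    """Custom rule: the extracted amount must not be negative."""
    return result["value"] >= 0


def evaluate(rule: Rule, result: Result) -> Outcome:
    """Run one validation rule and classify the outcome."""
    try:
        return Outcome.PASS if rule(result) else Outcome.FAILURE
    except Exception:
        # Any exception means the rule couldn't run, such as a missing value.
        return Outcome.ERROR


print(evaluate(confidence_rule(0.8), {"value": 120.5, "confidence": 0.91}).value)  # pass
print(evaluate(non_negative, {"value": -3, "confidence": 0.99}).value)  # failure
print(evaluate(non_negative, {"confidence": 0.99}).value)  # error
```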

Accuracy testing

  • accuracy metrics — Quantitative measures that indicate how valid and accurate app results are when compared to ground truth values.

  • accuracy testing — The process of comparing app run results against ground truth values for a set of documents to measure how accurate an app is.

  • automation rate — Percent of classes or fields that are processed without human intervention, either because they pass validation or have no validation rules.

  • dataset parameters — Configuration settings that determine how ground truth values are compared to run results during accuracy testing. Parameters include global rules that apply across all fields and field rules that can override global rules for specific fields.

  • ground truth dataset — A set of files and associated ground truth values used to test app accuracy. Ground truth datasets are associated with a specific app.

  • ground truth value — The correct result for a field or class, confirmed in human review. Ground truth values are required for any ground truth dataset.

  • raw accuracy — Percent of classes or fields that match corresponding ground truth values. This metric measures classification or extraction accuracy only, without factoring in validations.

  • validated accuracy — Percent of automated classes or fields (that is, the subset counted in the automation rate) that match ground truth values. A higher validated accuracy means that results are both valid and accurate.
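
The three accuracy-testing percentages relate to each other as shown in this Python sketch. It's one literal reading of the definitions above, assuming exact string matches against ground truth; in AI Hub, dataset parameters can change how values are compared. The `FieldResult` structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldResult:
    predicted: str
    ground_truth: str
    passed: Optional[bool]  # None when the field has no validation rules

    @property
    def automated(self) -> bool:
        # Automated: passes validation or has no validation rules.
        return self.passed is not False

    @property
    def correct(self) -> bool:
        # Exact match; dataset parameters may define looser comparisons.
        return self.predicted == self.ground_truth


def pct(numerator: int, denominator: int) -> float:
    return 100.0 * numerator / denominator if denominator else 0.0


def accuracy_metrics(results: List[FieldResult]) -> Dict[str, float]:
    automated = [r for r in results if r.automated]
    return {
        "raw_accuracy": pct(sum(r.correct for r in results), len(results)),
        "automation_rate": pct(len(automated), len(results)),
        "validated_accuracy": pct(sum(r.correct for r in automated), len(automated)),
    }


results = [
    FieldResult("1200.00", "1200.00", True),        # correct, passed validation
    FieldResult("Jane Doe", "Jane Doe", None),      # correct, no rules
    FieldResult("2023-01-05", "2023-05-01", True),  # wrong, but passed validation
    FieldResult("ACME", "Acme Corp", False),        # wrong, failed validation
]
print(accuracy_metrics(results))  # raw 50.0, automation 75.0, validated about 66.7
```

The third record shows why validated accuracy matters: a wrong value that passes validation is automated but not accurate, which lowers validated accuracy.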

Deployments

  • deployment — An implementation of an automation app configured to run at scale. Deployments typically include automation, integration, and human review.

  • deployment run — Executing a deployment from start to finish for the purpose of processing documents.

  • integration — A deployment configuration that pulls files from upstream systems for processing or sends results to downstream systems.

    • downstream integration — Sends processing results to external systems. Options include email, connected drives, and custom functions.

    • integration function — Downstream integration method that uses a custom Python function to send results to external systems or services, such as webhooks or APIs.

    • upstream integration — Pulls files or folder contents from connected sources for processing. Options include connected drives and connected mailboxes.

  • linked deployment — An established promotion pathway between designated deployments in a development workspace and a production workspace.
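
An integration function that sends results to a webhook might package them as a JSON POST request, as in this Python sketch using only the standard library. The entry-point signature AI Hub expects for integration functions isn't described in this glossary, so `build_webhook_request` and `send_results` are hypothetical, and the URL and token are placeholders. In practice, the token would more likely come from a secret than be hard-coded.

```python
import json
import urllib.request
from typing import Any, Dict


def build_webhook_request(results: Dict[str, Any], url: str, token: str) -> urllib.request.Request:
    """Package run results as a JSON POST request for a downstream webhook."""
    body = json.dumps(results).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send_results(results: Dict[str, Any], url: str, token: str, timeout: float = 10.0) -> int:
    """Send the results and return the HTTP status code (needs network access)."""
    request = build_webhook_request(results, url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Build, but don't send, a request to a placeholder endpoint.
req = build_webhook_request({"invoice_total": "1200.00"}, "https://example.com/hooks/results", "test-token")
print(req.get_method(), req.full_url)  # POST https://example.com/hooks/results
```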

Custom functions

  • custom function — User-defined Python function used in automation projects to extract (custom function field), clean (cleaning function), or validate (validation function) results; or in deployments to integrate with downstream systems (integration function).

  • custom keys — Runtime variables defined for apps and deployments, accessed through the keys parameter using keys['custom']['<key-name>'] in custom functions.

  • runtime configuration — General term for custom or secret keys that are defined for a specific deployment.

  • secret keys — Organization-level encrypted values like API keys or credentials, accessed through the keys parameter using keys['secret']['<key-name>'] in custom functions.

  • system keys — Built-in values about files, documents, users, and execution environments available through the context parameter in custom functions.

  • test values — Sample data used for custom and secret keys while writing custom functions in automation projects. Test values are replaced with runtime values when you run the app or create a deployment.
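
The `keys['custom']['<key-name>']` and `keys['secret']['<key-name>']` access patterns can be exercised locally with test values, as in this Python sketch. The `clean_amount` function, its `(value, keys)` parameters, and the key names `currency-code` and `fx-api-key` are hypothetical; only the nested `keys` lookup pattern comes from the definitions above.

```python
from typing import Any, Dict

# Test values used while writing the function. Per the glossary, runtime values
# replace test values when you run the app or create a deployment.
TEST_KEYS: Dict[str, Dict[str, Any]] = {
    "custom": {"currency-code": "USD"},
    "secret": {"fx-api-key": "test-value-not-a-real-key"},
}


def clean_amount(value: str, keys: Dict[str, Dict[str, Any]]) -> str:
    """Hypothetical cleaning function: normalize an amount and tag its currency."""
    # Custom key: a runtime variable defined for the app or deployment.
    currency = keys["custom"]["currency-code"]
    # A secret key is read the same way, for example keys["secret"]["fx-api-key"],
    # if the function needs to call an external exchange-rate service.
    amount = float(value.replace("$", "").replace(",", "").strip())
    return f"{amount:.2f} {currency}"


print(clean_amount("$1,234.5", TEST_KEYS))  # 1234.50 USD
```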

Human review

  • escalation queue — A collection of items flagged for further review by an assigned group.

  • human review — The process of manually verifying and correcting results from deployment runs. Human review helps ensure data accuracy and allows for correction of any errors in automated extraction.

  • review managers — Users who oversee human review tasks, such as assigning reviews. By default, workspace managers function as review managers in workspaces where a deployment is run. In enterprise organizations, review managers can be assigned to review queues via the group manager role.

  • review queue — A collection of items awaiting initial review by an assigned group.

  • reviewers — Users tasked with reviewing results from deployment runs. By default, any member of the workspace where a deployment is run can function as a reviewer. In enterprise organizations, reviewers can be assigned to review queues via group membership.

  • review strategy — Determines how files are sent for human review. Options include review by file (sends only files that fail validation) and review by run (sends entire runs if any file fails validation).

  • service-level agreement (SLA) — Efficiency targets for human review, specified in minutes, hours, or days. Timing begins when a deployment run begins, and the SLA is satisfied when a file is marked as reviewed.
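
The two review strategies and the SLA timing rule can be sketched in Python as follows. It assumes a run is represented as a mapping from file name to whether the file passed validation; `files_for_review`, `sla_met`, and the strategy strings `"file"` and `"run"` are hypothetical names for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def files_for_review(run: Dict[str, bool], strategy: str) -> List[str]:
    """Return the files a run sends to human review.

    `run` maps each file name to True if the file passed validation.
    """
    failed = [name for name, passed in run.items() if not passed]
    if strategy == "file":
        # Review by file: send only the files that failed validation.
        return failed
    if strategy == "run":
        # Review by run: send every file in the run if any file failed.
        return list(run) if failed else []
    raise ValueError(f"unknown review strategy: {strategy!r}")


def sla_met(run_started: datetime, reviewed_at: datetime, sla: timedelta) -> bool:
    """The SLA clock starts with the deployment run and stops when the file is marked reviewed."""
    return reviewed_at - run_started <= sla


run = {"statement_1.pdf": True, "statement_2.pdf": False, "w2.pdf": True}
print(files_for_review(run, "file"))  # ['statement_2.pdf']
print(files_for_review(run, "run"))   # ['statement_1.pdf', 'statement_2.pdf', 'w2.pdf']
print(sla_met(datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 13, 30), timedelta(hours=4)))  # False
```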

Automation metrics

  • automation metrics — Quantitative measures that indicate performance and usage for deployments.

  • automation rate — Percent of all fields that are extracted correctly as measured by unmodified human review results. This metric includes fields with and without validation rules. High automation rates indicate fields that are extracted accurately without needing human intervention.

  • automation state — Evaluates the effectiveness of automation through validation rules and human review, combining validation outcome (valid/invalid) and human review outcome (modified/unmodified). There are four possible states: valid and unmodified, invalid and unmodified, invalid and modified, and valid and modified.

    • invalid — A class or field result that fails one or more validation rules.

    • modified — A field value that’s changed by a reviewer during human review.

    • unmodified — A field value that remains unchanged during human review.

    • valid — A class or field result that passes all applicable validation rules, or that has no validation rules applied.

  • consumption — Total number of documents, pages, or runs successfully processed, from submission to completion of any reviews, over a specified period.

  • handling time — Time to process a document from submission to when the run is complete or, if human review is required, when the document is marked reviewed.

  • runtime accuracy — Percent of fields with validation rules that were extracted correctly as measured by unmodified human review results. This metric excludes fields without validation rules. High runtime accuracy indicates that validation rules are correctly passing accurate results.
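
The automation states and the two deployment percentages can be computed from per-field outcomes, as in this Python sketch. It's one literal reading of the definitions above, treating an unmodified value as correctly extracted; the `ReviewedField` structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReviewedField:
    passed: Optional[bool]  # None when the field has no validation rules
    modified: bool          # True if a reviewer changed the value

    @property
    def valid(self) -> bool:
        # Valid: passes all rules, or has no rules applied.
        return self.passed is not False

    @property
    def state(self) -> str:
        validity = "valid" if self.valid else "invalid"
        review = "modified" if self.modified else "unmodified"
        return f"{validity} and {review}"


def pct(numerator: int, denominator: int) -> float:
    return 100.0 * numerator / denominator if denominator else 0.0


def automation_rate(fields: List[ReviewedField]) -> float:
    """Percent of all fields left unmodified by human review."""
    return pct(sum(not f.modified for f in fields), len(fields))


def runtime_accuracy(fields: List[ReviewedField]) -> float:
    """Percent of fields with validation rules left unmodified by human review."""
    ruled = [f for f in fields if f.passed is not None]
    return pct(sum(not f.modified for f in ruled), len(ruled))


fields = [
    ReviewedField(passed=True, modified=False),
    ReviewedField(passed=None, modified=False),
    ReviewedField(passed=False, modified=True),
    ReviewedField(passed=True, modified=True),
]
print([f.state for f in fields])
print(automation_rate(fields))   # 50.0
print(runtime_accuracy(fields))  # about 33.3
```

The field without rules counts toward automation rate but is excluded from runtime accuracy, which is why the two percentages differ here.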

Administration

Administration capabilities enable managing AI Hub access, security, and resources, providing the foundation for effective governance, collaboration, and resource optimization across your AI Hub environment.

User and access management

  • group — A collection of organization members. Groups can be added to workspaces and assigned workspace roles collectively, helping simplify workspace access and permissions management.

  • group role — A role assigned to a group member. Group roles offer no special permissions outside of the group context.

    • group manager — A group role that grants permissions to manage the group’s members.

  • organization — A collection of users, workspaces, resources, and configurations in AI Hub. Organizations exist at the Commercial and Enterprise tiers.

  • organization role — A role assigned to an organization member that grants organization-level permissions.

    • admin — An organization role that grants the highest level of administrative permissions and organization access. Admins can manage workspaces, groups, and roles across the organization and have access to all organization workspaces, including all members’ personal workspaces.

    • member — The default organization role for all organization members, which grants limited permissions and access. Members can access their personal workspace and any shared workspaces they’re added to.

  • service account — An AI Hub account not tied to a particular member, used only for interacting with the AI Hub API and SDK. Service accounts can be added to groups and workspaces and assigned roles like standard member accounts.

  • workspace role — A role assigned to a workspace member that grants workspace-level permissions. Each workspace role includes all permissions from lower-level roles, plus additional capabilities. Workspace roles reflect job functions.

    • tester — Testers perform accuracy, integration, or user acceptance testing on apps.

    • reviewer — Reviewers can review documents or runs that fail validation in assigned workspaces or review queues.

    • review manager — Review managers oversee reviewers in assigned workspaces or review queues.

    • developer — Developers create apps to turn unstructured data into insights. In Enterprise-tier, single-tenant organizations with advanced view enabled, developers can also use the flow editor to develop flows and publish them as advanced apps.

    • workspace manager — Workspace managers manage workspace membership and roles. They can also connect workspace-level data sources.
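
Because each workspace role includes the permissions of lower-level roles, a permission check reduces to comparing positions in an ordered list. This Python sketch assumes the roles above are listed from lowest to highest, which the glossary implies but doesn't state outright; `has_at_least` is a hypothetical helper, not an AI Hub API.

```python
# Assumption: roles are ordered from lowest to highest, as listed in the glossary.
WORKSPACE_ROLES = ["tester", "reviewer", "review manager", "developer", "workspace manager"]


def has_at_least(role: str, required: str) -> bool:
    """True if `role` includes the permissions of `required`.

    Higher roles inherit all permissions of lower roles. Raises ValueError
    for an unknown role name.
    """
    return WORKSPACE_ROLES.index(role) >= WORKSPACE_ROLES.index(required)


print(has_at_least("developer", "reviewer"))     # True
print(has_at_least("tester", "review manager"))  # False
```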

Authentication and security

  • audit log — A record of organization or member activities, used for tracking operations performed within an organization. Supported in Enterprise-tier organizations with single-tenant environments.

  • group mapping — A feature that maps groups created in an SSO identity provider to groups created in AI Hub, allowing group membership to be managed at the identity provider level.

  • OIDC SSO — OpenID Connect-based single sign-on. An authentication protocol that allows users to sign in using their organization’s identity provider rather than separate email and password credentials.

  • SAML SSO — Security Assertion Markup Language-based single sign-on. An authentication protocol that allows users to sign in using their organization’s identity provider rather than separate email and password credentials.

  • secrets — A feature used to manage sensitive information such as API keys or login credentials, so they can be referenced in automation custom functions. Secrets are securely stored and managed at the organization level.
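
Group mapping amounts to translating identity provider group names into AI Hub group names. The Python sketch below shows only that translation step. How AI Hub reads group claims from SAML or OIDC assertions isn't covered here, and the mapping table, group names, and `mapped_groups` function are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping from identity provider group names to AI Hub group names.
GROUP_MAPPING: Dict[str, str] = {
    "idp-underwriting": "Underwriting",
    "idp-loan-ops": "Loan Operations",
}


def mapped_groups(idp_groups: Iterable[str], mapping: Dict[str, str]) -> Set[str]:
    """Return the AI Hub groups for a user, based on their identity provider groups.

    Identity provider groups without a mapping are ignored.
    """
    return {mapping[group] for group in idp_groups if group in mapping}


print(mapped_groups(["idp-underwriting", "idp-all-staff"], GROUP_MAPPING))  # {'Underwriting'}
```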

Data connections

  • connected drive — External storage connected to AI Hub as a source for uploading input files or as a destination for saving processed output files. Supported drives include Google Drive, Amazon S3, Azure Blob Storage, and Google Cloud Storage.

  • connected mailbox — Email mailbox connected to AI Hub for use as an upstream integration data source.

  • Data Drive — Workspace-level included storage that serves as the default location for input files in each workspace.

  • default drive — The drive where all processed AI Hub files are stored by default. Default drives can be assigned at the organization level or the workspace level.

  • Instabase Drive — Default storage included with all AI Hub accounts, providing up to one terabyte of storage for processed AI Hub files. Instabase Drive serves as the organization default drive for output and storage. The Instabase Drive can be disabled but not disconnected.

  • organization drive — A drive connected at the organization level, available to all workspaces in the organization. All organization members can access files on organization drives.

  • workspace drive — A drive connected at the workspace level, available only within that workspace. All members of the workspace can access files on workspace drives.

Billing and usage

  • platform fee — The base subscription fee.

  • service period — The date range, from start date to end date, over which usage is tracked for billing. Service periods reflect billing cycles.