Glossary

A reference of terms used in AI Hub.

Core functionality

Core functionality includes the architecture and tools shared across AI Hub.

  • Hub — The central interface where users can discover, access, and run automation apps and chatbots. The Hub displays all apps available to you, including prebuilt offerings, apps and chatbots you’ve created, and items shared within organizations.

  • Marketplace — A filter option in the Hub that displays prebuilt automation apps created by Instabase to address common document processing use cases.

  • workspace — An access-gated environment where members can create projects, manage data, configure deployments, and conduct reviews. Workspaces provide isolation of assets, data, and ongoing work.

    • personal workspace — A private workspace for creating personal projects and conversations. Personal workspaces are the only locations where conversations can be created.

    • shared workspace — A collaborative workspace within an organization where members can work together on projects, manage shared data, configure deployments, and conduct document reviews. Shared workspaces are membership-gated and support role-based permissions.

    • production workspace — A shared workspace designated for operational data and workflows. Production workspaces display a visual indicator and, in single-tenant environments, receive prioritized resource allocation.

  • digitization — The process of converting uploaded files into machine-readable text. Digitization settings in conversations and automation projects affect the quality and accuracy of document text.

  • file — A digital object uploaded to AI Hub for processing. Files can contain one or more documents.

  • document — A logical unit of content processed by AI Hub. In automation apps with file splitting enabled, one file can contain multiple documents. For example, a multipage PDF file might contain multiple bank statement documents.

  • token (API) — A unique security credential that authenticates API requests to Instabase services. Also called an OAuth token.

  • token (LLM) — The fundamental unit by which LLMs process text. Each token represents a piece of text, such as a whole word, part of a word, or a character. For English text, one token encodes approximately four characters.
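The four-characters-per-token ratio is useful for rough cost and context-window estimates. A minimal sketch, assuming the approximate 4:1 ratio mentioned above (exact counts depend on the model’s tokenizer):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough LLM token estimate for English text (~4 characters per token).

    This is an approximation only; exact counts depend on the model's
    tokenizer.
    """
    return max(1, round(len(text) / chars_per_token))
```

For example, `estimate_tokens("a" * 400)` returns 100, matching the rule of thumb.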

Analyze

Analyze features enable interactive document analysis through conversations and chatbots.

Conversations

  • conversation — An interactive session where users can analyze uploaded documents by submitting queries to a large language model. Conversations are created in a user’s personal workspace and can’t be shared.

  • message scope — The set of documents that a query is applied to in a conversation. Users can select specific documents to narrow a query’s scope.

  • research mode — A mode that enables more complex reasoning capabilities for queries. Research mode uses a more powerful variant of the multistep model and supports visual reasoning.

  • visual reasoning — The ability to analyze visual and stylistic elements in documents, including images, diagrams, watermarks, layout, colors, text styling, and handwritten markup.

Chatbots

  • chatbot — A shareable document analysis app that answers questions about a defined knowledge base of documents. Chatbots are created from conversations and can be shared with other users.

  • origin conversation — The conversation from which a chatbot is created. The files in the origin conversation become the chatbot’s knowledge base.

  • knowledge base — The collection of documents included in a chatbot. Chatbot users can’t add or remove files from the knowledge base, but can upload up to five comparison files.

  • comparison files — Documents uploaded by a chatbot user to compare against the chatbot’s knowledge base. Up to five comparison files can be uploaded, and these files remain private to the user. Also called query documents.

  • sample prompts — Example queries provided by a chatbot publisher to guide users in how to interact with the chatbot.

  • research mode — A mode that enables more complex reasoning capabilities for queries. Research mode uses a more powerful variant of the multistep model and supports visual reasoning.

  • visual reasoning — The ability to analyze visual and stylistic elements in documents, including images, diagrams, watermarks, layout, colors, text styling, and handwritten markup.

Automate

Automate features enable creating, testing, and deploying document processing applications.

Projects and apps

  • automation project — A collection of files and artifacts used to create an automation app. Each project corresponds to a unique document understanding workflow, with document types (classes) and data points (fields) that address a specific use case.

  • automation app — A document processing application created from an automation project and published to the Hub. Automation apps can be run on-demand or configured as a deployment to run at scale.

  • advanced app — A custom automation app created by Instabase to address complex enterprise use cases. Advanced apps are available from the Hub and can be tested, run, and deployed like other apps, but users can’t edit them or access the underlying automation project.

  • project files — The set of files used in an automation project to develop an automation app. Project files can also be used to run accuracy tests against the app.

  • class — A document type within an automation project or app. Each class can include a different set of fields to identify.

  • field — A data point to be identified from documents in an automation project or app. Each field’s type determines the method for generating or extracting results from document content.

    • text extraction — Field type used to extract a string of text or numbers, such as address, account balance, or filing status.

    • table extraction — Field type used to extract tables from documents.

    • list extraction — Field type used to extract multiple like items, such as deposits on a banking summary or items on a receipt.

    • document reasoning — Field type used to generate results that aren’t explicitly found in the document but can be deduced, summarized, or calculated.

    • visual reasoning — Field type used to analyze visual and stylistic elements, including elements that OCR doesn’t capture, such as images, watermarks, layout, colors, text styling, and handwritten markup.

    • derived — Field type used to generate values based on preceding fields.

    • custom function — Field type used to compute values or import third-party data with a custom Python function.

  • attribute — Data point associated with each item in a list extraction field. For example, if extracting a list of items from a receipt, price and SKU might be specified as attributes.

  • cleaning — Reformatting or standardizing field results to meet specific formatting requirements.

  • results — Output generated when an automation app processes a document. When creating apps, a result is the value extracted or derived for a specific field. In app runs, results comprise the complete set of data extracted for a given document. Results can be reviewed, downloaded in structured formats (CSV/Excel), or transmitted to downstream systems with integrations.

  • app run — Executing an app from start to finish to process documents.

  • AI runtime — The version-controlled software layer that processes prompts and returns results. AI runtime includes the LLM, prompt templates, and processing pipelines.

Confidence and validation

  • validation — The process of checking results against validation rules. If a validation rule doesn’t pass, it’s a validation failure. If a rule can’t be executed, it’s a validation error.

  • validation rules — Rules that determine if extracted data meets specified criteria. Types include confidence rules (based on confidence scores) and custom rules (based on validation prompts or functions).

  • confidence score — Percentage value that indicates the level of certainty in results. Higher percentages suggest greater confidence.

    • classification confidence — Indicates the model’s certainty in predicting the class of a document.

    • field confidence — Indicates the model’s certainty in the predicted value for a given field.

    • OCR confidence — Indicates the OCR processor’s certainty in digitization accuracy.

  • validation prompt — Natural language prompt that describes how to validate a field. The prompt is used to generate a custom validation rule. Validation prompts can be permanently converted to validation functions.

  • validation function — A custom Python function that validates field results.
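As an illustration of the kind of check a validation function performs, here is a minimal sketch. The function name, signature, and parsing logic are hypothetical, not the exact AI Hub custom function interface:

```python
def validate_balance(value) -> bool:
    # Hypothetical validation function: passes when the extracted
    # account balance parses as a non-negative amount. The signature
    # is illustrative, not the exact AI Hub interface.
    try:
        amount = float(str(value).replace("$", "").replace(",", ""))
    except ValueError:
        return False
    return amount >= 0.0
```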

Accuracy testing

  • accuracy testing — The process of comparing app run results against ground truth values for a set of documents to measure how accurate an app is.

  • ground truth dataset — A set of files and associated ground truth values used to test app accuracy. Ground truth datasets are associated with a specific app.

  • ground truth value — The correct result for a field or class, confirmed in human review. Ground truth values are required for any ground truth dataset.

  • accuracy metrics — Quantitative measures that indicate how valid and accurate app results are when compared to ground truth values.

  • automation rate — Percent of classes or fields that are processed without human intervention, either because they pass validation or have no validation rules.

  • validated accuracy — Percent of automated classes or fields (a subset of the automation rate) that match the ground truth dataset. A higher validated accuracy means that results are both valid and accurate.

  • raw accuracy — Percent of classes or fields that match corresponding ground truth values. This metric measures classification or extraction accuracy only, without factoring in validations.

  • dataset parameters — Configuration settings that determine how ground truth values are compared to run results during accuracy testing. Parameters include global rules that apply across all fields and field rules that can override global rules for specific fields.
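The relationship between automation rate, validated accuracy, and raw accuracy can be sketched in a few lines. This is an illustrative computation, not AI Hub’s implementation; the result records and field names are assumptions:

```python
def accuracy_metrics(results):
    """Compute automation rate, validated accuracy, and raw accuracy.

    Each result is a dict with 'predicted', 'truth', and 'validated'
    (True/False, or None when no validation rules apply). These field
    names are illustrative, not AI Hub's data model.
    """
    total = len(results)
    # Automated: passed validation, or had no validation rules at all.
    automated = [r for r in results if r["validated"] in (True, None)]
    automation_rate = len(automated) / total
    # Validated accuracy is measured over the automated subset only.
    validated_accuracy = (
        sum(r["predicted"] == r["truth"] for r in automated) / len(automated)
    )
    # Raw accuracy ignores validations entirely.
    raw_accuracy = sum(r["predicted"] == r["truth"] for r in results) / total
    return automation_rate, validated_accuracy, raw_accuracy
```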

Deployments

  • deployment — An implementation of an automation app configured to run at scale. Deployments typically include automation, integration, and human review.

  • integration — A deployment configuration that pulls files from upstream systems for processing or sends results to downstream systems.

    • upstream integration — Pulls files or folder contents from connected sources for processing. Options include connected drives and connected mailboxes.

    • downstream integration — Sends processing results to external systems. Options include email, connected drives, and custom functions.

    • integration function — A custom Python function used to send results to external systems or services, such as webhooks or APIs.

  • deployment run — Executing a deployment from start to finish to process documents.
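To make the integration function concept concrete, here is a sketch of packaging run results as JSON for a downstream webhook, using only the Python standard library. The payload shape and endpoint are hypothetical; a real integration function would follow AI Hub’s documented interface:

```python
import json
import urllib.request


def build_webhook_request(url: str, results: list) -> urllib.request.Request:
    # Hypothetical integration helper: packages run results as a JSON
    # POST for a downstream webhook. Sending is left to the caller,
    # e.g., urllib.request.urlopen(request).
    payload = json.dumps({"results": results}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```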

Custom functions

  • custom function — User-defined Python function used in automation projects to extract (custom function field), clean (cleaning function), or validate (validation function) results; or in deployment integration functions to integrate with external systems.

  • system keys — Built-in values about files, documents, users, and execution environments available through the context parameter in custom functions.

  • custom keys — Runtime variables defined for apps and deployments, accessed through the keys parameter using keys['custom']['<key-name>'] in custom functions.

  • secret keys — Organization-level encrypted values like API keys or credentials, accessed through the keys parameter using keys['secret']['<key-name>'] in custom functions.

  • test values — Sample data used for custom and secret keys while writing custom functions in automation projects. Test values are replaced with runtime values when you run the app or create a deployment.

  • runtime configuration — General term for custom or secret keys that are defined for a specific deployment.
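The keys parameter access pattern described above can be sketched as follows. The key names (region, rates-api-key) are hypothetical examples, and the real keys dictionary is supplied by AI Hub at runtime:

```python
def build_request_headers(keys: dict) -> dict:
    # Sketch of reading runtime configuration inside a custom function.
    # 'region' and 'rates-api-key' are hypothetical key names; the real
    # keys parameter is supplied by AI Hub at runtime.
    return {
        "X-Region": keys["custom"]["region"],
        "Authorization": "Bearer " + keys["secret"]["rates-api-key"],
    }
```

While writing the function in an automation project, test values would stand in for the runtime values shown here.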

Human review

  • human review — The process of manually verifying and correcting results from deployment runs. Human review helps ensure data accuracy and allows for correction of any errors in automated extraction.

  • reviewers — Users tasked with reviewing results from deployment runs. By default, any member of the workspace where a deployment is run can function as a reviewer. In enterprise organizations, reviewers can be assigned to workflow queues via group membership.

  • review managers — Users who oversee human review tasks, such as assigning reviews. By default, workspace managers function as review managers in workspaces where a deployment is run. In enterprise organizations, review managers can be assigned to workflow queues via the group manager role.

  • review strategy — Determines how files are sent for human review. Options include review by file (sends only files that fail validation) and review by run (sends entire runs if any file fails validation).

  • review queue — A collection of items awaiting initial review by an assigned group.

  • escalation queue — A collection of items flagged for further review by an assigned group.

  • service-level agreement (SLA) — Efficiency targets for human review, specified in minutes, hours, or days. Timing starts when a deployment run begins, and the SLA is satisfied when a file is marked as reviewed.

Automation metrics

  • automation metrics — Quantitative measures that indicate performance and usage for deployments.

  • automation state — Evaluates the effectiveness of automation through validation rules and human review, combining validation outcome (valid/invalid) and human review outcome (modified/unmodified). There are four possible states: valid and unmodified, invalid and unmodified, invalid and modified, and valid and modified.

    • valid — A class or field result that passes all applicable validation rules, or that has no validation rules applied.

    • invalid — A class or field result that fails one or more validation rules.

    • modified — A field value that’s changed by a reviewer during human review.

    • unmodified — A field value that remains unchanged during human review.

  • consumption — Total number of documents, pages, or runs successfully processed, from submission to completion of any reviews, over a specified period.

  • handling time — Time to process a document from submission to when the run is complete or, if human review is required, when the document is marked reviewed.

  • automation rate — Percent of all fields that are extracted correctly as measured by unmodified human review results. This metric includes fields with and without validation rules. High automation rates indicate fields that are extracted accurately without needing human intervention.

  • runtime accuracy — Percent of fields with validation rules that were extracted correctly as measured by unmodified human review results. This metric excludes fields without validation rules. High runtime accuracy indicates that validation rules are correctly passing accurate results.

Administration

Administration capabilities enable managing access, security, and resources, providing the foundation for governance and collaboration across your AI Hub environment.

User and access management

  • organization — A collection of users, workspaces, resources, and configurations in AI Hub. Organizations exist at the Commercial and Enterprise tiers.

  • organization admin — A member role with organization-level permissions. Admins can manage workspaces, groups, and roles across the organization and have access to all organization workspaces, including all members’ personal workspaces.

  • organization member — The default role for all organization members. Members can access their personal workspace and any shared workspaces they’re added to.

  • workspace manager — A role assigned at the workspace level. Workspace managers can manage workspace members and their roles, manage human review tasks, and create and configure deployments.

  • group — A collection of organization members. Groups can be added to workspaces and assigned roles collectively, helping simplify workspace access and permissions management.

  • group manager — A role assigned within a group. Group managers can manage the group’s members.

  • service account — An AI Hub account not tied to a particular member, used only for interacting with the AI Hub API and SDK. Service accounts can be added to groups and workspaces and assigned roles like standard member accounts.

Authentication and security

  • secrets — A feature used to manage sensitive information such as API keys or login credentials, so they can be referenced in automation custom functions. Secrets are securely stored and managed at the organization level.

  • SAML SSO — Security Assertion Markup Language-based single sign-on authentication. Supported in single-tenant AI Hub environments.

  • OIDC SSO — OpenID Connect-based single sign-on authentication. Supported in single-tenant AI Hub environments.

  • group mapping — A feature that maps groups created in an SSO identity provider to groups created in AI Hub, allowing group membership to be managed at the identity provider level.

  • audit log — A record of organization or member activities, used for tracking operations performed within an organization. Supported in Enterprise-tier organizations with single-tenant environments.

Data connections

  • Instabase Drive — Default storage included with all AI Hub accounts, providing up to one terabyte of storage for processed AI Hub files. The Instabase Drive can be disabled but not disconnected.

  • connected drive — External storage connected to AI Hub as a source for uploading input files or as a destination for saving processed output files. Supported drives include Google Drive, Amazon S3, Azure Blob Storage, and Google Cloud Storage.

  • connected mailbox — Email mailbox connected to AI Hub for use as an upstream integration data source.

  • default drive — The drive where all processed AI Hub files are stored by default. Default drives can be assigned at the organization level or the workspace level.

  • organization drive — A drive connected at the organization level, available to all workspaces in the organization. All organization members can access files on organization drives.

  • workspace drive — A drive connected at the workspace level, available only within that workspace. All members of the workspace can access files on workspace drives.

Billing and usage

  • consumption unit — The basic unit of measure for AI Hub usage. Consumption units are valued at a rate of 100 units per 1 USD (or 1 unit per 0.015 USD for federal pricing).

  • platform fee — The monthly base subscription fee for Commercial and Enterprise subscriptions, billed in advance of the next service period.

  • monthly usage quota — A default limit of $250 (USD) or 25,000 consumption units per month for community subscribers, to protect against accidental overuse.

  • service period — The start and end date through which usage is tracked for billing. Service periods reflect billing cycles.