Planning flows
Flow steps occur in a specific order, but whether you include certain steps depends on your document processing requirements.
All flows begin with the Process files step, which performs OCR on sample documents. If your sample documents include multiple records, such as multipage PDFs, you then use the Map records step. To classify different types of documents, use the Agent classifier step, which classifies records with a large language model (LLM). After records are classified, it’s a good idea to verify the classification with the Apply checkpoint step. You can then filter records into different branches, where they undergo data extraction with the Agent extract step. After extraction, you can refine or validate data and perform other processing tasks as needed.
A basic flow might be ordered something like this:
- Process files: Converts various document formats into machine-readable text.
- Map records: Splits multipage documents into separate records.
- Agent classifier: Classifies records using an LLM.
- Apply checkpoint: Verifies details identified in a previous step against validation formulas and triggers a review for failed validations.
After using the checkpoint to verify that records will be routed to the correct branch, you might split the flow into multiple document streams based on class.
- Filter: Filters records by class. Place a filter at the top of each branch and, for the filter parameter, specify the document class to allow through.
If you don’t validate classification before branching and filtering, or if your production flow might include unclassifiable documents, create an additional branch that filters for "other", the class Instabase assigns by default to documents that can’t be classified.
- Agent extract: Extracts data using a schema inferred by an LLM.
- Apply refiner: Transforms extracted data according to your specifications.
- Apply checkpoint: Verifies details identified in a previous step against validation formulas and triggers a review for failed validations.
- Combine: Combines branches into a single flow output.
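The branching pattern in the flow above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Instabase code: the `Record` class and the `filter_step`, `run_branches`, and `combine` functions are hypothetical stand-ins for the Filter and Combine steps, and the class names `invoice` and `receipt` are illustrative. It shows why the extra "other" branch matters: in this model, a record whose class matches no branch filter never reaches Combine.

```python
from dataclasses import dataclass

# Hypothetical record model; not Instabase's internal representation.
@dataclass
class Record:
    record_id: str
    doc_class: str  # class assigned by the classifier step

# Class Instabase assigns by default to documents that can't be classified.
UNCLASSIFIED = "other"

def filter_step(records, allowed_class):
    """Model of a Filter step at the top of a branch: pass only one class."""
    return [r for r in records if r.doc_class == allowed_class]

def run_branches(records, branch_classes):
    """Route records into one branch per class, plus an 'other' branch."""
    branches = {}
    for doc_class in list(branch_classes) + [UNCLASSIFIED]:
        branches[doc_class] = filter_step(records, doc_class)
    return branches

def combine(branches):
    """Model of the Combine step: merge branch outputs into one list."""
    combined = []
    for branch_records in branches.values():
        combined.extend(branch_records)
    return combined

records = [
    Record("r1", "invoice"),
    Record("r2", "receipt"),
    Record("r3", UNCLASSIFIED),
]
branches = run_branches(records, ["invoice", "receipt"])
combined = combine(branches)
print([r.record_id for r in combined])  # ['r1', 'r2', 'r3']
```

If you drop `UNCLASSIFIED` from `run_branches` in this model, record `r3` matches no filter and is missing from the combined output.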
Using flow modules
Many steps don’t define all of their behavior inside the flow editor. Instead, they link to modules, which connect to artifacts built outside the flow editor, such as validations, refiner programs, JSON schema files, or Python custom functions. While planning a flow, decide which steps need modules and, for each one, whether to create a new module or import an existing one.
Importing a module into a flow copies the module code into that flow, so later edits to the original module aren’t reflected in the flow. Importing is useful for reuse, but the imported copy doesn’t stay in sync with the original.
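To make the copy-on-import behavior concrete, here is a small Python analogy. It is a model under assumptions, not how Instabase stores modules: a module is represented as a dict mapping artifact names to source text, and `import_module` is a hypothetical helper that stores an independent deep copy in the flow.

```python
import copy

def import_module(flow_modules, name, module):
    """Hypothetical import: store an independent deep copy in the flow.

    No reference to the original module is kept, so later edits to the
    original do not propagate (no syncing).
    """
    flow_modules[name] = copy.deepcopy(module)

# Hypothetical module: artifact name -> source text.
original_module = {"refiner.py": "def clean(x):\n    return x.strip()\n"}

flow_modules = {}
import_module(flow_modules, "cleanup", original_module)

# Edit the original after importing it.
original_module["refiner.py"] = "def clean(x):\n    return x.strip().lower()\n"

# The flow still holds the code as it was at import time.
print(flow_modules["cleanup"]["refiner.py"] == original_module["refiner.py"])  # False
```

In this model, the only way for the flow to pick up the edit is to import the module again.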
Using checkpoints
Checkpoints are defined by the Apply checkpoint step. They verify classification or extraction data against validation rules that you write. In production, records that fail a checkpoint’s validations are queued for review by a human reviewer.
Checkpoints are typically inserted after classification and extraction steps to verify document class and extracted data. As a best practice, use a checkpoint after classification but before branching to ensure that records are routed to the correct branch for data extraction.
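The checkpoint behavior described above can be sketched in plain Python. This is a minimal model that assumes a validation rule is simply a function returning `True` when a record passes. The names `apply_checkpoint`, `class_is_known`, `has_total`, and the `total` field are hypothetical and don’t reflect Instabase’s validation formula syntax. Failing records are collected in a list that stands in for the human review queue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Record:
    record_id: str
    doc_class: str  # class assigned by the classifier
    fields: Dict[str, str] = field(default_factory=dict)  # extracted values

# Hypothetical validation rule: returns True when the record passes.
Rule = Callable[[Record], bool]

def apply_checkpoint(records: List[Record], rules: List[Rule]) -> Tuple[List[Record], List[Record]]:
    """Return (passed, review_queue); a record fails if any rule returns False."""
    passed: List[Record] = []
    review_queue: List[Record] = []
    for record in records:
        if all(rule(record) for rule in rules):
            passed.append(record)
        else:
            review_queue.append(record)
    return passed, review_queue

KNOWN_CLASSES = {"invoice", "receipt"}  # illustrative class names

def class_is_known(record: Record) -> bool:
    # Classification rule: flag records that didn't get one of the expected classes.
    return record.doc_class in KNOWN_CLASSES

def has_total(record: Record) -> bool:
    # Extraction rule: the hypothetical "total" field must be present and non-empty.
    return bool(record.fields.get("total"))

records = [
    Record("r1", "invoice", {"total": "12.50"}),
    Record("r2", "other"),
    Record("r3", "receipt", {"total": ""}),
]

# Checkpoint after classification, before branching.
classified, class_review = apply_checkpoint(records, [class_is_known])
# Checkpoint after extraction.
extracted, field_review = apply_checkpoint(classified, [has_total])

print([r.record_id for r in class_review])  # ['r2']
print([r.record_id for r in field_review])  # ['r3']
```

Placing the classification checkpoint before branching means `r2` is caught for review before it can be routed to the wrong extraction branch.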
