Creating refiner programs
Refiner is a tool for extracting text and structure from documents. To connect a refiner program to a flow, link an .ibrefiner file to an Apply refiner step module.
When to use refiner
Use refiner to do additional refinement on data extracted by deep learning models, from post-processing model output to adding custom business logic.
Refiner provides:
-
In-app visualization of the development dataset.
-
Detailed in-app documentation of refiner functions.
-
Full UDF authoring, executing, and debugging support.
-
Open file formats for describing images, documents, training data, and extraction.
-
A plugin system for extraction models to interact with these file formats.
-
Rules for how these models behave to preserve audit trails, explainability, and composition.
Supported extraction forms
Refiner supports these main forms of extraction:
Choosing an extraction approach
The right approach depends on whether your documents have a fixed or variable structure.
Fixed structure documents
The structure of fixed structure documents is known in advance and doesn’t change. Fixed structure documents have labels associated with fields to extract. For example, paystubs from the same vendor, bank statements from the same bank, or state-specific driver licenses.
Variable structure documents
The structure of variable structure documents can change between documents. These documents have labels associated with the fields to extract. For example, US W-2s, bills of lading, and invoices.
Creating refiner programs
-
In the header, click the initials icon and select Switch to advanced view.
-
In the left sidebar, click the All apps icon, then select Refiner.
-
Click Create classic refiner.
-
Fill out the fields in the creation dialog.
Extracting fields
Selecting records to include
Use the records drawer to control which records appear in the output table and the run process.
-
To open the records drawer, click Select Records. The Available Records drawer opens with all the available records.
-
Click a record to preview the first 3 pages of the document.
-
Select All or select one or more record checkboxes to include the selected records in the output table and the run process.
Selecting only a subset of records speeds up development because unselected records are excluded from the run process.
Defining text fields
To extract a text field:
-
In the right panel, click + New Field.
-
Replace the provided field_name with a self-describing, unique field name, then press Enter or click outside of the field to apply the name change. Duplicate field names are not supported. For helper fields, prefix the field name with a double underscore (__). The double underscore is a naming convention that prevents helper fields from being generated in the output and downstream applications.
-
In the bottom panel, define your field.
-
Leave the Field type with the default Text Field.
-
Optional: Select an Output type. The supported types are: No type, Text, Float, Integer, List, Image, Table, Dict. Defining a type lets you filter output display by the selected output type.
-
To use the Target Comparison feature, select a Target name to map the new field to a field in the targets file.
-
Optional: Add a Field description.
-
Optional: To enable Target Comparison, move the Run with targets slider to the right.
-
In the bottom panel, enter the refiner formula, and click Run Field.
-
Results appear for each field. If the Target Comparison feature is enabled for the run, the mapped fields are indicated with a purple bar in the records list and in the fields pane.
-
Click Save.
When you start typing a refiner function, in-app documentation appears. To view the full formula list, select Help > Formula list.
Running fields
Run All processes fields in the order they appear in the field panel:
-
Click Run All to run the refiner functions on all fields.
-
Click Run Field to run only the selected field.
Right-click a field to:
- Move the field to the top
- Move the field up
- Move the field down
- Move the field to the bottom
- Duplicate the field
- Create a field above
- Create a field below
Extending refiner
Refiner might not support all your extraction requirements, particularly if you want to integrate specific business logic. You can extend capabilities by writing user-defined functions and referencing the script directory in the Settings panel.
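As an illustration of the kind of business logic a user-defined function might hold, here is a minimal Python sketch. The function name parse_currency, the **kwargs parameter, and the register hook are all assumptions made for this example, not a confirmed refiner API. Check the in-app documentation for how scripts in the script directory actually expose functions.

```python
import re


def parse_currency(value, **kwargs):
    """Hypothetical UDF: turn text such as '$1,234.50' into a float.

    Returns None when no number is found, so the caller can tell a
    failed parse apart from a real value of 0.
    """
    match = re.search(r'-?\d[\d,]*(?:\.\d+)?', str(value))
    if match is None:
        return None
    return float(match.group(0).replace(',', ''))


def register(name_to_fn):
    """Assumed registration hook: add the UDF to a name-to-function map."""
    name_to_fn.update({'parse_currency': {'fn': parse_currency}})
```

Keeping the logic in a plain function like this means you can test it outside refiner before you reference it from a formula.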
Reference
Use refiner more efficiently by understanding its display and navigation options.
Layouts
Layouts provide different ways to view documents, records, fields, and output in refiner.
-
Document: Shows the document viewer and field list.
-
Split: Shows the output table, document viewer, and field list.
-
Table: Shows the output table.
-
Custom: Lets you drag to show or hide panels and save the layout.
When the document viewer is shown, you can toggle between the document image and the extracted OCR text.
View options
Use the View menu to filter views.
-
Show hidden fields toggles the display of hidden fields in the output table. Hidden fields follow the field name prefix convention of a double underscore (__).
-
Show annotations for selected fields only toggles the display for annotations.
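The double underscore convention can be pictured as a simple filter on field names. The Python sketch below is illustrative only: the function names and the show_hidden flag are hypothetical and do not describe how refiner implements the View menu.

```python
HIDDEN_PREFIX = '__'  # naming convention for helper (hidden) fields


def is_hidden(field_name):
    """Return True when a field name follows the hidden-field convention."""
    return field_name.startswith(HIDDEN_PREFIX)


def displayed_fields(field_names, show_hidden=False):
    """Keep every field when show_hidden is on; otherwise drop hidden ones."""
    return [name for name in field_names if show_hidden or not is_hidden(name)]
```

For example, with the fields invoice_total and __raw_total, only invoice_total is displayed unless show_hidden is set.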
Keyboard shortcuts
To view in-app keyboard shortcuts, select Help > Keyboard shortcuts.
Troubleshooting
Tips on isolating and resolving problems with refiner.
Files don't load
-
Refresh the page.
-
Reselect the IBOCR/IBDOC folder with File > Open Folder to make sure the path is still valid.
-
When selecting the input folder, make sure that the folder contains valid .ibdoc (IBDOC) files that do not contain refined_phrases. The input folder is typically in the project out/s2_map_records folder.
-
To verify the file system and the .ibdoc files, try creating a new refiner program by right-clicking the IBOCR/IBDOC folder that you want included. Make sure that the upstream resources are in the expected location.
An "An unexpected error occurred" warning can appear if files aren’t in the designated location.
Page goes blank
-
Open the JavaScript console, take a screenshot, and attach it to a bug report. Describe what you were doing and include enough information to help us reproduce the problem.
-
Refresh the page.
Debugging formulas and UDFs when you receive an error field in cell
The error message appears directly in the cell and often points to the cause.
Common errors:
-
Make sure your refiner formula does not contain double quotation marks ("); use single quotes (') instead.
-
Make sure your parentheses are well-matched.
-
Make sure your regular expressions are valid and match what you intend. You can use an external website such as regexr.com to test your expressions.
-
Make sure you’re providing the correct values for the parameters that the refiner functions accept. Use the in-app documentation for function usage information.
-
If any UDFs are involved, make sure there are no errors.
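Some of the checks above can be done on a formula string before you run it. The Python sketch below is a rough pre-check under stated assumptions: check_formula is a hypothetical helper, not part of refiner. It counts parentheses naively, including any inside quoted strings, and it validates patterns with Python's re module, which may differ from the regex engine refiner uses.

```python
import re


def check_formula(formula, patterns=()):
    """Return a list of likely problems in a formula string (hypothetical helper)."""
    problems = []

    # Double quotation marks are not allowed; single quotes are expected.
    if '"' in formula:
        problems.append('double quotation mark found; use single quotes')

    # Naive balance check: every "(" needs a later matching ")".
    depth = 0
    for char in formula:
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
            if depth < 0:
                break  # a ")" appeared before its "("
    if depth != 0:
        problems.append('unbalanced parentheses')

    # Each regular expression should at least compile.
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append('invalid regular expression %r: %s' % (pattern, exc))

    return problems
```

An empty list means none of these three checks failed. It does not mean that the formula is correct or that the parameters match what the function accepts.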
Logging messages in the UDF log
To isolate problems in UDFs, you can log messages using the logging module:
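A minimal sketch using Python's standard logging module follows. The UDF extract_invoice_number and the logger name refiner_udf are hypothetical, and how refiner routes these messages into the UDF log is not shown here.

```python
import logging

# Hypothetical logger name; the standard logging module accepts any name.
logger = logging.getLogger('refiner_udf')


def extract_invoice_number(text):
    """Hypothetical UDF: return the first token after 'Invoice #', or None."""
    logger.info('extract_invoice_number input: %r', text)

    marker = 'Invoice #'
    index = text.find(marker)
    if index == -1:
        # Log why nothing was returned so the failing record is easy to find.
        logger.warning('marker %r not found in input', marker)
        return None

    tokens = text[index + len(marker):].split()
    result = tokens[0] if tokens else None
    logger.info('extract_invoice_number result: %r', result)
    return result
```

Logging both the input and the result makes it easier to tell whether a bad value comes from the UDF itself or from the data passed to it.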
