Refiner is a tool for extracting text and structure from documents. Connect a refiner program to a flow by linking an .ibrefiner file to an Apply refiner step module.
Use refiner to further refine data extracted by deep learning models, from post-processing model output to adding custom business logic.
Refiner provides:
In-app visualization of the development dataset.
Detailed in-app documentation of refiner functions.
Full UDF authoring, executing, and debugging support.
Open file formats for describing images, documents, training data, and extraction.
A plugin system for extraction models to interact with these file formats.
Rules for how these models behave to preserve audit trails, explainability, and composition.
Refiner supports these main forms of extraction:
The right approach depends on whether your documents have a fixed or variable structure.
The structure of fixed structure documents is known in advance and doesn’t change. Fixed structure documents have labels associated with fields to extract. For example, paystubs from the same vendor, bank statements from the same bank, or state-specific driver licenses.
The structure of variable structure documents can change between documents. These documents have labels associated with the fields to extract. For example, US W-2s, bills of lading, and invoices.
In the header, click the initials icon and select Switch to advanced view.
In the left sidebar, click the All apps icon, then select Refiner.
Click Create classic refiner.
Fill out the fields in the creation dialog.
Use the records drawer to control which records appear in the output table and the run process.
To open the records drawer, click Select Records. The Available Records drawer opens with all the available records.
Click a record to preview the first 3 pages of the document.
Select All or select one or more record checkboxes to include the selected records in the output table and the run process.
Selecting only a subset of records speeds up development because unselected records are excluded from the run process.
To extract a text field:
In the right panel, click + New Field.
Replace field_name with a self-describing unique field name, then press Enter or click outside of the field to apply the name change. Duplicate field names are not supported. For helper fields, prefix the field name with a double underscore (__). The double underscore is a naming convention that prevents helper fields from being generated in the output and downstream applications.
In the bottom panel, define your field.
Leave the Field type with the default Text Field.
Optional: Select an Output type. The supported types are: No type, Text, Float, Integer, List, Image, Table, Dict. Defining a type lets you filter output display by the selected output type.
To use the Target Comparison feature, select a Target name to map the new field to a field in the targets file.
Optional: Add a Field description.
Optional: To enable Target Comparison, move the Run with targets slider to the right.
In the bottom panel, enter the refiner formula, and click Run Field.
Results appear for each field. If the Target Comparison feature is enabled for the run, the mapped fields are indicated with a purple bar in the records list and in the fields pane.
Click Save.
When you start typing a refiner function, in-app documentation appears. To view the full formula list, select Help > Formula list.
Fields in the field panel are processed in order with Run All:
Click Run All to run the refiner functions on all fields.
Click Run Field to run only the selected field.
Right-click a field to:
Refiner might not support all your extraction requirements, particularly if you want to integrate specific business logic. You can extend capabilities by writing user-defined functions and referencing the script directory in the Settings panel.
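As a rough illustration of the kind of business logic a user-defined function might hold, the sketch below normalizes a currency string in plain Python. The function name normalize_amount and its behavior are hypothetical, and how a function is registered with refiner and called from a formula is not shown here.

```python
import re
from typing import Optional


def normalize_amount(raw: Optional[str]) -> Optional[float]:
    """Hypothetical UDF: turn text such as '$1,234.50' into a float.

    Returns None when no number is found, so the caller can decide
    how to handle a failed extraction.
    """
    if raw is None:
        return None
    # Find the first number: optional sign, digits with optional
    # thousands separators, and an optional decimal part.
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    # Remove thousands separators before converting.
    return float(match.group(0).replace(",", ""))
```

Returning None instead of raising keeps a single bad record from stopping a run; whether that is the right choice depends on how strict your business rules are.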
Use refiner more efficiently by understanding its display and navigation options.
Layouts provide different ways to view documents, records, fields, and output in refiner.
Document: Shows the document viewer and field list.
Split: Shows the output table, document viewer, and field list.
Table: Shows the output table.
Custom: Drag to show or hide panels, then save the layout.
When the document viewer is shown, you can toggle between the document image and the extracted OCR text.
Use the View menu to filter views.
Show hidden fields toggles the display of hidden fields in the output table. Hidden fields follow the field name prefix convention of a double underscore (__).
Show annotations for selected fields only toggles the display for annotations.
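The double underscore convention for hidden fields can be pictured as a simple filter on field names. The sketch below is an illustration only, not how refiner implements it; the record dictionary and the field names are hypothetical.

```python
def visible_fields(record: dict) -> dict:
    """Keep only fields whose names do not start with a double underscore."""
    return {
        name: value
        for name, value in record.items()
        if not name.startswith("__")
    }


# Hypothetical record with one helper field (__raw_total).
record = {
    "invoice_number": "INV-001",
    "__raw_total": "$1,234.50",
    "total": "1234.50",
}
shown = visible_fields(record)
```

Here shown contains invoice_number and total, while the helper field __raw_total is left out.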
To view in-app keyboard shortcuts, select Help > Keyboard shortcuts.
Tips on isolating and resolving problems with refiner.
Refresh the page.
Reselect the IBOCR/IBDOC folder with File > Open Folder to make sure the path is still valid.
When selecting the input folder, make sure that the folder contains valid .ibdoc (IBDOC) files that do not contain refined_phrases. The input folder is typically in the project out/s2_map_records folder.
Try creating a new refiner program by right-clicking the IBOCR/IBDOC folder that you want included. Doing so verifies the file system and the .ibdoc files. Make sure that the upstream resources are in the expected location.
An "An unexpected error occurred" warning can appear if files aren’t in the designated location.
Open the JavaScript console, take a screenshot, and attach it to a bug report. Provide details about what you were doing and enough information to help us reproduce the problem.
Refresh the page.
The error message appears directly in the cell and often points to the cause.
Common errors:
Make sure your refiner formula does not contain double quotation marks ("). Use single quotation marks (') instead.
Make sure your parentheses are balanced.
Make sure your regular expressions are valid and match what you intend. You can use an external website such as regexr.com to test your expressions.
Make sure you’re providing the correct values for the parameters that the refiner functions accept. Use the in-app documentation for function usage information.
If any UDFs are involved, make sure there are no errors.
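Some of the checks above can be scripted before pasting a formula into refiner. The following is a minimal sketch in Python, assuming formulas use single-quoted strings without escaped quotes. The helpers check_formula and check_regex and the function name my_function are hypothetical, and Python's re module may accept or reject patterns differently from the regular expression engine refiner uses.

```python
import re
from typing import List, Optional


def check_formula(formula: str) -> List[str]:
    """Return a list of problems found in a formula string."""
    problems = []
    if '"' in formula:
        problems.append("contains a double quotation mark; use single quotes")
    depth = 0
    in_quote = False
    for ch in formula:
        if ch == "'":
            # Toggle quoted state; parentheses inside quotes are ignored.
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    break  # a closing parenthesis with no opener
    if depth != 0:
        problems.append("parentheses are not balanced")
    return problems


def check_regex(pattern: str) -> Optional[str]:
    """Return the compile error for a pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None
```

For example, check_formula("my_function(INPUT_COL, 'Total'") reports unbalanced parentheses, and check_regex("(unclosed") returns an error message instead of None. A pattern that compiles can still match the wrong text, so test it against sample documents as well.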
To isolate problems in UDFs, you can log messages using the logging module:
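A minimal sketch of that approach, using Python's standard logging module. The logger name, the function clean_invoice_number, and its logic are hypothetical, and where the log output appears depends on how your environment is configured.

```python
import logging

# Hypothetical logger name; pick one that identifies your UDF script.
logger = logging.getLogger("refiner_udf")


def clean_invoice_number(raw):
    """Hypothetical UDF that logs its input, output, and any failure."""
    logger.info("clean_invoice_number input: %r", raw)
    try:
        cleaned = raw.strip().upper().replace(" ", "")
    except AttributeError:
        # Raised when raw is not a string, for example None.
        # logger.exception records the message plus the traceback.
        logger.exception("Expected a string but got %s", type(raw).__name__)
        raise
    logger.info("clean_invoice_number output: %r", cleaned)
    return cleaned
```

Logging the input and output on either side of the suspect logic shows whether the problem is in the data passed to the function or in the function itself.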