Creating refiner programs

Enterprise Single-tenant

Refiner is a document extraction tool for extracting text and structure from documents. Connect a refiner program to a flow by linking an .ibrefiner file to an Apply refiner step module.

When to use refiner

Use refiner to do additional refinement on data extracted by deep learning models, from post-processing model output to adding custom business logic.

Refiner provides:

In-app visualization of the development dataset.
Detailed in-app documentation of refiner functions.
Full UDF authoring, executing, and debugging support.
Open file formats for describing images, documents, training data, and extraction.
A plugin system for extraction models to interact with these file formats.
Rules for how these models behave to preserve audit trails, explainability, and composition.

Supported extraction forms

Refiner supports these main forms of extraction:

Form	Example
Spatial extraction	Extract the phrase to the right of the text `"First Name:"`
Regular expressions	Extract the phrase captured by `^Amount: \$([0-9\.]+)$`
Metadata	What is the `FILENAME` associated with this data?
Structure	Generate a `LIST` of every bullet point under the heading `Risks`
Plugins (UDFs)	Run my custom model and return its result.
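
As a rough illustration of the regular expression form, the sketch below applies an anchored amount pattern with Python's standard `re` module. It is not refiner formula syntax; the helper name `extract_amount` and the sample text are assumptions made up for this example.

```python
import re

# Anchored pattern: a whole line must read "Amount: $<digits and dots>".
# re.MULTILINE lets ^ and $ match at line boundaries inside a longer text.
AMOUNT_PATTERN = re.compile(r"^Amount: \$([0-9.]+)$", re.MULTILINE)


def extract_amount(text):
    """Return the first captured amount as a string, or None if absent."""
    match = AMOUNT_PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical OCR text for one record.
sample = "Name: Pat Lee\nAmount: $1250.75\nDue: 01-31-25"
print(extract_amount(sample))
```

The capture group returns only the numeric part, so the `Amount:` label and the dollar sign are excluded from the result.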

Choosing an extraction approach

The right approach depends on whether your documents have a fixed or variable structure.

Fixed structure documents

Fixed structure documents have a layout that is known in advance and doesn’t change, with labels associated with the fields to extract. Examples include paystubs from the same vendor, bank statements from the same bank, and state-specific driver’s licenses.

Action	Example	Method	Limitations
Extract text	Extract the phrase to the right of the text `First Name:`	Spatial functions, such as `scan_right`, `scan_below`, or `scan_box`, or regular expressions	Doesn’t support `scan_above` or `scan_left`
Extract structured information	Generate a list of every bullet point underneath the heading `Risks`	Use spatial functions to extract region, then split by delimiter, such as new line or comma	Doesn’t support special view to handle structured information
Specify different output format	Ensure dates are in `mm-dd-yy` format	UDF
Validate	Ensure `expiry_date` is after today’s date	UDF
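
The formatting and validation rows above both point to UDFs. The sketch below shows what the logic of such functions might look like in plain Python. The function names, the assumed input formats, and the `today` parameter are all hypothetical, and the way a function is registered with refiner is not shown here.

```python
from datetime import date, datetime


def format_date_mm_dd_yy(value, input_format="%Y-%m-%d"):
    """Reformat a date string to mm-dd-yy.

    Assumes the extracted text matches input_format; raises ValueError otherwise.
    """
    parsed = datetime.strptime(value.strip(), input_format)
    return parsed.strftime("%m-%d-%y")


def is_not_expired(expiry_date, today=None, input_format="%m-%d-%y"):
    """Return True when expiry_date falls strictly after today."""
    today = today or date.today()
    expiry = datetime.strptime(expiry_date.strip(), input_format).date()
    return expiry > today


print(format_date_mm_dd_yy("2025-03-09"))
print(is_not_expired("03-09-25", today=date(2025, 1, 1)))
```

Passing `today` explicitly keeps the validation check repeatable when you test it against sample records.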

Variable structure documents

Variable structure documents have a layout that can change between documents. These documents have labels associated with the fields to extract. Examples include US W-2s, bills of lading, and invoices.

Action	Example	Method	Limitations
Extract text around label	Extract the phrase around the text `Invoice Number:` or `Invoice No.`	Use regular expressions or more flexible spatial functions, such as `scan_near` or `scan_box`	Is text-based; does not support spatial scanning in the document image domain
Extract structured information	Generate a list of every bullet point underneath the heading `Risks`	First use the text-based high-variability technique above to find the right labels, then `scan_below`	More to come on table extraction
Specify different output format	Ensure dates are in `mm-dd-yy` format	UDF	More native output format selection coming
Validate	Ensure `expiry_date` is after today’s date	UDF	More native validation formulas coming
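
For labels that vary between documents, one text-based option is a single regular expression that tolerates the known label variants. The sketch below uses Python's `re` module rather than refiner formula syntax; the pattern, the helper name `extract_invoice_number`, and the assumption that invoice numbers contain only letters, digits, and hyphens are illustrative.

```python
import re

# Accepts "Invoice Number" or "Invoice No" / "Invoice No.", with an optional
# colon, then captures the token that follows. The \b guards keep words such
# as "Notes" from matching the "No" variant.
INVOICE_LABEL = re.compile(
    r"Invoice\s+(?:Number\b|No\b\.?)\s*:?\s*([A-Za-z0-9-]+)",
    re.IGNORECASE,
)


def extract_invoice_number(text):
    """Return the value following an invoice label, or None if no label is found."""
    match = INVOICE_LABEL.search(text)
    return match.group(1) if match else None


print(extract_invoice_number("Invoice Number: INV-1042"))
print(extract_invoice_number("Invoice No. 77-A"))
```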

Creating refiner programs

In the header, click the initials icon and select Switch to advanced view.
In the left sidebar, click the All apps icon, then select Refiner.
Click Create classic refiner.
Fill out the fields in the creation dialog.

Extracting fields

Selecting records to include

Use the records drawer to control which records appear in the output table and the run process.

To open the records drawer, click Select Records. The Available Records drawer opens with all the available records.
Click a record to preview the first 3 pages of the document.
Select All or select one or more record checkboxes to include the selected records in the output table and the run process.

Selecting only a subset of records speeds up development because unselected records are excluded from the run process.

Defining text fields

To extract a text field:

In the right panel, click + New Field.
- Replace the provided field_ name with a self-describing unique field name, then press Enter or click outside of the field to apply the name change. Duplicate field names are not supported. For helper fields, prefix the field name with a double underscore (__). The double underscore is a naming convention that prevents helper fields from being generated in the output and downstream applications.
In the bottom panel, define your field.
- Leave the Field type with the default Text Field.
- Optional: Select an Output type. The supported types are: No type, Text, Float, Integer, List, Image, Table, Dict. Defining a type lets you filter output display by the selected output type.
- To use the Target Comparison feature, select a Target name to map the new field to a field in the targets file.
- Optional: Add a Field description.
Optional: To enable Target Comparison, move the Run with targets slider to the right.
In the bottom panel, enter the refiner formula, and click Run Field.
The results show for each field. If the Target Comparison feature is enabled for the run, the mapped fields are indicated with a purple bar in the records list and in the fields pane.
Click Save.

Unsaved Changes displays below the Save button if refiner program changes are not saved.

When you start typing a refiner function, in-app documentation appears. To view the full formula list, select Help > Formula list.
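
To make the field naming rules concrete, the sketch below mimics the double underscore convention and the unique name requirement in plain Python. Refiner applies these rules itself; the function names and the dictionary of field values here are hypothetical and do not reflect refiner's internals.

```python
HELPER_PREFIX = "__"


def is_helper_field(name):
    """Helper fields are marked by a leading double underscore."""
    return name.startswith(HELPER_PREFIX)


def visible_fields(fields):
    """Drop helper fields from a mapping of field name to value."""
    return {
        name: value
        for name, value in fields.items()
        if not is_helper_field(name)
    }


def find_duplicate_names(names):
    """Return each field name that appears more than once, in first-seen order."""
    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


record = {"invoice_number": "INV-1042", "__raw_header_block": "Invoice No. INV-1042"}
print(visible_fields(record))
print(find_duplicate_names(["amount", "date", "amount"]))
```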

Running fields

Fields in the field panel are processed in order with Run All:

Click Run All to run the refiner functions on all fields.
Click Run Field to run only the selected field.

Right-click a field to:

Move the field to the top
Move the field up
Move the field down
Move the field to the bottom
Duplicate the field
Create a field above
Create a field below

Extending refiner

Refiner might not support all your extraction requirements, particularly if you want to integrate specific business logic. You can extend capabilities by writing user-defined functions and referencing the script directory in the Settings panel.

Reference

Use refiner more efficiently by understanding its display and navigation options.

Layouts

Layouts provide different ways to view documents, records, fields, and output in refiner.

Document: Shows the document viewer and field list.
Split: Shows the output table, document viewer, and field list.
Table: Shows the output table.
Custom: Lets you drag to show or hide panels and save the layout.

When the document viewer is shown, you can toggle between the document image and the extracted OCR text.

View options

Use the View menu to filter views.

Show hidden fields: Toggles the display of hidden fields in the output table. Hidden fields follow the field name prefix convention of a double underscore (__).
Show annotations for selected fields only: Toggles whether annotations display only for the selected fields.

Keyboard shortcuts

To view in-app keyboard shortcuts, select Help > Keyboard shortcuts.

Name	Shortcut	Description
Save	Command+S / Control+S	Save the program
Run Current Field	Command+Enter / Control+Enter	Run the current field only
Run (all fields)	Command+Shift+Enter / Control+Shift+Enter	Run all fields
Formula List	Command+/ / Control+/	Display searchable formula list
Next Record	Down Arrow Key	Go to the next record row
Previous Record	Up Arrow Key	Go to the previous record row
Next Field	Right Arrow Key	Go to the next field
Previous Field	Left Arrow Key	Go to the previous field

Troubleshooting

Tips on isolating and resolving problems with refiner.

Files don't load

Refresh the page.
Reselect the IBOCR/IBDOC folder with File > Open Folder to make sure the path is still valid.
When selecting the input folder, make sure that the folder contains valid .ibdoc (IBDOC) files that do not contain refined_phrases. The input folder is typically in the project out/s2_map_records folder.
Try creating a new refiner program by right-clicking the IBOCR/IBDOC folder that you want included. Create a new refiner program to verify the file system and the .ibdoc files. Make sure that the upstream resources are in the expected location.

You might get an An unexpected error occurred warning if files aren’t in the designated location.

Page goes blank

Open the JavaScript console, take a screenshot, and attach it to a bug report. Describe what you were doing and include enough information to help us reproduce the problem.
Refresh the page.

Debugging formulas and UDFs when a cell shows an error

The error message appears directly in the cell and often points to the cause.

Common errors:

Make sure your refiner formula doesn’t contain double quotation marks ("); use single quotation marks (') instead.
Make sure your parentheses are well-matched.
Make sure your regular expressions are valid and match what you intend. You can use an external website such as regexr.com to test your expressions.
Make sure you’re providing the correct values for the parameters that the refiner functions accept. Use the in-app documentation for function usage information.
If any UDFs are involved, make sure there are no errors.
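
The first two checks can be automated before you paste a formula into a field. The sketch below is a standalone Python helper, not part of refiner. It assumes formulas delimit strings with single quotes and do not use escaped quotes, and the formula in the usage line is a made-up placeholder rather than a real refiner function call.

```python
def check_formula(formula):
    """Return a list of problems found in a formula string (empty if none)."""
    problems = []
    if '"' in formula:
        problems.append("contains double quotation marks; use single quotes")

    depth = 0          # Number of currently open parentheses.
    in_quote = False   # True while scanning inside a single-quoted string.
    for char in formula:
        if char == "'":
            in_quote = not in_quote
        elif not in_quote:
            if char == "(":
                depth += 1
            elif char == ")":
                depth -= 1
                if depth < 0:
                    problems.append("closing parenthesis without a matching opening one")
                    depth = 0
    if depth > 0:
        problems.append("unclosed opening parenthesis")
    if in_quote:
        problems.append("unclosed single quote")
    return problems


# Placeholder formula: parentheses inside quoted text are ignored.
print(check_formula("outer(inner('a (b'), 'c')"))
```

An empty list means neither problem was detected; it does not confirm that the formula is otherwise valid.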

Logging messages in the UDF log

To isolate problems in UDFs, you can log messages using the logging module:

import logging

# Write an informational message to the UDF log.
logging.info('my message here')
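
Building on the snippet above, the sketch below logs both the input and any failure inside a UDF-style function, which can help narrow down where a value goes wrong. The function `parse_amount` is hypothetical, and the sketch assumes that messages from a named logger reach the UDF log in the same way as the `logging.info` call.

```python
import logging

logger = logging.getLogger(__name__)


def parse_amount(raw_value):
    """Hypothetical UDF: convert text such as '$1,250.75' to a float."""
    # Log the raw input so the value that reached the function is visible.
    logger.info("parse_amount received: %r", raw_value)
    cleaned = raw_value.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # logger.exception records the message together with the traceback.
        logger.exception("parse_amount could not convert %r", cleaned)
        return None


print(parse_amount("$1,250.75"))
```

Returning `None` on failure is one possible design choice; re-raising the exception instead would surface the error in the cell.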
