Creating refiner programs

Enterprise Single-tenant

Refiner is a tool for extracting text and structure from documents. Connect a refiner program to a flow by linking an .ibrefiner file to an Apply refiner step module.

When to use refiner

Use refiner to further refine data extracted by deep learning models, from post-processing model output to adding custom business logic.

Refiner provides:

  • In-app visualization of the development dataset.

  • Detailed in-app documentation of refiner functions.

  • Full UDF authoring, executing, and debugging support.

  • Open file formats for describing images, documents, training data, and extraction.

  • A plugin system for extraction models to interact with these file formats.

  • Rules for how these models behave to preserve audit trails, explainability, and composition.

Supported extraction forms

Refiner supports these main forms of extraction:

Form | Example
Spatial extraction | Extract the phrase to the right of the text "First Name:"
Regular expressions | Extract the phrase captured by ^Amount: \$([0-9\.]+)$
Metadata | What is the FILENAME associated with this data?
Structure | Generate a LIST of every bullet point under the heading Risks
Plugins (UDFs) | Run my custom model and return its result.
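
As a rough illustration of the regular expression form, the sketch below applies a pattern of the form ^Amount: \$([0-9.]+)$ with Python's standard re module. It does not use refiner's own regex functions; the function name extract_amount and the sample text are assumptions made for this example.

```python
import re

# Pattern from the table, with ^ at the start and $ at the end of the line.
# re.MULTILINE lets the anchors match at each line boundary.
AMOUNT_PATTERN = re.compile(r"^Amount: \$([0-9.]+)$", re.MULTILINE)


def extract_amount(text):
    """Return the captured amount as a string, or None if no line matches."""
    match = AMOUNT_PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical OCR text, for illustration only.
sample = "Invoice 1001\nAmount: $1250.75\nDue on receipt"
print(extract_amount(sample))
```

The capture group returns only the digits and decimal point, so the currency symbol and label are excluded from the extracted phrase.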

Choosing an extraction approach

The right approach depends on whether your documents have a fixed or variable structure.

Fixed structure documents

The structure of fixed structure documents is known in advance and doesn’t change. Fixed structure documents have labels associated with fields to extract. For example, paystubs from the same vendor, bank statements from the same bank, or state-specific driver licenses.

Action | Example | Method | Limitations
Extract text | Extract the phrase to the right of the text First Name: | Spatial functions, such as scan_right, scan_below, or scan_box, or regular expressions | Doesn’t support scan_above or scan_left
Extract structured information | Generate a list of every bullet point underneath the heading Risks | Use spatial functions to extract the region, then split by a delimiter, such as a new line or comma | Doesn’t support a special view to handle structured information
Specify different output format | Ensure dates are in mm-dd-yy format | UDF |
Validate | Ensure expiry_date is after today’s date | UDF |
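
The two UDF rows in the table can be sketched in plain Python. This is a minimal sketch, not refiner's UDF interface: the function names, the assumed input format (YYYY-MM-DD), and the optional today parameter are all hypothetical, and how such functions are registered with refiner is not shown.

```python
from datetime import date, datetime


def format_date_mm_dd_yy(raw, input_format="%Y-%m-%d"):
    """Reformat a date string as mm-dd-yy.

    input_format is an assumption; adjust it to match the extracted text.
    """
    return datetime.strptime(raw, input_format).strftime("%m-%d-%y")


def is_after_today(expiry_date, today=None):
    """Return True when expiry_date (mm-dd-yy) is later than today's date.

    The today parameter exists only so the check can be tested with a
    fixed date; by default the current date is used.
    """
    today = today or date.today()
    expiry = datetime.strptime(expiry_date, "%m-%d-%y").date()
    return expiry > today


print(format_date_mm_dd_yy("2031-04-09"))
print(is_after_today("04-09-31", today=date(2030, 1, 1)))
```

Keeping formatting and validation in separate functions lets a field apply one without the other.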

Variable structure documents

The structure of variable structure documents can change between documents. These documents have labels associated with the fields to extract. For example, US W-2s, bills of lading, and invoices.

Action | Example | Method | Limitations
Extract text around label | Extract the phrase around the text Invoice Number: or Invoice No. | Use regular expressions or more flexible spatial functions, such as scan_near or scan_box | Is text-based, does not support spatial scanning in the document image domain
Extract structured information | Generate a list of every bullet point underneath the heading Risks | First use the text-based high-variability technique above to find the right labels, then scan_below | More to come on table extraction
Specify different output format | Ensure dates are in mm-dd-yy format | UDF | More native output format selection coming
Validate | Ensure expiry_date is after today’s date | UDF | More native validation formulas coming
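
The text-based approach for variable labels can be approximated in plain Python. The sketch below is illustrative only and does not call scan_near, scan_box, or scan_below; the function names, the bullet characters recognized, and the sample text are assumptions.

```python
import re

# Matches either label variant, "Invoice Number:" or "Invoice No.",
# and captures the next run of non-whitespace characters as the value.
INVOICE_LABEL = re.compile(r"Invoice\s+(?:Number|No\.)\s*:?\s*(\S+)", re.IGNORECASE)


def extract_invoice_number(text):
    """Return the value following either label variant, or None."""
    match = INVOICE_LABEL.search(text)
    return match.group(1) if match else None


def bullets_under_heading(text, heading):
    """Collect bullet lines after heading, stopping at the next non-bullet line."""
    items = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_section:
            # Skip everything until the heading line is found.
            in_section = stripped == heading
            continue
        if stripped.startswith(("•", "-", "*")):
            items.append(stripped.lstrip("•-* ").strip())
        elif stripped:
            break  # a non-empty, non-bullet line ends the section
    return items


# Hypothetical document text, for illustration only.
doc = "Invoice No. INV-204\nRisks\n• Market volatility\n• Currency exposure\n\nOutlook\n• Stable"
print(extract_invoice_number(doc))
print(bullets_under_heading(doc, "Risks"))
```

Finding the label first and then reading what follows mirrors the two-step method in the table: locate the label by text, then collect the content beneath it.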

Creating refiner programs

  1. In the header, click the initials icon and select Switch to advanced view.

  2. In the left sidebar, click the All apps icon, then select Refiner.

  3. Click Create classic refiner.

  4. Fill out the fields in the creation dialog.

Extracting fields

Selecting records to include

Use the records drawer to control which records appear in the output table and the run process.

  1. To open the records drawer, click Select Records. The Available Records drawer opens with all the available records.

  2. Click a record to preview the first 3 pages of the document.

  3. Select All or select one or more record checkboxes to include the selected records in the output table and the run process.

Selecting only a subset of records speeds up development because unselected records are excluded from the run process.

Defining text fields

To extract a text field:

  1. In the right panel, click + New Field.

    • Replace the provided field_ name with a self-describing unique field name, then press Enter or click outside of the field to apply the name change. Duplicate field names are not supported. For helper fields, prefix the field name with a double underscore (__). The double underscore is a naming convention that prevents helper fields from being generated in the output and downstream applications.
  2. In the bottom panel, define your field.

    • Leave the Field type with the default Text Field.

    • Optional: Select an Output type. The supported types are: No type, Text, Float, Integer, List, Image, Table, Dict. Defining a type lets you filter output display by the selected output type.

    • To use the Target Comparison feature, select a Target name to map the new field to a field in the targets file.

    • Optional: Add a Field description.

  3. Optional: To enable Target Comparison, move the Run with targets slider to the right.

  4. In the bottom panel, enter the refiner formula, and click Run Field.

  5. The results show for each field. If the Target Comparison feature is enabled for the run, the mapped fields are indicated with a purple bar in the records list and in the fields pane.

  6. Click Save.

Unsaved Changes displays below the Save button if refiner program changes are not saved.
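
The field naming rules in step 1 can be illustrated with a short Python sketch. It only models the conventions described here (unique names, double-underscore prefix for helper fields); the function names and the sample data are hypothetical and are not part of refiner.

```python
HELPER_PREFIX = "__"  # naming convention for helper fields


def validate_field_names(names):
    """Raise ValueError if any field name is empty or duplicated."""
    seen = set()
    for name in names:
        if not name:
            raise ValueError("field name must not be empty")
        if name in seen:
            raise ValueError(f"duplicate field name: {name}")
        seen.add(name)


def output_fields(results):
    """Return only the fields that would appear in the output.

    Helper fields, identified by the double-underscore prefix, are dropped.
    """
    return {
        name: value
        for name, value in results.items()
        if not name.startswith(HELPER_PREFIX)
    }


# Hypothetical field results, for illustration only.
results = {"__raw_name_line": "First Name: Ada", "first_name": "Ada"}
validate_field_names(results.keys())
print(output_fields(results))
```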

When you start typing a refiner function, in-app documentation appears. To view the full formula list, select Help > Formula list.

Running fields

With Run All, fields are processed in the order they appear in the field panel:

  • Click Run All to run the refiner functions on all fields.

  • Click Run Field to run only the selected field.

Right-click a field to:

  • Move the field to the top
  • Move the field up
  • Move the field down
  • Move the field to the bottom
  • Duplicate the field
  • Create a field above
  • Create a field below

Extending refiner

Refiner might not support all your extraction requirements, particularly if you want to integrate specific business logic. You can extend capabilities by writing user-defined functions and referencing the script directory in the Settings panel.

Reference

Use refiner more efficiently by understanding its display and navigation options.

Layouts

Layouts provide different ways to view documents, records, fields, and output in refiner.

  • Document: Shows the document viewer and field list.

  • Split: Shows the output table, document viewer, and field list.

  • Table: Shows the output table.

  • Custom: Drag to show or hide panels, then save the layout.

When the document viewer is shown, you can toggle between the document image and the extracted OCR text.

View options

Use the View menu to filter views.

  • Show hidden fields toggles the display in the output table. Hidden fields follow the field name prefix convention of a double underscore (__).

  • Show annotations for selected fields only toggles the display for annotations.

Keyboard shortcuts

To view in-app keyboard shortcuts, select Help > Keyboard shortcuts.

Name | Shortcut | Description
Save | Command+S or Control+S | Save the program
Run Current Field | Command+Enter or Control+Enter | Run the current field only
Run (all fields) | Command+Shift+Enter or Control+Shift+Enter | Run all fields
Formula List | Command+/ or Control+/ | Display searchable formula list
Next Record | Down Arrow Key | Go to the next record row
Previous Record | Up Arrow Key | Go to the previous record row
Next Field | Right Arrow Key | Go to the next field
Previous Field | Left Arrow Key | Go to the previous field

Troubleshooting

Tips on isolating and resolving problems with refiner.

  1. Refresh the page.

  2. Reselect the IBOCR/IBDOC folder with File > Open Folder to make sure the path is still valid.

  3. When selecting the input folder, make sure that the folder contains valid .ibdoc (IBDOC) files that do not contain refined_phrases. The input folder is typically in the project out/s2_map_records folder.

  4. Create a new refiner program by right-clicking the IBOCR/IBDOC folder that you want included. Doing so verifies the file system and the .ibdoc files. Make sure that the upstream resources are in the expected location.

You might see the warning "An unexpected error occurred" if files aren’t in the designated location.
  1. Open the JavaScript console, take a screenshot, and attach to a bug report. Provide details about what you were doing and enough information to help us reproduce the problem.

  2. Refresh the page.

The error message appears directly in the cell and can point to the cause.

Common errors:

  1. Make sure your refiner formula does not contain double quotation marks ("). Use single quotes (') instead.

  2. Make sure your parentheses are well-matched.

  3. Make sure your regular expressions are valid and match what you intend. You can use an external website such as regexr.com to test your expressions.

  4. Make sure you’re providing the correct values for the parameters that the refiner functions accept. Use the in-app documentation for function usage information.

  5. If any UDFs are involved, make sure there are no errors.
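
The first three checks in the list above can be approximated outside refiner with a small Python script. This is a sketch under stated assumptions: check_formula and is_valid_regex are hypothetical helpers, the quote handling is simplified (escaped quotes are not handled), and Python's re syntax might differ from the regex dialect refiner uses.

```python
import re


def check_formula(formula):
    """Return a list of problems found in a formula string."""
    problems = []
    if '"' in formula:
        problems.append("contains double quotation marks; use single quotes")
    depth = 0
    in_string = False
    for ch in formula:
        if ch == "'":
            in_string = not in_string  # toggle on each single quote
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    break  # closing parenthesis without a matching opener
    if depth != 0:
        problems.append("parentheses are not balanced")
    return problems


def is_valid_regex(pattern):
    """Return True if the pattern compiles with Python's re module."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# my_fn and other_fn are made-up names, not refiner functions.
print(check_formula("my_fn(other_fn('First Name:'), 'x')"))
print(check_formula('my_fn("x"'))
print(is_valid_regex(r"Amount: \$([0-9.]+"))
```

Parentheses inside single-quoted strings are ignored, so a regular expression passed as a string argument does not trigger a false imbalance report.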

To isolate problems in UDFs, you can log messages using the logging module:

import logging
logging.info('my message here')
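
As a slightly fuller illustration, the sketch below logs the input, output, and any failure of a UDF-style function. The function parse_amount is hypothetical, and the basicConfig call is only for running the example locally; where log output appears inside refiner is not covered here.

```python
import logging

# For local runs only: send INFO-level messages to the console.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def parse_amount(raw):
    """Hypothetical UDF body: turn a string like '$1,250.75' into a float."""
    logger.info("parse_amount input: %r", raw)
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        value = float(cleaned)
    except ValueError:
        # logger.exception records the message together with the traceback.
        logger.exception("parse_amount could not parse %r", cleaned)
        return None
    logger.info("parse_amount output: %s", value)
    return value


print(parse_amount("$1,250.75"))
print(parse_amount("n/a"))
```

Logging the value before and after each transformation narrows down which step of a UDF produces an unexpected result.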