Cleaning results

If results for a given field aren’t formatted as needed, you can clean data using quick clean options. For organization members, additional cleaning options include a natural language prompt or a custom cleaning function.

Quick clean

Quick clean options include:

  • Changing character casing to all uppercase, all lowercase, or sentence case.

  • Removing characters by listing the characters to remove one after another, without separating them with commas.

  • Reformatting dates by selecting from available formatting options.

    For numeric-only dates like 06/01/2024, use the Input field to specify whether the original value lists the month or the day first.

Quick clean options use Python functions to reformat data. There is no model processing and no unit cost.
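
The platform's own quick clean code isn't shown in this guide. As a rough sketch of the kind of deterministic Python transformations involved, the hypothetical helpers below change casing, strip characters, and reformat a numeric date with an explicit day-first flag. The helper names and signatures are assumptions for illustration, not the product's API.

```python
from datetime import datetime


def change_case(value: str, mode: str) -> str:
    """Hypothetical helper: mode is 'upper', 'lower', or 'sentence'."""
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "sentence":
        # Capitalize the first character and lowercase the rest
        return value[:1].upper() + value[1:].lower()
    raise ValueError(f"Unknown casing mode: {mode}")


def remove_characters(value: str, chars: str) -> str:
    """Hypothetical helper: chars lists characters back to back, e.g. '$,'."""
    return value.translate(str.maketrans("", "", chars))


def reformat_date(value: str, day_first: bool, output_format: str = "%Y-%m-%d") -> str:
    """Hypothetical helper: parse a numeric date such as 06/01/2024."""
    input_format = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(value, input_format).strftime(output_format)


print(change_case("ACME corp", "sentence"))          # Acme corp
print(remove_characters("$1,250.00", "$,"))          # 1250.00
print(reformat_date("06/01/2024", day_first=False))  # 2024-06-01
print(reformat_date("06/01/2024", day_first=True))   # 2024-01-06
```

The last two calls show why the month-first or day-first setting matters: the same input yields two different dates.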

Cleaning prompt

Commercial & Enterprise

Use cleaning prompts to standardize results after extraction when quick clean options aren’t sufficient for your formatting needs.

Prompt-based refinement takes the raw output of your extraction or reasoning prompt and applies instructions you specify.

Follow these best practices to write effective cleaning prompts.

  • Separate extraction from formatting. Extract the data first, then apply cleaning to standardize the output. Avoid including formatting requirements in your original field prompts.

  • Specify formatting patterns with clear instructions and examples, if needed.

    Return the phone number in the format (XXX) XXX-XXXX. Remove any extensions or additional text.

  • Include before and after examples for varied input formats.

    Standardize address format to street, city, state ZIP. For example:
    123 Burnside St, Portland OR 97201 → 123 Burnside St, Portland, OR 97201
    456 Elm Avenue, Charlotte, North Carolina 28105 → 456 Elm Ave, Charlotte, NC 28105

  • Handle missing or invalid data.

    If the field contains a valid dollar amount, return it in the format $X.XX. If no amount is found or the value is invalid, return “No amount specified.”

  • Normalize reference fields as text.

    Convert each list item to the format: Item: [Item], Price: [Price], Qty: [Quantity] and separate each item with a semicolon.

Cleaning function

Commercial & Enterprise

For advanced cleaning requirements involving complex transformations, external data, or mission-critical results, write a custom cleaning function in Python.

When working with JSON data, use json.loads(previous_line) to parse it.
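
As a minimal sketch of this pattern, the function below parses a JSON list of line items and flattens it into semicolon-separated text. The function name and the keys `item`, `price`, and `qty` are assumptions for illustration; adapt them to the structure your extraction actually returns.

```python
import json


def flatten_line_items(previous_line, context):
    """Hypothetical example: turn a JSON list of objects into plain text."""
    if not previous_line:
        return previous_line

    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input
    items = json.loads(previous_line)

    parts = []
    for entry in items:
        # .get() avoids a KeyError when an assumed key is absent
        parts.append(
            f"Item: {entry.get('item', '')}, "
            f"Price: {entry.get('price', '')}, "
            f"Qty: {entry.get('qty', '')}"
        )
    return "; ".join(parts)


raw = '[{"item": "Widget", "price": "4.50", "qty": 2}, {"item": "Bolt", "price": "0.10", "qty": 40}]'
print(flatten_line_items(raw, {}))
# Item: Widget, Price: 4.50, Qty: 2; Item: Bolt, Price: 0.10, Qty: 40
```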

For example, you might use a cleaning function to standardize phone numbers from various formats:

def clean_phone_number(previous_line, context):
    """
    Standardizes phone numbers to (XXX) XXX-XXXX format.
    A simple example of a custom cleaning function that handles
    formatting beyond what quick clean options provide.
    """
    # Skip processing if input is empty or None
    if not previous_line:
        return previous_line

    # Extract only digits from the input
    digits = ''.join(char for char in previous_line if char.isdigit())

    # Format 10-digit numbers as (XXX) XXX-XXXX
    if len(digits) == 10:
        return f"({digits[0:3]}) {digits[3:6]}-{digits[6:10]}"

    # Return original for non-standard numbers
    return previous_line

Cleaning functions accept these parameters:

Parameter | Required? | Description
--- | --- | ---
previous_line | Required | Represents the value of the preceding cleaning line, or the extraction value if the custom function is the first cleaning line.
context | Required | Stores metadata about the document.
context['document_text'] | Optional | Retrieves the entire text of the document.
context['file_path'] | Optional | Retrieves the path to the uploaded file.
keys | Optional | Access custom variables and organization secrets. Use keys['custom']['<key-name>'] for custom keys and keys['secret']['<key-name>'] for secret keys.
<additional-field-name> | Optional | When writing custom functions in automation projects, click Add argument to select additional fields in the class to use in the function. Because fields are extracted sequentially, referenced parameters must precede the current field in the editing panel. If necessary, reorder fields using the up and down arrows that display in the field editor when you hover over a field.
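
To illustrate how these parameters might fit together, here is a sketch that falls back to searching `context['document_text']` when the extracted value is empty and reads a prefix from `keys`. The function name, the regular expression, and the custom key name `invoice_prefix` are hypothetical. The example also assumes that `keys` is passed as a nested dictionary and may be absent, which this guide doesn't confirm.

```python
import re


def clean_invoice_number(previous_line, context, keys=None):
    """Hypothetical example that uses context and keys."""
    # Assumed custom key; fall back to 'INV' if it isn't configured
    prefix = ((keys or {}).get("custom") or {}).get("invoice_prefix", "INV")

    value = (previous_line or "").strip()
    if not value:
        # Fall back to scanning the full document text for a match
        text = context["document_text"]
        match = re.search(rf"{re.escape(prefix)}[-\s]?(\d+)", text)
        if match is None:
            raise ValueError("No invoice number found in field or document")
        value = match.group(1)

    # Keep only the digits, then apply the standard prefix
    digits = re.sub(r"\D", "", value)
    if not digits:
        raise ValueError(f"No digits in invoice number: {value!r}")
    return f"{prefix}-{digits}"


doc = {"document_text": "Invoice INV 00482 issued 2024-06-01"}
print(clean_invoice_number("", doc))          # INV-00482
print(clean_invoice_number("inv#7731", doc))  # INV-7731
print(clean_invoice_number("9", doc, {"custom": {"invoice_prefix": "PO"}}))  # PO-9
```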

Cleaning functions can return any value. The value is converted to a string when it’s passed to subsequent refinement lines or validation rules. If the cleaning function encounters issues, it must raise an exception.
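
A brief sketch of both behaviors, assuming a dollar-amount field: the function returns a `Decimal`, which would be converted to a string downstream, and raises an exception when the input can't be parsed. The function name and the choice of `ValueError` are assumptions; this guide only requires that some exception is raised.

```python
from decimal import Decimal, InvalidOperation


def clean_dollar_amount(previous_line, context):
    """Hypothetical example: parse a dollar amount or fail loudly."""
    if previous_line is None or not str(previous_line).strip():
        raise ValueError("Amount is missing")

    # Drop currency symbols, thousands separators, and surrounding spaces
    stripped = str(previous_line).replace("$", "").replace(",", "").strip()
    try:
        amount = Decimal(stripped)
    except InvalidOperation as exc:
        raise ValueError(f"Not a valid amount: {previous_line!r}") from exc

    # Reject NaN and Infinity, which Decimal accepts as input
    if not amount.is_finite():
        raise ValueError(f"Not a finite amount: {previous_line!r}")

    # Non-string return value; str() of it gives a two-decimal amount
    return amount.quantize(Decimal("0.01"))


print(str(clean_dollar_amount("$1,250.5", {})))  # 1250.50
```

Raising instead of returning a placeholder keeps bad values from passing silently into later cleaning lines or validation rules.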

For additional guidance about custom functions, see Writing custom functions.