Cleaning results
Cleaning transforms extracted data into standardized formats for downstream systems.
Agent mode
Agent mode typically handles basic formatting directly in field prompts, reducing the need for separate cleaning steps. Use cleaning for complex transformations or when standardizing data for specific downstream requirements.
Legacy mode
Legacy mode often requires cleaning to achieve consistent formatting and structure.
Quick clean
Quick clean options include:
- Changing character casing to all uppercase, all lowercase, or sentence case.
- Removing characters by listing the characters to remove, with no comma separator between them.
- Reformatting dates by selecting from the available formatting options.
  For numeric-only dates like 06/01/2024, use the Input field as needed to specify whether the original value lists the month or the day first.
Quick clean options use Python functions to reformat data. There is no model processing and no unit cost.
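As a rough illustration of the kind of deterministic reformatting these options perform, the sketch below mimics the three quick clean behaviors in plain Python. The function names, the `mode` and `day_first` parameters, and the slash-separated input format are assumptions made for this example; they are not the product's actual implementation.

```python
from datetime import datetime


def change_case(value: str, mode: str) -> str:
    """Return value in upper, lower, or sentence case (hypothetical helper)."""
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "sentence":
        # Capitalize only the first character and lowercase the rest.
        return value[:1].upper() + value[1:].lower()
    raise ValueError(f"unknown casing mode: {mode}")


def remove_characters(value: str, chars: str) -> str:
    """Drop every character listed in chars; chars has no separators."""
    return value.translate(str.maketrans("", "", chars))


def reformat_date(value: str, output_format: str, day_first: bool = False) -> str:
    """Parse a numeric, slash-separated date and render it in output_format."""
    # day_first plays the role of the Input field: it resolves whether
    # 06/01/2024 means June 1 or January 6.
    input_format = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(value, input_format).strftime(output_format)


print(change_case("ACME CORP invoice", "sentence"))              # Acme corp invoice
print(remove_characters("$1,250.00", "$,"))                      # 1250.00
print(reformat_date("06/01/2024", "%Y-%m-%d"))                   # 2024-06-01
print(reformat_date("06/01/2024", "%Y-%m-%d", day_first=True))   # 2024-01-06
```

The last two calls show why the month-or-day-first setting matters: the same input string produces two different dates.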
🎓 Visual tutorial: Cleaning results with quick clean
Cleaning prompt
Commercial & Enterprise
Cleaning prompts refine field results using natural language instructions.
Follow these best practices to write effective cleaning prompts.
- Include before and after examples for varied input formats.
  Standardize address format to street, city, state ZIP. For example:
  123 Burnside St, Portland OR 97201 → 123 Burnside St, Portland, OR 97201
  456 Elm Avenue, Charlotte, North Carolina 28105 → 456 Elm Ave, Charlotte, NC 28105
- Handle missing or invalid data.
  If the field contains a valid dollar amount, return it in the format $X.XX. If no amount is found or the value is invalid, return “No amount specified.”
- Normalize reference fields as text.
  Convert each list item to the format: Item: [Item], Price: [Price], Qty: [Quantity] and separate each item with a semicolon.
Cleaning function
Commercial & Enterprise
For advanced cleaning requirements involving complex transformations, external data, or mission-critical results, write a custom cleaning function in Python.
… json.loads(previous_line) to parse it.
For example, you might use a cleaning function to standardize phone numbers from various formats:
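A minimal sketch of such a phone-number cleaning function follows. The name `previous_line` comes from the text above; the function name `clean_phone`, the `**kwargs` catch-all for any other parameters, and the choice of a 10-digit US target format are assumptions made for illustration only.

```python
import re


def clean_phone(previous_line: str, **kwargs) -> str:
    """Normalize a US phone number to (XXX) XXX-XXXX (hypothetical example)."""
    # Keep digits only, so "555.123.4567" and "(555) 123-4567" look the same.
    digits = re.sub(r"\D", "", previous_line)
    # Drop a leading US country code if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        # Signal the problem with an exception instead of returning bad data.
        raise ValueError(f"cannot normalize phone number: {previous_line!r}")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(clean_phone("555.123.4567"))     # (555) 123-4567
print(clean_phone("+1 555-123-4567"))  # (555) 123-4567
```

Stripping to digits first keeps the function independent of the punctuation used in the source document, so only the digit count has to be validated.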
Cleaning functions accept these parameters:
Cleaning functions can return any value. The value is converted to a string when it’s passed to subsequent refinement lines or validation rules. If the cleaning function encounters issues, it must raise an exception.
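The sketch below illustrates both rules under stated assumptions: the function returns a non-string value (a `Decimal`) and raises an exception when the input cannot be cleaned. The function name `clean_amount`, the `**kwargs` catch-all, and the use of `str()` to stand in for the platform's string conversion are hypothetical; the exact conversion the platform applies is not specified here.

```python
from decimal import Decimal, InvalidOperation


def clean_amount(previous_line: str, **kwargs) -> Decimal:
    """Parse a dollar amount and return it rounded to cents (hypothetical example)."""
    text = previous_line.strip().replace("$", "").replace(",", "")
    try:
        amount = Decimal(text)
    except InvalidOperation:
        # Unparseable input: raise instead of returning a placeholder value.
        raise ValueError(f"not a valid dollar amount: {previous_line!r}") from None
    if not amount.is_finite() or amount < 0:
        raise ValueError(f"not a valid dollar amount: {previous_line!r}")
    return amount.quantize(Decimal("0.01"))


result = clean_amount("$1,250.5")
# Assumption: downstream steps see something like str(result).
print(str(result))  # 1250.50
```

Returning a `Decimal` rather than a float keeps the trailing zero after string conversion, which matters if a later validation rule checks for a two-decimal format.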
For additional guidance about custom functions, see Writing custom functions.
