Cleaning results
If results for a given field aren’t formatted as needed, you can clean data using quick clean options. For organization members, additional cleaning options include a natural language prompt or a custom cleaning function.
Quick clean
Quick clean options include:
- Changing character casing to all uppercase, all lowercase, or sentence case.
- Removing characters by entering the characters to remove, with no comma separator between them.
- Reformatting dates by selecting from available formatting options.
  In numeric-only dates like 06/01/2024, use the Input field as needed to specify whether the original value lists the month or the day first.
Quick clean options use Python functions to reformat data. There is no model processing and no unit cost.
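The three quick clean operations listed above could be approximated in plain Python as follows. This is only an illustrative sketch, not the product's implementation: the function names, the `mode` values, the `day_first` flag, and the default output format are all assumptions made for the example.

```python
from datetime import datetime


def change_case(value: str, mode: str) -> str:
    """Change casing. `mode` is a hypothetical option name, not a product setting."""
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "sentence":
        # Simplified sentence case: uppercase the first character, lowercase the rest.
        return value[:1].upper() + value[1:].lower()
    raise ValueError(f"unknown mode: {mode}")


def remove_characters(value: str, chars: str) -> str:
    """Remove every character in `chars`, given as one string with no separators."""
    return value.translate(str.maketrans("", "", chars))


def reformat_date(value: str, day_first: bool, output_format: str = "%Y-%m-%d") -> str:
    """Reformat a numeric-only date; `day_first` resolves the month/day ambiguity."""
    input_format = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(value, input_format).strftime(output_format)


print(change_case("hELLO wORLD", "sentence"))         # Hello world
print(remove_characters("$1,234.50", "$,"))           # 1234.50
print(reformat_date("06/01/2024", day_first=False))   # 2024-06-01
print(reformat_date("06/01/2024", day_first=True))    # 2024-01-06
```

The last two calls show why the month-or-day-first setting matters: the same input string resolves to two different dates.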
Cleaning prompt
Commercial & Enterprise
If quick clean options don’t work for your data, you can instead use a natural language prompt to clean the output. Prompt-based refinement takes the raw output of your extraction or reasoning prompt and applies whatever instructions you specify in the clean prompt. Effective clean prompts are clear, concise, and detailed.
Cleaning function
Commercial & Enterprise
For advanced cleaning, you can write a custom cleaning function in Python. If the value passed in from the previous refinement line is JSON, use json.loads(previous_line) to parse it.
For example, you might use a cleaning function to standardize phone numbers from various formats.
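A minimal sketch of a phone number cleaning function is shown here. The function name `clean_phone`, the `(previous_line, **kwargs)` signature, the US-style 10-digit assumption, and the output format are all hypothetical; check the parameter list for your environment before adapting it.

```python
import re


def clean_phone(previous_line, **kwargs):
    # Assumed signature: previous_line holds the value being cleaned, as a string.
    # Keep digits only, discarding spaces, dots, dashes, parentheses, and "+".
    digits = re.sub(r"\D", "", str(previous_line))

    # Drop a leading US country code if one is present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]

    # Signal failure by raising rather than returning a partial value.
    if len(digits) != 10:
        raise ValueError(f"cannot normalize phone number: {previous_line!r}")

    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(clean_phone("555.123.4567"))        # (555) 123-4567
print(clean_phone("+1 (555) 123-4567"))   # (555) 123-4567
```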
Cleaning functions accept these parameters:
Cleaning functions can return any value. The value is converted to a string when it’s passed to subsequent refinement lines or validation rules. If the cleaning function encounters issues, it must raise an exception.
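The return and error behavior described above might look like the following sketch. It parses a JSON string with `json.loads(previous_line)`, returns a non-string value (a float), and raises an exception when the input is unusable. The function name, the `(previous_line, **kwargs)` signature, and the `amount` key are assumptions made for illustration.

```python
import json


def total_line_items(previous_line, **kwargs):
    # Assumed input: a JSON array of objects, each with an "amount" key.
    try:
        items = json.loads(previous_line)
    except json.JSONDecodeError as exc:
        # Raise instead of returning a placeholder so the failure is visible.
        raise ValueError(f"expected a JSON string, got: {previous_line!r}") from exc

    if not isinstance(items, list):
        raise ValueError("expected a JSON array of line items")

    # The float returned here would be converted to a string downstream.
    return sum(float(item["amount"]) for item in items)


print(total_line_items('[{"amount": "10.5"}, {"amount": 2}]'))  # 12.5
```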
For additional guidance about custom functions, see Writing custom functions.