Custom functions in flow
Flows let you create custom functions, also called user-defined functions (UDFs), to implement your own processing logic. A flow supports four main UDF types: map UDF, reduce UDF, pre-flow UDF, and post-flow UDF.
Map UDF
Use a map UDF to process each input file or record in a flow in parallel. Use map UDFs only when your processing logic operates on a single document at a time.
Input variables
Output variables
Each output file dictionary in the out_files array represents one output file generated by the custom function. The custom function must return this dictionary for the output files to be passed to the next step in the flow.
Example
The following map UDF renames the input file.
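A minimal sketch of such a map UDF follows. The record is modeled here as a plain dictionary with file_name and content keys, and the out_files field names are assumptions about the output schema; verify both against your platform's UDF reference.

```python
import os

def map_udf(input_record, step_folder, **kwargs):
    # input_record is assumed to be a dict carrying the file's name and
    # raw content; the real record type on your platform may differ.
    base, ext = os.path.splitext(input_record['file_name'])
    renamed = base + '_renamed' + ext
    # One dictionary per output file, collected in the out_files array.
    # The key names here ('filename', 'content') are assumptions.
    out_files = [{
        'filename': renamed,
        'content': input_record['content'],
    }]
    return {'out_files': out_files}
```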
Call this custom function in the Map UDF step by using the formula map_udf(INPUT_RECORD, STEP_FOLDER).
All input variables are also accessible via an object passed in as the _FN_CONTEXT_KEY keyword argument. See the code example below:
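The sketch below shows the same UDF reading its inputs through the context object instead of positional arguments. Only the _FN_CONTEXT_KEY keyword-argument convention comes from this document; the get_by_col_name accessor and the mock context class are assumptions for local experimentation.

```python
class MockFnContext:
    """Local stand-in for the context object a flow passes to UDFs."""

    def __init__(self, variables):
        self._variables = variables

    def get_by_col_name(self, name):
        # Assumed accessor returning a (value, error) pair.
        if name in self._variables:
            return self._variables[name], None
        return None, 'unknown variable: {}'.format(name)

def map_udf(*args, **kwargs):
    # All input variables are also reachable through the context object
    # passed as the _FN_CONTEXT_KEY keyword argument.
    fn_context = kwargs.get('_FN_CONTEXT_KEY')
    input_record, err = fn_context.get_by_col_name('INPUT_RECORD')
    if err:
        return {'out_files': []}
    return {'out_files': [{'filename': input_record['file_name'],
                           'content': input_record['content']}]}
```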
Reduce UDF
Use a reduce UDF to combine the output files from previous steps in a flow and, optionally, apply additional logic to modify their values.
Input variables
Example
Generate a summary of the files processed and write it to a file named summary.json.
Call this custom function in the Reduce UDF step by using the formula generate_summary(INPUT_RECORDS, ROOT_OUTPUT_FOLDER, STEP_FOLDER, CLIENTS).
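One possible sketch of generate_summary, matching the argument order in the formula above. The file_name key on each record, the summary fields, and the clients.ibfile.write_file helper are all assumptions for illustration; the stub client exists only so the sketch runs locally.

```python
import json
import os

class StubClients:
    """In-memory stand-in for CLIENTS, for local experimentation only."""

    class _IBFile:
        def __init__(self):
            self.written = {}

        def write_file(self, path, content):
            self.written[path] = content

    def __init__(self):
        self.ibfile = self._IBFile()

def generate_summary(input_records, root_output_folder, step_folder,
                     clients, **kwargs):
    # Summarize the records produced by previous steps. The 'file_name'
    # key on each record is an assumption about the record shape.
    summary = {
        'num_records': len(input_records),
        'files': [r.get('file_name') for r in input_records],
    }
    out_path = os.path.join(step_folder, 'summary.json')
    # clients.ibfile.write_file(path, content) is assumed; check the
    # platform reference for the exact file-system client API.
    clients.ibfile.write_file(out_path, json.dumps(summary, indent=2))
    return summary
```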
Pre-flow UDF
A pre-flow UDF is a hook that runs at the start of the flow before any of the steps have started execution. Pre-flow UDFs can be used to perform any necessary setup such as copying files into the input folder from another directory. You can add a pre-flow UDF in the flow editor by selecting Events > Pre-flow UDF.
Input variables
Example
The following custom function writes a summary file containing both the job ID and flow start timestamp.
Call this custom function in the pre-flow UDF hook by using the formula flow_info(CLIENTS, ROOT_OUTPUT_FOLDER, JOB_ID).
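A sketch of flow_info under the same assumptions as above: the flow_info.json file name, the record fields, and the clients.ibfile.write_file helper are illustrative choices, and the stub client stands in for the real CLIENTS object so the sketch runs locally.

```python
import datetime
import json
import os

class StubClients:
    """In-memory stand-in for CLIENTS, for local experimentation only."""

    class _IBFile:
        def __init__(self):
            self.written = {}

        def write_file(self, path, content):
            self.written[path] = content

    def __init__(self):
        self.ibfile = self._IBFile()

def flow_info(clients, root_output_folder, job_id, **kwargs):
    # Capture the job ID and a start timestamp before any step runs.
    info = {
        'job_id': job_id,
        'flow_start': datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    out_path = os.path.join(root_output_folder, 'flow_info.json')
    # The output file name and write_file helper are assumptions;
    # verify against the platform reference.
    clients.ibfile.write_file(out_path, json.dumps(info))
    return info
```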
Post-flow UDF
A post-flow UDF is a hook that runs after the flow completes execution. Post-flow UDFs can be used to perform any post-processing tasks such as sending results to a downstream system or doing folder cleanup. You can add a post-flow UDF in the flow editor by selecting Events > Post-flow UDF.
Input variables
Example
In a post-flow UDF, you can consume flow results by reading the batch.ibflowresults file or by using the API, then send them to a downstream system. The following example is a starting point for implementing this integration.
Call this custom function in the post-flow UDF hook by using the formula send_results().
The example implements two functions: send_results and write_summary. A post-flow UDF is called every time a flow stops, which means it runs both when a flow completes and when a checkpoint fails and the flow is stopped. The write_summary function reads the flow results file and checks the can_resume flag to determine whether the flow has completed. If it has, the function reads the IBOCR records, constructs a summary containing the extracted results, and returns it. Complete the send_results function to send the returned summary to the downstream system.
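A sketch of write_summary under stated assumptions: the JSON shape of the results file (the can_resume flag and a records array) and the clients.ibfile.read_file accessor are illustrative, not the documented schema, and the stub client exists only so the sketch runs locally. send_results is left as the integration point to complete.

```python
import json

class StubClients:
    """Read-only stand-in for CLIENTS, for local experimentation only."""

    class _IBFile:
        def __init__(self, files):
            self._files = files

        def read_file(self, path):
            # Assumed accessor returning a (content, error) pair.
            if path in self._files:
                return self._files[path], None
            return None, 'not found: {}'.format(path)

    def __init__(self, files):
        self.ibfile = self._IBFile(files)

def write_summary(results_path, clients):
    # Read the flow results file. The JSON shape assumed here
    # ('can_resume', 'records') is illustrative; consult the flow
    # results reference for the real schema.
    content, err = clients.ibfile.read_file(results_path)
    if err:
        return None
    results = json.loads(content)
    if results.get('can_resume'):
        # The flow stopped at a failed checkpoint and can be resumed,
        # so there are no final results to summarize yet.
        return None
    records = results.get('records', [])
    return {'num_records': len(records), 'results': records}

def send_results(*args, **kwargs):
    # TODO: call write_summary on the batch.ibflowresults path and
    # deliver the returned summary to your downstream system.
    pass
```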
