Cleanup
You now know how to use the AI Hub SDK to automate document processing workflows. The automate use case generates digital entities in AI Hub: input documents, batches, result files, and others. After your app run completes, all those entities remain in the AI Hub filesystem.
This use case shows you how to extend the programs you’ve already written so they clean up after themselves by deleting the files and data they generate. This strategy is helpful for complying with data retention policies and for general digital hygiene.
Add two cleanup steps to the program you wrote for the automate use case.
- Delete the batch of input documents.
- Delete the app run.
Cleaning up after the automate use case
To declutter the AI Hub filesystem after running an app, either delete individual files or delete the entire app run. In this section of the tutorial, start with the first approach by deleting just the batch of input documents. Then use the second approach by deleting whatever else was produced by the app run.
Deleting the batch
Think back to the program you wrote to run the Meal Receipt app. That program uploaded a batch of files with documents for the app to process. Unless you want to process that same batch with a different app later, delete the batch (including all the files within it) after Meal Receipt sends back results.
Add this code to the bottom of your existing automate_with_sdk.py file.
Deleting a batch is an asynchronous operation: call client.batches.delete() to ask for a batch to be deleted, then check the status of the delete job by calling client.jobs.status().
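The request-then-poll pattern described above can be sketched as follows. This is a minimal illustration, not the tutorial's original listing. It assumes that client.batches.delete() accepts the batch ID as its only argument, that the status response exposes a state attribute, and that "COMPLETE" is the terminal success value. The helper names wait_for_job and delete_batch are hypothetical, and a stub object stands in for the real client so the sketch runs without credentials.

```python
import time
from types import SimpleNamespace


def wait_for_job(client, job_id, poll_seconds=5.0, timeout_seconds=300.0):
    """Poll client.jobs.status() until the job reports COMPLETE.

    Assumption: the status response has a `state` attribute and the
    terminal success value is the string "COMPLETE".
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = client.jobs.status(job_id).state
        if state == "COMPLETE":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} is still in state {state!r}")
        time.sleep(poll_seconds)


def delete_batch(client, batch_id, poll_seconds=5.0):
    """Ask for a batch to be deleted, then wait for the delete job."""
    response = client.batches.delete(batch_id)
    # The delete response stores its job ID in `job_id`, not `id`.
    return wait_for_job(client, response.job_id, poll_seconds=poll_seconds)


# Hypothetical stub standing in for a real AI Hub client.
_states = iter(["PENDING", "RUNNING", "COMPLETE"])
fake_client = SimpleNamespace(
    batches=SimpleNamespace(
        delete=lambda batch_id: SimpleNamespace(job_id="job-1")
    ),
    jobs=SimpleNamespace(
        status=lambda job_id: SimpleNamespace(state=next(_states))
    ),
)

result = delete_batch(fake_client, batch_id=42, poll_seconds=0.01)
print(f"Batch delete job finished with state: {result}")
```

In your own program, pass the real client and the ID of the batch you uploaded in place of the stub. The timeout keeps the loop from running forever if a job never reaches the assumed success state.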
You might be surprised that the ID for the delete batch job is stored in a field called job_id, while other IDs you’ve dealt with are stored in fields called id. This brings up two important points.
- Like most SDKs, the AI Hub SDK has some naming inconsistencies.
- To learn what fields are available on an SDK method’s response, refer to the documentation for the appropriate method.
Run the modified automate_with_sdk.py to see new output showing that the batch is deleted as soon as Meal Receipt is done processing its documents.
Deleting the app run
After retrieving the Meal Receipt results, clean up the app run data with client.apps.runs.delete(). This removes the output, logs, and database records.
Add this code below the delete batch code.
This operation is similar to the batch deletion you used, except it returns data with a delete_output_dir_job_id field. The ID in this field lets you check the progress of the delete job.
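A sketch of this step, under stated assumptions, might look like the code below. It assumes client.apps.runs.delete() takes the run ID as its only argument and that the job status response exposes a state attribute that becomes "COMPLETE" on success. The function name delete_app_run and the stub client are hypothetical and exist only so the example runs on its own.

```python
import time
from types import SimpleNamespace


def delete_app_run(client, run_id, poll_seconds=5.0, max_polls=60):
    """Delete an app run and wait for its output directory to be removed.

    Assumptions: the delete call takes the run ID positionally, and the
    job status response has a `state` attribute that reaches "COMPLETE".
    """
    response = client.apps.runs.delete(run_id)
    # This response names its job ID `delete_output_dir_job_id`.
    job_id = response.delete_output_dir_job_id
    for _ in range(max_polls):
        if client.jobs.status(job_id).state == "COMPLETE":
            return job_id
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not complete after {max_polls} polls")


# Hypothetical stub standing in for a real AI Hub client.
_states = iter(["RUNNING", "COMPLETE"])
fake_client = SimpleNamespace(
    apps=SimpleNamespace(
        runs=SimpleNamespace(
            delete=lambda run_id: SimpleNamespace(
                delete_output_dir_job_id="job-out-1"
            )
        )
    ),
    jobs=SimpleNamespace(
        status=lambda job_id: SimpleNamespace(state=next(_states))
    ),
)

finished_job_id = delete_app_run(fake_client, run_id="run-7", poll_seconds=0.01)
print(f"Output directory delete job {finished_job_id} is complete")
```

Capping the number of polls is a design choice for the sketch: it turns a stuck or failed job into a visible error instead of an endless loop.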
You know that the client.apps.runs.delete() method deletes an app run’s output directory, but it deletes three other kinds of entities as well. Look at the method’s documentation and see if you can figure out what else it deletes. Hint: remember the code you added to delete the batch? Well, it wasn’t strictly necessary. (But it was a good learning exercise!)
It turns out the method also deletes the batch (which the documentation refers to as the app run’s input directory), any log files, and associated DB data.
To keep this sample code short, check the status of only the job that deletes the output directory. If you want to be more thorough, include separate loops to check the status of jobs that delete the batch and logs as well.
delete_db_data_job_id field on the response from client.apps.runs.delete().
Confirming the automate cleanup
If you’ve been adding code to automate_with_sdk.py as it’s presented, your program deletes the batch and then deletes the entire app run. Confirm that the second step works by comparing the number of app runs that exist before and after the cleanup, using an SDK method called client.apps.runs.list().
Paste the snippet below into automate_with_sdk.py in two separate places.
- Immediately before you delete the batch (line 82 in the complete program below).
- Immediately after you delete the app run (line 106 in the complete program below).
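One way to count app runs before and after cleanup is sketched here. The shape of the response from client.apps.runs.list() is an assumption: this sketch supposes the method can be called without arguments and returns an object with a runs list, so check the method's documentation for the actual parameters and fields. The helper count_app_runs and the stub client are hypothetical.

```python
from types import SimpleNamespace


def count_app_runs(client):
    """Return how many app runs the client currently reports.

    Assumption: client.apps.runs.list() returns an object with a
    `runs` attribute holding one entry per app run.
    """
    response = client.apps.runs.list()
    return len(response.runs)


# Hypothetical stub: a mutable list plays the role of AI Hub's run records.
_stored_runs = [SimpleNamespace(id="run-1"), SimpleNamespace(id="run-2")]
fake_client = SimpleNamespace(
    apps=SimpleNamespace(
        runs=SimpleNamespace(
            list=lambda: SimpleNamespace(runs=list(_stored_runs))
        )
    )
)

before = count_app_runs(fake_client)
_stored_runs.pop()  # stands in for the real cleanup steps
after = count_app_runs(fake_client)
print(f"App runs before cleanup: {before}, after cleanup: {after}")
```

If the real method returns results in pages, a single call may not report every run, so a count taken this way is only a rough check.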
Run automate_with_sdk.py. If the output shows that cleanup reduces the app run count by one, your cleanup logic works!
Here’s the complete automate_with_sdk.py program, with all cleanup steps added.
Cleanup conclusion
You’ve covered all the cleanup that’s necessary for this use case. You might be surprised at how much longer the automate_with_sdk.py program is after cleanup logic is added. As with exception handling, cleaning up after yourself is an important (if tedious) task for responsible programmers. Remember to leave plenty of time to add similar logic to your own SDK-enabled programs.
By adding new features to programs written earlier, you’ve experienced the common task of returning to code that you thought was complete but that now needs to be maintained. This task is easier when you’ve added thorough comments—such as you see in the complete examples here—to provide guideposts. It’s amazing how quickly uncommented code turns cryptic when you step away from it for a while, even when you were the original author.
The last page of this tutorial has a recap of what you’ve learned and guidance on where to go next on your AI Hub SDK journey.
