Cleanup
You now know how to use the AI Hub SDK to automate document processing workflows and analyze documents by having conversations with them. Those use cases generate digital entities in AI Hub: input documents, batches, conversations, result files, and others. After your app run completes or your conversation finishes, all those entities hang around the AI Hub filesystem.
This use case shows you how to extend the programs you’ve already written so they clean up after themselves by deleting the files and data they generate. This strategy is helpful for complying with data retention policies and for general digital hygiene.
Add two cleanup steps to the program you wrote for the automate use case.
- Delete the batch of input documents.
- Delete the app run.
Add two more cleanup steps to the program you wrote for the analyze use case.
- Delete a single document from the conversation.
- Delete the entire conversation, including all remaining documents.
Cleaning up after the automate use case
To declutter the AI Hub filesystem after running an app, either delete individual files or delete the entire app run. In this section of the tutorial, start with the first approach by deleting just the batch of input documents. Then use the second approach by deleting whatever else was produced by the app run.
Deleting the batch
Think back to the program you wrote to run the Meal Receipt app. That program uploaded a batch of files with documents for the app to process. Unless you want to process that same batch with a different app later, delete the batch (including all the files within it) after Meal Receipt sends back results.
Add this code to the bottom of your existing automate_with_sdk.py file.
Deleting a batch is an asynchronous operation: call client.batches.delete() to ask for a batch to be deleted, then check the status of the delete job by calling client.jobs.status().
You might be surprised that the ID for the delete batch job is stored in a field called job_id, while other IDs you’ve dealt with are stored in fields called id. This brings up two important points.
- Like most SDKs, the AI Hub SDK has some naming inconsistencies.
- To learn what fields are available on an SDK method’s response, refer to the documentation for the appropriate method.
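The request-then-poll pattern described above can be sketched as follows. This is a minimal illustration, not the tutorial's own snippet: FakeClient is a hypothetical offline stand-in for the AI Hub client, and the state attribute and its "RUNNING" and "COMPLETE" values are assumptions about what client.jobs.status() reports. Only the job_id field comes from the text above.

```python
import time
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in for the AI Hub client so this sketch runs offline."""

    def __init__(self):
        self._polls = 0
        self.batches = SimpleNamespace(delete=self._delete_batch)
        self.jobs = SimpleNamespace(status=self._job_status)

    def _delete_batch(self, batch_id):
        # The tutorial says the delete response carries its ID in `job_id`.
        return SimpleNamespace(job_id=f"delete-batch-{batch_id}")

    def _job_status(self, job_id):
        # Assumed shape: a `state` attribute that eventually reads "COMPLETE".
        self._polls += 1
        return SimpleNamespace(state="COMPLETE" if self._polls >= 3 else "RUNNING")


def delete_batch_and_wait(client, batch_id, poll_seconds=0.01, max_polls=50):
    """Request batch deletion, then poll the delete job until it finishes."""
    delete_resp = client.batches.delete(batch_id)
    for _ in range(max_polls):
        status = client.jobs.status(delete_resp.job_id)
        if status.state == "COMPLETE":
            return delete_resp.job_id
        time.sleep(poll_seconds)
    raise TimeoutError(f"Delete job {delete_resp.job_id} did not finish")


client = FakeClient()
finished_job_id = delete_batch_and_wait(client, batch_id=42)
print(f"Batch deleted by job {finished_job_id}")
```

Capping the number of polls keeps the program from hanging forever if a delete job never completes. With a real client, check the method documentation for the actual status values before relying on the comparison used here.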
Run the modified automate_with_sdk.py to see new output showing that the batch is deleted as soon as Meal Receipt is done processing its documents.
Deleting the app run
After retrieving the Meal Receipt results, clean up the app run data with client.apps.runs.delete(). This removes the output, logs, and database records.
Add this code below the delete batch code.
This operation is similar to the batch deletion you used, except it returns data with a delete_output_dir_job_id field. The ID in this field lets you check the progress of the delete job.
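As a rough sketch of that flow, the code below deletes an app run and waits on the job named in delete_output_dir_job_id. The client here is a hypothetical stand-in built by make_fake_client() so the example runs without AI Hub access, and the state attribute with its "RUNNING" and "COMPLETE" values is an assumption rather than documented behavior.

```python
import time
from types import SimpleNamespace


def make_fake_client():
    """Build a hypothetical offline stand-in for the AI Hub client."""
    remaining_polls = {"out-job-7": 2}  # polls before the job reports completion

    def delete_run(run_id):
        # The tutorial names this response field; the value here is made up.
        return SimpleNamespace(delete_output_dir_job_id=f"out-job-{run_id}")

    def job_status(job_id):
        # Assumed shape: a `state` attribute that ends up as "COMPLETE".
        if remaining_polls[job_id] > 0:
            remaining_polls[job_id] -= 1
            return SimpleNamespace(state="RUNNING")
        return SimpleNamespace(state="COMPLETE")

    return SimpleNamespace(
        apps=SimpleNamespace(runs=SimpleNamespace(delete=delete_run)),
        jobs=SimpleNamespace(status=job_status),
    )


def delete_run_and_wait(client, run_id, poll_seconds=0.01, max_polls=50):
    """Delete an app run, then wait for its output directory to be removed."""
    delete_resp = client.apps.runs.delete(run_id)
    job_id = delete_resp.delete_output_dir_job_id
    polls = 0
    while client.jobs.status(job_id).state != "COMPLETE":
        polls += 1
        if polls >= max_polls:
            raise TimeoutError(f"Job {job_id} did not finish")
        time.sleep(poll_seconds)
    return job_id, polls


client = make_fake_client()
job_id, polls = delete_run_and_wait(client, run_id=7)
print(f"App run output deleted by job {job_id} after {polls} extra polls")
```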
You know that the client.apps.runs.delete() method deletes an app run’s output directory, but it deletes three other kinds of entities as well. Look at the method’s documentation and see if you can figure out what else it deletes. Hint: remember the code you added to delete the batch? Well, it wasn’t strictly necessary. (But it was a good learning exercise!)
It turns out the method also deletes the batch (which the documentation refers to as the app run’s input directory), any log files, and associated DB data.
To keep this sample code short, check the status of only the job that deletes the output directory. If you want to be more thorough, include separate loops to check the status of jobs that delete the batch and logs as well.
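If you do want the more thorough check, one option is to walk over several job ID fields instead of writing a separate loop for each. The sketch below assumes the response also has delete_input_dir_job_id and delete_log_dir_job_id fields; those two names are guesses patterned on delete_output_dir_job_id, so confirm them in the method documentation. The job_state() helper is a hypothetical stand-in for reading the state from client.jobs.status().

```python
from types import SimpleNamespace

# Response shaped like the tutorial describes. Only delete_output_dir_job_id
# is named in the text; the other two field names are assumptions.
delete_run_resp = SimpleNamespace(
    delete_input_dir_job_id="job-in",
    delete_output_dir_job_id="job-out",
    delete_log_dir_job_id="job-log",
)

# Hypothetical job states that a real status call might report.
fake_states = {"job-in": "COMPLETE", "job-out": "COMPLETE", "job-log": "RUNNING"}


def job_state(job_id):
    """Stand-in for client.jobs.status(job_id).state (assumed attribute)."""
    return fake_states[job_id]


def unfinished_jobs(resp, field_names):
    """Return the response fields whose delete jobs are not complete yet."""
    pending = []
    for field in field_names:
        job_id = getattr(resp, field, None)
        if job_id is not None and job_state(job_id) != "COMPLETE":
            pending.append(field)
    return pending


fields = [
    "delete_input_dir_job_id",
    "delete_output_dir_job_id",
    "delete_log_dir_job_id",
]
still_running = unfinished_jobs(delete_run_resp, fields)
print(still_running)
```

Calling unfinished_jobs() repeatedly until it returns an empty list gives you a single loop that covers every delete job you care about.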
delete_db_data_job_id field on the response from client.apps.runs.delete().
Confirming the automate cleanup
If you’ve been adding code to automate_with_sdk.py as it’s presented, your program deletes the batch and then deletes the entire app run. Confirm that the second step works by comparing the number of app runs that exist before and after the cleanup, using an SDK method called client.apps.runs.list().
Paste the snippet below into automate_with_sdk.py in two separate places.
- Immediately before you delete the batch (line 82 in the complete program below).
- Immediately after you delete the app run (line 106 in the complete program below). The app run count drops only once the app run itself is deleted, so the second count belongs after that step.
Run automate_with_sdk.py. If the output shows that cleanup reduces the app run count by one, your cleanup logic works!
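A before-and-after count along these lines might look like the sketch below. FakeRuns is a hypothetical in-memory stand-in for client.apps.runs, and the assumption that client.apps.runs.list() returns an object with a runs list is not confirmed by this tutorial, so check the method documentation for the real response shape.

```python
from types import SimpleNamespace


class FakeRuns:
    """Hypothetical stand-in for client.apps.runs, backed by an in-memory list."""

    def __init__(self, run_ids):
        self._run_ids = [*run_ids]

    def list(self):
        # Assumed shape: the response exposes the runs as a `runs` list.
        return SimpleNamespace(runs=[SimpleNamespace(id=r) for r in self._run_ids])

    def delete(self, run_id):
        self._run_ids.remove(run_id)


def count_app_runs(client):
    """Return how many app runs currently exist."""
    return len(client.apps.runs.list().runs)


client = SimpleNamespace(apps=SimpleNamespace(runs=FakeRuns(["run-1", "run-2"])))

runs_before = count_app_runs(client)
client.apps.runs.delete("run-2")  # the cleanup step being verified
runs_after = count_app_runs(client)

print(f"App runs before cleanup: {runs_before}, after cleanup: {runs_after}")
```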
Here’s the complete automate_with_sdk.py program, with all cleanup steps added.
You’ve seen how to delete unneeded artifacts after using AI Hub’s automate feature. Now learn how to clean up after using its analyze feature.
Cleaning up after the analyze use case
When you’re done having a conversation, it’s a best practice to delete the conversation’s documents and support files from AI Hub.
Start by deleting a single document from a conversation while leaving the conversation in place. Then perform more thorough cleaning by deleting an entire conversation, including all documents that were uploaded to it.
Deleting one document from the conversation
To delete just the A330_specs.pdf document from your conversation, add this code to the end of analyze_with_sdk.py.
Line 2 calls the SDK method that deletes individual documents from a conversation.
Line 3 passes in the ID of the conversation by reaching way back to the start of the program and referring to the create_conversation_resp variable. This holds the response to your call to client.conversations.create(), and it includes the conversation’s ID.
Line 4 passes in document IDs to delete. Even though you’re only asking it to delete a single document, that document ID still needs to be wrapped in a Python list. The first_document_id variable was defined much earlier in your program.
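The call described above can be sketched like this. The layout is illustrative and its line numbers are not meant to match the ones cited in the walkthrough. FakeConversations is a hypothetical stand-in for client.conversations, the document IDs are made up, and reading the conversation ID from an id attribute is an assumption based on the naming pattern mentioned earlier in this tutorial.

```python
from types import SimpleNamespace


class FakeConversations:
    """Hypothetical stand-in for client.conversations, held in memory."""

    def __init__(self):
        self.documents = {"conv-1": [101, 102]}

    def delete_documents(self, conversation_id, ids):
        # Synchronous: the documents are gone by the time this returns None.
        remaining = [d for d in self.documents[conversation_id] if d not in ids]
        self.documents[conversation_id] = remaining


client = SimpleNamespace(conversations=FakeConversations())
create_conversation_resp = SimpleNamespace(id="conv-1")  # from conversations.create()
first_document_id = 101  # ID of A330_specs.pdf, captured earlier in the program

# Delete one document; the ID is wrapped in a list even though there is only one.
client.conversations.delete_documents(
    create_conversation_resp.id,
    [first_document_id],
)
print(client.conversations.documents["conv-1"])
```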
There are two aspects of this code that might strike you as curious.
- You don’t store the response of client.conversations.delete_documents() in a variable, like you have with other SDK methods.
- There’s no loop to check the status of the delete request.
Both of these quirks make sense when you understand that client.conversations.delete_documents() is a synchronous operation. A synchronous method finishes its work before it returns, so this one has no job to track: there’s no response worth storing in a variable and no ID to use with status checks. If the method returns without throwing any errors, the document has been successfully deleted from the conversation.
To figure out if an SDK method is synchronous or asynchronous, look at the documentation for the method to see if it returns a job ID that you can pass to a status check. If it doesn’t, the operation is synchronous.
Deleting the entire conversation
If you’re finished with a conversation, you can delete the entire conversation with a single operation that also deletes any documents you’ve uploaded to it.
Add this code to analyze_with_sdk.py just below the last code you added. This deletes all traces of the conversation, including the one remaining document that you uploaded to it.
Because the method doesn’t return a value, there’s no way to check its status. Therefore, you know it’s synchronous.
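Here is a small sketch of what that looks like in practice, using a hypothetical in-memory FakeConversations class in place of the real client. Passing the conversation ID as the only argument and reading it from an id attribute are both assumptions, so confirm the signature in the method documentation.

```python
from types import SimpleNamespace


class FakeConversations:
    """Hypothetical stand-in for client.conversations, held in memory."""

    def __init__(self):
        # One conversation with one document left in it.
        self.store = {"conv-1": [102]}

    def delete(self, conversation_id):
        # Removes the conversation and its documents; returns nothing.
        del self.store[conversation_id]


client = SimpleNamespace(conversations=FakeConversations())
create_conversation_resp = SimpleNamespace(id="conv-1")  # from conversations.create()

result = client.conversations.delete(create_conversation_resp.id)
print(result)  # no response object, so there is no job to poll
print("conv-1" in client.conversations.store)
```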
Confirming the analyze cleanup
Your program now deletes one document from the conversation and then deletes the entire conversation (including its one remaining document). How can you confirm that it worked?
This code prints the number of conversations that you have created and haven’t yet deleted. Paste the code into analyze_with_sdk.py in two separate places.
- Immediately before you delete the conversation (line 96 in the complete program below).
- Immediately after you delete the conversation (line 104 in the complete program below).
Now run analyze_with_sdk.py. The output confirms that there is one more conversation before cleanup than there is after.
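The counting check can be sketched as shown below. The client built by make_fake_client() is a hypothetical offline stand-in, and the assumption that client.conversations.list() returns an object with a conversations list is not confirmed here, so verify the response shape in the method documentation before adapting this.

```python
from types import SimpleNamespace


def make_fake_client(conversation_ids):
    """Build a hypothetical offline client holding the given conversations."""
    store = set(conversation_ids)

    def list_conversations():
        # Assumed shape: the response exposes a `conversations` list.
        return SimpleNamespace(conversations=sorted(store))

    def delete_conversation(conversation_id):
        store.discard(conversation_id)

    conversations = SimpleNamespace(list=list_conversations, delete=delete_conversation)
    return SimpleNamespace(conversations=conversations)


def count_conversations(client):
    """Return how many conversations currently exist."""
    return len(client.conversations.list().conversations)


client = make_fake_client(["conv-1", "conv-2", "conv-3"])
before = count_conversations(client)
client.conversations.delete("conv-3")  # the cleanup step being verified
after = count_conversations(client)
print(f"Conversations before cleanup: {before}, after cleanup: {after}")
```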
Here’s the complete analyze_with_sdk.py program, with all cleanup steps added.
Cleanup conclusion
You’ve covered all the cleanup that’s necessary for these two use cases. You might be surprised at how much longer the automate_with_sdk.py and analyze_with_sdk.py programs are after cleanup logic is added. As with exception handling, cleaning up after yourself is an important (if tedious) task for responsible programmers. Remember to leave plenty of time to add similar logic to your own SDK-enabled programs.
By adding new features to programs written earlier, you’ve experienced the common task of returning to code that you thought was complete but that now needs to be maintained. This task is easier when you’ve added thorough comments—such as you see in the complete examples here—to provide guideposts. It’s amazing how quickly uncommented code turns cryptic when you step away from it for a while, even when you were the original author.
The last page of this tutorial has a recap of what you’ve learned and guidance on where to go next on your AI Hub SDK journey.