Cleanup | Instabase AI Hub Documentation

You now know how to use the AI Hub SDK to automate document processing workflows and analyze documents by having conversations with them. Those use cases generate digital entities in AI Hub: input documents, batches, conversations, result files, and others. After your app run completes or your conversation finishes, all those entities hang around the AI Hub filesystem.

This use case shows you how to extend the programs you’ve already written so they clean up after themselves by deleting the files and data they generate. This strategy is helpful for complying with data retention policies and for general digital hygiene.

Add two cleanup steps to the program you wrote for the automate use case.

Delete the batch of input documents.
Delete the app run.

Add two more cleanup steps to the program you wrote for the analyze use case.

Delete a single document from the conversation.
Delete the entire conversation, including all remaining documents.

Cleaning up after the automate use case

To declutter the AI Hub filesystem after running an app, either delete individual files or delete the entire app run. In this section of the tutorial, start with the first approach by deleting just the batch of input documents. Then use the second approach by deleting whatever else was produced by the app run.

Deleting the batch

Think back to the program you wrote to run the Meal Receipt app. That program uploaded a batch of files with documents for the app to process. Unless you want to process that same batch with a different app later, delete the batch (including all the files within it) after Meal Receipt sends back results.

Add this code to the bottom of your existing automate_with_sdk.py file.

print(f"deleting batch with ID {batch_id}")

# submit an asynchronous request to delete the whole batch
delete_batch_resp = client.batches.delete(batch_id)

# repeatedly check status of delete batch job
while True:
    # pause to let the delete batch job make progress
    time.sleep(3)
    # check status of the delete batch job
    status_resp = client.jobs.status(delete_batch_resp.job_id)
    print(f"delete batch status: {status_resp.state}")
    # break out of the loop if the delete batch job is no longer running
    if status_resp.state not in ["RUNNING", "PENDING"]:
        break

Deleting a batch is an asynchronous operation: call client.batches.delete() to ask for a batch to be deleted, then check the status of the delete job by calling client.jobs.status().
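The polling loop keeps running until the job leaves the RUNNING or PENDING state, however long that takes. As a minimal sketch, the same pattern can be wrapped in a reusable helper that gives up after a timeout. The helper name wait_for_job, its parameters, the FakeJobs class, and the "COMPLETE" state value are illustrative assumptions, not part of the AI Hub SDK. With the real SDK you would presumably pass client.jobs.status as the status function.

```python
import time
from types import SimpleNamespace


def wait_for_job(get_status, job_id, poll_seconds=3.0, timeout_seconds=300.0):
    """Poll get_status(job_id) until the job leaves RUNNING/PENDING.

    get_status is any callable that returns an object with a .state
    attribute (in the tutorial, client.jobs.status plays this role).
    Returns the final state, or raises TimeoutError if the job is
    still in progress once timeout_seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_status(job_id).state
        if state not in ("RUNNING", "PENDING"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)


class FakeJobs:
    """Hypothetical stand-in for client.jobs so the sketch runs offline."""

    def __init__(self, states):
        self._states = iter(states)

    def status(self, job_id):
        # "COMPLETE" is a placeholder final state, not a documented value
        return SimpleNamespace(state=next(self._states))


if __name__ == "__main__":
    jobs = FakeJobs(["PENDING", "RUNNING", "COMPLETE"])
    final_state = wait_for_job(jobs.status, "job-123", poll_seconds=0.01)
    print(f"delete batch status: {final_state}")
```

A timeout keeps a stuck delete job from hanging the program forever. Choose a limit that suits the size of your batches.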

You might be surprised that the ID for the delete batch job is stored in a field called job_id, while other IDs you’ve dealt with are stored in fields called id. This brings up two important points.

Like most SDKs, the AI Hub SDK has some naming inconsistencies.

To learn what fields are available on an SDK method’s response, refer to the documentation for the appropriate method.

Run the modified automate_with_sdk.py to see new output showing that the batch is deleted as soon as Meal Receipt is done processing its documents.

Deleting the app run

After retrieving the Meal Receipt results, clean up the app run data with client.apps.runs.delete(). This removes the output, logs, and database records.

Add this code below the delete batch code.

print(f"deleting app run with ID {run_resp.id}")
delete_run_resp = client.apps.runs.delete(run_resp.id)

# repeatedly check status of the job that deletes the output directory
while True:
    time.sleep(3)
    status_resp = client.jobs.status(delete_run_resp.delete_output_dir_job_id)
    print(f"delete output dir status: {status_resp.state}")
    if status_resp.state not in ["RUNNING", "PENDING"]:
        break

This operation is similar to the batch deletion you used, except it returns data with a delete_output_dir_job_id field. The ID in this field lets you check the progress of the delete job.

You know that the client.apps.runs.delete() method deletes an app run’s output directory, but it deletes three other kinds of entities as well. Look at the method’s documentation and see if you can figure out what else it deletes. Hint: remember the code you added to delete the batch? Well, it wasn’t strictly necessary. (But it was a good learning exercise!)

It turns out the method also deletes the batch (which the documentation refers to as the app run’s input directory), any log files, and associated DB data.

To keep it short, this sample code checks the status of only the job that deletes the output directory. To be more thorough, add separate loops that check the status of the jobs that delete the batch and the logs as well.
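As a rough sketch of the more thorough check, several delete jobs can be tracked in a single loop rather than in separate ones. Only delete_output_dir_job_id appears in this tutorial. The helper name wait_for_jobs, the fake status function, the placeholder job IDs, and the "COMPLETE" state are assumptions made for illustration, so look up the real field names in the documentation for client.apps.runs.delete().

```python
import time
from types import SimpleNamespace


def wait_for_jobs(get_status, job_ids, poll_seconds=3.0):
    """Poll several jobs in one loop until none is RUNNING or PENDING.

    job_ids maps a human-readable label to a job ID.
    Returns a dict mapping each label to its final state.
    """
    final_states = {}
    pending = dict(job_ids)
    while pending:
        for label, job_id in list(pending.items()):
            state = get_status(job_id).state
            print(f"{label} status: {state}")
            if state not in ("RUNNING", "PENDING"):
                final_states[label] = state
                del pending[label]
        if pending:
            # pause only if at least one job is still in progress
            time.sleep(poll_seconds)
    return final_states


def make_fake_status(checks_needed):
    """Hypothetical stand-in for client.jobs.status: each job reports
    RUNNING until it has been checked the given number of times."""
    seen = {}

    def status(job_id):
        seen[job_id] = seen.get(job_id, 0) + 1
        done = seen[job_id] >= checks_needed[job_id]
        return SimpleNamespace(state="COMPLETE" if done else "RUNNING")

    return status


if __name__ == "__main__":
    fake_status = make_fake_status({"job-out": 1, "job-in": 2, "job-log": 3})
    # with the real SDK, the IDs would come from fields on the response
    # to client.apps.runs.delete(); these strings are placeholders
    results = wait_for_jobs(
        fake_status,
        {"output dir": "job-out", "input dir": "job-in", "logs": "job-log"},
        poll_seconds=0.01)
    print(results)
```

Polling all the jobs in one loop means the total wait is roughly as long as the slowest job, instead of the sum of the individual waits.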

Unlike the other types of entities, DB data is deleted synchronously, so there’s no need to check its deletion status. That’s why there’s no delete_db_data_job_id field on the response from client.apps.runs.delete().

Running an app through a deployment instead of running it directly lets you configure the deployment’s data retention settings to handle cleanup automatically.

Confirming the automate cleanup

If you’ve been adding code to automate_with_sdk.py as it’s presented, your program deletes the batch and then deletes the entire app run. Confirm that the second step works by comparing the number of app runs that exist before and after the cleanup, using an SDK method called client.apps.runs.list().

Paste the snippet below into automate_with_sdk.py in two separate places.

Immediately before you delete the app run (line 82 in the complete program below).
Immediately after you delete the app run (line 98 in the complete program below).

# print number of app runs
list_runs_resp = client.apps.runs.list()
num_app_runs = len(list_runs_resp.runs)
print(f"number of app runs: {num_app_runs}")

Run automate_with_sdk.py. If the output shows that cleanup reduces the app run count by one, your cleanup logic works!

Here’s the complete automate_with_sdk.py program, with all cleanup steps added.

Complete automate_with_sdk.py with cleanup

1 # prepare to use standard Python libraries
2 import sys
3 import time
4 
5 # prepare to use the SDK
6 # and an exception that the SDK throws when authorization fails
7 from aihub import AIHub
8 from aihub.exceptions import UnauthorizedException
9 
10 # authorize the SDK
11 client = AIHub(api_root="PASTE YOUR API ROOT HERE",
12                api_key="PASTE YOUR API KEY HERE",
13                ib_context="PASTE YOUR IB-CONTEXT HERE")
14 
15 print("creating an empty batch")
16 try:
17     # make an empty batch with a specific name in a specific workspace
18     create_batch_resp = client.batches.create(
19         name="receipt batch",
20         workspace="SDK-Tutorial")
21 except UnauthorizedException:
22     # exit the program while printing a user-friendly error message and
23     # instructions on how to fix the problem
24     sys.exit("ERROR: SDK not authorized. "
25              "Are the API ROOT, API KEY, and IB-Context values correct?")
26 
27 # store batch_id in an easy-to-read variable, since we'll use it several times
28 batch_id = create_batch_resp.id
29 
30 print("uploading two files to the batch")
31 
32 # upload the first file to the batch
33 client.batches.add_file(batch_id=batch_id,
34                         file_name="receipt-a.jpg",
35                         file=open("PATH/ON/YOUR/COMPUTER/TO/receipt-1.jpg", "rb"))
36 
37 # upload a second file to the batch
38 client.batches.add_file(batch_id=batch_id,
39                         file_name="receipt-b.jpg",
40                         file=open("PATH/ON/YOUR/COMPUTER/TO/receipt-2.jpg", "rb"))
41 
42 print("running the app")
43 # trigger an app run, specifying which app, who wrote it, and which batch it should process
44 run_resp = client.apps.runs.create(app_name="Meal Receipt",
45                                    owner="Instabase",
46                                    batch_id=batch_id)
47 
48 print("checking the app status until it finishes")
49 while True:  # loop until explicitly told to leave the loop
50     time.sleep(3)  # pause a few seconds between each app status check
51     status_resp = client.apps.runs.status(run_resp.id)  # get the app status
52     print(f"status: {status_resp.status}")  # update the user on the app status
53     if status_resp.status not in ["PENDING", "RUNNING"]:  # these statuses mean the app is still running
54         break  # the app is done, so stop looping
55 
56 print("fetching the app results")
57 results_resp = client.apps.runs.results(run_resp.id)  # get the app results
58 
59 for file in results_resp.files:  # iterate across all processed files
60     print(f"file name: {file.original_file_name}")
61     for document in file.documents:  # iterate across all documents in a file
62         for field in document.fields:  # iterate across all fields in a document
63             print(f"{field.field_name}: {field.value}")  # print the field name and value
64         print("---")  # visual separator between documents
65 
66 print(f"deleting batch with ID {batch_id}")
67 
68 # submit an asynchronous request to delete the whole batch
69 delete_batch_resp = client.batches.delete(batch_id)
70 
71 # repeatedly check status of delete batch job
72 while True:
73     # pause to let the delete batch job make progress
74     time.sleep(3)
75     # check status of the delete batch job
76     status_resp = client.jobs.status(delete_batch_resp.job_id)
77     print(f"delete batch status: {status_resp.state}")
78     # break out of the loop if the delete batch job is no longer running
79     if status_resp.state not in ["RUNNING", "PENDING"]:
80         break
81 
82 # print number of app runs
83 list_runs_resp = client.apps.runs.list()
84 num_app_runs = len(list_runs_resp.runs)
85 print(f"number of app runs: {num_app_runs}")
86 
87 print(f"deleting app run with ID {run_resp.id}")
88 delete_run_resp = client.apps.runs.delete(run_resp.id)
89 
90 # repeatedly check status of the job that deletes the output directory
91 while True:
92     time.sleep(3)
93     status_resp = client.jobs.status(delete_run_resp.delete_output_dir_job_id)
94     print(f"delete output dir status: {status_resp.state}")
95     if status_resp.state not in ["RUNNING", "PENDING"]:
96         break
97 
98 # print number of app runs
99 list_runs_resp = client.apps.runs.list()
100 num_app_runs = len(list_runs_resp.runs)
101 print(f"number of app runs: {num_app_runs}")

You’ve seen how to delete unneeded artifacts after using AI Hub’s automate feature. Now learn how to clean up after using its analyze feature.

Cleaning up after the analyze use case

When you’re done having a conversation, it’s a best practice to delete the conversation’s documents and support files from AI Hub.

Start by deleting a single document from a conversation while leaving the conversation in place. Then perform more thorough cleaning by deleting an entire conversation, including all documents that were uploaded to it.

Deleting one document from the conversation

To delete just the A330_specs.pdf document from your conversation, add this code to the end of analyze_with_sdk.py.

1 # delete the first document (A330_specs.pdf) from the conversation
2 client.conversations.delete_documents(
3     conversation_id=create_conversation_resp.id,
4     ids=[first_document_id])

Line 2 calls the SDK method that deletes individual documents from a conversation.

Line 3 passes in the ID of the conversation by reaching way back to the start of the program and referring to the create_conversation_resp variable. This holds the response to your call to client.conversations.create(), and it includes the conversation’s ID.

Line 4 passes in document IDs to delete. Even though you’re only asking it to delete a single document, that document ID still needs to be wrapped in a Python list. The first_document_id variable was defined much earlier in your program.

There are two aspects of this code that might strike you as curious.

You don’t store the response of client.conversations.delete_documents() in a variable, like you have with other SDK methods.
There’s no loop to check the status of the delete request.

Both of these quirks make sense when you understand that client.conversations.delete_documents() is a synchronous operation. It finishes its work before it returns, so there’s no job ID to store in a variable and no status to check. If the method returns without throwing any errors, the document has been successfully deleted from the conversation.
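Because success is signaled only by the absence of an error, you may want to make that outcome explicit in your program. Here is a minimal sketch, assuming a hypothetical helper named delete_and_confirm and a FakeConversations stand-in for client.conversations. This section doesn't name the exceptions the SDK raises for a failed delete, so the sketch catches Exception broadly. Narrow it once you know the specific exception types.

```python
def delete_and_confirm(delete_call, **kwargs):
    """Run a synchronous delete call and report whether it succeeded.

    Returning normally counts as success; any exception counts as failure.
    """
    try:
        delete_call(**kwargs)
    except Exception as exc:  # broad on purpose; SDK exception types are not assumed
        print(f"delete failed: {exc}")
        return False
    print("delete succeeded")
    return True


class FakeConversations:
    """Hypothetical stand-in for client.conversations so the sketch runs offline."""

    def __init__(self, document_ids):
        self.document_ids = set(document_ids)

    def delete_documents(self, conversation_id, ids):
        # the real SDK's behavior for unknown IDs is not assumed here
        missing = [doc_id for doc_id in ids if doc_id not in self.document_ids]
        if missing:
            raise ValueError(f"unknown document IDs: {missing}")
        self.document_ids -= set(ids)


if __name__ == "__main__":
    conversations = FakeConversations([101, 102])
    delete_and_confirm(conversations.delete_documents,
                       conversation_id="conv-1", ids=[101])
    delete_and_confirm(conversations.delete_documents,
                       conversation_id="conv-1", ids=[999])
```

With the real SDK, the first argument would presumably be client.conversations.delete_documents, called with the same conversation_id and ids keyword arguments used in the tutorial code.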

To figure out if an SDK method is synchronous or asynchronous, look at the documentation for the method to see if it returns an ID. If there’s no ID, the operation must be synchronous.

Deleting the entire conversation

If you’re finished with a conversation, you can delete the entire conversation with a single operation that also deletes any documents you’ve uploaded to it.

Add this code to analyze_with_sdk.py just below the last code you added. This deletes all traces of the conversation, including the one remaining document that you uploaded to it.

# delete the conversation and all remaining documents
client.conversations.delete(create_conversation_resp.id)

Because the method doesn’t return a value, there’s no way to check its status. Therefore, you know it’s synchronous.

Confirming the analyze cleanup

Your program now deletes one document from the conversation and then deletes the entire conversation (including its one remaining document). How can you confirm that it worked?

This code prints the number of conversations that you have created and haven’t yet deleted. Paste the code into analyze_with_sdk.py in two separate places.

Immediately before you delete the conversation (line 96 in the complete program below).
Immediately after you delete the conversation (line 104 in the complete program below).

# print number of existing conversations
list_conversations_resp = client.conversations.list()
num_conversations = len(list_conversations_resp.conversations)
print(f"number of existing conversations: {num_conversations}")

Now run analyze_with_sdk.py. The output confirms that there is one more conversation before cleanup than there is after.

Here’s the complete analyze_with_sdk.py program, with all cleanup steps added.

Complete analyze_with_sdk.py with cleanup

1 # prepare to use standard Python libraries
2 import sys
3 import time
4 
5 # prepare to use the SDK
6 from aihub import AIHub
7 
8 # authorize the SDK
9 client = AIHub(api_root="PASTE YOUR API ROOT HERE",
10                api_key="PASTE YOUR API KEY HERE",
11                ib_context="PASTE YOUR IB-CONTEXT HERE")
12 
13 print("making a conversation with one document")
14 create_conversation_resp = client.conversations.create(
15     name="Airbus conversation",
16     description="Analyze Airbus technical specs",
17     org="PASTE YOUR IB-CONTEXT HERE",
18     workspace="SDK-Tutorial",
19     files=["PATH/ON/YOUR/COMPUTER/TO/A330_specs.pdf"])
20 
21 print("checking the first document's processing status until it finishes")
22 # repeatedly check whether the one document has been processed, only
23 # leaving the loop when AI Hub reports that processing is done
24 while True:
25     time.sleep(3)  # wait 3 seconds between status checks
26     status_resp = client.conversations.status(create_conversation_resp.id)
27     print(f"status of processing the first document: {status_resp.state}")
28     if status_resp.state == "COMPLETE":
29         break
30 
31 # get the single document's ID from the last response to a status check
32 first_document_id = status_resp.documents[0].id
33 
34 print("sending a question to the first document")
35 # send a synchronous question about the one document,
36 # and print the answer
37 converse_resp = client.conversations.converse(
38     conversation_id=create_conversation_resp.id,
39     question="What airplane family are these specs for?",
40     document_ids=[first_document_id])
41 print(f"answer: {converse_resp.answer}")
42 
43 print("uploading the second document to the conversation")
44 add_documents_resp = client.conversations.add_documents(
45     conversation_id=create_conversation_resp.id,
46     files=["PATH/ON/YOUR/COMPUTER/TO/A350_specs.pdf"])
47 
48 print("checking the second document's processing status until it finishes")
49 # repeatedly check whether the second document has been processed, only
50 # leaving the loop when AI Hub reports that processing is done
51 while True:
52     time.sleep(3)  # wait 3 seconds between status checks
53     status_resp = client.conversations.status(create_conversation_resp.id)
54     print(f"status of processing second document: {status_resp.state}")
55     if status_resp.state == "COMPLETE":
56         break
57 
58 # indicate which AI Hub conversation or chatbot you want to
59 # asynchronously send a question to
60 source_app = {"type": "CONVERSE",
61               "id": create_conversation_resp.id}
62 
63 # "query", "prompt", and "question" are all synonyms in this context
64 query = "List all Airbus planes by ascending wingspan."
65 
66 print("sending a question to both documents")
67 # send an asynchronous question that AI Hub will use both of the
68 # conversation's documents to answer
69 run_query_resp = client.queries.run(query=query,
70                                     source_app=source_app)
71 
72 print("checking the status of the query processing until AI Hub has an answer")
73 # repeatedly check whether AI Hub has generated an answer to
74 # your question, only leaving the loop when an answer is available
75 while True:
76     time.sleep(3)  # wait 3 seconds between status checks
77     query_status_resp = client.queries.status(query_id=run_query_resp.query_id)
78     print(f"query status: {query_status_resp.status}")
79     # if it generated an answer, status will be "COMPLETE"
80     if query_status_resp.status == "COMPLETE":
81         break
82     # if something went wrong, status will be "FAILED" and the program should exit
83     if query_status_resp.status == "FAILED":
84         sys.exit("ERROR: AI Hub couldn't process this query. Check the logs.")
85 
86 # print all the answers (there might be just one) that are stored in the response
87 # to the last status query
88 for result in query_status_resp.results:
89     print(f"answer: {result.response}")
90 
91 # delete the first document (A330_specs.pdf) from the conversation
92 client.conversations.delete_documents(
93     conversation_id=create_conversation_resp.id,
94     ids=[first_document_id])
95 
96 # print number of conversations
97 list_conversations_resp = client.conversations.list()
98 num_conversations = len(list_conversations_resp.conversations)
99 print(f"number of conversations: {num_conversations}")
100 
101 # delete the conversation and all remaining documents
102 client.conversations.delete(create_conversation_resp.id)
103 
104 # print number of conversations
105 list_conversations_resp = client.conversations.list()
106 num_conversations = len(list_conversations_resp.conversations)
107 print(f"number of conversations: {num_conversations}")

Cleanup conclusion

You’ve covered all the cleanup that’s necessary for these two use cases. You might be surprised at how much longer the automate_with_sdk.py and analyze_with_sdk.py programs are after cleanup logic is added. As with exception handling, cleaning up after yourself is an important (if tedious) task for responsible programmers. Remember to leave plenty of time to add similar logic to your own SDK-enabled programs.
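When you add cleanup logic to your own programs, consider what happens if the program fails partway through: cleanup code at the bottom of a script never runs if an earlier line raises an exception. One possible approach, sketched here with Python’s standard contextlib.ExitStack, is to register each cleanup step as soon as the entity it deletes has been created. The fake delete functions and the IDs are placeholders. In a real program the callbacks would wrap calls such as client.batches.delete() and client.apps.runs.delete(), and asynchronous deletes would still need their status checks.

```python
from contextlib import ExitStack

# record of cleanup calls, so the sketch can show what ran
deleted = []


def fake_delete_batch(batch_id):
    """Placeholder for a real batch delete call."""
    deleted.append(("batch", batch_id))


def fake_delete_run(run_id):
    """Placeholder for a real app run delete call."""
    deleted.append(("run", run_id))


def process_documents(fail=False):
    """Create entities, register their cleanup, then do the work."""
    with ExitStack() as cleanup:
        batch_id = "batch-1"  # placeholder for a newly created batch ID
        cleanup.callback(fake_delete_batch, batch_id)

        run_id = "run-1"  # placeholder for a newly started app run ID
        cleanup.callback(fake_delete_run, run_id)

        if fail:
            raise RuntimeError("simulated failure while fetching results")
        return "results"
    # leaving the with block runs the callbacks in reverse order,
    # whether the block finished normally or raised an exception


if __name__ == "__main__":
    print(process_documents())
    print(deleted)
```

Callbacks run in last-registered, first-run order, so the app run is deleted before the batch it processed.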

By adding new features to programs written earlier, you’ve experienced the common task of returning to code that you thought was complete but that now needs to be maintained. This task is easier when you’ve added thorough comments—such as you see in the complete examples here—to provide guideposts. It’s amazing how quickly uncommented code turns cryptic when you step away from it for a while, even when you were the original author.

The last page of this tutorial has a recap of what you’ve learned and guidance on where to go next on your AI Hub SDK journey.