File limitations and processing
AI Hub enforces maximum resource limits for files. These limits ensure system stability and optimal performance for all users.
- Single file — 50 MB or 800 pages
- Total single-upload size — 100 MB
- Conversations and chatbots — About 500 files, depending on content complexity
- Automation projects — 500 files
- Automation app runs — 1,000 files
However, because of file processing constraints, particularly dense or complex documents can fail before reaching these limits.
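The per-file and per-upload size limits above can be checked locally before attempting an upload. The sketch below is illustrative only (the function name and error messages are ours, not part of AI Hub), and note that the 800-page cap cannot be verified from byte size alone.

```python
import os

# Limits stated in this documentation: 50 MB per file, 100 MB per single upload.
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_UPLOAD_BYTES = 100 * 1024 * 1024

def upload_problems(sizes_by_name):
    """Given {filename: size_in_bytes}, list limit violations before uploading."""
    problems = [
        f"{name} exceeds the 50 MB per-file limit"
        for name, size in sizes_by_name.items()
        if size > MAX_FILE_BYTES
    ]
    if sum(sizes_by_name.values()) > MAX_UPLOAD_BYTES:
        problems.append("combined upload exceeds the 100 MB limit")
    return problems

# For files on disk: sizes = {p: os.path.getsize(p) for p in paths}
```

Passing this check does not guarantee success: as noted above, dense or complex documents can still fail during processing.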
Supported file types
These file types are supported for import: .bat, .bashc, .c, .cc, .chtml, .cmake, .cmd, .cpp, .cs, .css, .csv, .cxx, .cy, .dockerfile, .doc, .docx, .eml, .gdoc, .go, .gsheet, .gslides, .h++, .hpp, .html, .java, .jpeg, .jpg, .js, .json, .mht, .mhtml, .mkfile, .msg, .pdf, .perl, .php, .plsql, .png, .pptx, .py, .pxi, .pyx, .r, .rd, .rs, .rtf, .ruby, .tif, .tiff, .ts, .txt, .xls, .xlsx, .xml, .yaml, .yml, .zsh
Google Drive files (.gdoc, .gsheet, .gslides) are displayed in the file explorer, but the files are converted to PDF when imported.

In commercial and enterprise automation projects with file splitting enabled, multipage files can include multiple documents. For best results in all other projects and conversations, use one file for each document.
App run results can be exported in CSV or Excel format.
Digitization details
When you upload files to AI Hub, the default digitization process includes these steps.
- Email attachments are separated and treated as individual files. Inline images are treated as part of the email body.
- Google Drive files are converted to PDF.
- PDF layers are flattened to include all text and image elements.
- Optical character recognition (OCR) is performed on both typed and handwritten text.
- Page rotation, skew, and warp are corrected.
- Signatures, checkboxes, and barcodes (both numeric and non-numeric formats) are detected, and appropriate markers are added to the text space.
Spreadsheet limitations
Excel spreadsheets and CSV files are subject to these limitations.
Upload limitations
- Files must be less than 10 MB.
- Files can contain one large table of up to 400 columns. Excel files can contain multiple small- to medium-sized tables on one sheet (totaling 200 rows and 30 columns).
Extraction limitations
Total extracted results are limited to 80,000 cells, for example:
- If extracting 400 columns, you can retrieve up to 200 rows (400 × 200 = 80,000).
- If extracting 10 columns, you can retrieve up to 8,000 rows (10 × 8,000 = 80,000).
You can adjust the number of columns and rows as needed within the 80,000 cell limit.
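The row-for-column trade-off above is simple division against the 80,000-cell cap. A minimal helper (our own naming, not an AI Hub function) makes the arithmetic explicit:

```python
CELL_LIMIT = 80_000  # maximum total extracted cells per the documentation

def max_rows(columns):
    """Rows retrievable for a given column count under the 80,000-cell cap."""
    return CELL_LIMIT // columns

def fits(rows, columns):
    """True if an extraction of rows x columns stays within the cell limit."""
    return rows * columns <= CELL_LIMIT
```

For example, `max_rows(400)` gives 200 and `max_rows(10)` gives 8,000, matching the examples above.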
Unsupported features
- Advanced features such as macros and data validation.
- Triangular or nested tables.
- Tables with multi-row or frozen headers.
- Tables with empty rows or columns.
Understanding file processing constraints
AI Hub processes files using a distributed system architecture where multiple files are processed simultaneously. Simultaneous processing is more efficient, but it also means that your file’s success depends partly on overall system load.
Files within stated limits might still fail because of:
- Concurrent processing load — Large files being processed simultaneously can exceed available memory, even when individual files meet size requirements.
- Content complexity — Files with compressed images, dense data, or complex layouts require more processing resources than plain text documents. Examples include PDFs with compressed images, dense spreadsheets with extensive data or complex formulas, multi-layered documents with embedded objects or charts, and handwritten or low-quality scanned documents.
- Processing time constraints — Very large files with many pages can exceed processing time limits.
- File format specifics — Certain formats (such as Excel files with extensive data or PDFs with embedded objects) are more resource-intensive to process.
Token limits
When uploading files, some areas of AI Hub enforce a file-based upload limit while others enforce a total token upload limit. Tokens are the fundamental unit by which LLMs process text. Each token represents a piece of text, such as a whole word, part of a word, or a character. This means the density of information in an uploaded document affects the number of tokens required to encode that information. For example, 500 sparsely populated documents might be encoded in fewer tokens than 50 densely populated documents.
Factors that affect the number of tokens required to encode a document include language and complexity of content. As a guideline, for English text, one token encodes approximately four characters.
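The four-characters-per-token guideline can be turned into a rough pre-upload estimate. This is a heuristic only: actual counts depend on the model's tokenizer, the language, and content density, so treat the result as an order-of-magnitude check rather than an exact figure.

```python
def estimate_tokens(text):
    """Rough token estimate using the ~4-characters-per-token guideline
    for English text. Actual tokenizer counts will differ."""
    return max(1, len(text) // 4)

# Example: 400 characters of English text is roughly 100 tokens.
```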
API file limits
Some file upload limits can be bypassed by using the AI Hub API instead of the user interface:
- Conversations — The 500-file limit doesn’t apply when creating conversations by API. You can upload any number of files up to the 4 million token limit.
- Batch processing — The upload file to batch operation lets you upload and manage input files for automation app runs. Individual files in batches have a suggested maximum size of 10 MB, with multipart file upload available for larger files.
- Adding files to existing conversations — You can add documents to conversations without UI upload restrictions.
Best practices for successful uploads
For best results, optimize your documents before processing them.
- Split large documents — For documents over 20 MB, consider splitting them into smaller sections when possible.
- Use high-quality scans for images — Aim for a minimum scanning or capture resolution of 300 DPI.
- Simplify complex spreadsheets — Remove unnecessary formatting, charts, or macros that increase processing complexity.
- Process during off-peak hours — Large or complex files are more likely to process successfully when system load is lower.
Troubleshooting upload failures
If your file meets the stated limits but still fails to upload or process, try these fixes.
- Upload during off-peak hours when system load is lower.
- Simplify the document by removing unnecessary images, charts, or complex formatting.
- Split large documents into smaller sections.
- Check file content density for compressed images or complex data that might exceed processing capacity.
- Verify file format compatibility — some file variants or corrupted files might not process correctly.