Supported languages
Language support in AI Hub varies based on processing mode. Agent mode supports a comprehensive set of languages across tiers. Legacy mode language support varies based on your digitization settings and subscription tier.
Agent mode
Agent mode uses next-generation models and supports more than 100 languages, including those with Latin and non-Latin characters. For the complete list of supported languages, see the Gemini language support documentation.
Legacy mode
Document digitization in legacy mode uses a variety of third-party OCR processors to achieve best results. Digitization settings are interdependent, so your digitization requirements—for example, print versus handwritten text—impact which languages are supported.
Default language set
By default, digitization supports only the languages that use Latin characters (a, b, c…) from the language lists for the Azure AI Vision Read container or Microsoft Read model. Which list applies depends on your digitization settings and documents.
- Default digitization settings: print text | handwritten text
- With tables and checkboxes enabled: print text | handwritten text
Common languages within these lists that use non-Latin characters—and thus aren’t supported by default—include Arabic, Bengali, Chinese Simplified, Chinese Traditional, Greek, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Urdu.
Standard non-Latin language set
Tiers: Commercial & Enterprise
The standard non-Latin language set supports all languages in the Azure AI Vision Read container or Microsoft Read model, depending on your settings and documents.
- Default digitization settings: print text | handwritten text
- With tables and checkboxes enabled: print text | handwritten text
Advanced non-Latin language set
Tiers: Enterprise
The advanced non-Latin language set fully supports all languages in the Google Cloud Vision API.
Compared to the standard non-Latin language set, the advanced set adds support for Armenian, Bengali, Greek, Gujarati, Hebrew, Kannada, Khmer, Lao, Latvian, Macedonian, Malayalam, Tagalog, Tamil, Telugu, Thai, Ukrainian, Vietnamese, and Yiddish.
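The legacy-mode language sets described above can be sketched as a small lookup. This is an illustrative sketch only, not part of AI Hub: the names `ADVANCED_ADDED`, `NON_LATIN_EXAMPLES`, `TIERS`, and `minimum_language_set` are hypothetical, and the data is limited to the languages named on this page. Where a language appears in both lists (for example, Greek), the sketch assumes the advanced set is required, which might not hold for every digitization setting. Check the linked print and handwritten lists for the authoritative answer.

```python
from typing import Optional

# Languages this page says the advanced set adds over the standard set.
ADVANCED_ADDED = frozenset({
    "Armenian", "Bengali", "Greek", "Gujarati", "Hebrew", "Kannada", "Khmer",
    "Lao", "Latvian", "Macedonian", "Malayalam", "Tagalog", "Tamil", "Telugu",
    "Thai", "Ukrainian", "Vietnamese", "Yiddish",
})

# Non-Latin examples this page lists as unsupported by the default set.
NON_LATIN_EXAMPLES = frozenset({
    "Arabic", "Bengali", "Chinese Simplified", "Chinese Traditional", "Greek",
    "Hebrew", "Hindi", "Japanese", "Korean", "Russian", "Thai", "Urdu",
})

# Subscription tiers shown for each language set on this page.
TIERS = {
    "standard non-Latin": ("Commercial", "Enterprise"),
    "advanced non-Latin": ("Enterprise",),
}


def minimum_language_set(language: str) -> Optional[str]:
    """Return the smallest named language set for a language.

    Returns None when this page does not name the language, in which case
    the linked language lists are the only reliable source.
    """
    # Assumption: a language named in both lists needs the advanced set.
    if language in ADVANCED_ADDED:
        return "advanced non-Latin"
    if language in NON_LATIN_EXAMPLES:
        return "standard non-Latin"
    return None
```

For example, `minimum_language_set("Japanese")` returns `"standard non-Latin"`, and `TIERS["advanced non-Latin"]` shows that the advanced set is limited to the Enterprise tier.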
