Supported languages

Language support in AI Hub depends on the processing mode. Agent mode supports a broad set of languages across subscription tiers. In legacy mode, language support depends on your digitization settings and subscription tier.

Agent mode

Agent mode uses next-generation models and supports more than 100 languages, including those with Latin and non-Latin characters. For the complete list of supported languages, see the Gemini language support documentation.

Legacy mode

Legacy mode applies only to projects that haven’t been updated to agent mode. All new projects and apps are created in agent mode by default.

Document digitization in legacy mode uses several third-party OCR processors to achieve the best results. Digitization settings are interdependent, so your digitization requirements (for example, print versus handwritten text) affect which languages are supported.

Default language set

By default, digitization is supported only for the languages with Latin characters (a, b, c…) that the Azure AI Vision Read container or Microsoft Read model supports. Which of the two applies depends on your digitization settings and documents.

Common languages that these models support but that use non-Latin characters, and so aren’t supported by default, include Arabic, Bengali, Chinese Simplified, Chinese Traditional, Greek, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Urdu.
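To gauge whether sample text from your documents stays within Latin characters, a rough script check like the one below can help. This is a minimal sketch and not an AI Hub feature: the function name `uses_only_latin_letters` is hypothetical, and the check relies on Unicode character names, so it is a heuristic. Passing the check doesn't guarantee that a specific language is supported. It only indicates that the text contains no non-Latin letters.

```python
import unicodedata


def uses_only_latin_letters(text: str) -> bool:
    """Return True if every alphabetic character in text is a Latin letter.

    Hypothetical helper for illustration. Digits, punctuation, and
    whitespace are ignored. Letters are classified by their Unicode name,
    which is a rough heuristic rather than a language detector.
    """
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


if __name__ == "__main__":
    for sample in ["Invoice total: 42,00 €", "café", "Привет", "請求書"]:
        print(repr(sample), uses_only_latin_letters(sample))
```

Text that fails this check likely needs one of the non-Latin language sets or agent mode.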

Standard non-Latin language set

Tiers: Commercial and Enterprise

The standard non-Latin language set supports all languages that the Azure AI Vision Read container or Microsoft Read model supports, including those with non-Latin characters. Which of the two applies depends on your digitization settings and documents.

Advanced non-Latin language set

Tiers: Enterprise

The advanced non-Latin language set fully supports all languages in the Google Cloud Vision API.

Compared to the standard non-Latin language set, the advanced set adds support for Armenian, Bengali, Greek, Gujarati, Hebrew, Kannada, Khmer, Lao, Latvian, Macedonian, Malayalam, Tagalog, Tamil, Telugu, Thai, Ukrainian, Vietnamese, and Yiddish.
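The relationship between subscription tier and legacy-mode language sets described on this page can be summarized as a small lookup. This is only an illustrative sketch: the names `LANGUAGE_SETS_BY_TIER` and `legacy_language_sets` are hypothetical and don't correspond to any AI Hub API. It also assumes that the default set applies to every tier and that tiers not named on this page get the default set only.

```python
from typing import Dict, List

# Hypothetical summary of the legacy-mode language sets per tier,
# based only on the tier labels given on this page.
LANGUAGE_SETS_BY_TIER: Dict[str, List[str]] = {
    "commercial": ["default", "standard non-Latin"],
    "enterprise": ["default", "standard non-Latin", "advanced non-Latin"],
}


def legacy_language_sets(tier: str) -> List[str]:
    """Return the legacy-mode language sets for a subscription tier.

    Assumption: any tier not listed above gets only the default
    (Latin-character) language set.
    """
    return LANGUAGE_SETS_BY_TIER.get(tier.strip().lower(), ["default"])


if __name__ == "__main__":
    for tier in ["Commercial", "Enterprise"]:
        print(tier, "->", ", ".join(legacy_language_sets(tier)))
```

Agent mode isn't modeled here because its language support doesn't vary by language set.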