Supported languages

Document digitization in Converse and Build uses a variety of third-party OCR processors to achieve best results. Digitization settings are interdependent, so your digitization requirements—for example, print versus handwritten text—impact which languages are supported.

Default language set

By default, Converse and Build digitize only languages that use Latin characters (a, b, c…) and that are supported by the Azure AI Vision Read container or the Microsoft Read model, depending on your project settings and documents.

Common languages that these processors list but that use non-Latin characters, and thus aren't supported by default, include Arabic, Bengali, Chinese Simplified, Chinese Traditional, Greek, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Urdu.

Standard non-Latin language set

The standard non-Latin language set in Build supports all languages in the Azure AI Vision Read container or Microsoft Read model, depending on your settings and documents.

Advanced non-Latin language set

The advanced non-Latin language set in Build fully supports all languages in the Google Cloud Vision API.

Compared to the standard non-Latin language set, the advanced set adds support for Armenian, Bengali, Greek, Gujarati, Hebrew, Kannada, Khmer, Lao, Latvian, Macedonian, Malayalam, Tagalog, Tamil, Telugu, Thai, Ukrainian, Vietnamese, and Yiddish.
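
As a rough illustration of how the three language sets relate, the Python sketch below looks up the smallest set that this page implies for a given language. It is not part of Converse or Build: the function and variable names are hypothetical, the lists contain only the languages named on this page (the processors' own documentation has the full lists), and where a language appears in both lists the sketch assumes the advanced set is the one required.

```python
from typing import Optional

# Hypothetical lookup tables built only from the languages named on this page.
# Languages this page names as non-Latin and not covered by the default set.
NOT_IN_DEFAULT = {
    "Arabic", "Bengali", "Chinese Simplified", "Chinese Traditional",
    "Greek", "Hebrew", "Hindi", "Japanese", "Korean", "Russian", "Thai", "Urdu",
}

# Languages this page names as added by the advanced set over the standard set.
ADDED_BY_ADVANCED = {
    "Armenian", "Bengali", "Greek", "Gujarati", "Hebrew", "Kannada", "Khmer",
    "Lao", "Latvian", "Macedonian", "Malayalam", "Tagalog", "Tamil", "Telugu",
    "Thai", "Ukrainian", "Vietnamese", "Yiddish",
}


def smallest_language_set(language: str) -> Optional[str]:
    """Return the smallest language set this page implies, or None if the page doesn't name the language."""
    # Assumption: a language listed in both tables needs the advanced set.
    if language in ADDED_BY_ADVANCED:
        return "advanced"
    if language in NOT_IN_DEFAULT:
        return "standard"
    # Not named on this page: check the OCR processor's own language list.
    return None
```

For example, `smallest_language_set("Arabic")` returns `"standard"` and `smallest_language_set("Tamil")` returns `"advanced"`. A result of `None` means only that this page does not name the language, not that the language is unsupported.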
