Supported languages
Document digitization in Converse and Build uses a variety of third-party OCR processors to achieve the best results. Digitization settings are interdependent, so your digitization requirements (for example, print versus handwritten text) affect which languages are supported.
Default language set
By default, Converse and Build digitize only languages that use Latin characters (a, b, c…). The eligible languages are those listed for the Azure AI Vision Read container or the Microsoft Read model, depending on your project settings and documents.
- Default digitization settings: print text | handwritten text
- With tables and checkboxes enabled: print text | handwritten text
Common languages within these lists that use non-Latin characters—and thus aren’t supported by default—include Arabic, Bengali, Chinese Simplified, Chinese Traditional, Greek, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Urdu.
Standard non-Latin language set
The standard non-Latin language set in Build supports all languages in the Azure AI Vision Read container or Microsoft Read model, depending on your settings and documents.
- Default digitization settings: print text | handwritten text
- With tables and checkboxes enabled: print text | handwritten text
Advanced non-Latin language set
The advanced non-Latin language set in Build fully supports all languages in the Google Cloud Vision API.
Compared to the standard non-Latin language set, the advanced set adds support for Armenian, Bengali, Greek, Gujarati, Hebrew, Kannada, Khmer, Lao, Latvian, Macedonian, Malayalam, Tagalog, Tamil, Telugu, Thai, Ukrainian, Vietnamese, and Yiddish.
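As a rough illustration of how the language sets relate, the following Python sketch suggests which set a given language might call for. It uses only the language names mentioned on this page, not the full linked lists. The function name `suggest_language_set` and its return strings are hypothetical and are not part of Converse or Build. Where a language appears in both lists on this page, the sketch assumes the advanced set is the safer choice.

```python
# Languages this page names as added by the advanced non-Latin language set.
ADVANCED_ADDED = {
    "Armenian", "Bengali", "Greek", "Gujarati", "Hebrew", "Kannada", "Khmer",
    "Lao", "Latvian", "Macedonian", "Malayalam", "Tagalog", "Tamil", "Telugu",
    "Thai", "Ukrainian", "Vietnamese", "Yiddish",
}

# Non-Latin languages this page names as unsupported by the default set.
NON_LATIN_EXAMPLES = {
    "Arabic", "Bengali", "Chinese Simplified", "Chinese Traditional", "Greek",
    "Hebrew", "Hindi", "Japanese", "Korean", "Russian", "Thai", "Urdu",
}


def suggest_language_set(language: str) -> str:
    """Return a hypothetical label for the language set to consider.

    This is an illustrative lookup only; the authoritative answer
    depends on the linked language lists and your digitization settings.
    """
    name = language.strip().title()
    if name in ADVANCED_ADDED:
        # Assumption: prefer the advanced set when a language is in both lists.
        return "advanced non-Latin"
    if name in NON_LATIN_EXAMPLES:
        return "standard non-Latin"
    # Languages not named on this page need a manual check of the lists.
    return "check the linked language lists"
```

For example, `suggest_language_set("japanese")` returns `"standard non-Latin"`, while a language not named on this page, such as French, falls through to the manual check.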