
Language coverage

Which languages Anomalica supports, how they were chosen, and what coverage they provide.

Anomalica publishes in 30 languages that together cover 77% of the world’s literate population. Languages are selected algorithmically by incremental coverage of literate people, using open data from the Unicode Common Locale Data Repository (CLDR).

Selection is greedy: at each step, the language that covers the most not-yet-covered literate people is chosen. The selection script, source data, and output are published in the project repository.
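The selection step described above can be sketched as follows. This is a minimal illustration, not the project's selection script: the group data, the language labels A to D, and the function name `greedy_select` are all hypothetical, and how the real script derives overlaps from CLDR data is not shown here.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical toy data, NOT real figures: each group is
# (languages its members can read, literate people in millions).
Group = Tuple[FrozenSet[str], int]

GROUPS: List[Group] = [
    (frozenset({"A"}), 400),
    (frozenset({"A", "B"}), 100),
    (frozenset({"B"}), 300),
    (frozenset({"C", "D"}), 30),
    (frozenset({"C"}), 150),
    (frozenset({"D"}), 2),
]


def greedy_select(groups: List[Group], k: int) -> List[Tuple[str, int, float]]:
    """Pick up to k languages, each time taking the largest incremental gain.

    Returns (language, incremental people, cumulative share) per step.
    """
    total = sum(size for _, size in groups)
    uncovered = list(groups)
    covered = 0
    ranking = []
    for _ in range(k):
        # Incremental gain of a language = people in still-uncovered groups who read it.
        gains = {}
        for langs, size in uncovered:
            for lang in langs:
                gains[lang] = gains.get(lang, 0) + size
        if not gains:
            break
        # Iterate in sorted order so ties resolve alphabetically.
        best = max(sorted(gains), key=lambda lang: gains[lang])
        covered += gains[best]
        ranking.append((best, gains[best], covered / total))
        # Everyone who reads the chosen language is now covered.
        uncovered = [(langs, size) for langs, size in uncovered if best not in langs]
    return ranking


for lang, gain, share in greedy_select(GROUPS, 4):
    print(f"{lang}: +{gain}M, cumulative {share:.1%}")
```

In this toy data, D is read by 32M people in total but adds only 2M once C has been selected, which is the kind of overlap that ranking by incremental coverage is meant to capture.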

Teal bars show incremental literate population covered by each language. The copper line tracks cumulative coverage. Faded bars are languages not in the supported set.

Each language is ranked by how many additional literate people it covers beyond all higher-ranked languages.

#   Language   Total speakers   Incremental   Cumulative   Coverage

The full ranking of 60 languages and the selection script are available in the project repository.

Editorial adjustments

Three languages from the algorithmic top 30 are excluded:

  • Javanese (rank 21, 32M) - primarily spoken; most literate speakers read Indonesian, supported at position 8
  • Malay (rank 27, 17M) - mutually intelligible with Indonesian in written form
  • Nigerian Pidgin (rank 30, 14M) - primarily spoken, not a standard written language for reference works

One language is added:

  • Ukrainian (rank 33, 13M) - the platform does not support Russian without Ukrainian during an active conflict between the two countries

These adjustments leave 28 translated languages (30 - 3 + 1), which produce 30 displayed languages: Traditional Chinese is a mechanical character conversion from Simplified Chinese, and American English is a spelling conversion from British English.
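The count can be checked with a few set operations. The ranks are taken from the lists above; the variable names are illustrative only and are not drawn from the project's code.

```python
# Ranks come from the text above; variable names are illustrative.
algorithmic_top = set(range(1, 31))   # ranks 1..30
excluded = {21, 27, 30}               # Javanese, Malay, Nigerian Pidgin
added = {33}                          # Ukrainian

translated = (algorithmic_top - excluded) | added

# Variants produced by conversion rather than by a separate translation.
derived_variants = ["Traditional Chinese", "American English"]

displayed_count = len(translated) + len(derived_variants)
print(len(translated), displayed_count)
```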

Translation quality

AI translation quality varies measurably by language. The WMT24 shared task (the standard academic benchmark for machine translation) and its WMT24++ expansion to 55 languages found that large language models now outperform conventional translation systems across all languages tested, but with a clear quality gradient between languages.

WMT24 CometKiwi scores for English-to-target translation (higher is better, scale 0 to 1):

Language pair           Best LLM score   Quality
English to Japanese     0.762            Strong
English to Spanish      0.745            Strong
English to Russian      0.742            Strong
English to Ukrainian    0.732            Strong
English to German       0.723            Strong
English to Chinese      0.726            Strong
English to Hindi        0.657            Moderate

Hindi, the highest-resourced language in the Indic family, scores 0.066 to 0.105 lower than the European and CJK languages listed above. Languages with less training data (Burmese, Uzbek, Marathi) can be expected to show a larger gap, though published benchmark data for these specific languages is limited.
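The size of the Hindi gap can be recomputed directly from the scores in the table. The snippet below is a small sketch that uses only the figures listed above; the variable names are illustrative.

```python
# CometKiwi scores copied from the table above (English to target).
scores = {
    "Japanese": 0.762,
    "Spanish": 0.745,
    "Russian": 0.742,
    "Ukrainian": 0.732,
    "German": 0.723,
    "Chinese": 0.726,
    "Hindi": 0.657,
}

hindi = scores["Hindi"]
# Difference between each other language and Hindi, rounded to the table's precision.
gaps = {lang: round(s - hindi, 3) for lang, s in scores.items() if lang != "Hindi"}

smallest = min(gaps.values())
largest = max(gaps.values())
print(f"Hindi trails by {smallest:.3f} to {largest:.3f}")
```

The smallest gap is against German and the largest is against Japanese.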

Based on available benchmarks, the supported languages fall into three quality tiers:

  • Strong: English, French, German, Spanish, Portuguese, Russian, Chinese, Japanese, Italian, Polish, Korean, Ukrainian
  • Moderate: Arabic, Hindi, Turkish, Vietnamese, Indonesian, Thai, Bengali, Urdu, Persian, Swahili, Tamil, Telugu, Tagalog, Marathi
  • Limited data: Burmese, Uzbek

For the third tier, published translation benchmarks are sparse. Meta’s NLLB-200 model scores Burmese lowest of all 28 languages on the FLORES benchmark (chrF++ 30.9 vs French at 69.6), though larger models perform substantially better than NLLB.
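One possible way to represent the tiers in code is a plain mapping with a lookup helper. This is only a sketch: the names `TIERS` and `tier_of` are hypothetical and do not come from the project, and the membership is copied from the list above.

```python
# Tier membership copied from the list above; names are hypothetical.
TIERS = {
    "Strong": [
        "English", "French", "German", "Spanish", "Portuguese", "Russian",
        "Chinese", "Japanese", "Italian", "Polish", "Korean", "Ukrainian",
    ],
    "Moderate": [
        "Arabic", "Hindi", "Turkish", "Vietnamese", "Indonesian", "Thai",
        "Bengali", "Urdu", "Persian", "Swahili", "Tamil", "Telugu",
        "Tagalog", "Marathi",
    ],
    "Limited data": ["Burmese", "Uzbek"],
}


def tier_of(language: str) -> str:
    """Return the quality tier for a translated language."""
    for tier, languages in TIERS.items():
        if language in languages:
            return tier
    raise KeyError(f"{language} is not a translated language")


# The three tiers together should account for all 28 translations.
total_languages = sum(len(languages) for languages in TIERS.values())
print(total_languages, tier_of("Hindi"))
```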

Translation corrections can be submitted through the content repository. Corrections are extracted as durable directives that persist across future article regeneration.

Sources: WMT24 General MT Ranking, WMT24++ 55-language expansion, NLLB-200 FLORES metrics.
