Building of the Tower of Babel, a medieval illuminated miniature by the Master of the Duke of Bedford.
A few weeks ago, we sent out a call for focus languages for improving the lexicographic extension of Wikidata and, by extension, Abstract Wikipedia.

Over the last few weeks, we have received submissions from nine language communities: Bengali, Dagbani, Danish, Esperanto, French, Hausa, Igbo, Malayalam, and Russian. We want to thank all of these communities for their trust and all of the submitters for their work.

The Wikidata team at Wikimedia Deutschland and the Wikifunctions and Abstract Wikipedia team at the Wikimedia Foundation have deliberated and discussed the options over the last few days. Applying our own criteria, we chose four languages as focus languages (plus one stretch language). Lydia Pintscher and Denny Vrandečić announced the languages at the closing event of the 30 Lexic-o-days 2021 on Wednesday, 14 April:

  • Bengali [bn] is an Indo-European language (belonging to the Eastern Indo-Aryan branch), spoken in Bangladesh and in India, where it is used mostly in the eastern states of West Bengal and Tripura. Bengali has more than 220 million native speakers, making it the fifth most-spoken language in the world. The Bengali Wikipedia has more than 100,000 articles and more than 1,700 active contributors. Like a few other languages in the region, Bengali is written using the Bengali-Assamese script, an abugida and Brahmic script. It is easily the most widely spoken language and has the largest Wikipedia community among the focus languages. (See more on Wikipedia: Bengali language.)
  • Malayalam [ml] is a Dravidian language, spoken mainly in the southern Indian state of Kerala. Malayalam has more than 30 million native speakers. The Malayalam Wikipedia has more than 70,000 articles and about 300 active contributors. Also of note is the Malayalam Wiktionary, whose more than 130,000 entries outnumber the articles on the Malayalam Wikipedia. Malayalam is written in the Malayalam script, which, like Bengali-Assamese, is an abugida and Brahmic script. The Malayalam community has been active on Wikidata, with numerous local identifiers and good usage of the data in Wikipedia. (See more on Wikipedia: Malayalam language.)
  • Hausa [ha] is an Afroasiatic language (belonging to the Chadic branch) and an official language in Nigeria, Ghana, and Niger. Estimates of the number of native speakers range between 50 and 150 million. It is the most important indigenous lingua franca in West and Central Africa. Hausa Wikipedia has more than 8,000 articles and about 50 active contributors. Although Hausa was historically written in an Arabic alphabet, today it is mostly written in a Latin-based alphabet. The community has used Wikidata in infoboxes and has brought a number of modules and templates to Hausa. (See more on Wikipedia: Hausa language.)
  • Igbo [ig] is a Niger-Congo language (belonging to the Volta-Niger branch) spoken mainly in south-eastern Nigeria, with about 45 million native speakers. Igbo Wikipedia has about 2,000 articles and about 50 active contributors. Igbo uses a Latin-based script, although it has both an interesting history with the Nsibidi ideograms and possibly an interesting future with the Ndebe script. The community uses the ArticlePlaceholder extension in the Igbo Wikipedia. (See more on Wikipedia: Igbo language.)
  • Dagbani [dag] is a Niger-Congo language (belonging to the Gur branch) spoken in northern Ghana. Dagbani has about 3 million native speakers. Dagbani Wikipedia is still in the Wikimedia Incubator, but already has more than 400 articles. Dagbani is written in a Latin-based alphabet. Although Dagbani didn’t fulfill some of our criteria, the community that applied was very enthusiastic. We regard Dagbani as our stretch goal: it will be instructive to learn whether we can achieve our goals even for such a small language community. (See more on Wikipedia: Dagbani language.)

Additionally, we will use English [en] as a demonstration language, in order to showcase the developed features to a wider audience.

We will continue to listen for input from all language communities and seek conversation with all communities. The point of the focus languages is to be able to reach out to some communities quickly, for testing out designs and prototypes, and to see if certain features also work beyond the languages the developers can test easily.

Thanks again to everyone who participated in the process and who helped to run it smoothly!