User talk:LZia (WMF)/Trip reports

From Meta, a Wikimedia project coordination wiki

On smaller languages (Wiki Indaba 2018)

Some statistics (which are from reasonably reliable sources and so should be the right order of magnitude) and a bit of commentary:

  • Ethnologue (the best source for data on most smaller languages) lists about 6900 living languages. About 6% have 1M+ speakers, and cover 94% of the world’s population. [1]
    • That’s a little over 400 languages. According to English Wikipedia, the top 100 languages have 5.6B speakers and cover 85% of the world’s population, so those numbers seem reasonable.
    • It isn’t clear, though, that either set of stats accounts for multilingualism across world languages. How many speakers of Spanish also speak English or Portuguese? There’s probably some multiple counting in there, but the top 100 to 200 languages do cover a lot of people.
  • In North America, there are about 165 indigenous languages, but only 8 have 10K+ speakers and about 75 are on the brink of dying out.[2] This is generally representative of the situation globally.
    • About half of all languages have fewer than 10K speakers, and about a quarter have fewer than 1K.[3] About half are expected to die out in the next century.[4]
    • Language revitalization is also very hard. There are many success stories, but also many failures.
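
The order-of-magnitude figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the constants are the rounded numbers from the comment itself, the function and variable names are made up for illustration, and the multilingual double counting raised above is not modeled.

```python
# Rounded figures as quoted in the comment above; no new data is introduced.
ETHNOLOGUE_LIVING_LANGUAGES = 6900   # "about 6900 living languages"
SHARE_WITH_1M_SPEAKERS = 0.06        # "about 6% have 1M+ speakers"
TOP_100_SPEAKERS_BILLIONS = 5.6      # "top 100 languages have 5.6B speakers"
TOP_100_POPULATION_SHARE = 0.85      # "cover 85% of the world's population"


def languages_with_1m_speakers(total: int, share: float) -> int:
    """Number of languages implied by a share of the total (hypothetical helper)."""
    return round(total * share)


def implied_population_billions(speakers: float, share: float) -> float:
    """Population base implied if `speakers` make up `share` of it (hypothetical helper)."""
    return speakers / share


large_languages = languages_with_1m_speakers(
    ETHNOLOGUE_LIVING_LANGUAGES, SHARE_WITH_1M_SPEAKERS
)
population_base = implied_population_billions(
    TOP_100_SPEAKERS_BILLIONS, TOP_100_POPULATION_SHARE
)

print(f"Languages with 1M+ speakers: about {large_languages}")
print(f"Population base implied by the top-100 figures: about {population_base:.1f}B")
```

The first result is 414, which matches "a little over 400 languages". The second is only a consistency check on the two top-100 figures, since speaker totals can count multilingual people more than once.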

Sadly, I don’t think we should make saving/revitalizing languages a core part of our mission. It is time and resource intensive, and it comes down to the motivation and availability of three groups: the remaining speakers, those who want to save the language, and the kids who will have to want to learn it. It’s a really complex problem, hence the skepticism in the “Criticism” section of the link above.

That said, we shouldn’t ever stop or even slow efforts where there is a passionate language community—or even one passionate individual—working to build knowledge repositories. And so, it would be enlightening to see what about our platforms could be improved to help smaller languages and smaller projects.

And while language and culture are often closely linked, the link is not inextricable. It is certainly possible to document cultural information in another language. Maybe Wikipedia is not the right venue—I know there are ongoing debates about adapting the norms of Wikipedia in some cases to accommodate information that is not documented in traditional “reliable” sources—but maybe Wikisource (or a different set of tools based on Mediawiki) would be a better alternative. Documenting cultural information in the original language, with a translation into a major world language, available for free on the internet could be a valuable resource for language revitalization efforts, and for preserving cultural information when language death is likely or inevitable. (However, issues of indigenous intellectual property can get complicated, and other people have thought more about preserving such cultural information—which brings me back to the idea of figuring out how our platform fails to help such efforts and making changes that would most improve the situation with some practical level of effort.)

Another approach that I’ve advocated for from time to time—and which I think feeds into improving knowledge equity—is doing research to understand where the most benefit is to be had for multilingual speakers. So, rather than trying to get information to everyone in their most preferred language (which is a practical impossibility for several reasons), try to make information available in the languages understood by the most people, and make it accessible to non-native but competent speakers. I wrote this up in more detail in my 2018 Dev Summit position statement.

A few other small things:

  • It would also help to define what counts as a small language for this discussion (10K speakers, 1K, fewer?) and what qualifies as a minimally active project (1 active contributor, 10, 100?). These aren't necessarily for you to answer; they are just definitions that probably need to be agreed on before anything can be discussed in real detail.
  • I’m also curious where the “2000 actively spoken languages” stat comes from. That’s interesting and would be useful to investigate with more context.

Thanks for writing up your notes. It was a stimulating read! —Trey Jones (WMF) (talk) 19:54, 27 March 2018 (UTC)