LinguaLibre/2022 Review

From Meta, a Wikimedia project coordination wiki
Status: stale draft (irrelevant as long as the tech side is not fixed). The page below is mainly a strategic review of existing assets (human resources, knowledge base, data), competition, comparative advantages, and SWOT.

On LinguaLibre, we are starting to design a broad, six-month PR campaign. I wonder whether such an overall PR campaign, based on mailings to popular language and tech blogs/newsrooms, creation of base PR materials followed by translations, and the writing of ~8 needed downstream grant requests, could itself be funded by a grant of some 50k USD?

Current state

We have a lot to do: there are thousands of languages, and most of them are in decline. We do not want to focus solely on major languages; we have to reach out to very small languages early, so that marginalized languages and their conservation are imprinted into our DNA and brand.


Assets Human resources
Type | Aspects or Lingualibre:Roles | Score | Importance | People | Hours/month | Bus factor
Technology | Core technology: Recording Studio. Backlog of: workflow features; features that valorize our assets; UX-damaging bugs (rate limit, online only, …) | 6/10 (correct) | Critical | n.a. | 10h - tests | n.a.
Technology | Lists management system | 7/10 (correct) | Medium | n.a. | 10h | n.a.
Documented code | GitHub repositories | 9/10 (good) | High | 3 | 1h | Pass
Tickets | Phabricator tickets | 8/10 (good) | Medium | 2 | 3h | Pass
Wikipages | Existing Help: pages | 7/10 (correct) | High | n.a. | 10h | n.a.
Human Resources | Developers, back end (MediaWiki, server) | 6/10 (stable) | Critical | 1+ | 6h | Fragile
Human Resources | Developers, front end (CSS, HTML, JS) | 2/10 (insufficient) | High | 2 | 3h | Fragile
Human Resources | Editors on maintenance, expansion (not discussions) | 8/10 (correct) | Medium | 3 | 40h | Pass
Human Resources | Editors on welcoming, onboarding | 5/10 (basic) | Medium | 3 | 5h | Fragile
Human Resources | Editors on strategy, grant request writing | 4/10 (basic) | High | 2 | 10h | Fragile
Human Resources | Editors on communication, outreach, PR | 1/10 (insufficient) | High | 0 | 1h | Fails
Institutional resources | Staff: Wikimedia France support | 8/10 (correct) | High | 2 | ? | Fragile
Funding flow | Funding for development: when critically needed. | 4/10 (basic) | High | n.a. | n.a. | Fragile
TOTAL | | ~5 | | | | Fragile
The status score integrates importance, existing resources, fragility, and opportunity cost. The critical server side is well managed by volunteers and staff, but its bus factor remains too weak, resulting in a modest 6/10 score.

Risk/opportunity: We *must* use the current calm to double the community and ensure the long-term sustainability of human resources and know-how.

Distribution

LinguaLibre's linguistic coverage is expanding: first with major Western languages, then toward other large languages and minority languages in Western countries, and last toward non-Western marginalized communities. This is a starting point.

Current distribution of human language families, with 9 dominant families.
Demographic[1] | World | Ratio | Lili | Coverage | Supported languages' profile | Examples | Community's presence
Major (>30M) | 30 | 0.5% | 10 | 33% | Mostly major Western or Indian languages. | FRA, SPA, BEN | Solid: several productive speakers. Sustained or periodic.
Large (1~30M) | 350 | 5% | 80 | 23% | Mostly Western languages, other notable languages. | NLD, AFR, CAT | Emerging: one productive speaker, few non-retained speakers. Fragile.
Marginalized (<1M) | 6500 | 94% | 40 | <1% | Mostly larger minorities in Western countries. | ATJ, BRE, EUS | Contact point: no productive speaker, one non-retained speaker. Below fragile.
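
The ratio and coverage percentages in the table above can be re-derived from its two raw counts per tier (languages in the world, languages supported on LinguaLibre). The following is a minimal sketch, assuming those counts are the only inputs; the names `TIERS` and `coverage_table` are illustrative and not part of any LinguaLibre tool.

```python
# Counts copied from the table above: (languages in the world, languages on LinguaLibre).
TIERS = {
    "Major (>30M)": (30, 10),
    "Large (1~30M)": (350, 80),
    "Marginalized (<1M)": (6500, 40),
}


def coverage_table(tiers):
    """Return {tier: (share of world languages in %, LinguaLibre coverage in %)}."""
    world_total = sum(world for world, _ in tiers.values())
    return {
        name: (round(100 * world / world_total, 1), round(100 * covered / world, 1))
        for name, (world, covered) in tiers.items()
    }


if __name__ == "__main__":
    for name, (share, coverage) in coverage_table(TIERS).items():
        print(f"{name}: {share}% of world languages, {coverage}% covered")
```

The computed values (about 0.4%, 5.1%, 94.5% for the shares and 33.3%, 22.9%, 0.6% for coverage) agree with the rounded figures in the table.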

Risk/opportunity: we must raise awareness of, seed, demo, and train people on LinguaLibre among truly marginalized communities across the 7 Wikimedia regions, so that know-how moves away from Western nerds and into truly local, linguistically at-risk communities.

PR outreach

On the PR side, we observe the following fronts and opportunities:

Type | Description | M.L. | Workload | Budget | Comment
Communication | Overall outreach campaign seeding the idea of LinguaLibre into a diverse, geographically dispersed community. Causes a new influx of advocates, speakers, event organizers, devs. Ensures the bus factor is no longer a risk, which secures the project's sustainability. | n.a. | 7 months | 26k USD |
Sub-projects
Outreach/External | Email campaign to dozens of language blogs and newsrooms. Email exchanges, coordination, interviews, copyediting with their authors. | n.a. | 4 months | 16k USD |
Outreach/Toolkit | Improve base PR materials with: base emails; base presentations; base flyers. | n.a. | 1 month | 4k USD |
Outreach/Medias | Coordinating and hiring to create testimony video material (example). | n.a. | 1/2 month | 2k USD (coordinator) + 2k USD (filming day) | Could reuse existing short online videos on language diversity?
Outreach/Crowdsourcing | Create a KissKissBankBank crowdfunding campaign on the model of WikiCheese. | n.a. | 1 month | 4k USD | The objective is equally PR, by increasing awareness, and raising some public money. Helps assess this avenue of autonomous future funding via crowdfunding.
Outreach/Translations | Coordinating translation of refined base documents from EN to major languages. Languages: ES, FR, AR, RU, HI, ZH, Indonesian, Swahili (language spheres with the largest language diversity). | n.a. | 1/2 month | 0€ | Wikimedians lead translations.

Associated funding requests

There are also opportunities in writing funding requests and initiating the following:

Type | Description | M.L. | Duration (est.) | Budget (est.) | Comment
Coordination | Overall strategy, coordination, planning and multiple fund requests. Get things on rails and rolling at the required speed. Strategy sub-projects: technologies, proofs of concept, communication. | n.a. | 8 months | 32k USD | This position leads the writing of the fund requests below.
Technical improvements
WMF Technical Fund 1 | LinguaLibre backlog: feature requests and improvements | n.a. | 3 months | 15k EUR | API: SPARQL. Front: CSS, JS, Vue.js, MediaWiki, PHP.
WMF Technical Fund 2 | Anki e-learning plugin generator | n.a. | 1 month | 5k EUR | API: SPARQL. Front: CSS, HTML, JS.
WMF Technical Fund 3 | Integrated e-learning webpage for words | n.a. | 1~2 months | 5~10k USD | API: SPARQL. Front: HTML, CSS, JS, Vue.js.
WMF Technical Fund 4 | Unilex 1000 languages lexical database update | n.a. | 1~2 months | 5~10k USD | Python.
WMF Technical Fund 5 | Dashboard for linguistic coverage. Visualize LinguaLibre coverage vs linguistic world heritage. Emphasize marginalized communities as our core partners (+90%). Emphasize the need to reach out to and serve those smaller language communities. Will help redefine LinguaLibre. | n.a. | 1~2 months | 5~10k USD | API: SPARQL. Front: HTML, CSS, JS, D3.js.
Total: 35-50k USD
Field outreach: training local community, recording 5000 words
WMFR Micro-fi 1 | Field outreach, marginalized language(s), Region W. Europe: "France's Whistled Gascon (1)" | 1 | 4 days | 500 EUR | Volunteer in that region, implies minimal travel, hosting costs.
WMF Community Fund 1 | Field outreach, marginalized language(s), Region Lat. America: "Peru Amerindians languages (3)" | 3 | 2 months | 6~8k EUR | Partnership with Aquaverde.
WMF Community Fund 2 | Field outreach, marginalized language(s), Region US/Canada: "Canada Amerindians languages (3)" | 3 | 2 months | 6~8k EUR | Partnership with Wikimedia Canada and Atikamekw's wiki.
WMF Community Fund 3 | Field outreach, marginalized language(s), Region Africa | 1+ | 2 months | 6~8k EUR | No contact at the moment.
WMF Community Fund 3 | Field outreach, marginalized language(s), Region CEE / Russia | 1+ | 2 months | 6~8k EUR | No contact at the moment.
WMF Community Fund 3 | Field outreach, marginalized language(s), Region ESEAP: E./S.E. Asia, the Pacific region | 1+ | 2 months | 6~8k EUR | No contact at the moment.
Others
WMF Community Fund 4 | 2022 Contribuling Conference | n.a. | 2 days | 4k EUR | Partnership with INALCO.
WMF Alliances Fund 1 | Taiwan aboriginal languages (16), Wikimedian in residence | 16 | 9 months | 50k EUR | Partnership with
TOTAL | | 24 | 21-23 months | 95-120k |
M.L.: marginalized languages supported, creating a solid range of demonstrations of in-community LinguaLibre usage. Field outreach: proof-of-concept and seeding expeditions, implying contact, informed consent by the minority, travel, possible linguistic research, training of locals, supervision of recording sessions, and hosting fees.
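
The 35-50k subtotal for the five technical funds can be checked by adding the low and high ends of each estimate separately. This is only a sketch of that arithmetic: the labels in `TECH_FUNDS` are shorthand invented here, and EUR and USD amounts are treated at parity, which is an assumption the table's own subtotal appears to make.

```python
# Budget ranges in thousands, copied from the technical-improvement rows above.
# Assumption: EUR and USD are added at parity (no exchange rate applied).
TECH_FUNDS = {
    "Technical Fund 1 (backlog)": (15, 15),
    "Technical Fund 2 (Anki plugin)": (5, 5),
    "Technical Fund 3 (e-learning page)": (5, 10),
    "Technical Fund 4 (Unilex update)": (5, 10),
    "Technical Fund 5 (coverage dashboard)": (5, 10),
}


def sum_ranges(ranges):
    """Add (low, high) estimates component-wise and return the combined (low, high)."""
    pairs = list(ranges)
    low = sum(lo for lo, _ in pairs)
    high = sum(hi for _, hi in pairs)
    return low, high


if __name__ == "__main__":
    low, high = sum_ranges(TECH_FUNDS.values())
    print(f"Technical improvements: {low}-{high}k")
```

The same helper could be applied to any other group of rows, as long as each estimate is first written as a (low, high) pair in a single currency.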

Associated onboarding

The wave of newcomers should be welcomed and onboarded properly.

Type | Description | M.L. | Duration (est.) | Budget (est.) | Comment
Community engagement | Guiding / onboarding the wave of newcomers per LinguaLibre:Roles: (Speakers: follow up with leads in order to increase the current low retention rate, with a focus on speakers of minority languages); Devs: point them to GitHub and Phabricator, profile their skill sets, guide them to suitable repositories/projects; Online advocates: onboard them into the PR team, which may help lower the workload; Local coordinators in the target community: able to organize local events and training, which will increase long-term impact. | n.a. | 8 months (part-time) | |
Campaign solidity

I do not believe the full effort drafted above can be achieved by volunteers with occasional involvement. We can expect such a team to achieve 1/4 to 1/3 of that plan (that is, to go 3-4 times slower). I wonder whether this overall coordination plan, with two coordinators for 6 months (favored) or one for one year (acceptable) initiating multiple projects via grants, could itself benefit from a Wikimedia Fund we previously discussed? Such a ~50k€ central 2022 LinguaLibre Campaign coordinator role would make it more likely that most of this wishlist actually happens in 2022.

@Yug: Thanks for sharing these ideas with me to support LinguaLibre. Because this is a complex set of related but separate proposals, my suggestion for next steps would be to have the applying individuals or organizations who are proposing these projects to contact the Regional Program officer over e-mail depending on where they are physically based. You can find more information about each funding region here, and a listing of our team is provided here. I JethroBT (WMF) (talk) 15:03, 27 November 2021 (UTC)
As a follow-up anecdote, the volunteer-based LL PR campaign designed and led by Marreromarco, which my overall PR+tech funding requests plan aimed to secure and solidify, has just been called off. Marreromarco has assessed the software side to be too basic to satisfy a non-Wikimedian public, and therefore not worth his ambitious volunteer-powered PR campaign. In our case, tech and PR go together. I'm still drafting a proposal for Marti. Yug (talk) 11:48, 29 November 2021 (UTC)

Competition

Metrics

Project | Licence | Languages | Members | Recorded words | Recorded sentences | Written sentences | Comment
Tatoeba.org | Open | 410[2] | 56,406[3] | n.a. | 929,389[4] | 10,192,845[5] | UI: excellent and lively UI, to learn from.
Common Voice | Open | 90+ | 200,000[6][7] | 30,000,000[6][8] | n.a. | n.a. | UI: site has a clean and dynamic UI to learn from.
LinguaLibre.org | Open | 170+[9] | 1100+[10] | 750,000+ | n.a. | n.a. | UI: "best opportunities for progress".
Forvo.com | NC | | | | | | UI: excellent and futuristic UI, to learn from.
Note: Common Voice has no aggregated count available, only per-language counts from which a sum can be made. One recorded hour is estimated as equivalent to 2,000 words.
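
The conversion in the note can be written out explicitly. The sketch below assumes only the stated rate of 2,000 words per recorded hour; the function names and the per-language hours in the usage example are hypothetical placeholders, not real Common Voice figures.

```python
WORDS_PER_HOUR = 2000  # conversion rate stated in the note above


def hours_to_words(hours, words_per_hour=WORDS_PER_HOUR):
    """Convert recorded hours into an estimated word-equivalent count."""
    return round(hours * words_per_hour)


def words_to_hours(words, words_per_hour=WORDS_PER_HOUR):
    """Convert a word-equivalent count back into recorded hours."""
    return words / words_per_hour


def total_word_equivalents(hours_by_language):
    """Sum per-language recorded hours into one word-equivalent total."""
    return sum(hours_to_words(hours) for hours in hours_by_language.values())


if __name__ == "__main__":
    # Hypothetical per-language hours, for illustration only.
    sample = {"lang_a": 1.5, "lang_b": 0.25}
    print(total_word_equivalents(sample))
    print(words_to_hours(30_000_000))
```

Under this rate, the 30,000,000 word-equivalents listed in the metrics table correspond to 15,000 recorded hours.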

Competitive advantages

Each site has a different focus.

  • Tatoeba focuses on written sentences and parallel sentences (translations) to feed learning applications.
  • Common Voice focuses on audio sentences by various speakers, seeking very diverse audio, to feed Speech2Text and Text2Speech systems.
  • LinguaLibre focuses on clean audio words to illustrate Wikimedia Wiktionaries (so far), but is by design convenient for vocabulary applications and dictionaries (requires a working datasets page).
  • (Forvo)

SWOT

SWOT (strengths, weaknesses, opportunities, and threats) analysis is a method for identifying and analyzing internal strengths and weaknesses and external opportunities and threats that shape current and future operations and help develop strategic goals.

SWOT for LinguaLibre's UI

Strengths
  • LinguaLibre is unique in its focus on words.
  • LinguaLibre is unique in its native integration into the Wikimedia ecosystem, especially Wiktionaries.
  • LinguaLibre is catching up with Tatoeba in terms of amounts (660k vs 930k).
  • MediaWiki allows the community flexible collaboration.
Weaknesses/Lags
  • LinguaLibre lags behind Tatoeba in terms of linguistic diversity (145 vs 410 languages).
  • LinguaLibre lags behind Common Voice in audio content (650k words vs est. 30M word-equivalents).
  • LinguaLibre's UI (home page, stats) and communication (home page content) are holding LinguaLibre back compared to these (open content) competitors.
  • LinguaLibre's community is too small to leverage the wiki for event organization and the like.
Opportunities
  • LinguaLibre can improve its design by learning from and replicating competitors' best practices.
    • CSS snippets can be created; Help:SPARQL can provide data.
  • LinguaLibre can improve its playfulness by learning from and replicating competitors' best practices.
    • Gamification can be increased via visual calls to action and by fostering competition.
Threats
  • LinguaLibre's stagnating and raw user interface is not engaging enough.
  • LinguaLibre's community could stagnate, and therefore go into relative decline compared to competitors.

Other assessments

Communication pages

Help pages

Category:Lingua Libre:Help pages: overall review, recategorization of identified orphan pages, basic improvements, and needs assessment (below) done yesterday. Help pages would benefit from some care.

Needs merge:

Needs split:

Needs better inclusion into LinguaLibre:Stats/Languages or links:

Needs expansion:

Comment:

  • Some orphan pages were likely missed.
  • Other namespaces were not assessed.
  • Maintenance ideas, improvements, and templates could help.

Gadgets scripts

LinguaLibre gadgets are JS scripts that enhance the site by adding features.

Future

See also

References