|Status: stale draft (irrelevant as long as the tech side is not fixed). The page below is mainly a strategic review of existing assets (human resources, knowledge base, data), competition, comparative advantages and SWOT.|
On LinguaLibre, we are starting to design a broad, 6-month PR campaign. I wonder whether such an overall PR campaign, based on mailing popular language and tech blogs/newsrooms, creating base PR materials followed by translations, and writing the ~8 needed downstream grant requests, could itself be funded by a grant of about 50k USD?
- Current state
We have a lot to do: there are thousands of languages, and most of them are in decline. We do not want to stay solely on major languages; we have to reach out to very small languages early, to imprint marginalized languages and their conservation into our DNA and brand.
|Type||Aspects or Lingualibre:Roles||Score||Importance||People||Hrs/mths||Bus factor|
|Technology||Core technology: Recording Studio. Backlog of:
||6/10–correct||Critical||n.a.||10h - tests||n.a|
|Technology||Lists management system||7/10–correct||Medium||n.a.||10h||n.a|
|Documented code||Github repositories||9/10–good||High||3||1h||Pass|
|Wikipages||Existing Help: pages||7/10–correct||High||n.a.||10h||n.a|
|Human Resources||Developers, back — MediaWiki, server||6/10–stable||Critical||1+||6h||Fragile|
|Human Resources||Developers, front — CSS, HTML, JS||2/10–insufficient||High||2||3h||Fragile|
|Human Resources||Editors on maintenance, expansion (not discussions)||8/10–correct||Medium||3||40h||Pass|
|Human Resources||Editors on welcoming, onboarding||5/10–basic||Medium||3||5h||Fragile|
|Human Resources||Editors on strategy, grants requests writing||4/10–basic||High||2||10h||Fragile|
|Human Resources||Editors on communication, outreach, PR||1/10–insufficient||High||0||1h||Fails|
|Institutional resources||Wikimedia France staff support||8/10–correct||High||2||?||Fragile|
|Funding flow||Funding for development : when critically needed.||4/10–basic||High||n.a.||n.a.||Fragile|
|The status score integrates importance, existing resources, fragility, and opportunity cost. The critical server side is well managed by volunteers and staff, but its bus factor stays too weak, resulting in a modest 6/10 score.|
Risk/opportunity: We *must* use the current calm to double the community and ensure the long-term sustainability of HR and know-how.
LinguaLibre's linguistic coverage is expanding: first with major Western languages, then toward other large languages and minority languages in Western countries, and last toward non-Western marginalized communities. This is a starting point.
|Demographic||World||Ratio||Lili||Coverage||Supported languages' profile||Examples||Community's presence|
|Major (>30M)||30||0.5%||10||33%||Mostly major Western or Indian languages.||FRA, SPA, BEN||Solid: Several productive speakers. Sustained or periodic.|
|Large (1~30M)||350||5%||80||23%||Mostly Western languages, other notable languages||NLD, AFR, CAT||Emerging: One productive speaker, few not-retained speakers. Fragile.|
|Marginalized (<1M)||6500||94%||40||<1%||Mostly larger minorities in Western countries.||ATJ, BRE, EUS||Contact point: No productive speaker, one not-retained speaker. Below fragile.|
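The Ratio and Coverage columns above can be recomputed from the raw counts. The following is a minimal sketch, assuming Ratio means a tier's share of all the world's languages and Coverage means the share of that tier's languages present on LinguaLibre; the names `TIERS` and `coverage_table` are illustrative and not part of any LinguaLibre tool. The results agree with the table's figures up to its rounding.

```python
# Language counts copied from the table above:
# (tier, languages worldwide, languages present on LinguaLibre).
TIERS = [
    ("Major (>30M)", 30, 10),
    ("Large (1~30M)", 350, 80),
    ("Marginalized (<1M)", 6500, 40),
]


def coverage_table(tiers):
    """Return (tier, share of world languages in %, LinguaLibre coverage in %)."""
    world_total = sum(world for _, world, _ in tiers)
    rows = []
    for name, world, lili in tiers:
        ratio = 100 * world / world_total  # assumed meaning of the "Ratio" column
        coverage = 100 * lili / world      # assumed meaning of the "Coverage" column
        rows.append((name, ratio, coverage))
    return rows


for name, ratio, coverage in coverage_table(TIERS):
    print(f"{name:20s} ratio={ratio:5.1f}%  coverage={coverage:5.1f}%")
```

Updating the counts in `TIERS` as new languages are recorded would show how far coverage of the marginalized tier has moved.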
Risk/opportunity: we must raise awareness of LinguaLibre, then seed, demo and train on it, among truly marginalized communities across the 7 Wikimedia regions, so that know-how moves away from Western nerds and into truly local, linguistically at-risk communities.
On the PR side, we observe the following fronts and opportunities:
|Communication||Overall outreach campaign seeding the idea of LinguaLibre into a diverse, geographically dispersed community.
Causes a new influx of advocates, speakers, event organizers, devs.
Ensures the bus factor is no longer a risk, which secures the project's sustainability.
|n.a.||7 months||26k USD|
|Outreach/External||Email campaign to dozens of language blogs and newsrooms.
Email exchanges, coordination, interviews, copyediting with their authors.
|n.a.||4 months||16k USD|
|Outreach/Toolkit||Improve base PR materials with: base emails; base presentations; base flyers.||n.a.||1 month||4k USD|
|Outreach/Media||Coordinating and hiring to create testimony video material (example)||n.a.||1/2 month||2k USD – coordinator
2k USD – filming day
|Could we reuse existing short online videos on language diversity?|
|Outreach/Crowdfunding||Create a KissKissBankBank crowdfunding campaign on the model of WikiCheese.||n.a.||1 month||4k USD||The objective is equally PR, by increasing awareness, and raising some public money. Helps assess this avenue of autonomous future funding via crowdfunding.|
|Outreach/Translations||Coordinating translation of refined base documents from EN to major languages
Languages: ES, FR, AR, RU, HI, ZH, Indonesian, Swahili (language spheres with largest language diversity).
|n.a.||1/2 month||0€||Wikimedians lead translations.|
Associated funding requests
There are also opportunities in writing funding requests and initiating the following:
|Type||Description||M.L.||Duration (est.)||Budget (est.)||Comment|
|Coordination||Overall strategy, coordination, planning and multiple fund requests.
Get things on rails and rolling at the required speed.
Strategy sub-projects: technologies, proofs of concept, communication.
|n.a.||8 months||32k USD||This position is leading the writing of the fund requests below.|
|WMF Technical Fund 1||LinguaLibre backlog feature requests and improvements||n.a.||3 months||15k EUR||API: SPARQL. Front: CSS, JS, Vue.js, MediaWiki, PHP.|
|WMF Technical Fund 2||Anki e-learning plugin generator||n.a.||1 month||5k EUR||API: SPARQL. Front: CSS, HTML, JS.|
|WMF Technical Fund 3||Integrated e-learning webpage for words||n.a.||1~2 months||5~10k USD||API: SPARQL. Front: HTML, CSS, JS, Vue.js.|
|WMF Technical Fund 4||Unilex 1000 languages lexical database update||n.a.||1~2 months||5~10k USD||Python.|
|WMF Technical Fund 5||Dashboard for linguistic coverage.
||n.a.||1~2 months||5~10k USD||API: SPARQL. Front: HTML, CSS, JS, D3.js.
Will help redefine LinguaLibre.
|Field outreach: training the local community, recording 5000 words.|
|WMFR Micro-fi 1||Field outreach marginalized language(s) — Region W. Europe "France's Whistled Gascon (1)"||1||4 days||500 EUR||Volunteer in that region, implies minimal travel, hosting costs.|
|WMF Community Fund 1||Field outreach marginalized language(s) — Region Lat. America "Peru Amerindian languages (3)"||3||2 months||6~8k EUR||Partnership with Aquaverde.|
|WMF Community Fund 2||Field outreach marginalized language(s) — Region US/Canada "Canada Amerindian languages (3)"||3||2 months||6~8k EUR||Partnership with Wikimedia Canada and Atikamekw's wiki.|
|WMF Community Fund 3||Field outreach marginalized language(s) — Region Africa||1+||2 months||6~8k EUR||No contact at the moment.|
|WMF Community Fund 3||Field outreach marginalized language(s) — Region CEE / Russia.||1+||2 months||6~8k EUR||No contact at the moment.|
|WMF Community Fund 3||Field outreach marginalized language(s) — Region ESEAP: E./S.E. Asia, the Pacific region||1+||2 months||6~8k EUR||No contact at the moment.|
|WMF Community Fund 4||2022 Contribuling Conference||n.a.||2 days||4k EUR||Partnership with INALCO.|
|WMF Alliances Fund 1||Taiwan aboriginal languages (16) Wikimedian in residence||16||9 months||50k EUR||Partnership with|
|M.L.: marginalized languages supported, creating a solid range of demonstrations of in-community LinguaLibre usage. Field outreach: proof-of-concept and seeding expeditions, implying contact, informed consent by the minority, travel, possible linguistic research, training of locals, supervision of recording sessions, hosting fees.|
The wave of newcomers should be welcomed and onboarded properly.
|Type||Description||M.L.||Duration (est.)||Budget (est.)||Comment|
|Community engagement||Guiding / onboarding the wave of newcomers per LinguaLibre:Roles:
||n.a.||8 months (part-time)|
- Campaign solidity
I do not believe the full effort drafted above can be achieved by volunteers with occasional involvement. We can expect such a team to achieve 1/4 to 1/3 of that plan (that is, to go 3-4 times slower). I wonder whether this overall coordination plan, with two coordinators for 6 months (favored) or one for a year (will do) to initiate multiple projects via grants, could itself benefit from a Wikimedia Fund we previously discussed? Such central 2022 LinguaLibre Campaign coordinator(s), funded at ~50k€, would be a more solid way to get most or more of this wishlist to actually happen in 2022.
- @Yug: Thanks for sharing these ideas with me to support LinguaLibre. Because this is a complex set of related but separate proposals, my suggestion for next steps would be to have the applying individuals or organizations who are proposing these projects to contact the Regional Program officer over e-mail depending on where they are physically based. You can find more information about each funding region here, and a listing of our team is provided here. I JethroBT (WMF) (talk) 15:03, 27 November 2021 (UTC)
- As a follow-up anecdote, the volunteer-based LL PR campaign designed and led by Marreromarco, which my overall PR+tech funding requests plan aimed to secure and solidify, has just been called off. Marreromarco has assessed the software side to be too basic to satisfy the non-Wikimedian public, and therefore not worth his ambitious volunteer-powered PR campaign. In our case, tech and PR go together. I'm still drafting a proposal for Marti. Yug (talk) 11:48, 29 November 2021 (UTC)
|Project||Licence||Languages||Members||Recorded words||Recorded sentences||Written sentences||Comment|
|Tatoeba.org||Open||410||56,406||n.a.||929,389||10,192,845||UI: excellent and lively UI, to learn from|
|CommonVoices.org||Open||90+||200,000||30,000,000||n.a.||n.a.||UI: site has clean and dynamic UI to learn from.|
|LinguaLibre.org||Open||170+||1100+||750,000+||n.a.||n.a.||UI: "best opportunities for progress".|
|Forvo.com||NC||UI: excellent and futuristic UI, to learn from|
|Note: Common Voice has no aggregated count available, only per-language counts from which a sum can be made. One hour is estimated to be equivalent to 2000 words.|
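The recorded-words figure given for Common Voice in the table is an extrapolation rather than a published count. As a minimal sketch of that arithmetic, assuming the 2000 words per hour conversion from the note and the raw estimate of 15,000 recorded hours given in the page's footnotes (the function name `estimated_words` is illustrative):

```python
WORDS_PER_HOUR = 2000  # conversion rate stated in the table note


def estimated_words(hours_recorded, words_per_hour=WORDS_PER_HOUR):
    """Convert hours of recorded speech into an approximate word count."""
    return hours_recorded * words_per_hour


# 15,000 hours is the raw personal estimate cited in the footnotes.
print(f"{estimated_words(15_000):,}")  # prints 30,000,000
```

Both inputs are rough, so the result is best read as an order of magnitude for comparison with the other projects, not as a measured total.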
Each site has a different focus.
- Tatoeba focuses on written sentences and parallel sentences (translations) to feed learning applications.
- Common Voice focuses on audio sentences by various speakers, seeking very diverse audio, to feed speech-to-text and text-to-speech systems.
- LinguaLibre focuses on clean audio words to illustrate Wikimedia Wiktionaries (so far), but is by design convenient for vocabulary applications and dictionaries (requires a working datasets page).
- SWOT analysis (strengths, weaknesses, opportunities, and threats) is a method for identifying and analyzing internal strengths and weaknesses and external opportunities and threats that shape current and future operations and help develop strategic goals.
SWOT for Lingualibre's UI
- Lingualibre:Mailing – empty stub needing purposeful writing
Category:Lingua Libre:Help pages: overall review, recategorization of identified orphan pages, basic improvements, and needs assessment (below) done yesterday. Help pages would benefit from some care.
Needs merge:
- LinguaLibre:Language codes systems used across LinguaLibre & Help:Langtags
- Help:Choosing a microphone & Help:Configure your microphone
- Help:Data structure into Help:Documentation opérationelle Mediawiki, or a template?
Needs better inclusion into LinguaLibre:Stats/Languages or links:
Needs expansion:
- Help:SPARQL 2
- Template:User ratelimit
- orphan pages likely missed.
- other namespaces not assessed.
- maintenance ideas, improvements, templates could help
LinguaLibre gadgets are JS scripts that enhance the site by adding features.
- MediaWiki:Gadget-Demo.js - a words list generator and demo
- MediaWiki:Gadget-ExternalTools.js - I believe this is the current wordlist generator, but I am not sure.
- MediaWiki:Gadget-Upload local file.js
- mw:Wikimedia Apps/Reading list browser extension ― to create a learning list application (e-learning)
- WDQS editable — to allow logged-in users to edit items via the result tables.
- LinguaLibre/2022 wishlist
- Phabricator board
- lingualibre:LinguaLibre:Events/Winter 2021-2022 Public Relations Campaign
- "Summary by language size". Ethnologue. Archived from the original on 12 March 2019.
- Rapid approximated sum of participants
- Extrapolated from : 15,000 hours of voices (raw personal estimate)