Lingua Libre/2022 wishlist

Note: This page has been moved to Meta, where the Visual Editor is usable and tables are easier to maintain (Edit page with Visual Editor).

In late March 2022, Wikimedia France is establishing its budget for the period of July 2022 to June 2023.

Please share here what you think we should get done this year on Lingua Libre. Feel free to add projects of yours that would require funding, as well as bugs and foreseeable technical needs.

Please remember to link Phabricator tickets to the bugs and technical issues you raise. A maximum of 10 suggestions per person would be best.

Approach

Editorial vision for this raw wishlist

Phase 1, clarify needs and assess priorities: digest the raw wishlist into a clean document usable to strategize efforts.

  1. Strategic awareness: get several members to read and understand this full list and acquire a global vision 👉🏼 Yug, Poslovitch.
  2. Clean up: clarify and harmonize these raw texts; divide vertically into identifiable missions and skill-based clusters; divide horizontally into distinct and non-overlapping topics.
  3. Evaluate: estimate workload, cost, importance, urgency, priority.
  4. Map missions: keep an updated DAG map of those missions.

Phase 2, outreach documents: create derivative documents with selected missions, written in formats suitable for key target audiences such as

  • board members (financial decision makers): focus on strategic importance, desired outcome, cost estimates.
  • developer team: provide strategic importance, time estimate, entry point to start coding.
  • marketing team: provide strategic importance, time estimate, direction to start writing documents and campaigns.


Note on estimates
  • We assume access to skilled developers at an affordable price: 250€/day, 1,250€/week, 5,000€/month. Single-day sprint at 500€. Failing to find such skills at these rates would affect the project.
  • Project management and coordination, writing of technical requirements, calls for hires, administrative work, intermediate and final review and testing are not included. Expect about 1 person-year, 40k€.
  • The cost of outreach to local minorities for field linguistics is not included.
  • Occasional travel, hosting and meal costs are not included. Expect 10k€.
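
The rates above can be turned into a quick estimator. This is a minimal sketch, assuming the 5-day week and 4-week month that the stated rates imply; the function name `estimate_cost` and the unit labels are illustrative and not part of any Lingua Libre tooling.

```python
# Rates stated in the note on estimates, in euros.
RATES = {"day": 250, "week": 1_250, "month": 5_000}
SINGLE_DAY_SPRINT = 500  # a standalone one-day task is billed at the sprint rate


def estimate_cost(amount: float, unit: str) -> float:
    """Return the estimated developer budget in euros for `amount` units of work."""
    if unit not in RATES:
        raise ValueError(f"unknown unit: {unit!r}")
    if unit == "day" and amount == 1:
        return SINGLE_DAY_SPRINT
    return amount * RATES[unit]
```

Under these assumptions, `estimate_cost(3, "day")` gives 750 and `estimate_cost(2, "week")` gives 2,500. Coordination, travel and outreach costs are outside this estimate, as stated in the note.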


Contextual dates and deadlines

To keep synchronized with lingualibre:Lingualibre:Events/Program.

  • Done 2022.03.11: Adelaide initiated the 2022-2023 Lingualibre wishlist.
  • Done 2022.06.01 (event): Updated wishlist shared, call for improvement.
  • Done 2022.06.05 (deadline): Celtic Knot <5 min prerecorded video.
  • Done 2022.06.12 (deadline): Wikimania 2022 submission deadline. See possible formats.
  • Done 2022.06.19 (event): General-public forum of local community associations in Toulouse, with a Lingualibre stand and our Occitan Gascon team Lo Congres.
  • Done 2022.06.21-23 (event): LREC / Language Resources and Evaluation Conference in Marseille. Opportunity to share our vision with researchers of language resources.
  • Done 2022.07.01-02 (event): Celtic Knot Conference 2022, for “communities working on a minority language on the Wikimedia projects”. See requested format.
  • [ 2022.07.xx (event idea): several outreach and recording sessions at INALCO ]
  • Done 2022.07.08-10 (event): Wikimedia France's Wikicamp. Opportunity to share our vision with WMFr's community.
  • 2022.08.03-06? (event, W/T/F/S/Sunday): Wikimedia France Hackathon.
  • 2022.08.11-14 (event, T/F/S/Sunday): Wikimania Online. Opportunity to share our vision with the global Wikimedia community.
  • 2022.09.xx (deadline): Wikimedia Foundation community fund, Alliance fund applications deadlines. See {template:Grants table}.
  • 2022.11.19-20 (event): WikiConvention, Paris.

Wishlist

See also :phabricator:LinguaLibre.

Section 1 : RecordWizard

Submitted by | Definition & evaluation | Estimated costs
User | Project | Title | Description | Priority | Time | Budget
RecordWizard
0x010C | RecordWizard | Critical fixes | Mitigation of several major bugs: audio clicks… | ★★★★★ | ~1 month | 5,000€
0x010C, Yug | RecordWizard | Sharable click-and-record link :phab:T313575 | Sharable RecordWizard URLs that pass settings such as speaker (Qid), recorded language (Qid), local wordlist used (title), etc. as URL parameters to prefill RW's form. Motivation: this lets an experienced user send a click-and-record link to speakers who are not tech-literate. | ★★★☆☆ medium | ~2 weeks | 2,500€
0x010C | RecordWizard | Enhance the Tutorial step :phab:T266843 | ? | ★★☆☆ | ~2 weeks | 2,500€
Yug | Audio data | Investigate Click bug T281041 | Review recent users' recordings to properly assess the prevalence of audio defects. See also Property:P33 `type of audio file issue`. At least 200 files need to be reviewed by hand. | ★★★★★ | 1 week | 1,250€
0x010C | RecordWizard | Automatic audio quality check T290010 | ? | ? | ~1 month | 5,000€
0x010C | RecordWizard | Automatic audio quality tagging :phab:T303680 | ? | ? | ~1 month | 5,000€
Yug | RecordWizard | RecordWizard working offline :phab:T313574 | Ability for the RecordWizard to operate offline. | ★★★★☆ high | ~2 months | 10,000€
List loader
Yug | RecordWizard | List loader handles dictionaries (github) :phab:T212671 | Marginalized communities with no wordlist available need a minimal bilingual dictionary, translated from the local macro language into our target minority language, in order to create that first wordlist. The list loader could easily be made able to load such bilingual dictionaries. Format is `# L1 → L2`; see also Help:List translation. | ★★★★☆ high | 1 day | 500€
Yug | RecordWizard | List loader handles metadata (github) | The metadata format still needs to be thought through. Ex: `# rouge [pos:noun;french:mot;ipa:/ɹuːʒ/;…]`. This has deeper implications: the format must be both human- and machine-editable, as it could be a place where humans create dictionaries and where machine-readable data is prepared for Wikidata lexemes. See also Handedict (ask Yug). | ★★★☆☆ medium | 1 day | 500€
Yug | RecordWizard | List loader handles HTML comments, wiki <noinclude> (github) :phab:T212671 | HTML comments and <noinclude> contents are automatically removed. | ★★☆☆☆ medium | 1 day | 500€
Poslovitch | RecordWizard | List loader filters lists by list type :phab:T313478 | List types: Frequent words lists; Never recorded words lists; Thematic lists; Requested by community. User-speakers can tick what they want to focus on in a checklist and be suggested the right wordlists. | ★★★☆☆ medium (sugar) | ~1 month | 5,000€
Yug | RecordWizard | List loader has priority system :phab:T313500 | The list loader can distinguish higher-quality lists for a language. | ★★★☆☆ medium | 1 week | 1,250€
Rdrg109 | RecordWizard | List generator > Based on Lexemes without audios :phab:T283802 | Create a list generator based on Wikidata lexemes' words and sentences (Property:P5831 `usage example`) with no Property:P443 `audio pronunciation`. Note: on 2022/03/18, only 1 usage example has a pronunciation audio; of the 129,942 English forms, only 340 have pronunciation audios (i.e. ~0.26%). More statistics here, discussion here. Note: in RW, step 3, the External Tools list loader could provide a few built-in examples, including this one. Requires JS and OO.ui.js skills. | ★★★☆☆ medium | 1 week | 1,250€
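
The sharable click-and-record link described in this table could work roughly as follows. This is only a sketch: the base URL and the parameter names `speaker`, `language` and `list` are assumptions for illustration, not a confirmed RecordWizard interface, and the Qids in the usage note are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL and parameter names; neither is a confirmed RecordWizard API.
BASE_URL = "https://lingualibre.org/wiki/Special:RecordWizard"
ALLOWED = ("speaker", "language", "list")


def build_record_link(**settings: str) -> str:
    """Build a link whose query string carries the settings to prefill."""
    unknown = set(settings) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")
    query = urlencode({key: settings[key] for key in ALLOWED if key in settings})
    return f"{BASE_URL}?{query}" if query else BASE_URL


def read_prefill(url: str) -> dict[str, str]:
    """Extract the supported settings from a link, ignoring anything else."""
    params = parse_qs(urlsplit(url).query)
    return {key: params[key][0] for key in ALLOWED if key in params}
```

For example, `build_record_link(speaker="Q1", language="Q2", list="List:Fra/Animals")` produces a link that `read_prefill` turns back into the same three settings. Restricting the accepted keys to a fixed tuple keeps unexpected URL parameters from reaching the form.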
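
The three List loader parsing wishes (bilingual `# L1 → L2` items, bracketed metadata, removal of HTML comments and `<noinclude>` blocks) can be sketched together. The actual List loader is written in JavaScript; the Python below only illustrates the parsing rules, and the function name `parse_wordlist` and the output keys `term`, `translation` and `meta` are hypothetical. It does not decide which side of the arrow gets recorded.

```python
import re

# HTML comments and <noinclude>…</noinclude> blocks, possibly spanning lines.
_HIDDEN = re.compile(r"<!--.*?-->|<noinclude>.*?</noinclude>", re.DOTALL | re.IGNORECASE)
# One item: "# term", optionally "→ translation", optionally "[key:value;...]".
_ITEM = re.compile(
    r"^#\s*(?P<term>[^→\[\]]+?)\s*"
    r"(?:→\s*(?P<translation>[^\[\]]+?)\s*)?"
    r"(?:\[(?P<meta>.*)\]\s*)?$"
)


def parse_wordlist(wikitext: str) -> list[dict]:
    """Parse list items, skipping hidden content and non-item lines."""
    items = []
    for line in _HIDDEN.sub("", wikitext).splitlines():
        match = _ITEM.match(line.strip())
        if not match:
            continue
        meta = {}
        for pair in (match["meta"] or "").split(";"):
            key, sep, value = pair.partition(":")
            if sep and key.strip():  # ignore fragments without a "key:" part
                meta[key.strip()] = value.strip()
        items.append(
            {"term": match["term"], "translation": match["translation"], "meta": meta}
        )
    return items
```

Hidden content is stripped before the text is split into lines, so a multi-line `<noinclude>` block disappears as a whole rather than line by line.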

Section 2 : MediaWiki

Write your suggestions here
Submitted by | Definition & evaluation | Estimated costs
User | Project | Title | Description | Priority | Time | Budget
MediaWiki maintenance
0x010C | MediaWiki | Local MediaWiki enhancements | Lingualibre's MediaWiki can be enhanced for a better user experience: the main search bar, Special pages and the wikicode-editing UI (Special:Search, Special:RecentChanges, ...). Better UX would increase user retention. | ? | 1.5 months | 8,000€
Poslovitch | MediaWiki | Update/Upgrade 1.35.5 | MediaWiki 1.35 requires a few security upgrades. The next LTS version (1.39) is expected in November 2022; upgrading to it is recommended to stay up to date and compatible with MediaWiki extensions, and to keep our site and users safe. Upgrades also require numerous small corrections to LinguaLibre's core RecordWizard extensions. | ? | 2 months | 20,000€
Poslovitch | MediaWiki | Extension install: MLEB | The MediaWiki Language Extension Bundle is a pack of extensions that should be updated "as a group" and not individually (an attempt to do so in December did not succeed). As raised in T295250, updating the MLEB would allow the use of the "tvar" syntax (which I'm unfamiliar with). | ★★★☆☆ medium | 1 week | 1,250€
Yug | MediaWiki | Extension : Translate | Translate extension to update. | ★★☆☆☆ | 1 day | 500€
Yug | MediaWiki | Extension install : Visual Editor | Visual Editor would help to co-edit wordlists and documents with elderly or less computer-literate collaborators. Field work has shown this demographic is over-represented among speakers of minority and endangered languages willing to contribute their voice and lexical knowledge to Wikimedia. | ★★★☆☆ medium | 3 days | 750€
Yug | MediaWiki | Extension install : Template Styles | Template Styles would ease the creation and maintenance of styled templates, most notably navboxes. This need arose recently. | ★★☆☆☆ | 3 days | 750€
Yug | MediaWiki | Extension creation : Languages gallery | Create an extension based on Common Voice > Languages gallery https://commonvoice.mozilla.org/en/languages (Mozilla Public License). | ★★☆☆☆ | 2 weeks | 2,500€
Poslovitch | Wikibase | Database performance improvement :phab:T312537 & ... | The SPARQL endpoint is impractically slow, which makes the current Sound library non-functional. Performance must be improved 100-fold. An intermediary duplicated database could solve this strategic weak point. | ★★★★☆ high | 1 month | 5,000€
0x010C | Search engine / Sound library | Responsive search engine / gallery :phab:T252321 | Provide a proper, responsive search engine in order to showcase the richness of our audio recordings and attract a larger public. | ★★★★★ high (critical) | ~3 months | 15,000€
Yug | Search engine / Sound library | Learning management system (basic) | Provide a sustained personal learning experience. Visitors can favorite recorded words to add them to their personal learning list. Words have a self-assessment system `saved/learning/mastered`. Motivation: endangered-language communities and general language learners have repeatedly asked for learning tools, for language revitalization or language learning. Providing such added value will attract those speakers and learners. | ★★★☆☆ medium | 2 months | 10,000€
Marreromarco | Search engine / Sound library | Learning management system (full) | Provide a solid and elegant word-oriented, user-centered learning experience. Motivation: same as above. This model is followed successfully by the for-profit competitor Forvo. Without this move, Lingualibre will remain a tool for nerds and Wikimedians only, thereby willfully choosing to fail its scale-up and outreach. | ★★★★★ high (critical) | 6 months | 30,000€
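
The basic learning management system in this table boils down to a per-visitor list with a `saved/learning/mastered` status per word. A minimal sketch of that data structure, assuming recordings are identified by a string id; the class and method names are hypothetical and storage is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SAVED = "saved"
    LEARNING = "learning"
    MASTERED = "mastered"


@dataclass
class LearningList:
    """Per-visitor list mapping a recording id to its self-assessed status."""

    entries: dict[str, Status] = field(default_factory=dict)

    def favorite(self, record_id: str) -> None:
        # Favoriting an already tracked word keeps its current status.
        self.entries.setdefault(record_id, Status.SAVED)

    def assess(self, record_id: str, status: Status) -> None:
        if record_id not in self.entries:
            raise KeyError(f"{record_id} is not in the learning list")
        self.entries[record_id] = status

    def with_status(self, status: Status) -> list[str]:
        return [rid for rid, current in self.entries.items() if current is status]
```

Keeping the three statuses in an `Enum` prevents free-text values from creeping in, which would matter if the list is later synchronized with a server.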

Section 3 : others

Write your suggestions here
Submitted by | Definition & evaluation | Estimated costs
User | Project | Title | Description | Priority | Time | Budget
Various
0x010C | RecordWizard / Bot / Dedicated Extension | Mass edit tools | Tools to help experienced users do maintenance tasks: patrolling audios, batch-editing recording elements, batch-importing records… | ★★★☆☆ medium | ~3 months | 15,000€
Poslovitch | UI > Dataset page | Datasets page revamp to be elegant. :phab:T313572 | The Datasets index is unsightly even though it displays our whole, valuable output, meant to be reused. A revamp is necessary. See competition https://commonvoice.mozilla.org/fr/datasets | ★★★☆☆ medium | 1 week | 400€ ?
Yug | UI > Languages gallery | Languages gallery page, elegant. :phab:T313397 | Language statistics should be queried (slow), then copied and stored, periodically. Some elegant HTML and CSS should then be used to generate a full languages page via JS, if possible with a filter by language and ISO code (VueJS or VanillaJS recommended). See competition https://commonvoice.mozilla.org/en/languages | ★★★☆☆ medium | 2 weeks | 2,500€
Marreromarco | UI > Request pronunciations form | Form for Requested pronunciation notifies Native speakers | Provide a form to submit word requests in a given language. Words are appended to a [[List:{iso}/Requested by community]]. Volunteer native speakers are notified. Motivation: it is very useful for language learners to request the specific word or phrase whose pronunciation they are unsure of. Forvo offers such a function and users make very creative requests. It is also especially helpful for technical terms and proper names. | ★☆☆☆☆ low | 2 weeks | 2,500€
Poslovitch | Docker | Create a proper development environment :phab:T313573 | Create a proper Integrated Development Environment (IDE) for the PHP, JS/VueJS, CSS, HTML stack used by MediaWiki. Such tools are central to 1) letting developers dive rapidly into the code of MediaWiki and its extensions; and 2) letting volunteer developers ensure that changes to the RecordWizard and other extensions do not risk causing issues downstream. Motivation: since 2021, volunteer developers have repeatedly attempted to create such a tool, without success. | ? | 1 month | 5,000€
Poslovitch | Various | Implement the Lists suggestions from July 2021's Hackathon | Ideas from July 2021's Hackathon would improve the UX for lists and improve their discoverability. | ? | 3 months | Unknown
Outreach
Marreromarco | PR > General Public Relations Campaign > Blogs | Promote LinguaLibre via posts | An underused avenue to promote the project is writing posts for blogs, social media, magazines and newspapers, creating YouTube videos, etc. LinguaLibre now has notable and distinctive stories, such as those for Gascon (2019), Cantonese (2021), Sicilian (2022), Surui (2022), which could be shared more broadly. A PR campaign is necessary in 2022-2023 to increase the number of active contributors and become a viable FOSS alternative to Forvo. | ★★★★☆ high | 6 months | 6,000€ (Internship)
Marreromarco | PR > General Public Relations Campaign > As learning tool | Promote LinguaLibre as a learning tool | Promote the website to attract language learners, with invitations to contribute their voices on missing languages. | | |
Marreromarco | PR > "Month of Voices" | Lobby for a "Month of Voices" | Propose to Wikimedia headquarters the development of a "Month of Voices" during which LinguaLibre would be promoted on Wikipedia articles, in the "Languages" section on the left side of the Main Page. The idea was discussed previously: LinguaLibre:Events/Winter 2021-2022 Public Relations Campaign. | ? | 6 months | 6,000€ (Internship)
Peripheral projects (not our stack)
Marreromarco | ? > Anki | Integration with LinguaLibre | An Anki add-on would be helpful for language learners. | ★☆☆☆☆ low (?) | 1 month | 5,000€
Languageseeker | Wikidata Lexeme, bot. | Pull common linguistic data from Wiktionaries to Wikidata | Some linguistic data (part of speech, pronunciation, conjugation, etc.) is universal, and it would be useful to be able to pull it from Wikidata. However, most of it is currently entered manually on Wiktionaries. This project would pull these common bits into Wikidata. Part of it would involve developing a system for representing linguistic data in Wikidata. It will enable the disambiguation of heteronyms. | ★☆☆☆☆ Out of scope | 3 months | 15,000€
Rdrg109 | External plugin ? > gather sentences (?) | Extracting sentences from any audio stream for their inclusion in Lingua Libre. | Each extracted audio would correspond to a sentence. Each sentence could be added to lexemes as a "usage example". Having usage examples with pronunciation audios makes Wikidata lexicographical data more useful. With SPARQL, we could then answer queries such as: usage examples with pronunciation audios that were retrieved from interviews where the participant is a native speaker of that language. More information about this idea on this page. | ★☆☆☆☆ out of scope ? | 3 months | Unknown (I have little experience with MediaWiki development so it will be more of a learning experience)
Yug | Unilex | Revive UNILEX data gathering | UNILEX is an open-license, one-shot Google project that scraped the internet via basic Python scripts to build frequent-word lists in 1001 languages. The technology used is basic and efficient. Wikimedia could help crowdsource this project's index of websites, in order to cover more languages with better wordlists. This would support field lexicography workshops for minority languages. See also Lingualibre:Events/2021 UNILEX-Lingualibre. | ★☆☆☆☆ out of scope ? | 1 week | 1,250€ Depends on ambition.
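
The Languages gallery wish in this table ("queried (slow), then copied and stored, periodically") is a snapshot cache. A minimal sketch, assuming the slow query is wrapped in a caller-supplied `fetch` function and the snapshot is a local JSON file; the names and the one-day default are illustrative choices, not project decisions.

```python
import json
import time
from pathlib import Path
from typing import Callable


def load_language_stats(
    snapshot: Path, fetch: Callable[[], dict], max_age: float = 86_400
) -> dict:
    """Return the stored stats if younger than `max_age` seconds, else refresh them."""
    if snapshot.exists() and time.time() - snapshot.stat().st_mtime < max_age:
        return json.loads(snapshot.read_text(encoding="utf-8"))
    stats = fetch()  # the slow call, e.g. a SPARQL query (not shown here)
    snapshot.write_text(json.dumps(stats, ensure_ascii=False), encoding="utf-8")
    return stats
```

The gallery page would then read only the snapshot, so visitors never wait on the slow endpoint.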

Directed Acyclic Graph

This is an exploratory work.
2022 LinguaLibre-DAG-project management.
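
A mission DAG can be checked and ordered with Python's standard `graphlib`. This is a sketch only: the dependencies below are illustrative guesses between missions named in the wishlist, not the content of the actual DAG map.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative dependencies only: mission -> missions that must be done first.
DEPENDS_ON = {
    "Responsive search engine": {"Database performance improvement"},
    "Learning management system (basic)": {"Responsive search engine"},
    "Learning management system (full)": {"Learning management system (basic)"},
}


def mission_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return the missions in an order that respects every dependency."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err
```

A cycle means the map is no longer a DAG, so the sketch reports it as an error instead of returning a partial order.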

Submit additional ideas

Add below to submit an additional wish.

  • Improve the audio review system.
  • Outreach > Present Lingualibre to WMfr as strategic for diversity, revitalization.
  • Outreach > Fundraising

Exploring wikibase migration

The following content has been identified as requiring specific migration efforts:

  • Wikibase : records -- Commons wikibase
  • Wikibase : languages -- Wikidata wikibase
  • Wikibase : speakers -- ?
  • Wikibase : properties -- ?
  • Mediawiki : wikipages (Main, Help) -- Commons project space
  • Mediawiki : Lists (editable, MIT-like license) -- ?
  • Mediawiki : js scripts lingualibre:MediaWiki:Common.js -- delete or Lingualibre.org pure web website
  • Mediawiki : LTR and RTL support
  • Mediawiki : login system / Oauth
  • Services (= menu) -- Lingualibre.org pure web website
