
Lingua Libre/2026/Phase 1

From Meta, a Wikimedia project coordination wiki

As with coding sprints, let's push for "Funding Sprints". For this next sprint, we have two strategic aims:

  1. obtain funds for the next Lingua Libre coding and development cycles
  2. acquire know-how for self-funding through these applications

On a more local level, we aim to fund coding cycles for :

Grant sprint to do in October


Axis 1 : Lingua Libre App consolidation

| Ticket | Title | Days | Budget | Likely by | Type |
|---|---|---|---|---|---|
| Consolidation | | | | | |
| T399379 | Server-side buffer upload: the server holds the user's OAuth credentials and files, and uploads to Commons while respecting the rate limit. | 3 | | Pushkar | |
| T313574 | Offline recording (in-app buffer): the user can record offline; when the connection comes back, the app uploads to the server. | 2 | | Pushkar | |
| | Recalculate batch values. | 1 | | Pushkar | |
| | Code health / code quality. | 5 | | Pushkar | |
| T405846 | Create a server-side processing area that could host sub-processes: T213535 sound normalization; T251638 de-noising; T342086 quality check; T213534 compression? | 1 | | Pushkar | |
| Subgroup totals | | | | | |

There is enough work for at least 3 missions of 8 days each over 2026.

There are also many smaller maintenance issues floating around.
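The server-side buffer upload item (T399379) can be illustrated with a minimal sketch: a queue that holds pending recordings and releases them no faster than a configured limit. This is only an illustration under stated assumptions, not the planned implementation. The class name `UploadBuffer`, the `upload` callback, and the limit values are hypothetical; the real Commons rate limit and upload call would have to be taken from the MediaWiki API.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class UploadBuffer:
    """Hold pending recordings and release them within a sliding-window limit."""

    upload: Callable[[str], None]        # hypothetical: sends one file to Commons
    max_per_window: int = 5              # assumed limit, not the real Commons value
    window_seconds: float = 60.0         # assumed window length
    clock: Callable[[], float] = time.monotonic
    sleep: Callable[[float], None] = time.sleep
    pending: Deque[str] = field(default_factory=deque)
    sent_at: Deque[float] = field(default_factory=deque)

    def add(self, filename: str) -> None:
        """Queue a recording for later upload."""
        self.pending.append(filename)

    def flush(self) -> List[str]:
        """Upload everything queued, waiting whenever the window is full."""
        done: List[str] = []
        while self.pending:
            now = self.clock()
            # Forget uploads that have left the sliding window.
            while self.sent_at and now - self.sent_at[0] >= self.window_seconds:
                self.sent_at.popleft()
            if len(self.sent_at) >= self.max_per_window:
                # Window is full: wait until the oldest upload expires.
                self.sleep(self.window_seconds - (now - self.sent_at[0]))
                continue
            name = self.pending.popleft()
            self.upload(name)
            self.sent_at.append(self.clock())
            done.append(name)
        return done
```

The clock and sleep functions are injectable so the behaviour can be checked without real waiting. The same queue shape could also serve the offline recording item (T313574), with `flush` triggered when the connection returns.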

Axis 2: Lingua Libre-based tool re-coding and improvement

| Ticket | Title | Days | Budget | Likely by | Type |
|---|---|---|---|---|---|
| Lingua Libre adjacent tools | | | | | |
| | Global statistics: simple table-shaped statistics for records by language (DESC), records by user (DESC), monthly top contributors (DESC), recent languages (if possible). | 3 | | Aditya | coding |
| | Language dashboard with downloadable dataset, see LanguagesGallery. | 3 | | Aditya | coding |
| | Map of speakers: show the geography of contributions. | 2 | | Aditya | coding |
| | Map filters: by language; by user; by quantity; by word. | 3 | | Aditya | coding |
| | User statistics: dashboard for each user. Gamified design with personal stats and elegant icons; gamification with positive encouragement, only mildly competitive. | 1 / 2 2 | | Aditya | coding |
| | Sound library / search among all data. Add filters: by language; by user; by word. | 5 | | Aditya | coding |
| | Brief final report | 1 | | Aditya | coordination |
| Subgroup totals | | 20 | | | |
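As an illustration of the "Global statistics" item, the three descending tables could be computed along these lines. This is a sketch under assumptions: the record shape `(language, user, ISO date)` and all function names are hypothetical, and the real data would come from Lingua Libre or Commons rather than an in-memory list.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed record shape: (language code, user name, ISO date "YYYY-MM-DD").
Record = Tuple[str, str, str]


def records_by_language(records: Sequence[Record]) -> List[Tuple[str, int]]:
    """Count recordings per language, highest count first."""
    return Counter(lang for lang, _, _ in records).most_common()


def records_by_user(records: Sequence[Record]) -> List[Tuple[str, int]]:
    """Count recordings per user, highest count first."""
    return Counter(user for _, user, _ in records).most_common()


def monthly_top_contributors(
    records: Sequence[Record], top: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """For each month (most recent first), list the top contributors."""
    per_month: Dict[str, Counter] = defaultdict(Counter)
    for _, user, date in records:
        per_month[date[:7]][user] += 1  # "YYYY-MM" prefix identifies the month
    return {
        month: counts.most_common(top)
        for month, counts in sorted(per_month.items(), reverse=True)
    }
```

Each function returns plain lists of `(name, count)` pairs, which map directly onto the simple table layout the item asks for.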

What was originally planned for Aditya's GSoC25:

  • Local dictionaries : frugal wiki-based bilingual vocabulary list, searchable, with minimal e-learning features.
    • Including code for a Wikimania 2026 hand gesture dictionary.
  • Download dataset page > Revamp

Axis 3: Lingua Libre Bot for wiktionaries


This is critical to put Lingua Libre assets to use across more wikis and communities. Current reuse, limited to only 6 wikis, lags far below what is possible.

| Ticket | Title | Days | Budget | Likely by | Type |
|---|---|---|---|---|---|
| 2025Q3: Peripheral consolidation via GSoC25, Outreachy and others | | | | | |
| T386084 | Lingua Libre Bot: investigate the structure of the supported Wiktionaries | 4 | 1400€ | Yug | analysis |
| T386084 | Lingua Libre Bot: refactor into maintainable code to support distributing audio to more Wiktionaries | 11 | 4400€ | 0x010C | coding |
| Subgroup totals | | | 5800€ | | |
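The subgroup total can be checked with a short calculation. The sketch below uses only the days and budgets listed for T386084; the `Task` class and helper names are hypothetical, and the per-day rates are derived from the table, not stated in it.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Task:
    title: str
    days: int
    budget_eur: int


# Values copied from the Axis 3 rows for T386084.
tasks = [
    Task("Investigate the structure of supported Wiktionaries", days=4, budget_eur=1400),
    Task("Refactor Lingua Libre Bot into maintainable code", days=11, budget_eur=4400),
]


def subgroup_totals(items: Iterable[Task]) -> Tuple[int, int]:
    """Return (total days, total budget in euros)."""
    items = list(items)
    return sum(t.days for t in items), sum(t.budget_eur for t in items)


def day_rate(task: Task) -> float:
    """Implied euros per day for one task (derived, not stated in the table)."""
    return task.budget_eur / task.days


total_days, total_budget = subgroup_totals(tasks)
print(total_days, total_budget)        # 15 5800
print([day_rate(t) for t in tasks])    # [350.0, 400.0]
```

The same helper could be reused to fill the empty "Subgroup totals" rows of the other axes once their days and budgets are known.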

Axis 4: SignIt finalisation

Community Resources and Partnerships/India Rapid Project/Lingua Libre SignIt 2026
| Ticket | Title | Days | Budget | Likely by | Type |
|---|---|---|---|---|---|
| 2025Q3-Q4: Idea reserve for Outreachy/31, GSoC26 and others | | | | | |
| T386062 | SignIt: finalisation? | | | Kabir | development |
| T386062 | SignIt: publication on the web stores? | | | Yug | coordination |
| Subgroup totals | | | | | |

Axis 5: Later projects


This is the reserve for GSoC and later freelance work.

| Ticket | Title | Days | Budget | Likely by | Type |
|---|---|---|---|---|---|
| Possible Google Summer of Code 2026 | | | | | |
| T385383 | Lingua Libre coordination (Q3 if funded, possibly two projects) | | 5000€ | Yug | coordination |
| | Tatoeba & Lingua Libre convergence | | 5400€? | Intern | coding |
| | Common Voice & Lingua Libre convergence | | | | |
| | Lingua Libre IoT: interactive table to showcase place names in regional languages in history museums and town halls in France and elsewhere. | | 5400€ | Intern | coding |
| | Spell4Wiki & Lingua Libre | | 5400€? | Intern | coding |
| | Lingua Libre and TTS: create a pipeline from Lingua Libre datasets to TTS language models. | | 5400€? | Intern | coding |
| | Integrate Unilex 1001 words lists into Wikidata:Lexica | | | | |
| Subgroup totals | | | | | |

Funding


Below are possible funding sources.

Rapid grants


Webpages:

Todo:

  • Developer: (1) on WMF's grants application website (Fluxx), create a grant project page, (2) invite Yug as collaborator.

Cycle 3 (Deadline: November 1, 2025)

  • November 1, 2025: Submission deadline
  • November 2, 2025 - November 29, 2025: Review
  • November 30, 2025 - December 13, 2025: Compliance check and decision announced
  • December 13, 2025 - January 9, 2026: Grant processing and payment
  • January 16, 2026: Earliest project start date

WMFR


Rémy, director of WMFR, requested a table of clearly defined challenges to fund.

Microsoft program

  • Possible project submission: "Travelling Wiki-residency / open lexicography / field linguistics"
  • DGLFLF: delivery of the demonstrator in June? -> check with Xavier.
  • Comparison of recording projects: Lingua_Libre/2022_Review#Metrics
  • Chapters: Basque (EU), WikiSpeech (SE), Portugal (PT)
  • Partner: INALCO.
  • Deadline: 2025/11/11
  • Website: https://www.microsoft.com/en-us/research/academic-program/lingua-expanding-europes-voices-in-ai/
  • Abstract:
    • Idea 1: Wiki-residency at INALCO. Goal: audio-document as many languages of INALCO and of France as possible. Advantage: access to a professional radio studio.
      • Contact INALCO

DGLFLF (France)


WMCH


ML contacts


People who may help as guides for the Lingua Libre + Whisper (TTS ML) projects :

See also
