
Lingua Libre/Supports/pt-br

From Meta, a Wikimedia project coordination wiki
This page is a translated version of the page Lingua Libre/Supports and the translation is 7% complete.

Lingua Libre/Supports gathers all projects where Wikimédia France or other actors provided human support to the Lingua Libre project. It aims to give a quick view of each initiative's human resources, objectives, end results and any associated reports or documents. The most original and relevant volunteer initiatives may also be documented below. For smaller events, please visit the Lingualibre Events page as a first entry point, or contact the community.

Shtooka

Shtooka recorder
Nicolas Vion
Information
Start: 2004
Ended: 2016
Sponsors: Nicolas Vion
Roles
Main: Nicolas Vion (developer)
Contact: Nicolas Vion (developer), Yug

From 2004(?), Nicolas Vion, a full-stack developer and French student of Ukrainian, developed a rapid vocabulary recording system to provide audio for his personal learning resources.

At the time, recording words was tedious and messy: speakers mostly turned to Audacity, clicking "Record" then "Stop" to mark the start and end of each recording. This produced irregular recordings that then had to be renamed after the recorded word.

Shtooka offered a computer-assisted way, a hundred times faster and much cleaner, to record thousands of words from a user-provided list. The system included a form to define the recorded language, speaker name, gender, list name and other metadata common to all the recorded words. Sound level analysis applied an audio level threshold to discriminate between irrelevant silences and the moments when words were actually pronounced. Users could specify time margins to keep at the start and end of each recorded word. The software produced regularly framed audio files, one file per recorded item, with human-friendly filenames and embedded metadata.
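The threshold-and-margin idea described above can be illustrated with a minimal Python sketch. It is not Shtooka's actual code (which was C/C++ and is not shown here); the function name `segment_words`, the per-frame `levels` input and the `min_silence` parameter are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def segment_words(
    levels: List[float],
    threshold: float,
    margin: int = 0,
    min_silence: int = 1,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, one per detected word.

    levels      -- one loudness value per audio frame (hypothetical input format)
    threshold   -- a frame at or above this level counts as speech
    margin      -- frames of padding kept before and after each word
    min_silence -- consecutive quiet frames needed before a word is finished
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first loud frame of the word in progress
    last_loud = 0                # most recent loud frame of that word
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_silence:
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:  # the recording ended while a word was in progress
        segments.append((start, last_loud + 1))
    # Apply the user-defined margins, clamped to the recording bounds.
    return [
        (max(0, s - margin), min(len(levels), e + margin))
        for s, e in segments
    ]


levels = [0, 0, 5, 6, 0, 0, 0, 7, 8, 9, 0]
print(segment_words(levels, threshold=3, margin=1, min_silence=2))
# → [(1, 5), (6, 11)]
```

Each returned range would then be cut into its own file. In this sketch, margins of neighbouring words may overlap when the silence between them is short.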

While very productive, the program was desktop C/C++ software that was tedious to install and most likely required an advanced user to do so.

By the mid-2010s, Shtooka was niche software to download, install and use, adopted mostly by language teachers and technophiles. An online service, SWAC Collection, allowed users to publish and share about 300,000 recordings with the online community.

In 2013, Yug, a Wikimedian contributor and PhD candidate in Chinese teaching, vocabulary acquisition and e-learning, noticed Shtooka's open-licence resources, then met and befriended Nicolas. Yug enthusiastically used the new versions of the tools, providing feedback and bug reports, and recording Chinese vocabulary in INALCO's recording studio for an adaptive learning web app, CatIsSmart. Yug also promoted the tool to INALCO academics, Grenoble university, Wikimedia France and Wikimedians, looking for collaborations, funding or a hire for Shtooka recorder, and the development of a more practical online version of the rapid recording tool. In 2015, Yug introduced Nicolas Vion to Rémy Gerbet.

2016 DGLFLF / Strasbourg

Lingua Libre PHP
Wikimédia France, Shtooka recorder
Information
Webpage: Github ; Lingualibre.fr
Start: 2016
Ended: 2016
Sponsors: Wikimédia France, Shtooka recorder
Roles
Hierarchy: Rémy Gerbet WMFr, Adélaïde Calais WMFr
Facilitator: Lyokoï
Main: Nicolas Vion (code, front, back, database)
Contact: Nicolas Vion (developer)

With secured funding from Strasbourg University and DGLFLF (?), Nicolas Vion was hired to create an online, collaborative version of Shtooka recorder. The Wikimedia-backed project was eventually branded « Lingua Libre ».

Rémy 2015


DGLFLF had already supported Wikimedia France for several years. In 2014, Rémy is hired on a Service civique, working among other things on:

  • Listing regional offices or actors promoting local languages
  • 24 August 2014 : Wikimedia France and Jean-Louis Barreau from APLLOD (Association pour la Promotion des Langues via la Lexicographie et l'Open Data) open a partnership. A mockup for a recording tool is submitted.
  • 2015 : « Une enquête sur les pratique linguistes et numériques »
  • 2015-07-30 : the first DGLFLF partnership provides Wikimedia France with 15,000€ for both a seminar on French languages and a mobile app promoting those languages. On September 23rd, WMFR adds 10,000€ to lead this effort.
  • 23 January 2016 : « Congrès des Langues de France ». Discussions take place on which kind of application to build. Yug brings Nicolas Vion, with a plan for an online word recorder based on Shtooka. Soon after, Rémy is hired by WMFr. DGLFLF funds are then directed toward Nicolas Vion and the porting of his desktop Shtooka Recorder into a web version, which became LinguaLibre.fr. On March 21st 2016, Nicolas Vion is hired by Wikimedia France and soon starts working. The first formal contributions are made on April 20th, via https://lingualibre.fr, during a dedicated workshop in Strasbourg with OLCA (Office pour la Langue et Culture Alsacienne). In May 2016, as the web app proves trustworthy, Rémy creates fr:Project:Lingua Libre. In September 2016, Rémy creates fr:Projet:Langues de France, focused on the related workshops and the content created or to be created.

Soon after, the following communities used LinguaLibre.fr to record audio:

On July 3rd, 2017, WMFR and partners held an end-of-project ceremony at the Maison de l'Alsace, Paris.

2016 service civique on Francophonie with part-time on Lingualibre.

Occitan languages projects

See fr:Projet:Oc-a-thon (TBC).

2018 Wikimedia Foundation Grant

Lingua Libre 0x010C's recoding
Wikimedia Foundation
Information
Webpage: Grants:Project/0x010C/LinguaLibre
Start: 2017-07 – Application writing
2017-09 – Grant approved
2018-06 – Beta test
Ended: 2018-07 – Final Report
Sponsors: Wikimedia Foundation
Budget: 30,600€
Roles
Hierarchy: WMF
Facilitator: 0x010C
Main: 0x010C
Contact: Yug

Designed and submitted by French Wikimedian software engineer User:0x010C, this 8-month project rebuilt Lingua Libre with:

  • a Mediawiki, for documentation, lists storage and community space
  • a Wikibase, for data storage and open access services
  • a « Record Wizard » app, to load lists, guide recording, and write metadata to the wikibase (records, speakers), audio data to Wikimedia Commons, and account data to the local mediawiki via User:{username}/RecordWizard.json pages.

Because Wikimedia France and Nicolas Vion's 2016 Lingua Libre was a standalone web app at LinguaLibre.fr, it lacked any integration with the Wikimedia ecosystem. Although based on the PHP and JS code of Nicolas Vion's earlier Lingua Libre and Shtooka, the 3 blocks listed above required a complete rewrite by 0x010C. This new version pioneered the use of Wikibase while providing native Wikimedia integration through a decisive "Publish to Commons" feature. Community list management was also greatly eased thanks to MediaWiki.

Lingua Libre Service Civique 2022.
Eavq/Eavqwiki
Wikimédia France
Lingua Libre
Information
Webpage: Blog post by WMFr, 2021
Start: 2019
Ended: 2019(?)
Sponsors: Wikimédia France
Roles
Hierarchy: Adélaïde Calais WMFr
Main: Eavq
Contact: Eavq

2019 service civique on Lingualibre (Interview)

  • Contact with French regional languages groups
  • Contact with Lo Congrès (Occitan)
  • Recommendations for future actions: expand contribution tools, build showcase tools for language consumers (sound library), UI redesign
  • Coordination with 0x010C for 2019 RecordWizard's UI redesign
  • Relations with INALCO, initiating ContribuLing
  • Relations with Plateforme Atlas

Kits Lingua Libre (WMCA)

Grants/Requests/2020/Kits Lingua Libre
Kits Lingua Libre 2020.
Wikimédia Canada
Lingua Libre
Information
Webpage: Grants/Requests/2020/Kits Lingua Libre
Start: 2020
Ended: 2020
Sponsors: Wikimédia Canada
Budget: 5,814.85$
Roles
Hierarchy: Thekidpossum (WMCA)
Main: Brochon99
Contact: Brochon99


2021 Wikivalley freelance

WikiValley service contract 2021.
Wikimédia France
Lingua Libre
Information
Webpage: Documentation (link to be found)
Start: 2021-02
Ended: 2021-06
Sponsors: Wikimédia France
Budget: 41,058€
Roles
Hierarchy: Adélaïde Calais WMFr
Facilitator: VIGNERON (liaison)
Main: Seb35 (code: 29,647€), VIGNERON (code, liaison: 11,411€)
Contact: VIGNERON

Initially an upgrade from Mediawiki 1.32 to 1.35, the mission had to be extended into a recovery and reinstall mission after the OVH datacenter fire. External expertise was called in (WikiValley). Senior Wikimedian User:VIGNERON served as community liaison, documentation writer and more.

2021 Campus Digital

VueJS recordings checker
Toulouse Digital Campus
Information
Webpage: Github
Start: December 2021
Ended: December 2021
Sponsors: Toulouse Digital Campus
Roles
Mentor: Adélaïde Calais WMFr
Main: Students
Contact: Yug (reviewed the code)

In December 2021, Digital Campus ran a one-week hackathon around Lingua Libre. The students' brief was to build a practical interface for listening to existing recordings and tagging them. To simplify the exercise, the interface was not connected to Lingua Libre; it only created a file storing the reviewed recordings and their tags.

The proof of concept UI, coded in 4 days, has not been pushed further.

Lingua Librist in Residence - 2IF

Lingua Librist in Residence for DDF - 2IF
DDF
Institut International pour la Francophonie (2IF), Wikimédia France
Information
Webpage: Blog post by WMFr
Start: May 2019
Ended: September 2021
Sponsors: Institut International pour la Francophonie (2IF), Wikimédia France
Roles
Mentor: Sebleouf, Noé
Main: WikiLucas00
Contact: WikiLucas00

A 4-month internship at the Institut International pour la Francophonie (Lyon, France), as a Lingua Librist in Residence for the Dictionnaire des Francophones project.

  • The main objective was to integrate Lingua Libre audio recordings into the Dictionnaire des francophones, and to improve Lingua Libre for future recordings.
  • Participation in community and technical discussions, proposals for changes to the website, and organization of a two-day hackathon for Lingua Libre developers.
  • Classification of DDF entries associated with regions into approximately 250 different Lingua Libre lists.
  • Presentation of Lingua Libre at two international events during the summer: ContribuLing and Wikimania.

MSc. Computer Science class of Toulouse

Lingua Libre audio analysis via Asteroid

2020 Cantonese project

2020 Cantonese project
Luilui6666
User:Yug
千方百计/No stone left unturned in Cantonese

巧克力/Chocolate in Chinese.

Information
Webpage: Lingua Libre/Supports
Start: 2020-05 – first recordings
Ended: 2020-07 – most recordings done
Sponsors: User:Yug
Budget: 300€
Roles
Mentor: Yug (project lead)
Main: Luilui6666
Contact: Yug
See production on -other (Q9186) and/or -yue.

The 2020 Cantonese project was a test project led by Yug and Hong Kong sound engineer Luilui6666. It aimed to test paid contributions as a way to audio-document a target language with the Lingua Libre tool.

A budget of 300€ for 10 hours (30€/h) was agreed upon. A few word lists (HSK) and video-call training on the Lingua Libre recording studio were provided to the recordist. The recordist provided professional equipment and know-how. Autopatrol user rights had to be requested. User:Yug sponsored this test with private funding.

The recordist worked occasionally from home, in short recording sessions, progressing steadily. This arrangement gave Luilui6666 a welcome supplementary income with a flexible schedule.

The project was slowed by a bug that sped up recordings of longer sentences. Luilui6666 reported it and, with Yug's guidance, inspected it and narrowed it down, though through time-consuming exchanges (~3h).

Nevertheless, about 6,000 recordings (see -other (Q9186) and/or -yue) were produced by this enthusiastic and proactive paid contributor.

With 6,000+ recordings completed within 3 months for a modest 300€, covering known core Sino-Cantonese vocabulary, and with both parties satisfied, the project was considered highly successful. It validated paid contributions on Lingua Libre as an extremely productive avenue.
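As a rough check of the figures above, the unit costs can be worked out in a few lines of Python. This is only a sketch: it uses the round figure of 6,000 recordings and assumes that exactly the 10 agreed hours were worked, which the page does not confirm.

```python
budget_eur = 300      # total budget stated above
hours_agreed = 10     # hours agreed with the recordist
recordings = 6000     # approximate output stated above

hourly_rate = budget_eur / hours_agreed          # 30.0 €/h
cost_per_recording = budget_eur / recordings     # 0.05 € per recording
# Assumption: only the agreed hours were worked.
recordings_per_hour = recordings / hours_agreed  # 600.0 recordings/h

print(hourly_rate, cost_per_recording, recordings_per_hour)
```

Under these assumptions, each recording cost about 5 euro cents.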

A smaller 2022 Chinese project was also carried out.

Mélody

Lingua Libre Service Civique 2022
Wikimédia France
Lingua Libre
Information
Webpage: Lingua Libre/Supports
Start: September 2022
Ended: 17 March 2023 (6 months)
Sponsors: Wikimédia France
Roles
Hierarchy: Adélaïde Calais WMFr, Rémy Gerbet WMFr
Facilitator: Yug
Main: Mélody Xu YANG WMFr
Contact: Melody

The 2022-2023 Service Civique on Lingualibre is a 6-month mission within Wikimedia France, Paris, aiming to advance Lingualibre's outreach and partnerships.

A first phase would create supporting resources, then test and demonstrate communication avenues within the French ecosystem along the following axes:

  • Identify high-potential institutional partnerships for Langues de France
  • Design an outreach campaign, materials, with Wikimedia France's co-workers ; iterate with Lingualibre's community
  • Launch this campaign
  • Final report

These experiences will be leveraged to expand outreach to other communities demonstrating some activity and potential for growth.

Gallery of productions

2023 SignIt freelance

This freelance contract was funded from the budget of the French Ministry of Higher Education and Research's Wikimedian in residence at Toulouse University. Hugo en résidence proposed a lean, agile, small-scale freelance contract to former developer 0x010C to restore the video recording chain, which was done successfully.

2023 Wikirésidence

Wikimedian in residence 2023-2024.
Urfist Occitanie
MESR, Wikimédia France
Information
Webpage: Lingua Libre/Supports
Start: 13 February 2023
Ended: 12 February 2024
Sponsors: MESR, Wikimédia France, URFIST
Roles
Hierarchy: Amélie Barrio, Mathieu Denel WMFr, Rémy Gerbet WMFr
Main: Yug
Contact: Yug

The Wikiresidence at Université de Toulouse / URFIST Occitanie allowed User:Hugo en résidence to lead several pushes on Lingualibre, in a dual volunteer and official capacity. The focus was on SignIt, communication and formal collaborations (Occitan Whistle). In 2024, about 1.5 person-months were dedicated to leading the Lingua Libre GSoC24 described further below.

Date | Event | City | Link
2023.05.28, 10:00–18:00 | Forom des langues 2023 | Toulouse | In-person demonstration (booth): https://commons.wikimedia.org/wiki/File:Forom_des_langues,_Toulouse,_2023-02.jpg
2023.05.28, 14:00–16:00 | Toulouse Hack | Toulouse | Information for a target audience: https://video.audiovisuel-participatif.org/w/c7631886-111c-4899-8c89-420797440c68
2023.07.29, 09:30–10:00 | COSCUP 2023 | Taipei | https://commons.wikimedia.org/wiki/File:Lingua_Libre_SignIt_presentation-2023-COSCUP_Taibei.pdf
2023.08.18, 17:40–17:45 | Wikimania 2023 | Singapore | https://commons.wikimedia.org/wiki/File:Lingua_Libre_SignIt_presentation-2023-Wikimania.pdf
2023.11.19, 10:30–11:00 | Capitole du Libre | Toulouse | https://cfp.capitoledulibre.org/cdl-2023/talk/3ZQZTR/
2023.11.20, 14:00–15:00 | CRL UT2J | Toulouse | https://commons.wikimedia.org/wiki/File:Lingua_Libre_URFIST-poster-fr.pdf

Whistled Occitan

Occitan Gascon is a French minority language with decent documentation and modest institutional support. At the forefront of this effort is Lo Congrès (https://locongres.org), most notable for its multiple Occitan dictionaries and its 25,000 Occitan recordings, of which 4,908 are in Gascon. Since 2023, Hugo en résidence, DMontagne en résidence and Univòc64 have led a follow-up and complementary audio documentation effort on the Aas whistled language. Other efforts include recording local village names to re-establish and recall local names within their traditional territories, as well as public communication at events and online.

Other supports

Poslovitch (1)

Lingua Libre Internship 2023
Wikimédia France
Lingua Libre
Information
Webpage: Lingua Libre/Supports
Start: 17 April 2023
Ended: August 2023 (4 months)
Sponsors: Wikimédia France
Roles
Hierarchy: Adélaïde Calais WMFr, Michael Barbereau WMFr, Rémy Gerbet WMFr
Main: Poslovitch

The 2023 Computer Science Internship on Lingualibre is a 4-month internship within Wikimedia France, Paris. One sub-mission aimed to audit the feasibility of migrating Lingualibre's wikibase and its 800,000 items to the now available Wikimedia Commons wikibase.


Poslovitch (2)

Lingua Libre Internship 2023
Wikimédia France
Lingua Libre
Information
Webpage: Lingua Libre/Supports
Start: September 2023
Ended: April 2024 (4 months)
Sponsors: Wikimédia France
Roles
Hierarchy: Adélaïde Calais WMFr, Michael Barbereau WMFr, Rémy Gerbet WMFr
Main: Poslovitch

The Lingua Libre v3 proof of concept aimed to demonstrate Lingualibre on a minimal Python Django and JS (VueJS, Vite, NodeJS) stack.

A sound VueJS structure, unit tests and best-practice UML documentation were laid down, as well as a leaner back end using MariaDB, successfully demonstrating the project's feasibility. An Ansible deployment system was also developed. Recommendations: more work is needed to migrate other features and behaviors from the legacy code base to this leaner stack.


Google Summer of Code 2024

The designs and applications for these 2 projects were initiated by Yug, with User:Poslovitch and Ishan Saini as technical co-mentors. The application workload was about 4~6 workdays over a month, mostly to understand the scope, how and where to proceed, and to write the project descriptions. Applicant reviews (~8 people), communication, issue assignment and assessment took about 6 workdays over 3 weeks. The coding period required 1.5 workdays a week from Yug and 2~4h/week from the technical mentors. Kabir worked earlier, in May & June, while Pushkar worked as encouraged in June, July & August 2024. Interns are funded by Google. Yug mentored mostly within the scope of his Open Science job, funded by MESR France and URFIST Occitanie ; Ishan and Poslovitch mentored as volunteers, which explains the difference in workloads.

Lingua Libre Django migration

Lingua Libre GSoC24
Urfist Occitanie
Google Summer of Code, URFIST Occitanie
Information
Webpage: Lingua Libre/Supports
Start: 2024-02 – application
2024-06 – coding
Ended: 2024-09 – closure
Sponsors: Google Summer of Code, URFIST Occitanie
Budget: 6,000€
Roles
Hierarchy: Yug
Mentor: Yug (project lead), Poslovitch (tech lead)
Main: Pushkar7077
Contact: Yug

In the field of language diversity, Wikimedia Foundation and Wikimedia France have supported LinguaLibre.org, a single-page VueJS application to rapidly record the vocabularies of the world. Over 240 languages and 1.2 million words have been audio recorded into Wikimedia sites through this open project. The current back end (Wikibase, PHP, Blazegraph), while interesting, has shown limitations: mainly limited query speed, no API, stack opacity and duplication of data. A revamp has been started but requires further full-stack work to be migrated into a maintainable code base, upgraded into an elegant service and pushed into production for willing native speakers.

Lingua Libre SignIt

Lingua Libre SignIt GSoC24
Urfist Occitanie
Google Summer of Code, URFIST Occitanie
Screenshot.
Information
Webpage: Lingua Libre/Supports
Start: 2024-02 – application
2024-06 – coding
Ended: 2024-09 – closure
Sponsors: Google Summer of Code, URFIST Occitanie
Budget: 4,000€
Roles
Hierarchy: Yug
Mentor: Yug (project lead), Ishan Saini (co-tech lead)
Main: GonFreeaks
Contact: Yug
Proposal ; Sum up ; Report ; Github logs.

Lingua Libre's mission was extended to sign languages in 2019. Both a click-and-translate Firefox extension and a video recording studio have been developed. Both systems' UIs exist in 35+ languages, allowing the global documentation and learning of various sign languages. As Manifest v2.0 extensions are being phased out, the project is under threat. A full revamp into Manifest v3.0 and a modern extension structure would allow the project to be compatible with all web browsers. This project must work within updated in-browser web extension security constraints and the new web extension APIs.
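To give an idea of what such a revamp involves at the manifest level, here is a minimal Python sketch that applies three well-known Manifest V2 to V3 changes to a manifest loaded as a dictionary. It is not SignIt's actual migration: the helper name `to_manifest_v3` and the sample manifest are hypothetical, and a real migration also has to address background scripts, content security policy, web-accessible resources and changed APIs, none of which are handled here.

```python
import copy
from typing import Any, Dict


def to_manifest_v3(manifest_v2: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Manifest V2 dict with three common V3 changes applied."""
    m = copy.deepcopy(manifest_v2)
    m["manifest_version"] = 3
    # V3 renames the `browser_action` key to `action`.
    if "browser_action" in m:
        m["action"] = m.pop("browser_action")
    # V3 moves host match patterns out of `permissions` into `host_permissions`.
    perms = m.get("permissions", [])
    hosts = [p for p in perms if "://" in p or p == "<all_urls>"]
    if hosts:
        m["permissions"] = [p for p in perms if p not in hosts]
        m["host_permissions"] = m.get("host_permissions", []) + hosts
    return m


# Hypothetical sample manifest, for illustration only.
sample_v2 = {
    "manifest_version": 2,
    "name": "Example extension",
    "browser_action": {"default_popup": "popup.html"},
    "permissions": ["storage", "https://example.org/*"],
}
sample_v3 = to_manifest_v3(sample_v2)
print(sample_v3["permissions"], sample_v3["host_permissions"])
```

The input dictionary is deep-copied, so the original V2 manifest stays available for comparison.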

Indonesian languages recording project

See also WikiKata, Malaysia.
Indonesian languages recording projects
Lingua Libre
Wikimedia Indonesia
Information
Webpage: WikiTutur on Wikikamus
Start: 2023
Ended: Ongoing
Sponsors: Wikimedia Indonesia
Budget: 5,000$ / year
Roles
Facilitator: Ardzun
Main: Unknown
Contact: Ardzun (lead), Yug (support)

WikiTutur is a language preservation program that records vocabulary pronunciation using Wiktionary and Lingua Libre. This project was previously run in Indonesia by volunteers from the Jakarta Wikimedia Community in 2023–2024, in collaboration with other local Wikimedia communities in Indonesia.

More than 30 Indonesian languages have been audio recorded, making it the most active Lingua Libre group as of early 2024. The project referent is Ardzun. Occasional second-level support, guidance and language creation are provided by Yug, mostly on Discord and Lingualibre.org.

Youth voices of Roussillon

Youth voices of Roussillon
Médiathèque of Canet-en-Roussillon
Information
Webpage: User:BiblioCanet66
Start: 2023
Ended: Ongoing
Sponsors: Médiathèque of Canet-en-Roussillon
Roles
Hierarchy: Culex
Main: Culex
Contact: Culex (lead)

Youth voices of Roussillon is an educational project using Wiktionary and Lingua Libre as contribution tools for local youth. Led by the public mediathèque of Canet-en-Roussillon in a positive and encouraging atmosphere, it aims to showcase the students and let them discuss words, record them and document them on well-known digital commons. The project is led by the public library of Canet-en-Roussillon, via its director, who is also a Wikipedia administrator.

Wikimedian administrator User:Culex initiated the project in 2023 thanks to his professional position and good relations with local schools. Drawing on his deep prior knowledge of Wikimedia projects, Culex essentially led this local project without the usual secondary support.

Pushkar Winter '24-25

Lingua Libre Django Freelance 1
Wikimedia France
Information
Webpage: Lingua Libre/Supports
Start: 2024-10 – discussion
2025-01 – coding
Sponsors: Wikimedia France
Budget: 1,250€ (to verify)
Roles
Hierarchy: Xavier Cailleau WMFr
Facilitator: Yug
Main: Pushkar7077
Contact: Yug

Google Summer of Code 24's Django developer Pushkar was contracted as a freelancer by Wikimedia France for a week-long code-quality sprint. Immediate objectives were to:

  1. demonstrate the feasibility of an extra-European Union freelance
  2. activate internationalization (i18n)
  3. split the code into a branding website (lingualibre.org) and the recording app (Lingua Libre App) repositories
  4. add other minor features.

The medium-term objective was to progress further toward deploying Lingua Libre Django in 2025. The long-term objective was to ease the distribution of coding workload between Wikimedian volunteers and freelancers, while also diversifying freelancing options. This fits fully within the general objective of Lingua Libre Django, which primarily aims to simplify the stack and lower the barrier to technical maintenance.

Active but unfunded coding support was provided by Yug and Aditya, see consolidation below.

Tasks list

T380121 : Lingua Libre development milestones, 2025Q1 coding by User:Pushkar7077.
T384903 : Lingua Libre development milestones, 2025Q1 coding consolidation by User:Yug and Aditya.

Pushkar Summer '25

Lingua Libre Django Freelance 2
Wikimedia France
Information
Webpage: Lingua Libre/Supports
Start: 2025-02 – discussion
2025-06 – coding
Ended: 2024-06 – closure
Sponsors: Wikimedia France
Budget: ~2,000€ (to verify)
Roles
Hierarchy: Xavier Cailleau WMFr
Facilitator: Yug
Main: Pushkar7077
Contact: Yug

Pushkar was contracted a 2nd time by Wikimedia France for a week-long sprint (8 workdays). We continued to consolidate the app for a late 2025 deployment. Immediate objectives were to:

  1. switch to OAuth 2.0, an upgraded login system
  2. properly write metadata into SDC
  3. add other minor features.

Active but unfunded coding support was provided by Yug and Aditya, see consolidation below. The focus was on fixing UI glitches and consolidating i18n. Yug's MediaWiki knowledge helped add MediaWiki API calls, give SDC guidance and provide Wikimedia-based data files.

Tasks list

T385385 : Lingua Libre development milestones, 2025Q2 coding by User:Pushkar7077.
T398352 : Lingua Libre development milestones, 2025Q2 coding consolidation by User:Yug and Aditya.

Yug '25 coordination

Lingua Libre Django Coordination
Wikimedia France
Information
Webpage: Lingua Libre/Supports
Start: 2025-02 – discussion
2025-09 – coding
Sponsors: Wikimedia France
Budget: 4,200€
Roles
Hierarchy: Xavier Cailleau WMFr
Facilitator: Yug
Main: Yug
Contact: Yug

Mid-project summary.

Given the need for a knowledgeable and versatile coordinator and the workload involved, WMFR contracted Yug as a freelancer to accompany the coding cycles and manage translatewiki and other Wikimedia-specific processes. Growing pressure from funding actors (DGLFLF and WMFR's board) asking for a live product accelerated the push toward production level. The top objective is to deploy to production. Immediate objectives were:

  1. project management with budgeting, task description, management of 0x010C & Pushkar7077 freelances, tests, occasional communication, reporting.
  2. upgrade MVP features to production-level, handling the complexity of real usages, app-wide reactive i18n with wikidata-powered labels, upload resilience, unique filenames, accessible keyboard navigation, etc.
  3. set up translatewiki crowd sourced translations
  4. assist deployments of app and homepage site
  5. add other minor features.

Active coding support was provided by Aditya, contracted as a freelancer by Yug.

Tasks list (coding side)

T399374 : Lingua Libre development milestones, 2025Q3 coding consolidation by User:Yug and Aditya.

2025 Wikipages migration

2025 Wikipages migration
Wikimedia France
Information
Webpage: Lingua Libre/Supports
Start: 2025-02 – discussion
2025-12 – coding
Ended: 2026-01 – completed
Sponsors: Wikimedia France
Budget: 1,800€ (16 workdays)
Roles
Hierarchy: Xavier Cailleau WMFr
Facilitator: Yug
Main: Yug
Contact: Yug

After 8 years of activity at https://LinguaLibre.org/wiki/, the decision was taken to migrate wikipage content to Commons.wikimedia.org. To do so, we created the LinguaLibre Inventory Tool, which tracked all wikipages that required migration. Knowledgeable review helped select pages and build templates and categories to better structure Lingua Libre's resources. The wikipages' XML was exported, processed, then imported with its logs. Numerous lists and categories were created via bot with systematic multi-parameter categorization. While the whole effort created a better integrated resource that now supports a record of ~2,342 linguistic communities, Lingua Libre's ~170 Help:Lingua Libre wikipages still require human-led review and updates. We encourage the community to explore and contribute with enthusiasm.

Migration

The MediaWiki migration user rights and tools come with certain limitations. Because the content is complex, we adapted our process as best as possible. Overall, the migration proceeded as follows:

  • Approximately 5,000 wiki pages have been migrated (about 170 pages and about 5,000 micro-translations).
  • About 1,600 lists generated by contributors were migrated.
  • About 1,900 List:{iso}/Unilex* lists were regenerated for Commons via bot for better tagging.
  • About 2,400 List:{iso}/Swadesh* lists were imported from Panlex, via the same tool.
  • Structuring templates & about 2,500 categories were regenerated for Commons via bot rather than migrated.

Adaptations

New templates and categories have been coded to better integrate with Wikimedia Commons' overall ecosystem. A bot was used to update all ~6,000 lists covering 2,342 languages with {{Lingua Libre list}}. This precisely categorizes our assets by language, content type, quality assessment and multilinguality.

Exclusions

  • The following pages were excluded:
    • Talk pages: Aside from 5–10 pages, most only contained the {{Welcome}} template; migrating these would require user consent.
    • User pages: These were excluded for the same reason, and to avoid conflicts with existing Commons User pages.
    • MediaWiki & Module namespaces: Script imports are forbidden by Commons/Wikimedia rules, and the CSS was not relevant to the Commons environment.
    • Lingua Libre:Chat_room: The XML archive is too large, causing the import tool to fail.
    • Help:SPARQL: This content required the MediaWiki Query extension, which is not available on Commons. The content is also less relevant there and has been kept on LinguaLibre.org for now.
    • Files: These were already duplicates of files existing on Commons.
  • Other local cleanups: Dozens of wiki pages, templates, and categories were no longer relevant post-migration and were therefore deleted on LinguaLibre.org or excluded from the migration.

NEW

Wikimedia Hackathon 2025

Yug mentored the 3 young developers who are now the lead developers on Lingua Libre and Lingua Libre/SignIt in applying, as first-timers, for the WM Hackathon 2025 scholarship. The applying team is as follows:

  • Pushkar : focus on Lingua Libre Django deployment, polishing, and announcement. Note: this is the heaviest project, so we also explore possible WMFR support around 2025 Q1.
  • GonFreeaks : focus on Lingua Libre SignIt deployment (?) and announcement. Also sign languages extension via scraping, with Yug.
  • Yug : support to the group, will help where needed. Team up with Kabir for signed video scraping.

Result in early 2025.

50K Malayalam Words: A Lingua Libre Audio Corpus Project

50K Malayalam Words
India Rapid Project grants
Information
Start: 2025.09
Ended: 2026.02
Sponsors: India Rapid Project grants
Budget: 1,626.35 US$
Roles
Main: BhagyaMohan
Contact: BhagyaMohan
Netha Hussain (bot)

50K Malayalam Words is a pilot Lingua Libre project aiming for systematic audio recording of Malayalam words and phrases using the Lingua Libre recording tool, with the goal of contributing 50,000 high-quality audio files to Wikimedia Commons.

The project worked toward contributing 50,000 audio recordings of Malayalam words to Wikimedia Commons and Lingua Libre, directly addressing the underrepresentation of Malayalam in open digital audio datasets. These recordings are now freely accessible and reusable for applications such as speech recognition, language learning tools, screen readers, and NLP model training.

Core workload

  1. Creating focus lists covering commonly used words, government/official terms, academic vocabulary, and everyday expressions
  2. Recording Malayalam word pronunciations using the Lingua Libre platform
  3. Uploading and categorizing all audio files on Wikimedia Commons under a Creative Commons license for free public reuse
  4. Coding a lingualibre-ml-wikt-bot (User:Netha Hussain) to spread the files to the local language's wiktionary

The project successfully recorded 50,022 words (c:Category:Lingua Libre pronunciation-mal), providing an open-licence, high-quality audio dataset for Malayalam.

Lessons learnt

This pilot project focused on learning by experience and produced solid feedback, namely:

  • Feasibility of High-Volume Solo Audio Contribution : Managing 50,000 recordings solo is logistically demanding and risks burnout -> advice: Implement a structured daily schedule (300–400 words) and disciplined time-blocking.
  • Tools and Workflows for Efficient Contribution : Manual post-upload work is inefficient for bulk projects -> advice: Use Lingua Libre for its automated Commons pipeline and prepare themed word lists in advance.
  • Impact of Individual-Led Projects on Open Knowledge Platforms : Indian languages are under-represented in NLP and accessibility tools -> advice: Link recordings to Wiktionary and monitor reuse statistics to build cases for future funding.
  • Limits and Sustainability of Independent Work : High-volume solo work over 6 months leads to fatigue and lacks peer review -> advice: Secure financial grants to dedicate time, and build in structured rest periods with milestone check-ins.
  • Platform Downtime : Mid-session crashes cause data loss and require time-consuming re-recording -> advice: Record in smaller batches to minimize the number of files at risk during technical interruptions.
  • Lingua Libre supports : Platform maintainers may be unaware of how downtime impacts high-volume workflows -> advice: Share workflow adaptations and outage reports via community discussion channels to improve platform reliability.
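The daily schedule advised in the first lesson can be turned into a quick planning estimate. The Python sketch below only restates the figures given above (a 50,000-word target at 300–400 words per day); the helper name `days_needed` is hypothetical and the calculation assumes a perfectly steady pace.

```python
import math

TARGET_WORDS = 50_000  # project goal stated above


def days_needed(words_per_day: int, target: int = TARGET_WORDS) -> int:
    """Recording days needed to reach `target` at a steady daily pace."""
    return math.ceil(target / words_per_day)


print(days_needed(300))  # → 167
print(days_needed(400))  # → 125
```

At either pace, the recording days fit inside the project's 2025.09 to 2026.02 window, which leaves some room for the rest periods and interruptions mentioned in the lessons above.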

See also

Resources

List of Lingualibre temporary supports from Wikimedia France

  • Nicolas Lopez de Silanes – Service civique in 2020
  • Pierre M. – 2023 dispatch on Lingua Libre

Material grants

References