User:Yug/Grants:Project/Yug/Wikitongues on

From Meta, a Wikimedia project coordination wiki

Project Grants This project is funded by a Project Grant


Lingua Libre
summary: Bring the Lingua Libre project to the next level by moving it to a MediaWiki/Wikibase instance.
target: Wikimedia Commons; Wiktionaries
amount: 30 600 EUR (35 990 USD)
contact• wm(_AT_)
this project needs...
created on: 16:24, 20 July 2017 (UTC)

Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

Pronunciation audio files are a big help for readers learning how to pronounce a word, but recording them is very time-consuming for volunteers and requires many skills. Pronunciation audio files are of interest to at least four Wikimedia projects: Commons (obviously, see WikiProject Pronunciation), Wiktionaries (Template:audio on en, Modèle:prononciation on fr,...), Wikipedias and Wikidata (P443). But only 3% of Wiktionary entries have an audio recording, and the share is even lower for Wikipedia articles and Wikidata items. The Lingua Libre project began two years ago to facilitate such recordings and to encourage contributions in minority languages to Wikimedia projects. It is supported by a team of contributors from different backgrounds, with the help of Wikimedia France.

A lot of progress has been made: the tool is now functional and able to carry out massive open audio recordings in many languages, but its effectiveness is still limited. While the tool itself is very efficient (it can record up to 1000 words per hour), the process afterwards, transferring the files to Wikimedia Commons and integrating them into the Wikimedia projects, is quite lengthy. The main reasons are the lack of integration with the Wikimedia projects (users need to create a separate account on a separate website, there is no automatic import process, etc.), a French-only interface, and a sound database that is not easily searchable. These obstacles waste a lot of volunteers' time and energy, and many volunteers are lost along the way.

What is your solution to this problem?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.

To bring Lingua Libre closer to the Wikimedia movement and thus facilitate and encourage contributions, we plan to move the tool to a wiki-based infrastructure. This solution seems very natural for several reasons, including:

  • it will be a well-known environment for experienced Wikimedians
  • it offers many solutions for the needs we have
  • it ensures the sustainability of the project (thanks to its contributors, MediaWiki is a stable and constantly evolving platform), and the daily management will be handled by the contributors themselves (as on Wikimedia projects)

To store rich and linguistically useful metadata on each record (accent, dialect, gender of the speaker, native language or not, approximate location,...), we also plan to use the Wikibase extension, which gives us an open, flexible and easy-to-explore database (such file metadata is outside the scope of Wikidata, and Structured Data is not expected to arrive on Commons before 2020). This will allow much more convenient reuse of these freely licensed pronunciation records by Wikimedians or third parties, such as linguists, language teachers and so on.

Beyond that, we will have to adapt our "recording studio" to work on this new infrastructure. We wish to transform it into a MediaWiki extension (something similar to the UploadWizard, but for direct sound recordings, a kind of RecordingWizard), so that other wikis using MediaWiki can benefit from our work.

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

  • Create a recording MediaWiki extension, which can be used by any other wiki
  • Facilitate mass recording, upload of files to Commons and their reuse on other Wikimedia projects
  • Let the general public, Wikimedians or linguists search and visualize the pronunciation recordings, using a rich metadata system
  • Encourage people from small language communities to start contributing to the Wikimedia universe

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
    Today, Lingua Libre gathers approximately 130 contributors and has been used to record around 10,000 words in 30 languages. We plan that one year after the start of the grant project (~4 months after its end), we will have 400 contributors and 150,000 files recorded (doubling the number of records currently on the English Wiktionary) in 50 languages.
    Furthermore, at the end of the grant period, we will launch an online survey to measure the satisfaction of the contributors, asking them among other things: "Do you find pronunciation recording with Lingua Libre easy?", hoping that at least 85% will agree.
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)
    Allowing contributions to Wikimedia projects in any language is the objective of Lingua Libre. By improving it, we want to increase participation in minority languages. Contributing by voice will let new people enter the Wikimedia universe more easily.

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.

Briefsäckel (a post box in Alemannisch), recorded using the Lingua Libre tool.
  • Number of content pages created or improved:
    • 150 000 pronunciation files uploaded to Commons
    • 70 000 Wiktionary entries improved with audio records

Project plan


Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

Lingua Libre's current recording studio in use
Develop a new recording Extension for MediaWiki

As Lingua Libre already has a recording tool (which we are very proud of), we will use it as a basis for this new MediaWiki extension, but a lot of development is still needed. We will set up a dedicated page containing a step-by-step procedure (a bit like the UploadWizard), letting users choose the license, the recording method, the list of words/texts they will record, and several pieces of metadata. The core of the tool will also be adapted to comply with MediaWiki's upload process. The user interface will also require some adaptation to fit MediaWiki's environment and respect its guidelines (overhaul to use OOjs-ui, internationalization,...).

Migrate Lingua Libre to a wiki-based architecture

Setting up this wiki will require development work in many areas. Using both OAuth for authentication (so that contributors can use their Wikimedia account) and Wikibase (to store the metadata) will require some configuration and adaptation for our use case. We will also need some specific on-wiki JS scripts to help people navigate and contribute (for example, to generate random word lists to record in a given language, to generate word lists containing the names of nearby places, to override some Wikibase functionalities so that they are easier to use in our case, to generate on-the-fly statistics, etc.). To maintain a visual identity of our own, we will integrate a dedicated skin, based on mock-ups we already have (built on top of an existing skin, possibly Timeless). Finally, we will have to import all the records and associated metadata from our current database, so that previous work doesn't get lost.

Reuse on Wikimedia wikis

Once we have the infrastructure to perform our mass audio recordings, we still need to make Wikimedia projects benefit from it. Direct upload from the user to Commons is not possible, for two main reasons: Lingua Libre is an external tool (so we run into same-origin policy issues) and the browser audio recording API currently only produces .wav files (which are not allowed on Commons, and thus need to be converted). That's why the recorded files will have to pass through our server, which then establishes an OAuth connection with Commons for the final upload. As we aim to provide the easiest possible workflow for contributors, this step will be transparent to them. We will also have to map the collected metadata to fill in descriptions and categories. For communities that ask for it (currently the French Wiktionary, but ideally all the Wiktionaries and Wikidata; we will discuss it with them), we will develop bot tools that pick up freshly uploaded files and add them to the corresponding entry (when relevant). With that, end users can focus on speaking words, and all the tedious follow-up tasks will be done automatically.
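The metadata mapping mentioned above could look roughly like the following sketch. It is only an illustration: the record field names (`word`, `language`, `speaker`, ...), the category naming scheme and the helper name `build_commons_description` are assumptions, not part of the existing Lingua Libre code. Only the `{{Information}}` template is an existing Commons convention.

```python
# Hypothetical mapping from a Lingua Libre record to a Commons file description.
# Field names and the category scheme are assumptions made for this sketch.

# Optional metadata fields and the label shown for each in the description.
OPTIONAL_FIELDS = [
    ("dialect", "dialect"),
    ("accent", "accent"),
    ("gender", "speaker gender"),
    ("location", "approximate location"),
]


def build_commons_description(record):
    """Return wikitext for the Commons file page of one recording."""
    details = [
        label + ": " + str(record[key])
        for key, label in OPTIONAL_FIELDS
        if record.get(key)
    ]
    description = 'Pronunciation of "' + record["word"] + '" in ' + record["language"]
    if details:
        description += " (" + "; ".join(details) + ")"

    lines = [
        "{{Information",
        "|description=" + description,
        "|date=" + record["date"],
        "|source=Lingua Libre",
        "|author=" + record["speaker"],
        "}}",
        # License chosen by the contributor during the recording wizard.
        "{{" + record["license"] + "}}",
        # Hypothetical category scheme: one category per language code.
        "[[Category:Lingua Libre pronunciation-" + record["language_code"] + "]]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "word": "Briefsäckel",
        "language": "Alemannic",
        "language_code": "gsw",
        "speaker": "ExampleUser",
        "date": "2018-01-15",
        "license": "cc-by-sa-4.0",
        "location": "Strasbourg",
    }
    print(build_commons_description(example))
```

Keeping the mapping in one pure function makes it easy to review with the Commons community before any bot or upload code depends on it.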

Recording session during the French Wikiconvention 2016
Data exploration and visualization

Collecting a lot of useful metadata and then locking it up would be pointless. We want to put the metadata to use by setting up a dedicated SPARQL endpoint that lets people create their own visualizations. We will also create a couple of turnkey queries (for example, playing the pronunciation of a given word depending on the location selected on a map) in a user-friendly interface for the general public.
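As a rough illustration of what a turnkey query might look like, here is a sketch that builds a SPARQL query string for all recordings of a given word together with the speaker's location. The RDF schema is still to be defined in this project, so the prefix `ll:` and every property name below (`ll:transcription`, `ll:audioFile`, `ll:speaker`, `ll:residence`) are placeholders, as is the namespace URL.

```python
import re

# Placeholder namespace and property names: the real RDF schema is not defined yet.
PREFIX = "PREFIX ll: <https://example.org/lingualibre/prop/>"


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_pronunciation_query(word, language_code, limit=100):
    """Build a query listing recordings of `word` with the speaker's coordinates."""
    if not re.fullmatch(r"[A-Za-z]{2,3}(-[A-Za-z0-9]+)*", language_code):
        raise ValueError("unexpected language code: " + language_code)
    literal = '"' + escape_literal(word) + '"@' + language_code
    lines = [
        PREFIX,
        "SELECT ?record ?audio ?coords WHERE {",
        "  ?record ll:transcription " + literal + " ;",
        "          ll:audioFile ?audio ;",
        "          ll:speaker ?speaker .",
        "  ?speaker ll:residence ?coords .",
        "}",
        "LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pronunciation_query("Briefsäckel", "gsw", limit=50))
```

The resulting string could be sent to the endpoint by any HTTP client; validating the language code and escaping the word keeps user input from altering the query structure.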


Outreach, communications and learning are the key points that will allow a small community to federate around this project. We will carry out several actions for this purpose:

  • Weekly reports by mail and on a wiki page will inform interested contributors of the progress of the project
  • We will keep the village pumps of several wikis informed each time we reach a significant milestone (something like 3 or 4 times during the grant period)
  • Targeted outreach to small-language wikis will take place shortly before the end of the project, once the tool becomes really functional
  • We will give talks and workshops at several events (Wikimedia-related or not) to promote the tool and encourage contributions

Note that all the code written with the help of this grant will be available under the GPL license and accessible on a public git repository.


How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

The project is scheduled to run 27 weeks (six and a half months) from December 2017 to June 2018. Only programming tasks carried out as an independent developer are counted in the budget; nothing is requested for project management or administration.

Description | Days | Total (200 EUR/d)[1]

Develop a new recording Extension for MediaWiki (RecordWizard)
1 - Take back and adapt the existing Lingua Libre's recording studio | 4 | 800
2 - Change the interface to use OOjs-ui | 7 | 1400
3 - Turn it into a MediaWiki extension | 14 | 2800
4 - Improve the recording studio with alternative recording methods | 9 | 1800
5 - I18n | 2 | 400
6 - Develop a guided tour (using the GuidedTour extension) to help newcomers | 5 | 1000
Total | 41 d | 8200 €

Migrate Lingua Libre on a wiki-based architecture
7 - Setup of a dedicated MediaWiki and Wikibase instance, with the newly-created recording extension | 3 | 600
8 - Use OAuth for the authentication | 5 | 1000
9 - Define the RDF Schema | 4 | 800
10 - Create a dedicated MediaWiki skin | 12 | 2400
11 - Develop specific on-wiki JS scripts to facilitate the navigation and the modification of items | 18 | 3600
12 - Initialize the wiki with all the necessary basic wikibase properties and items | 2 | 400
13 - Import all our existing sound records in the new database | 4 | 800
Total | 48 d | 9600 €

Reuse on Wikimedia wikis
14 - Setting up OAuth to allow uploading sounds to Commons | 10 | 2000
15 - Develop bot-tools to add the uploaded sounds to articles on wikis that asked for (currently the French Wiktionary) | 15 | 3000
16 - Contact other communities to extend the reuse of these sounds | 4 | 800
Total | 29 d | 5800 €

Data exploration and visualization
17 - Setup a SPARQL endpoint | 10 | 2000
18 - Create turnkey SPARQL queries | 7 | 1400
Total | 17 d | 3400 €

Travelling expenses
19 - Presentation and workshops at Wikimania 2018 (two people) | / | 3000
20 - Presentations and workshops across France focused on local languages (Breton, Alemannic, Franco-Provençal) | / | 600
Total | / | 3600 €

Total project | 135 d | 30 600 €
Total in EUR | 30 600 €
Total in USD | $35 990[2]

  1. According to a specialized website [1], the average daily rate of a junior full-stack developer in France is 287 euros. As this project represents several months of work, I ask for 200 euros per day (including 23% professional charges payable in France).
  2. Based on the exchange rate on 24 October 2017.
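The budget arithmetic can be double-checked with a short script. This is just a verification sketch: the figures are copied from the table above, while the section labels and function names are introduced here for illustration.

```python
# Verification sketch for the budget table; figures copied from the proposal.
DAILY_RATE_EUR = 200

# (task number, days, cost in EUR) for each development section.
DEVELOPMENT = {
    "Recording extension": [(1, 4, 800), (2, 7, 1400), (3, 14, 2800),
                            (4, 9, 1800), (5, 2, 400), (6, 5, 1000)],
    "Wiki migration": [(7, 3, 600), (8, 5, 1000), (9, 4, 800), (10, 12, 2400),
                       (11, 18, 3600), (12, 2, 400), (13, 4, 800)],
    "Reuse on Wikimedia wikis": [(14, 10, 2000), (15, 15, 3000), (16, 4, 800)],
    "Data exploration": [(17, 10, 2000), (18, 7, 1400)],
}
# Travel lines have no day count, only a flat amount in EUR.
TRAVEL_EUR = {19: 3000, 20: 600}


def section_totals(items):
    """Return (days, cost) summed over one section's line items."""
    return sum(d for _, d, _ in items), sum(c for _, _, c in items)


def project_totals():
    """Return (development days, total cost in EUR including travel)."""
    totals = [section_totals(items) for items in DEVELOPMENT.values()]
    days = sum(d for d, _ in totals)
    cost = sum(c for _, c in totals) + sum(TRAVEL_EUR.values())
    return days, cost


def rate_is_consistent():
    """Check that every development line costs days * DAILY_RATE_EUR."""
    return all(cost == days * DAILY_RATE_EUR
               for items in DEVELOPMENT.values()
               for _, days, cost in items)


if __name__ == "__main__":
    for name, items in DEVELOPMENT.items():
        print(name, section_totals(items))
    print("Project:", project_totals())
    print("Rate consistent:", rate_is_consistent())
```

The 135 development days also match the schedule: 27 weeks of five working days is 135 days. The two totals in the table imply a rate of roughly 35 990 / 30 600 ≈ 1.176 USD per EUR.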

Community engagement

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.

Lingua Libre already has a small community of Wikimedians and linguists involved. All these developments will be carried out in collaboration with them, in an iterative way. This will allow them to give feedback on a regular basis and thus take an active part in shaping the growth of the project.

Furthermore, we will regularly inform some communities (Wiktionaries, Commons, etc.) of the project's major achievements to increase community awareness.

Interested volunteers may take the following roles:

  • Become a Lingua Libre ambassador in a specific community
  • Help translate the website into as many different languages as possible
  • Test the new features, report bugs, give feedback and make suggestions

Get involved


Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

Role | Name | Notes
Grantee | 0x010C, a.k.a. Antoine Lamielle | Wikimedian volunteer since 2014, sysop on the French Wikipedia, regular Commonist. He is a software engineer working on many on-wiki technical tasks (maintaining and developing bots, gadgets, templates, modules).
Advisor | Nicolas Vion | First developer of Lingua Libre, also known for being the core developer of the Shtooka project (which also deals with massive audio recording).
Advisor | Lyokoï | French Wiktionary admin, connoisseur of regional French languages. Co-founder of Lingua Libre, works on the project architecture and its promotion.
Advisor | Yug | Community coordination, international workshops (hackathons, Wikimania), Chinese lexicography.
Advisor | Xenophôn | Community coordination, national workshops.
  • Jayreborn (talk) -- Volunteer: I have already contributed to LinguaLibre and would like to contribute more for Tamil.
  • Volunteer CoolCanuck (talk) 02:04, 28 October 2017 (UTC)

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?


Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).