Grants:Project/Intelligibility transcriptions

status: not selected

Intelligibility transcriptions

summary: Please help us make the best speaking skills instructional software available to everyone for free. Currently, English as a foreign language (EFL) students can use thousands of free web and stand-alone software applications for learning reading, writing, and listening. But speaking skills instruction is limited to expensive, cumbersome, and often inaccurate commercial software for pronunciation assessment. We need your help to collect transcriptions of audio utterances to complete a free, superior alternative for which we have already built software, and make it available to all English learners.

target: Wiktionaries of the Indo-European languages, transitioning to tonal languages in follow-on work

type of grant: research

amount: 25000 USD

type of applicant: organization

grantee: Srikanth Ronanki

advisor: James Salsman

contact: team@sphinxcapt.org

affiliate: SphinxCAPT.org, the CMU Sphinx / Google Summer of Code Computer Aided Pronunciation Training Team


created on: 03:40, 13 March 2017 (UTC)


Abstract: We plan to produce free, interactive language pronunciation assessment and remediation software which may be able to improve students’ pronunciation of words six times faster than commercially available products. Millions of people worldwide currently wish to improve their pronunciation in order to gain access to better jobs and succeed at more opportunities to speak in public, on teleconferences, or to groups. Unfortunately, companies which charge for this service often frustrate students by putting too much emphasis on inconsequential mistakes. So this year we are building on the open source software we have released in our past Google Summer of Code efforts to produce the most efficient, full-featured pronunciation training software, with your help.

Project idea

What is the problem you're trying to solve?

The problem is optimizing pronunciation assessment performance and the accuracy of remediation. We hope to obtain an interface like this Adobe Flash demo (source files; ActionScript code) to fall back on in case this nicer WebRTC/GetUserMedia recorder isn't working with these live student WebRTC demos. We may also be able to get technical help with mobile apps that can score pronunciation without an internet connection but use a server, whenever they can reach it, to improve the quality of their pronunciation scores and remediation feedback. We can host the server on Tool Labs, but we need to raise the money to collect the data needed to build the server and the apps.

We plan to cover single words on Wiktionary, with a link for people who want to register and/or sign in through OAuth so that we can keep track of the diphones on which they score above or below average and select practice words and phrases accordingly. If we are very successful, we will also support grammatical, morphological, diphone-based, and ability-based criteria for adaptive instruction. The grantees include the GSoC co-mentor Ronanki, and may include a GSoC student who is still to be determined.
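As a rough illustration of the diphone tracking described above, here is a minimal Python sketch. It is not our released software: the class name `DiphoneTracker`, the diphone labels, the word list, and the scores are all hypothetical, and "average" is assumed here to mean the learner's own mean across diphones rather than an average across all learners.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lookup table: each practice word mapped to the diphones it contains.
WORD_DIPHONES = {
    "cat": ["k-ae", "ae-t"],
    "cap": ["k-ae", "ae-p"],
    "tap": ["t-ae", "ae-p"],
    "pat": ["p-ae", "ae-t"],
}


class DiphoneTracker:
    """Keeps a learner's scores per diphone (assumed to lie between 0 and 1)."""

    def __init__(self):
        self.scores = defaultdict(list)

    def record(self, diphone, score):
        """Store one pronunciation score for a diphone."""
        self.scores[diphone].append(score)

    def averages(self):
        """Mean score for every diphone the learner has attempted."""
        return {d: mean(s) for d, s in self.scores.items()}

    def below_average(self):
        """Diphones whose mean score is below the learner's overall mean."""
        avgs = self.averages()
        if not avgs:
            return set()
        overall = mean(avgs.values())
        return {d for d, a in avgs.items() if a < overall}

    def pick_practice_words(self, word_diphones, limit=3):
        """Words containing the most below-average diphones, best matches first."""
        weak = self.below_average()

        def weak_count(word):
            return sum(d in weak for d in word_diphones[word])

        # Sort by number of weak diphones (descending), then alphabetically.
        ranked = sorted(word_diphones, key=lambda w: (-weak_count(w), w))
        return [w for w in ranked if weak_count(w) > 0][:limit]


# Usage with made-up scores for one learner.
tracker = DiphoneTracker()
for diphone, score in [("k-ae", 0.9), ("ae-t", 0.8), ("ae-p", 0.3), ("t-ae", 0.4)]:
    tracker.record(diphone, score)
print(tracker.pick_practice_words(WORD_DIPHONES))
```

Ranking words by how many weak diphones they contain is only one possible selection rule; the grammatical, morphological, and ability-based criteria mentioned above would need additional inputs.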

What is your solution?

Here is an example of the kind of form we intend to pay people to fill out. Most of the fields will be automatically filled out after registration. ("Gender" may be renamed "biological sex" and will be optional like most of the fields.)

Please see these two papers which independently arrived at the same solution that we prefer:

Loukina, et al. (September 2015) “Pronunciation accuracy and intelligibility of non-native speech,” in InterSpeech-2015, the Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association (Dresden, Germany: Educational Testing Service.)

Kibishi, et al. (May 2014) “A statistical method of evaluating the pronunciation proficiency/intelligibility of English presentations by Japanese speakers,” ReCALL (European Association for Computer Assisted Language Learning.) doi:10.1017/S0958344014000251.

Please see also these references. A 2002 paper by A. Raux and T. Kawahara may have arrived at a similar solution, but the details remain unclear after communication with an author.

Project goals

Help people improve their speaking skills by improving the available resources and tools and by obtaining the data those tools depend on.

Project impact

How will you know if you have met your goals?

We will use integrated pre- and post-tests to measure the resulting increase in instructional productivity, as per https://cmusphinx.github.io/wiki/pocketsphinx_pronunciation_evaluation/#draft-gsoc-2017-proposal

Do you have any goals around participation or content?

We are already supported by the Google Summer of Code project, and have experience supporting Foundation work with the Google Summer of Code. We plan to start with pronunciation assessment content on the English Wiktionary, show other volunteers how to localize the system for their language, and then move on to the tonal languages.

We agree with the three shared metrics, and prefer the third, the number of content pages created or improved, across all Wikimedia projects. Our project is likely to be judged exceptionally well by all three metrics.

Project plan

Activities

Please see: https://summerofcode.withgoogle.com/organizations/6234667528224768/

which links to: https://cmusphinx.github.io/wiki/projectideas#intelligibility-remediation-for-pronunciation-assessment

We hope to give a demo at the International Speech Communication Association's seventh Workshop on Speech and Language Technology in Education (SLaTE 2017) in Sweden in late August, after Wikimania, where we want to ask international experts to support integration with Wiktionary.

Budget

We need to use Amazon Mechanical Turk (or, if requested, other services such as Google AdWords to recruit workers and a custom Mechanical Turk substitute to administer their work) to collect listeners' transcriptions of the student audio utterances we already have. We will then use logistic regression or similar techniques to predict which pronunciation mistakes interfere with comprehension and which are merely inconsequential. The more transcriptions we can collect, the better we can make such predictions, and so the better and more efficient our remedial feedback will be for English learners. High-quality work from Mechanical Turk costs about $0.50 USD per transcription, and we need to collect about 50,000 transcriptions covering 1,000 Basic English words and phrases for accurate predictions, which is an average of 50 transcriptions per word or phrase. So we are trying to raise $25,000 (50,000 × $0.50) towards this project by the time GSoC work starts in May. If we collect less money, we can offer fewer phrases, and if we collect more, we can improve the accuracy of our intelligibility predictions, saving more time for learners.
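To make the logistic regression step concrete, here is a minimal, self-contained Python sketch. It is an illustration under stated assumptions, not our production code: the two features (`vowel_score`, `final_consonant_score`), the toy numbers, and the function names are invented for the example. The label is 1 when a listener's transcription matched the intended word and 0 otherwise.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this utterance.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Estimated probability that listeners will understand the utterance."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Made-up toy data: [vowel_score, final_consonant_score] per utterance.
X = [[0.9, 0.2], [0.8, 0.9], [0.7, 0.1], [0.85, 0.8],
     [0.2, 0.9], [0.3, 0.1], [0.1, 0.8], [0.25, 0.2]]
# 1 = listener transcribed the word correctly, 0 = listener did not.
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
print("weights:", w, "bias:", b)
print("P(understood | good vowel):", predict(w, b, [0.9, 0.5]))
```

In this toy data only the first feature separates understood from misunderstood utterances, so its fitted weight dominates. A mistake tied to a feature with a weight near zero is the kind we would treat as inconsequential, which is why more transcriptions give more reliable weights.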

Community engagement

We have asked the English Wiktionary community for the conditions under which they would add microphone input to the pronunciation sections of Wiktionary entries, as an additional widget after the "Audio" items, such as:

Try saying: [Record 🔴] [Stop ⬛] [Play ▶️] [Evaluate ❓] [Say in phrase 🔗]

Ideally, the "Evaluate" button will produce audio feedback with an option to also view pertinent visual information. The link to say the word in a phrase may use OAuth or other registration to keep track of the user's word and diphone proficiency for adaptive instruction in general vocabulary. There is likely to be a small audio level meter between the Record and Stop buttons.

All of the collected transcriptions along with the audio utterance files on which they are based will be released to the public domain for anyone to use and made available in the CMU Sphinx source repositories. The collected data and resulting software will be announced to donors at least two weeks prior to public release.

Get involved

Participants

James Salsman, Srikanth Ronanki, and the rest of the CMU Sphinx / Google Summer of Code Computer Aided Pronunciation Training Team

Community notification

Endorsements

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

This could add a lot of helpful functionality for the Wikimedia community! 184.99.245.212 20:03, 4 April 2017 (UTC)
I will be taking part and guiding this project towards usability on Wikipedia and other Wikimedia projects as well as aiding with questions regarding anatomy. I feel this could benefit our projects in the future, and really help make Wiktionary relevant for language learners. CFCF 💌 📧 23:31, 4 April 2017 (UTC)
I will be working towards development of the pronunciation intelligibility evaluation tool which will simultaneously monitor learners' motivation level. I believe that this tool will complement Wiktionary's attempt to provide all the relevant linguistic tools to people who want to either learn languages or use its structure for further research. Brij.
The project proposes an extremely important set of functions. I support the idea and would like to contribute in the future Gorinars (talk) 05:02, 5 April 2017 (UTC)
This project will provide valuable insights for language learning and other tasks. I would also be interested in applications of this work for improving synthesized speech. -Sarah
This project will be highly beneficial to users trying to learn a new language and improve their speaking skills but are unable to find a tool to adequately help them. Wiktionary itself will become far more useful to the community through this project. -Sahith
This software would be a nice free tool available to millions of users who wish to improve their pronunciation, as well as an assessment tool to measure the same for candidates. SAURABHIMA (talk) 16:13, 5 April 2017 (UTC)
This project will create a solid foundation for a new kind of educational content available in the public domain. 185.5.8.146 20:29, 6 April 2017 (UTC)