Talk:Community Wishlist Survey 2022/Generate Audio for IPA

From Meta, a Wikimedia project coordination wiki

The following Wikimedia Foundation staff monitor this page:

In order to notify them, please link their username when posting a message.
This note was updated on 09/2023

Project Announcement and Feedback[edit]

Contributors who engaged with this Wish's proposal

Rollo Rosewood Akathelollipopman Eptalon Noé Xavier Dengra Akathelollipopman Noé Pigsonthewing Ainali Modest Genius Pigsonthewing 1234qwer1234qwer4 Nachtbold Xaosflux Femkemilene Wskent Bischnu Akathelollipopman Vis M Yodin Matě MrMeAndMrMe UV Daud I.F. Argana Huji Sdkb Ottawajin Lectrician1 Tmv Tranhaian130809 Celerias Meiræ Spiros71 NguoiDungKhongDinhDanh Javiermes Aca Dexxor Ed6767 Lollipoplollipoplollipop Omnilaika02 ToBeFree

Thank you for all of your feedback and for engaging with the original proposal for this wish. I wanted to make you aware that we have begun our work on this wish and, if your capacity allows, we would love any input you have on our Open Questions as well as our initial investigations into the engines.

Here's a corpus of IPA audio we have tested. Please let us know if there are any words you would like us to test, and we will work on adding them to the corpus!
Here's a technical investigation of the IPA options and the languages supported by each option.

Thanks again for engaging with this impactful wish and for participating on the wishlist.
Best, NRodriguez (WMF) (talk) 18:01, 20 May 2022 (UTC)

Contributors who engaged with this Wish's proposal

Nw520 Pelagic Wostr Gusfriend Ali Imran Awan TheInternetGnome Minorax Man77 NightWolf1223 HynekJanac L235 Libcub Teratix Penalba2000 JAn Dudí Lrkrol Sadads Bencemac Mbkv717 Stwalkerster Dave Braunschweig Trey314159 Labdajiwa Thingofme Pppery Hià Paradise Chronicle Serg! Camillu87 Geertivp Amorymeltzer Aimwin66166 Rotavdrag Paucabot WikiAviator Daniel Case Wutsje Ninepointturn Bilorv Pi.1415926535 DarwIn Feoffer Tomastvivlaren Kpjas SD0001 Lambsbridge Paul2520 Waldyrious Bestoernesto Michael Barera Vulphere Ericliu1912 Emaus KnowledgeablePersona Beta16 Bodhisattwa Pbsouthwood DaxServer Cybularny Quiddity Sunpriat Gaurav Jl sg Evrifaessa Valerio Bozzolan Brainulator9

NRodriguez (WMF) (talk) 18:08, 20 May 2022 (UTC)

Open Questions[edit]

Can you help us build out the corpus of IPA words we will use to test the different libraries?[edit]

  • Have any tonal languages been included? I don’t think I see Swedish or any Chinese language, for example, but maybe there are some tonal languages in the corpus that I don’t recognize. Also, does the current corpus include unusual consonants or vowels? I have tested eSpeak myself and know that it cannot handle Cantonese (it cannot pronounce the syllabic m; I tried to figure out how to fix it but there’s really no documentation). Al12si (talk) 14:44, 12 November 2022 (UTC)

Do you know of any open source libraries that we should consider while we investigate our options?[edit]

Do you see any risks to introducing the video files inside the reader experiences?[edit]

  • "Video"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:21, 27 May 2022 (UTC)
    I believe this is regarding the software extension used to play media files. There's a specific task for making the player display in a desirable way, at phab:T122901 (versus the full audio-player as currently used at d:shibboleth, or the icon+"listen" links as used at w:Shibboleth).
    The only risk I see is making sure the design is good: i.e. everyone (incl. screenreaders?) can access the audio-clip without leaving the page, but also still have access to the file/license info if desired. (@TheDJ: FYI) HTH. Quiddity (talk) 17:27, 27 May 2022 (UTC)
  • I think the main issue with this feature is that it could present a false standard accent, making English projects sound more US-centred, French projects more France-centred, Spanish projects more Madrid-centred, and so on. A written transcription can be prototypical, with approximate sounds for each consonant and vowel; an audio recording can't. Audio fixes one version, with subtle traits such as vowel length, height and openness, pitch, and others. There is no generic or neutral pronunciation. One way to deal with this issue may be to display several audio versions for each IPA transcription, with regional distinctions. Combined with a preference letting users hear their own local variety first, it may be interesting and less oppressive. Anyway, I am interested in this feature and I really hope you will make your UX tests public -- Noé (talk) 15:55, 7 November 2022 (UTC)

Let us know any other thoughts you may have on the initial problem statement...[edit]

The Wikivoyages have phrasebooks. They don't use IPA – see voy:en:Wikivoyage:Phrasebook article template#Pronunciation guide for the English version; the other languages are similar – but it might be a useful source of words, and it's possible that getting IPA-based audio would encourage people to add IPA there. In the past, we've talked about both the value of IPA to some readers and the need for audio (specifically, being able to hear the IPA without loading another page or covering up the text you're reading). Whatamidoing (WMF) (talk) 18:15, 30 May 2022 (UTC)

Google Cloud dependency?[edit]

Is it the case that this feature is dependent on closed-source software in the Google Cloud, or is it independent and self-hosted? HLHJ (talk) 16:56, 15 October 2022 (UTC)

Currently, yes. The open source solutions we found only supported a handful of languages, and didn't sound remotely as accurate as Google's TTS service. Rest assured this is all done through the backend, and even then through a proxy, so no user data ever gets to Google. Longer-term we hope to switch back to open source once language support and quality are good enough. That is being tracked at phab:T317274. MusikAnimal (WMF) (talk) 03:13, 17 November 2022 (UTC)


@MusikAnimal (WMF) and @Whatamidoing (WMF) and @NRodriguez (WMF), can you please fill in/update Community Wishlist Survey 2022/Generate Audio for IPA#Release timeline ? —TheDJ (talkcontribs) 12:35, 23 November 2022 (UTC)

@TheDJ: I've made a start and will do some poking ~TheresNoTime-WMF (talk) 20:43, 23 November 2022 (UTC)

Am I missing something?[edit]

Am I misunderstanding something? I have just tried this in my af.wiktionary sandbox and the markup:

<phonos ipa="ˈbɜːrmɪŋəm" text="test" lang="en-GB" />

is pronounced as "test".

Both of these alternatives:

<phonos ipa="'bɜːrmɪŋəm" text="" lang="en-GB" />
<phonos ipa="'bɜːrmɪŋəm" lang="en-GB" />

generate an error: "The generated audio appears to be empty. The given IPA may be invalid, or is not supported by the engine. Using the 'text' parameter may help.".

How can a user ensure that the IPA is parsed and pronounced? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:04, 2 February 2023 (UTC)

@Pigsonthewing: the "text" parameter is not a label, but the written word in the language specified in the lang= parameter. See also mw:Help:Extension:Phonos. What is the word you are trying to produce? I can try to show you an example. — xaosflux Talk 19:53, 2 February 2023 (UTC)
@Pigsonthewing I think I figured it out, see testwiki:Birmingham; is that what you were trying to achieve? — xaosflux Talk 20:02, 2 February 2023 (UTC)
Thank you, but no. My point is that the template is apparently not parsing the IPA, but the value of the "text" parameter. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:10, 2 February 2023 (UTC)
@Pigsonthewing I think the documentation needs a lot of work and opened phab:T328705 about it. — xaosflux Talk 20:48, 2 February 2023 (UTC)

Could we have a response, here, please, from User:NRodriguez (WMF), User:Whatamidoing (WMF), User:MusikAnimal (WMF), User:TheresNoTime-WMF, or one of the other WMF folk working on this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:26, 26 February 2023 (UTC)

Try ˈbɜːmɪŋəm or (even though en-GB is based on a non-rhotic accent) ˈbɜːɹmɪŋəm. The list of accepted phonemes is here and <r> is not one of them. Nardog (talk) 01:25, 27 February 2023 (UTC)
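For illustration, the earlier markup with the suggested non-rhotic transcription would look like this (a sketch only; whether it actually renders audio depends on the engine and wiki configuration):

<phonos ipa="ˈbɜːmɪŋəm" text="Birmingham" lang="en-GB" />

With the unsupported <r> removed, the engine should synthesize from the IPA rather than falling back to reading the text parameter aloud.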
Accepted by whom? The IPA I quoted above was copied from en:Birmingham. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:51, 27 February 2023 (UTC)
By Google's text-to-speech engine, which Phonos relies on. So the description of Phonos as IPA-to-audio is somewhat misleading—it's really text-to-speech that sometimes accepts IPA as a bonus. The Google TTS supports IPA as input for only a subset of all supported languages (18 out of 53, to be exact). For Mandarin and Cantonese it accepts not IPA but Pinyin and Jyutping. I've been advocating for renaming ipa="", making it optional, and supporting other phoneme schemes (Pinyin, Jyutping, and X-SAMPA), but they haven't made it clear they're doing it, which is super weird because doing so would let them support, at no extra cost, 35 more languages, including the 2nd, 6th, 7th, 8th, 9th, and 10th most widely spoken languages. Nardog (talk) 13:42, 27 February 2023 (UTC)
@Pigsonthewing: as Nardog mentions, the voice models provided by Google (our currently-selected text-to-speech engine) only support certain phonemes and as such will "fall back" to reading the text parameter if an unsupported phoneme is provided in the IPA.
Unfortunately, we don't know how Google's voice models are implemented, but the current standard seems to be VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech)[1] — as we skip the step of phonemization (converting text to phonemes) by directly supplying the phonemes in the IPA, we need to ensure we input only phonemes which the voice model has been trained on. Additionally, when a model is trained, we don't always know the exact use to which certain phonemes are assigned — ə, for example, is often used in at least 3 conflicting ways.
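To make that constraint concrete: because phonemization is skipped, input has to be pre-checked against the model's phoneme inventory. A purely illustrative Lua sketch (the supported list below is a tiny stand-in, not any engine's real inventory) showing a greedy longest-match scan that collects unsupported symbols:

-- Illustrative only: "supported" is a stand-in, not a real model's inventory.
local supported = { 'ˈ', 'ˌ', 'b', 'm', 'ɪ', 'ŋ', 'ə', 'ɜː' }
table.sort( supported, function ( a, b ) return #a > #b end ) -- longest first

local function unsupportedSymbols( ipa )
    local bad, i = {}, 1
    while i <= #ipa do
        local matched = false
        for _, ph in ipairs( supported ) do
            if ipa:sub( i, i + #ph - 1 ) == ph then
                i, matched = i + #ph, true
                break
            end
        end
        if not matched then
            -- consume one UTF-8 character and record it
            local b = ipa:byte( i )
            local len = ( b >= 0xF0 and 4 ) or ( b >= 0xE0 and 3 )
                or ( b >= 0xC0 and 2 ) or 1
            bad[#bad + 1] = ipa:sub( i, i + len - 1 )
            i = i + len
        end
    end
    return bad
end

-- unsupportedSymbols( 'ˈbɜːrmɪŋəm' ) returns { 'r' }: the rhotic is the culprit.

A check like this run before the TTS request would let the extension report which symbol is unsupported, instead of the generic "generated audio appears to be empty" error discussed above.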
When building a tool such as this, we are limited by both the International Phonetic Alphabet (something that, as I only recently learnt from an impromptu chat with computational linguist Dr. Angus Andrea Grieve-Smith, can be considered to "fall short of the ideal consistent representation that was sold to people"[2]) and the publicly available voice models.
As an aside, I recently spoke to Alan Pope, on whom a fairly robust voice model has been trained[3] — his blog post on the matter is a wonderful read for anyone interested in this part of the process! Of note is his voice models' supported phonemes.
I hope this goes a little way to highlighting the complexity, and resultant limitations, of what we're trying to do and I'd be more than happy to answer any further questions you may have. — TheresNoTime-WMF (talk • they/them) 14:39, 27 February 2023 (UTC)
P.S. Way out of scope here, but wouldn't it be awesome to train our own voice model using a dataset provided by LinguaLibre? — TheresNoTime-WMF (talk • they/them) 14:54, 27 February 2023 (UTC)
Though it is a common misconception that the IPA is "the ideal consistent representation"—so common that my enwp user page dedicates a section to it—it was never sold as such by the IPA (the association) itself. It was already telling you to "leave out everything that can be explained once for all" in 1904!
Out of curiosity, can you tell me the three conflicting ways ə is used by Google? It might simply be that they correctly understand what a phoneme is: an abstract category encompassing multiple sounds (aka phones) in complementary distribution. But if not, it has implications for template implementation when it's rolled out to major wikis. Nardog (talk) 15:56, 28 February 2023 (UTC)
Maybe we should say the opposite, that Wikipedia doesn’t know what a phoneme is. The telling thing is that on Wikipedia most IPA is notated as phonetic, not phonemic. I have no idea who made this decision and why. Al12si (talk) 01:30, 23 March 2023 (UTC)
Yes, OK, it's complex, but the wish is called Generate Audio for IPA, and the team claimed they were working on that when attempting to cover for the total failure of the Wishlist system some months ago. Theklan (talk) 21:42, 10 July 2023 (UTC)


Not what was required[edit]

The proposal was for an IPA-to-audio renderer. It is apparent that what is being built is largely a plain-text-to-audio renderer. This is not what was requested, nor what is required. Rendering a text value will not allow anyone to know whether the IPA is correct, nor what the IPA is intended to sound like. It will not allow comparison of two different IPA representations of the same text lexeme. If an IPA-to-audio renderer is not possible, the request should have been - and indeed still should be - declined. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:06, 22 June 2023 (UTC)

@Pigsonthewing You mentioned It is apparent that what is being built is largely a plain-text-to-audio renderer. Is this bold conclusion drawn solely from the update posted today, 22 June 2023, or is it from something you have observed so far, including the pilot wikis? Please let me know, so this can be cleared up.
This project is still about Generating Audio for IPA. ––– STei (WMF) (talk) 13:59, 22 June 2023 (UTC)
Both today's update and the section above this one. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:31, 22 June 2023 (UTC)
And also the current usage and examples. The deployment status is not about IPA rendering; it is about an inline player, which is another wish. Theklan (talk) 21:32, 10 July 2023 (UTC)
That never made sense anyway. The vast majority of IPA transcriptions are phonemic or allophonic transcriptions, which are language-specific and convey only selective information about exact articulatory configurations, omitting specifics that are either predictable according to the phonology of the language or irrelevant to the discussion at hand (see Handbook of the IPA, pp. 29–30). That means speech synthesis that directly derives audio from symbols is not an option (I guess unless you painstakingly recreate all the omitted parts in input for the audio to accompany each simpler, more legible transcription). So the only way that's humanly possible is language-specific text-to-speech. And it so happens that the only kinds of text-to-speech that don't sound horrendous are machine-trained ones, which typically accept IPA as input for only a portion of the supported languages (Google's, which CommTech initially went by, supports it for less than a half of all supported languages).
Then there are competing conventions. As the Handbook (p. 30) points out, /iː/ and /ɪ/, /iː/ and /i/, and /i/ and /ɪ/ are all valid ways to represent the vowels in heed and hid that are all "in accord with the principles of the IPA". So you can't tell whether /i/ is supposed to sound like the vowel in heed or hid just by looking at it. That means, even if you know what language is being transcribed, you can never tell if the resultant audio is correct without knowing the underlying context and conventions.
The very premise of the CWS wish was an untenable one, which is why I didn't vote for it and I suspect why (AFAICS) nobody who is actually a frequent editor of IPA transcriptions did. But CommTech didn't know that when they began working on it. Nardog (talk) 16:45, 22 June 2023 (UTC)
"If an IPA-to-audio renderer is not possible, the request should have been - and indeed still should be - declined.". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:10, 22 June 2023 (UTC)
You asked, as a reader, for a feature that made reading IPA redundant. You proposed automatic generation of audio from IPA, which is infeasible, as the means to accomplish it. That doesn't mean there aren't other means that can make reading IPA redundant for readers, like human editors manually inputting a prompt to generate audio, judging its quality, and adding it. Nardog (talk) 19:05, 22 June 2023 (UTC)
I did not. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:26, 22 June 2023 (UTC)
You didn't what? And whether my summary of your proposal is accurate or not, voters and CommTech certainly seem to have interpreted it that way. Nardog (talk) 01:59, 23 June 2023 (UTC)
Sorry, but we are discussing a wish called "Generate Audio for IPA", which is not being done. Also, in the discussion we had this year about the lack of wishes fulfilled, the WMF team said that "Generate Audio for IPA" was coming. It is not. Theklan (talk) 21:34, 10 July 2023 (UTC)
I've been watching this ad nauseam over the last couple of months... and there are a few editors here who are disproportionately represented and attempting to influence what this feature should or should not be. I heavily advise inviting those who voted for the feature to give their opinion on what they want, with the information and experience that has been collected, as otherwise what has been built will likely not be accepted by those who asked for it.
Secondly, while personally I fear this is going to turn into a tool to fight the American vs British vs Canadian English wikiwars, I think it's important to realise that the general public probably won't care at all about IPA. It's my opinion that they only need a pronunciation and the whole IPA business can be removed from the lead as far as they are concerned. So even if we have gathered better feedback from more than the 4 people on this page, it is probably worth it to ask the general public what THEY want.
All in all, this seems a very good demonstration of why the Community Wishlist survey should be limited to smaller projects instead of these massive complicated projects that generally make it into the top 10 and why editors should not be doing product development. —TheDJ (talkcontribs) 13:42, 28 June 2023 (UTC)
the general public probably won't care at all about IPA That's exactly why I advocated for making Phonos about generic text-to-speech rather than strictly about IPA-to-audio, which they turned down on the grounds that it was "not in the roadmap". It's alarming to me that they're still saying it's "about Generating Audio for IPA" despite the fact that, according to this page, the project is supposed to address readers' inability to read IPA markup, so generic TTS that supports more languages would clearly be a better solution. I hope they only mean that the CWS project is about IPA-to-audio and the Language team picks it up to make something that makes more sense. Nardog (talk) 16:36, 28 June 2023 (UTC)
  • I voted for this, and think the primary benefit is that readers may want to know how a word should be pronounced. Many projects have spent considerable effort annotating these words with IPA - so an IPA-->sound solution could be useful, but I think the core benefit to the reader is just being able to hear the word without contributors recording and uploading audio files manually for each word to be announced. So perhaps an IPA renderer isn't being delivered, and maybe one day it could be - but working on a text-to-audio rendering solution isn't useless. — xaosflux Talk 14:26, 29 June 2023 (UTC)
    It doesn't have to be either-or. If whatever engine you're relying on supports IPA for some languages, go for it, but it makes no sense to then preclude all other supported languages from being heard. Nardog (talk) 17:37, 29 June 2023 (UTC)

@User:NRodriguez (WMF): Please see mw:Help talk:Extension:Phonos. The announcement has faulty examples. The "help page" is misleading. What are "some engines"? What is this extension supposed to do? The predominant effect I can see is inappropriate error messages and useless tracking categories. Community_Wishlist_Survey_2022/Reading/IPA_audio_renderer. Taylor 49 (talk) 21:27, 16 September 2023 (UTC)

Please scrap this "Phonos" immediately[edit]

Yesterday I switched the pronunciation template at Swedish Wiktionary to Phonos. I had to partially revert the change due to dysfunctionality. Most likely I will remove it completely. I propose to scrap Phonos entirely. Reasons:

  • it's dysfunctional: if "ipa=" is fed in but "file=" not then it causes an error and puts the page into a tracking cat, it cannot "read" IPA
  • it does not provide anything beyond the capabilities of the old templates
  • the look/layout is bad and hard to improve
  • it uses the "Google API" phab:T317274 (I do not want to end up with public WMF wikis accessible only from a ChromeBook, and only after logging into "your" Google account and consenting to Google's TOS; also, the attitude "let's bet on proprietary software until free software is available and good enough" is inherently wrong: it has been applied again and again during the past 25 years, and the outcome was bad every time (MNG vs Macromedia, Theora vs H.264, ...); there is no need to have public WMF wikis dependent on (and paying) Google)
  • it converts Vorbis files to MP3 phab:T346508 (there is really no reason to do so, waste of resources, and promotion of proprietary "technologies")
  • the documentation is incomprehensible, the announcements cross-posted to all wikis have faulty examples, and it's obscure what "PhonosInlineAudioPlayerMode" does or how to enable or disable it
  • difficult to invoke from Lua; it has to be launched through the hacky "extensionTag", leaving behind "strip markers"
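On the Lua point, for reference: frame:extensionTag is the documented Scribunto mechanism for emitting an extension tag from a module (a tag written as a plain wikitext string from Lua is not expanded), and the strip markers it leaves behind are how MediaWiki defers tag expansion generally, not a Phonos-specific quirk. A minimal module sketch, assuming hypothetical parameter handling (this is not an existing module):

local p = {}

-- Wraps the <phonos> tag so templates can call it via {{#invoke:...}}.
function p.play( frame )
    local args = frame.args
    return frame:extensionTag{
        name = 'phonos',
        args = {
            ipa = args.ipa,
            text = args.text,
            lang = args.lang or 'en',
        },
    }
end

return p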

@User:Nardog @User:Pigsonthewing @User:Xaosflux @User:TheresNoTime-WMF @User:TheDJ @User:Theklan @User:Al12si @User:Whatamidoing (WMF) @User:NRodriguez (WMF) @User:STei (WMF) @User:MusikAnimal (WMF) @[[User:Noé 1]] @User:HLHJ @User:Samwilson @User:Quiddity: I mean it should get deprecated on all WMF wikis, and deactivated on all WMF wikis soon later. Taylor 49 (talk) 15:46, 17 September 2023 (UTC)

As the Status Updates section makes clear, installations of Phonos on WMF wikis are in the inline audio player mode so ipa= is not available, and the Language team plans to expand the offering of open language services with Text-to-Speech, creating a stable technological foundation for projects such as the IPA Audio Renderer, which indicates it won't rely on Google when/if the IPA-to-audio generation becomes available. Nardog (talk) 15:49, 17 September 2023 (UTC)
These comments mostly make me want to just not work on MediaWiki. —TheDJ (talkcontribs) 17:02, 17 September 2023 (UTC)
Hi, apologies for the misunderstandings! As noted above, IPA rendering is still coming and without use of a proprietary API. It's worth mentioning however that all requests to Google were made on the backend, so there's no TOS for you to agree to, nor was anyone's data ever shared with Google. Even the backend request itself goes through an anonymized proxy.
We apologize if our status updates and the relevant Tech News announcement were unclear or misleading. Both link to mw:Help:Extension:Phonos, which we hope sufficiently describes how the inline audio player works. I realize a lot of the other information on that page is written with the assumption that IPA transcription works, but this is because it is intended for audiences in and outside Wikimedia, so while we don't have IPA rendering yet, third-party wikis can still enable it. For added clarity, I've added a note to the top of the page explaining the current situation at Wikimedia.
See the replies at phab:T346508 on why we are using MP3 – namely that it has wider support than other formats and is now non-proprietary.
I'm not sure why use of the Lua extensionTag is considered hacky. Phonos should be no different than any other extension-supplied tag such as <ref>...</ref>. I will note we originally had implemented Phonos as a parser function, but ran into issues like phab:T317112 that forced us to move it to a tag. In our case, we only work with unprocessed wikitext, so a tag makes more sense than a parser function, anyway.
We spent considerable time building Phonos, so I don't think it should be scrapped. It provides unique functionality even while only in inline audio player mode, and we're confident our friends on the Language team will deliver with an IPA renderer given their expertise in this area.
We appreciate your feedback and patience on this project. Warm regards, MusikAnimal (WMF) (talk) 18:23, 17 September 2023 (UTC)
I disagree with your message, @Taylor 49. Having an inline player that plays sounds directly on the page, without opening another file, is a great advancement, and will make the reader's experience way better. I agree that the documentation is complex and misleading, and that it doesn't do what was wished for (and that's a huge hole). Nevertheless, I hope it will in the future, and I hope that future is near. Theklan (talk) 23:51, 17 September 2023 (UTC)
Indeed "having an inline player that plays sounds directly on the page, without opening another file" is a great benefit ... but this privilege existed already before Phonos. Taylor 49 (talk) 13:37, 18 September 2023 (UTC)
Yes, you can add a file and play it, but it will have quite a large play bar, which is not practical inline. Theklan (talk) 13:05, 19 September 2023 (UTC)
As above, the short answer is that the current Phonos tool is NOT an IPA engine, and you shouldn't try to use it for that purpose. That doesn't mean it is useless. — xaosflux Talk 23:58, 17 September 2023 (UTC)