
Grants talk:IdeaLab/A "Listen" Button

From Meta, a Wikimedia project coordination wiki

Work needing to be done

  1. Getting community consensus for the idea
  2. Getting the WMF to add the button. We could probably do it as a gadget initially
  3. Figure out how to get the text-to-speech software to run on Wikipedia articles. Will likely need to trim out tables and other complicated stuff

Doc James (talk · contribs · email) 04:33, 25 January 2015 (UTC)
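
As a rough illustration of step 3, here is a minimal sketch of trimming an article before it is handed to a text-to-speech engine. It assumes the input is raw wikitext and that dropping tables, footnotes, and link markup is enough for a first pass. The function name `stripForSpeech` is hypothetical and not part of MediaWiki or any existing gadget.

```typescript
// Hypothetical helper (not an existing MediaWiki function): reduce raw
// wikitext to plain prose that a TTS engine can read aloud.
function stripForSpeech(wikitext: string): string {
  let kept = "";
  let tableDepth = 0; // > 0 while inside a {| ... |} table, including nested ones

  for (const line of wikitext.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.indexOf("{|") === 0) {
      tableDepth++; // table opens: skip this line and everything inside
      continue;
    }
    if (tableDepth > 0) {
      if (trimmed.indexOf("|}") === 0) {
        tableDepth--; // table closes
      }
      continue;
    }
    kept += line + "\n";
  }

  return kept
    .replace(/<ref\b[^>]*\/>/g, "")                   // self-closing footnote markers
    .replace(/<ref\b[^>]*>[\s\S]*?<\/ref>/g, "")      // footnotes with content
    .replace(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/g, "$1") // [[target|label]] becomes label
    .replace(/'{2,}/g, "")                            // bold and italic quote marks
    .replace(/\n{3,}/g, "\n\n")                       // collapse the gaps left behind
    .trim();
}

// Usage: pass the article source, then give the result to the speech engine.
const articleSource = "'''Gout''' is a form of [[arthritis|inflammatory arthritis]].";
console.log(stripForSpeech(articleSource));
```

Templates, images, and math markup are left untouched here; a real gadget would probably work from the rendered HTML instead, which is a design choice this sketch does not explore.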

Widening the scope

I suggest widening the scope and making this a MediaWiki extension so that it can be used on all Wikimedia projects and not just Wikipedia. This is such an awesome idea that it will probably be very useful for OER on Wikibooks and Wikiversity. Ainali (talk) 08:34, 25 January 2015 (UTC)

Yes, excellent idea :-) Doc James (talk · contribs · email) 14:56, 25 January 2015 (UTC)
MediaWiki extensions need to be reviewed by staff before being deployed, which will delay the project into 2019. Dispenser (talk) 20:44, 7 March 2015 (UTC)


I looked into doing this on the Toolserver in 2008 and on Labs in 2013/2014. We have several major markets (excluding children too young to read):

  1. The blind and dyslexic - They need speed and can work through garbled speech. They also have their own TTS software with a voice they're familiar with.
  2. General public - They demand quality, non-garbled speech with a gamut of voices, since they don't like that "dumb Boston woman" or find the Australian man unintelligible. These people want commercial non-free voice packs, not Festival's default Speak & Spell voice.
  3. Children - They fall into the two camps above, mostly depending on age/skill. Also highly politicized ("I don't want you speaking like that") with lots of money spent/embezzled. (Added Dispenser (talk) 23:32, 7 February 2016 (UTC))

The first market is a waste of time to pursue, leaving us with the second market, which only makes sense if we can get high-quality non-free voices running somewhere. We could try convincing the WMF to allow non-free voice packs. Dispenser (talk) 20:39, 7 March 2015 (UTC)

There is a huge market that you are forgetting: those in the developing world who cannot read and who use mobile phones. They do not have expensive phones and do not know how to set up text-to-speech themselves. They need something simple.
Then there are those like me who want to listen while driving or running and who do not want to figure it out either. I think if we did the first part, maybe someone would develop a free voice pack. Doc James (talk · contribs · email) 20:54, 7 March 2015 (UTC)
The foundation only cares for the developing world as much as it can sell it to its moneyed donors. The text-to-speech accessibility features in touchscreen phones cover most of the use cases for desperate people, so we don't have to reinvent the wheel. Considering data costs in these places, they'll likely prefer it too. And speaking of reinventing the wheel, Pediaphon is turning nine. Dispenser (talk) 21:35, 7 March 2015 (UTC)
Most of those in the developing world do not have touchscreen phones. And there is Wikipedia Zero. Thanks for the link. First time I have seen Pediaphon. Doc James (talk · contribs · email) 21:37, 7 March 2015 (UTC)
I'm told the $35 Firefox phone with TTS accessibility is "affordable" (avg. salary in India is $1,570/yr). I doubt that paying per minute to telephone an Asterisk server and navigate a menu system will work well. Supposedly all of India uses WhatsApp to save on texting. Wikipedia Zero is a threat to net neutrality. The fact that it hasn't been brought to AT&T's sponsored data is telling. Dispenser (talk) 22:14, 7 March 2015 (UTC)
In rural areas, farmers get limited free 1 GB internet access for $2/month through the special BSNL Krishi/Farmers Card. AbhiSuryawanshi (talk) 04:41, 9 March 2015 (UTC)
Thanks. Appreciate your perspective as someone who lives there :-) Doc James (talk · contribs · email) 05:08, 9 March 2015 (UTC)
Accent might be a problem in India. We need an Indian English accent. AbhiSuryawanshi (talk) 06:08, 9 March 2015 (UTC)
For what it's worth, as a screen reader user, I concur with Dispenser's comments above. There may well be a market for this, but people like me aren't in it. Graham87 (talk) 12:12, 5 May 2015 (UTC)
I agree that those with vision problems who are already on the Internet have found solutions to this problem that are better than anything we will be able to produce, and I agree that they are not the market.
Patient.co.uk has a listen button, as seen here http://www.patient.co.uk/doctor/gout-pro, which they say is pressed about 1% of the time.
Doc James (talk · contribs · email) 12:26, 5 May 2015 (UTC)

Others who do this

If this gets worked on, it'd be worth looking at other sites that do similar things. Off the top of my head, some of the EBSCOhost full-text databases have a similar feature: you can listen to articles in an American, British or Australian accent or download the MP3. It appears they are using Texthelp Browsealoud (proprietary, presumably?). It looks like this: [1] -- phoebe | talk 19:20, 29 May 2015 (UTC)

Visually impaired?

"Visually impaired users can make use of screen readers, but they may not be as accurate as a human vocal recording. This is particularly true of articles relating to science, mathematics, linguistics, and other areas commonly requiring unusual or unfamiliar pronunciation, or the use of symbols."

I don't get it. The same could be said of any open source text-to-speech solution. It's the same technology.--Anders Feder (talk) 08:12, 5 July 2015 (UTC)

@Doc James: Sounds like it is from Spoken Wikipedia. Dispenser (talk) 18:20, 6 July 2015 (UTC)
Yes, like Spoken Wikipedia. A human recording will be played if available; otherwise a machine reading will be played. Doc James (talk · contribs · email) 19:52, 9 July 2015 (UTC)

I've just removed this sentence. Visually impaired users will find no benefit at all from an audio version. With their screen reader they can navigate the content, interact, etc., and have a much better experience than with an audio version. Dodoïste (talk) 13:50, 30 September 2015 (UTC)

Voice costs

Tracked in Phabricator:
Task T126179
How much to purchase a voice pack
Apparently[2], prices are set ($650-1200/hr) to be competitive with professional voice talent.
How much if we produce our own professional voice pack?
According to an interview from The Verge, it takes about 3-4 months for recording, so probably US$100,000.
How about a proof of concept with a high quality voice?
The WMF has a team with the proprietary tools for iOS development, and iOS comes with a dozen English voices (4x US, 3x Australian, 3x UK, 1x Ireland, 1x South African, ~200 MB more for the 'Enhanced' edition).

Dispenser (talk) 21:07, 27 October 2015 (UTC)

Created Phabricator ticket for iOS. Dispenser (talk) 00:41, 8 February 2016 (UTC)

Web Speech API

Google standardized the W3C Web Speech API in 2012. See the demo and the 2016 Mozilla blog post. Tested as supported in Chrome 50 (Desktop), iOS 8+, and Firefox as of 2016 (set media.webspeech.synth.enabled to true in about:config, then restart). It doesn't work with the Android-x86 5.1 default browser. I still stand by my original point, but this route is easier and doesn't need to involve the WMF. Dispenser (talk) 20:06, 23 March 2016 (UTC)
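
For anyone wanting to try this as a gadget, here is a minimal sketch of what a "Listen" button handler could look like on top of the Web Speech API. It uses the browser's `speechSynthesis` object and `SpeechSynthesisUtterance` constructor, which are part of that API. The names `splitIntoSentences` and `listen`, the default `en-US` language, and the choice to queue one utterance per sentence are assumptions of this sketch, not anything the API requires.

```typescript
// Hypothetical helper: naive sentence splitter so each utterance stays short.
// It splits on ., ! and ?, so abbreviations and decimals will be cut too early.
function splitIntoSentences(text: string): string[] {
  const parts: string[] = text.match(/[^.!?]+[.!?]*/g) || [];
  return parts.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Hypothetical "Listen" handler. Queues the text on the browser's built-in
// synthesizer and returns the number of utterances queued, or 0 when the
// Web Speech API is not available (older browsers, non-browser environments).
function listen(text: string, lang: string = "en-US"): number {
  // Accessed through globalThis so the sketch also loads outside a browser.
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
    return 0;
  }
  const sentences = splitIntoSentences(text);
  for (const sentence of sentences) {
    const utterance = new g.SpeechSynthesisUtterance(sentence);
    utterance.lang = lang; // lets the browser pick a matching installed voice
    g.speechSynthesis.speak(utterance); // speak() appends to the utterance queue
  }
  return sentences.length;
}

// Usage: wire this to a button click in the gadget.
listen("Gout is a form of inflammatory arthritis. It usually affects the big toe.");
```

The voices come from whatever the operating system or browser has installed, so quality and available accents will vary from reader to reader.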


Worth mentioning: there is an extension being developed that should solve this problem. Read more at mw:Wikispeech. Ainali (talk) 23:32, 23 March 2016 (UTC)

Thank you for posting that. I see several problems and bad assumptions in the "pilot study" (I know Germany and other European countries lumped dyslexia in with full mental retardation, so maybe that explains the author's viewpoint/experience). I would strongly advise the author to use the Web Speech API instead of making yet another open source TTS engine. It requires no always-on connection, and every operating system has a TTS engine available. Dispenser (talk) 20:47, 24 March 2016 (UTC)