
Community Wishlist/Wishes/Software for turning articles into spoken Wikipedia audios using novel AI voice tech

From Meta, a Wikimedia project coordination wiki
Status: Open


Description

I'm proposing a Web UI tool for creating spoken Wikipedia audios using modern AI voices (text-to-speech that sounds nearly natural). Examples of articles narrated this way so far:

  • Earth
  • Arch Mission Foundation
  • Aristotle
  • Elephant communication
  • Heraclitus
  • Two-level utilitarianism
  • 2022 in science#August (this one still has some issues with abbreviations and brackets)
  • Linux
  • Easter Island (similar to this Fall of Civilizations podcast), with a wider player as a workaround for the current outdated player, in which one can hardly skip around or back
  • Guns, Germs, and Steel (includes a comprehensive summary of the book, similar to Blinkist)

Currently, one has to work through all ~16 steps of c:Help:Spoken Wikipedia using AI by hand: for example, making sure the narration does not read out "[1]" for references or include image captions and tables, adding a category like this, and so on.
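
As a rough illustration of one of these manual steps, the sketch below removes bracketed reference markers from text copied out of a rendered article, so that a text-to-speech engine does not read them aloud. It is a minimal Python sketch, not part of any existing tool; the function name `strip_citation_markers` and the exact set of markers it matches are assumptions.

```python
import re

# Hypothetical helper. Markers that commonly appear in text copied from a
# rendered article: numeric footnotes like [1], lettered notes like [a],
# "[note 3]", and some maintenance tags. The list is assumed, not exhaustive.
_MARKER = re.compile(
    r"\[(?:\d+|[a-z]|note \d+|citation needed|clarification needed)\]"
)


def strip_citation_markers(text: str) -> str:
    """Remove bracketed reference markers so TTS does not read them aloud."""
    cleaned = _MARKER.sub("", text)
    # Collapse runs of spaces or tabs left behind by removed markers.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    # Drop a space that now sits directly before punctuation.
    cleaned = re.sub(r" +([.,;:!?])", r"\1", cleaned)
    return cleaned.strip()


print(strip_citation_markers("Earth is the third planet.[1][2] Water [3] covers most of it."))
# → Earth is the third planet. Water covers most of it.
```

A regex over plain text is the simplest option but can misfire on legitimate bracketed text such as "[a]"; working on the article HTML avoids that.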

Lots of people listen to podcasts or audiobooks, millions on a daily basis.[1][2] Wikipedia articles are often really interesting, but relatively few people read a lot, whether in general or on screens. Turning articles into spoken Wikipedia audios would be very impactful and of interest to many readers and listeners; it could potentially even double Wikipedia's audience.

Please don't confuse this novel AI voice technology with unnatural-sounding screen readers, which can't be used for this anyway because the article content needs to be transformed in various ways, as described in the linked guide. Nor can one simply click play and listen to an audio-adjusted version of the article on the go. I personally switched from listening to podcasts to listening to Wikipedia articles because this method made recent versions of articles I'm interested in available as good-quality, listenable audio, and I think many others would also like to listen to such spoken Wikipedia audios on subjects they're interested in. For example, few people read broad, long articles like Earth in full on screens, but many might listen to them podcast-style, including Wikipedia editors, who could use this to notice issues with article content (improving Wikipedia's quality). Of course, it would also be useful for blind people and people with visual impairments or reading difficulties.

There is a template people can use to report any not-yet-fixed issues with the audios. I created a WikiProject at Wikipedia:WikiProject Wikipedia spoken by AI voice. The audios created so far are available here, from where you can download them into your podcast player. Nearly none of these are shown in their Wikipedia articles because, as of January 2025, nobody is participating in this project, so people can't find and use these audios. Despite that, they have somehow still been played over 100,000 times, excluding the top 4 audios.

Because of the way most spoken Wikipedia audios are currently created, the vast majority of articles, even in the English Wikipedia, have no spoken Wikipedia audio, and where one exists, it is often years or even a decade out of date (e.g. the spoken article for Evolution in the English Wikipedia is from 2005). The quality of text-to-speech has improved so much recently that a separate term, 'AI-generated voice', instead of 'text-to-speech' seems warranted: it does sound natural, and you can listen to examples on the right. Note the goal "Build the necessary technology to make free knowledge content accessible in various formats." in the strategy here.

I originally meant to propose only a Web UI that would speed up manual creation, so that one no longer has to do things like using the Stylus Firefox add-on to alter the Wikipedia article's CSS, turning the article into a narration view (similar to the print view), and then manually copying the contents.
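
A narration view like the one produced with Stylus could also be generated in code. The following minimal Python sketch uses the standard library's `html.parser` to collect an article's text while skipping tables, figures, "[edit]" links and footnote markers. The class names in `SKIP_CLASSES` (such as `reference` and `mw-editsection`) are assumptions based on Wikipedia's rendered HTML and would need checking, the skip lists are illustrative, and the parser assumes well-formed markup in which every non-void element is closed.

```python
from html.parser import HTMLParser

# Assumed, illustrative choices of what not to narrate.
SKIP_TAGS = {"table", "figure", "style", "script"}
SKIP_CLASSES = {"reference", "mw-editsection", "thumb", "navbox", "hatnote"}
BLOCK_TAGS = {"p", "li", "h2", "h3", "h4", "dd", "dt", "div"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr", "source"}


class NarrationExtractor(HTMLParser):
    """Collects article text while skipping elements unsuitable for narration."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements have no end tag to balance
            if tag == "br" and not self.skip_depth:
                self.parts.append(" ")
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self.skip_depth += 1  # count nesting so we know when skipping ends
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")  # keep words of adjacent blocks apart

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def narration_text(html: str) -> str:
    parser = NarrationExtractor()
    parser.feed(html)
    parser.close()
    # Normalize all whitespace runs to single spaces.
    return " ".join("".join(parser.parts).split())


sample = ('<h2>History<span class="mw-editsection">[edit]</span></h2>'
          '<p>Earth formed early.<sup class="reference">[1]</sup></p>'
          '<table><tr><td>x</td></tr></table><p>It has <b>one</b> moon.</p>')
print(narration_text(sample))  # → History Earth formed early. It has one moon.
```

Filtering by element and class mirrors what the CSS approach does visually, but produces the text directly instead of requiring a manual copy.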

Now I'd also like to propose that, at some point, the spoken audio files be created automatically, and that after a transition phase in which issues are resolved, the tool be used largely to improve these, e.g. to update an audio if its article has had recent major changes or to fix misnarrations in the audio. The tool would be created first so that people can improve the audio creation process over time until the audios created with it are generally high-quality and free of issues (such as narrating a table's headers but not the table itself). Once this autoconversion process is in good shape (i.e. only very rare minor issues remain), that part of the tool could be used to create the audios automatically at scale. Several other websites, such as The New York Times, also make their articles available in audio format using AI, but there it is arguably less useful than for long Wikipedia articles. This is a large opportunity that shouldn't be wasted.
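
Deciding when an audio is outdated enough to re-create could start from a simple heuristic. The sketch below is hypothetical: it assumes the tool knows the timestamp of the narrated revision and has the article's revisions with their timestamps and page sizes (the MediaWiki API can return `timestamp` and `size` per revision), and it flags an update if the summed size changes since the narrated version exceed a threshold, or if the article has changed at all and the audio is more than a year old. Both thresholds are arbitrary placeholders, and byte counts are only a crude proxy for how much the content changed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Revision:
    timestamp: datetime
    size: int  # page length in bytes after this revision


def needs_update(narrated_at: datetime, revisions: list[Revision], now: datetime,
                 min_changed_bytes: int = 2000, max_age_days: int = 365) -> bool:
    """Hypothetical heuristic for whether a spoken audio should be re-created."""
    revs = sorted(revisions, key=lambda r: r.timestamp)
    before = [r for r in revs if r.timestamp <= narrated_at]
    after = [r for r in revs if r.timestamp > narrated_at]
    if not after:
        return False  # no edits since the narrated version
    # Sum absolute size changes as a rough proxy for how much was edited.
    prev_size = before[-1].size if before else after[0].size
    churn = 0
    for r in after:
        churn += abs(r.size - prev_size)
        prev_size = r.size
    audio_is_old = (now - narrated_at).days > max_age_days
    return churn >= min_changed_bytes or audio_is_old


def utc(y: int, m: int, d: int) -> datetime:
    return datetime(y, m, d, tzinfo=timezone.utc)


history = [Revision(utc(2024, 1, 1), 50_000),
           Revision(utc(2024, 3, 1), 50_100),
           Revision(utc(2024, 4, 1), 53_000)]
print(needs_update(utc(2024, 2, 1), history, now=utc(2024, 5, 1)))  # → True
```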

Ideas for things the tool could do automatically
  1. adding timestamps for every section so one can quickly jump to any particular section of the Wikipedia article; see the related wish Video & audio chapters (jump to timestamp).
  2. adding the audio to the article's Wikidata item via the property spoken text audio (P989), along with the date of the narrated version and its language
  3. adding categories like c:Category:Spoken Wikipedia articles using English-language speech synthesis
  4. including a permalink to the narrated version of the Wikipedia article as well as a wikilink to the article (pointing to its latest version)
  5. adding information about which elements were included (like a particular table or math equations)
  6. adding TimedText (which could also be used for the section timestamps) and maybe even a tool that highlights the part of the Wikipedia article currently being read, for a kind of audiovisual reading mode in the Wikipedia app
  7. having quotes read by a second voice and playing a recognizable sound whenever a section title (or subsection header) is read, so the structure is clearer
  8. making indented (nested) lists understandable (e.g. by adding numbers like 1. and 1.1, 1.2)
  9. detecting and flagging (or removing?) content in the text body that should (likely) not be there and should not be narrated, such as "(; )" or, for now, things like Khmer: ជនជាតិខ្មែរ
  10. adding standardized credit lines and license tags (see the examples) and giving the file a standardized, parseable title like Wikipedia - Article name (spoken by AI voice)
  11. spelling out some common abbreviations, e.g. "i.e." as "that is", "e.g." as "for example", "i.a." as "inter alia" or "among other things", "M" often as "million", and so on
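
Items 8 and 11 are essentially text transformations applied before synthesis. A minimal Python sketch of both follows, under stated assumptions: the abbreviation table is a small hypothetical subset, "M" is expanded to "million" only directly after a number (a bare "M" is ambiguous), and nested lists are assumed to arrive as `(text, children)` tuples from some earlier parsing step. The function names are illustrative only.

```python
import re

# Hypothetical subset of the expansions listed in item 11.
ABBREVIATIONS = {
    "i.e.": "that is",
    "e.g.": "for example",
    "i.a.": "among other things",
}
# The leading \b stops an abbreviation from matching inside a longer word.
_ABBR = re.compile(r"\b(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + ")")
# "M" is expanded only directly after a number, as in "5.2M" or "40 M".
_MILLION = re.compile(r"(\d(?:[\d.,]*\d)?) ?M\b")


def expand_abbreviations(text: str) -> str:
    text = _ABBR.sub(lambda m: ABBREVIATIONS[m.group(0)], text)
    return _MILLION.sub(r"\1 million", text)


def number_nested(items, prefix=""):
    """Turn nested (text, children) list items into lines like '1.2 text'."""
    lines = []
    for i, (text, children) in enumerate(items, start=1):
        label = f"{prefix}{i}"
        lines.append(f"{label} {text}")
        lines.extend(number_nested(children, prefix=label + "."))
    return lines


print(expand_abbreviations("Whales, e.g. the blue whale, i.e. the largest, number 5.2M."))
# → Whales, for example the blue whale, that is the largest, number 5.2 million.
print(number_nested([("Planets", [("Earth", []), ("Mars", [("Phobos", [])])]),
                     ("Stars", [])]))
# → ['1 Planets', '1.1 Earth', '1.2 Mars', '1.2.1 Phobos', '2 Stars']
```

Hierarchical labels keep the list structure audible without a screen; how a given TTS voice pronounces "1.2.1" would still need testing.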

Proposed audio player for spoken Wikipedia (desktop version). This one is for spoken Wikipedia audios that can be listened to like podcasts; while reading a section of the article, one could also use a button to make the audio jump to the corresponding part.
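
Section timestamps (items 1 and 6) and a button that jumps to the current section both need a mapping from sections to audio offsets. Assuming the synthesis step yields one clip per section with a known duration, the offsets are just running sums; the hypothetical sketch below writes them out as WebVTT cues. Whether Commons TimedText or a future player would use this exact format is an assumption.

```python
def format_ts(seconds: float) -> str:
    """Format a time offset in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chapter_cues(sections):
    """Build WebVTT cues from (section title, duration in seconds) pairs.

    Sections must be in narration order; each cue starts where the
    previous one ended.
    """
    lines = ["WEBVTT", ""]
    start = 0.0
    for title, duration in sections:
        end = start + duration
        lines += [f"{format_ts(start)} --> {format_ts(end)}", title, ""]
        start = end
    return "\n".join(lines)


print(chapter_cues([("Introduction", 83.5), ("Etymology", 40.25)]))
```

For this input, the "Etymology" cue runs from 00:01:23.500 to 00:02:03.750, which is the offset a "jump to section" button would seek to.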

The main point is that the tool would create these audios at scale, handling common issues such as tables in articles automatically, so as to ultimately make all of Wikipedia listenable like podcasts. Its late-stage development would go hand in hand with creating or adjusting templates, e.g. to add CSS classes that allow certain content to be excluded from narration.

Also needed is a proper audio player that, for example, has a button to skip back by 5 or 10 seconds. That is a separate, related proposal: A proper audio player (one possible view on Wikipedia desktop is shown on the right, but most people will probably listen to these on mobile via either the Wikipedia or the Commons app).

Related wish (3): A tool for auto-transcription to speed up the creation of TimedText subtitles for videos on Commons

Assigned focus area

Create new consumer experiences for learning from / engaging with Wikipedia content

Type of wish

Feature request

Wikimedia Commons, Wikipedia

Affected users

Wikipedia content consumers, Wikipedia contributors

Other details

  • Created: 13:45, 16 October 2024 (UTC)
  • Last updated: 03:22, 26 February 2025 (UTC)
  • Author: Prototyperspective (talk)