Community Wishlist/Wishes/Software for turning articles into spoken Wikipedia audios using novel AI voice tech
Description
I'm proposing a web UI tool for creating spoken Wikipedia audios using modern AI voices (text-to-speech that sounds nearly natural).
Currently, one has to go through all ~16 steps of c:Help:Spoken Wikipedia using AI, for example so that the audio does not narrate "[1]" for references or read out image captions and tables, and to take care of things like adding a category to the audio file.
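As an illustration of one of these manual steps, here is a minimal sketch of removing reference markers such as "[1]" from article text before narration. The pattern and the function name `strip_citation_markers` are assumptions made for this example and are not taken from the linked guide; real articles would likely need more cases (e.g. "[citation needed]").

```python
import re

# Assumed pattern: numeric footnote markers like "[1]" or "[23]",
# plus single-letter note markers like "[a]".
CITATION_MARKER = re.compile(r"\[(?:\d+|[a-z])\]")


def strip_citation_markers(text: str) -> str:
    """Remove footnote markers so the voice does not read them aloud."""
    return CITATION_MARKER.sub("", text)


print(strip_citation_markers("Millions listen to podcasts daily.[1][2] Few read long articles."))
```

A tool would apply this kind of cleanup automatically instead of leaving it to each person creating an audio.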
Lots of people listen to podcasts or audiobooks, millions of them on a daily basis.[1][2] Wikipedia articles are often really interesting, but not that many people read a lot, whether in general or on screens. Turning articles into spoken Wikipedia audios would be very impactful and of interest to many readers and listeners. It could grow Wikipedia's audience substantially, potentially even doubling it.
Please don't confuse the novel AI voice tech with unnatural-sounding screenreaders, which can't be used for this anyway because the article content needs to be transformed in various ways, as described in the linked guide. With a screenreader, one also can't just click on play and listen to an audio-adjusted version of the article on the go.
I have personally switched from listening to podcasts to listening to Wikipedia articles because this method made recent versions of articles I'm interested in available as good-quality, listenable audio, and I think many others would also like to listen to such spoken WP audios on subjects they're interested in. For example, few people read long, broad articles like Earth in full on electronic screens, but many may do so if they can listen to them podcast-style. This includes Wikipedia editors, who could use the audios to notice issues with article contents (improving WP quality). Of course it would also be useful for blind people and people with vision or reading problems.
There is a template people can use to report any yet-unfixed issues with the audios. I created a WikiProject at Wikipedia:WikiProject Wikipedia spoken by AI voice. The audios created so far are available here, from where you can download them into your podcast player. Nearly none of these are shown in their Wikipedia article because, as of January 2025, nobody is participating in this project, so people can't find and use these audios. Despite that, they have still been played over 100,000 times, even when excluding the top 4 audios.
The method by which most spoken articles are currently created means that the vast majority of articles, even in English Wikipedia, do not have a spoken WP audio, and where one exists, it is often outdated by years or even a decade (e.g. the spoken article for Evolution in ENWP is from 2005). The quality of text-to-speech has improved so much recently that using a separate term, 'AI-generated voice', instead of TTS seems warranted. It does sound natural, and you can listen to examples on the right. Note the line "Build the necessary technology to make free knowledge content accessible in various formats." in the strategy.
I originally meant to propose only a web UI that would speed up manual creation, so that one no longer has to do things like using the Stylus Firefox add-on to alter the Wikipedia article CSS, turning the article into a narration view (similar to print view), and then manually copying the contents. This would allow these audios to be created more quickly.
Now I'd also like to propose that, at some point, the spoken audio files are created automatically. After a transition phase during which issues are solved, the tool would then largely be used to improve these audios, e.g. to update an audio after recent major changes to its article or to fix misnarrations. The tool would be created first so people can improve the audio creation process over time, until audios created with it are generally high-quality and free of issues (such as narrating table headers but not the table itself). Once this autoconversion process is in good shape (i.e. only very rare, minor issues remain), that part of the tool could be used to create the audios automatically at scale. Several other websites, such as The New York Times, also offer their articles in audio format using AI, but there it's arguably less useful than for long Wikipedia articles. It's a large opportunity that shouldn't be wasted.
Ideas for things the tool could do automatically:
- adding timestamps for every section so one can quickly jump to any particular section of the Wikipedia article (see the related wish "Video & audio chapters (jump to timestamp)")
- adding the audio to the article's Wikidata item under spoken text audio (P989), along with the date of the narrated version and its language
- adding categories like c:Category:Spoken Wikipedia articles using English-language speech synthesis
- including a permalink to the Wikipedia article that is narrated as well as a wikilink to the article (at its very latest version)
- adding information about which things were included (like a particular table or math equations)
- adding TimedText (this could be used for the section-timestamps) and maybe even a tool that highlights the currently-read part of the Wikipedia article for some kind of audiovisual reading mode in the Wikipedia app
- having quotes read by a second voice and playing a recognizable sound whenever a section title (or subsection header) is read, so things are clearer
- making indented lists (nested lists) understandable (e.g. by adding some numbers like 1. and 1.1, 1.2)
- detecting and flagging (or removing?) contents in the text body that should (likely) not be there and should not be narrated, such as leftover "(; )" or, for now, things like "Khmer: ជនជាតិខ្មែរ"
- adding standardized credit lines and license tags (see the examples) and giving the file a standardized, parseable title like "Wikipedia - Article name (spoken by AI voice)"
- spelling out some common abbreviations: "i.e." as "that is", "e.g." as "for example", "i.a." as "inter alia" or "among other things", "M" often as "million", and so on
- …
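
A few of the ideas above are simple text transformations and can be sketched briefly. The following is only an illustration under stated assumptions: the function names, the abbreviation table, and the `(depth, text)` input format for lists are all hypothetical and not part of any existing tool. It covers spelling out abbreviations, flagging leftover fragments like "(; )", and numbering nested lists.

```python
import re

# Hypothetical abbreviation table, based on the examples in the list above.
# "M" -> "million" is left out because it depends on context.
ABBREVIATIONS = {
    "i.e.": "that is",
    "e.g.": "for example",
    "i.a.": "among other things",
}

# Parentheses that contain only whitespace or separators, e.g. "(; )" or "()".
EMPTY_PARENS = re.compile(r"\([\s;,:]*\)")


def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations with their spoken form."""
    for abbr, spoken in ABBREVIATIONS.items():
        # The lookbehind avoids matching in the middle of a longer word.
        text = re.sub(r"(?<!\w)" + re.escape(abbr), spoken, text)
    return text


def find_leftovers(text: str) -> list[str]:
    """Return fragments that should probably be flagged for review."""
    return [m.group(0) for m in EMPTY_PARENS.finditer(text)]


def number_nested_list(items: list[tuple[int, str]]) -> list[str]:
    """Turn (depth, text) pairs into lines labelled 1., 1.1., 1.2., 2., ...

    depth 0 is the top level; each deeper level adds one number.
    """
    counters: list[int] = []
    lines = []
    for depth, text in items:
        del counters[depth + 1:]          # leaving a sublist resets deeper counters
        while len(counters) <= depth:     # entering a sublist adds a new counter
            counters.append(0)
        counters[depth] += 1
        label = ".".join(str(n) for n in counters)
        lines.append(f"{label}. {text}")
    return lines


print(expand_abbreviations("Use a natural voice, e.g. a neural one."))
print(find_leftovers("The Khmer people (; ) are an ethnic group."))
print(number_nested_list([(0, "Fruit"), (1, "Apple"), (1, "Pear"), (0, "Vegetables")]))
```

Flagging rather than silently deleting the leftovers keeps a human in the loop during the transition phase, which matches the "(or removing?)" question in the list.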

The main point is that the tool would create these audios at scale and deal automatically with common issues like tables in articles, ultimately making all of Wikipedia listenable like podcasts. Its late-stage development would occur alongside template creation and adjustment, e.g. adding CSS classes that allow certain content to be removed from the narration.
Also needed is a proper audio player that, for example, has a button to skip back by 5 or 10 seconds. That is a separate, related proposal: A proper audio player (one possible view on WP desktop is shown on the right, but most people will probably listen to these on mobile via either the Wikipedia or the Commons app).
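To illustrate how CSS classes could drive content removal, here is a minimal sketch using Python's standard `html.parser`. The class name `no-narration` is purely hypothetical (no such class is defined by this proposal or by existing templates), and the sketch assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Hypothetical class that templates could add to content not meant for narration.
SKIP_CLASSES = {"no-narration"}

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NarrationTextExtractor(HTMLParser):
    """Collect text, skipping every element marked with a skip class."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or classes & SKIP_CLASSES:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def narration_text(html: str) -> str:
    """Return the text to narrate, with whitespace normalized."""
    parser = NarrationTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample = ('<p>Earth is a planet.<sup class="reference no-narration">[1]</sup>'
          ' It orbits the Sun.</p>'
          '<table class="no-narration"><tr><td>data<br>more</td></tr></table>')
print(narration_text(sample))
```

With such a class in place, templates rather than the conversion tool would decide what is excluded, so fixes made on-wiki would carry over to every newly generated audio.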
Related wish (3): A tool for auto-transcription to speed up the creation of TimedTexts subtitles for videos on Commons
Assigned focus area
Create new consumer experiences for learning from / engaging with Wikipedia content
Type of wish
Feature request
Related projects
Wikimedia Commons, Wikipedia
Affected users
Wikipedia content consumers, Wikipedia contributors
Other details
- Created: 13:45, 16 October 2024 (UTC)
- Last updated: 03:22, 26 February 2025 (UTC)
- Author: Prototyperspective (talk)