Talk:Wikispeech

From Meta, a Wikimedia project coordination wiki
This page has earlier discussions at mw:Talk:Wikispeech

Redirect

I think the soft-redirect was a better idea, to reduce duplication. Information on this project is already sparse enough. Nemo 19:49, 14 August 2016 (UTC)

Gujarati language text to speech

Blue Rasberry (talk) 17:26, 21 December 2020 (UTC)

Thanks for the notice, Bluerasberry! Eric Luth (WMSE) (talk) 16:23, 15 January 2021 (UTC)

Mozilla Common Voice

Mozilla is developing a TTS called Mozilla Common Voice with crowdsourced speech data. Is this project related? Do they share the speech data? They also have an Android app for users to provide data more easily; could it be reused for Wikispeech? --62.98.122.121 13:26, 21 March 2021 (UTC)

Common Voice is a project whose aim is to gather as many audio recordings as possible, in order to have a massive and representative dataset for each language. Mozilla uses Common Voice's data to train its second NLP project, called DeepSpeech, an STT engine. I'm not aware of any TTS project by Mozilla for the moment. Regards — WikiLucas (đŸ–‹ïž) 17:40, 21 March 2021 (UTC)
Hello @WikiLucas00:, thank you for the feedback and the clarification on what Mozilla has already launched. I still can't tell whether their database is of interest within the frame of Wikispeech. Mozilla may not be focusing on a text-to-speech engine, but that doesn't make their data useless: they gather many sentences pronounced by many people. That sounds like a great dataset for training some AI, including for text to speech, doesn't it? Psychoslave (talk) 14:57, 26 March 2021 (UTC)

Some feedback and questions

Hello everybody,

I am only discovering this great project now, thanks to @Denny who sent a message on the Telegram channel.

The FAQ says that you had the opportunity to do a thorough investigative study before [you] started. Is there a report associated with this study? I think it could help people interested in the topic better understand the background.

It also indicates that you consulted disability organizations in Sweden, which seems very good for the project's success. It makes me wonder: are there already people with disabilities directly involved in the project? If not, is that already planned?

The project seems to want to address two main audiences: people with disabilities, and people with literacy issues. For the latter, I think a complementary approach would be to make the Wikimedia environment host useful tools and pedagogical material for learning (and teaching) how to read. Are you aware of such a complementary project?

For now, if I understand correctly, there is no platform to record and review collected material, is that correct?

The pages of the project mention existing commercial solutions several times, but I didn't see any statement about existing FLOSS solutions, like Orca. I guess you already know about it, but it's better to make sure. Maybe you even made some assessments of it that led you to set it aside as far as this project is concerned. Or maybe you actually plan to collaborate with Orca's team, so you could help each other improve your respective solutions. I didn't find any information that would tell me whether either of these hypotheses approaches reality. Could you enlighten me on that point, please? 🙏

I see in the video presentation that @Lyokoï: — or someone who looks very much like him 😂 — followed the session. So I guess that you already know about Lingua Libre (LL). Actually, while updating the Wikispeech/Wikispeech_2019 page, I saw that @Sebastian Berlin (WMSE): added today that you did know about it, as well as about Mozilla and the Common Voice (CV) project. So awareness of potential partnerships sounds good on these sides.

I understand that the overall project has its own well-defined goals that are very different from those of LL and CV. I think I simply lack a good perspective on the project to understand this point better, as I fail to see how the specific sub-project of a new collecting platform would be indispensable here. What significantly different features would it provide that would make the new platform's development cost worthwhile, as opposed to adapting or reusing things from CV or LL?

Last question: the project doesn't specify any license. It just seems to say "freely licensed speech data" wherever the topic comes up. LL is CC BY-SA 3.0 if I remember correctly, and CV is CC0. What about this project?


So in a nutshell, here are my questions:

  1. Is there a report associated with this study?
  2. Are there already people with disabilities directly involved in the project, or a plan to involve them?
  3. Are you aware of a complementary Wikimedia project to help people learn basic reading skills?
  4. Is there already a platform to record and review collected material?
  5. What is your point of view on Orca or any other FLOSS TTS solution?
  6. What features would be better implemented in a brand-new project rather than integrated into existing platforms such as CV and LL?
  7. Have you already decided which license will cover the collected material?

Thanks for all that you have already accomplished, and thanks in advance for your reply. Psychoslave (talk) 16:49, 26 March 2021 (UTC)

Hi @Psychoslave. First, huge apologies for not noticing this question before. Even though it is a fair bit later, I still hope the answers will be useful.
1. The report of the pilot study for the speech data collector can be found at wmse:Fil:Bilaga – Rapport för Wikispeech taldatainsamlarens förstudie.pdf (in Swedish). The report for the pilot study for the text-to-speech component can be found at wmse:Fil:Wikispeech - Bilaga 1 Huvudrapport.pdf (in Swedish).
2. There is currently a request for funding being considered around this. Should that receive funding then there is a plan for a person with visual disabilities to be directly involved in the project.
3. I am not aware of such a complementary approach.
4. There is no platform available to test yet for speech recording. The text-to-speech platform can be used on Wikipedia today through an on-wiki gadget (on Swedish Wikipedia), and the underlying script could be copied across to English or Arabic Wikipedia. See e.g. mw:Help:Extension:Wikispeech#As_gadget_or_user_script.
5. Orca was not explicitly on our radar, because our initial approach was that we explicitly wanted a solution which did not require the user to install any software on their end — in part because requiring installed software limits users who primarily access our sites through mobile devices (with limited software support) or through computers where they cannot install software (e.g. library computers). In the backend, Wikispeech makes use of pre-existing FLOSS TTS solutions and is built so that these can be swapped out as new solutions arise. Currently the default TTS is MaryTTS, but one aspect of the project currently seeking funding is to look at switching this out for more modern solutions.
6. We have been in contact with both LL and CV during the last project (but not directly since then), looking at both what sort of resources they are trying to gather and how they have approached it. Neither of those projects is static, of course, but I'll try to outline the main differences as they were when we last looked at this.
  • CV: In the case of CV, the types of recordings they are interested in are primarily for voice recognition. As such they seek short recordings, which can happily include background disturbances and poorer audio quality. This is great for training voice recognition, but not as much for producing a text-to-speech voice. The conclusion, however, was that any data produced by the Speech Data Collector should also be exportable for use by the CV team.
  • LL: The LL platform was primarily aimed at short recordings of individual words and names. For text-to-speech training this is valuable in that it serves as a basis for determining the pronunciation of individual words. For training a voice, however, it is less useful, since the transitions between words are important. Even though LL allows you to record sentences, the plan is for the Speech Data Collector to also add annotations to the speech (to assist the training) and to use manuscripts specifically designed to be speech-data dense (so that fewer recorded sentences are needed to train a new voice).
7. The license for the collected speech data (and any lexicographical data) will be CC0.
/ André Costa (WMSE) (talk) 08:32, 7 November 2023 (UTC)

Status?

What is the status of the project? I don't see any dates in the timeline. It seems rather dead to me. So9q (talk) 16:08, 9 October 2023 (UTC)

Hi. Thanks for getting in touch. Wikimedia Sverige has been working on the code in a more limited capacity over the last few years. The main focus has been addressing issues raised by users on Swedish Wikipedia (where it is available as a gadget), maintaining compatibility with MediaWiki (as deployed on Wikipedia), and building out the functionality for improving pronunciations (editing the lexicon). We have sent in a funding application which, if successful, should allow us to focus on Wikispeech again, including a much-needed focus on making it easier to add support for new languages and to add new voices. /André Costa (WMSE) (talk) 13:25, 12 October 2023 (UTC)
Thanks for the update. So9q (talk) 09:06, 13 October 2023 (UTC)