Talk:Community Wishlist/W111
Very similar proposal
Please see Community Wishlist/Wishes/A tool for auto-transcription to speed up the creation of TimedTexts subtitles for videos on Commons, which is very closely related. Maybe you could link to it. By now I have found a tool with which it takes only seconds (or minutes, if you want a highly accurate transcript of a 60-minute video) to add transcripts to videos, and I'll update the other item. Prototyperspective (talk) 16:10, 29 July 2024 (UTC)
- Well, it is highly related, but it's not the same. Only a few languages have automatic subtitling. Most languages, and especially those that would benefit most from an easier subtitling and translation system, have nothing that can do speech-to-text. I'm pretty sure the tool you have found works great in a handful of languages... not many more. Theklan (talk) 22:24, 29 July 2024 (UTC)
- It works with far too many languages for anyone to redub a given video by hand into each of them and upload all the resulting videos separately, instead of having something like multiple audio tracks. It works well with more than a handful of languages, quite possibly over 100. What I proposed is also a tool for easy subtitling: for example, the transcript could be translated by two different tools, and the tool would then create a diff in which the user decides which translation is better and, if necessary (often it isn't), corrects any issues. Moreover, the tool would then mark the subtitles/transcript as checked or as needing checking (but already displayable, like these). Prototyperspective (talk) 13:32, 4 August 2024 (UTC)
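The comparison step described above (translate the transcript with two tools, show a diff, let the user pick the better version) could look roughly like the sketch below. It is a minimal illustration, assuming the two translations are already available as lists of subtitle texts aligned one-to-one with the transcript's cues; `compare_translations` and `changed_words` are hypothetical names, and no real translation tool (SoniTranslate or otherwise) is called. It uses only Python's standard `difflib`.

```python
import difflib
from typing import List, Tuple


def compare_translations(a: List[str], b: List[str]) -> List[Tuple[int, str, str, float]]:
    """Compare two machine translations of the same transcript, cue by cue.

    Returns (cue index, text from tool A, text from tool B, similarity) for
    every cue where the tools disagree, most divergent first, so the user
    reviews the likeliest problems before near-identical ones.
    """
    if len(a) != len(b):
        raise ValueError("both translations must have the same number of cues")
    disagreements = []
    for i, (text_a, text_b) in enumerate(zip(a, b)):
        if text_a == text_b:
            continue  # both tools agree; nothing for the user to decide
        similarity = difflib.SequenceMatcher(None, text_a, text_b).ratio()
        disagreements.append((i, text_a, text_b, similarity))
    disagreements.sort(key=lambda d: d[3])  # lowest similarity first
    return disagreements


def changed_words(text_a: str, text_b: str) -> Tuple[List[str], List[str]]:
    """Return (words only in A, words only in B) for one cue, for highlighting."""
    words_a, words_b = text_a.split(), text_b.split()
    only_a: List[str] = []
    only_b: List[str] = []
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            only_a.extend(words_a[i1:i2])
            only_b.extend(words_b[j1:j2])
    return only_a, only_b


if __name__ == "__main__":
    # Hypothetical output of two translation tools for the same two cues.
    tool_a = ["Hello world.", "This is a video about cats."]
    tool_b = ["Hello world.", "This video is about cats."]
    for index, text_a, text_b, similarity in compare_translations(tool_a, tool_b):
        print(index, changed_words(text_a, text_b), round(similarity, 2))
```

Cues on which both tools agree are skipped entirely, which matches the observation that correction is often unnecessary: the user only sees the cues where the tools diverge.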
Caution
I'm pretty set against this if we have to develop the application that does the subtitle matching etc. ourselves. It would just create another complex software application that we would have to maintain. Unfortunately, I have found that subtitling doesn't seem very popular in the open-source community. The software I was able to find seems Windows-focused and isn't really suited for a web application. However, if we can find a quality piece of third-party software that does this for us, then I think it would be a valuable addition. —TheDJ (talk • contribs) 14:59, 17 October 2024 (UTC)
- Can't we use or repurpose the system we use for translations on Meta as a solution? Theklan (talk) 06:18, 18 October 2024 (UTC)
- That doesn't really help you with subtitle matching, though. Additionally, there is a very high risk of the author making mistakes in the syntax, which would definitely require some additional checks; otherwise there is a high risk of such mistakes going unnoticed for a very long time. —TheDJ (talk • contribs) 07:49, 18 October 2024 (UTC)
- You are right, it would only partially solve one of the parts: translating subtitles. As for the syntax, a subtitle has only two parts: the time code and the text itself. If we can get the time code out of the way, there is very little chance of syntax errors. Theklan (talk) 12:36, 18 October 2024 (UTC)
- Agreed. However, such software already exists, and it works best (only?) on Linux. I have submitted several issues to the open-source project, but auto-transcription actually works quite well, except that one has to correct names and numbers (e.g. 1983 instead of "nineteen eighty-three"), and one could add a category indicating that subtitle checking is needed. The missing piece is the integration of the tool into Commons, so that instead of being run locally in difficult ways, it is available to all users for quick, convenient use from Commons itself. That integration is the software that is missing, and it could also be a gadget. If such a tool is used, it could also do things like automatically adding a hidden category such as "Videos with subtitles that need checking". Prototyperspective (talk) 08:55, 18 October 2024 (UTC)
- @Prototyperspective "such a software already exists": link, please? I've been searching for years for something that does this which we could simply spin up as a Toolforge web tool. —TheDJ (talk • contribs) 10:41, 16 December 2025 (UTC)
- Glad you're asking. I'm referring to the SoniTranslate tool; I have mentioned it several times in wishes and on Commons, so I thought you had already heard about it. Transcription and translation of subtitles work very well, and it's a bit depressing to see that still basically nobody else is using it, despite the big potential to subtitle videos on Commons at scale (at least those where the c:Category:Videos by language subcategory does not match the language of the Wikipedia edition where the video is used, and which are also not in the respective c:Category:Videos with subtitles subcategory). Dubbing videos also works quite well, and I've created a separate wish about the main remaining difficulties with that. Prototyperspective (talk) 13:50, 16 December 2025 (UTC)
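The point raised in this thread, that a subtitle has only two parts (the time code and the text), suggests one way to keep translators away from the timing syntax: split a SubRip (.srt) file into cues, hand out only the text for translation, and rebuild the file with the original time codes untouched. The sketch below is a minimal illustration under that assumption. It expects well-formed SRT input, and `parse_srt`, `texts_for_translation` and `rebuild_srt` are hypothetical names, not part of any existing Commons or Meta tool.

```python
import re
from dataclasses import dataclass
from typing import List

# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,000"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


@dataclass
class Cue:
    timing: str  # kept untouched; never shown to the translator
    text: str    # the only part a translator edits


def parse_srt(srt: str) -> List[Cue]:
    """Split SRT content into cues; raises ValueError on a malformed cue."""
    cues = []
    srt = srt.replace("\r\n", "\n")
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        # Drop the numeric index line; it is regenerated on rebuild.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines or not TIMING.match(lines[0]):
            raise ValueError(f"malformed cue: {block!r}")
        cues.append(Cue(timing=lines[0].strip(), text="\n".join(lines[1:])))
    return cues


def texts_for_translation(cues: List[Cue]) -> List[str]:
    """The plain texts a translator sees, one entry per cue."""
    return [cue.text for cue in cues]


def rebuild_srt(cues: List[Cue], translated: List[str]) -> str:
    """Reassemble SRT from the original timings and the translated texts."""
    if len(translated) != len(cues):
        raise ValueError("exactly one translated text per cue is required")
    blocks = [
        f"{n}\n{cue.timing}\n{text.strip()}"
        for n, (cue, text) in enumerate(zip(cues, translated), start=1)
    ]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    source = (
        "1\n00:00:01,000 --> 00:00:03,500\nHello world.\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\nThis is a video\nabout cats.\n"
    )
    cues = parse_srt(source)
    print(texts_for_translation(cues))
    print(rebuild_srt(cues, ["Hola mundo.", "Este es un video\nsobre gatos."]))
```

Because timings and cue numbers are regenerated by the code rather than typed by people, the only thing a translator can get wrong is the text itself. This does not address subtitle matching (aligning a transcript to the audio), which a tool such as the one discussed above would still have to handle.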