
Talk:Community Wishlist/Wishes/A tool for auto-transcription to speed up the creation of TimedTexts subtitles for videos on Commons

From Meta, a Wikimedia project coordination wiki



@Prototyperspective could you revise this wish by more clearly articulating the problem you wish to solve? A lot of consideration is made for the potential solutions, but it's still a bit unclear what the exact problem is. JWheeler-WMF (talk) 15:01, 26 July 2024 (UTC)

Appreciate the feedback and suggestions to improve it. The wishlist seems to be about improvements in general, not only bug fixes and similar things. As for the problems addressed here, I thought they were made clear in parts like "There users do these transcriptions all by hand which takes a lot of time while…" but maybe I should restructure to put the problems first and make them clearer. I'll make some edits; maybe that improves this proposal. Prototyperspective (talk) 15:10, 26 July 2024 (UTC)
Thank you - leading with the problem helps others empathize with the struggle for Commons contributors and may even signal other opportunities for solutions or prioritization.
The edits you made really helped me better understand the user's pain point and see how solving it would make a big difference. JWheeler-WMF (talk) 15:26, 26 July 2024 (UTC)
Also - how similar is this wish to yours? Community Wishlist/Wishes/Bedre system for å legge til og oversette undertekster (translation forthcoming) JWheeler-WMF (talk) 15:53, 26 July 2024 (UTC)
Good find. 1) Mine is far more detailed and more advanced; 2) that one only points out the problem without proposing any solution. I would suggest asking that user to read my proposal and see whether they have anything to add to it and whether they are okay with these two items being merged.
The closest it comes to proposing anything like machine transcription features is "It seems natural that this should be improved to increase accessibility of the files". Only the user's description of the problem as too time-consuming and highly manual implicitly suggests that a more automated solution is called for, and an outline of such a solution is missing as well. The two wishes differ, but I suggest merging that other item into this one. Prototyperspective (talk) 21:34, 26 July 2024 (UTC)
I can second that this is a problem for me. Repeating what I said on the Commons Village pump, I tried to transcribe this excellent series documenting the last economic European home weaving, but it took forever and is incomplete. I can't imagine an AI tool would correctly register the German regional dialect, but if it just got the timings, that would save me maybe 3/4 of the time. A way to play videos at double or triple speed would also be a great help. Prototyperspective kindly suggested an external tool from HuggingFace, but I think I'm quite unlikely to do this, as I've got enough to do with the native tools without spending time installing more. In the past, I've transcribed videos in order to add them to an encyclopedia article. I've more-or-less given up on transcribing things for now, but a good native tool that speeded the process might make it a thing I did again. There are a lot of good, encyclopedic videos in German.
PeerTube reportedly now has facilities for auto-creating transcripts done using an MIT-licensed LLM, with a UI giving easy access to multiline transcripts.[1] HLHJ (talk) 18:30, 1 December 2024 (UTC)
Thanks for the constructive input. I think it wouldn't be so bad at transcribing despite the dialect. Now I also see why changing playback speed could be useful: when checking AI transcriptions, which are mostly correct, it can make things far more efficient. I also linked the tool in this proposal.
However, it's not on HuggingFace; it's meant to be installed locally and can't be used well with HF. Thanks to your comment I noticed I linked the HuggingFace one; I'll replace it or add a note next to it that it should only be used for testing and only with short clips. Yes, there are lots of good-quality German videos in c:Category:Videos by Terra X for example/especially, which is where I have used this tool so far.
So I created the subtitles for the next part of this documentary about home weaving using this tool in ~30 seconds + a ~50-minute wait (probably much faster depending on your computer specs or if the video is shorter) + ~1 minute adding translations using the tool, and added them to the TimedTexts. You can see for yourself how accurate it is there and which kinds of rare errors it still makes (and I think it works much better in English): c:File:Bäuerliche Leinenweberei - 3. Aufbäumen, Anknüpfen und Schlichten der Webkette.webm. Prototyperspective (talk) 20:14, 1 December 2024 (UTC)
Wow, thanks! I've gone through part of both the German and English transcripts. The errors made by the autotranscription are not what I expected. It is pretty good, more than good enough to be useful without human editing. It picks up the narrator perfectly (he speaks measured High German, in the standard dialect, and in a form more like written than spoken German; he's a professional reading from a preprepared script in a good circa-1980 studio; he's folied and his voice is very clear). If all the speech was like that it would hardly need editing at all.
When the villagers talk among themselves, it misses some of the utterances entirely, especially those by Frau Klos (the only woman; it doesn't hear her even when she speaks quite sharply, unless she speaks at length). It is remarkably bad at picking up things that are said while not facing the camera. Strangely, it picks up some utterances perfectly but gets the timing 5 seconds or more wrong, no overlap at all between sound and subtitle; this is a serious flaw.
Also serious: when it misses an utterance, you can't easily insert it, because the subtitles are numbered sequentially.
It deals well with technical terminology, and poorly with offtopic remarks, unusual remarks, and casual speech. Sometimes the transcribed text is obviously wrong, though that's usually when it's hard for me to hear the speaker, and the dialect is thicker. The dialect seems to be a problem only when it changes the syntax or drops terminal vowels, changing the phonemes; vowel and consonant shifts it handles well, perhaps because the varieties of English contain a fair assortment of them, too.
As Wikisource has multiple OCR algos, so it might make sense for Commons to have multiple autotranscription algos; I imagine this would not be much more difficult to implement, and they would inevitably have different strengths.
The translation algorithm choked on the technical terms, making silly context mistakes, most conspicuously translating technical terms as homophone common words, like calling the beams trees and the loom a chair ("Baum" and "Stuhl" mean both). It also mistranslated the mistranscriptions, unavoidable as some of them were pretty nonsensical. It is not good at colloquialisms, and it reorders the German text into conventional English word order even in phrases where the German word order would be more natural and colloquial, which is a very human sort of hypercorrection error, and makes me wonder if it was trained on translations made by humans who natively speak German and speak German-academic English very well (the English spoken by German academics is its own sociolect, to the extent that it's easy to pick out academics who have actually lived in English-speaking countries for a few years).
It would also be really useful to have the transcripts side-by-side, in a double-column view, when translating. Having the video beside the transcript would be mildly useful (I can switch tabs). Being able to loop-play the few seconds of a file that the subtitle is displayed for, by, say, clicking on a "play" button next to that timed subtitle, would make proofreading way easier. It is often necessary to listen to someone mutter a phrase repeatedly before you hear what they are saying. HLHJ (talk) 22:28, 1 December 2024 (UTC)
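The insertion and timing problems described in the comment above can be sketched in a few lines. This is only a minimal illustration, assuming the subtitles are plain SubRip (SRT) text with sequentially numbered cues; the names `Cue`, `parse_srt`, `insert_cue`, `shift_cue` and `to_srt` are hypothetical and do not belong to any existing Commons or MediaWiki tool. The idea is to ignore the sequence numbers when reading, keep the cues sorted by start time, and regenerate the numbers when writing.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue; times are in milliseconds (hypothetical helper type)."""
    start_ms: int
    end_ms: int
    text: str


TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def parse_time(stamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def format_time(total_ms: int) -> str:
    """Convert milliseconds back to 'HH:MM:SS,mmm'."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(srt: str) -> list[Cue]:
    """Read cues from SRT text, discarding the existing sequence numbers."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        if not block.strip():
            continue
        lines = block.splitlines()
        # lines[0] is the old sequence number; it is regenerated on output.
        start, end = lines[1].split("-->")
        cues.append(Cue(parse_time(start), parse_time(end), "\n".join(lines[2:])))
    return cues


def insert_cue(cues: list[Cue], new: Cue) -> list[Cue]:
    """Add a missed utterance and keep the cues ordered by start time."""
    return sorted(cues + [new], key=lambda c: c.start_ms)


def shift_cue(cue: Cue, offset_ms: int) -> Cue:
    """Move a mistimed cue earlier (negative offset) or later, clamped at zero."""
    return Cue(max(0, cue.start_ms + offset_ms), max(0, cue.end_ms + offset_ms), cue.text)


def to_srt(cues: list[Cue]) -> str:
    """Write cues back out, renumbering them from 1."""
    blocks = [
        f"{i}\n{format_time(c.start_ms)} --> {format_time(c.end_ms)}\n{c.text}"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

With helpers like these, adding a missed line would be `to_srt(insert_cue(parse_srt(text), Cue(3000, 4000, "Ja")))`, and the later cues are renumbered automatically. A native editor could do the same behind the scenes so that users never touch the numbers.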
Related wish

This is a related wish: Community Wishlist/Wishes/Add a subtitling tool for easy subtitling and translating. While I think that having automatic subtitles would be great, even for those languages where it is possible, I think that we first need to improve the platform, and then add a feature like this. Theklan (talk) 22:30, 29 July 2024 (UTC)

This could totally include my side-by-side translation wish, from above. HLHJ (talk) 22:33, 1 December 2024 (UTC)

The tool is for dubbing


I have checked the tool you propose, and I can't get it working, but it seems to be for dubbing, not for subtitling, isn't it? Theklan (talk) 22:35, 29 July 2024 (UTC)

Good to see it wasn't only me who was confused at first and that the help page I started drafting would likely be helpful. It works very well for transcribing, actually even better than for dubbing, as dubbing needs more skill than first thought to prevent speech overlaps and ensure good speech quality. See Commons:Help:AI video dubbing for some help on how to use it (under construction); in short, just enter the URL of the video audio and then select subtitles as the output. Prototyperspective (talk) 12:59, 30 July 2024 (UTC)

Can this also replace files with translations?


Elli: Can this also solve the problem mentioned in c:Commons:Deletion requests/Files found with File:Wikipedia on GLAM-Tour Kulturkooperationen für lokale Wikipedia-Gruppen.webm: Commons files without images or other media that instead contain subtitles for videos, each file for another language? These "odd" files are linked to in the file with the video. But it is not Commons policy to (mis)use files for this purpose. So a solution would be very welcome. JopkeB (talk) 14:53, 1 August 2024 (UTC)

Strange files. You forgot to put an extra Commons: in the link. The files probably can and should be deleted as they should and are e.g. here. I don't know how it relates to this proposal, however. Auto-transcription would write to those TimedText files; I did that manually for a few files, but a UI tool is needed. Prototyperspective (talk) 22:57, 1 August 2024 (UTC)
That was my question to Elli: whether this is a solution. As I read the proposal, it might provide subtitles to videos, just as is needed in c:File:Wikipedia on GLAM-Tour Kulturkooperationen für lokale Wikipedia-Gruppen.webm. JopkeB (talk) 05:20, 2 August 2024 (UTC)

Pairings where English is excepted


Not a bad suggestion in itself. AI is becoming more and more precise when it comes to translation. However, depending on the language, there is still a risk of major translation errors. They often work well when one of the languages is English. However, the technology often fails with other pairings, such as "Zulu to Japanese". So I think we should wait a few more years for the technology. --RaveDog (talk) 18:15, 4 August 2024 (UTC)

Good point, but I think the solution would be to enable it at first only for large languages for which these tools work well, like English to Spanish. One could later add new languages in a different way that, e.g., requires more checking. It's best to implement this first on a smaller scale and then see what the issues are, so these can be addressed as or before it's implemented/adopted at a larger scale (same reply/solution as for your other comment). Prototyperspective (talk) 20:40, 4 August 2024 (UTC)