Talk:Community Wishlist/Wishes/Automatically create videos based on media in Wikipedia articles
This page is for discussions related to the Community Wishlist/Wishes/Automatically create videos based on media in Wikipedia articles page.
Relevant experiment & insights
Hi @Strainu, thanks for submitting this wish! My team at WMF (Future Audiences) has built an experimental Wikipedia-to-video tool that can remix Wikipedia articles into <1 min videos in order to bring Wikipedia fun facts to younger audiences on platforms like TikTok and Instagram Reels. To make even these relatively simple, short videos, we've had to do quite a bit of pre- and post-curation of the input and output – despite all the fancy new AI tools and capabilities that exist today, it's still nowhere near as easy as the push of a button to reliably go from text + images to a coherent video, so I don't think what you're asking for here is possible (yet). But I'd be happy to chat more with you about what we've built and learned, if you're interested! I'll reach out via email and see if we can find some time to talk more in the new year. MPinchuk (WMF) (talk) 21:16, 19 December 2024 (UTC)
- Hello @Maryana, thank you for your response. I'd be happy to discuss this idea further to understand what is possible at this point, what tools you used, etc. Questions that popped into mind while reading your response and the page:
- Does removing the length limit change anything? TikTok is cool and such, but longer informational videos could also help. "I learn better from video" is something I hear a lot lately.
- What does an "incoherent" video look like? What makes it incoherent?
- How did your tooling work for various Wikipedia article lengths?
- Strainu (talk) 08:45, 20 December 2024 (UTC)
- Is it this tool (it doesn't seem to be linked on the page you linked): Article to short video? Prototyperspective (talk) 13:05, 3 January 2025 (UTC)
- @Prototyperspective The tool you linked was an early exploration of generated video, and here's the link to the tool we're currently using to make content for TikTok, Instagram, and YouTube: https://toolhub.wikimedia.org/tools/toolforge-video-answer-tool
- FYI, I talked with @Strainu last week and will summarize that discussion here for the benefit of other Wikimedians interested in this work:
- When we first started exploring this idea, we tried something along the lines of what Strainu was suggesting: feeding an entire article into a generated video engine to see what it would produce. (That's the tool @Prototyperspective linked to.) What we quickly saw is that without text/audio context, just showing a movie of the images associated with an article creates a confusing and disjointed slideshow. For example, using https://en.wikipedia.org/wiki/Cuban_macaw, we got back a video showing the drawing of the parrot, then a picture of a dead parrot in a museum collection, then a map... which is very confusing without additional context. With an AI summarizing the key points of the article and reading and/or adding that information as captions (e.g., talking about the parrot's distribution and habitat when showing the map), the output made a little more sense, but was still not ideal (e.g., the picture of the dead parrot didn't really correspond to the AI narration – it wasn't until the end of the video that the narration mentioned this bird being extinct). Also, because Commons media varies in size and resolution, some images would render blurry or zoomed into the wrong part of the image (e.g., the head of the poor parrot was cut off).
- What we took away from this was that AI can do okay-ish at guessing which images to use and which facts in an article to associate them with, but it will get it wrong often enough that you really need a human to look over and adjust that output in order for it to make sense. An AI will also not know how to scale/zoom the images it's choosing – this would also require some additional tweaking by a human. All of these curatorial choices would need to be specific to the article – e.g., you could prompt-engineer some of this into the AI tool for one specific article, but that wouldn't generalize to all the content/topics in Wikipedia.
- What we found in our second exploration (building & using the tool I linked above) is that if we stick to very short TikTok-length videos (i.e., ~30 seconds) focused on one key fun fact, we run into fewer of the above issues, but even then, we still need some human review/editing of the videos to adjust the images & text. But we're also then relying on some pre-production human curation, in the form of someone (in our case, the Did You Know? community) identifying and summarizing that one key fact. And of course the vast majority of Wikipedia articles haven't gone through the DYK process and don't have that one "hook" already identified.
- And then there's the cost. Using third-party AI tools to generate the video & audio costs about a dollar a video. This is fine for making a few dozen short videos for TikTok/IG/YT (where we also don't have to pay for the cost of hosting these videos) but becomes prohibitively expensive if we're talking about generating these videos at Wikipedia readership scale (i.e., potentially millions of hits, even if it's just on a subset of Wikipedia articles). Training our own self-hosted models to do this would be orders of magnitude more expensive.
- Strainu's response to all of the above was that instead of autogenerating video, a tool like this could be made available to Wikimedians to create content. I know we have some people in our communities who would be interested in using a tool like this and doing all that human curation to make good videos for Wikipedia (like Strainu, and I'm guessing you @Prototyperspective) – but I'm not sure how big that community is, how much of their time they would want to invest in this vs all the other onwiki work they do, and whether the broader Wikipedia community would be accepting of this kind of content appearing on Wikipedia. The tool we have now is made for off-wiki sharing (it creates very short videos in a fun/casual social media tone) and is English-only, so it's important to understand if this is the right area to invest more time/money/resourcing into, vs all the other things we could be doing to support editors and readers (including building other new consumer experiences like AI-generated audio, AI text summaries, etc.; or supporting all the wishes to improve existing editing and reading functionality). I'm very interested in hearing more thoughts on this, from @Prototyperspective or any other talk page lurkers here! MPinchuk (WMF) (talk) 17:26, 21 January 2025 (UTC)
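To make the cost point in the thread above concrete, here is a rough back-of-the-envelope sketch. The ~$1-per-video generation cost comes from the discussion; the article counts are purely illustrative assumptions, not real figures from the tool.

```python
# Back-of-the-envelope cost estimate for AI-generated video at different scales.
# The ~$1/video figure is quoted in the discussion above; the article counts
# below are illustrative assumptions.

COST_PER_VIDEO_USD = 1.00  # approximate third-party AI video + audio generation cost


def generation_cost(num_videos: int, cost_per_video: float = COST_PER_VIDEO_USD) -> float:
    """One-time cost (USD) to generate one video per article."""
    return num_videos * cost_per_video


# A few dozen curated videos for TikTok/IG/YT: negligible cost.
print(f"50 curated videos:   ${generation_cost(50):>12,.2f}")

# A hypothetical subset of one million articles: prohibitive.
print(f"1M-article subset:   ${generation_cost(1_000_000):>12,.2f}")
```

This ignores hosting and re-generation costs when articles change, so at readership scale the real figure would be even higher than the simple product shown here.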