User:Santhosh.thottingal/Wishlist

From Meta, a Wikimedia project coordination wiki

This is my personal wishlist of Wikipedia features and technology.

WikiWidgets

Templates should be replaced by WikiWidgets. WikiWidgets are data-driven, language-agnostic UI components that provide a localized rendering of their data. The rendering could be anything that templates achieve today: tables, graphs, galleries, calculators, illustrations, animations, infoboxes, article previews, and more. They are interactive. Each widget also has an edit action that provides a custom editing interface for its data. The edited data is then serialized and saved to a central storage location.
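To make the idea a little more concrete, here is a minimal Python sketch of the data model described above. It rests on stated assumptions. WikiWidgets do not exist yet, so the `WikiWidget` class, its `render` and `edit` methods, and the per-language label dictionary are all hypothetical. The field names and values are placeholders.

The sketch only illustrates the separation the paragraph describes:

- one language-agnostic data record;
- a localized rendering per language, falling back to English labels;
- an edit action that serializes the updated data as JSON for a central store.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WikiWidget:
    """Hypothetical WikiWidget: language-agnostic data plus per-language
    labels. No such class exists in MediaWiki; this only models the idea."""

    data: dict  # language-agnostic values, keyed by field id
    labels: dict = field(default_factory=dict)  # {lang: {field id: label}}

    def render(self, lang: str, fallback: str = "en") -> str:
        # Same data for every language; only the labels are localized,
        # falling back to the fallback language, then to the raw field id.
        localized = self.labels.get(lang, {})
        default = self.labels.get(fallback, {})
        rows = []
        for key, value in self.data.items():
            label = localized.get(key) or default.get(key) or key
            rows.append(f"{label}: {value}")
        return "\n".join(rows)

    def edit(self, key: str, value) -> str:
        # The edit action changes the data, not the rendering, and returns
        # a JSON serialization meant for a (hypothetical) central store.
        self.data[key] = value
        return json.dumps(self.data, sort_keys=True, ensure_ascii=False)


if __name__ == "__main__":
    widget = WikiWidget(
        data={"population": 1000, "area_km2": 12},  # placeholder values
        labels={
            "en": {"population": "Population", "area_km2": "Area (km²)"},
            "ml": {"population": "ജനസംഖ്യ"},  # only one label translated
        },
    )
    print(widget.render("ml"))
```

Because the data carries no language of its own, adding a new language only means adding labels. The stored record and the edit path stay the same.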

WikiWidgets are not just for MediaWiki pages; they can be used in any webpage. Just as you can embed a YouTube video or a Reddit post, content from Wikipedia can be reused in any webpage. For example, one can embed a visualization from a Wikipedia page in any webpage by including its widget. A Wikimedia Commons image can be reused through such a widget, and the widget will take care of attribution. The widget will carry branding information, and users can come to Wikipedia to edit it by clicking the 'edit' button in the WikiWidget.
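Embedding could work much like a video embed code. The sketch below is purely illustrative. `widgets.example.org` and the `embed_snippet` helper are hypothetical; only the `index.php?title=<page>&action=edit` URL pattern is a real MediaWiki convention. The helper assembles an HTML snippet with three parts: the widget frame, its attribution text, and an edit link back to the source page.

```python
from html import escape
from urllib.parse import quote, urlencode

# Hypothetical embed endpoint; no such WikiWidgets service exists today.
WIDGET_BASE = "https://widgets.example.org/embed/"


def embed_snippet(widget_id: str, source_page: str, lang: str = "en",
                  attribution: str = "") -> str:
    """Return an HTML snippet a third-party site could paste in,
    similar in spirit to a YouTube embed code (hypothetical)."""
    src = WIDGET_BASE + quote(widget_id) + "?" + urlencode({"lang": lang})
    # Real MediaWiki URL pattern for opening a page in edit mode.
    edit_url = (f"https://{lang}.wikipedia.org/w/index.php?"
                + urlencode({"title": source_page, "action": "edit"}))
    parts = [
        f'<iframe src="{escape(src)}" title="{escape(widget_id)}" '
        f'loading="lazy"></iframe>',
        # Attribution travels with the widget, e.g. for Commons media.
        f"<p>{escape(attribution)}</p>" if attribution else "",
        # Branding plus an 'edit' link that brings readers back to Wikipedia.
        f'<a href="{escape(edit_url)}">Edit on Wikipedia</a>',
    ]
    return "\n".join(p for p in parts if p)
```

Escaping with `html.escape` keeps page titles and attribution text from breaking the host page's markup. That matters once snippets are pasted into arbitrary websites.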

Language Tools

Curating open knowledge in all human languages needs software tools such as machine translation, optical character recognition (OCR) for digitization, speech to text, text to speech, spell checkers, grammar checkers, input tools, and fonts. Currently these systems are centralized, proprietary, and limited in the languages they support. A future free knowledge ecosystem demands knowledge curation tools that are universally accessible in all languages. The Wikimedia Foundation has integrated some of these tools into Wikipedia. Services like machine translation are provided through the APIs of proprietary services. Recently, however, the Wikimedia Foundation has also started self-hosting such tools, for example the MinT machine translation system and ocr.wmcloud.org.

Scaling these tools up and opening them to the general public is often discussed internally at the Wikimedia Foundation. However, that is a huge commitment in terms of computing capacity and operational costs. It also carries the risk of centralization.

I would like to see a distributed deployment model for these tools. For example, the MinT machine translation system, which supports 200+ languages, is free and open source software that anyone can download and run on their laptop or servers. We need to find partners in the free knowledge ecosystem to host this tool on their servers in various parts of the world. The computing capacity and operational costs are then distributed and shared, and the development effort can be coordinated following the free and open source model. Let universities, governments, NGOs, user communities, and chapters run these systems on their own servers. This is what I envision as a concrete outcome of the 2030 vision statement: "by 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge, and anyone who shares our vision will be able to join us".
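A client of such a distributed deployment might rotate requests across partner-hosted instances and fail over when one is down. This is a hypothetical sketch, not how MinT clients work today. The host names are placeholders. The injected `send` callable stands in for whatever HTTP request a given MinT deployment expects, since that request format is not specified here. It is assumed to raise `ConnectionError` when a host is unreachable.

```python
import itertools
from typing import Callable, Sequence

# Hypothetical: send(host, text, source_lang, target_lang) -> translation.
Send = Callable[[str, str, str, str], str]


class FederatedTranslator:
    """Sketch: spread requests over independently hosted instances
    and fail over to the next partner when one is unreachable."""

    def __init__(self, hosts: Sequence[str], send: Send):
        if not hosts:
            raise ValueError("need at least one host")
        self.hosts = list(hosts)
        self.send = send
        # Rotating start index so load is shared among partners.
        self._start = itertools.cycle(range(len(self.hosts)))

    def translate(self, text: str, source: str, target: str) -> str:
        first = next(self._start)
        errors = []
        for i in range(len(self.hosts)):
            host = self.hosts[(first + i) % len(self.hosts)]
            try:
                return self.send(host, text, source, target)
            except ConnectionError as exc:  # assumed failure signal
                errors.append(f"{host}: {exc}")
        raise RuntimeError("all hosts failed: " + "; ".join(errors))
```

Round-robin starting points are the simplest way to share load. A real federation would likely also consider latency or geographic proximity when choosing a partner.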

Semantic search

Semantic search, or natural-language-based search, is a capability Wikipedia is missing. Our related effort so far has been investing in Wikidata and building knowledge graphs for that purpose. Recent advances in generative-AI-based interfaces have illustrated the usefulness and possibilities of natural language question answering. I believe Wikipedia should have a live, performant vector embedding store for its content. I would characterize this as an infrastructure enhancement. My own exploration on this front can be read in this 4-part blog post series.
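The core operation of a vector embedding store is nearest-neighbour search over article embeddings. The following is a minimal in-memory sketch using exact cosine similarity, with several assumptions:

- Embeddings come from some model that is not shown here.
- The toy three-dimensional vectors in the demo are made up.
- A performant store covering all of Wikipedia would need an approximate nearest-neighbour index instead of a linear scan.

The `upsert` and `delete` methods reflect the "live" requirement: when an article is edited, its vector is replaced.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


class VectorStore:
    """Minimal in-memory embedding store with exact cosine-similarity
    search (a sketch; no real embedding model or ANN index)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    @staticmethod
    def _normalize(vec: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("zero vector cannot be normalized")
        return [x / norm for x in vec]

    def upsert(self, doc_id: str, embedding: Sequence[float]) -> None:
        # "Live" store: re-embedding an edited article overwrites its vector.
        self._vectors[doc_id] = self._normalize(embedding)

    def delete(self, doc_id: str) -> None:
        self._vectors.pop(doc_id, None)

    def search(self, query: Sequence[float],
               k: int = 3) -> List[Tuple[str, float]]:
        q = self._normalize(query)
        # For unit vectors, the dot product equals cosine similarity.
        scored = ((doc_id, sum(a * b for a, b in zip(q, v)))
                  for doc_id, v in self._vectors.items())
        return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    store = VectorStore()
    # Toy 3-d vectors standing in for real article embeddings.
    store.upsert("Cat", [1.0, 0.0, 0.0])
    store.upsert("Dog", [0.9, 0.1, 0.0])
    store.upsert("Car", [0.0, 0.0, 1.0])
    print(store.search([1.0, 0.05, 0.0], k=2))
```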

Design system

Back in 2019, I outlined the need for a design system for Wikimedia in this document. There has been great progress on this front: https://diff.wikimedia.org/2022/12/22/creating-the-wikimedia-design-system/