Use visual search frontend for Wikipedia

From Meta, a Wikimedia project coordination wiki
This page is a proposal for a new Wikimedia Foundation Sister Project.
Status Under discussion
What is the proposed name for the project? Visual Search for Wikipedia
Proposed project tagline Image-centric results with keywords: clutter-free search facilitating quicker browsing and enabling great disambiguation!
Project description
What is the project purpose? What will be its scope? How would it benefit to be part of Wikimedia?
Add an alternative search that shows an image for each result along with differentiating keywords, modeled after www.scrappycito.com. See the samples for "small dog", "Bob Jones", and "Taylor Swift" below. For pages without images, text categorization is used to select a generic image based on the topic.
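The image fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ScrappyCito implementation: the category names, image file names, and the keyword-based `categorize` function are all hypothetical stand-ins for a trained text classifier.

```python
from typing import Optional

# Hypothetical generic images, one per topic category (illustrative names only).
GENERIC_IMAGES = {
    "animal": "generic-animal.png",
    "person": "generic-person.png",
    "place": "generic-place.png",
}
DEFAULT_IMAGE = "generic-article.png"

# Toy stand-in for a trained text classifier: cue words per category.
CATEGORY_CUES = {
    "animal": {"dog", "breed", "species"},
    "person": {"born", "singer", "politician"},
    "place": {"city", "river", "located"},
}


def categorize(text: str) -> Optional[str]:
    """Return the category whose cue words overlap most with the text, if any."""
    tokens = set(text.lower().split())
    best, best_overlap = None, 0
    for category, cues in CATEGORY_CUES.items():
        overlap = len(tokens & cues)
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best


def choose_image(page_image: Optional[str], page_text: str) -> str:
    """Prefer the page's own image; otherwise fall back to a generic topic image."""
    if page_image:
        return page_image
    return GENERIC_IMAGES.get(categorize(page_text), DEFAULT_IMAGE)
```

A page with its own image keeps it, while a text-only page gets the generic image for its predicted topic, or a default image when no topic matches.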
How many wikis?
Will there be many language versions or just on one multilingual wiki?
It will first be fully supported for English. Other languages can be handled as is, provided whitespace tokenization is sufficient. Languages like Chinese or Japanese, which are written without whitespace between words, will require custom preprocessing (n.b., presumably already in place for the current search). For text categorization, a native speaker will need to map a representative user category to each of a few dozen generic categories. This would require less than one week's work, including time for training the classifier. ScrappyCito, LLC would do this for the top 20 languages (e.g., based on wiki popularity) over the course of a year.
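To illustrate why whitespace tokenization is sufficient for some languages but not others, here is a minimal sketch. The function names are hypothetical, and the character-range check is only a rough heuristic for spotting unsegmented Chinese or Japanese text; it is not how the proposed system or the existing Wikipedia search detects language.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split text into lowercase tokens on runs of whitespace."""
    return text.lower().split()


def likely_needs_custom_tokenizer(text: str) -> bool:
    """Rough heuristic: ideographs or kana suggest text without word spacing."""
    return any(
        "\u4e00" <= ch <= "\u9fff"      # CJK Unified Ideographs
        or "\u3040" <= ch <= "\u30ff"   # Hiragana and Katakana
        for ch in text
    )


print(whitespace_tokenize("Small dog breeds"))        # three separate tokens
print(whitespace_tokenize("小型犬の品種"))              # one undivided token
print(likely_needs_custom_tokenizer("小型犬の品種"))    # True
```

The English query splits into usable keywords, whereas the Japanese phrase comes back as a single token, which is why such languages need a dedicated segmenter.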
How many languages?
Is the project going to be in one language or in many?
This will support many languages; see the previous section. Most languages will be supported out of the box. The handful of languages without whitespace tokenization will need to use the tokenizer from the corresponding Wikipedia's regular search. In addition, image selection for text-only pages requires a category mapping created by a native speaker, as described above.


Technical requirements
If the project requires any new features that the MediaWiki software currently doesn't have, please describe in detail. Are additional MediaWiki extensions needed for the project?
The server software, with source code, will be provided by ScrappyCito, LLC, along with advice on customization.
Development wiki
Interested participants


It would be good for Wikipedia to use a visual search frontend. A big incentive is that users will be drawn to Wikipedia to use this type of search rather than Google Search or Bing. This matters because these search engines often show Wikipedia content directly for popular entities like sports stars or tourist attractions, which cuts down on Wikipedia traffic.

You will be able to use the visual search frontend I developed without charge for the duration of my patent in the works (i.e., license-free). Here is a simple example, with Wikipedia search on the left (white background) and Scrappy Search on the right (tan background):

Wikipedia vs. Scrappy search

The full example can be found at the following URL:

   http://www.scrappycito.com/wikipedia-vs-scrappy-search-small-dog-breeds-en-wiki-site.png  

See http://www.scrappycito.com for the stable version and http://www.tomasohara.trade:9330 for the work-in-progress version with support for handheld devices and also better aesthetics (n.b., used in examples).

Two other examples illustrate the added benefit of this visual search. First, disambiguation is based on images and keywords rather than just snippets of text. See:

  http://www.scrappycito.com/wikipedia-vs-scrappy-search-bob-jones-en-wiki-site.png

In addition, alternative pages for the same entity become much more engaging:

  http://www.scrappycito.com/wikipedia-vs-scrappy-search-taylor-swift-en-wiki-site.png

I think this will be extremely popular with the Instagram crowd and younger users in general (e.g., younger than 30). To do similar searches, just add "site:en.wikipedia.org" to the query, as in the following example:

   Lionel Messi site:en.wikipedia.org
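Building such a query programmatically could look like the sketch below. The helper names are hypothetical, and only the "site:" restriction itself comes from the example above; how the resulting string is passed to any particular search frontend is left open.

```python
from urllib.parse import quote_plus


def site_restricted_query(terms: str, lang: str = "en") -> str:
    """Append a site: restriction for the given Wikipedia language edition."""
    return f"{terms.strip()} site:{lang}.wikipedia.org"


def encode_query(query: str) -> str:
    """Percent-encode the query so it can be placed in a URL query string."""
    return quote_plus(query)


query = site_restricted_query("Lionel Messi")
print(query)                # Lionel Messi site:en.wikipedia.org
print(encode_query(query))  # Lionel+Messi+site%3Aen.wikipedia.org
```

Changing the `lang` argument (for example to "simple") would target a different language edition in the same way.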

The patent for this visual search will be owned by my company, ScrappyCito, LLC. If the company gets acquired, I will require that the acquirer honor the license-free usage of the visual search system by Wikimedia for Wikipedia. (They will likewise be required to pass along this license-free usage requirement if they in turn are acquired.) You will have access to the current source code for use in Wikipedia and other approved projects.

I am doing this both for exposure and because I want to help keep Wikipedia viable. What I can do is develop a prototype for the Simple English Wikipedia on my server and help with the deployment for the regular English Wikipedia on your servers once approved.

Proposed by

Tom O'Hara (https://meta.wikimedia.org/wiki/User:Tomasohara)

People interested

clash with mission

if this is your software, tom, and your server, just implement it and link to wikipedia just like google links to it. to propose a commercial sister project sounds strange given the vision and mission of the wikimedia movement - which is free to use commercial and non-commercial. this includes the contents as well as the software. also it is ad-free. --ThurnerRupert (talk) 18:31, 7 July 2018 (UTC)

  • Looking at the Scrappycito site it appears to be closed licensed. We are part of the open licence movement and we don't use software or content that has a closed licence. WereSpielChequers (talk) 09:00, 14 July 2018 (UTC)