Research:Claim Selection for WikiGrok
WikiGrok is an experimental MediaWiki feature used by the Mobile team to increase user engagement on mobile devices. Its goal is to give users who are willing to make light contributions to Wikipedia from their mobile devices an opportunity to do so.
To lower the threshold for participation, WikiGrok should be equipped with millions of questions that are easy for humans to answer on mobile devices, yet hard for machines to answer. This research aims to propose two methodologies. The first finds a series of questions that the Mobile team can ask users for short-term experimentation. The second, informed by the results of that experimentation, finds such questions more systematically in the long run.
Questions based on Wikidata Claims
We started by counting the English Wikipedia articles (items) in the class trees of person, organization, event, work, place, and term, six of the main Wikidata classes. The results are shown in the following table:
| Class | Item code | No. of items |
A sample API query used to compute the above number for class person is:
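Counting the items "in the class tree" of a class involves two steps, whatever query API is used. First, collect the root class plus every class that reaches it through subclass of (P279). Second, count the items that have an English Wikipedia article and whose instance of (P31) value lies in that set.

The following is a minimal offline sketch of that idea over a hypothetical in-memory toy graph. It is not the query used in this research. Only Q5 (human) and Q215627 (person) are real Wikidata IDs. All other IDs, the edges, and the sitelink flags are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical toy snapshot. Q5 (human) and Q215627 (person) are real Wikidata
# IDs; all other IDs and every edge below are invented for illustration.
SUBCLASS_OF = {  # P279: class -> its direct superclasses
    "Q5": ["Q215627"],              # human -> person (assumed edge)
    "QX_ASTRONAUT": ["Q5"],         # hypothetical subclass of human
    "QX_BRIDGE": ["QX_STRUCTURE"],  # hypothetical class outside the person tree
}

ITEMS = {  # item -> (P31 values, has an English Wikipedia article?)
    "QA": (["Q5"], True),
    "QB": (["QX_ASTRONAUT"], True),
    "QC": (["Q5"], False),          # no enwiki article, so not counted
    "QD": (["QX_BRIDGE"], True),    # outside the person tree
}


def class_tree(root, subclass_of):
    """Return root plus every class that reaches root via P279 (breadth-first)."""
    children = defaultdict(list)
    for cls, parents in subclass_of.items():
        for parent in parents:
            children[parent].append(cls)
    tree, queue = {root}, deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in tree:  # visited check also guards against P279 cycles
                tree.add(child)
                queue.append(child)
    return tree


def count_enwiki_items(root, subclass_of, items):
    """Count items with an enwiki article whose P31 value lies in root's class tree."""
    tree = class_tree(root, subclass_of)
    return sum(1 for classes, has_enwiki in items.values()
               if has_enwiki and tree.intersection(classes))


if __name__ == "__main__":
    print(count_enwiki_items("Q215627", SUBCLASS_OF, ITEMS))  # prints 2
```

The search walks the subclass graph downward from the root, so subclasses at any depth (here the hypothetical astronaut class) are included in the count.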
Given that person has the highest number of affected articles, we decided to focus on questions related to human (Q5), a class within the class tree of person. To this end, we considered all items in the class human and their claims. We then counted how often pairs of claims co-occur on the same item and identified all claim pairs that co-occur more than 1,000 times in English Wikipedia. The threshold of 1,000 is arbitrary. At the time of the research, it yielded 794 claim pairs to consider, excluding pairs that include instance of (P31).

Using this list, we identified potential questions of interest. For example, in English Wikipedia the claims occupation: politician and occupation: lawyer co-occur 5,900 times. A natural question to ask users on all politician pages is therefore "Is this person a lawyer?". A machine cannot easily answer this question. A human reading a politician's Wikipedia page, however, should be able to answer it relatively easily, either from the information already on the page or from prior knowledge.
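The co-occurrence step described above can be sketched in three moves. For each item, drop its instance of (P31) claims and form every unordered pair of the remaining claims. Then count each pair across all items and keep the pairs seen more than a threshold number of times. A second function lists candidate items for a question built from one such pair.

This is a minimal sketch that makes several assumptions:
* Claims are represented as (property, value) tuples, with values written as labels rather than Q-ids for readability.
* The item IDs and data are hypothetical.
* A toy threshold of 1 stands in for the 1,000 used in the research.
* The rule of skipping items that already carry the asked claim is a hypothetical choice of this sketch, not part of the method described.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each item maps to a set of (property, value) claims.
# P31 is instance of and P106 is occupation; values are labels, not Q-ids.
HUMAN = ("P31", "human")
POLITICIAN = ("P106", "politician")
LAWYER = ("P106", "lawyer")
ACTOR = ("P106", "actor")

ITEMS = {
    "QA": {HUMAN, POLITICIAN, LAWYER},
    "QB": {HUMAN, POLITICIAN, LAWYER},
    "QC": {HUMAN, POLITICIAN},
    "QD": {HUMAN, ACTOR},
}


def cooccurring_pairs(items, threshold, exclude_props=("P31",)):
    """Return {claim pair: count} for pairs co-occurring more than `threshold` times."""
    counts = Counter()
    for claims in items.values():
        # Dropping excluded claims first means no kept pair can include them.
        kept = sorted(c for c in claims if c[0] not in exclude_props)
        # Sorting ensures each unordered pair is always counted in the same order.
        counts.update(combinations(kept, 2))
    return {pair: n for pair, n in counts.items() if n > threshold}


def question_candidates(items, given, asked):
    """Items with claim `given` but without `asked` (hypothetical selection rule)."""
    return sorted(qid for qid, claims in items.items()
                  if given in claims and asked not in claims)


if __name__ == "__main__":
    print(cooccurring_pairs(ITEMS, threshold=1))
    # Ask "Is this person a lawyer?" on politician pages lacking the lawyer claim.
    print(question_candidates(ITEMS, POLITICIAN, LAWYER))  # prints ['QC']
```

On real data, the items would be all members of the class human that have an English Wikipedia article, and the threshold would be 1000.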