Research:Characterizing Wikipedia Reader Behaviour/Robustness across languages
- 1 Current Status
- 2 Background
- 3 Motivation and Scope
- 4 Participating languages
- 5 Design
- 6 Results
- 7 Presentations
- December 2, 2018: Submitted for publication
- August 29: We have finalized the analysis and will start documenting results shortly. The project is marked as completed. Outreach to points of contact and the broader community for knowledge transfer will start soon.
- March 1: The result of the first round of analysis is ready and we will be moving the data to meta shortly. The points of contact should expect pings in the coming week about it. We are also working on a blog post for this first part of the analysis to make sure we reach a broader audience and help them be aware of the findings. We're quite excited to put this milestone behind us. Thanks for your patience everyone! :)
- October 24: We've had a few meetings where we discussed how to do the debiasing of the results using /all/ the features used in the previous study. While pretty much all the features are easy to compute across languages, topic extraction and building features based on topics don't scale very well, as they need manual labeling of clusters. We have a couple of ideas for how to handle this smoothly and we are working on those. In general, we have set a self-imposed deadline for ourselves to have made significant progress on this research by February 2018. This is, of course, a challenge ;) as the research is not part of the annual plans of Wikimedia Foundation, but nevertheless, we have plenty of evenings, weekends, and sometimes work hours to spare for this. :) Updates will not be very frequent due to this. If you have questions, you know where to find us. :)
- July 10: We have collected 254,431 responses from the surveys that ran in 14 languages for one week. At the moment, we are working on two aspects of the research: handcoding and analysis of the "Other" responses in the motivation question, and the de-biasing of the prevalence of use-cases as described in the appendix of https://arxiv.org/pdf/1702.05379.pdf. The goal of the former is to learn if there are use-cases in one or more languages that are not captured by our current taxonomy of readers. The goal of the latter is to be able to generate results similar to section 4.1 of https://arxiv.org/pdf/1702.05379.pdf for all the 14 languages.
- June 29: The surveys are scheduled to stop in all languages between 2300 and 2359 UTC.
- June 22: The surveys have been released in 14 language editions: Arabic, Bengali, Chinese, Dutch, English, German, Hebrew, Hindi, Hungarian, Japanese, Romanian, Russian, Spanish, Ukrainian
Earlier in this research we used English, Persian, and Spanish Wikipedia readers' survey responses to build a taxonomy of Wikipedia use-cases along several dimensions, capturing users’ motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia. We quantified the prevalence of these use-cases via a large-scale user survey conducted on English Wikipedia. In that study, we also matched survey responses to the respondents’ digital traces in Wikipedia’s server logs, which enabled us to discover behavioral patterns associated with specific use-cases. You can read the full study at https://arxiv.org/abs/1702.05379 .
Motivation and Scope
We are interested in learning whether the results observed in English Wikipedia (both the prevalence of use-cases and the sub-groups identified) are robust across languages, or if and how user motivations and behavioral characteristics differ between Wikipedia language editions. To this end, we are going to run the same survey on English Wikipedia as well as several other language editions.
We started out with four more languages, i.e., Arabic, Hindi, Japanese, and Spanish. We run the survey on English Wikipedia again because, for comparisons across languages, we ideally need data points from the same point in time, as the motivations of people may have changed in the past year. Collecting data from English Wikipedia will also allow us to assess whether the distribution over use-cases is robust over time. The choice of the four other languages was based on the following logic: we wanted to have at least one language whose Wikipedia culture and readers we know very little about, at least one language with a very large number of native speakers, at least one language with a very large number of native speakers that is growing rapidly in Wikipedia content, and a non-English language that is under active research in the New Readers project, so as to complement that research. In addition, we reached out to the broader Wikimedia community to receive input and interest signals on whether it is useful for other language communities to have the results of a similar analysis. In doing so, we ended up with 14 languages.
All surveys will run on both Desktop and Mobile platforms unless specified otherwise in the table below. All surveys start on 2017-06-22 and end on 2017-06-29 unless specified in the table below.
We intend to run the surveys for one week, and all sampling rates have been computed to collect enough data to be able to claim statistical significance when results are reported. However, if for unexpected reasons the sampling rates do not yield enough data, we may have to run the surveys for a longer period of time. This is an undesirable outcome for us as well as the users, so we will go with this option only if we have to. To choose the sampling rates, we projected the expected number of responses based on the previous survey's response rate. We categorized language editions into two groups: For language editions with a large viewer base (English, German, Japanese, Russian, Spanish) we aim at collecting enough responses to reproduce the full analysis of the previous survey, i.e., we will investigate the characteristics (with respect to browsing behavior) of viewers with certain motivations. For the other language editions, we aim for enough responses for a statistically robust quantification of the prevalence of certain motivations, but do not expect enough responses to repeat the full study.
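The projection described above can be sketched as a simple calculation. This is only an illustration under stated assumptions: the function names, the target response count, the weekly page view figure, and the response rate below are all hypothetical and do not come from the study.

```python
def expected_responses(sampling_rate, weekly_pageviews, response_rate):
    """Projected number of responses for a one-week survey run."""
    return sampling_rate * weekly_pageviews * response_rate


def required_sampling_rate(target_responses, weekly_pageviews, response_rate):
    """Fraction of page views that should show the survey widget.

    response_rate is the share of shown widgets that lead to a completed
    survey, taken here (as an assumption) from a previous survey run.
    """
    if weekly_pageviews <= 0:
        raise ValueError("weekly_pageviews must be positive")
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    rate = target_responses / (weekly_pageviews * response_rate)
    # A sampling rate cannot exceed 100% of page views.
    return min(rate, 1.0)


# Hypothetical inputs, for illustration only.
rate = required_sampling_rate(
    target_responses=20_000,
    weekly_pageviews=100_000_000,
    response_rate=0.01,
)
print(f"sampling rate: {rate:.4f}")
print(f"projected responses: {expected_responses(rate, 100_000_000, 0.01):.0f}")
```

If the computed rate is capped at 1.0, the target cannot be reached in one week at that response rate, which corresponds to the case above where the survey would have to run longer.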
|Phabricator||Wikipedia Language||Point of Contact|
|task T168200||Hindi||User:Satdeep Gill|
|task T168197||English||User:LZia (WMF)|
We used the taxonomy of Wikipedia readers developed as part of Research:Characterizing_Wikipedia_Reader_Behaviour and assessed its robustness across languages.
- The surveys ran from 2017-06-22 (13:09 UTC) to 2017-06-29 (23:19 UTC).
- Wikipedias in the 14 languages listed in the table below participated.
- The surveys ran on both Desktop and Mobile platforms.
- The sampling rates and the number of responses received per language are:
|Phabricator||Wikipedia Language||Sampling rate||Response count|
- Potential survey participants saw a widget with the message: "Answer three questions and help us improve Wikipedia." translated to their Wikipedia language. Upon accepting the survey invitation, the participants saw three questions. The order of the questions and the order of the response options were randomized for each participant, and every question required a response:
- Q1. I am reading this article to: get an overview of the topic; get an in-depth understanding of the topic; look up a specific fact or to get a quick answer; other (with a text field to explain what the other reason is).
- Q2. Prior to visiting this article: I was already familiar with the topic; I was not familiar with the topic and I am learning about it for the first time.
- Q3. I am reading this article because (please select all answers that apply): I need to make a personal decision based on this topic (e.g., to buy a book or game, to choose a travel destination, etc.); the topic came up in a conversation; I am bored, curious, or randomly exploring Wikipedia for fun; the topic was referenced in a piece of media (e.g., TV, radio, article, film, book); I want to know more about a current event (e.g., Black Friday, a soccer game, a recent earthquake, somebody's death); I have a work or school-related assignment; other (with a text field to explain what the other reason is).
- Data collection occurred via Google Forms. The survey widget linked to a privacy statement designed for this survey.
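The per-participant randomization of question order and option order described above can be sketched as follows. This is a minimal illustration, not the survey's actual implementation: the short option labels, the `QUESTIONS` structure, and the `randomized_survey` function are hypothetical, and the sketch assumes that every option, including "other", is shuffled along with the rest.

```python
import random

# Hypothetical short labels standing in for the full question texts.
QUESTIONS = {
    "Q1": ["overview", "in-depth", "fact", "other"],
    "Q2": ["familiar", "not familiar"],
    "Q3": ["personal decision", "conversation", "bored/curious/random",
           "media", "current event", "work/school", "other"],
}


def randomized_survey(questions, rng):
    """Return the questions in random order, each with shuffled options."""
    order = list(questions)
    rng.shuffle(order)  # randomize the sequence of questions
    survey = []
    for question in order:
        options = list(questions[question])  # copy so the source is untouched
        rng.shuffle(options)  # randomize the sequence of response options
        survey.append((question, options))
    return survey


# One independent draw per participant.
for question, options in randomized_survey(QUESTIONS, random.Random()):
    print(question, options)
```

Randomizing both orders per participant spreads any position effect (for example, a tendency to pick the first option) evenly across options instead of letting it favor one answer.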
The results of the first part of the analysis (computing the prevalence of Wikipedia reader taxonomy use-cases in the 14 languages and debiasing the results) are below.
From these graphs, we see that on average around 35 percent of Wikipedia users across the 14 languages come to Wikipedia to look up a specific fact, 33 percent come for an overview or summary of a topic, and around 32 percent come to Wikipedia to read about a topic in-depth. There are important exceptions to this general observation that require further investigation: Hindi’s fact lookup and overview reading are the lowest among all languages (at 20 percent and 10 percent, respectively), while its in-depth reading is the highest (almost 70 percent). It is also interesting to note that Hebrew Wikipedia has the highest rate of overview readers (almost 50 percent).
The average familiarity with the topic of the article in question is 55 percent across all languages. Bengali and Chinese Wikipedia users report much lower familiarity (almost 40 percent), while Dutch, Hungarian, and Ukrainian users report very high familiarity (over 65 percent). Further research is needed to understand whether these are fundamental differences between the reader behavior in these languages or whether such differences are the result of cultural differences in self-reporting.
Among the seven motivations the users could choose from, intrinsic learning is reported as the highest motivator for readers, followed by wanting to know more about a topic that they had seen from media sources (books, movies, radio programs, etc.) as well as conversations. There are some exceptions: In Spanish, intrinsic learning is followed by coming to an article because of a work or school assignment; in Bengali by conversation and current event. Hindi has the lowest motivation by media score (10%), while Bengali has the highest motivation by intrinsic learning.
Below, you can find the same results on a per-language basis. The blue bars indicate the values prior to debiasing, and the green bars show the debiased results. Both bars are kept in these plots to help us learn the effect of debiasing on the results. For all practical purposes, the numbers associated with the green bars should be used.
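The debiasing procedure itself is described in the appendix of the paper and is not reproduced on this page. As a rough illustration of why the values before and after debiasing can differ, the sketch below assumes a generic weighting-based correction, in which each response carries a weight that is larger for respondent types under-represented among survey takers. The function name, the answer labels, and the weights are hypothetical and are not taken from the study.

```python
def prevalence(responses, weights=None):
    """Share of each answer; with weights, the weighted share."""
    if weights is None:
        weights = [1.0] * len(responses)  # raw counts: every response counts once
    total = sum(weights)
    shares = {}
    for answer, weight in zip(responses, weights):
        shares[answer] = shares.get(answer, 0.0) + weight / total
    return shares


# Hypothetical toy data: four responses and made-up correction weights.
responses = ["fact", "fact", "overview", "in-depth"]
weights = [0.5, 0.5, 1.0, 2.0]

print("raw:     ", prevalence(responses))
print("weighted:", prevalence(responses, weights))
```

In this toy example the raw share of "fact" is one half, while the weighted share is one quarter, because the two "fact" responses are assumed to come from an over-represented group.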