Research:Characterizing Wikipedia Reader Behaviour/Prevalence of Wikipedia use cases

From Meta, a Wikimedia project coordination wiki

We present the distribution of responses to the survey questions across the 14 Wikipedia languages of the study. We compute the weighted percentages of survey respondents with specific motivations, information needs, and prior knowledge. The weights, computed with the methods described in the paper and our code documentation, help correct for bias in who responds to the survey; all survey responses reported below are weighted in this way to reduce the bias in the data as much as possible. We also assess whether the survey responses are robust over time, a key step in ensuring that the responses we receive can be used to draw insights about Wikipedia readers.
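As an illustration of what a weighted percentage means here, the following minimal sketch computes weighted response shares from per-respondent weights. It is not the study's code: the function name `weighted_shares`, the toy responses, and the weight values are all hypothetical, and the actual weights come from the methods described in the paper. The sketch assumes a respondent may select more than one option, so shares need not sum to 100%.

```python
from collections import defaultdict


def weighted_shares(responses, weights):
    """Return the weighted percentage of respondents selecting each option.

    responses: list of sets, one per respondent, holding the options chosen
    weights:   list of non-negative floats, one per respondent (hypothetical
               here; in the study they come from the debiasing procedure)
    """
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Accumulate the weight of every respondent who chose each option.
    option_weight = defaultdict(float)
    for chosen, weight in zip(responses, weights):
        for option in chosen:
            option_weight[option] += weight

    return {opt: 100.0 * w / total_weight for opt, w in option_weight.items()}


# Toy, made-up data: three respondents with illustrative weights.
responses = [{"media"}, {"intrinsic learning", "media"}, {"work or school"}]
weights = [1.0, 2.0, 1.0]
shares = weighted_shares(responses, weights)
```

With these toy inputs the second respondent counts twice as much as the others, so an option chosen only by that respondent receives half of the total weight.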


Motivation

The plot shows the survey participants' responses to the motivation question of the taxonomy.

Through studying the plot and the data behind it we observe:

  • People are motivated to read Wikipedia for a wide range of reasons. No single reason clearly dominates in any of the languages.
  • Intrinsic learning is the most frequently reported motivator in all languages except three: English, Dutch, and Japanese. For these three, media is the most frequently reported motivator.
  • Media is reported as one of the top motivators across all languages, with the exception of Bengali and Hindi.
  • There are major differences across languages when considering intrinsic learning: the response shares for intrinsic learning are generally lower for Western European languages (Dutch: 21%, English: 27%) and substantially higher for Eastern European languages (Romanian: 42%, Russian: 41%, Ukrainian: 41%) as well as Western and South Asian languages (Arabic: 40%, Bengali: 55%, Hindi: 48%).
  • Common motivators are intrinsic learning (mean: 37%), media (mean: 25%), conversations (mean: 24%), work or school-related tasks (mean: 18%), current events (mean: 17%), and the need to make personal decisions (mean: 13%).
  • The percentage of responses for work or school-related motivations, as well as for being bored or randomly exploring Wikipedia for fun, differs considerably between some of the languages. While work or school-related motivations are reported only 10% of the time in English Wikipedia, they are reported 31% of the time (more than three times as often) in Spanish Wikipedia. We recommend that future studies keep track of this motivation, as the differences may be explained by the differences in school season times in South America (primarily Spanish speaking) vs. North America and the rest of the northern hemisphere. Also, people report being bored as a motivator for visiting Wikipedia as little as 10% of the time in Hindi, Romanian, and Ukrainian Wikipedia, and more than 20% of the time in English, Japanese, Chinese, and Arabic Wikipedia.
  • The answer “other” was only rarely selected (at most 10% in any language), indicating the robustness of the taxonomy defined in earlier research.
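The comparisons in the list above can be reproduced from the figures quoted in the text. The sketch below is only illustrative: it uses just the languages named above rather than all 14, and the group labels and variable names are our own.

```python
from statistics import mean

# Intrinsic-learning shares (in %) for the languages quoted in the text only.
intrinsic_learning = {
    "lower-share examples": {"Dutch": 21, "English": 27},
    "Eastern European examples": {"Romanian": 42, "Russian": 41, "Ukrainian": 41},
    "Arabic, Bengali, Hindi": {"Arabic": 40, "Bengali": 55, "Hindi": 48},
}

# Mean share per group of quoted languages.
group_means = {
    group: mean(shares.values()) for group, shares in intrinsic_learning.items()
}

# Work or school-related motivation: Spanish vs. English Wikipedia (in %).
work_or_school = {"Spanish": 31, "English": 10}
ratio = work_or_school["Spanish"] / work_or_school["English"]

for group, value in group_means.items():
    print(f"{group}: {value:.1f}%")
print(f"Spanish/English ratio for work or school: {ratio:.1f}")
```

The ratio of 3.1 matches the "more than three times as often" statement above.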

Information need

The plot shows the survey participants' responses to the question of information need in the taxonomy. We observe:

  • Considering all languages, Wikipedia is visited roughly equally by readers for in-depth understanding (mean: 32%), fact-checking (mean: 35%), or reading an overview or summary (mean: 33%).
  • We find substantial variation among languages. More specifically, in-depth reading is reported substantially less often for Western and Central European languages such as English (26%), German (21%), Hungarian (24%), or Dutch (21%). Instead, Wikipedia is more often used for fact-checking in these language editions (38%, 43%, 43%, and 47% respectively). An outlier is Hindi, where users report in-depth reading of articles 68% of the time.
  • The very high share of in-depth reading in Hindi deserves special attention. We took two approaches to understand whether there are self-reporting issues for this specific item in Hindi. We encourage future studies to keep an eye on this item for Hindi and to confirm or amend our observations. Here is what we did:
      • We shared the Hindi translation of the survey question on information need with a few Hindi speakers. They all confirmed that the question is understandable, even if there is some room for improvement. This let us set aside the hypothesis that something was wrong with the translation of the question.
      • We compared the behavior of Hindi Wikipedia readers who reported in-depth reading with that of readers in other languages. For many of the usual features in this study, the behavior of Hindi readers is very similar to that of readers in other languages, but not for all of them. For example, Hindi is the only language with an increased likelihood that a reader makes a single article visit when the information need is in-depth. The response rate for this motivation is around 3%, based on a sample of 3000 responses in this language (note: we weighted for debiasing, which may have altered some of the statistical significance). Overall, with the data collected, we do not see major differences in reader behavior in the case of Hindi. We recommend that future studies keep a close eye on this dimension and inform us of new findings.
  • We also note that the high percentage of in-depth reading reported by Hindi users is in line with our observations about the relation between in-depth reading and the human development index (HDI) of a country. We go through the details when we report on the relation between survey responses and HDI. In a nutshell: survey respondents in countries with lower HDI report in-depth reading more often.

Prior knowledge

The plot shows the survey participants' responses to the question of prior knowledge in the taxonomy. We observe:

  • Across the languages, similar shares of respondents report being familiar and unfamiliar with the topic they read about on Wikipedia (55% vs. 45%).
  • There are some substantial differences between the languages:
      • Respondents in Eastern European languages report familiarity with the content at much higher rates (Ukrainian: 73%, Hungarian: 73%).
      • Respondents in Asian languages, with the exception of Japanese, more often report being unfamiliar (Bengali: 61%, Chinese: 60%, Hindi: 55%).
These differences may be partially explained by a tradition of social desirability of humility in Asian societies. See our study of the cultural indicators for more details.

Robustness over time

A natural question arises when the prevalence of Wikipedia use cases is reported: are the results on this page robust? While we cannot claim this for all Wikipedia languages, we assessed the robustness of the responses in English Wikipedia by running the survey twice: once in March 2016 and once in June 2017. The figure shows the responses reported in each of the two runs.

The figure shows that the survey results are very similar, suggesting that the results are robust over time. The one noticeable difference is a decrease in work or school-related motivation (16% in March 2016 vs. 10% in June 2017), which may be due to seasonal effects.
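A comparison of this kind can be sketched as a percentage-point difference per response option between the two survey runs. This is a minimal illustration, not the analysis behind the figure: the function name `wave_differences` and the 5-point flagging threshold are assumptions of ours, and only the one pair of values quoted above is used as input.

```python
def wave_differences(first, second, threshold=5.0):
    """Percentage-point change per option between two survey runs.

    first, second: dicts mapping a response option to its share in %
    threshold:     hypothetical cut-off (in percentage points) above which
                   a change is flagged for a closer look
    """
    shared_options = first.keys() & second.keys()
    result = {}
    for option in sorted(shared_options):
        diff = second[option] - first[option]
        result[option] = {"diff": diff, "flagged": abs(diff) >= threshold}
    return result


# Only the values quoted in the text; other options are not reproduced here.
march_2016 = {"work or school": 16.0}
june_2017 = {"work or school": 10.0}
result = wave_differences(march_2016, june_2017)
print(result)
```

A fixed threshold is a crude device. With access to the sample sizes and weights from the dataset, a proper significance test would be the more appropriate choice.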

Can I zoom in on my language or country?

At this time, you need to check the paper or the dataset associated with it for language- and country-specific details.