Research:Characterizing Wikipedia Reader Behaviour/Demographics and Wikipedia use cases

This page is currently a draft. More information pertaining to this may be available on the talk page.


NOTE: Check back later for information on future research.

The overall goal of this iteration of research is to better understand the motivation and behavior of different subpopulations of readers. We are focusing on demographic groups that have been associated with differences in behavior or in awareness of Wikipedia. There will be two core components to this research: a survey on the demographics and motivations of Wikipedia readers on different projects, and an analysis of reader behavior. Combining these two approaches will allow us to better understand where the experiences of different subpopulations of readers overlap or diverge, and where we can focus our efforts to improve those experiences.

Reader Surveys

We are developing a survey to understand how reader motivation varies across different demographic groups. Many of the questions that we plan to ask are similar to those asked by the Global Reach team through phone surveys. These past surveys have been very informative about which populations are not reaching Wikipedia. Our reader surveys will complement this past work by helping us understand the needs of the readers who do reach Wikipedia. We are planning to ask about the following attributes:

English-language demographics survey questions

Are you at least 18 years of age?
¤ Yes
¤ No
<three motivation questions from previous surveys>
Tell us about yourself

What is your age?
¤ 18-24 years
¤ 25-29 years
¤ 30-39 years
¤ 40-49 years
¤ 50-59 years
¤ 60 years and older
¤ Prefer not to say

What is your gender?
¤ Woman
¤ Man
¤ Prefer not to say
¤ Other... <open-text>

How many years (full-time equivalent) have you been in formal education? Include all primary and secondary schooling, university and other post-secondary education, and full-time vocational training, but do not include repeated years. If you are currently in education, count the number of years you have completed so far.
¤ I have no formal schooling
¤ 1-6 years
¤ 7 years
¤ 8 years
¤ 9 years
¤ 10 years
¤ 11 years
¤ 12 years
¤ 13 years
¤ 14 years
¤ 15 years
¤ 16 years
¤ 17 years
¤ 18 years
¤ >18 years
¤ Prefer not to say

Would you describe the place where you live as....
¤ A farm or home in the country
¤ A country village
¤ A small city or town
¤ The suburbs or outskirts of a big city
¤ A big city
¤ Prefer not to say

What is your native language?
<list of Wikipedia languages in their native script>

What is your second native language?
¤ I do not have a second native language
¤ Other... <open-text>

Pilot Results

On March 4-5, 2019, a small-scale pilot of the survey was run on English Wikipedia. It resulted in 771 responses, of which 626 were complete and came from respondents aged 18 or older. The pilot (and the start of the survey translation process) identified a number of issues, described below, that were worked through before expanding the survey to more languages and respondents.

QuickSurveys Sampling

Sampling for inclusion in a given survey is done per browser. The first time a user navigates to a Wikipedia article with an active survey, a token is stored in the browser's local storage. The token is associated with that survey's name and determines, in a deterministic way, whether the survey will be displayed on that browser. Because a survey is active for at least several days, readers who visit Wikipedia at least occasionally are just as likely to be sampled as frequent readers. Among sampled readers, however, more frequent readers are more likely to respond. In the pilot, respondents viewed an average of 6.9 pages and 52% viewed only a single page, whereas individuals who did not respond viewed an average of 4.7 pages and 61% viewed only a single page. Additionally, selection bias or issues with the translation or wording of the questions could differentially affect response rates.
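The per-browser, deterministic sampling described above can be illustrated with a minimal sketch. This is not the QuickSurveys implementation: the hashing scheme, the function name `is_sampled`, and the `coverage` parameter are assumptions made purely for illustration. It only shows how a stored token plus a survey name can yield a stable in/out decision.

```python
import hashlib


def is_sampled(token: str, survey_name: str, coverage: float) -> bool:
    """Decide whether a browser (identified by its stored token) sees a survey.

    Hypothetical scheme: hash the survey name together with the token and
    map the hash to a number in [0, 1). The same token and survey name
    always produce the same decision.
    """
    digest = hashlib.sha256(f"{survey_name}:{token}".encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32); dividing by 2**32
    # yields a value in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < coverage


if __name__ == "__main__":
    # Hypothetical tokens and survey name, for illustration only.
    tokens = [f"token-{i}" for i in range(10_000)]
    shown = sum(is_sampled(t, "example-survey", 0.10) for t in tokens)
    print(f"share of browsers sampled: {shown / len(tokens):.3f}")
```

Because the decision depends only on the token and the survey name, a browser that returns on several days gets the same answer each time. This is why visit frequency does not change the chance of being sampled once a reader has visited at least once during the survey period.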

A small minority of survey respondents did not have associated EventLogging data, which limits our ability to understand the relationship between reader demographics or motivations and the types of pages that they read. The different causes and their approximate magnitudes are listed below:

  • People we completely miss (~3-5%): EventLogging and QuickSurveys do not work on some platforms because these platforms do not support JavaScript. This is mainly older IE platforms (any IE version before 11) but also includes "lite" browsers (e.g., Opera Mini) that are optimized for low data use or privacy. We cannot do much about this. It is not a huge proportion of the internet-connected world, but it is more likely to exclude older users and people from regions with poor internet connectivity, so we should be aware of that. See this for more details.
  • People who can see QuickSurveys but don't have EventLogging (~10%): It is possible that slower browsers fail to load the EventLogging code, so these readers can see and respond to surveys but are not logged appropriately. See this Phabricator task for more details. There is a chance that some of this is fixable (phab:T218243 and phab:T220627#5107667), but we cannot recover data in any real way for these respondents, so any analysis that relies on EventLogging data will miss them. There were no strong demographic patterns in who was missing EventLogging data, though these respondents tended to be below 40 and male.
  • People who right-click and open in a new tab to take external surveys (~5%): We get QuickSurveyInitiation EventLogging but not QuickSurveysResponses EventLogging for this group. This happens almost exclusively on desktop and should only be a problem for external surveys (there is no reason to right-click on internal surveys). For this group, contextual information is harder to obtain, but approximate methods can recover some of it. The main feature we lose is the editCountBucket.
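As a rough illustration of how respondents might be sorted into the groups above, the following sketch classifies each respondent by which EventLogging schemas produced events for them. The schema names QuickSurveyInitiation and QuickSurveysResponses come from the text. The data layout, the category labels, and the respondent IDs are hypothetical, and the first group (people who never see the survey) cannot appear in such data at all.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_respondent(schemas_seen: Iterable[str]) -> str:
    """Label a respondent by the EventLogging schemas recorded for them.

    The labels are illustrative names for the groups described in the text.
    """
    seen = set(schemas_seen)
    has_initiation = "QuickSurveyInitiation" in seen
    has_response = "QuickSurveysResponses" in seen
    if has_initiation and has_response:
        return "fully_logged"
    if has_initiation:
        # e.g. the survey was opened in a new tab via right-click
        return "initiation_only"
    if not seen:
        # responded to the survey, but no EventLogging events at all
        return "no_eventlogging"
    return "other"


def summarize(respondents: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Return the share of respondents in each category."""
    counts = Counter(classify_respondent(s) for s in respondents.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical respondents, keyed by a made-up ID.
    example = {
        "r1": ["QuickSurveyInitiation", "QuickSurveysResponses"],
        "r2": ["QuickSurveyInitiation"],
        "r3": [],
        "r4": ["QuickSurveyInitiation", "QuickSurveysResponses"],
    }
    print(summarize(example))
```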

Age / Gender Skew

The survey respondents skewed heavily young and male. Including those who were under the age of 18, 70% of respondents were under the age of 30. Of those who completed the survey, 76% identified as men. There were no clear interactions with other variables; for example, the gender balance was consistent across age groups. This held true across countries as well, with the exception that the United States was slightly more balanced gender-wise (only 67% men). The United Kingdom and India, the other two best-represented countries, had a gender balance of 75% and 83% men, respectively.

This was a surprising level of skew for the reader population, which led to the question: is the readership truly skewed that far toward men, or does the skew result from individuals of different gender identities self-selecting into the survey at different rates? We looked at past surveys and found the following data points regarding gender and frequency of Wikipedia reading:

  • Based on a survey of 1,000 Amazon Mechanical Turk (AMT) workers from the US: "Second, men use Wikipedia more often — they are twice as likely than women to use Wikipedia daily"[2]
  • Global Insights phone surveys found that younger respondents were consistently more likely to read Wikipedia frequently, but the evidence on gender was mixed:
    • India: women more likely to be frequent readers of Wikipedia
    • Mexico: men more likely to be frequent readers of Wikipedia
    • Nigeria: men slightly more likely to be frequent readers of Wikipedia
    • Iraq: ~equal likelihood by gender of being frequent readers of Wikipedia
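The self-selection question above can be made concrete with a small calculation. The sketch below relates the share of men among readers, the relative response rates of men and other readers, and the share of men among respondents. Only the 76% figure comes from the pilot. The 50% readership share is a purely illustrative assumption, and the function names are hypothetical.

```python
def respondent_share(reader_share: float, rate_group: float, rate_other: float) -> float:
    """Share of a group among respondents.

    reader_share: the group's share of readers (0 to 1).
    rate_group, rate_other: response rates of the group and of everyone else.
    """
    responding_group = reader_share * rate_group
    responding_other = (1.0 - reader_share) * rate_other
    return responding_group / (responding_group + responding_other)


def implied_rate_ratio(reader_share: float, observed_share: float) -> float:
    """Response-rate ratio (group vs. others) needed to turn reader_share
    into observed_share among respondents. Equals the ratio of the odds."""
    observed_odds = observed_share / (1.0 - observed_share)
    reader_odds = reader_share / (1.0 - reader_share)
    return observed_odds / reader_odds


if __name__ == "__main__":
    # Illustrative assumption: readership is evenly split (50% men).
    ratio = implied_rate_ratio(reader_share=0.50, observed_share=0.76)
    print(f"men would need to respond about {ratio:.2f}x as often as others")
    # Conversely, equal response rates would leave the share unchanged.
    print(f"{respondent_share(0.76, 1.0, 1.0):.2f}")
```

Under the even-split assumption, the observed 76% would require men to respond roughly three times as often as other readers. With equal response rates, the respondent share simply mirrors the readership share. The pilot data alone cannot distinguish between these explanations.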

Urban / Rural Question

See locale analysis.

Language Switching

In order to prioritize content gaps across languages, it is useful to understand how people "jump" across different languages seeking given content. As a first approach to characterize this behavior, we quantified three elements:

  • People reading Wikipedia in more than one language: We found that fewer than 20% of readers switch between languages within the same session when they read Wikipedia.
  • Share of the most popular project (Wikipedia) per country: The vast majority of people in a country read in the same language. Most countries have a clear dominant project, but there are exceptions in multilingual countries. However, we also found that in those countries, the readers of each language form separate communities (people generally do not switch between languages), corresponding to smaller administrative divisions.
  • Ratio of English Wikipedia readers per country: In non-English-speaking countries, the number of people visiting English Wikipedia is marginal.
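The first two measures above can be sketched in a few lines. The example below is not the analysis code used for this research: the input layout (pairs of session ID and language code, and view counts per project) and the function names are assumptions for illustration, and the sample data is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def share_multilingual_sessions(pageviews: Iterable[Tuple[str, str]]) -> float:
    """Share of sessions that include page views in more than one language.

    pageviews: (session_id, language_code) pairs, one per page view.
    """
    languages_by_session = defaultdict(set)
    for session_id, language in pageviews:
        languages_by_session[session_id].add(language)
    if not languages_by_session:
        return 0.0
    multilingual = sum(1 for langs in languages_by_session.values() if len(langs) > 1)
    return multilingual / len(languages_by_session)


def dominant_project_share(views_by_project: Dict[str, int]) -> Tuple[str, float]:
    """Return the most-viewed project in a country and its share of views."""
    total = sum(views_by_project.values())
    top = max(views_by_project, key=views_by_project.get)
    return top, views_by_project[top] / total


if __name__ == "__main__":
    # Made-up page views: five sessions, two of which mix languages.
    views = [
        ("s1", "en"), ("s1", "en"),
        ("s2", "en"), ("s2", "es"),
        ("s3", "de"),
        ("s4", "fr"),
        ("s5", "hi"), ("s5", "en"),
    ]
    print(share_multilingual_sessions(views))
    print(dominant_project_share({"fr": 700, "en": 200, "de": 100}))
```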

[Figure: EnglishSharePerCountry.png]

References

  1. Hale, Scott A. (2014). "Multilinguals and Wikipedia Editing". Proceedings of the 2014 ACM conference on Web science - WebSci '14: 99–108. doi:10.1145/2615569.2615684. 
  2. Hinnosaar, Marit (26 April 2019). "Gender Inequality in New Media: Evidence from Wikipedia". Social Science Research Network.