Research:Comparing most read and trending edits for Top Articles feature

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator: Task T163818
Created: 20:34, 26 April 2017 (UTC)
Duration: April 2017 – May 2017
This project's code is open-source

This project's data is available for download and reuse.

This page documents a completed research project.


The Android and iOS Wikipedia apps have an Explore feed that contains a card listing 5 trending articles. Currently this Trending card contains the top 5 most read articles within the past 24-48 hours. The mobile team is considering replacing this list with 5 articles based on trending edits.

The current study intends to evaluate whether potential Wikipedia app users prefer a feed based on recent pageviews or trending edits.

Overview

The Wikimedia apps team wants to make the content that appears on the Trending card as recent and relevant as possible. Currently, pageviews for articles are not available in real time: the pageview API can only provide top read articles for the previous day. As a result, the content of the Trending card is always a little bit out of date.
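
A minimal sketch of how a 'top read' list could be pulled from the per-day `top` endpoint of the Wikimedia Pageviews REST API, which only serves complete UTC days. The helper names, the five-item cutoff and the rule for skipping non-article pages are illustrative assumptions, not the apps' actual implementation. The network request is left out so the parsing can be shown on a hand-made payload.

```python
from datetime import datetime, timedelta, timezone

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"

# Hypothetical filter: namespaced pages that should never appear in the card.
NON_ARTICLE_PREFIXES = ("Special:", "Wikipedia:", "Portal:", "File:",
                        "Category:", "Template:", "Help:", "User:", "Talk:")


def top_read_url(now, project="en.wikipedia", access="all-access"):
    """URL for the last complete UTC day before the timezone-aware `now`."""
    day = now.astimezone(timezone.utc).date() - timedelta(days=1)
    return f"{BASE}/{project}/{access}/{day.strftime('%Y/%m/%d')}"


def top_articles(payload, k=5):
    """Pick the k highest-ranked article titles from a decoded 'top' response."""
    ranked = sorted(payload["items"][0]["articles"], key=lambda a: a["rank"])
    titles = []
    for entry in ranked:
        title = entry["article"]
        # Skip the main page and namespaced (non-article) pages.
        if title == "Main_Page" or title.startswith(NON_ARTICLE_PREFIXES):
            continue
        titles.append(title)
        if len(titles) == k:
            break
    return titles
```

Early in the UTC day the previous day's data may not be published yet (the feed is refreshed at around 3:00 UTC), so a real client would need to fall back to the day before.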

The trending edits service provides a list of articles that are experiencing a significant uptick in edits, in nearly real time. This means that this feed is potentially a better source of recent and relevant content than the pageview API.
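
The trending edits service's actual scoring method is not described on this page. As a purely hypothetical illustration of what an "uptick in edits" could mean, the sketch below flags pages whose edit rate in the last hour is several times their rate over the preceding day. The function name, window sizes and thresholds are all assumptions.

```python
from collections import Counter


def trending_titles(edits, now, window=3600, baseline=24 * 3600,
                    min_edits=5, min_ratio=3.0, k=5):
    """Return up to k titles whose recent edit rate spikes above their baseline.

    edits: iterable of (title, unix_timestamp) pairs.
    A title qualifies if it has at least `min_edits` edits in the last
    `window` seconds and its edit rate there is at least `min_ratio` times
    its rate over the `baseline` seconds before that window.
    """
    recent, earlier = Counter(), Counter()
    for title, ts in edits:
        age = now - ts
        if 0 <= age < window:
            recent[title] += 1
        elif window <= age < window + baseline:
            earlier[title] += 1

    scores = {}
    for title, n in recent.items():
        if n < min_edits:
            continue
        recent_rate = n / window
        # Add-one smoothing so pages with no earlier edits do not divide by zero.
        baseline_rate = (earlier[title] + 1) / baseline
        ratio = recent_rate / baseline_rate
        if ratio >= min_ratio:
            scores[title] = ratio
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A page edited steadily all day does not qualify under this rule, while a page with a burst of edits and no earlier activity does, which is also why such a signal can be noisy: any burst counts, whatever its cause.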

However, we don't know if trending edits represent a more interesting set of reading recommendations than top pageviews (even day-old pageviews). The articles that are being heavily edited at a given time are not necessarily the same ones that are being heavily read.

Furthermore, the audience for the Wikipedia app (global readers) is significantly broader than the English Wikipedia editor base, which tends to skew heavily towards North America and Western Europe.[citation needed] For example, many people in India read English Wikipedia, but we do not know whether India-based editors make up a comparable share of the editor base. If India accounts for a larger share of readers than of editors, a pageview-based feed would tend to show more content likely to interest India-based readers than a trending-edits-based feed would.

Other variables, such as time of day, may also affect the perceived relevance of the two feed sources. The current, pageview-based feed is updated every day (at around 3:00 UTC) with the previous day's top read articles. The list then becomes progressively more out of date over the course of the day, and hence potentially less relevant to a reader interested in "trending" content. The trending edits feed is much more current, but also noisier: an article can receive a sudden spike in editing activity for many reasons, and not all of them are equally relevant to a Wikipedia reader.
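
To make the timeliness point concrete, the following sketch computes how old the pageview data behind the 'top read' list is at a given moment. It assumes the list covers one full UTC day and is replaced at exactly 3:00 UTC; the text only says "around" 3:00 UTC, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# The text says the list is refreshed "around 3:00 UTC"; this sketch
# assumes an exact 03:00 cutoff.
UPDATE_HOUR_UTC = 3


def hours(delta):
    return delta.total_seconds() / 3600


def top_read_data_age(now):
    """Return (newest, oldest) age in hours of the data in the list at `now`.

    Assumes the list covers one full UTC day (00:00-24:00) of pageviews.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Before the daily update, the list still covers the day before yesterday.
    if now.hour >= UPDATE_HOUR_UTC:
        window_end = midnight
    else:
        window_end = midnight - timedelta(days=1)
    window_start = window_end - timedelta(days=1)
    return hours(now - window_end), hours(now - window_start)


for hour in (2, 3, 12, 23):
    newest, oldest = top_read_data_age(datetime(2017, 5, 2, hour, tzinfo=timezone.utc))
    print(f"{hour:02d}:00 UTC -> newest data {newest:.0f} h old, oldest {oldest:.0f} h old")
```

Under these assumptions the newest data in the list is 3 hours old right after an update and close to 27 hours old just before the next one, while the oldest data ranges from 27 to almost 51 hours old.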

Goals

Overall, these two feeds are likely to display somewhat different sets of articles at any given time, and it is not clear which set of articles is, on average, more interesting to readers looking for recommendations of trending articles to read next.

Considerations
When deciding what the 'optimum' set of top articles should be, the product team should weigh the following general features of the list:
  • whether people are familiar with the topics of the articles listed in the feed
  • whether people consider these articles to be 'timely'
  • the degree to which people are interested in learning more about the articles in the feed
  • whether the previews of the articles listed in the feed contain contextual metadata (images, item descriptions) that make the list more visually engaging and invite curiosity

We address these considerations in the research questions below.

In addition, the product team should consider whether the article lists generated by the 'top read' and 'trending' feeds are equally relevant and interesting to all readers, or whether the 'trending edits' approach yields articles that are less familiar to readers whose geographic and cultural backgrounds differ from those of the English Wikipedia editing community. We address these considerations in the Hypotheses section below.

Research questions
In this study, we want to know:
  • RQ1: overall, do people prefer lists that are based on recent page views, or trending edits?
  • RQ2: which list contains more articles about topics that people are familiar with?
  • RQ3: which lists tend to contain more articles related to topics that people have heard/read about elsewhere (for example, on a news website) in the past 24 hours?
  • RQ4: are people more likely to consider using the 'top articles' feature after viewing articles in a 'top read'-based list vs. a 'trending edits'-based list?
  • RQ5: how often do people read Wikipedia on a mobile device?


Hypotheses
Reader familiarity with Explore feed content, by country of origin. Because editing-based activity metrics reflect the interests of the editing community, in which US- and EU-based editors are over-represented relative to their share of global readership, we hypothesize that:
  • H1: India-based readers will be less familiar with content that appears in the 'trending' list than the 'top read' list,
  • H2: India-based readers will be less likely to have heard about topics that appear in the 'trending' list than the 'top read' list through off-wiki information sources (news websites, social media, blogs, etc.) within the past 24 hours, and
  • H3: India-based readers will be more interested in reading topics that appear in the 'top read' list than the 'trending' list.

Methods

Timeline

  • April 2017: run study on Amazon Mechanical Turk
  • May 2017: analyze and report results (first round)
  • July 2017: run second round
  • August 2017: analyze and report results

Study design

This study involves asking paid crowdworkers ("turkers" from Amazon Mechanical Turk) to provide basic information about their mobile internet browsing habits, and then to examine a prototype interface of the Trending card in the Explore feed and fill out a short questionnaire. The questionnaire was administered through Qualtrics. We showed these article lists to US-based and India-based turkers, in roughly similar proportions, to measure geographically mediated differences (a rough approximation for cultural differences) in item familiarity and interest.

We released rating tasks to turkers over the course of 15 days during May 2017 and 7 days in July 2017, using the current set of top read and trending articles at each point. We released tasks at different times of day, to account for the time difference between the US and India and to vary the relative 'timeliness' of the items in the top read lists.

Article list examples
Survey questions
  1. How often do you read Wikipedia articles on a smartphone or other mobile device?
  2. How many articles in this list are CLEARLY RELATED to topics that you are familiar with?
  3. How many articles in this list are CLEARLY RELATED to topics that you have seen or read about on other websites (not Wikipedia) within the past 24 hours?
  4. How many articles in this list would you be interested in reading right now?
  5. If there was a list of trending articles LIKE THE ONES IN THIS LIST on the home screen of a Wikipedia app for mobile devices, how often would you use it to look for new articles to read?
  6. Why would you (choice from question #5) use a list that contained articles like these to find new articles to read?
Policy, Ethics and Human Subjects Research
Data collection and analysis were conducted in compliance with Wikimedia's data retention guidelines for survey research. The design of the study was informed by the Guidelines for Academic Requesters[1] developed by members of the Mechanical Turk worker community.

Results

RQ1

Which list contains more articles that you would be interested in reading right now?

Do people prefer lists that are based on recent pageviews, or on trending edits?

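On average, raters reported that they would be more interested in reading the articles in the 'top read' list than those in the 'trending' list. The direction of the difference was consistent across groups, but it was not statistically significant overall (p = 0.1383) or for US-based raters (p = 0.5283), and only marginally significant for India-based raters (p = 0.0845).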

t-test: interest in reading

India and US

   Interested in reading - Overall
   top read observations:	83
   top read average:	2.1686746988
   top read std:	1.17010222093
   trending observations:	92
   trending average:	1.90217391304
   trending std:	1.1800864247
   t-statistic =  1.489 pvalue = 0.1383

US only

   Interested in reading - USA
   top read observations:	46
   top read average:	1.86956521739
   top read std:	1.01314610415
   trending observations:	44
   trending average:	1.72727272727
   trending std:	1.09469041625
   t-statistic =  0.633 pvalue = 0.5283

India only

   Interested in reading - India
   top read observations:	37
   top read average:	2.54054054054
   top read std:	1.24324324324
   trending observations:	48
   trending average:	2.0625
   trending std:	1.23163593782
   t-statistic =  1.746 pvalue = 0.0845
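
The t-statistics above can be reproduced from the printed summary statistics. The sketch below assumes the printed standard deviations are population SDs (ddof = 0, the NumPy default) and that the tests were Student's two-sample t-tests with pooled variance; under those assumptions it recovers the reported values, e.g. t = 1.489 with 173 degrees of freedom for the overall comparison. The two-sided p-value uses the identity p = I_x(df/2, 1/2) with x = df/(df + t²), evaluating the regularized incomplete beta function by a continued fraction so that no third-party packages are needed. The function names are illustrative.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's two-sample t statistic with pooled variance.

    Assumes sd1 and sd2 are population SDs (ddof=0), so n * sd**2 is each
    group's sum of squared deviations from its mean.
    """
    df = n1 + n2 - 2
    pooled_var = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t distribution."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


# "Interested in reading - Overall": top read vs. trending.
t, df = pooled_t(2.1686746988, 1.17010222093, 83,
                 1.90217391304, 1.1800864247, 92)
print(f"t = {t:.3f}, df = {df}, p = {t_two_sided_p(t, df):.4f}")
```

The same functions apply to the RQ2 and RQ3 summaries; for instance, the India-only familiarity comparison gives t ≈ 4.339, whose p-value is below 0.0001 (printed as 0.0000 in that output).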

RQ2

Which list contains more articles around topics that you are familiar with?

Which list contains more articles about topics that people are familiar with?

On average, all raters were more familiar with the articles displayed in the 'top read' condition than with those displayed in the 'trending edits' condition. The difference was significant overall (p = 0.0001) and for India-based raters (p < 0.0001), but only marginally significant for the US-only group (n = 90, p = 0.0543).

t-test: familiarity - general

India and US

   Familiarity - Overall
   top read observations:	83
   top read average:	2.4578313253
   top read std:	1.34702890211
   trending observations:	92
   trending average:	1.71739130435
   trending std:	1.01431157803
   t-statistic =  4.108 pvalue = 0.0001

US only

   Familiarity - USA
   top read observations:	46
   top read average:	2.04347826087
   top read std:	1.10249759418
   trending observations:	44
   trending average:	1.59090909091
   trending std:	1.07276579284
   t-statistic =  1.950 pvalue = 0.0543

India only

   Familiarity - India
   top read observations:	37
   top read average:	2.97297297297
   top read std:	1.44234206099
   trending observations:	48
   trending average:	1.83333333333
   trending std:	0.942809041582
   t-statistic =  4.339 pvalue = 0.0000

RQ3

Which lists tend to contain more articles related to topics that you have heard/read about elsewhere (for example, on a news website) in the past 24 hours?

Which lists tend to contain more articles related to topics that people have heard/read about elsewhere (for example, on a news website) in the past 24 hours?

On average, India-based raters found that more of the articles in the 'top read' condition were related to topics they had recently heard about through other media (p = 0.0239). For US-based raters the difference was in the same direction but not significant (p = 0.1438), although the pooled difference across both groups was significant (p = 0.0104).

t-test: familiarity - from recent news

India and US

   Familiarity from news - Overall
   top read observations:	83
   top read average:	1.92771084337
   top read std:	1.24941559862
   trending observations:	92
   trending average:	1.48913043478
   trending std:	0.972385739542
   t-statistic =  2.589 pvalue = 0.0104

US only

   Familiarity from news - USA
   top read observations:	46
   top read average:	1.84782608696
   top read std:	1.28481755268
   trending observations:	44
   trending average:	1.47727272727
   trending std:	1.0550449444
   t-statistic =  1.475 pvalue = 0.1438

India only

   Familiarity from news - India
   top read observations:	37
   top read average:	2.02702702703
   top read std:	1.19653749304
   trending observations:	48
   trending average:	1.5
   trending std:	0.889756521003
   t-statistic =  2.301 pvalue = 0.0239

RQ4

If there was a list of trending articles like the ones in this list on the home screen of a Wikipedia app for mobile devices, how often would you use it to look for new articles to read?

Are people more likely to consider using the 'top articles' feature after viewing articles in a 'top read'-based list vs. a 'trending edits'-based list?

Differences between US- and India-based raters were significant: India-based raters reported that they would use the 'top articles' feature more frequently than US-based raters (χ² = 6.13, df = 2, p = 0.047).

   Response      India   US
   Frequently       35   23
   Occasionally     36   46
   Never             6   13

RQ5

How often do you read Wikipedia articles on a smartphone or other mobile device?


Differences between US- and India-based raters were significant: India-based raters reported using Wikipedia on a mobile device much more frequently than US-based raters (χ² = 9.79, df = 4, p = 0.044).

   Response                                            India   US  Total
   At least once a day                                    29   13     42
   At least once a week                                   36   45     81
   At least once a month                                   9   15     24
   Less than once a month                                  6   10     16
   I never read Wikipedia articles on a mobile device      5    7     12
   Total                                                  85   90    175
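
As a cross-check, the χ² statistics for RQ4 and RQ5 can be recomputed from the two contingency tables with Pearson's test of independence. The sketch below uses only the standard library; its p-value uses the closed-form chi-square survival function for even degrees of freedom, which covers both tables (df = 2 and df = 4). The function name is illustrative. Recomputed this way, the statistics are χ² ≈ 6.13 for RQ4 and χ² ≈ 9.79 for RQ5, with p-values (0.047 and 0.044) that agree with those reported.

```python
import math


def chi_square_independence(table):
    """Pearson's chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value). The p-value uses the
    closed form that holds for even degrees of freedom only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2:
        raise ValueError("closed-form p-value implemented for even df only")
    half = stat / 2
    # Survival function for even df: exp(-x/2) * sum_{k < df/2} (x/2)**k / k!
    p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, df, p


# Columns: India, US.
# RQ4 rows: frequently, occasionally, never.
RQ4 = [[35, 23], [36, 46], [6, 13]]
# RQ5 rows: daily, weekly, monthly, less than monthly, never.
RQ5 = [[29, 13], [36, 45], [9, 15], [6, 10], [5, 7]]

for name, table in (("RQ4", RQ4), ("RQ5", RQ5)):
    stat, df, p = chi_square_independence(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
# RQ4: chi2 = 6.13, df = 2, p = 0.047
# RQ5: chi2 = 9.79, df = 4, p = 0.044
```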

Hypotheses

H1
India-based readers will be less familiar with content that appears in the 'trending' list than the 'top read' list.

Supported. India-based raters were familiar with significantly more of the topics featured on the 'top read' list than the 'trending' list.

H2
India-based readers will be less likely to have heard about topics that appear in the 'trending' list than the 'top read' list through off-wiki information sources (news websites, social media, blogs, etc.) within the past 24 hours.

Supported. India-based raters had heard about significantly more of the topics featured on the 'top read' list than the 'trending' list from other websites within the past 24 hours.

H3
India-based readers will be more interested in reading topics that appear in the 'top read' list than the 'trending' list.

Partially supported. India-based raters were more interested in the topics featured on the 'top read' list than in those on the 'trending' list, but the difference was only marginally significant (p = 0.0845).

Discussion

Limitations

  • Mechanical Turk workers' expressions of interest or personal preference may not reflect the interests or preferences of Wikipedia readers generally, or of mobile app readers specifically. This limitation pertains specifically to RQ1, RQ4, and H3. Survey question 1 (RQ5), "How often do you read Wikipedia articles on a smartphone or other mobile device?", is intended partly as a sanity check for this source of error: at the very least, we know that over two thirds of respondents are regular (daily or weekly) readers of Wikipedia on mobile devices.
  • The context of providing a rating or evaluation (for pay) differs from the context of browsing an app in one's free time. What people say they like may differ from what they actually click on when no one is watching them or asking them questions. This limitation also pertains specifically to RQ1, RQ4, and H3.
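
The "over two thirds" figure in the first limitation can be checked against the RQ5 table. This short sketch counts respondents who read Wikipedia on a mobile device at least once a day or at least once a week; the variable and function names are illustrative.

```python
# RQ5 counts as (India, US) pairs, copied from the RQ5 results table.
MOBILE_READING = {
    "At least once a day": (29, 13),
    "At least once a week": (36, 45),
    "At least once a month": (9, 15),
    "Less than once a month": (6, 10),
    "I never read Wikipedia articles on a mobile device": (5, 7),
}
REGULAR = ("At least once a day", "At least once a week")


def regular_share(column=None):
    """Share of respondents reading daily or weekly.

    column: 0 for India-based raters, 1 for US-based raters, None for both.
    """
    def count(label):
        pair = MOBILE_READING[label]
        return sum(pair) if column is None else pair[column]

    total = sum(count(label) for label in MOBILE_READING)
    return sum(count(label) for label in REGULAR) / total


for name, col in (("All", None), ("India", 0), ("US", 1)):
    print(f"{name}: {regular_share(col):.1%}")
# All: 70.3%
# India: 76.5%
# US: 64.4%
```

The pooled share (123 of 175 respondents) is above two thirds; taken separately, the US-based group (58 of 90) falls slightly below that mark.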

Conclusion

  • Overall, raters tended to prefer 'top read'-based recommendations, although this preference was only marginally significant for India-based raters and not significant for US-based raters or overall. All raters were also more likely to be familiar with the topics of the articles presented in the 'top read' lists. Both of these differences were more pronounced for India-based raters than for US-based raters.
  • Furthermore, switching to a 'trending edits'-based feed is likely to have a more pronounced negative impact on some app users than on others. Specifically, India-based readers are likely to find the articles in the new feed even less relevant and 'timely' than North American readers will: the feed will contain fewer topics they already know about.
  • Will that mean that India-based readers (or readers from other countries outside the US/Europe) will be less engaged by the feature or will consciously perceive it to be biased towards 'Western' topics? Will they be less interested in these topics? An A/B test could provide more evidence, though possibly at the cost of making people who are exposed to the 'trending edits' condition less inclined to trust the feature in the future [2].
  • Based on these findings, usage of the 'Top articles' feature is likely to drop if the app team switches from the 'top read' algorithm to a feed based on the current trending edits algorithm, even though the 'top read' data is always slightly out of date and the 'trending edits' data is nearly live.
  • If the goal of the feed is to engage all readers, irrespective of their culture or country of origin, then the team should consider that the proposed change would likely implicitly prioritize the interests of one group over another.

Next steps

One possible way forward is to build a 'mixed model' that includes measures of both pageviews and editing velocity in the final ranking. Depending on how these metrics are weighted in the model, this approach may be able to strike a good balance between general appeal, timeliness, local relevance, and serendipity [3].
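
A minimal sketch of such a mixed model, assuming each signal is first scaled to [0, 1] by dividing by its maximum and the two are then blended linearly with a single weight. The function name, the normalization and the linear blend are illustrative assumptions; this page does not specify how the weighting would work.

```python
def mixed_rank(pageviews, edit_velocity, weight=0.5, k=5):
    """Rank titles by a weighted blend of normalized pageviews and edit velocity.

    pageviews, edit_velocity: dicts mapping title -> raw score (missing = 0).
    weight: share given to pageviews (1.0 = pure 'top read', 0.0 = pure 'trending').
    """
    titles = set(pageviews) | set(edit_velocity)

    def normalize(scores):
        # Scale by the maximum so both signals fall in [0, 1].
        top = max(scores.values(), default=0)
        return {t: (scores.get(t, 0) / top if top else 0.0) for t in titles}

    pv = normalize(pageviews)
    ev = normalize(edit_velocity)
    blended = {t: weight * pv[t] + (1 - weight) * ev[t] for t in titles}
    # Highest blended score first; ties broken alphabetically for stable output.
    return sorted(titles, key=lambda t: (-blended[t], t))[:k]
```

With `weight=1.0` this reduces to a pageview ranking and with `weight=0.0` to an edit-velocity ranking; intermediate weights let a page with modest readership but a sharp editing spike surface alongside widely read articles.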

References

  1. "Guidelines for Academic Requesters - WeAreDynamo Wiki". wiki.wearedynamo.org. Retrieved 2017-04-27. 
  2. McNee, S. M., Kapoor, N., & Konstan, J. A. (2006, November). Don't look stupid: avoiding pitfalls when recommending research papers. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 171-180). ACM.
  3. McNee, S. M., Riedl, J., & Konstan, J. A. (2006, April). Making recommendations better: an analytic model for human-recommender interaction. In CHI'06 extended abstracts on Human factors in computing systems (pp. 1103-1108). ACM. (https://doi.org/10.1145/1125451.1125660)