Social media platforms direct high volumes of reader traffic to Wikipedia articles. These traffic spikes can lead to increases in vandalism and other disruptive behavior. However, Wikipedia editors currently have no reliable source of information about which articles are receiving high volumes of traffic from social media at any given time.
We propose an experiment in which we will publish a daily report of articles that have recently received a high volume of traffic from social media platforms, to help editors monitor and maintain the quality of these articles. The platforms we currently plan to focus on for the experiment are Facebook, Twitter, Reddit, and YouTube. The experiment will last for 1-3 months, after which we will evaluate the impact of the intervention through both analyses of editing activities on those articles and by eliciting feedback from editors who used the report.
Background and motivation
Wikipedia articles are shared widely on social media platforms, both by platform users and by the platforms themselves. Wikipedia is huge and covers a wide variety of topics, any of which could be relevant to current events happening anywhere in the world. Because of the viral nature of information propagation on the internet, Wikipedia articles shared on these platforms can experience huge, sudden traffic spikes. External events can cause an apparently uncontroversial, low-traffic article to go from languishing in obscurity to being in the global spotlight in a matter of days or even hours. Furthermore, some social media platforms link to Wikipedia articles to “fact check” controversial content shared by their users. The potential unintended consequences for Wikipedia of social media platforms using it as a free fact-checking service in this way are not known. YouTube uses these links in a few ways, describing them as “information panels giving topical context”, while Facebook presents them as part of its “context buttons”.
Currently, the Wikimedia Foundation makes aggregate counts of article views available publicly via the REST API and the PageViews tool. However, these counts represent total traffic (e.g. external traffic from search engines and news articles, or internal traffic from other Wikipedia articles). The Wikimedia Foundation does log information about the platform that traffic comes from (e.g. google.com, facebook.com) but this data has traditionally not been made available publicly due to sensitivity/privacy concerns. A highly-aggregated form of this referral data can be found in the monthly clickstream datasets.
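For illustration, the aggregate per-article counts mentioned above can be retrieved from the public REST API. The sketch below builds a request URL for the per-article daily pageviews endpoint; the article title and date range are examples, and these counts carry no referrer breakdown:

```python
# Sketch: querying the public Wikimedia REST API for aggregate per-article
# pageview counts. Note this endpoint reports total traffic only; it does
# NOT break traffic down by referring platform.
from urllib.parse import quote

def pageviews_url(project, article, start, end):
    """Build a per-article daily pageviews URL (all access methods, agent=user)."""
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/user/{quote(article, safe='')}/daily/{start}/{end}"
    )

url = pageviews_url("en.wikipedia", "Cat", "20190101", "20190107")
# Fetching the counts requires network access, e.g.:
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(url))
#   for item in data["items"]:
#       print(item["timestamp"], item["views"])
```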
Despite the lack of granularity of the current public pageview data, editors still rely on this information extensively to monitor reader activity and support patrolling work. On English Wikipedia, the top 25 report and the top 5000 report together received close to 500 views per day in 2019, which approaches the daily traffic of popular editor-facing pages like the Village Pump.
- Assess user acceptance and usefulness of the traffic reports
- Elicit design requirements for improving the traffic reports
- Assess the impact of traffic reports on editor behavior
- Characterize the kinds of articles that receive social media traffic spikes
- Evaluate the impact of social media traffic (vs. other traffic sources) on article quality
- Identify potential disinformation or disruption campaigns coordinated via social media
- Create a single traffic report (a sortable wikitable) that is updated on a daily basis by a script
- The report will contain information on all Wikipedia articles that have received at least 500 views from one of the four identified social media sites (Twitter, Facebook, YouTube, and Reddit) during the previous calendar day.
- Each row in the report will also indicate:
- the number of views received by that page from that social media site the day before that (2 days before present)
- the total number of pageviews received by that article that day (similar to [https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Cat%7CDog the pageviews tool])
- the total number of watchers of that page
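The report-generation step described above can be sketched as follows. The input records and field names here are hypothetical, since the actual referrer-tagged counts would come from non-public request logs aggregated per article and platform:

```python
# Sketch: assembling the daily report as a sortable wikitable.
# Input records are hypothetical stand-ins for referrer-tagged view counts.
THRESHOLD = 500  # minimum views from a single platform to appear in the report

def build_report(rows):
    """rows: list of dicts with keys article, platform, views_yesterday,
    views_day_before, total_pageviews, watchers."""
    lines = [
        '{| class="wikitable sortable"',
        "! Article !! Platform !! Views (yesterday) !! Views (2 days ago) "
        "!! Total pageviews !! Watchers",
    ]
    for r in sorted(rows, key=lambda r: r["views_yesterday"], reverse=True):
        if r["views_yesterday"] < THRESHOLD:
            continue  # only articles with >= 500 platform-referred views qualify
        lines.append("|-")
        lines.append(
            "| [[{article}]] || {platform} || {views_yesterday} "
            "|| {views_day_before} || {total_pageviews} || {watchers}".format(**r)
        )
    lines.append("|}")
    return "\n".join(lines)
```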
The report will be implemented on English Wikipedia (in the lead researcher's userspace). The researchers will announce their 'soft launch' on the English Wikipedia Village Pump and on the Research mailing list for comment.
- The researchers will monitor edits to the pages that appear on the list, and elicit reports of potential problematic edits (potential disinformation attempts) via a Qualtrics survey.
- The researchers will elicit feedback on design, functionality, and future plans from community members on the project talk pages.
- The researchers will determine next steps for the project beyond the 2-month pilot.
FAQ and report format
To judge the impact of the report and whether it justified continued publication, we examined feedback on and traffic to the report, as well as a simple analysis of edit trends for the articles surfaced on the report (3,324 unique articles in total):
- We saw no significant change in the number of edits between the two weeks prior to an article being published to the report (10.7; 99% confidence interval [8.3, 13.7]) and the two weeks following (10.8; 99% confidence interval [8.5, 13.3]).
- We saw no significant change in the number of reverts between the two weeks prior to an article being published to the report (0.83; 99% confidence interval [0.70, 0.99]) and the two weeks following (0.83; 99% confidence interval [0.70, 0.98]).
- We saw a slight uptick in page protections: 64 articles (2%) saw an increase in protection level while the rest remained unchanged (we did not verify whether this is "normal" for similar articles over the time range studied).
- We saw steady but low traffic to the report: after an initial spike, the report received about 10 pageviews per day on most days.
- We received positive but not strong feedback: we captured some good feedback early in the pilot, but despite evidence that many editors were aware of the report, we did not hear strong reasons to continue it.
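The before/after edit comparison above could be reproduced along these lines. This is a sketch under assumptions: the report does not state how its confidence intervals were computed, so a percentile bootstrap is used here for illustration:

```python
# Sketch: mean edit count with a percentile-bootstrap 99% confidence interval.
# The bootstrap choice is an assumption; the analysis above does not specify
# how its intervals were derived.
import random

def bootstrap_mean_ci(values, alpha=0.01, n_boot=2000, seed=0):
    """Return (mean, ci_low, ci_high) for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement and collect the resampled means.
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values) / len(values), lo, hi
```

Run once on per-article edit counts for the two weeks before an article appears on the report, and once for the two weeks after; overlapping intervals are consistent with the "no significant change" finding reported above.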
We have seen a number of positive things come out of this work:
- The above analyses suggest that the organic traffic from external platforms like YouTube and Facebook, which link to Wikipedia articles as context for assessing the credibility of content, is not having a significant deleterious impact on Wikipedia or placing an additional burden on patrollers.
- We now have a working process that allows us to deploy reports like this one, which will make it much easier to prototype additional data releases of this sort in the future.
- Though the early evidence suggests that the Social Media Traffic Report as an intervention has not led to a substantial change in patrolling behavior around these articles, we now have a public dataset of externally-referred traffic for two months that can support further research into the impact of platforms (and users on those platforms) that link to Wikipedia articles.
- Google. "Information panel giving topical context - YouTube Help". support.google.com. Retrieved 21 May 2021.
- "Facebook Context Button Overview". Facebook Business Help Center. Retrieved 21 May 2021.