Research:Characterizing Wikipedia Citation Usage

From Meta, a Wikimedia project coordination wiki
Tiziano Piccardi
Michele Catasta
Jure Leskovec
Robert West
Dario Taraborelli
Bahodir Mansurov
Duration:  2018-05 — ??
Open data project

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.


The goal of this project is to understand the role of external citations in Wikipedia reading. To this end, we plan to instrument Wikipedia articles for a limited period of time to capture user interactions with footnotes and references.

We believe this study will benefit not only the editor community, who will gain insights into how best to include citations in articles, but also the Web community at large. At a time when fake news has proven pivotal in many political and societal matters, it is of great interest to assess whether readers use (and check) the external citations in Wikipedia.

Main research questions

  • How frequently are references clicked (e.g., what fraction of pageviews entails a reference click)?
  • What are the most cited resources?
    • breakdown per type of resource (e.g., scientific article, newspaper article, company webpage, blog, social media, etc.), per URL domain, per topic
  • Which characteristics of a Wikipedia article impact how often the external citations are visited?
    • breakdown per article popularity, per article topic, per article quality, per article saliency (e.g., article about a current, trending event)
  • What are the common characteristics of an external citation?
    • position on the page, creation time relative to the lifecycle of the article, reference type (e.g., further info, support for a fact, etc.)
  • Can we identify distinct groups of Wikipedia readers given their interaction patterns with external citations?
    • e.g., reference followers vs. ignorers, top-to-bottom readers vs. random-section readers, etc.
  • Which types of reading sessions lead readers to consult external citations more frequently?
    • breakdown per session length, per entry point (search engine vs. random browsing), per position within a Wikipedia browsing session (first article vs. end of a session)
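
As a rough illustration of the first question above, the fraction of pageviews that include at least one reference click could be computed from logged events along the following lines. This is only a sketch: the event schema (`pageview_id`, `action`) and the action names `pageLoad` and `refClick` are hypothetical and do not describe the project's actual instrumentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    pageview_id: str  # hypothetical identifier for a single page load
    action: str       # hypothetical action name: "pageLoad" or "refClick"


def reference_click_rate(events):
    """Return the fraction of pageviews with at least one reference click."""
    pageviews = {e.pageview_id for e in events if e.action == "pageLoad"}
    clicked = {e.pageview_id for e in events if e.action == "refClick"}
    if not pageviews:
        return 0.0
    # Count each pageview once, however many references were clicked in it.
    return len(pageviews & clicked) / len(pageviews)


# Made-up sample data: four pageviews, one of which has reference clicks.
sample = [
    Event("pv1", "pageLoad"),
    Event("pv1", "refClick"),
    Event("pv1", "refClick"),
    Event("pv2", "pageLoad"),
    Event("pv3", "pageLoad"),
    Event("pv4", "pageLoad"),
]
rate = reference_click_rate(sample)
```

Counting distinct pageviews rather than raw clicks keeps the measure aligned with the question as phrased (pageviews that entail a reference click), so a reader who clicks several references on one page is counted once.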

Expected outcome

By the end of the project, we aim to:

  1. gain a deeper understanding of the citation usage patterns;
  2. develop a predictive model that can output the click frequency of any given external citation.

See also