Research:Towards Modeling Citation Quality

12:52, 21 May 2018 (UTC)
Duration:  2018-May — 2018-?
citations, references, accessibility, machine learning

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.


We would like to understand and map the quality of citations and references in Wikipedia. Reference 'quality' is a broad notion that includes reliability, accessibility, and neutrality, among other aspects. In this project, we want to map a set of citation dimensions as a step towards a complete understanding of citation quality.

Citation Quality Dimensions

Towards modeling a full, rich notion of citation quality, we are exploring how the topic distribution and accessibility of citations look across different languages.


First, we define a topic for each publication by:

  • Collecting all articles where a publication is cited
  • For articles in Wikipedia editions other than enwiki, finding the corresponding article in enwiki. This is done by finding the Wikidata item corresponding to each article, then retrieving the enwiki page linked from that Wikidata item.
  • Assigning a topic to each article, using the WikiProject directory
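
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function name, the four input structures (a list of citations and three lookup tables assumed to be built beforehand from Wikidata sitelinks and the WikiProject directory), and the choice to keep a per-publication count of topics rather than a single label are all hypothetical.

```python
from collections import Counter, defaultdict


def publication_topics(citations, page_to_item, item_to_enwiki, enwiki_topic):
    """Return a Counter of topics for each publication.

    All inputs are hypothetical, pre-built lookups:
      citations      -- iterable of (wiki, article_title, publication_id)
      page_to_item   -- dict: (wiki, article_title) -> Wikidata item id
      item_to_enwiki -- dict: Wikidata item id -> enwiki article title
      enwiki_topic   -- dict: enwiki article title -> topic label
    """
    topics = defaultdict(Counter)
    for wiki, title, publication in citations:
        if wiki != "enwiki":
            # Map the article to enwiki through its Wikidata item.
            item = page_to_item.get((wiki, title))
            title = item_to_enwiki.get(item)
        topic = enwiki_topic.get(title)
        if topic is not None:
            # Articles with no enwiki counterpart or no topic are skipped.
            topics[publication][topic] += 1
    return dict(topics)


# Made-up example data; the identifiers below are placeholders.
citations = [
    ("enwiki", "Aspirin", "10.1000/example-a"),
    ("frwiki", "Aspirine", "10.1000/example-a"),
    ("dewiki", "Mond", "10.1000/example-b"),
    ("itwiki", "Pagina senza item", "10.1000/example-b"),
]
page_to_item = {
    ("frwiki", "Aspirine"): "item-aspirin",
    ("dewiki", "Mond"): "item-moon",
}
item_to_enwiki = {"item-aspirin": "Aspirin", "item-moon": "Moon"}
enwiki_topic = {"Aspirin": "Medicine", "Moon": "Astronomy"}

result = publication_topics(citations, page_to_item, item_to_enwiki, enwiki_topic)
```

Keeping the full count of topics per publication leaves open how a single topic is chosen later (for example, the most frequent one), which the steps above do not specify.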


We mark each publication cited with a DOI as Open Access or Closed Access as follows:

  • We download the dataset from Unpaywall, which contains, for each publication with a DOI, a reference to its open access version, if any.
  • We match the Unpaywall dataset with the entries in our data and assign an accessibility label to each of them.
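
The matching step above can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes the Unpaywall data is read as JSON Lines and that each record carries a `doi` field and a boolean `is_oa` field; the function names, the lowercase DOI normalization, and the extra "Unknown" label for DOIs missing from the dataset are assumptions made for the example.

```python
import json


def normalize_doi(doi):
    # DOIs are case-insensitive, so compare them in lowercase.
    return doi.strip().lower()


def load_open_access_flags(lines):
    """Build a dict: normalized DOI -> True if an open access version exists.

    `lines` is any iterable of JSON strings, one record per line
    (assumed fields: "doi" and "is_oa").
    """
    flags = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        flags[normalize_doi(record["doi"])] = bool(record["is_oa"])
    return flags


def label_access(cited_dois, flags):
    """Assign an accessibility label to every cited DOI."""
    labels = {}
    for doi in cited_dois:
        is_oa = flags.get(normalize_doi(doi))
        if is_oa is None:
            # Assumption: DOIs absent from the dataset get a separate label.
            labels[doi] = "Unknown"
        else:
            labels[doi] = "Open Access" if is_oa else "Closed Access"
    return labels


# Made-up records standing in for lines of the downloaded dataset.
sample_lines = [
    '{"doi": "10.1000/example-a", "is_oa": true}',
    '{"doi": "10.1000/example-b", "is_oa": false}',
]
flags = load_open_access_flags(sample_lines)
labels = label_access(["10.1000/EXAMPLE-A", "10.1000/example-b"], flags)
```

For a full dataset download, the same functions could be fed an open file object line by line, so the file never has to be loaded into memory at once.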