Research:Content gaps on Wikipedia

From Meta, a Wikimedia project coordination wiki
Duration: November 2019 – December 2019

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Wikipedia is incomplete by design. The opportunity to share new information with the world is a major motivating factor among both new and established Wikipedia contributors. However, when important information about a topic is absent, incomplete, biased, or otherwise inaccessible to readers, these content gaps can undermine Wikipedia’s ability to serve the needs of its global audience.

Although a great deal of research has been done to identify different types of gaps and their characteristics, there has not yet been an attempt to synthesize this body of work into actionable guidance for identifying, prioritizing, and measuring content gaps. The goal of this project is to characterize previously identified content gaps and arrange them hierarchically in a taxonomy, in order to facilitate future work on prioritizing which content gaps to focus on, measuring content gaps consistently, and evaluating the impact of interventions meant to close content gaps.


  • Summarize findings from a body of relevant academic and industry research focused on content gaps related to the selection, extent, and framing of hypertextual Wikipedia content (e.g. text, links and citations, and structured metadata, but not multimedia)
  • Identify the empirical methods used in these various studies, and their advantages and limitations with respect to their general applicability for large-scale analysis of content gaps across different languages of Wikipedia and for different forms of hypertext-based information
  • Identify the potential causes of content gaps described in these various studies, and the supporting evidence for each
  • Develop a taxonomy of content gaps
  • Provide recommendations for topic-, language-, and format-agnostic metrics and measurement techniques that can support the evaluation of both technological and programmatic interventions to close content gaps.

Guiding questions

  1. What are the selection, extent, and framing gaps that have been identified in previous literature?
  2. Which of the proposed causes for these gaps are best supported by currently available evidence?
  3. What are the characteristics of previous programmatic and technological interventions that have shown some success at addressing these content gaps?  
  4. What metrics have been used to quantify extent or change over time in content gaps, and which of these metrics show most promise for general applicability—beyond a specific topic, language, or type of content?


This project will begin with a literature review of previous work related to content gaps. This literature review will follow the three-part classification of knowledge gap type outlined in the associated Wikimedia Research 2030 white paper: selection, extent, and framing gaps. The literature review will also:

  • Identify methods and metrics used to identify these kinds of gaps in previous research, and compare and contrast the benefits and limitations of these methods
  • Identify potential causes of content gaps, and evaluate the evidence provided for these proposed causes in previous research

By organizing previous research according to thematic categories related to gap type, methods/metrics, and proposed causes, we will be able to provide the first draft of a taxonomy of content gaps on Wikipedia.

This literature review will focus on content gaps in information represented in text (e.g. Wikipedia articles, or other textual entities) or hypertext (e.g. links between articles, categories, and other text-based metadata). Multimedia gaps and gaps specific to Wikidata and other Wikimedia projects (e.g. Wiktionary) are beyond the scope of this literature review and may be addressed in a separate study.


To come

