
Research:Prioritization of Wikipedia Articles

From Meta, a Wikimedia project coordination wiki
Duration:  2020-06 – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This page encompasses a number of projects all related to understanding content dynamics and supporting the prioritization of wiki work. From a technical standpoint, the research generally focuses on three core technologies: article importance, article quality, and methods for building lists of Wikipedia articles to prioritize based on their importance and quality. From an ethics and governance standpoint, the research must address large questions such as how we measure and support equity within Wikipedia content and how to provide tools that are universal (can be used by all language communities) but contextual / distributed (each language community can operationalize their own topics and concepts of importance and quality).

Background

Ideally, all artifacts in a peer production community would be of the highest possible quality. However, all peer production communities — even the very large English Wikipedia community — have a limited number of contributors and all contributors have a limited amount of available time. Given these limitations, some artifacts necessarily will be of lower quality.

— Warncke-Wang et al.[1]

Knowledge equity: As a social movement, we will focus our efforts on the knowledge and communities that have been left out by structures of power and privilege. We will welcome people from every background to build strong and diverse communities. We will break down the social, political, and technical barriers preventing people from accessing and contributing to free knowledge.

Prioritization of content -- i.e. ranking Wikipedia articles by their importance -- is a necessary component of efforts aimed at achieving Knowledge Equity. It acknowledges that additional support is required to overcome limited resources and the systemic biases that arise in the absence of broadly-organized efforts to cover diverse topics. For editors and communities willing to focus on reducing knowledge gaps within Wikipedia, the Wikimedia Foundation can provide tools and guidance to support these efforts and move the projects closer to Knowledge Equity.

This concept of prioritization is neither new nor unique to the Wikimedia Foundation; it can be found in many places across the wikis, e.g., Vital Articles lists, WikiProject importance assessments, and criteria for inclusion in offline wikis. It is already embedded in many of the technologies that recommend content to read or edit on the wikis. Of particular interest to me is the approach taken by Movement Strategy, which focuses on the challenge of identifying topics for impact. This shifts the focus away from assigning a global metric of importance to each article and reframes it as a question of identifying topical areas that are of (local) importance. The technical challenge changes accordingly. It is no longer a ranking problem, for which most existing approaches (pageviews or centrality) reflect many of the existing biases of the world. It becomes a matter of building tools for detecting topical areas, which editors and organizers can use to identify the gaps that are important to close (overcoming some of those biases).

Technologies

Three core technologies need to be in place before tools can be built that effectively prioritize content according to a wide range of criteria and needs. Each is described below:

Article Importance

The goal of this work was to explore how Wikipedians think about the concept of article importance and how to operationalize it within recommender systems. The work demonstrated that pageview-based and centrality-based methods for assigning importance are insufficient for closing knowledge gaps. Through a field experiment, it also showed that recommender systems can be a powerful tool for surfacing higher-importance articles via topic-based prioritization. Together, these findings show the promise of topics as a proxy for importance. That is, if community members can create worklists of articles about high-impact topics, and editors have further filters to find the articles within those worklists that are most relevant for them to improve, this is largely sufficient to help editors close knowledge gaps without explicitly ranking articles by perceived importance.

Article Quality

Prioritization also requires some indication of an article's quality and therefore of the expected benefit from improving it. Lower-quality articles are generally the best candidates for improvement, while higher-quality articles are generally the best candidates for translation or for use as exemplars that show what content might be missing (and how it should be structured). A closely related question is whether explicit edit recommendations can be made, either via machine-learning models like add-a-link or via community annotations such as maintenance templates.
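As a rough illustration of how a quality signal might route articles to different kinds of work, the sketch below splits a list into improvement candidates and translation/exemplar candidates. The `Article` type, the 0-to-1 quality score, and both thresholds are assumptions made for this example only; they do not describe any existing quality model or its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    quality: float  # hypothetical score in [0, 1]; higher means better


def split_by_quality(articles, low=0.4, high=0.8):
    """Split articles into (improve, exemplars).

    `low` and `high` are illustrative thresholds, not calibrated values.
    Articles scoring between them land in neither list.
    """
    # Lowest-quality articles first: largest expected benefit from editing.
    improve = sorted(
        (a for a in articles if a.quality <= low),
        key=lambda a: a.quality,
    )
    # Highest-quality articles first: best sources for translation or
    # for use as structural exemplars.
    exemplars = sorted(
        (a for a in articles if a.quality >= high),
        key=lambda a: a.quality,
        reverse=True,
    )
    return improve, exemplars
```

In practice each language community would likely want to choose its own thresholds, which is why they are parameters here and not constants.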

Topic Spaces

While some use cases require prioritizing content across an entire Wikipedia language edition, people will also often be interested in prioritizing content that is relevant to a specific topic space -- e.g., articles related to ocean sustainability. For this to be effective, there will need to be technologies that assist people in building lists of content (akin to campaign worklists or WikiProjects). Closely related is language-agnostic topic classification, though that approach depends on a pre-defined taxonomy, while topic spaces or list-building allow for more ad-hoc, user-defined topics.
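To make the contrast with a fixed taxonomy concrete, here is a minimal sketch of ad-hoc list-building in which the user supplies seed terms and the topic space is whatever matches them. The function name, the label sets attached to each article (standing in for categories or similar metadata), and the exact-match rule are all hypothetical simplifications; a real list-building tool would draw on much richer signals.

```python
def build_topic_space(articles, seed_terms):
    """Return a sorted worklist of titles whose labels match any seed term.

    `articles` maps a title to a set of labels (hypothetical metadata,
    e.g. category names). Matching is case-insensitive and exact.
    """
    seeds = {term.lower() for term in seed_terms}
    return sorted(
        title
        for title, labels in articles.items()
        if seeds & {label.lower() for label in labels}
    )


# Toy data for illustration only.
sample = {
    "Coral reef": {"Oceans", "Ecology"},
    "Overfishing": {"Fisheries", "Oceans"},
    "Mount Everest": {"Mountains"},
}
worklist = build_topic_space(sample, ["oceans"])
```

Because the seed terms are chosen by the user, the same function can produce a worklist for any topic a community cares about, without that topic having to exist in a taxonomy beforehand.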

Current Systems

A variety of recommender systems and related technologies already explicitly prioritize articles for editing or creation. I have been analyzing these systems to understand what impact they have on outcomes like equity and how much editor interest affects which recommendations are actually acted upon.

References

  1. Warncke-Wang, Morten; Ranjan, Vivek; Terveen, Loren; Hecht, Brent (2015). "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities" (PDF). Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015. Retrieved 18 August 2020.