Research:List building for campaigns

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator: Task T348332

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This project captures efforts to make it easier for campaign organizers to build large article worklists that are relevant to their topic area. It builds directly on earlier offline analyses of the efficacy of various algorithmic approaches to building article worklists. The current stage consists of taking the top-performing models (text-based, link-based, and reader-based similarity) and building a basic interface that interweaves their results and allows a user to interact with the resulting list. The interface will be used to collect organizer feedback on the perceived utility of the individual models and of the overall approach. It complements additional work to strengthen the campaign ecosystem.

The tool can be accessed via: https://list-building.toolforge.org/

Background

Campaigns are a key facet of progress towards knowledge equity on the Wikimedia projects. The MediaWiki software and an open movement ("Anyone can edit"), high-quality content that attracts readership (the "flywheel"), and reducing barriers to editing[1] through interventions such as recommender systems have all played a central role in building a strong Wikimedia movement. Campaigns have already played a central role in closing certain knowledge gaps[2] and serve a larger role in socializing newcomers[3] and identifying impactful topics to focus on. Further diversifying the Wikimedia community and the richness of its content likely requires a more central role for organizers and campaigns.[4]

Design

The tooling is designed to meet several constraints:

  • Language-agnostic: the tool works for any Wikipedia language edition, can find articles that do not exist yet in a given language edition (for translation), and ideally has strong performance regardless of the user's starting language edition.
  • Low barrier to entry: minimal knowledge of the Wikipedia ecosystem is required to use the tool, i.e., ideally the only required input is one or more article titles or keywords.
  • Infinite scrolling: the tool can continuously generate more candidate articles (in practice probably up to hundreds or low thousands is sufficient -- longer lists could be attained by choosing new seed articles/keywords).
  • Flexibility: topics should be able to range from well-defined (Polish Women Scientists) to very broad/hazy (Human Rights) and still be well-supported.
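One way to picture the interweaving and "infinite scrolling" constraints together is a lazy round-robin merge over the ranked outputs of the three models. The following is only a minimal sketch, not the tool's actual implementation: the function names `interleave` and `next_page`, the page size, and the placeholder titles are all hypothetical, and round-robin is just one of several possible merging strategies.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def interleave(*ranked_lists: Iterable[str]) -> Iterator[str]:
    """Lazily round-robin over ranked lists, skipping titles already yielded."""
    seen = set()
    missing = object()  # marks positions where a shorter list has run out
    for group in zip_longest(*ranked_lists, fillvalue=missing):
        for title in group:
            if title is not missing and title not in seen:
                seen.add(title)
                yield title


def next_page(results: Iterator[str], size: int = 3) -> List[str]:
    """Pull the next `size` candidates, as a scrolling interface might."""
    return list(islice(results, size))


# Hypothetical ranked outputs of the three similarity models for one seed.
text_based = ["A", "B", "C"]
link_based = ["A", "D", "E"]
reader_based = ["D", "A", "F"]

results = interleave(text_based, link_based, reader_based)
page_1 = next_page(results)
page_2 = next_page(results)
page_3 = next_page(results)
```

Because `interleave` is a generator, candidates are only computed when the interface asks for another page, and an empty page signals that new seed articles or keywords are needed.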

Note that there are several relevant technologies not covered explicitly by this work but that would ideally be part of a larger list-building ecosystem:

  • Filtering: taking a large list and reducing it down to only articles that are relevant to a specific topic. For example, someone might build a global worklist about climate change and individual editors or organizers might want the capacity to easily filter that list to only articles that are relevant to their country. This is covered by projects like topic classification.
  • Prioritization: ranking articles in a list by importance. "Importance" could take many forms -- e.g., misalignment between page quality and views, articles with the most redlinks, etc. This is covered by projects focusing on different aspects of prioritization.
  • Actionability: tagging articles in a list with concrete improvements that can be made to them -- e.g., articles lacking references, articles in need of copyediting, etc. This is covered by projects such as Newcomer Tasks.
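As an illustration of the prioritization example above (misalignment between page quality and views), a worklist could be sorted by the gap between how much an article is read and how good it is. This is a hedged sketch only: the `misalignment` function, the assumption that both quality and view percentile are scaled to the range 0 to 1, and the sample values are hypothetical and do not describe any existing prioritization project.

```python
from typing import Dict, List, Tuple


def misalignment(quality: float, view_percentile: float) -> float:
    """Gap between readership and quality; larger means more under-served."""
    return view_percentile - quality


def prioritize(articles: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order titles from most to least misaligned."""
    return sorted(
        articles,
        key=lambda title: misalignment(*articles[title]),
        reverse=True,
    )


# Hypothetical (quality, view_percentile) pairs, both assumed to lie in [0, 1].
worklist = {
    "A": (0.9, 0.95),  # well read and already high quality
    "B": (0.2, 0.90),  # well read but low quality
    "C": (0.5, 0.10),  # rarely read, medium quality
}

ranked = prioritize(worklist)
```

Here the heavily read but low-quality article comes first, since it has the largest gap between views and quality.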

References

  1. Redi, Miriam; Gerlach, Martin; Johnson, Isaac; Morgan, Jonathan; Zia, Leila (2021-01-29). "A Taxonomy of Knowledge Gaps for Wikimedia Projects" (PDF). 
  2. Halfaker, Aaron (2017-08-23). "Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect". Proceedings of the 13th International Symposium on Open Collaboration. OpenSym '17 (New York, NY, USA: Association for Computing Machinery): 1–9. ISBN 978-1-4503-5187-4. doi:10.1145/3125433.3125475. 
  3. Berson, Amber; Sengul-Jones, Monika; Tamani, Melissa (June 2021). "Unreliable Guidelines: Reliable Sources and Marginalized Communities in French, English and Spanish Wikipedias" (PDF). Art + Feminism. Retrieved 2022-03-30.
  4. Stinson, Alex (2022-04-05). "An Organizer’s Perspective Part I: ‘Anyone can edit’ is not a strategy for growing the Wikimedia movement". Diff. Retrieved 2024-01-03.