Research:Micro-task Generator for Organizers on Wikipedia

From Meta, a Wikimedia project coordination wiki
Created: 08:31, 9 December 2025 (UTC)
Contact: Outreachy intern
Duration: December 2025 – March 2026

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


The Micro-task Generator for Organizers on Wikipedia is a test bed for exploring which micro-tasks and prioritization signals could work for Wikimedia organizers. The tool retrieves article metadata from the MediaWiki APIs and quality predictions from the LiftWing API to identify what needs improvement in Wikipedia articles (for instance, adding references, media, or categories). This could reduce the manual workload for campaign organizers and give new editors clear, beginner-friendly tasks.

The tool is live and accessible at microtask-generator.toolforge.org. Source code is available on GitLab.

Methods

This project is implemented using Python (FastAPI) for the backend, REST API calls to MediaWiki and Wikimedia inference endpoints, and HTML/CSS/JavaScript (with jQuery DataTables) for the user interface. The backend processes article data, scores article quality using the LiftWing articlequality model, and returns a prioritized list of micro-tasks.
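As a rough illustration of the scoring step, the sketch below prepares (but does not send) a request to the LiftWing articlequality model. The endpoint pattern, the payload keys (`lang`, `rev_id`), the revision ID, and the User-Agent string are assumptions made for this example and should be checked against the LiftWing documentation; the function name `build_quality_request` is hypothetical and is not taken from the project's source code.

```python
import json
import urllib.request

# Assumed endpoint pattern for LiftWing models behind the Wikimedia API gateway.
LIFTWING_URL = "https://api.wikimedia.org/service/lw/inference/v1/models/{model}:predict"


def build_quality_request(lang: str, rev_id: int, user_agent: str) -> urllib.request.Request:
    """Prepare (without sending) a POST request for the articlequality model."""
    # Assumed payload shape: a language code plus the revision to score.
    payload = {"lang": lang, "rev_id": rev_id}
    return urllib.request.Request(
        LIFTWING_URL.format(model="articlequality"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": user_agent},
        method="POST",
    )


# Placeholder revision ID and contact string, for illustration only.
req = build_quality_request("en", 12345, "microtask-demo/0.1 (example contact)")
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the call; keeping construction separate from sending makes the request easy to inspect and test offline.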

The research and development workflow includes:

  • Collecting data via MediaWiki Action APIs (prop=revisions, prop=langlinks, list=categorymembers, etc.) and the Wikimedia pageviews REST API
  • Analyzing article quality features returned by the LiftWing articlequality model (references, wikilinks, headings, media, infobox, categories, sources, article length, maintenance messages) to identify maintenance needs
  • Classifying articles by topic using the LiftWing outlink-topic-model and by country using the LiftWing article-country model
  • Implementing a threshold-based needs-detection system that maps low-scoring article features to specific, beginner-friendly micro-task descriptions
  • Building a dashboard interface where users can input a list of article titles or browse by Wikipedia category and receive task recommendations
  • Conducting informal feedback sessions with experienced editors and organizers
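The data-collection step in the list above could be sketched as follows. The helpers only build MediaWiki Action API query URLs for the listed modules (`prop=revisions`, `prop=langlinks`, `list=categorymembers`); the function names, the chosen `rvprop` values, and the default limit are illustrative assumptions rather than the project's actual code.

```python
from urllib.parse import urlencode


def action_api_url(lang: str, params: dict) -> str:
    """Build an Action API query URL for one Wikipedia language edition."""
    base = f"https://{lang}.wikipedia.org/w/api.php"
    common = {"action": "query", "format": "json", "formatversion": "2"}
    return f"{base}?{urlencode({**common, **params})}"


def page_info_url(lang: str, titles: list) -> str:
    """Latest revision (ID and timestamp) plus language links for each title."""
    return action_api_url(lang, {
        "prop": "revisions|langlinks",
        "titles": "|".join(titles),
        "rvprop": "ids|timestamp",
        "lllimit": "max",
    })


def category_members_url(lang: str, category: str, limit: int = 50) -> str:
    """Article-namespace members of a category (limit is an example value)."""
    return action_api_url(lang, {
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",
        "cmlimit": str(limit),
    })


print(page_info_url("en", ["Ada Lovelace", "Grace Hopper"]))
print(category_members_url("fr", "Category:Physics", 20))
```

Large result sets would also need continuation handling, which is left out here to keep the sketch short.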

Timeline

| Milestone | Status | Notes |
|---|---|---|
| Initial project setup and backend scaffolding | Done | December 2025 |
| LiftWing quality, topic, and country API integration | Done | December 2025 – January 2026 |
| Article needs detection logic | Done | January 2026 |
| Category browsing with autocomplete | Done | January 2026 |
| Filter system (by task type, topic, geography) | Done | January 2026 |
| Export functionality (CSV, TSV, Wikitext) | Done | February 2026 |
| Columns for number of sitelinks and date of last edit | Done | February 2026 |
| Toggle for specific progress indicators | Done | February 2026 |
| Deployment on Toolforge | Done | February 2026 |
| User feedback sessions and iteration | Done | February – March 2026 |
| Final documentation and project wrap-up | Planned | March 2026 |

Policy, Ethics and Human Subjects Research

This project adheres to Wikimedia community norms and policies, including:

  • Avoiding disruption to active editors or volunteer workflows
  • Only using publicly available Wikimedia data
  • Not collecting personal data or private information
  • Ensuring transparency by publishing code openly on GitLab

Current Features

The following features are available in the current deployed prototype at microtask-generator.toolforge.org:

Two Input Modes

  • Articles mode: Users enter a language code and a list of article titles (one per line) to receive task recommendations for those specific articles.
  • Category mode: Users enter a language code and a Wikipedia category name (with autocomplete suggestions) and select how many articles to retrieve. The tool fetches members of that category and analyzes them automatically.
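A minimal sketch of how input for the two modes might be normalized before any API call is made. Both helper names are hypothetical, and the cleaning rules (trimming whitespace, converting underscores to spaces, dropping duplicates, adding the canonical `Category:` prefix) are assumptions about reasonable behavior, not a description of the deployed tool.

```python
def parse_titles(raw: str) -> list[str]:
    """Split one-title-per-line input, dropping blank lines and duplicates."""
    seen = set()
    titles = []
    for line in raw.splitlines():
        title = line.strip().replace("_", " ")
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


def normalize_category(name: str) -> str:
    """Return the category name with a single canonical 'Category:' prefix."""
    name = name.strip()
    if name.lower().startswith("category:"):
        name = name[len("category:"):].strip()
    return f"Category:{name}"


print(parse_titles("Ada Lovelace\n\n  Grace_Hopper \nAda Lovelace\n"))
print(normalize_category(" category:Women in science"))
```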

Article Quality Analysis

For each article, the tool retrieves and displays:

  • Quality grade (Stub, Start, C, B, GA, FA) via the LiftWing articlequality model
  • Quality progress score shown as a visual progress bar
  • Potential needs: beginner-friendly task descriptions derived from low-scoring article features:
    • Add more references
    • Add more internal wikilinks
    • Improve article structure (headings)
    • Add images or other media
    • Add an infobox
    • Add more relevant categories
    • Add more sources
    • Expand short articles
    • Check article for a maintenance message
  • Expandable feature details: clicking any row reveals a breakdown of all normalized feature scores as visual bars
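The threshold-based mapping from low-scoring features to task descriptions can be sketched as below. The feature keys and the threshold of 0.5 are hypothetical values chosen for illustration; the task wording comes from the list above. The maintenance-message check is left out because the page does not say how that feature is scored.

```python
# Hypothetical feature keys mapped to the task descriptions listed above.
NEED_MESSAGES = {
    "references": "Add more references",
    "wikilinks": "Add more internal wikilinks",
    "headings": "Improve article structure (headings)",
    "media": "Add images or other media",
    "infobox": "Add an infobox",
    "categories": "Add more relevant categories",
    "sources": "Add more sources",
    "length": "Expand short articles",
}


def detect_needs(features: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return a task for every known feature scoring below the threshold.

    Scores are assumed to be normalized to the range 0..1; features missing
    from the input are skipped rather than treated as low.
    """
    return [
        message
        for key, message in NEED_MESSAGES.items()
        if key in features and features[key] < threshold
    ]


print(detect_needs({"references": 0.2, "media": 0.0, "headings": 0.9}))
```

A single global threshold keeps the rule easy to explain to organizers; per-feature thresholds would be a natural refinement if some features turn out to be noisier than others.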

Contextual Metadata

  • Page views over a 12-month period, retrieved through the Wikimedia pageviews API
  • Number of languages (sitelinks) via the MediaWiki langlinks property
  • Days since last edit calculated from the latest revision timestamp
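Two of these values lend themselves to a short sketch: computing days since the last edit from a revision timestamp, and building a per-article request URL for the Wikimedia pageviews REST API. The function names, the `all-access`/`user` filters, the monthly granularity, and the example dates are assumptions for illustration; the deployed tool may use different settings.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import quote


def days_since_last_edit(timestamp: str, now: Optional[datetime] = None) -> int:
    """Whole days between an ISO 8601 revision timestamp and `now` (UTC)."""
    # MediaWiki timestamps end in "Z"; convert it for older Python versions.
    last_edit = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - last_edit).days


def pageviews_url(lang: str, title: str, start: str, end: str) -> str:
    """Per-article monthly pageviews URL; start and end are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{lang}.wikipedia/all-access/user/{article}/monthly/{start}/{end}"
    )


reference_time = datetime(2026, 1, 31, 12, 0, tzinfo=timezone.utc)
print(days_since_last_edit("2026-01-01T00:00:00Z", reference_time))
print(pageviews_url("en", "Ada Lovelace", "20250101", "20251231"))
```

Summing the monthly `views` values in the response would give a 12-month total.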

Topic and Geography Classification

  • Article topics via the LiftWing outlink-topic-model
  • Country associations via the LiftWing article-country model
  • Both are used to power the filter system

Filtering

Users can filter the results table by:

  • Task type (e.g., show only articles that need references)
  • Topic (e.g., show only articles about Science)
  • Geography (by country or geographic region)

All three filters can be combined, and an All/None toggle is available for each filter group.
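One way to express the combined filter logic is sketched below in Python, although the deployed interface may well implement it client-side in JavaScript. The row keys (`needs`, `topics`, `countries`) are hypothetical, and the semantics are an assumption: a row passes a filter group when it shares at least one value with the selection, all active groups must pass, `None` means the group is not applied, and an empty selection (the "None" toggle) matches nothing.

```python
def matches(row: dict, tasks=None, topics=None, countries=None) -> bool:
    """Return True when the row passes every active filter group."""

    def group_ok(selected, values) -> bool:
        # None: filter group not applied. Otherwise require an overlap.
        return selected is None or bool(set(selected) & set(values))

    return (
        group_ok(tasks, row["needs"])
        and group_ok(topics, row["topics"])
        and group_ok(countries, row["countries"])
    )


rows = [
    {"title": "A", "needs": ["Add more references"], "topics": ["Science"], "countries": ["Kenya"]},
    {"title": "B", "needs": ["Add an infobox"], "topics": ["Science"], "countries": ["Peru"]},
]
selected = [
    r["title"]
    for r in rows
    if matches(r, tasks=["Add more references"], topics=["Science"])
]
print(selected)
```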

Export

Results can be exported in three formats:

  • CSV – for spreadsheet tools
  • TSV – tab-separated values
  • Wikitext – a formatted wikitable with progress bar templates, suitable for pasting directly onto a Wikipedia or Meta-Wiki page; also available via a one-click "Copy Wikitext" button
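The three export formats could be produced along these lines. The column names and helper names are hypothetical, and `{{Progress bar|…}}` is used only as an example of a progress bar template; the template and table layout the tool actually emits are not specified on this page.

```python
import csv
import io


def to_delimited(rows: list, delimiter: str = ",") -> str:
    """Serialize rows as CSV (default) or TSV (delimiter='\\t')."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_wikitext(rows: list) -> str:
    """Render rows as a wikitable; the template name is an assumption."""
    lines = [
        '{| class="wikitable sortable"',
        "! Article !! Quality !! Progress !! Potential needs",
    ]
    for r in rows:
        lines.append("|-")
        lines.append(
            f"| [[{r['title']}]] || {r['quality']} "
            f"|| {{{{Progress bar|{r['progress']}}}}} || {r['needs']}"
        )
    lines.append("|}")
    return "\n".join(lines)


rows = [{"title": "Ada Lovelace", "quality": "C", "progress": 45, "needs": "Add more references"}]
print(to_delimited(rows))
print(to_delimited(rows, delimiter="\t"))
print(to_wikitext(rows))
```

Using the `csv` module rather than manual string joins means values containing commas, tabs, or quotes are escaped correctly.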

Feedback Received

Organizers and community members provided feedback on the tool, and Phabricator tickets were opened to implement their suggestions.

This section will be expanded as additional feedback sessions are conducted.

Results

The following deliverables have been completed or are in progress:

  • A functional web application prototype deployed on Toolforge that generates prioritized micro-tasks ✓
  • A documented API pipeline for extracting and analyzing article metadata ✓
  • A needs-detection and ranking system that displays beginner-friendly tasks for organizers and editors ✓
  • Export functionality in CSV, TSV, and Wikitext formats ✓
  • Deployment-ready files for Toolforge ✓
  • My Progress and Leaderboard features (in progress)
  • Documentation covering usage, limitations, and future development possibilities (in progress)
