Research:Micro-task Generator for Organizers on Wikipedia
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
The Micro-task Generator for Organizers on Wikipedia is a tool designed as a test bed for exploring how micro-tasks and prioritization signals could work for Wikimedia organizers. The tool retrieves Wikipedia article metadata from the LiftWing API to identify what needs improvement in each article (for instance, adding references, media, or categories). This could reduce the manual workload for campaign organizers and give new editors clear, beginner-friendly tasks.
The tool is live and accessible at microtask-generator.toolforge.org. Source code is available on GitLab.
Methods
This project is implemented using Python (FastAPI) for the backend, REST API calls to MediaWiki and Wikimedia inference endpoints, and HTML/CSS/JavaScript (with jQuery DataTables) for the user interface. The backend processes article data, scores article quality using the LiftWing articlequality model, and returns a prioritized list of micro-tasks.
The research and development workflow includes:
- Collecting data via MediaWiki Action APIs (prop=revisions, prop=langlinks, list=categorymembers, etc.) and the Wikimedia pageviews REST API
- Analyzing article quality features returned by the LiftWing articlequality model (references, wikilinks, headings, media, infobox, categories, sources, article length, maintenance messages) to identify maintenance needs
- Classifying articles by topic using the LiftWing outlink-topic-model and by country using the LiftWing article-country model
- Implementing a threshold-based needs-detection system that maps low-scoring article features to specific, beginner-friendly micro-task descriptions
- Building a dashboard interface where users can input a list of article titles or browse by Wikipedia category and receive task recommendations
- Conducting informal feedback sessions with experienced editors and organizers
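The threshold-based needs-detection step in the workflow above can be illustrated with a minimal sketch. The feature keys, the 0.5 cut-off, and the `detect_needs` helper are assumptions made for this example; the deployed tool may use different names and thresholds. Maintenance messages are left out because the page does not say how that feature is scored.

```python
from typing import Dict, List

# Hypothetical feature keys mapped to the task wording used on this page.
# The real articlequality output may name these features differently.
TASKS: Dict[str, str] = {
    "references": "Add more references",
    "wikilinks": "Add more internal wikilinks",
    "headings": "Improve article structure (headings)",
    "media": "Add images or other media",
    "infobox": "Add an infobox",
    "categories": "Add more relevant categories",
    "sources": "Add more sources",
    "length": "Expand short articles",
}

ASSUMED_THRESHOLD = 0.5  # illustrative cut-off, not the tool's actual value


def detect_needs(features: Dict[str, float],
                 threshold: float = ASSUMED_THRESHOLD) -> List[str]:
    """Return a task description for every known feature scoring below threshold.

    Features missing from the input are skipped rather than flagged.
    """
    return [
        task
        for name, task in TASKS.items()
        if name in features and features[name] < threshold
    ]


if __name__ == "__main__":
    scores = {"references": 0.2, "wikilinks": 0.9, "media": 0.0, "infobox": 0.5}
    for task in detect_needs(scores):
        print(task)
```

Keeping the mapping in a single dictionary means a new task type only needs one new entry.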
Timeline
Policy, Ethics and Human Subjects Research
This project adheres to Wikimedia community norms and policies, including:
- Avoiding disruption to active editors or volunteer workflows
- Only using publicly available Wikimedia data
- Not collecting personal data or private information
- Ensuring transparency by publishing code openly on GitLab
Current Features
The following features are available in the current deployed prototype at microtask-generator.toolforge.org:
Two Input Modes
- Articles mode: Users enter a language code and a list of article titles (one per line) to receive task recommendations for those specific articles.
- Category mode: Users enter a language code and a Wikipedia category name (with autocomplete suggestions) and select how many articles to retrieve. The tool fetches members of that category and analyzes them automatically.
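As a rough sketch of how Category mode could fetch its articles, the following builds a `list=categorymembers` query for the MediaWiki Action API and extracts titles from the JSON response. The helper names and the default limit are hypothetical; only the query parameters themselves come from the Action API. No request is sent here, so the snippet runs offline.

```python
from typing import Any, Dict, List


def api_url(lang: str) -> str:
    """Action API endpoint for the Wikipedia in the given language code."""
    return f"https://{lang}.wikipedia.org/w/api.php"


def category_members_params(category: str, limit: int = 50) -> Dict[str, str]:
    """Build query parameters for listing the pages in a category."""
    title = category if category.startswith("Category:") else f"Category:{category}"
    # Clamp to the 1..500 range that the API accepts for regular clients.
    limit = max(1, min(limit, 500))
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": title,
        "cmtype": "page",  # skip subcategories and files
        "cmlimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }


def member_titles(response: Dict[str, Any]) -> List[str]:
    """Extract page titles from a categorymembers response body."""
    members = response.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


if __name__ == "__main__":
    print(api_url("en"))
    print(category_members_params("Physics", limit=20))
```

The returned dictionary can be passed as query parameters to any HTTP client.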
Article Quality Analysis
For each article, the tool retrieves and displays:
- Quality grade (Stub, Start, C, B, GA, FA) via the LiftWing articlequality model
- Quality progress score shown as a visual progress bar
- Potential needs: beginner-friendly task descriptions derived from low-scoring article features:
  - Add more references
  - Add more internal wikilinks
  - Improve article structure (headings)
  - Add images or other media
  - Add an infobox
  - Add more relevant categories
  - Add more sources
  - Expand short articles
  - Check article for a maintenance message
- Expandable feature details: clicking any row reveals a breakdown of all normalized feature scores as visual bars
Contextual Metadata
- Page views over a 12-month period, retrieved through the Wikimedia pageviews API
- Number of languages (sitelinks) via the MediaWiki langlinks property
- Days since last edit calculated from the latest revision timestamp
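Two of these metadata values are simple to derive once the API responses are in hand. The sketch below computes days since the last edit from a revision timestamp, builds a per-article pageviews REST URL, and sums monthly view counts. The function names, the `user` agent filter, and the monthly granularity are assumptions for this example rather than the tool's confirmed settings.

```python
from datetime import datetime, timezone
from typing import Any, Dict
from urllib.parse import quote


def days_since(timestamp: str, now: datetime) -> int:
    """Whole days between a MediaWiki revision timestamp and `now` (UTC)."""
    last_edit = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    last_edit = last_edit.replace(tzinfo=timezone.utc)
    return (now - last_edit).days


def pageviews_url(lang: str, title: str, start: str, end: str) -> str:
    """Per-article pageviews URL; `start` and `end` are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{lang}.wikipedia/all-access/user/{article}/monthly/{start}/{end}"
    )


def total_views(response: Dict[str, Any]) -> int:
    """Sum the `views` field over all items in a pageviews response."""
    return sum(item["views"] for item in response.get("items", []))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(days_since("2024-01-01T00:00:00Z", now))
    print(pageviews_url("en", "Marie Curie", "20240101", "20241231"))
```

Passing `now` explicitly keeps `days_since` deterministic and easy to test.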
Topic and Geography Classification
- Article topics via the LiftWing outlink-topic-model
- Country associations via the LiftWing article-country model
- Both are used to power the filter system
Filtering
Users can filter the results table by:
- Task type (e.g., show only articles that need references)
- Topic (e.g., show only articles about Science)
- Geography (by country or geographic region)
All three filters can be combined, and an All/None toggle is available for each filter group.
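One plausible way to combine the three filters is sketched below. It assumes that values within a filter group are OR-ed, that the groups are AND-ed together, that `None` stands for the "All" toggle, and that an empty set stands for "None". The row keys (`needs`, `topics`, `countries`) are hypothetical. The deployed tool filters in the browser and may behave differently.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

Row = Dict[str, Any]


def matches(values: Iterable[str], selected: Optional[Set[str]]) -> bool:
    """True if any value is selected; `None` means the group is unfiltered."""
    if selected is None:
        return True
    return any(value in selected for value in values)


def filter_rows(rows: List[Row],
                tasks: Optional[Set[str]] = None,
                topics: Optional[Set[str]] = None,
                countries: Optional[Set[str]] = None) -> List[Row]:
    """Keep rows that satisfy every active filter group."""
    return [
        row
        for row in rows
        if matches(row["needs"], tasks)
        and matches(row["topics"], topics)
        and matches(row["countries"], countries)
    ]


if __name__ == "__main__":
    rows = [
        {"title": "A", "needs": ["Add more references"],
         "topics": ["Science"], "countries": ["Kenya"]},
        {"title": "B", "needs": ["Add an infobox"],
         "topics": ["Culture"], "countries": ["Kenya"]},
    ]
    selected = filter_rows(rows, tasks={"Add more references"}, topics={"Science"})
    print([row["title"] for row in selected])
```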
Export
Results can be exported in three formats:
- CSV – for spreadsheet tools
- TSV – tab-separated values
- Wikitext – a formatted wikitable with progress bar templates, suitable for pasting directly onto a Wikipedia or Meta-Wiki page; also available via a one-click "Copy Wikitext" button
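The three export formats can be approximated with Python's standard `csv` module and plain string building. This sketch uses a hypothetical three-column row layout and omits the progress bar templates, since their names are not given on this page. It illustrates the formats only and is not the tool's actual export code.

```python
import csv
import io
from typing import Any, Dict, List

Row = Dict[str, Any]
COLUMNS = ["title", "quality", "needs"]  # hypothetical column set


def to_delimited(rows: List[Row], delimiter: str = ",") -> str:
    """Serialize rows as CSV (default) or TSV (delimiter='\\t')."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([row["title"], row["quality"], "; ".join(row["needs"])])
    return buffer.getvalue()


def to_wikitext(rows: List[Row]) -> str:
    """Render rows as a sortable wikitable."""
    lines = ['{| class="wikitable sortable"',
             "! Article !! Quality !! Potential needs"]
    for row in rows:
        needs = "; ".join(row["needs"])
        lines.append("|-")
        lines.append(f"| [[{row['title']}]] || {row['quality']} || {needs}")
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{"title": "Ada Lovelace", "quality": "C",
               "needs": ["Add more references", "Add an infobox"]}]
    print(to_delimited(sample))
    print(to_delimited(sample, delimiter="\t"))
    print(to_wikitext(sample))
```

Using `csv.writer` for both CSV and TSV means fields containing the delimiter or quotes are escaped automatically.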
Feedback Received
Organizers and community members provided feedback on the prototype. The following Phabricator tickets were opened to track their suggestions:
- phab:T415707 – (Microtask generator bugs)
- phab:T415710 – ([Microtask generator] Reduce latency / load for tool)
- phab:T415698 – (Add a prioritization column)
This section will be expanded as additional feedback sessions are conducted.
Results
The following deliverables have been completed or are in progress:
- A functional web application prototype deployed on Toolforge that generates prioritized micro-tasks ✓
- A documented API pipeline for extracting and analyzing article metadata ✓
- A needs-detection and ranking system that displays beginner-friendly tasks for organizers and editors ✓
- Export functionality in CSV, TSV, and Wikitext formats ✓
- Deployment-ready files for Toolforge ✓
- My Progress and Leaderboard features (in progress)
- Documentation covering usage, limitations, and future development possibilities (in progress)
Resources
- Live tool: microtask-generator.toolforge.org
- Source code: gitlab.wikimedia.org/toolforge-repos/microtask-generator
- Phabricator: phab:T415707, phab:T415710, phab:T415698
- Toolforge: Toolforge portal
- Outreachy blog posts:
  - Blog #1
  - Blog #2
  - Blog #3
  - Blog #4
  - Blog #5