Research:Cross-lingual article quality assessment

From Meta, a Wikimedia project coordination wiki
11:28, 24 June 2022 (UTC)
Diego Sáez-Trumper
Paramita Das
Duration:  2022-April – 2022-July

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Wikipedia’s bold policy that anyone can edit draws contributors from across socio-cultural and geographical boundaries to collaborate and contribute openly. Many of them, especially editors concerned with quality standards, assign articles to the existing quality scales, but quality keeps fluctuating even in review environments strongly moderated by individuals or panels. The different approaches used to measure the quality of Wikipedia articles, such as community assessments, machine-learning-based predictive models, and rule-based synthetic models, assess quality in different ways but are in most cases restricted to specific language versions. As a result, it is difficult to compare articles from different language versions on a universal quality scale. Although quality assessment by community editors is regarded as the gold standard, such assessments quickly become obsolete as articles are frequently edited.

To overcome this fault line, a strong baseline model [1] has been implemented that covers all wikis and is sufficiently accurate to measure quality in a language-agnostic way. The model is built to be amenable to community-specific or task-specific fine-tuning, which also makes it easily configurable. Currently, the quality score dataset contains the predicted quality score for every single Wikipedia article across 305 wikis, based on the `2021-12` snapshot.
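The idea behind such a language-agnostic score can be illustrated with a minimal sketch. The actual model [1] differs; here we merely assume that structural features extractable from any language's wikitext (page length, references, sections, images, wikilinks) are each squashed to [0, 1] and combined with weights. All feature names, saturation thresholds, and weights below are illustrative assumptions, not the published model.

```python
# Illustrative sketch only: NOT the actual baseline model [1].
# Assumes a quality score built from simple, language-independent
# structural features, each squashed to [0, 1] and weighted.
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    """Structural features extractable from any language's wikitext."""
    page_length: int   # characters of wikitext
    num_refs: int      # <ref> tags
    num_sections: int  # == headings ==
    num_images: int    # [[File:...]] links
    num_wikilinks: int # internal links


def quality_score(f: ArticleFeatures) -> float:
    """Return a score in [0, 1]; weights and thresholds are made up."""
    def squash(value: float, saturation: float) -> float:
        # Grows linearly, then saturates at 1.0.
        return min(value / saturation, 1.0)

    components = [
        (0.4, squash(f.page_length, 50_000)),
        (0.2, squash(f.num_refs, 100)),
        (0.2, squash(f.num_sections, 20)),
        (0.1, squash(f.num_images, 10)),
        (0.1, squash(f.num_wikilinks, 200)),
    ]
    return sum(weight * part for weight, part in components)


stub = ArticleFeatures(1_500, 1, 2, 0, 5)
featured = ArticleFeatures(80_000, 150, 25, 12, 400)
print(quality_score(stub) < quality_score(featured))  # True
```

Because none of these features depend on the vocabulary of a particular language, the same scoring function can in principle be applied to every wiki, which is what makes cross-lingual comparison possible.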

Objective: The central goal of this work is to apply the model to the historical data of all wiki versions and compare the quality evolution across languages. The pilot study aims at the following:
1. To publish a dataset containing the quality score (as predicted by the model) for every revision of articles in 305 language versions.
2. To track dynamic changes in the quality of a wiki version over time.
3. To compare the quality trajectories of different language versions of articles on the same topic (i.e., the same Wikidata item).


We have applied the quality model to every revision of articles in all the wikis (i.e., 305 language versions) and computed a quality score for each such revision. Further, the predicted quality scores are compared with ground-truth quality labels, which are extracted from namespace 1 (article talk pages) following each language's quality class scheme. We compared the ground-truth quality with the predicted one for English Wikipedia.
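One way the comparison for English Wikipedia could be carried out is sketched below: map the ordinal assessment classes (Stub through FA) to integers, bucket the model's continuous score into the same classes, and measure the average ordinal distance between prediction and ground truth. The bucketing rule and the sample data are assumptions for illustration only.

```python
# Hedged sketch of the comparison step, not the project's actual code.
# English Wikipedia's assessment classes, ordered from lowest to highest.
ENWIKI_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]
CLASS_INDEX = {name: i for i, name in enumerate(ENWIKI_CLASSES)}


def score_to_class(score: float) -> str:
    """Bucket a model score in [0, 1] into six equal-width classes
    (an assumed discretization, not the model's own)."""
    index = min(int(score * len(ENWIKI_CLASSES)), len(ENWIKI_CLASSES) - 1)
    return ENWIKI_CLASSES[index]


def mean_class_distance(ground_truth, predicted_scores):
    """Average ordinal distance between ground-truth classes
    (from talk pages, namespace 1) and predicted classes."""
    distances = [
        abs(CLASS_INDEX[truth] - CLASS_INDEX[score_to_class(score)])
        for truth, score in zip(ground_truth, predicted_scores)
    ]
    return sum(distances) / len(distances)


# Hypothetical sample: talk-page labels vs. model scores for 4 articles.
truth = ["Stub", "C", "GA", "FA"]
scores = [0.05, 0.45, 0.70, 0.95]
print(mean_class_distance(truth, scores))  # 0.0
```

An ordinal distance is preferable to plain accuracy here because confusing a GA with an FA is a much smaller error than confusing a Stub with an FA, and the metric should reflect that.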