Research:Prioritization of Wikipedia Articles/Misalignment
Misalignment is a measure of how well-aligned reader interest and article quality are on a given Wikipedia article. The idea draws heavily on research in 2015 by Warncke-Wang et al., but explorations of misalignment date back to at least Gorbatai's 2011 work on this topic. Content can generally be broken down into three categories:
- Underproduced: content that has lower quality than would be expected based on reader demand. Also referred to as "insufficient quality."
- Aligned: content whose quality is comparable to the reader demand -- i.e. low-quality content receives few pageviews, medium-quality content receives some pageviews, high-quality content receives many pageviews.
- Overproduced: content that is of much higher quality than would be expected based on reader demand. Note that even a stub article can be considered overproduced if it receives no pageviews. Also referred to as "excess quality."
The exact measure of misalignment then depends on how quality is measured, how reader interest is measured, and how the two values are compared.
This first version of misalignment is kept quite simple. It uses the [0-1] article quality score from this model and a reader demand score that is a normalized [0-1] measure of pageviews to an article, described below. Misalignment is then equal to quality - demand and thus ranges over [-1, 1], where -1 is heavily underproduced, 0 is aligned, and +1 is heavily overproduced.
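As a minimal sketch of the subtraction described above: the function name `misalignment` and the range check are illustrative assumptions, not part of the original analysis code.

```python
def misalignment(quality: float, demand: float) -> float:
    """Return quality - demand for scores that both lie in [0, 1].

    Hypothetical helper for illustration only.
    Negative values indicate underproduced content, values near 0
    indicate aligned content, and positive values indicate
    overproduced content.
    """
    # Assumption: reject inputs outside the documented [0, 1] range.
    if not (0.0 <= quality <= 1.0 and 0.0 <= demand <= 1.0):
        raise ValueError("quality and demand must both be in [0, 1]")
    return quality - demand


# A low-quality article with high reader demand is underproduced.
underproduced = misalignment(quality=0.25, demand=0.75)

# A high-quality article with no reader demand is heavily overproduced.
overproduced = misalignment(quality=1.0, demand=0.0)
```

Because both inputs are bounded by [0, 1], the result is always within [-1, 1].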
Regarding the reader demand metric specifically, the total number of user pageviews for each article is computed for April 2021. This count is then log10-transformed and normalized by the 99th percentile of transformed pageviews for that wiki, with the normalizer floored at the equivalent of 100 pageviews. So if the top 1% of content on a wiki receives 1000 pageviews / month and an article receives 100 pageviews, then its reader demand score is log10(100) / log10(1000) = 2/3. If an article received 10000 pageviews that month, it would be truncated to that maximum, so log10(1000) / log10(1000) = 1. And if the top 1% of content on that wiki only received 50 pageviews / month, then the calculation for an article with 100 pageviews would instead be log10(100) / log10(100) = 1.
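The reader demand calculation can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis: the function name `reader_demand`, its parameters, and the handling of articles with zero pageviews are hypothetical. The sketch takes the raw 99th-percentile pageview count for the wiki as an input instead of computing it from the full pageview distribution.

```python
import math


def reader_demand(pageviews: int, p99_pageviews: float, floor: int = 100) -> float:
    """Normalized [0, 1] reader demand score for one article.

    pageviews:      monthly user pageviews for the article
    p99_pageviews:  99th-percentile monthly pageviews on the wiki (assumed given)
    floor:          minimum normalizer, 100 pageviews per the description
    """
    # The normalizer is the larger of the wiki's 99th percentile and the floor.
    cap = max(p99_pageviews, floor)
    # Assumption: articles with no pageviews get a score of 0,
    # since log10(0) is undefined.
    if pageviews < 1:
        return 0.0
    # Truncate at the cap so the score never exceeds 1.
    return min(math.log10(pageviews), math.log10(cap)) / math.log10(cap)


# The three worked examples from the text:
typical = reader_demand(100, p99_pageviews=1000)       # log10(100) / log10(1000)
truncated = reader_demand(10000, p99_pageviews=1000)   # capped at the maximum
small_wiki = reader_demand(100, p99_pageviews=50)      # floor of 100 applies
```

Taking the logarithm before normalizing compresses the long tail of pageview counts, so a tenfold difference in traffic moves the score by a fixed amount rather than dominating it.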
- Average misalignment by wiki can be seen here: phab:T281912#7104316
- These article-level scores can then be averaged to compute misalignment for topic areas or entire wikis. An example of misalignment by wiki and topic area can be found at this tool: https://wiki-topic.toolforge.org/misalignment
- Exploration of misalignment in more ad-hoc topic areas in English Wikipedia (slightly different calculations used for misalignment but same general approach): https://wiki-ltt.toolforge.org/bar
- Warncke-Wang, Morten; Ranjan, Vivek; Terveen, Loren; Hecht, Brent (2015). "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities" (PDF). ICWSM '15.
- Gorbatai, Andreea D. (2011). "Exploring underproduction in Wikipedia" (PDF). Proceedings of the 7th International Symposium on Wikis and Open Collaboration - WikiSym '11: 205. doi:10.1145/2038558.2038595. Retrieved 25 May 2021.