Research:Prioritization of Wikipedia Articles/Recommendation

This page is a review of the various recommender systems used on Wikipedia, with a focus on how they select content to recommend and how that impacts the types of content areas that are improved.

Recommender Systems

The use of recommender systems to help editors find content to edit dates back many years (to at least 2006[1]) and has grown steadily in recent years on the Wikimedia projects. Notable examples as of November 2021 include Content Translation, which has exceeded one million articles created,[2] Newcomer Tasks, which has exceeded 190 thousand edits across 70 Wikipedia language editions,[3] and Suggested Edits, which has exceeded 300 thousand edits on Wikimedia Commons[4] and 825 thousand edits on Wikidata.[5] A list of known recommender systems is included below along with some basic details about what types of content they recommend and how they operationalize the question of what content is most important to edit.

Project | What is being recommended? | Definition of importance
Newcomer Tasks | Articles to be edited (add a link / image / reference / section) | Random within universe of articles with maintenance templates or that need certain actions
GettingStarted | Articles to be edited | Random within universe of articles with maintenance templates
GapFinder | Articles to be translated | Relevance to keyword or most viewed articles (with some randomness)
Content Translation | Articles (sections) to be translated | Same as GapFinder
Suggested Edits | Image captions and tags and Wikidata descriptions to be edited / translated | Random
Citation Hunt | Sentences in need of citations | Relevance to keyword or random articles with citation needed template
SuggestBot | Articles to be edited | Relevance to past edits and maintenance templates
WikiGap (as a general example of campaigns) | Articles to be edited or created | Semi-randomly (by Wikidata QID) or by # of sitelinks. Editors often sort personal worklists alphabetically by title too.[6]

Content Equity

When considering how equitable the distribution of content is on Wikipedia, there are many different facets (see Knowledge Gaps Taxonomy). While there is no canonical weighting of these facets -- e.g., is gender equity more important than geographic equity? -- some are more straightforward to measure in a global, language-agnostic manner. In particular, gender (of people with biographies on Wikipedia) and geography (countries relevant to a given Wikipedia article) are two facets that can be easily operationalized, have high coverage, and lend themselves to relatively straightforward evaluations of whether a particular recommender system is contributing to content equity or maintaining the status quo.

Generally when evaluating the relationship between a recommender system and content equity, there are five distinct stages worth evaluating:

  • All content: in a given wiki, what is the existing distribution of content about people of different gender identities and different countries of the world? This provides a basic baseline / status quo against which to compare the recommender system's impact.
  • Content eligible for recommendation: in a recommender system that e.g., adds images, this might be Wikipedia articles that lack an image. Sometimes these filters are non-negotiable but oftentimes this rather "neutral" stage can have a large impact on the distribution of content that is recommended.
  • Recommended content: of the content that is eligible for recommendation, which are actually surfaced to users? At Wikimedia, this is generally a random sample of the eligible content but different prioritization schemes might bias -- in a good or bad way -- this content.
  • Engaged content: of the content that is surfaced as recommendations to a user, which do they engage with -- i.e., which recommendations do they click on to visit the article?
  • Edited content: of the content that is engaged with, which actually results in an edit being made? This is the final impact of the recommender system.
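
The five stages above can be compared by computing the share of each facet value (e.g., gender or country) among the items that reach each stage. The following is a minimal sketch, not the analysis code used in this research: the record layout, the stage flag names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stage names; "all" is the baseline, the rest are boolean flags
# that an assumed logging pipeline would set on each item.
STAGES = ["all", "eligible", "recommended", "engaged", "edited"]


def stage_distribution(items, facet):
    """Return {stage: {facet_value: share}} for each of the five stages."""
    result = {}
    for stage in STAGES:
        # Every item counts toward the baseline; other stages use their flag.
        reached = [it for it in items if stage == "all" or it.get(stage, False)]
        counts = Counter(it[facet] for it in reached)
        total = sum(counts.values())
        result[stage] = {k: v / total for k, v in counts.items()} if total else {}
    return result


# Toy data (invented, not real Wikipedia figures).
items = [
    {"gender": "man", "eligible": True, "recommended": True, "engaged": True, "edited": True},
    {"gender": "man", "eligible": True, "recommended": True, "engaged": False, "edited": False},
    {"gender": "man", "eligible": False, "recommended": False, "engaged": False, "edited": False},
    {"gender": "woman", "eligible": True, "recommended": True, "engaged": True, "edited": False},
]

dist = stage_distribution(items, "gender")
for stage in STAGES:
    print(stage, dist[stage])
```

Comparing each stage's shares against the "all" baseline shows where in the pipeline the distribution shifts, which is the question each stage above is meant to answer.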

At each of these stages, different processes affect the final impact on content equity. For example, the engineer/designer for a recommender system has a fair bit of control over the "content eligible for recommendation" and "recommended content" phases -- e.g., for link recommendation, almost any Wikipedia article likely has potential links that could be added but reasonable choices might be made to focus on just stub articles or newly-created articles, each of which would likely greatly shift the distribution of content recommended. Once the content is recommended, however, the individual editor determines the outcomes of the engagement and editing phases. Providing greater structure and support or improved design might lead to less drop-off, but ultimately the editor chooses whether they make an edit or not.

These phases are not isolated either. It might be tempting to adjust the prioritization of content to directly promote content equity goals -- i.e., recommend more content about women for improvement -- but, if done poorly, this could be cancelled out by shifts in engagement or increases in vandalism to this content. While not fully understood, there is likely a connection between how much context a task requires and how effectively changing the prioritization of content shifts the resulting distribution of edits. For example, it might be argued that basic copyediting or link recommendation tasks do not require much prior familiarity with the article topic and thus editors will be more comfortable editing content outside of their prior expertise. Adding facts or citations, however, might depend more on an editor's prior familiarity with the topic of the article and thus editors would be more selective in which recommendations they accept.

Finally, though not evaluated explicitly in this work, the choice of the recommender system itself can have a very strong impact on equity. For example, (effectively) recommending alt-text for images has a clear positive impact on accessibility regardless of the distribution of content that is improved. Copyedit tasks meanwhile can be a good way to introduce new editors to Wikipedia (as they generally require only very limited knowledge of wikitext syntax) but might be viewed as limited in their contribution to equity.[7]

Case Studies

Suggested Edits

Suggested Edits is a module on the Wikipedia Android app that recommends images to which a caption or tags can be added or Wikidata items to which a description can be added. The recommendation pipeline itself is quite straightforward: it does very little filtering and weights every article equally (i.e., selection is random). As a result, it largely reinforces the status quo around gender and geography -- i.e., a heavy imbalance towards men, the United States, and the United Kingdom -- and therefore the net effect of the recommender is to improve content about men more than women or other gender identities and content about the US/UK more than other regions. The exact regions improved depend heavily on language -- i.e., US/UK for English Wikipedia but Japan for Japanese Wikipedia or Germany for German Wikipedia. There is a little bit of evidence that editors also exert geographic selection bias -- i.e., slightly preferring to edit content about some regions over others -- but the effect is not large. For gender, there is no indication that the gender identity associated with the content recommended affects whether editors choose to make an edit or not.
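
To see why equal weighting reproduces the status quo, consider the expected share of each group among recommendations when every eligible item has the same chance of being drawn. The sketch below is illustrative only: the 80/20 pool, the `group` field, and the weighting function are assumptions, not figures or code from Suggested Edits.

```python
from collections import defaultdict


def expected_shares(pool, weight=lambda item: 1.0):
    """Expected share of each group when items are drawn in proportion to weight."""
    totals = defaultdict(float)
    for item in pool:
        totals[item["group"]] += weight(item)
    grand_total = sum(totals.values())
    return {group: w / grand_total for group, w in totals.items()}


# Hypothetical eligible pool: 8 items about men, 2 about women.
pool = [{"group": "men"}] * 8 + [{"group": "women"}] * 2

# Equal weights: the recommended shares mirror the pool itself.
uniform = expected_shares(pool)

# An assumed alternative prioritization that upweights the smaller group.
boosted = expected_shares(
    pool, weight=lambda item: 4.0 if item["group"] == "women" else 1.0
)

print(uniform)
print(boosted)
```

Under equal weights the expected shares match the eligible pool, so any imbalance in the pool carries through to the recommendations. Whether a reweighted scheme changes the distribution of edits also depends on how editors respond to what they are shown.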

Newcomer Tasks

Newcomer Tasks is a module rolled out to various Wikipedia language editions with the aim of providing structure/support for new editors. This recommender system has some important differences from Suggested Edits that make it interesting to study:

  • It is focused on Wikipedia
  • It is oriented towards new editors, who might display different behavior from the presumably more experienced editors that use Suggested Edits
  • It allows editors to filter by topics of interest
  • It has complete logging at each step of the pipeline

The analysis found many of the same status-quo issues that were found within Suggested Edits. Topic filters provided some opportunity for closing these gaps (but could also reinforce them). The largest takeaway was that the algorithmic filtering used to identify articles to be improved led to large changes in the geographic distribution of content. Again, there was no evidence of selection bias by editors with regard to gender but there was some clear evidence of geographic selection bias.
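
One simple way to look for editor selection bias, given logging at each step, is to compare the rate at which recommended items are edited across groups. This is a hedged sketch rather than the method used in the analysis: the record fields (`region`, `edited`) and the toy data are hypothetical, and a real evaluation would also need to account for sample size and confounders such as topic filters.

```python
from collections import defaultdict


def edit_rate_by_group(recommendations, facet):
    """Return {group: edits / recommendations shown} for each facet value."""
    shown = defaultdict(int)
    edited = defaultdict(int)
    for rec in recommendations:
        shown[rec[facet]] += 1
        if rec["edited"]:
            edited[rec[facet]] += 1
    return {group: edited[group] / shown[group] for group in shown}


# Toy log of recommendations (invented data).
recommendations = (
    [{"region": "A", "edited": True}] * 2
    + [{"region": "A", "edited": False}] * 2
    + [{"region": "B", "edited": True}] * 1
    + [{"region": "B", "edited": False}] * 3
)

rates = edit_rate_by_group(recommendations, "region")
print(rates)
```

Similar rates across groups would be consistent with no selection bias by editors, while a persistent gap would suggest that editors prefer content about some groups over others.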

See Also

Notes/References

  1. Cosley, Dan; Frankowski, Dan; Terveen, Loren; Riedl, John (28 January 2007). "SuggestBot: using intelligent task routing to help people find work in wikipedia" (PDF). Proceedings of the 12th international conference on Intelligent user interfaces (Association for Computing Machinery): 32–41. doi:10.1145/1216295.1216309. 
  2. https://en.wikipedia.org/wiki/Special:ContentTranslationStats
  3. Full query requires NDA to view but the same data can be extracted via checking the newcomer task tag on individual wikis -- e.g., en:Special:Tags.
  4. https://commons.wikimedia.org/wiki/Special:Tags
  5. https://www.wikidata.org/wiki/Special:Tags
  6. Wattenberg, Martin; Viégas, Fernanda B.; Hollenbach, Katherine (10 September 2007). "Visualizing activity on wikipedia with chromograms" (PDF). INTERACT '07 (Springer-Verlag): 272–287. doi:10.1007/978-3-540-74800-7_23. 
  7. Copyediting that improves readability would have a large positive impact on accessibility but that is often a substantially more difficult undertaking.