Wikimedia Research Newsletter

Vol: 11 • Issue: 03 • March 2021

10%–30% of Wikipedia’s contributors have subject-matter expertise

By: Tilman Bayer and Miriam Redi

"Generating Architectural Landmark Descriptions" from Wikipedia, DBpedia and image analysis

This paper[1] describes a process for automatically generating descriptions of architectural landmarks from Wikipedia article text, Wikimedia Commons images, and DBpedia triples. The paper lists some examples of how this approach can go awry (see below), noting that "these descriptions cannot compete, in general, with more comprehensive well-written descriptions as encountered in Wikipedia. Still, it needs to be taken account that by far not all architectural landmarks that are of interest from the professional or cultural viewpoint are covered by Wikipedia. Fused content descriptions are then a welcomed solution".

Not actually a windmill in a zen garden: the Christ the Redeemer statue in Rio de Janeiro, Brazil
Evaluation of autogenerated text for Christ the Redeemer (Table 10 from the paper)
Wikipedia (human)
Christ the Redeemer is an Art Deco statue of Jesus Christ in Rio de Janeiro, Brazil, created by French sculptor Paul Landowski and built by Brazilian engineer Heitor da Silva Costa, in collaboration with French engineer Albert Caquot. Romanian sculptor Gheorghe Leonida fashioned the face. Constructed between 1922 and 1931, the statue is 30 metres (98 ft) high, excluding its 8-metre (26 ft) pedestal. The arms stretch 28 metres (92 ft) wide.
Christ the Redeemer (statue), which was built of Soapstone, is a Statue in Brazil.
Fused [with descriptions based on image recognition, incorrect content in red]
Christ the Redeemer (statue), which was built of Soapstone, is a statue in a zen garden environment in Brazil. Its architectural style is Hellinistic. Christ the Redeemer (statue) has similarities with a windmill and a beach house. There is an elevator shaft in it.
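To illustrate how a sentence like "Christ the Redeemer (statue), which was built of Soapstone, is a Statue in Brazil." can be produced from structured data, here is a toy Python sketch of template-based verbalization of subject–predicate–object triples. It is a minimal illustration under stated assumptions, not the authors' system: the predicate names ("type", "material", "location") and the sentence template are hypothetical simplifications, not DBpedia's actual property names.

```python
# Toy illustration (hypothetical): verbalizing a few DBpedia-style triples
# with a fixed English template. Predicate names and the template are
# assumptions for illustration, not the paper's pipeline.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def group_by_subject(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Collect predicate -> object mappings for each subject."""
    facts: Dict[str, Dict[str, str]] = {}
    for subj, pred, obj in triples:
        facts.setdefault(subj, {})[pred] = obj
    return facts


def verbalize(subject: str, props: Dict[str, str]) -> str:
    """Render one subject's facts as a single sentence via a template."""
    sentence = subject
    if "material" in props:
        # Optional relative clause, e.g. ", which was built of Soapstone,"
        sentence += f", which was built of {props['material']},"
    # Fall back to a generic noun when no type is known (hypothetical default)
    sentence += f" is a {props.get('type', 'landmark')}"
    if "location" in props:
        sentence += f" in {props['location']}"
    return sentence + "."


triples = [
    ("Christ the Redeemer (statue)", "type", "Statue"),
    ("Christ the Redeemer (statue)", "material", "Soapstone"),
    ("Christ the Redeemer (statue)", "location", "Brazil"),
]

for subj, props in group_by_subject(triples).items():
    print(verbalize(subj, props))
# prints: Christ the Redeemer (statue), which was built of Soapstone, is a Statue in Brazil.
```

Such fixed templates only restate what is in the triples; the fused descriptions in the table additionally draw on image recognition, which is where errors like the "windmill" comparison enter.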


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"10%–30% of Wikipedia’s contributors have substantial subject-matter expertise"

From the abstract:[2]

"we carefully crossed information from individual Wikipedia editor pages with external sources such as Google Scholar to reliably identify editors who are credentialed experts. Matching these credentialed experts with their Wikipedia editing patterns, we used this dataset to train a machine learning classifier that we then employed to identify additional expert editors and assess the nature and the scope of their work across Wikipedia. Our results suggest that the scope of expert involvement is substantial, albeit with considerable differences across topics. We estimate that approximately 10%–30% of Wikipedia’s contributors have substantial subject-matter expertise in the topics that they edit."

See also coverage of an earlier conference presentation: "Evidence of Dark Matter: Assessing the Contribution of Subject-matter Experts to Wikipedia"

Wikidata lexemes still lack multilingual links

A paper from last year's "7th Workshop on Linked Data in Linguistics"[3] presented descriptive statistics about the lexemes of Wikidata, showing that, as of 2020 (around two years after lexeme support was launched on Wikidata), there were still relatively few multilingual links between them.

Wikidata's "sustainable integration into library operations remains a challenge"

From the abstract:[4]

"The review revealed that Wikidata in libraries is generally described as an open and reusable knowledgebase of structured data capable of linking local metadata with a network of global metadata. Libraries have started experimenting with Wikidata to improve the global reach and access of their unique and prominent collections and scholars. While Wikidata holds great potential to become the repository choice for authority data disambiguation and linking, its sustainable integration into library operations remains a challenge."

"On Altpedias: partisan epistemics in the encyclopaedias of alternative facts*"

From the abstract:[5]

"We consider a selection of Altpedias that reject Wikipedia’s celebrated ‘neutral point of view’ as an artefact of liberal consensus politics whilst regarding their own epistemics as inherently partisan. As opposed to disregarding objectivity or truth, Altpedias’ ‘alternative facts’ may thus be understood as the product of competing normative standpoints concerning the use value of knowledge. In competing with Wikipedia, Altpedias ultimately attempt to give their partisan viewpoints universal standards, both in tone and in their very nature as wiki platforms. Empirically, the article uses visual network analysis and natural language processing in order to represent the vernacular worldviews of several far- and extreme-right Altpedias: Metapedia, Infogalactic and Rightpedia. Theoretically, the article frames these Altpedias’ fractious approach to the study of knowledge in relation to Lyotard’s ‘general agonistic’ and his speculations concerning the impact of computation on epistemics in the postmodern condition."


  1. Mille, Simon; Symeonidis, Spyridon; Rousi, Maria; Felipe, Montserrat; Stavrothanasopoulos, Klearchos; Alvanitopoulos, Petros; Carlini, Roberto; Grivolla, Jens; Meditskos, Georgios; Vrochidis, Stefanos; Wanner, Leo (2020). "A Case Study of NLG from Multimedia Data Sources: Generating Architectural Landmark Descriptions" (PDF). 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+). Dublin, Ireland (Virtual).
  2. Yarovoy, Alex; Nagar, Yiftach; Minkov, Einat; Arazy, Ofer (2020-10-16). "Assessing the Contribution of Subject-matter Experts to Wikipedia". ACM Transactions on Social Computing 3 (4): 21:1–21:36. ISSN 2469-7818. doi:10.1145/3416853.  Closed access
  3. Nielsen, Finn Årup (2020). "Lexemes in Wikidata: 2020 status" (PDF). Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020): 82–86.
  4. Tharani, Karim (2021-03-01). "Much more than a mere technology: A systematic review of Wikidata in libraries". The Journal of Academic Librarianship 47 (2): 102326. ISSN 0099-1333. doi:10.1016/j.acalib.2021.102326. 
  5. Keulenaar, Emillie V. de; Tuters, Marc; Kisjes, Ivan; Beelen, Kaspar (2019-07-11). "On Altpedias: partisan epistemics in the encyclopaedias of alternative facts*". Artnodes (24). 
