Grants:IdeaLab/Countering systemic bias through Wikidata authority control

From Meta, a Wikimedia project coordination wiki
Countering systemic bias through Wikidata authority control
Determine what notable topics are represented in other databases, but not Wikipedia, using Wikidata for joins
idea creator
Vladimir Alexiev
created on 22:54, 20 November 2013

Project idea

Wikidata can record a topic's identifiers in external databases. I'll use the Virtual International Authority File (VIAF) as an example, but Wikidata has many such identifier properties, and the same principles apply to all of them. For example, Alexander Graham Bell (wikidata:Q34286) has the ID 59263727 in VIAF. This makes it easy to check whether a given person or entity is linked to an external record. If it is not, the topic may be absent from the other database, or no one may have added the link yet.
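As a minimal sketch, assume an item has already been fetched in Wikidata's JSON entity format (for example the inner entity object returned by `Special:EntityData/Q34286.json`). The VIAF IDs are then the values of property P214 ("VIAF ID"). The helper name `viaf_ids` and the trimmed-down entity below are illustrative, not a complete record.

```python
from typing import Any

VIAF_PROPERTY = "P214"  # Wikidata property "VIAF ID"


def viaf_ids(entity: dict[str, Any]) -> list[str]:
    """Return the VIAF IDs recorded on a Wikidata entity (JSON entity format)."""
    ids = []
    for statement in entity.get("claims", {}).get(VIAF_PROPERTY, []):
        snak = statement.get("mainsnak", {})
        # "novalue"/"somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"])
    return ids


# Trimmed-down entity; a real one has many more fields.
bell = {
    "id": "Q34286",
    "claims": {
        "P214": [
            {"mainsnak": {"snaktype": "value", "property": "P214",
                          "datavalue": {"value": "59263727", "type": "string"}}}
        ]
    },
}

print(viaf_ids(bell))               # ['59263727']
print(bool(viaf_ids({"claims": {}})))  # False: no VIAF link recorded
```

An empty result only says that no link has been recorded; as noted above, it does not prove the topic is missing from VIAF.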

The idea here, however, is to go in the opposite direction and find entries missing from Wikidata, Wikipedia, or both. For example, given a list of VIAF identifiers, a comparison against a Wikidata dump or query can determine which are in VIAF but not in Wikidata. These are candidates for article (and Wikidata item) creation, provided the topics also meet Wikipedia's inclusion criteria and it is confirmed that no article exists yet.
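A sketch of this reverse direction, assuming the list of VIAF IDs is already available locally and the Wikidata JSON dump (one large JSON array with one entity per line) is streamed line by line. `viaf_ids_in_dump` and `missing_from_wikidata` are hypothetical helpers; a real run would read the compressed dump file rather than the in-memory lines used here, and the second item and the ID `123456789` are made up for illustration.

```python
import json
from collections.abc import Iterable


def viaf_ids_in_dump(lines: Iterable[str]) -> set[str]:
    """Collect every VIAF ID (P214) found in Wikidata JSON dump lines."""
    found = set()
    for line in lines:
        line = line.strip().rstrip(",")
        # The dump is one big array: skip the opening/closing brackets.
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        for statement in entity.get("claims", {}).get("P214", []):
            snak = statement["mainsnak"]
            if snak.get("snaktype") == "value":
                found.add(snak["datavalue"]["value"])
    return found


def missing_from_wikidata(viaf_list: Iterable[str],
                          dump_lines: Iterable[str]) -> list[str]:
    """VIAF IDs with no Wikidata item: candidates only, notability unchecked."""
    linked = viaf_ids_in_dump(dump_lines)
    return sorted(set(viaf_list) - linked)


dump = [
    "[",
    json.dumps({"id": "Q34286", "claims": {"P214": [{"mainsnak": {
        "snaktype": "value",
        "datavalue": {"value": "59263727", "type": "string"}}}]}}) + ",",
    json.dumps({"id": "Q_hypothetical", "claims": {}}),  # no VIAF link
    "]",
]

print(missing_from_wikidata(["59263727", "123456789"], dump))  # ['123456789']
```

Building the set of linked IDs once and taking a set difference keeps the comparison linear in the size of the dump plus the VIAF list.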

Even where a Wikidata item exists, its sitelinks show which Wikipedias still lack an article on the topic.
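A minimal sketch of that check, assuming the caller supplies the Wikipedia site IDs to test (`enwiki`, `dewiki`, and so on) and the item is in Wikidata's JSON entity format. `missing_wikipedias` is a hypothetical helper, and the item below is made up rather than a real record.

```python
from typing import Any


def missing_wikipedias(entity: dict[str, Any], wikis: list[str]) -> list[str]:
    """Return the site IDs in `wikis` that have no sitelink on this entity."""
    linked = entity.get("sitelinks", {})
    return [wiki for wiki in wikis if wiki not in linked]


item = {
    "id": "Q_EXAMPLE",  # hypothetical item
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Example topic"},
        "dewiki": {"site": "dewiki", "title": "Example topic"},
    },
}

print(missing_wikipedias(item, ["enwiki", "dewiki", "swwiki", "yowiki"]))
# ['swwiki', 'yowiki']
```

Passing the wiki list explicitly avoids having to filter out sister-project sitelinks such as Wikiquote or Commons, which share the same `sitelinks` map.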

Project goals

The goal is to counter systemic bias by identifying topics that other databases and reference works cover but Wikipedias are missing.

Open questions

The described idea made me think of an alternative approach. A local-language Wikipedia can serve as the source set: extract its most-viewed articles and compare them against a target language, checking whether the target language has the highest-ranking articles from the source language. By default the result lists both existing and missing articles; ticking a checkbox removes the existing ones. A major problem is that the title must be machine-translated in many cases. However, fallback mechanisms like those on Wikidata can be used, and translations can be cached, so the need for machine translation is perhaps not that large.
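A rough sketch of this variant, assuming the most-viewed source articles have already been resolved to Wikidata items together with their view counts (fetching page views is left out). Everything here is hypothetical: `Item`, `gap_report`, `label_with_fallback`, and the sample data. The `hide_existing` flag stands in for the checkbox, and the label lookup walks a language fallback chain before any machine translation would be needed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, minimal stand-in for a Wikidata item."""
    qid: str
    labels: dict[str, str] = field(default_factory=dict)  # language -> label
    sitelinks: set[str] = field(default_factory=set)      # e.g. {"nowiki"}


def label_with_fallback(item: Item, chain: list[str]) -> str | None:
    """First label found along the fallback chain; None means translation needed."""
    for lang in chain:
        if lang in item.labels:
            return item.labels[lang]
    return None


def gap_report(top_viewed: list[tuple[Item, int]], target_wiki: str,
               fallback_chain: list[str], n: int = 10,
               hide_existing: bool = False) -> list[tuple[str, str | None, bool]]:
    """Rows of (QID, label, exists in target) for the N most-viewed source items."""
    ranked = sorted(top_viewed, key=lambda pair: pair[1], reverse=True)[:n]
    rows = []
    for item, _views in ranked:
        exists = target_wiki in item.sitelinks
        if hide_existing and exists:
            continue  # the "remove existing articles" checkbox
        rows.append((item.qid, label_with_fallback(item, fallback_chain), exists))
    return rows


# Made-up data: source is nowiki, target is nnwiki.
a = Item("Q_A", {"nb": "Eksempel A", "en": "Example A"}, {"nowiki", "nnwiki"})
b = Item("Q_B", {"nb": "Eksempel B"}, {"nowiki"})
c = Item("Q_C", {}, {"nowiki"})
top = [(c, 100), (a, 500), (b, 300)]

print(gap_report(top, "nnwiki", ["nn", "nb", "en"], n=3, hide_existing=True))
# [('Q_B', 'Eksempel B', False), ('Q_C', None, False)]
```

A `None` label marks the rows where a cached or machine translation would still be required.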

Such a special page could have versions for both WikibaseRepo and WikibaseClient, and it would use the label/description structure to list the top articles. The client version of the page should list links to pages, possibly allow changing the label to a local name, and provide helper functionality to connect any newly created page to the correct item.

The system would then work as a continuously evolving list of "the N most viewed articles in language X", and by creating articles in other languages, editors would continuously spread those topics into other languages. — Jeblad 15:37, 2 November 2014 (UTC)

Get involved

Welcome, brainstormers! Your feedback on this idea is welcome. Please click the "discussion" link at the top of the page to start the conversation and share your thoughts.

Note: I (Superm401) am not planning to implement this idea.

  • Great idea. Leveraging contrasting metadata is a powerful way to surface gaps. Ocaasi (talk) 00:36, 20 March 2015 (UTC)
