Research:Recommending Images to Wikidata Items
Images help explain, enrich, and complement knowledge without language barriers. They can also illustrate the content of an item in a language-agnostic way for external data consumers. However, a large proportion of Wikidata items lack images: for example, at the time of writing, more than 3.6M Wikidata items are about humans (Q5), but only 17% of them have an image (SPARQL query). A wider presence of images in such a rich, cross-lingual repository would enable a more complete representation of human knowledge.
We want to help Wikidata contributors make Wikidata more “visual” by recommending high-quality Commons images to Wikidata items.
We will suggest a set of high-quality Commons images for items where images are either missing or flagged as being low quality. This recommendation will be performed by a classifier able to (1) identify images relevant to a Wikidata entry and (2) rank such images according to their visual quality.
More specifically, we propose to first design a matching system that evaluates the relevance of an image to a given item, based on usage, location, and contextual data. We will then design a computer vision-based classifier that scores relevant images by quality, based on the operationalisation of existing image quality guidelines.
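The two-stage idea (filter by relevance, then rank by quality) can be sketched as follows. This is only an illustration under stated assumptions: the `Candidate` fields, the score ranges, the `min_relevance` threshold, and the `recommend` function are hypothetical names introduced here, and the scores are hand-written stand-ins for what the matching system and the quality classifier would produce.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    filename: str     # Commons file name (made up for this example)
    relevance: float  # hypothetical relevance score in [0, 1]
    quality: float    # hypothetical visual quality score in [0, 1]


def recommend(candidates, min_relevance=0.5, top_k=3):
    """Keep candidates judged relevant, then rank them by quality."""
    # Stage 1: discard images that do not match the item well enough.
    relevant = [c for c in candidates if c.relevance >= min_relevance]
    # Stage 2: order the remaining images from highest to lowest quality.
    ranked = sorted(relevant, key=lambda c: c.quality, reverse=True)
    return ranked[:top_k]


candidates = [
    Candidate("A.jpg", relevance=0.9, quality=0.4),
    Candidate("B.jpg", relevance=0.2, quality=0.95),
    Candidate("C.jpg", relevance=0.7, quality=0.8),
]
top = recommend(candidates)
print([c.filename for c in top])  # ['C.jpg', 'A.jpg']
```

Note that `B.jpg` is dropped despite its high quality score, because relevance is applied as a filter before quality is considered.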
- Image Subject Lists
- External Image Sources: where can we find image candidates for items without P18 (image)?
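As a starting point for collecting items without P18, one possible approach is to query the Wikidata Query Service. The sketch below only builds the SPARQL text and the request URL; it does not send the request. The helper names `items_without_image_query` and `request_url` are introduced here for illustration, and the class is left as a parameter (Q5, humans, is used in the example because it is the class mentioned above). Large classes may need a small `LIMIT` or a more selective query to avoid timeouts.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def items_without_image_query(class_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for instances of a class that have no P18 (image)."""
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"               # instance of the class
        "  FILTER NOT EXISTS { ?item wdt:P18 ?image }\n"    # no image statement
        "}\n"
        f"LIMIT {limit}"
    )


def request_url(query: str) -> str:
    """Build a GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = items_without_image_query("Q5", limit=10)
print(query)
print(request_url(query))
```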
Data Analysis: Feasibility
To understand the extent to which the sources above actually contain potential image candidates, we ran a simple analysis experiment.
- We took all Wikidata entities for monuments and split them into two groups, with P18 and without P18, where P18 is the Wikidata property indicating the presence of an image describing the entity. Of around 100K entities, about two thirds have images and one third do not.
- We then looked at how many pages are linked to each entity, and in which languages. Only 20% of entities without images link to a Wikipedia page. In general, entities without an image link to pages in two or fewer different languages.
- We then checked how many actual images appear in the linked pages: the count is either 0 or more than 1.
- We looked at how many Page Images are linked to entities; this number is similar to the number of page links.
- Finally, we counted the images returned by the Commons free text search when queried with the entity name: here we find that around 50% of entities without images actually have at least one Commons image matching them.
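The last step, counting Commons search results for an entity name, could be approximated with the MediaWiki search API. The sketch below is an assumption about how such a check might be done, not the procedure actually used in the analysis: the helpers `commons_search_url` and `count_hits` are hypothetical, the entity name is only an example, and the sample response is hand-written to mimic the shape of the API output rather than fetched from Commons.

```python
from urllib.parse import urlencode, urlparse, parse_qs

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def commons_search_url(entity_label: str, limit: int = 10) -> str:
    """Build a full-text search request restricted to the File namespace."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": entity_label,
        "srnamespace": 6,  # namespace 6 holds files
        "srlimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def count_hits(response: dict) -> int:
    """Count search results in a decoded API response."""
    return len(response.get("query", {}).get("search", []))


url = commons_search_url("Eiffel Tower")
print(url)

# Hand-written stand-in for a decoded JSON response (not real data).
sample_response = {
    "query": {"search": [{"title": "File:Example1.jpg"}, {"title": "File:Example2.jpg"}]}
}
print(count_hits(sample_response))  # 2
```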
Overall, more than 60% of entities without an image have at least one image from one of the sources above, making this approach a viable way to find image candidates to recommend for Wikidata items.
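The overall figure is a union over sources: an entity counts as covered if any single source yields at least one image. A minimal sketch of that computation is shown below. The record fields (`has_p18`, `wiki_images`, `page_images`, `commons_hits`) and the toy values are invented for illustration and do not reproduce the data or results of the analysis.

```python
# Toy records: one per entity, with per-source candidate counts (made-up values).
entities = [
    {"id": "E1", "has_p18": True,  "wiki_images": 3, "page_images": 1, "commons_hits": 5},
    {"id": "E2", "has_p18": False, "wiki_images": 0, "page_images": 0, "commons_hits": 2},
    {"id": "E3", "has_p18": False, "wiki_images": 2, "page_images": 1, "commons_hits": 0},
    {"id": "E4", "has_p18": False, "wiki_images": 0, "page_images": 0, "commons_hits": 0},
    {"id": "E5", "has_p18": False, "wiki_images": 0, "page_images": 1, "commons_hits": 0},
]

SOURCES = ("wiki_images", "page_images", "commons_hits")


def has_candidate(entity: dict) -> bool:
    """True if at least one source offers an image for this entity."""
    return any(entity[source] > 0 for source in SOURCES)


def candidate_coverage(entities: list) -> float:
    """Share of entities without P18 that have at least one candidate image."""
    without_image = [e for e in entities if not e["has_p18"]]
    if not without_image:
        return 0.0
    covered = sum(has_candidate(e) for e in without_image)
    return covered / len(without_image)


print(candidate_coverage(entities))  # 0.75
```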
We will pilot a set of recommendations (powered by tools like the WikiShootMe platform) to evaluate whether our machine learning method can help support community efforts to address the problem of missing images.