Structured data for GLAM-Wiki/Check existing

From Meta, a Wikimedia project coordination wiki

Steps: Check what is already on Wikimedia projects, understand your context, and make use of the tools available.

Check what is already on Wikimedia projects

One of the most important steps in sharing a collection on the Wikimedia projects is learning what is already available. In many cases, some content from your collections (media files and/or data) will already be on the Wikimedia projects. Check what is already there, and in what quality and format, to avoid duplication.

Always check which data and media items are already present on Wikidata and Wikimedia Commons.

  • Volunteers have often already uploaded quite a few images from GLAM collections on their own initiative.
  • Wikidata will probably already contain quite a few data items about creative works, people, and topics related to specific GLAM collections.
  • On Wikimedia Commons, it is considered good practice to upload higher-quality versions as new media files. Don't overwrite existing files.
  • On Wikidata, duplicate items must be avoided and, when discovered, merged. It is fine (and even highly recommended), though, to add extra sources and statements to existing items. The metadata provided by the GLAM institution is usually more complete than what is already available.

Understand your context

The previous section applies especially to institutions and collections in richer and more privileged contexts. In developing countries and other places in the Global South, much knowledge is not yet available online or digitized, so the media and metadata of many institutions have yet to be uploaded to the Wikimedia projects.

One of the main values of structured data is its translatability. Because statements are built from multilingual Wikidata items and properties, the metadata of media files can be shown in many languages, making it more accessible and usable across Wikipedia language editions and extending the reach of GLAM media files across Wikimedia. However, even though MediaWiki is available in more than 300 languages, not all of those languages are complete or well represented on all Wikimedia projects.

GLAMs and other cultural institutions working with their collections on Wikimedia may run into this incompleteness on the technical side. At the same time, they can help solve the problem by documenting and sharing more knowledge about their local contexts, countries, cultures, and languages. Their shared media files and metadata can also be enhanced by the community, with the possibility of roundtripping the improved data back to enrich the institution's own collection records.

To learn more about this topic, see the page on the value of Structured data for GLAMs, or a presentation about Structured Data on Commons and Knowledge Equity given at Wikimania 2018. You can also read the blog post How we helped a small art museum to increase the impact of its collections, with Wikimedia projects and structured data.

Make use of tools

To help you understand which metadata and media are already available on the Wikimedia projects, there are a few very useful tools:

Search functions

The simplest and fastest way to search for content is to use the platforms' search functions or search bars. This is less useful for institutions that need to check many files at once; for that, use one of the query services or PetScan (below).

  • On Wikidata

The search function on Wikidata is quite simple. You can search for items by their labels (names or titles), in different languages, or by their Q-IDs (QIDs or Q numbers). Wikidata items also have Aliases fields, which hold alternative labels (other names) for the item, so searching for an alternative name can find the item as well.

The advanced search lets you search page text for specific information and filter by file type. It is also possible to search for Properties (P numbers) and to use the Item by title search to find the item for a given page on a connected site.
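Both lookups described above can also be done through the Wikidata Action API, which helps when checking more than a handful of names. The sketch below only builds request URLs, assuming the public endpoint https://www.wikidata.org/w/api.php with the wbsearchentities module (label and alias search) and the wbgetentities module with sites/titles (the API counterpart of Item by title). The function names are hypothetical.

```python
# Sketch: build Wikidata Action API URLs for label/alias search and item-by-title lookup.
# Assumes the public endpoint below; the function names are hypothetical.
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def label_search_url(text: str, language: str = "en", limit: int = 10) -> str:
    """URL that searches item labels and aliases for `text` (wbsearchentities)."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def item_by_title_url(site: str, title: str) -> str:
    """URL that finds the item linked to a page on a connected site, e.g. site="enwiki"."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "info",  # the item ID comes back as the key under "entities"
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


print(label_search_url("Girl with a Pearl Earring"))
print(item_by_title_url("enwiki", "Girl with a Pearl Earring"))
```

Opening either URL in a browser returns JSON. In the wbsearchentities response, each match in the `search` list carries an `id` (the QID) and, where available, a `label` and `description`, which is usually enough to spot an existing item before creating a new one.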

  • On Wikimedia Commons

The search function on Wikimedia Commons can search Structured Data on Commons statements as well as unstructured text. This works both in the classic search (Search) and in the newer Media search, which focuses on images, video, and audio files and lets users sort results by license, format or type, size, and community assessment.

You can also search for files in a specific Wikimedia Commons category, or use the haswbstatement search keyword, which finds files that have a particular structured data statement.

For example, to find files whose digital representation of (P6243) statement is the painting Girl with a Pearl Earring (Q185372), search for haswbstatement:P6243=Q185372.
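The same haswbstatement keyword can be sent to the MediaWiki Action API on Commons when you want the matching file names in bulk rather than on screen. A minimal sketch, assuming the endpoint https://commons.wikimedia.org/w/api.php and the standard list=search module restricted to the File namespace (6); the helper names and the User-Agent string are placeholders to replace.

```python
# Sketch: run a haswbstatement search on Commons through the Action API.
# Endpoint, helper names and User-Agent are assumptions/placeholders.
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
FILE_NAMESPACE = 6  # the File: namespace


def haswbstatement(prop: str, value: Optional[str] = None) -> str:
    """Return e.g. 'haswbstatement:P6243=Q185372', or 'haswbstatement:P180' without a value."""
    return f"haswbstatement:{prop}={value}" if value else f"haswbstatement:{prop}"


def commons_search_url(query: str, limit: int = 50) -> str:
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": FILE_NAMESPACE,
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def search_file_titles(query: str, user_agent: str) -> List[str]:
    """Fetch the first page of results and return the file titles (needs network access)."""
    req = Request(commons_search_url(query), headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return [hit["title"] for hit in data["query"]["search"]]


# Example (not run here):
# search_file_titles(haswbstatement("P6243", "Q185372"), "glam-check-sketch/0.1 (your contact)")
```

This sketch only reads the first page of results; for larger result sets, the API's continuation parameters would have to be followed.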

To search for files in a specific Wikimedia Commons category, type "incategory:" followed by the category name with underscores (_) in place of spaces, then add "haswbstatement:" plus the property you wish to search for, such as depicts, "P180".

For example, to search for files with a depicts (P180) statement in the category Images from Metropolitan Museum of Art, the search text should be: incategory:Images_from_Metropolitan_Museum_of_Art haswbstatement:P180.
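Because the category name has to be written with underscores, typing mistakes are easy when checking many categories. A small hypothetical helper that assembles the search text as described above, plus a link to Special:MediaSearch on Commons (assumed to accept the text in its search parameter):

```python
# Sketch: assemble an "incategory:... haswbstatement:..." search string for Commons.
# Helper names are hypothetical; the MediaSearch URL pattern is an assumption.
from typing import Optional
from urllib.parse import urlencode


def category_statement_query(category: str, prop: str, value: Optional[str] = None) -> str:
    # Replace each space in the category name with an underscore.
    cat = category.strip().replace(" ", "_")
    statement = f"{prop}={value}" if value else prop
    return f"incategory:{cat} haswbstatement:{statement}"


def media_search_link(query: str) -> str:
    return "https://commons.wikimedia.org/wiki/Special:MediaSearch?" + urlencode({"search": query})


q = category_statement_query("Images from Metropolitan Museum of Art", "P180")
print(q)  # incategory:Images_from_Metropolitan_Museum_of_Art haswbstatement:P180
print(media_search_link(q))
```

Passing a value as well (for example `category_statement_query(..., "P180", "Q...")`) narrows the search to files that depict one specific item rather than any file with a depicts statement.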

Query engines

  • On Wikidata

The Wikidata Query Service (WDQS) is well known among active users and widely used across the Wikimedia projects (see the sections about PetScan and Listeria below). It uses SPARQL, a semantic query language for databases.

For Wikimedia and GLAM content, WDQS is especially useful for analyzing and visualizing the shared metadata: it can turn the available metadata into graphs, tables, charts, and maps. It is also an important reporting tool, since it can produce statistics about the data, and it can even support research, as it is a good way to analyze the available data and identify content gaps.

For more information, see some examples of WDQS queries. For GLAM-related content, see a page about GLAM and tools that use Structured data on Commons.

For a more basic starting point, see this gentle introduction to the Wikidata Query Service.
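The content-gap use mentioned above can be sketched as a query for paintings in a museum's collection that have no image on Wikidata yet. This is a minimal Python sketch, assuming the public endpoint https://query.wikidata.org/sparql with format=json and a descriptive User-Agent header, which WDQS asks clients to send. Q160236 is assumed to be the Metropolitan Museum of Art and should be checked before use; the helper names are hypothetical.

```python
# Sketch: a WDQS content-gap query run from Python.
# Q160236 (assumed: Metropolitan Museum of Art) and the helper names are illustrative.
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Paintings (Q3305213) in a given collection (P195) that have no image (P18) yet.
GAP_QUERY_TEMPLATE = """SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 wd:Q3305213 ;
        wdt:P195 wd:{collection} .
  FILTER NOT EXISTS {{ ?item wdt:P18 ?image . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


def build_gap_query(collection_qid: str, limit: int = 100) -> str:
    return GAP_QUERY_TEMPLATE.format(collection=collection_qid, limit=limit)


def run_wdqs(query: str, user_agent: str) -> List[Dict[str, str]]:
    """Send the query to WDQS and return one {variable: value} dict per result row."""
    url = f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return [{k: v["value"] for k, v in row.items()} for row in data["results"]["bindings"]]


# Example (needs network access, not run here):
# rows = run_wdqs(build_gap_query("Q160236"), "glam-check-sketch/0.1 (your contact)")
```

The same query text can be pasted into the WDQS web interface, which can display the results as a table, chart, or map. Note that the filter only detects items without an image statement on Wikidata; suitable files may still exist on Commons without being linked.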

  • On Wikimedia Commons

One of the most useful ways of keeping track of the data already uploaded to Wikimedia Commons is the Wikimedia Commons Query Service (WCQS), which uses SPARQL to query the structured data on Commons. Like WDQS, it is very useful for analyzing and visualizing the uploaded data, for research, and for identifying content gaps.

This query engine is based on the Wikidata Query Service (see the section above) and is still in beta, so recent changes may not show up immediately. Because the service uses Wikibase, the Wikidata Query Help provides the documentation needed to use it.

Read more about the Commons Query Service, find some query examples, and learn more about how to find data using Structured Data on Commons.
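WCQS requires logging in with a Wikimedia account, so the simplest route is to write the query and run it in the web interface. The hypothetical helper below builds a query for Commons files whose digital representation of (P6243) statement points to a given Wikidata item, reusing the Girl with a Pearl Earring example from the search section. It declares the Wikidata wd:/wdt: prefixes explicitly, on the assumption that Commons statements use Wikidata item and property IRIs, and it assumes the WCQS interface accepts a URL-encoded query after "#" the way the WDQS interface does.

```python
# Sketch: build a WCQS query and a link that opens it in the web interface.
# The helper names, URL-fragment behaviour and prefixes are assumptions to verify.
from urllib.parse import quote

WCQS_UI = "https://commons-query.wikimedia.org/"


def build_wcqs_query(qid: str, prop: str = "P6243", limit: int = 200) -> str:
    # Files (MediaInfo entities) that have a statement <prop> = <qid>.
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?file WHERE {{
  ?file wdt:{prop} wd:{qid} .
}}
LIMIT {limit}"""


def wcqs_link(query: str) -> str:
    return WCQS_UI + "#" + quote(query)


query = build_wcqs_query("Q185372")
print(query)
print(wcqs_link(query))
```

Changing `prop` to "P180" turns the same helper into a "files depicting this item" query.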

PetScan

PetScan is an advanced search and query tool for Wikimedia projects. It generates lists of Wikipedia pages or Wikidata items that match certain criteria, such as items with a specific property or pages in a particular category.

It can search for pages on different Wikimedia projects by category, page properties, and templates. For Wikidata, it also accepts SPARQL queries (see Wikidata Query Service above).

Read more about this tool and how to use it in the Manual for PetScan.
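PetScan queries can be saved and shared as URLs, which makes them easy to rerun. The sketch below builds a URL that lists files in a Commons category as JSON. The parameter names (language, project, categories, depth, ns[6], format, doit) follow PetScan's URL scheme as we understand it and are assumptions to check against the PetScan manual; the function name is hypothetical.

```python
# Sketch: build a PetScan URL that lists files in Commons categories as JSON.
# Parameter names are assumptions based on PetScan's URL scheme; verify them in the manual.
from typing import List
from urllib.parse import urlencode

PETSCAN = "https://petscan.wmflabs.org/"


def petscan_commons_files_url(categories: List[str], depth: int = 0) -> str:
    params = {
        "language": "commons",                # Commons is addressed as language=commons ...
        "project": "wikimedia",               # ... within project=wikimedia
        "categories": "\n".join(categories),  # one category per line
        "depth": depth,                       # levels of subcategories to include
        "ns[6]": 1,                           # File namespace only
        "format": "json",
        "doit": 1,                            # run the query instead of only showing the form
    }
    return PETSCAN + "?" + urlencode(params)


print(petscan_commons_files_url(["Images from Metropolitan Museum of Art"], depth=1))
```

Keeping the depth low is a deliberate choice here: large category trees can make a PetScan query slow or return far more files than needed for a first check.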

Listeria

Listeria is similar to PetScan, but it works as a bot that generates and updates lists on Wikipedia based on Wikidata queries (see Wikidata Query Service above). The difference is that the bot runs regularly, so its lists, once published, are kept up to date, with new and improved items added as needed. Because it is based on Wikidata, changes made there are automatically reflected in lists in several Wikipedia languages at once.
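To set up such a list, an editor places Listeria's {{Wikidata list}} template on a wiki page with a SPARQL query and a column list, followed by {{Wikidata list end}}; the bot then fills in the table between the two. The hypothetical Python helper below only assembles that wikitext. It assumes the sparql, columns and sort template parameters and a query that returns items in ?item, as in Listeria's usual examples; the query reuses the assumed Met QID (Q160236) and is illustrative.

```python
# Sketch: assemble the wikitext for a Listeria list from a SPARQL query.
# Column choices are illustrative; see Listeria's documentation for all options.
from typing import List, Optional


def listeria_wikitext(sparql: str, columns: List[str], sort: Optional[str] = None) -> str:
    # "|" separates template parameters and "}}" closes the template,
    # so a query containing either would break the wikitext.
    if "|" in sparql or "}}" in sparql:
        raise ValueError("query contains '|' or '}}'; rewrite it before embedding")
    lines = ["{{Wikidata list", f"|sparql={sparql.strip()}", f"|columns={','.join(columns)}"]
    if sort:
        lines.append(f"|sort={sort}")
    # The bot writes (and later rewrites) the table between these two templates.
    lines += ["}}", "{{Wikidata list end}}"]
    return "\n".join(lines)


query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q3305213 ; wdt:P195 wd:Q160236 . }"
print(listeria_wikitext(query, ["label", "description", "P170", "P571"]))
```

Rejecting pipes and "}}" up front is a simple safeguard; property paths such as wdt:P31|wdt:P279 would need to be rewritten before the query can be embedded this way.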

Structured Data on Commons has its own query service. However, because that service is still in beta, Listeria has not yet been updated to generate lists from its queries. Find out more in the blog post about the Listeria Evolution.

Some examples of dynamic Wikidata Lists, or Listeria lists, are available on this page.

See also