GLAM/Metadata standards and Wikimedia

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator:
Task T249690

This page collects widely used metadata standards in the GLAM sector (Galleries, Libraries, Archives and Museums; arts and culture more broadly) and indicates, where applicable, how they integrate with, map to, or crosswalk with Wikimedia platforms.

Created in May 2020. Please add to this page, and help keep it up to date!

Scope

This overview:

  • Lists cultural heritage-related standards that contributors are likely to encounter in practice ('in the field'), regularly and internationally, when contributing a GLAM collection (data or media files) to Wikidata, Wikimedia Commons, Wikisource, or another Wikimedia project.
  • Focuses on Linked Open Data or structured data, i.e. cultural data on Wikidata and in structured data on Wikimedia Commons.
  • Contains protocols, vocabularies, conceptual models, data models, implementation profiles of conceptual models, serialisation formats... and many other types of what may be called 'metadata standards and formats' in the broadest sense. When drafting this document for the first time, an attempt was made to sort and classify these GLAM metadata standards – but that proved to be a challenging exercise. If you are interested in a typology of metadata standards in the cultural heritage sector, Jenn Riley's Seeing Standards is a great resource.

The list below is certainly not complete. Please improve and extend this page, especially on widely-used metadata standards, and help keep it up to date.

This list can also be used to help prioritize GLAM metadata standards that need to be supported in improved tools for GLAM-Wiki projects: software for batch uploads to Wikidata and Wikimedia Commons, and software that supports metadata roundtripping between Wikimedia platforms and GLAM databases.

Thousands of vocabularies and authority files

Besides the widely and internationally used vocabularies and authority files in the list below, Wikidata contains many thousands of other links and mappings to vocabularies and authority files, many of them specific to a country and a discipline.

If you are interested in discovering country- and topic-specific Wikidata identifiers and properties for vocabularies and authority files, you can explore them via (among others) the following means:

  • SPARQL queries. Example query: Wikidata properties for identifiers that are relevant to Belgium: https://w.wiki/TtP
  • Browse Wikidata's list of properties; this page also lists various tools with which you can browse Wikidata's properties
  • The Mix'n'match tool is used for matching external databases with Wikidata entities. It also has search and browse functionality (use 'Search catalogs' in this case). Please note that Mix'n'match contains only a selection of the relevant vocabularies and authority files, not all of them.
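
As a rough illustration of the SPARQL route, the sketch below builds a query for external-identifier properties linked to one country and sends it to the Wikidata Query Service. It is not necessarily the query behind the short link above. The function names are hypothetical, and it assumes that the relevant properties carry a 'country' (P17) statement, which will miss properties that record their country in another way.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def identifier_properties_query(country_qid: str, limit: int = 200) -> str:
    """Build a SPARQL query for external-identifier properties tied to a country.

    Assumption: the property carries a 'country' (P17) statement.
    """
    if not (country_qid.startswith("Q") and country_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {country_qid!r}")
    return f"""
SELECT ?property ?propertyLabel WHERE {{
  ?property wikibase:propertyType wikibase:ExternalId ;
            wdt:P17 wd:{country_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?propertyLabel
LIMIT {limit}
""".strip()


def run_query(query: str) -> list:
    """Send a query to the Wikidata Query Service and return the result rows."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    # The User-Agent value is a placeholder; use one that identifies your tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "glam-metadata-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(identifier_properties_query("Q31"))` (Q31 is Belgium) would return one row per matching property; the call needs network access and is therefore left out of the block.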

Overview of GLAM metadata standards and models

Each entry lists, in order: name (with link to official website); Wikidata item; short description; sector(s) where used; examples of usage (preferably in the GLAM sector); usage and mapping/crosswalking on Wikimedia projects.
OAI-PMH
  • Wikidata item: Q2430433
  • Description: Open Archives Initiative Protocol for Metadata Harvesting. A mechanism for interoperability between data repositories.
  • Sector(s): Cultural sector at large (and more broadly)
  • Examples of usage: Used by (large-scale) cultural aggregators to import data from their providers. Examples:
  • On Wikimedia projects: (May 2020) Not in use by Wikimedia projects (anymore), but worth investigating in the context of (future) software development for batch uploading and data roundtripping
ResourceSync
  • Wikidata item: -
  • Description: A synchronization framework for web content.
  • Sector(s): ?
  • Examples of usage: ?
  • On Wikimedia projects: (May 2020) Not in use by Wikimedia projects (yet), but worth investigating in the context of (future) software development, especially for metadata roundtripping and synchronization between Wikimedia projects and external data providers
Activity Streams
  • Wikidata item: Q4677626 (?)
  • Description: A JSON-based data format for encoding and transferring activity/event metadata of a website
  • Sector(s): Across the web; predominantly social media platforms
  • Examples of usage: Examples (not GLAM related). The in-development IIIF Discovery API uses ActivityStreams.
  • On Wikimedia projects: (May 2020) Wikimedia projects (through the MediaWiki software) provide a similar mechanism: EventStreams.
IIIF (International Image Interoperability Framework)
  • Wikidata item: Q22682088
  • Description: A set of APIs (Application Programming Interfaces) designed to operate with the storage and presentation of digitized objects via a web-based interface
  • Sector(s): Cultural sector at large
  • Examples of usage: Widely used and supported in the cultural sector. The IIIF Consortium collects a list of members, and institutions that use IIIF. Adopters include British Library, DPLA, Internet Archive, many national libraries.
  • On Wikimedia projects:
    • (May 2020) A group of Wikimedians has, for a while, been interested in IIIF implementation for Wikimedia, and some prototype implementations have been developed by volunteers. See the contact page of this initiative.
    • Wikidata has a property (P6108) that points to institutions' IIIF manifests. See documentation and usage of this property and a SPARQL query of items that are described with this property.
    • IIIF support for Wikimedia projects is tracked on the Phabricator bug reporting system:

CIDOC CRM (CIDOC Conceptual Reference Model)
  • Wikidata item: Q624005
  • Description: An ontology for describing the implicit and explicit concepts and relationships used in cultural heritage documentation
  • Sector(s): Mainly museum sector
  • Examples of usage: The CIDOC-CRM conceptual model is typically applied in profiles or implementations using a sub-set of it. LinkedArt and LIDO (see below) are both examples of CIDOC-CRM implementations.
  • On Wikimedia projects: (May 2020) A partial (and experimental) crosswalk to proposed properties for Structured Data on Wikimedia Commons (needs revision) and some mapping on Wikidata:
LIDO (Lightweight Information Describing Objects)
  • Wikidata item: Q1249973
  • Description: XML harvesting schema for describing museum and collection objects
  • Sector(s): Mainly museum sector
  • Examples of usage: Many Collections Management Systems support export of collections data in LIDO-format "out of the box". The museum collections aggregator MUSEU-HUB has LIDO as its primary in-data format.
Linked Art
  • Wikidata item: -
  • Description: Shared data model based on Linked Open Data used to describe art. Components include CIDOC-CRM and the Getty Vocabularies, described elsewhere in this overview.
  • Sector(s): Mainly museum sector; focus on art museums
  • On Wikimedia projects: See information related to CIDOC CRM and the Getty Vocabularies (AAT, ULAN, TGN) in this list.
EDM (Europeana Data Model)
  • Wikidata item: -
  • Description: Data model for structuring and representing data delivered to Europeana by the various contributing cultural heritage institutions.
  • Sector(s): Cultural heritage (broadly)
  • Examples of usage: Institutions and cultural aggregators that contribute data to Europeana
DPLA-MAP (DPLA Metadata Application Profile)
  • Wikidata item: -
  • Description: The basis for how metadata is structured and validated in DPLA, guiding how metadata is stored, serialized, and made available through DPLA's API in JSON-LD. Based on Europeana's EDM mentioned above.
  • Sector(s): Cultural heritage (broadly)
  • Examples of usage: Institutions and cultural aggregators that contribute data to DPLA (Digital Public Library of America)
  • On Wikimedia projects: Select collections from the DPLA are proactively being uploaded to Commons.
MARC / MARC21 (Machine-Readable Cataloging)
  • Wikidata item: Q722609
  • Description: MARC standards are a set of digital formats for the description of items catalogued by libraries, such as books
  • Sector(s): Libraries
  • Examples of usage: Widely used in the library sector.
Schema.org
  • Wikidata item: Q3475322
  • Description: Schemas (vocabularies) for structured data on the internet
  • Sector(s): Across the Web
  • On Wikimedia projects:
    • (May 2020) Integration between Schema.org and Wikidata discussed in 2017 at Wikidata:Schema.org (status?).
    • Mapping of Wikidata items and properties to Schema.org:

Dublin Core
  • Wikidata item: Q624610
  • Description: A set of vocabulary terms that can be used to describe digital resources (video, images, web pages, etc.), as well as physical resources such as books or CDs, and objects like artworks. Dublin Core serialised in XML is the default output of a protocol-compliant OAI-PMH server.
  • Sector(s): Across the Web
  • On Wikimedia projects: Mapping of Wikidata items and properties to Dublin Core:
FRBR (Functional Requirements for Bibliographic Records) and derivatives
  • Wikidata item: Q16388
  • Description: Conceptual model for structuring bibliographic databases
  • Sector(s): Libraries
  • On Wikimedia projects: Wikidata's WikiProject Books has built its data models upon FRBR principles.
ISAD(G) (General International Standard Archival Description)
  • Wikidata item: Q1654544
  • Description: Defines the elements that should be included in an archival finding aid
  • Sector(s): Archives
  • On Wikimedia projects: Wikidata's WikiProject Archival Description has some mappings to ISAD(G) on its Data Structure subpage (in French)
Creative Commons licenses
  • Wikidata item: Q284742
  • Description: Copyright licenses that enable the free distribution of an otherwise copyrighted "work"
  • Sector(s): Across the Web
  • On Wikimedia projects: Available as items on Wikidata and templates on Wikimedia Commons.
RightsStatements.org statements
  • Wikidata item: Q47530706
  • Description: A set of 12 standardized rights statements for online cultural heritage
  • Sector(s): Cultural sector at large
  • On Wikimedia projects: Available as items on Wikidata. The applicability of specific RS.org statements on Wikimedia projects is documented at Wikidata's Help:Copyrights page.
AAT (Getty's Art and Architecture Thesaurus)
  • Wikidata item: Q611299
  • Description: A structured vocabulary containing terms and other information about concepts. Terms in AAT may be used to describe art, architecture, decorative arts, material culture, and archival materials.
  • Sector(s): Mainly museums (visual arts, architecture)
ULAN (Getty's Union List of Artist Names)
  • Wikidata item: Q2494649
  • Description: A structured vocabulary containing names and other information about artists, patrons, firms, museums, and others related to the production and collection of art and architecture
  • Sector(s): Mainly museums (visual arts, architecture)
TGN (Getty Thesaurus of Geographic Names)
  • Wikidata item: Q1520117
  • Description: A structured vocabulary containing names and other information about places around the world.
Iconclass
  • Wikidata item: Q1502787
  • Description: Hierarchical system for describing creative content
  • Sector(s): Mainly museums (mainly pre-20th Century Western art)
  • Examples of usage: Used by several major museums and cultural heritage websites, for instance:
    • Rijksmuseum (example)
    • Rijksdienst voor Kunsthistorische Documentatie (RKD) (example)
  • On Wikimedia projects:
    • Wikidata concepts are (partially) mapped to Iconclass. See SPARQL query of mapped concepts: https://w.wiki/Rxp
    • Creative works on Wikidata are sometimes described with Iconclass codes. See SPARQL query of works where this is the case: https://w.wiki/Rxr
    • Wikidata properties:
      • P1256 (Iconclass notation - used to connect concepts to their corresponding Iconclass code)
      • P1257 (Depicts Iconclass notation - used to describe creative works with Iconclass codes)
    • More mappings can be made with the Iconclass catalog in the Mix'n'match tool
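
The SPARQL queries mentioned for Iconclass (P1256, P1257) and for IIIF manifests (P6108) all follow the same pattern: list the items that carry a statement for one property. The sketch below builds such a query for any property ID. It is an illustration only and not necessarily identical to the queries behind the short links above; the function and constant names are hypothetical.

```python
import re


def statements_query(property_id: str, limit: int = 100) -> str:
    """Build a SPARQL query listing items and their values for one property."""
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    return (
        "SELECT ?item ?itemLabel ?value WHERE {\n"
        f"  ?item wdt:{property_id} ?value .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Concepts mapped to an Iconclass code (P1256, Iconclass notation).
ICONCLASS_CONCEPTS = statements_query("P1256")
# Creative works described with Iconclass codes (P1257).
ICONCLASS_WORKS = statements_query("P1257")
# Items that point to an IIIF manifest (P6108).
IIIF_MANIFESTS = statements_query("P6108")

print(ICONCLASS_WORKS)
```

The resulting query text can be pasted into the Wikidata Query Service or sent to its SPARQL endpoint; the `LIMIT` keeps the result set small while exploring.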


GeoNames
  • Wikidata item: Q830106
Perio.do
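
To make the OAI-PMH and Dublin Core entries in the overview more concrete, the sketch below builds a `ListRecords` request URL and extracts Dublin Core fields from a response. It uses only the Python standard library. The base URL, the sample response and the function names are made up for illustration; a real harvester would also need error handling and would loop over resumption tokens.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        fields = {}
        for element in record.iter():
            # Collect every Dublin Core element (title, creator, ...) as a list.
            if element.tag.startswith(DC_NS) and element.text:
                name = element.tag[len(DC_NS):]
                fields.setdefault(name, []).append(element.text.strip())
        records.append({"identifier": identifier, "dc": fields})
    token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    return records, (token or None)


# Hand-written sample response (not taken from a real repository).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example painting</dc:title>
          <dc:creator>Unknown artist</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

records, token = parse_records(SAMPLE_RESPONSE)
print(list_records_url("https://example.org/oai"))
print(records[0]["identifier"], records[0]["dc"], token)
```

Keeping each Dublin Core field as a list reflects that elements such as `dc:creator` may repeat within one record, which matters when mapping them to Wikidata statements.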

Pointers to GLAM-related data modeling on Wikimedia projects

Wikidata, Wikimedia's general knowledge base

Several thematic WikiProjects on Wikidata focus on arts and culture. Many of these also document data models on Wikidata, and sometimes also mappings to external data models maintained in the cultural sector.

WikiProjects on Wikidata are the central place where active Wikimedia community members gather around shared topics of interest, and are also the best venue to get in touch with the community and to ask questions.

For GLAM-Wiki and cultural heritage in general, the best starting point is WikiProject Cultural heritage on Wikidata - it acts as a hub and pointer towards many more specialized culture-related WikiProjects. Also see the very comprehensive 'navbox' on cultural heritage on Wikidata (navigation table - click on 'Expand' to view the entire box) that links out to a lot of active WikiProjects.

A few general and high-level WikiProjects include:

Wikimedia Commons, Wikimedia's media repository