GLAM/Metadata standards and Wikimedia
This page collects widely used metadata standards in the GLAM sector (Galleries, Libraries, Archives and Museums; arts and culture more broadly) and indicates, where applicable, how they integrate, map or crosswalk with Wikimedia platforms.
Created in May 2020. Please add to this page, and help keep it up to date!
- Lists cultural heritage standards that are likely to be encountered in practice ('in the field'), regularly and internationally, when contributing a GLAM collection (data or media files) to Wikidata, Wikimedia Commons, Wikisource, or another Wikimedia project.
- Focuses on Linked Open Data and structured data, i.e. cultural data on Wikidata and structured data on Wikimedia Commons.
- Contains protocols, vocabularies, conceptual models, data models, implementation profiles of conceptual models, serialisation formats and many other types of what may be called 'metadata standards and formats' in the broadest sense. When this document was first drafted, an attempt was made to sort and classify these GLAM metadata standards, but that proved to be a challenging exercise. If you are interested in a typology of metadata standards in the cultural heritage sector, Jenn Riley's Seeing Standards is a great resource.
The list below is certainly not complete. Please improve and extend this page, especially on widely-used metadata standards, and help keep it up to date.
This list can also help prioritize the GLAM metadata standards that need to be supported in improved tools for GLAM-Wiki projects: software for batch uploads to Wikidata and Wikimedia Commons, and software that supports metadata roundtripping between Wikimedia platforms and GLAM databases.
Besides the widely and internationally used vocabularies and authority files in the list below, Wikidata contains many thousands of other links and mappings to vocabularies and authority files, many of them specific to a country or a discipline.
If you are interested in discovering country- and topic-specific Wikidata identifiers and properties for vocabularies and authority files, you can explore them via (among others) the following means:
- SPARQL queries. Example query: Wikidata external-identifier properties that are relevant to Belgium: https://w.wiki/TtP
- Browse Wikidata's list of properties; that page also lists various tools for browsing Wikidata's properties.
- The Mix'n'match tool is used to match external databases with Wikidata entities. It also offers search and browse functionality (use 'Search catalogs'). Note that Mix'n'match contains only a selection of relevant vocabularies and authority files, not all of them.
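The SPARQL route can also be scripted, for example as a first step in a batch-upload tool. The following is a minimal sketch, not the query behind https://w.wiki/TtP: it assumes that "relevant to Belgium" can be approximated by external-identifier properties whose own property page carries a country (P17) statement with the value Belgium (Q31). The function names are hypothetical, and the User-Agent string must be replaced with a real one, since Wikimedia services ask clients to identify themselves.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_identifier_query(country_qid: str) -> str:
    """SPARQL for external-identifier properties that have a country (P17) statement.

    Assumption: P17 on the property page is used as a proxy for "relevant to" a country.
    """
    return f"""
SELECT ?property ?propertyLabel WHERE {{
  ?property wikibase:propertyType wikibase:ExternalId ;
            wdt:P17 wd:{country_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?propertyLabel
""".strip()


def run_query(query: str, user_agent: str) -> dict:
    """Send a GET request to the query service and decode the SPARQL JSON result."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_properties(result: dict) -> list:
    """Reduce SPARQL JSON bindings to (property ID, English label) pairs."""
    pairs = []
    for binding in result["results"]["bindings"]:
        uri = binding["property"]["value"]  # e.g. "http://www.wikidata.org/entity/P..."
        label = binding.get("propertyLabel", {}).get("value", "")
        pairs.append((uri.rsplit("/", 1)[-1], label))
    return pairs
```

Usage (network access required): `parse_properties(run_query(build_identifier_query("Q31"), user_agent="MyGLAMTool/0.1 (contact address)"))`, where the tool name and contact are placeholders. Keeping query building, fetching and parsing separate makes it easy to swap in another country item or a different selection criterion.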
Overview of GLAM metadata standards and models
|Name + link to official website||Wikidata item||Short description||Used in sector(s)||Examples of usage (preferably in the GLAM sector)||Usage and mapping/crosswalking on Wikimedia projects|
|OAI-PMH||Q2430433||Open Archives Initiative Protocol for Metadata Harvesting. A mechanism for interoperability between data repositories.||Cultural sector at large (and more broadly)||Used by (large-scale) cultural aggregators to import data from their providers. Examples:||(May 2020) Not in use by Wikimedia projects (anymore), but worth investigating in the context of (future) software development for batch uploading and data roundtripping|
|ResourceSync||-||A synchronization framework for web content.||?||?||(May 2020) Not in use by Wikimedia projects (yet), but worth investigating in the context of (future) software development, especially for metadata roundtripping and synchronization between Wikimedia projects and external data providers|
|Activity Streams||Q4677626 (?)||A JSON-based data format for encoding and exchanging metadata about activities and events on a website||Across the web; predominantly social media platforms||Examples (not GLAM related). The in-development IIIF Discovery API uses Activity Streams.||(May 2020) Wikimedia provides a similar mechanism for changes on its projects: the EventStreams service.|
|IIIF (International Image Interoperability Framework)||Q22682088||A set of APIs (Application Programming Interfaces) for delivering and presenting digitized objects through web-based interfaces||Cultural sector at large||Widely used and supported in the cultural sector. The IIIF Consortium maintains a list of its members and of institutions that use IIIF. Adopters include the British Library, DPLA, the Internet Archive and many national libraries.||(May 2020) For some time, a group of Wikimedians has been interested in implementing IIIF on Wikimedia projects, and volunteers have developed some prototype implementations. See the contact page of this initiative.
Wikidata has a property (P6108) that points to institutions' IIIF manifests. See documentation and usage of this property and a SPARQL query of items that are described with this property.
Tracked in Phabricator:
|CIDOC CRM (CIDOC Conceptual Reference Model)||Q624005||An ontology for describing the implicit and explicit concepts and relationships used in cultural heritage documentation||Mainly museum sector||The CIDOC CRM conceptual model is typically applied through profiles or implementations that use a subset of it. Linked Art and LIDO (see below) are both examples of CIDOC CRM implementations.||(May 2020) A partial (and experimental) crosswalk to proposed properties for Structured Data on Wikimedia Commons (needs revision), and some mapping on Wikidata:|
|LIDO (Lightweight Information Describing Objects)||Q1249973||XML harvesting schema for describing museum and collection objects||Mainly museum sector||Many collections management systems support exporting collections data in LIDO format "out of the box". The museum collections aggregator MUSEU-HUB uses LIDO as its primary input format.|
|Linked Art||-||Shared data model based on Linked Open Data used to describe art. Components include CIDOC-CRM and the Getty Vocabularies, described elsewhere in this overview.||Mainly museum sector; focus on art museums||See information related to CIDOC CRM and the Getty Vocabularies (AAT, ULAN, TGN) in this list.|
|EDM (Europeana Data Model)||-||Data model for structuring and representing data delivered to Europeana by the various contributing cultural heritage institutions.||Cultural heritage (broadly)||Institutions and cultural aggregators that contribute data to Europeana|
|DPLA-MAP (DPLA Metadata Application Profile)||-||The basis for how metadata is structured and validated in DPLA, guiding how metadata is stored, serialized, and made available through DPLA's API in JSON-LD. Based on Europeana's EDM, listed above.||Cultural heritage (broadly)||Institutions and cultural aggregators that contribute data to DPLA (Digital Public Library of America)||Select collections from the DPLA are proactively being uploaded to Commons.|
|MARC / MARC21 (Machine-Readable Cataloging)||Q722609||MARC standards are a set of digital formats for the description of items catalogued by libraries, such as books||Libraries||Widely used in the library sector.|
|Schema.org||Q3475322||Schemas (vocabularies) for structured data on the internet||Across the Web||(May 2020) Integration between Schema.org and Wikidata was discussed in 2017 at Wikidata:Schema.org (status?).
Mapping of Wikidata items and properties to Schema.org:
|Dublin Core||Q624610||A set of vocabulary terms that can be used to describe digital resources (video, images, web pages, etc.), as well as physical resources such as books or CDs, and objects like artworks. Unqualified Dublin Core serialised in XML (oai_dc) is the one metadata format that every OAI-PMH-compliant server must support.||Across the Web||Mapping of Wikidata items and properties to Dublin Core:|
|FRBR (Functional Requirements for Bibliographic Records) and derivatives||Q16388||Conceptual model for structuring bibliographic databases||Libraries||Wikidata's WikiProject Books has built its data models upon FRBR principles.|
|ISAD(G) (General International Standard Archival Description)||Q1654544||Defines the elements that should be included in an archival finding aid||Archives||Wikidata's WikiProject Archival Description has some mappings to ISAD(G) on its Data Structure subpage (in French)|
|Creative Commons licenses||Q284742||Copyright licenses that enable the free distribution of an otherwise copyrighted "work"||Across the Web||Available as items on Wikidata and templates on Wikimedia Commons.|
|RightsStatements.org statements||Q47530706||A set of 12 standardized rights statements for online cultural heritage||Cultural sector at large||Available as items on Wikidata. The applicability of specific RS.org statements on Wikimedia projects is documented at Wikidata's Help:Copyrights page.|
|AAT (Getty's Art and Architecture Thesaurus)||Q611299||A structured vocabulary containing terms and other information about concepts. Terms in AAT may be used to describe art, architecture, decorative arts, material culture, and archival materials.||Mainly museums (visual arts, architecture)||
|ULAN (Getty's Union List of Artist Names)||Q2494649||A structured vocabulary containing names and other information about artists, patrons, firms, museums, and others related to the production and collection of art and architecture||Mainly museums (visual arts, architecture)||
|TGN (Getty Thesaurus of Geographic Names)||Q1520117||A structured vocabulary containing names and other information about places around the world.||
|Iconclass||Q1502787||Hierarchical system for describing creative content||Mainly museums (mainly pre-20th Century Western art)||Used by several major museums and cultural heritage websites, for instance
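OAI-PMH is not currently used by Wikimedia projects, but a batch-upload or roundtripping tool that supports it would most likely start from a harvesting loop like the one below. This is a minimal sketch, not a complete harvester: it uses only the standard `ListRecords` verb, the mandatory `oai_dc` (unqualified Dublin Core) metadata prefix and resumption tokens for paging. The function names are hypothetical, and the base URL in the usage note is a placeholder.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple, Union

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL; resumptionToken must be sent on its own."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_data: Union[str, bytes]) -> Tuple[List[dict], Optional[str]]:
    """Return the non-deleted records of one response page and the next token, if any."""
    root = ET.fromstring(xml_data)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records have a header but no metadata
        metadata = {}
        dc = record.find(".//oai_dc:dc", NS)
        if dc is not None:
            for element in dc:
                name = element.tag.split("}", 1)[-1]  # "{namespace}title" -> "title"
                metadata.setdefault(name, []).append((element.text or "").strip())
        records.append({
            "oai_identifier": header.findtext("oai:identifier", namespaces=NS),
            "dc": metadata,
        })
    # An empty <resumptionToken/> (or none at all) marks the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    next_token = token.strip() if token and token.strip() else None
    return records, next_token


def harvest(base_url: str) -> Iterator[dict]:
    """Follow resumption tokens page by page and yield every record."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token)) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if token is None:
            return
```

For example, `for record in harvest("https://example.org/oai"): print(record["dc"].get("title"))`, with a real repository's base URL in place of the placeholder. A production harvester would also need to handle OAI-PMH error responses (such as `noRecordsMatch`), rate limits and the `set`, `from` and `until` arguments before mapping the Dublin Core fields to Wikidata properties or Commons templates.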
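Wikidata's property P6108 (see the IIIF row in the table) links items to institutions' IIIF manifests. As a hedged illustration of how an upload or roundtripping tool might consume such a manifest, the sketch below extracts a display label and the image URLs. It assumes the manifest JSON has already been fetched and parsed, and it covers only the common IIIF Presentation API 2.x and 3.0 layouts (no Choice bodies, ranges or image services); the function names are hypothetical.

```python
from typing import Any, Dict, List


def manifest_label(manifest: Dict[str, Any], lang: str = "en") -> str:
    """Return a display label from a Presentation 3.0 or 2.x manifest."""
    label = manifest.get("label", "")
    if isinstance(label, dict):
        # 3.0 language map, e.g. {"en": ["..."]}; fall back to "none", then any language.
        values = label.get(lang) or label.get("none") or next(iter(label.values()), [])
        return values[0] if values else ""
    if isinstance(label, list):
        # 2.x variant: [{"@value": "...", "@language": "en"}, ...] or plain strings.
        if not label:
            return ""
        first = label[0]
        return first.get("@value", "") if isinstance(first, dict) else str(first)
    return str(label)  # 2.x plain string


def image_urls(manifest: Dict[str, Any]) -> List[str]:
    """Collect the image URLs that are 'painted' onto the manifest's canvases."""
    urls = []
    if "sequences" in manifest:
        # Presentation 2.x: sequences -> canvases -> images -> resource["@id"]
        for sequence in manifest["sequences"]:
            for canvas in sequence.get("canvases", []):
                for image in canvas.get("images", []):
                    resource = image.get("resource", {})
                    if "@id" in resource:
                        urls.append(resource["@id"])
    else:
        # Presentation 3.0: Canvas -> AnnotationPage -> Annotation -> body
        for canvas in manifest.get("items", []):
            for page in canvas.get("items", []):
                for annotation in page.get("items", []):
                    body = annotation.get("body")
                    if (annotation.get("motivation") == "painting"
                            and isinstance(body, dict)
                            and body.get("type") == "Image"
                            and "id" in body):
                        urls.append(body["id"])
    return urls
```

Whether the resulting images may actually be uploaded to Wikimedia Commons still depends on the rights information attached to the object, which this sketch ignores.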
Wikidata, Wikimedia's general knowledge base
Several thematic WikiProjects on Wikidata focus on arts and culture. Many of these also document data models on Wikidata, and sometimes also mappings to external data models maintained in the cultural sector.
WikiProjects on Wikidata are the central place where active Wikimedia community members gather around shared topics of interest, and are also the best venue to get in touch with the community and to ask questions.
For GLAM-Wiki and cultural heritage in general, the best starting point is WikiProject Cultural heritage on Wikidata. It acts as a hub that points to many more specialized culture-related WikiProjects. Also see the comprehensive navigation box ('navbox') on cultural heritage on Wikidata, which links to many active WikiProjects (click 'Expand' to view the entire box).
A few general and high-level WikiProjects include:
- WikiProject Archival Description gathers best practices for the description of archival collections on Wikidata
- WikiProject Books mainly documents how books are described on Wikidata and how this data then corresponds to templates on other Wikimedia projects.
- WikiProject Visual arts
  - The WikiProject Visual arts/Item structure subpage contains detailed information about various types of visual arts, with links to relevant documentation where available.
- WikiProject Performing arts contains detailed information and examples about data modeling on Wikidata for the performing arts: music, theater, dance, opera...
  - WikiProject Performing arts also maintains a page on (external) data models related to the performing arts, including many of the standards mentioned on this page.
Wikimedia Commons, Wikimedia's media repository
- Before the deployment of structured data on Wikimedia Commons (in 2019), GLAM files were contributed in many different ways, mapping metadata from GLAM databases to a wide variety of infobox templates on Wikimedia Commons. Widely used infobox templates (several of which are also supported by the batch upload tool Pattypan) are:
- With the deployment of structured data on Wikimedia Commons in 2019, the Commons community has started discussing data models (including for GLAM files) at Commons:Structured_data/Modeling. This is the central place where discussion and documentation of data models takes place, and a good place to get in touch with the community and ask questions.
- Also see the GLAM subpage of Structured Data on Commons for links to example/pilot projects.
- The challenge of data modeling for creative works versus files representing these creative works is explained in this blog post (February 2020).