AvoinGLAM/Media Art Archiving on Wikimedia Projects

From Meta, a Wikimedia project coordination wiki

Introduction

Context and ambition

How can artists, curators, and cultural producers working with smaller cultural organizations connect to larger ecosystems of open heritage knowledge?

Digital art communities, festivals, and individual artists provide alternative practices for engaging with digital cultural heritage and commons initiatives. These real-world examples illustrate effective practices in data modeling, metadata management, and database systems usage. However, connecting these practices to larger ecosystems of institutional culture and heritage archives often raises significant challenges, because those archives center on artists and artworks rather than catering for events and temporary communities of practice.

Open ecosystems, such as the Wikimedia projects, are renowned for interconnecting reference materials across individual disciplinary or institutional silos. Despite this, emerging cultural practices frequently encounter obstacles within these environments, particularly when inserting or connecting materials to standardized systems. Over the course of several expert meetings in 2024–25, these issues were addressed by focusing on the role of linked open data in cultural production and the preparation of metadata for integration. The participants of the meetings are listed in the credits at the end of this document.

Target audience

This blueprint document is aimed at individuals or communities who are interested in open archives and heritage knowledge systems and are considering MediaWiki instances, Wikidata and/or Wikibase as potential structured data management solutions. It is primarily a guide to assist media art archiving on Wikimedia projects. However, it can also help individual artists, cultural producers, curators, community archivists, and cultural data managers understand how to prepare cultural data and decide what they wish to do with it.

This guide may be interesting for you if:

You are an artist

  • You already have your own website presenting your work, but you are looking for a storage tool that will help you browse your archive quickly and effectively. Queries allow you to ask complex questions of your archive.
  • You want to keep your archive private, so that only you can see and edit the content. In that case, you can create your own instance and install it on your own server.
  • You want to start documenting and archiving your artistic practice for the first time and are looking for a free and open source tool to do so.

You are a cultural producer

  • Perhaps you are a small art institution or cultural organization that has commissioned works in the past and would like to link information about the artists to other collections at other institutions. Maybe you have already collected and stored information about your artworks in Excel spreadsheets that you can feed into a database or wiki system.
  • Maybe you are an art institution with your own IT department that can host a private instance of MediaWiki or Wikibase on your own server. Maybe you want to develop your own custom template on top of your wiki database to differentiate your archive. One example is the Wikibase of the Foto Museum Antwerp.

You are a curator

  • Maybe you want to archive your past exhibitions and events, which include a wide variety of artifacts and materials such as websites, flyers, reviews, documentation, photography, etc.
  • Maybe you've done a lot of research on a specific topic and you'd like to share it beyond your exhibition, linking your sources to other existing catalogs and archives.

You are a community archivist

  • The wiki instance is a collaborative tool that allows you to create different levels of access. This means that you can decide in your group or community who is responsible for what part of your archive.
  • Maybe you're archiving different materials that don't fit into the standard categories of available public archives. Maybe you want to collaboratively describe and contextualize them in a Wikipedia-like way.
  • Maybe you don't have a programmer or server administrator in your group, but would like to use the cloud-based version of Wikibase without having to run your own server.

Terminology explainer

Ecosystem: This term refers to the socio-technical system formed by the interrelated components you are using: your data, database, and online platforms, the inputs to the system, how data and actions flow through it, and its outputs. Don’t forget the humans needed to keep it operational.

Structured data: This is typically qualitative data that is organised and decipherable by machine algorithms. This can be as simple as a spreadsheet. Or it can be a relational database which allows structured queries (SQL).[1] In this document, structured data will also be used to refer to linked open data.

Linked Open Data: In the context of the Semantic Web,[2] (meta)data should not only be structured in a machine-readable format but it should also be published openly on the web, i.e. without prohibitive copyright restrictions, and it should be linked, i.e. following specific standards which allow various connections (in terms of querying and representation) to be drawn across heterogeneous databases. The standards that typically allow this linking and querying to happen include: RDF (data model), OWL (ontology) and SPARQL (query language).

Standard ontology: In information science, an ontology is the formal naming and definition of data, its categories, properties and relations with other data, so that there is a common reference.[3] In this document, we will refer to ontologies used in cultural contexts that typically contain classes of objects (e.g. works of art vs institutions) and properties (e.g. expressing relations between objects, creators, locations, etc). Standard ontologies relevant to the culture field include schema.org, CIDOC-CRM, PREMIS, and more.

Controlled vocabularies: If ontologies present a formal definition of the data, such that it can be organised in a database and queried, vocabularies provide standardised terms to describe the data once it’s been categorised. For example, if an object is classified as a work of art, a controlled vocabulary would provide the naming of specific formats, e.g. a painting, an installation, etc. Vocabularies have different degrees of granularity and hierarchy. Common vocabularies in the culture field include Getty AAT and Iconclass.

Semantic queries: To query the data is to retrieve something from the database based on a set of clear instructions. To make a semantic query is to use implicit or explicit reference to words, phrases, meanings, to give more ‘fuzzy’ answers based on patterns or context.[4] In this document, semantic queries are typically referred to in relation to a SPARQL endpoint (web service that lets you query and retrieve data from RDF databases).

Data federation: To federate the data is to have a database system that is made up of decentralised autonomous parts, with a common interface.[5] In other words, federation is about linking separate databases so you can search all of them at once without merging them.
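As a rough illustration of federation, the sketch below assembles a SPARQL 1.1 federated query in Python. The SERVICE keyword is part of the SPARQL standard and P569 is Wikidata's "date of birth" property. The local property URI and the function name are hypothetical, and the sketch assumes that your own Wikibase stores the matching Wikidata entity as the value of that property. The code only builds the query text and sends nothing.

```python
# Sketch only: builds a federated SPARQL query string; no request is made.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_federated_query(local_link_property: str, limit: int = 10) -> str:
    """Return a query that joins local items with birth dates held on Wikidata.

    local_link_property is a hypothetical property URI in your own Wikibase
    whose value is assumed to be the matching Wikidata entity.
    """
    return f"""PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?localItem ?wikidataItem ?birthDate WHERE {{
  ?localItem <{local_link_property}> ?wikidataItem .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wikidataItem wdt:P569 ?birthDate .
  }}
}}
LIMIT {limit}"""


query = build_federated_query("https://example.org/prop/direct/P2", limit=5)
print(query)
```

The two databases stay separate: the outer pattern is answered by the local endpoint, and only the SERVICE block is sent to Wikidata.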

👉 Additional definitions of federation are explored by the Wikibase community.

👉 See also and contribute to: Open Culture/GLAM Glossary.

Overview of the Wiki ecosystem

Shared architecture and common principles

MediaWiki is an open source software based on several core principles for creating, editing and sharing information on the web. It is most well-known in its public form – the international online encyclopedia project Wikipedia. Over the last few decades MediaWiki software has proliferated into an interconnected system of public applications – Wikidata, Wikimedia Commons, Wikisource – to complement Wikipedia. In addition to that, private instances of the software are used across different institutions and organisations for a variety of data management use cases. All MediaWiki software is multilingual and collaborative, designed for crowdsourcing information creation and editing. It implements version control natively, so that collaboration can be managed easily, allowing individual operations to be tracked and linked to their respective agent (human or software, e.g. bot). Apart from these common characteristics, the rest of the ecosystem is widely varied and requires a deeper dive into the individual specifics of public-vs-private instances to understand what application is the most appropriate in a specific media art archiving use case.

Working with data

While the main MediaWiki software is designed for working with text, the need to add data related to the text articles resulted in the development of several software extensions. The two primary extensions to the core MediaWiki are Semantic MediaWiki and Wikibase.

Semantic MediaWiki

Semantic MediaWiki adds a layer of data that makes filtering and sorting information in a regular Wiki easier and more efficient. It is widely used in privately deployed instances for internal organizational knowledge management. See examples.

Wikibase, Wikidata, and Wikibase.cloud

The Wikibase extension was originally developed to run Wikidata. With well over a billion statements and 97 million items, Wikidata is currently the largest public knowledge base on the web. It is distinct from Wikipedia in that it stores structured data and makes that data available for querying via a SPARQL endpoint. Wikidata highlights the power of combining centralised and distributed approaches: a vast amount of information is accessible through a central endpoint in a standardised linked open data (LOD) format, and all items have persistent identifiers (QIDs). This is also why it is used by various GLAMs and cultural heritage projects as a data repository for collections and archive data.
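To make the idea of querying by QID more concrete, here is a minimal Python sketch that builds a SPARQL query and the corresponding request URL for the Wikidata Query Service. The function names are illustrative, and Q838948 is assumed here to be the identifier of the "work of art" class, so check it on Wikidata before reuse. The sketch only constructs the URL; fetching it is left to the reader.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_instances_query(class_qid: str, limit: int = 10) -> str:
    """SPARQL for items that are an instance of (P31) the given class."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Q838948 is assumed to be the "work of art" class; verify before use.
url = build_request_url(build_instances_query("Q838948", limit=5))
print(url)
```

The same pattern should apply to a self-hosted Wikibase, with its own endpoint address and its own property and item identifiers.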

👉 See a list of efforts to discuss and/or structure cultural data on Wikidata.

There are numerous well-documented issues with the ontology design in Wikidata, including how curation can be managed across knowledge domains. On top of that, there are the issues of the vast scale of the project that affects performance, strict licensing requirements (all data has to be public) and, ultimately, the question of trustworthiness in a crowdsourced, public environment. This is why Wikibase, the underlying software suite, is increasingly deployed as an independent instance on private institutional servers, or via the free public cloud platform wikibase.cloud. In such instances, users can choose and configure access, licenses and the data model for their data with much more control and precision.

👉 Examples of projects using Wikibase.

👉 Learn more about the differences between Semantic MediaWiki, Wikibase and Wikibase.cloud.

👉 Learn more about how other software compares to Wikibase as a LOD management tool for cultural data.

Working with media files

Wikimedia Commons

When data needs to be linked to images or other forms of digital representation (AV files, 3D models, PDF documents, etc.), the current practice within the MediaWiki ecosystem is to use the Wikimedia Commons platform. Files can be uploaded there individually or in bulk via bot libraries or open source tools such as OpenRefine. Once uploaded, media files can be reused in Wikipedia articles, in Wikidata items, or elsewhere on the web.

Files on Wikimedia Commons are subject to strict rules about their copyright status. The uploader must have authority over the file's copyright to be able to share it with an open license, unless the work can be proven to be in the public domain. This limits the utility of the platform for smaller organisations that may want to retain copyright over their materials, for contemporary art collections, and for scenarios where rights ownership may be ambiguous. In such cases, individual artists or organisations may prefer to store media files in their own hosting environment, e.g. a private server or database, and only provide links to them in Wiki projects. Files with any chosen license or rights status can also be uploaded to privately hosted instances of MediaWiki or Wikibase, but not to Wikibase.cloud.

👉 Learn more about how structured data is used in Wikimedia Commons and how that reflects on discovery and federation with Wikidata.

Example project implementations

2012–2019
SFMOMA’s Media Wiki
[private Semantic MediaWiki]

Martina Haidvogl and Layna White, “Reimagining the Object Record: SFMOMA’s MediaWiki,” Stedelijk Studies Journal, no. 10 (2020), https://doi.org/10.54533/StedStud.vol010.art08.

Dušan Barok, Julia Noordegraaf, and Arjen P. de Vries, “From Collection Management to Content Management in Art Documentation: The Conservator as an Editor,” Studies in Conservation 64, no. 8 (2019): 472–489, https://doi.org/10.1080/00393630.2019.1603921.

2015–ongoing
Rhizome, Artbase
https://artbase.rhizome.org/ [Wikibase]

The ArtBase is an archive with over 2,200 artworks to date, primarily hosting works of net art, but also including works that employ media such as software, code, websites, moving images, games, and browsers. Rhizome’s commitment to the preservation of works in the ArtBase has grown alongside the archive’s expansion in size, scope, and complexity over the years.

This is an archive of born-digital artworks from 1983 to the present day. A small sample of artworks are shown below; you can also browse the archive by date or by artist name. Some entries in this archive include external links to artworks maintained by artists or others. Some contain archived copies, hosted on Rhizome infrastructure. All of these, as well as forms of documentation, are called variants—distinct manifestations of the artwork, all accessible via the main artwork page.

2019–2020
Joan Jonas Knowledge Base
https://artistarchives.hosting.nyu.edu/JoanJonas/welcome.htm [Wikidata]

The Joan Jonas Knowledge Base (JJKB) is an open source digital resource housing information about the New York-based multimedia and performance artist Joan Jonas (b. 1936). It offers in-depth information about selected artworks and exhibition case studies and is intended as a resource for curators, conservators, exhibition designers, performers, and other art world professionals along with academic researchers.

Since a traditional SQL relational database couldn’t accommodate the material selected for the project, the team developing JJKB designed a linked data model to capture data in the project and implemented this solution through Wikidata. This solution supported cross-cultural and cross-institutional research and collaboration, giving visibility to the dataset and standardizing commonly recognized properties and terminology related to performance art.

2019–ongoing
ZKM’s Werke Wiki
https://werke.zkm.de/wiki/index.php/Documentation_model [MediaWiki]

In 2019, ZKM’s wiki was resurrected to document the software-based art collection. Improved and adapted to their specific needs, this new wiki led to a profound restructuring of documentation and management strategies for software-based artworks. The possibilities offered by this editing tool include version control and collaborative documentation, enabling the transfer of knowledge distributed among many individuals, years of communication via email and documents scattered over different analogue and digital locations to be gathered in one environment.

2021–2022
LI-MA, CopyClear
https://www.wikidata.org/wiki/Wikidata:CopyClear/LIMA [Wikidata]

For this project, all media art creators who appear in the collections of the project partners are placed on Wikidata or linked to existing entries. If possible a link will be made with RKD, and from this website the date of birth will automatically be added to the person. The current partners are: LIMA, Stedelijk Museum Amsterdam, Van Abbemuseum, RCE, Frans Hals Museum, ZKM.

2021–2023
Media Art History in Finland (MEHI)
https://media-art-finland.fi/ [Wikidata]

MEHI – Media Art History in Finland was a 3-year project initiated by the Finnish Media Art Network. The objective of the project was to record and publish the history of Finnish media art, and to build information infrastructures for its documentation in the future. The MEHI project included a comprehensive reference database of Finnish media art works and events, as well as:

  • OMA - Ontology for Media Art - Finto, a special ontology for media art in three languages (English, Finnish, Swedish), based on General Finnish ontology YSO. OMA covers the many genres and parallel art forms of media art as well as concepts relating to technologies, materials and aesthetics of the field. Key sources used in creating the ontology include international thesauri of media art, literature and content descriptions of art works and expert consultations. A major part of the ontology was linked to Wikidata.
  • Database import to Wikidata and related Wikimedia outputs, including:

2022–ongoing
WikiProject Media Art
https://www.wikidata.org/wiki/Wikidata:WikiProject_Media_Art [Wikidata]

The goal of this WikiProject is to work on and improve all items about media art. In particular, its ambition is to be the hub for all media art metadata. To date it includes the following two related projects:

2023–ongoing
Critical Media Lab, Basel, Sharing Knowledge in the Arts
https://criticalmedialab.ch/projects/sharing-knowledge-in-the-arts/ [Wikibase]

The project “Sharing Knowledge in the Arts” analyzes what to learn from an experiment to design novel research infrastructures in and for the arts. The project aims to analyze “THEswissTHING” as a bottom-up initiative for sharing knowledge to understand how ethical issues of sharing were negotiated in this context and learning from those insights for current Open Research Data (ORD) publishing practices.

2024–ongoing
Zentrum für Netzkunst, Berlin, netart repository
https://netart.wikibase.cloud/ [Wikibase.cloud]

Netart repository is an online knowledge and data base of Zentrum für Netzkunst - a non-profit organization based in Berlin. Zentrum für Netzkunst reconstructs, maintains and preserves net art and net culture. The initiative explores the possibilities of archiving, contextualizing and mediating net art.

In 2024, artists Joachim Blank and Karl-Heinz Jeron (Blank & Jeron), together with Zentrum für Netzkunst, decided to revisit their canonical piece without addresses and reconstruct it for the public, a new generation of researchers, and art historians. The aim was to collect and document all available data, fragments of data, and secondary documentation material saved by the artists. The resulting repository is stored on Wikibase. In addition, a simulation of variant 1 was created, allowing visitors to view the work in their contemporary browser.

Examples beyond the scope of media art

Aziatische Keramiek (NL)
https://aziatischekeramiek.nl/ [based on the https://api.kunstmuseum.nl/ Wikibase]

Centar za dokumentiranje nezavisne kulture (HR)
https://abcdnk.hr/Arhiv [MediaWiki]

Delfts Aardewerk (NL)
https://delftsaardewerk.nl/ [based on the https://api.kunstmuseum.nl/ Wikibase]

Digital Archive of Artists’ Publishing | DAAP (UK)
https://daap.network [Wikibase]

Foto Museum Antwerp, Gevaert Paper Project (BE)
https://gevaert.fomu.be/ [Wikibase]

Manor houses in the Baltic Sea Region (DE)
https://wb.manorhouses.tibwiki.io/wiki/Hauptseite [Wikibase & Semantic MediaWiki]

Wikidocumentaries (FI)
https://wikidocumentaries-demo.wmcloud.org/ [Visualizing data and content from Wikimedia projects and third party repositories (eg. Finna.fi) based on Wikidata and federated content display]

Zürich Cantonal Heritage Conservation Office, Datenbank der Denkmalpflege Kanton Zürich (CH)
About: https://ad.zh.ch/odb/ [Semantic MediaWiki, internal]

Data access and usage rights

Understanding the specifics of what openness on Wikimedia platforms means is key to choosing the right solution for your specific project.

Wikimedia projects store and share only Open Access materials. The Open Definition, created by the Open Knowledge Foundation, sets out the principles that Wikimedia projects follow as well.

Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).

In practice, materials that can be shared must be either in the public domain or licensed under open licenses.

Public domain works are creative works that are not protected by exclusive intellectual property rights, either because those rights have expired, been waived, or were never applicable.

Open licenses are legal tools that the rights holders can use to grant reuse rights for the public. Creative Commons licenses are the standard tools for granting public rights to online works.

The creator of an artwork can upload a photo of their in-copyright artwork to Wikimedia Commons, but the process is more involved than a usual upload.

First step: Check with your collecting society

When an artist has given a collecting society the exclusive right to manage their copyright, they must check if their agreement permits them to openly license their works or parts of them.

Second step: Give the image of the artwork an open license

The photo of the artwork can be treated as part of the artwork and therefore the creator can apply an open license to it. This will not change the copyright status of the original artwork.

Choose the license: The recommended open license in Wikimedia Commons is Creative Commons Attribution–ShareAlike 4.0 (CC BY-SA 4.0). More open conditions can be used, or other licensing frameworks that have equivalent conditions. All these rights must be cleared for openly licensing the image:

  • The artist, or all the artists in key roles have copyright.
  • Technical personnel of the artwork have neighbouring rights.
  • Photographer of the image has neighbouring rights.

Third step: Upload to Wikimedia Commons

When the image is uploaded to Wikimedia Commons, this information must be recorded.

Acknowledge the underlying copyright: Send a verification to the Volunteer Response Team to clarify the use of the open license. For example:

I am the creator of the underlying work <name of the work> in this photo, and I have openly licensed this photo of the work with <license>. The photo is considered part of that work and can therefore be openly licensed. The underlying work remains protected by copyright.

Remember to also clear copyright with the photographer, or have them send their own acknowledgement to the VRT.

Metadata about the artworks on Wikidata

The descriptive data of a dataset about artworks can be protected by database rights in European jurisdictions; under the EU Database Directive, this protection lasts for 15 years. To make the descriptive data interoperable for sharing online, it should be made available without copyright restrictions. The rights holder can waive copyright using the Creative Commons Zero (CC0) waiver. The rights holder is the dataset creator, not the artist, as individual facts cannot be protected by copyright.

Wikidata only accepts data that has been released as CC0, and the data provider is expected to demonstrate that.

Considerations when opening up data

There are a number of reasons why media or data may not be suitable for sharing publicly, even if the copyright of the materials has been cleared or they are in the public domain. The materials may contain personal information or sensitive cultural data, for example. Creative Commons open licenses should not be used to grant or restrict the use of materials with such rights, as they only deal with copyright. You can explore the use of the following tools in those cases:

Personal data or sensitive information

Clearing the rights of a collection is not limited to copyright. When dealing with personal and/or sensitive information, it’s important to think about who has access to the data and to what degree, e.g. who can see the data, and can they see all of it; who can edit and/or add to the data; who can delete data and assign rights to other user groups. For a more detailed view on data stewardship policies and how they can be enacted with Wikidata vs Wikibase, consult the policy matrix prepared by the Wikibase Stakeholders Group.

Access to the original data and updates

Additional questions to consider include how data is to be treated over time – does it need to be updated, how often and by whom? Is the process of collecting also going to be public or internal to your organization? Is it a historic or a living archive, with community members being able to edit and update data after publication?

For archives that will be primarily used internally to collect original data, with only a small part of it being made public in the end, solutions such as the internal Wikis at SFMoMA or ZKM (listed in the Examples section) are applicable. Parts of the data that are to be made publicly visible can then be published to Wikidata, similar to how the CopyClear, MEHI or Joan Jonas Knowledge Base projects demonstrate in the examples.

If the archive should be fully public, but only editable by an internal team of archivists and/or the community of contributing artists, then solutions such as Rhizome’s Artbase or the Zentrum für Netzkunst’s netart repository might be the better choice.

In cases dealing with public domain, non-sensitive, historic data which should remain fully open and fully editable by a community of stakeholders and/or the general public, fully publishing the data on Wikidata, with associated multimedia on Wikimedia Commons, is a valid and appropriate choice.

Data preparation: collection and modeling

Preparing data for ingestion into a Wiki system includes several key steps.

Data collection

Data collection is the foundation upon which high-quality data contributions to Wiki projects are built. In the media and cultural sectors, data can reside in a myriad of environments, from unstructured text documents to sophisticated databases. The diversity of these sources presents both challenges and opportunities for effective data integration into Wikidata.

Formats for data collection and export

When preparing data for contribution, it is crucial to ensure that the data is in a format that can be easily ingested by the platform. The following formats are commonly used for both data collection and export:

  • Databases (SQL/NoSQL): Structured query language (SQL) databases like MySQL or PostgreSQL are ideal for managing large, complex datasets. NoSQL databases such as MongoDB can be particularly useful for handling unstructured data or data with a flexible schema.
  • JSON (JavaScript Object Notation): JSON is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It is often used in web applications for transmitting data between a server and a web client.
  • XML (eXtensible Markup Language): XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is often used for the representation of arbitrary data structures and for the transportation of data across networks.
  • CSV (Comma-Separated Values): CSV is a simple file format used to store tabular data, where each line corresponds to a row and each field within the line is separated by a predefined delimiter, typically a comma. This format is widely supported and can be easily manipulated with spreadsheet software or text editors.
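Moving between the formats above is often a matter of a few lines of scripting. As a minimal sketch, the following Python snippet turns a spreadsheet-style CSV export into JSON using only the standard library. The column names and rows are hypothetical and stand in for your own export.

```python
import csv
import io
import json

# Hypothetical spreadsheet export; the column names are illustrative only.
CSV_TEXT = """title,artist,year
Example Work,Example Artist,1998
Another Work,Example Artist,2004
"""


def csv_to_records(text: str) -> list[dict]:
    """Parse CSV text into a list of dictionaries, one per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


records = csv_to_records(CSV_TEXT)
# ensure_ascii=False keeps names with accents or umlauts readable.
json_text = json.dumps(records, indent=2, ensure_ascii=False)
print(json_text)
```

Note that every value arrives as text, so years and dates still need to be checked and typed before upload.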

Data collection workflows

To streamline the process of collecting data, it is beneficial to establish clear workflows. These workflows might include:

  • Data Auditing: Ensure that the data collected aligns with the project's goals and meets the necessary quality standards.
  • Data Extraction: Use additional tools and scripts to extract data from various sources, including databases, web pages, or PDF documents.
  • Data Transformation: Convert the extracted data into one of the preferred formats (JSON, XML, CSV) for compatibility with Wikidata's import tools.
  • Data Validation: Check the integrity and accuracy of the data through validation rules or by comparing it against a trusted source.
  • Data Documentation: Maintain comprehensive documentation to describe the data sources, collection methods, and any transformations applied.
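The validation step in the workflow above could look something like the following Python sketch. It assumes a hypothetical schema with three required fields and dates written as YYYY-MM-DD; adapt both to your own dataset.

```python
from datetime import date

REQUIRED_FIELDS = ("title", "artist", "date")  # hypothetical schema


def validate_record(record: dict) -> list[str]:
    """Return a list of readable problems; an empty list means the record passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    value = str(record.get("date", "")).strip()
    if value:
        try:
            date.fromisoformat(value)  # raises ValueError on other formats
        except ValueError:
            problems.append(f"date not in YYYY-MM-DD format: {value}")
    return problems


rows = [
    {"title": "Example Work", "artist": "Example Artist", "date": "1998-05-01"},
    {"title": "Untitled", "artist": "", "date": "01/05/1998"},
]
report = {row["title"]: validate_record(row) for row in rows}
print(report)
```

Collecting problems per record, rather than stopping at the first error, makes it easier to hand a full list of fixes back to whoever maintains the source data.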

Tools and tutorials supporting data collection

There are many tools for preparing and collecting data. The simplest are spreadsheets such as Excel or Google Sheets. For more advanced tasks like cleaning, deduplication, or validation (e.g. checking date formats), tools like OpenRefine are useful. Although OpenRefine has a steep learning curve, it supports complete workflows from data collection to final upload into Wiki projects such as Wikidata, Wikimedia Commons, or individual Wikibase instances.
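For readers who prefer scripting, deduplication can be approximated in a few lines of Python. The sketch below compares names after lowercasing, stripping accents, and collapsing whitespace, which is loosely similar in spirit to key-based clustering in tools such as OpenRefine. It is a simplification, and the function names are illustrative; matches it finds are candidates for human review rather than confirmed duplicates.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def deduplicate(names: list[str]) -> list[str]:
    """Keep the first spelling seen for each normalized name."""
    seen = set()
    unique = []
    for name in names:
        key = normalize(name)
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique


names = ["Zentrum für Netzkunst", "zentrum fur  netzkunst", "Rhizome", "Rhizome "]
print(deduplicate(names))
```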

Tutorials on how to use OpenRefine to transform data can be found here:

Additional tutorials that can be helpful in media art contexts, especially around data preparation and transformation with OpenRefine can be found on the Joan Jonas Knowledge Base website.

Events supporting data collection

Engaging with stakeholder communities – other archivists, art historians, curators or artists – is key to successful data collection. Various events can catalyze this process:

  • Workshops: Organize targeted workshops to train participants on how to extract and structure data according to Wiki projects’ standards. These can be in-person or virtual and tailored to different skill levels.
  • Hackathons: Host hackathons that focus on data extraction and preparation, encouraging collaboration and innovation among technologists, researchers, and data scientists.
  • Edit-A-Thons: Conduct Edit-A-Thons specifically aimed at collecting and inputting data into Wiki projects, providing a structured and community-driven approach to data contribution.
  • Community Science Projects: Engage with citizen science projects that collect data relevant to the media or cultural sectors. These projects often have a built-in audience interested in contributing data to public databases like Wikidata.

Data modeling

Data modeling for media art datasets is arguably one of the hardest data preparation tasks before data can be meaningfully shared and made more widely accessible via Wiki projects. While most Wiki software, including Wikidata, offers much flexibility when it comes to structuring datasets according to specific categories (expressing object classes) and properties (expressing relations), this flexibility can also be a barrier due to a lack of easily applicable and readily available standard ontologies and vocabularies. Most of the projects outlined in the Examples section have had to deal with this issue and have set up specific models, ontologies, tutorials, and discussion spaces which are aimed at helping the rest of the media art community.

The following resources are worth consulting before starting your own data collection and modeling project:

Data management in the Wiki ecosystem

Data management in the public platforms of the Wiki ecosystem is done primarily in Wikidata and partly in Wikimedia Commons. As Wikimedia Commons is focused on image and video material with a free license, it is rarely suited for inclusion of media art beyond images of artworks that have cleared rights (see the section on Data access and usage rights). In cases where you want to manage data that goes beyond copyright-cleared images, maintaining your own institutional or collective infrastructure via MediaWiki and/or Wikibase instances might be more appropriate.

Existing guidelines


Managing data on Wiki platforms has been widely discussed across the GLAM sector, and there are many helpful guidelines (some of which are still being developed):

Editing and bulk upload, including tutorials


Editing data in all MediaWiki projects can be done manually via user interfaces, or in bulk via script libraries or dedicated external tools.

When editing a single or a few items at a time, manual editing is efficient and fast. It is also a good way to create communities around projects during workshops or edit-a-thon events, as the learning curve for manual edits is significantly lower than using bulk tools.

The quick tutorial "Adding manual edits to Wikidata - how to add statements with verifiable data and how to create items on Wikidata" by Ewan McAndrew is also applicable to Wikibase instances.

When you need to edit existing data or add new data in bulk to projects like Wikidata, Wikibase or Wikimedia Commons, the most widely used methods include Python scripts and libraries such as WikidataIntegrator, and open source tools with visual interfaces such as QuickStatements and OpenRefine.
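As a rough illustration of what a bulk upload can look like, the sketch below turns rows of a small spreadsheet into tab-separated commands in the QuickStatements V1 style (CREATE, then LAST lines for a label and statements). The column names, the sample rows, and the helper names are hypothetical. The QIDs are placeholders (Q4115189 is the Wikidata sandbox item, and Q838948 is assumed here to be "work of art"), so check every identifier before submitting a real batch.

```python
import csv
import io

# Hypothetical sample data: the column names and rows are invented for illustration.
# The QIDs are placeholders and must be verified before any real upload.
SAMPLE_CSV = """label,instance_of,creator
Example Video Work,Q838948,Q4115189
Example Installation,Q838948,Q4115189
"""


def commands_for_row(row, lang="en"):
    """Return QuickStatements V1 style commands that create one new item."""
    # Simplification (assumption): double quotes in labels are replaced
    # rather than escaped, to keep the quoted string unambiguous.
    label = row["label"].replace('"', "'")
    return [
        "CREATE",
        f'LAST\tL{lang}\t"{label}"',          # label in the given language
        f'LAST\tP31\t{row["instance_of"]}',   # P31 = instance of
        f'LAST\tP170\t{row["creator"]}',      # P170 = creator
    ]


def build_batch(csv_text, lang="en"):
    """Convert CSV text into one newline-separated batch of commands."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(commands_for_row(row, lang))
    return "\n".join(lines)


batch = build_batch(SAMPLE_CSV)
print(batch)
```

The printed text could then be pasted into the QuickStatements batch input. Running a small batch first and reviewing the resulting items is a sensible precaution before uploading a full dataset.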

A key step in bulk operations with OpenRefine is data reconciliation, that is, linking data points from the original dataset to corresponding items in the Wiki projects. To perform reconciliation with the public projects, you need to use the available reconciliation endpoints. To perform reconciliation with your own private Wiki instance, you need to set up such a service yourself.
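To make the reconciliation step more concrete, here is a minimal sketch of the request and response shapes used by services that follow the Reconciliation Service API, which OpenRefine talks to behind the scenes. It only builds and interprets the data and does not contact any endpoint. The function names, the score threshold, and the sample response are assumptions made for illustration, and the artist names are invented.

```python
import json
from urllib.parse import urlencode


def build_reconciliation_queries(names, type_qid=None, limit=3):
    """Build a batch of queries keyed q0, q1, ... as reconciliation services expect."""
    queries = {}
    for index, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_qid:
            query["type"] = type_qid  # e.g. Q5 (human) to restrict candidates
        queries[f"q{index}"] = query
    return queries


def encode_for_post(queries):
    """Encode the batch as a form body with a single 'queries' parameter."""
    return urlencode({"queries": json.dumps(queries)})


def pick_confident_matches(response, min_score=90.0):
    """Keep a candidate only if the service flags it as a match or scores it highly.

    The threshold is an arbitrary assumption; uncertain rows are left as None
    so that a person can review them by hand.
    """
    matches = {}
    for key, payload in response.items():
        best = None
        for candidate in payload.get("result", []):
            if candidate.get("match") or candidate.get("score", 0) >= min_score:
                best = candidate
                break
        matches[key] = best["id"] if best else None
    return matches


queries = build_reconciliation_queries(["Example Artist A", "Example Artist B"], type_qid="Q5")

# Hypothetical response, shaped like a reconciliation service reply.
sample_response = {
    "q0": {"result": [{"id": "Q4115189", "name": "Example Artist A", "score": 100, "match": True}]},
    "q1": {"result": [{"id": "Q4115189", "name": "Someone else", "score": 41, "match": False}]},
}
print(pick_confident_matches(sample_response))
```

Leaving low-scoring rows unmatched mirrors the usual OpenRefine workflow, where ambiguous candidates are resolved manually rather than accepted automatically.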

To learn more about using these tools, including reconciliation, consult these tutorials:

Monitoring and data reuse


As soon as data is uploaded to Wikidata or Wikimedia Commons, it becomes part of the living Wiki ecosystem. The data can be improved by other users or institutions manually or via bulk operations. Information such as birth dates can be pulled from other data sources that are connected to Wikidata, e.g. VIAF, ULAN and IMDb. In some cases the data may be unintentionally or intentionally vandalised, so it is recommended to monitor the data regularly to keep track of changes.

Community collaboration can also reveal when multiple items describe the same concept, object, or person. In such cases, items may be merged. Since reversing a merge is difficult, it can be safer to use the property “not the same as” (labelled “different from”, P1889, on Wikidata) to clarify that, for example, two people with the same name are distinct individuals.

Furthermore, when adding artworks that are closely related to each other (e.g. via versioning or derivation, or by being part of the same series), it is useful to connect these artworks via the respective properties. This helps with querying and makes it easier to distinguish items that may appear to be the same or very similar.

One way to monitor and keep improving data on an ongoing basis is to set up Listeria lists. These are lists on Wiki pages that are created automatically by the Listeria tool. The lists are based on SPARQL queries and can be tailored to specific properties, for example to find video artists without a birth date. Listeria generates tables from the query results and updates them every day, which makes them useful for monitoring how data changes over time and whether manual intervention by a curator may be necessary. The resulting tables offer a low-threshold entry point for GLAM professionals and others to improve the data.
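The kind of query behind such a list can be sketched as follows. This is a minimal example that only builds the SPARQL text for "people with a given occupation and no date of birth". The function name and the limit are hypothetical, and the occupation QID used in the example (Q18216771, assumed here to be "video artist") should be verified on Wikidata before use. P106 (occupation) and P569 (date of birth) are the standard Wikidata properties.

```python
import re


def build_missing_birth_date_query(occupation_qid, limit=200):
    """Return a SPARQL query for items with the occupation but no date of birth."""
    # Guard against malformed identifiers ending up inside the query text.
    if not re.fullmatch(r"Q[1-9]\d*", occupation_qid):
        raise ValueError(f"not a valid QID: {occupation_qid!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P106 wd:{occupation_qid} .\n"             # P106 = occupation
        "  FILTER NOT EXISTS { ?item wdt:P569 ?birthDate . }\n"  # P569 = date of birth
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Assumption: Q18216771 is the item for "video artist"; check before relying on it.
query = build_missing_birth_date_query("Q18216771")
print(query)
```

The printed query can be tried out in the Wikidata Query Service first. The same query text can then be adapted for the Listeria template on a Wiki page, following the Listeria documentation for the exact template parameters.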

Last but not least, one of the advantages of adding some or all of your data to collaborative, public Wiki projects is that the data can be reused on other platforms, e.g. institutional websites. Over time, the public data may even become richer than the original data, and you can benefit from those improvements. Keeping local data mapped to corresponding entries in Wikidata and/or Wikipedia allows statements from Wikidata or articles from Wikipedia to be embedded in content and collection management systems in various ways, for example directly via iFrames or via API calls.
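As an illustration of reuse via API calls, the sketch below reads the JSON that Wikidata publishes for each item at Special:EntityData and extracts a label, a description, and a date of birth for display on a local website. The function names and the User-Agent string are hypothetical. The network function is defined but not called here, and the sample entity is a trimmed, invented record that only mimics the shape of the real JSON.

```python
import json
from urllib.request import Request, urlopen

ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid):
    """Download one entity record (not called in this sketch)."""
    request = Request(
        ENTITY_DATA_URL.format(qid=qid),
        # Hypothetical identifier; replace with your own project and contact details.
        headers={"User-Agent": "media-art-archive-example/0.1"},
    )
    with urlopen(request, timeout=30) as response:
        # Simplification: assumes the QID is not a redirect to another item.
        return json.load(response)["entities"][qid]


def first_time_value(entity, property_id):
    """Return the first time value stored for a property, or None."""
    for claim in entity.get("claims", {}).get(property_id, []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue:  # absent for "unknown value" and "no value" statements
            return datavalue["value"].get("time")
    return None


def summarize_entity(entity, lang="en"):
    """Collect the fields a collection website might display."""
    return {
        "label": entity.get("labels", {}).get(lang, {}).get("value"),
        "description": entity.get("descriptions", {}).get(lang, {}).get("value"),
        "date_of_birth": first_time_value(entity, "P569"),  # P569 = date of birth
    }


# Invented sample record shaped like Wikidata's entity JSON.
sample_entity = {
    "labels": {"en": {"language": "en", "value": "Example Artist"}},
    "descriptions": {"en": {"language": "en", "value": "hypothetical video artist"}},
    "claims": {
        "P569": [{"mainsnak": {"datavalue": {"value": {"time": "+1950-01-01T00:00:00Z"}}}}]
    },
}
print(summarize_entity(sample_entity))
```

Storing the Wikidata identifier next to each local record is what makes such a lookup possible. Caching the results locally avoids sending a request for every page view.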

Visualisation


Structured data can also be used to generate a range of data visualisations. These can support statistical analysis, offer a visual approach to monitoring that makes changes or inconsistencies easier to spot, and be integrated directly into the user interfaces of web-based applications to make data more interactive and browsable. The following tools are openly available and work well with the Wikidata SPARQL endpoint, but can also be adapted for independent MediaWiki instances:

For tutorials that delve deeper into using Wikidata’s SPARQL endpoint and writing your own queries, you can consult these pages:

Additional upload considerations


Notability in the Wikimedia ecosystem


Items in the Wikimedia projects must meet a notability requirement to be considered important enough to serve the goals and ambitions of the open knowledge movement behind Wikimedia. The criteria for notability can be critiqued, especially as they have been shown to be biased against marginalised social groups, and there are ongoing debates in the Wiki user and developer communities about how to address these shortcomings. On Wikidata, the criteria are more permissive than on Wikipedia, and an item can be added when:

  • It refers to an instance of a clearly identifiable conceptual or material entity that can be described using serious and publicly available references.
  • It fulfills a structural need, for example: it is needed to make statements made in other items more useful.

Media artists are ‘notable’ as soon as they are mentioned in a collection, publications, books, exhibitions or notable works. In some cases where easily referenced materials are not available, artist items can also be added based on the ‘structural need’ argument.

Media artworks are also generally considered ‘notable’, but specific tapes or versions may not be. This depends on how those items can be described or referenced. For example, if two museums have distinct versions of a media artwork in their collections, two items can be created in Wikidata, but if one museum has several tapes of that same artwork, these could be added as several references but not as several items. In the latter case, maintaining an internal MediaWiki and/or Wikibase database can be helpful, with links from the internal database to the public platforms. Note that the internal database can also be public, with options to restrict editing or certain views to specific user groups (an example is Rhizome’s ArtBase).

FAIR data and Wikimedia projects


WikiFAIR is an initiative within the Wikimedia ecosystem that aims to help research projects adopt the FAIR principles (for Findable, Accessible, Interoperable, and Reusable data) by integrating Wikimedia platforms like Wikidata and Wikimedia Commons into research data management. It provides a set of ideas, instructions, and helpful examples to achieve the FAIR principles in research projects using Wikimedia systems and technologies.

Service and software maintenance


One of the main advantages of working with Wikimedia projects is the open source software practices behind them and the vibrant international community of users and developers who actively contribute features, extensions, plugins, widgets, software updates and support tools across the ecosystem. With the Wikimedia Foundation, various national chapters, and affiliate user groups behind long-term projects and maintenance initiatives, the ecosystem has built up resilience and is at least as likely to persist in the long term as any large commercial internet service.

Several venues serve as active hubs for communication and consultation on the current and future state of the ecosystem. Among them are:

Credits


The first version of this document was written and edited by: Lozana Rossenova (Open Science Lab, TIB Hannover), Susanna Ånäs (AvoinGLAM, MEHI, ISEA94), Andrew Gryf Paterson (AvoinGLAM, Pixelache), Tereza Havlíková (Zentrum für Netzkunst), Joost Dofferhoff (LI-MA), Gaby Wijers (LI-MA), Hanno Lans, Andreas Kohlbecker (ZKM), Dušan Barok (Monoskop), Chiara Borgonovo, and Annet Dekker (University of Amsterdam).

The document draws on and links out to much existing work carried out by members of the international GLAM and Wiki communities and remains open to contributions and further improvement.
