Open Science for Arts, Design and Music/Guidelines/Data repositories

From Meta, a Wikimedia project coordination wiki

Data repositories

Why data repositories?

Repositories are the safest homes for research outputs. They offer long-term availability guarantees, secure access protocols, standard licensing and versioning options, which bring many benefits compared to simply publishing your research outputs or artefacts on a website or leaving them in your GitHub repository. By sharing your resources in a trusted repository:

  • You can make sure that others (humans and machines alike) can find your work, beyond accidentally stumbling across it on your project website. Repositories usually have their own discovery platforms and are harvested by other scholarly databases.
  • You can make sure that your data will be preserved over a long term, and therefore remain available for future research or verification.
  • You make your data uniquely identifiable, citable and visible to machines and information management systems too. A persistent identifier assigned to your data keeps pointing to their exact location, even if that changes over time - no broken links! You can read more about the value of persistent identifiers below.
  • You add rich metadata and a standard license to your work when publishing it in a repository, so that reuse conditions are clear.
  • You can control who has access to your outputs by using the authentication control and - if needed - embargo features of the repository.
  • You can add different versions of your outputs and clearly indicate the latest one.
  • You can clearly define how to cite your work, or can follow the citation standards of the repository.
  • Sharing your data in a repository improves their quality. To deposit them, you clean them, document them so that third parties can understand them, and add the metadata that the deposit requires.
  • Last but not least: you comply with funder requirements, as funders usually require that resources associated with the research project are shared in a trusted repository.

(adapted from here: https://campus.dariah.eu/resource/posts/dariah-pathfinder-to-data-management-best-practices-in-the-humanities)
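
To illustrate the point about persistent identifiers in the list above, here is a minimal sketch of how a DOI, one common kind of persistent identifier, can be turned into a link that keeps working when the data move. The function name doi_to_url and the example DOI are placeholders invented for illustration; the sketch only assumes that DOIs start with "10." and are resolved through https://doi.org/.

```python
from urllib.parse import quote

# Public DOI resolver; it redirects to wherever the data currently live.
DOI_RESOLVER = "https://doi.org/"

# Common ways people write a DOI that we want to strip before resolving.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI given in any of the common forms."""
    cleaned = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(cleaned, safe="/")


if __name__ == "__main__":
    # Placeholder DOI, not a real record.
    print(doi_to_url("doi:10.5281/zenodo.0000000"))
```

Citing the resolver URL rather than the repository's landing page address is what keeps the citation stable: the repository can reorganise its website and update the DOI's target without breaking the link.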

If your research team still wishes to display the project outputs and resources on an institutional or project website, that is of course also possible. For that, consult the "What if I wish to display them elsewhere too (i.e. on a project/institutional website)?" section below.

How to select the most suitable repository for my resources?

Depending on the scope of your project and the volume of your data, depositing can be a complex, iterative process with a strong emphasis on precise data quality assessment, but it can also be simple and easy. As a first step, check which kinds of support structures are available in your institution. If your institution has solid data repository facilities and/or data stewards who can help with your work, you are lucky! It is well worth involving them from the project planning phase onwards and discussing which solutions work well for your research while also complying with the standards, specifications and protocols under which the repository operates. Repository staff can also help you understand any specific data management requirements and associated costs. If your funder requires a data management plan, this information should be included in it.

In deciding where to store your data, you may have a number of choices about who will look after it and how its findability and potential can be maximised.

Where can I find repositories for my resources?

You can use generic repositories such as Zenodo, which accepts almost any data set, but you can also use more specific ones such as the Swiss National Data and Service Center for the Humanities (DaSCH) or the TextGrid Repository, which primarily serve text-based humanities disciplines but also contain arts and music resources. If your institution has data sharing policies, it is advisable to use your own institutional repository. The main standard for assessing the quality of data repositories is the CoreTrustSeal certificate. Choose CoreTrustSeal-certified repositories to make sure that your data will remain available in the future in a secure, sustainably maintained and curated environment.

You can find research data repositories that match the technical or legal requirements of your research data in the re3data.org database, which you can browse by country, discipline, CoreTrustSeal compliance and many other parameters.

(adapted from here: https://campus.dariah.eu/resource/posts/dariah-pathfinder-to-data-management-best-practices-in-the-humanities)

Where can I find repositories for legal copies of my publications (pre-prints or post-prints)?

If you are looking for repositories for your publications (pre-prints and post-prints), you can find the most suitable one via the Registry of Open Access Repositories.

Where can I find repositories for research software?

Serving as an environment for experiments, visualisations, installations or research analysis, research software can be a valuable research output in its own right, and as such it is worth sharing alongside other resources. Currently, there are two major repositories in Europe that are committed to software archiving, preservation and citation: Zenodo and Software Heritage.

1) Sharing and archiving research software via Zenodo

Relevant GitHub repositories can easily be connected to the project's Zenodo collection, so that software releases are deposited from GitHub to Zenodo together with metadata that provide contextual information about the software. Zenodo mints a DOI for each released version of the software, and also creates a concept DOI which refers to all versions of a given piece of software. This way, a PID is assigned both to the software as a whole and to each specific release of the source code. This resource guides you through how to set up automated or semi-automated GitHub --> Zenodo or GitLab --> Zenodo exports: https://genr.eu/wp/cite/#Authorise
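
One way to provide the metadata mentioned above is a .zenodo.json file in the root of the GitHub repository, which Zenodo's GitHub integration reads when it archives a release. The following is a minimal sketch that generates such a file. The helper names build_zenodo_metadata and write_zenodo_json are invented for this example, the sample values are placeholders, and only a handful of the available metadata fields are shown; check the Zenodo documentation for the full list and for the licence identifiers it accepts.

```python
import json
from pathlib import Path


def build_zenodo_metadata(title, description, creators, license_id, keywords=()):
    """Assemble a small metadata record for a software release.

    `creators` is a list of dicts such as
    {"name": "Family, Given", "affiliation": "...", "orcid": "..."}.
    """
    metadata = {
        "upload_type": "software",
        "title": title,
        "description": description,
        "creators": [dict(creator) for creator in creators],
        "license": license_id,
    }
    if keywords:
        metadata["keywords"] = list(keywords)
    return metadata


def write_zenodo_json(repo_dir, metadata):
    """Write the record to <repo_dir>/.zenodo.json and return the path."""
    path = Path(repo_dir) / ".zenodo.json"
    text = json.dumps(metadata, indent=2, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = build_zenodo_metadata(
        title="Example sound installation toolkit",
        description="Software used to drive the installation.",
        creators=[{"name": "Doe, Jane", "affiliation": "Example Academy"}],
        license_id="MIT",  # assumption: an identifier Zenodo recognises
        keywords=["research software", "music"],
    )
    print(json.dumps(example, indent=2))
```

Committing the generated file before tagging a release means every archived version carries the same curated description, instead of whatever Zenodo would infer from the repository itself.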

2) Sharing and archiving research software via Software Heritage

Software Heritage is another publicly funded, European open archive. It harvests all public GitHub repositories to ensure the long-term availability, traceability and citability of the source code of research software. You can get started here: https://www.softwareheritage.org/faq/
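
The traceability mentioned above rests on identifiers derived from the content itself. Although this guideline does not cover them, Software Heritage identifiers (SWHIDs) for individual files are built from the same hash that Git computes for a file, so they can be calculated locally. The sketch below shows this for file contents only; the function name swhid_for_content is invented for this example, and identifiers for directories, revisions or releases are computed differently and are not covered here.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the SWHID of a file's content.

    The hash is the Git blob hash: SHA-1 over the header
    b"blob <length>\\0" followed by the raw bytes of the file.
    """
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    print(swhid_for_content(b"hello world\n"))
```

Because the identifier depends only on the bytes of the file, anyone who holds a copy can recompute it and check that it matches the archived version.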
