Open Science for Arts, Design and Music/Guidelines/Storing, publishing large volumes of data

From Meta, a Wikimedia project coordination wiki

Storing, publishing large volumes of data

Criteria to select a data storage

The volume and granularity of your data define the optimal storage methods and locations, both during and after the analysis phase. To make an informed decision, it is worth keeping a few parameters in mind, such as:

  • the size and nature of your data (mega- or gigabytes vs. tera- or petabytes)
  • whether the data is "hot" or "cold" (i.e. whether it belongs to an ongoing or a concluded research activity)
  • the number of partners who need simultaneous access
  • the desired types of access to the data (through landing pages, APIs, a SPARQL endpoint, etc.)
  • privacy concerns
  • optimal transfer and retrieval times
  • the costs of storage
  • the length of storage

Institutional cloud storage for actively curated data

During the active phases of the project, a good practice is to use your institution's cloud storage solutions (an institutional OneDrive, Dropbox or other solution) to store and back up large volumes of data.

Data repositories and limitations in terms of volume

When it comes to the publication and long-term archiving of research data, institutional, thematic or generalist data repositories usually have an upper limit on the size of each data record. The generalist data repository Zenodo caps this at 50 GB per dataset, with the possibility of splitting larger datasets into smaller units published as individual data records (https://help.zenodo.org/), while the repository service of the Swiss National Data and Service Center for the Humanities (DaSCH), which is free of charge for national research projects or those with Swiss participation, requires annual cost sharing by the project or its hosting institution for data volumes exceeding 500 GB (https://www.dasch.swiss/pricing).
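The splitting approach mentioned above can be sketched on the command line. This is a minimal illustration, not a Zenodo-prescribed procedure: the file names are hypothetical, a small dummy file and 1 MB parts stand in for a real multi-gigabyte archive, and the 45 GB figure in the comment is simply an assumed safety margin below the 50 GB cap.

```shell
# Stand-in for a large dataset archive (for a real dataset this would be
# the output of e.g. `tar -czf dataset.tar.gz dataset/`).
dd if=/dev/zero of=dataset.tar.gz bs=1M count=3

# Split into fixed-size parts, each small enough for its own data record.
# For a real >50 GB archive you would use e.g. `-b 45G` to stay under the cap.
split -b 1M -d dataset.tar.gz dataset.tar.gz.part-

# A downloader reassembles the parts in order and verifies the result.
cat dataset.tar.gz.part-* > reassembled.tar.gz
cmp dataset.tar.gz reassembled.tar.gz && echo "parts reassemble losslessly"
```

Publishing a checksum (e.g. from `sha256sum`) alongside the parts lets re-users confirm the reassembled archive is intact.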

Dedicated storage for large volumes of data

In some cases, institutional, national or thematic data centers offer storage and archiving services specifically for projects that produce large quantities of research data (typically terabytes), usually alongside supercomputing facilities. These offer a bigger data container than the average data-record unit of a data repository. A case in point from the arts and humanities domain is the French Huma-Num Box (https://documentation.huma-num.fr/humanum-en/). It offers secure, long-term storage for datasets, mainly large ones (several hundred terabytes in total). The device uses magnetic disks and magnetic tapes to store data. Data deposits can be either "warm" or "cold", including digitised cultural heritage collections, photos, audio recordings, maps, videos and 3D models. Importantly, to enhance the discoverability, reusability and general user-friendliness of the deposited data volumes, such Huma-Num Boxes can easily be connected to web-based publishing and web application systems such as Omeka or IIIF. A full description of the service is available (in French) at https://documentation.huma-num.fr/humanum-box/. For further assistance, please contact assistance@huma-num.fr.

For further reading, see: Nicolas Larrousse, Joël Marchand. A Techno-Human Mesh for Humanities in France: Dealing with preservation complexity. DH 2019, Jul 2019, Utrecht, Netherlands. ⟨hal-02153016⟩

What if I wish to display them elsewhere too (i.e. on a project/institutional website?)

If your research team also wishes to display the project outputs and resources on an institutional or project website, that is of course possible. An emerging good practice is to publish them via repositories and then link them back from the respective website. You can see an example here: https://arkeogis.org/home/ or here under 'Datasets': https://p3.snf.ch/project-179755. This way, you avoid the confusion and inaccuracies caused by maintaining multiple copies on multiple platforms.