Digitization Projects/Public Domain Project/Master Thesis of Christoph Zimmermann

From Meta, a Wikimedia project coordination wiki


Full report including appendix

Abstract of the master's thesis report: The Swiss Foundation Public Domain is responsible for the long-term data archive of the volunteer-driven Public Domain Project. The volunteers collect and digitize old audio records that are out of copyright, mainly 78 rpm shellac discs, and capture their metadata.

In the course of this master's thesis, a data model was developed to represent the metadata as Linked Open Data. In addition, a trustworthy archival storage system according to OAIS was evaluated and the first migration steps were undertaken. Following the Semantic Web (Web 3.0) standards, the metadata (title, creator, publication date, images etc.) is modeled as triples (subject, predicate, object) using the ontologies Dublin Core, Schema.org, Music Ontology, Creative Commons and Logistics Core. The new data model is accessible via a web API that delivers RDF/XML or Turtle. This fosters the reuse of the metadata by other websites and projects, which in turn increases the overall value of the metadata and of the work of the Public Domain Project itself. The model is implemented as a set of new templates and forms using Semantic MediaWiki (SMW). SMW allows the value of a data field to be shown on other wiki pages by means of a semantic query. A data field may be validated or restricted to a limited set of values. These features simplify data entry and reduce errors significantly.
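To illustrate the triple model described above, here is a minimal Python sketch that expresses one record's metadata as (subject, predicate, object) triples and serializes them as N-Triples, a line-based syntax that is also valid Turtle. It is not the thesis's implementation. The Dublin Core and Schema.org namespaces are the public ones, but the function names, the record URI and the field values are assumptions made up for this example.

```python
# Namespaces of two vocabularies named in the abstract (public URIs).
DCTERMS = "http://purl.org/dc/terms/"
SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def record_to_triples(subject_uri, title, creator, issued):
    """Return (subject, predicate, object) triples for one audio record.

    Hypothetical helper: objects are tagged as ("uri", ...) or ("literal", ...).
    """
    return [
        (subject_uri, RDF_TYPE, ("uri", SCHEMA + "MusicRecording")),
        (subject_uri, DCTERMS + "title", ("literal", title)),
        (subject_uri, DCTERMS + "creator", ("literal", creator)),
        (subject_uri, DCTERMS + "issued", ("literal", issued)),
    ]


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        obj = f"<{value}>" if kind == "uri" else f'"{escape_literal(value)}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


# Hypothetical record; the URI and all field values are invented.
example = record_to_triples(
    "https://example.org/record/1",
    'Example "Shellac" Title',
    "Example Orchestra",
    "1928",
)
print(to_ntriples(example))
```

Each printed line is one statement about the record. A consumer on another website could parse these lines with any RDF library and merge them with its own data, which is the kind of reuse the abstract refers to.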

A trustworthy storage system for the digitized audio files must fulfill the digital preservation requirements defined by the OAIS model. A new system structure was evaluated and a migration strategy was defined. As a first step, the operating system of the file server was replaced by Gentoo GNU/Linux because it keeps the source code of every installed software package. The source code, together with file format specifications etc., is called representation information. It needs to be preserved together with the audio files to guarantee that the bits on the storage media remain understandable.

A document management system (DMS) for the foundation's internal document handling was evaluated. The selected system, Nextcloud, was deployed on a new virtual machine (VM) secured with TLS, using certificates from Let's Encrypt.

Master Thesis (German)

Presentation slides (English)