Digitization Projects/Public Domain Project/Digitization

From Meta, a Wikimedia project coordination wiki

I. Digitization Workflow

The digitization step consists of

  • cleaning the record,
  • digital recording of the music,
  • saving the resulting file with rudimentary meta data,
  • importing it into the SFPD repository,
  • possibly splitting the music into tracks, and
  • enhancing the music meta data.

1.1. Cleaning the Record

Every record must be wiped with a soft cloth (e.g. microfibre) to remove loose dust. It helps to rotate the record on the turntable while wiping, so that the cloth moves in the direction of the groove.

All handling of the records requires the use of gloves.

After wiping, the record should be digitized. If that results in a decent copy that is not too noisy and does not have too many clicks, no further cleaning is needed. Otherwise this first copy serves as a safety copy, which permits comparison before and after washing.

Washing of records uses special liquids (no alcohol, which dissolves shellac!) and special mechanical contraptions using a wet thread to clean the groove.

After washing and drying, the record is digitized again and the result is compared to the copy made before washing. The better of the two copies is then used for the repository.

1.2. Digital Recording of the Music

The SFPD owns high-quality mechanical record players as well as two laser turntables (ELP). The first digitization attempt should be made with a pickup using a needle, because the records were produced and optimized for this kind of playback. If the result proves unsatisfactory, one can use the ELP, which has the advantage of not touching the record physically. Often, however, the result from the ELP will be noisier, because these turntables were really created for playing vinyl records.

The turntable is connected to an analog amplifier, whose output is digitized by a high-quality digitization interface attached to the computer. On the PC we use the open-source program Audacity to record the sound as two channels at 192 kHz and 24 bits per sample.[1]
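To get a feeling for the data volume these settings produce, the uncompressed data rate can be worked out directly from the figures above. The following is only a rough sketch: the four-minute side length is an assumed example value, and the size of the stored FLAC file will be smaller by an amount that depends on the recording.

```python
# Uncompressed PCM data rate for the recording settings named above.
SAMPLE_RATE_HZ = 192_000   # samples per second and channel
BITS_PER_SAMPLE = 24
CHANNELS = 2


def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the raw PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def pcm_bytes(duration_seconds: int) -> int:
    """Return the raw size in bytes of a recording of the given length."""
    return pcm_bytes_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS) * duration_seconds


if __name__ == "__main__":
    rate = pcm_bytes_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
    side = pcm_bytes(4 * 60)  # assumed example: a four-minute record side
    print(f"{rate} bytes/s, {side / 1_000_000:.1f} MB per four-minute side")
```

Under these assumptions a single side occupies a few hundred megabytes before compression, which is worth keeping in mind when planning storage for the repository.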

1.3. Exporting the Digitized Music as a FLAC File

The digitized music is exported as a FLAC file. From within Audacity it is possible to store rudimentary meta data (Title, Label, Matrix) as keywords with the music. This becomes important when the music is to be matched with the label photograph in the inventory.

A shellac record is ultimately identified by its label, catalog number and matrix number. Currently these data are also used to construct the file name for the digitized music. This file name has the structure

<label>-<catalog>-<matrix>

Apart from the hyphens separating the three parts, the file name uses only lowercase ASCII letters and digits as well as the underscore ("_") character. All special characters in the catalog and matrix number are dropped. Some labels (e.g. His Master's Voice) are abbreviated ("hmv"). The precise rules for this naming scheme are embodied in the software that converts an inventory entry to this kind of file name.

At the time of writing the inventory has just started. The older digitizations have used the file name scheme described above. It is conceivable that, in the future, the file name for records that have already been inventoried will be derived from the SFPD shelfmark, which can be read from the label on the record sleeve.
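The naming rules described above can be illustrated with a minimal sketch. It is not the SFPD conversion software: the function names are hypothetical, the abbreviation table contains only the one entry mentioned in the text, and the replacement of spaces in label names by underscores is an assumption.

```python
import re

# Only the "hmv" abbreviation comes from the text; any further entries
# would have to be taken from the actual SFPD conversion software.
LABEL_ABBREVIATIONS = {"his master's voice": "hmv"}


def clean_label(label: str) -> str:
    """Lowercase a label name; spaces become underscores (an assumption)."""
    key = " ".join(label.lower().split())
    if key in LABEL_ABBREVIATIONS:
        return LABEL_ABBREVIATIONS[key]
    return re.sub(r"[^a-z0-9_]", "", key.replace(" ", "_"))


def clean_number(number: str) -> str:
    """Lowercase a catalog or matrix number and drop all special characters."""
    return re.sub(r"[^a-z0-9]", "", number.lower())


def flac_basename(label: str, catalog: str, matrix: str) -> str:
    """Build <label>-<catalog>-<matrix> without the file extension."""
    return f"{clean_label(label)}-{clean_number(catalog)}-{clean_number(matrix)}"


if __name__ == "__main__":
    print(flac_basename("Decca", "K.2121", "AR 12159") + ".flac")
```

Note that this sketch silently drops non-ASCII letters; whether the real software transliterates them instead is not stated here.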

1.4. Importing Digitized Music into the Repository

After the music has been digitized, it needs to be imported into the repository. The program importflacs.exe from the SFPD toolset is used for this purpose. It is to be called with the following arguments:

 
importflacs.exe [-a] -r <SFPD root> -f <FLACS root> 

where:

-a
(re-)imports all FLAC files into the repository, overwriting files that were imported before.
<SFPD root>
is the root folder of the repository, which contains all PNG files from the inventory.
<FLACS root>
is the root folder which contains the digitized music in record label subfolders with record label, catalog number and matrix number as the file name (e.g. decca-k2121-ar12159.flac).
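For batch use, the call can be wrapped in a small script. The sketch below only assembles the documented arguments; the helper names and the example paths are hypothetical, and it assumes that importflacs.exe can be found on the PATH. How the program reports errors through its exit code is not documented here, so the return code is passed on unchanged.

```python
import subprocess
from typing import List


def build_import_command(sfpd_root: str, flacs_root: str, reimport_all: bool = False) -> List[str]:
    """Assemble the argument list: importflacs.exe [-a] -r <SFPD root> -f <FLACS root>."""
    command = ["importflacs.exe"]
    if reimport_all:
        command.append("-a")  # (re-)import everything, overwriting earlier imports
    command += ["-r", sfpd_root, "-f", flacs_root]
    return command


def run_import(sfpd_root: str, flacs_root: str, reimport_all: bool = False) -> int:
    """Run the import and return the exit code of importflacs.exe."""
    completed = subprocess.run(build_import_command(sfpd_root, flacs_root, reimport_all))
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical example paths; printing only, nothing is executed here.
    print(build_import_command("D:/sfpd", "D:/flacs", reimport_all=True))
```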

The program importflacs.exe loops over all PNG files in the repository and:

  • determines the label, catalog number and matrix number from the Subject meta data field and constructs the file name of the digitized music,
  • opens the music file and initializes the music meta data from the PNG file,
  • adds the PNG file and a 512 x 512 thumbnail version of it to the FLAC data,
  • saves the FLAC file under the SFPD root in the repository using the SFPD shelfmark as its file name.

1.5. Music Metadata

The meta data of the imported music files can now be enhanced. In particular the publication date and the lists of creators, contributors and provenances are of interest. Any noteworthy fact can be added to the description field. Lengthy treatises should be published separately and may be referenced in the description field.

The tool flacmeta.exe of the SFPD toolset is used for this purpose. It takes the name of a FLAC file as argument. Like pngmeta.exe, this executable would copy clipboard data if no Cultlib meta data were found in the FLAC file. However, the import step has already created a rudimentary instance of Cultlib meta data for the FLAC file, so they are never empty when this workflow is followed. (The tools pngmeta.exe and flacmeta.exe are designed to be independent of their role in the SFPD workflow and support many more command-line arguments than are documented here. The option "-h" provides information about them.)

One can consult and manually copy any meta data entered at digitization time.

Works and Performances

In the Cultlib frame of reference each performance is a "work" with its own ID. If the performance is based on another "work" (e.g. composition, libretto), its ID can be listed under "sources".

Recordings of the same performance have the same ID. Thus an MP3 derivative of a high-quality digitization of a performance has the same ID for the "work", lists the digest of the high-quality digitization under "Origin" and has the digest over its own primary data as its identifying digest.

Thus all performers are "creators" of the work. Any other contributors to a performance are listed as "contributors" (e.g. cameraman, digitizer, restorer, ...).

At the time of writing no abstract works (e.g. a composition by Beethoven) are part of the SFPD database. Whenever there are two or more performances of the "same" piece of music, an entry for the abstract work should be added and referred to as a "source" of each performance.

Publication Date

Note that Cultlib meta data record a terminus ante quem, that is, the earliest date known to be after publication. If one knows absolutely nothing about the publication date, the current date may be entered. If the music was published later in a collection with the title "Songs from 1938-1948", the publication date can be set to 1948-12-31. If the year of publication is known, the last day of that year is used as the publication date. If the month of publication is known, the last day of that month is to be entered.
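The rules for choosing the date can be summarized in a short sketch. The function name and its parameters are hypothetical and not part of the SFPD toolset; the `today` parameter exists only to make the "nothing known" case reproducible.

```python
import calendar
from datetime import date
from typing import Optional


def date_ante_quem(year: Optional[int] = None,
                   month: Optional[int] = None,
                   today: Optional[date] = None) -> date:
    """Return the latest possible publication date for the known facts."""
    if year is None:
        # Nothing is known: the current date is the only safe upper bound.
        return today or date.today()
    if month is None:
        # Only the year is known: use the last day of that year.
        return date(year, 12, 31)
    # Year and month are known: use the last day of that month.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


if __name__ == "__main__":
    print(date_ante_quem(1948))      # collection "Songs from 1938-1948"
    print(date_ante_quem(1940, 2))   # month known, leap year
```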

The publication date is important for establishing the legal copyright status of a recording in most jurisdictions.

The last shellac records were produced in 1958 in most Western countries. The last shellac records in the GDR were produced in 1961.[2] Thus for shellac records one can always enter 1959-12-31 or 1961-12-31 as a terminus ante quem.

1.6. License

The license field is intended for licenses explicitly bestowed on the work by its creators or rights holders. No "public domain" license is to be applied here, because the criteria for placing a work in the public domain differ from jurisdiction to jurisdiction and change over time. It is left to the users of the archive to determine whether a recording is in the public domain.

Creator

A reliable list of creators is very important for establishing the legal copyright status of a recording in most jurisdictions.

As opposed to the content (title, subject, description), the overall Cultlib meta data are in English. Therefore the roles should be entered in English (e.g. "Yodel", rather than "Jodel"). Use the Vorbis comment tags "composer", "author", "arranger", "conductor", "lyricist", wherever applicable. Similarly the English version of names (e.g. taken from Wikipedia) is to be used ("George Frideric Handel" instead of "Friedrich Händel", "Pyotr Ilyich Tchaikovsky" instead of "Pjotr Iljitsch Tschaikowski", ...).

Provenance

The provenance field is used for listing persons and institutions who made it possible to create the music file in question.

The role should be entered in English, preferring abstract over personalized designations.

1.7. Splitting Records into Tracks

Generally the whole side of a record is digitized as a single music file. The sides of shellac records are rather short (3-5 minutes). Therefore longer pieces or movements are distributed over more than one record side. The SFPD archive never fuses the content of separate record sides and only rarely splits the music on a side into separate tracks. The digitized music is split into individual tracks only if the information relevant for determining copyright terms (mainly creators, possibly publication date) differs from one track to another.

Tracks are initially given the same pictures and the same meta data as the music file of the whole record side. After that their meta data must be adjusted to reflect the special situation of each track.
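The splitting criterion can be expressed as a simple check. The sketch below is purely illustrative: `TrackInfo` and `needs_split` are hypothetical names, and the comparison covers only the two items named in the text (creators and publication date).

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class TrackInfo(NamedTuple):
    """Copyright-relevant facts about one piece of music on a record side."""
    creators: Tuple[str, ...]
    publication_date: Optional[str]  # e.g. "1948-12-31", None if unknown


def needs_split(tracks: Iterable[TrackInfo]) -> bool:
    """Return True if the pieces on one side differ in creators or date."""
    distinct = {(frozenset(t.creators), t.publication_date) for t in tracks}
    return len(distinct) > 1


if __name__ == "__main__":
    side = [
        TrackInfo(("Performer A",), "1948-12-31"),
        TrackInfo(("Performer B",), "1948-12-31"),
    ]
    print(needs_split(side))
```

The order in which creators are listed is deliberately ignored, so that two tracks naming the same people in a different order are not split.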

1.8. Continuous Enhancement of Metadata

The meta data of a music file can be enhanced at any later time, when more knowledge about the music surfaces.

1.9. Time estimate

Digitizing a record takes at least as long as playing it. As most shellac record sides last about 3-5 minutes, one spends about 10 minutes per side. Thus one can handle 6 sides in an hour and 30 in a five-hour day. Digitizing 140'000 sides will therefore take 4'667 days, or about twenty years, taking into account holidays etc.

It is therefore necessary that not too much time is spent per record. Washing and multiple digitizing of records are to be avoided wherever possible. A noisy record is better than none, as long as digitization is not financed by sponsors. Lengthy research is to be avoided. Incomplete meta data can always be enhanced later.
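The estimate can be retraced step by step. In the sketch below the figure of 230 working days per year is an assumption chosen to account for weekends and holidays; it is not stated in the text.

```python
import math

MINUTES_PER_SIDE = 10    # handling time per record side
HOURS_PER_DAY = 5        # length of a working day
TOTAL_SIDES = 140_000    # sides to be digitized

sides_per_hour = 60 // MINUTES_PER_SIDE                   # 6 sides per hour
sides_per_day = sides_per_hour * HOURS_PER_DAY            # 30 sides per day
working_days = math.ceil(TOTAL_SIDES / sides_per_day)     # 4667 working days


def years_needed(working_days_per_year: int) -> float:
    """Convert the number of working days into calendar years."""
    return working_days / working_days_per_year


if __name__ == "__main__":
    # Assumption: about 230 working days per year.
    print(working_days, round(years_needed(230), 1))
```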


References

  1. N.B.: The prejudice preferring analog over digital music is the result of low-quality audio standards for CDs. High-quality digital music is indistinguishable from analog music if it is rendered on a high-quality sound system. Our digitization quality (192 kHz, 24 bits per sample) is far superior to standard CD quality (44.1 kHz, 16 bits per sample).
  2. https://de.wikipedia.org/wiki/Schallplatte