Digitization Projects/Public Domain Project/Inventory



I. Introduction

The Swiss Foundation Public Domain (SFPD) owns and stores approximately 70'000 old music recordings - mostly on shellac media, some very old cylinders, some vinyl records and some CDs. These were donated by Carl Flisch, Martin Osterwalder and a number of smaller sponsors. The foundation intends to make this music available to the general public by digitizing and publishing it. Wherever legally possible the resulting music will be available in the public domain (e.g. on Wikimedia servers).

An archive of this size requires uniform procedures for inventory and digitization. This paper describes the workflow for these processes.

1.1. Status Quo

Currently the records in the SFPD store are largely unsorted and their contents largely unknown. They are stored in big, heavy boxes on shelves (Gabsregale, Dürst AG) in the archive room in Rüti ZH. They need to be screened, inventoried and finally digitized. At the time of writing, about 1000-2000 records have already been digitized, stored on the SFPD servers and, whenever copyright considerations permitted it, uploaded to the Wikimedia Commons servers of the Wikimedia Foundation.

1.2. Processes

The process of screening assigns an SFPD shelfmark[1] to each side of the record. The shelfmarks are affixed as barcodes to the record sleeve.

The process of inventory consists of making a (high-resolution) digital photograph of the record label (usually one for each side, including the barcode), capturing its text in the meta data of the digital label picture in the PNG format, and collecting the records in cardboard boxes which a single person can carry and which are collected in the store. The file names of the label pictures and their embedded meta data contain the SFPD shelfmark.

The process of digitization uses an appropriate record player for capturing the music in very high quality (192 kHz, 24 bits per sample), embedding the label picture and entering the embedded meta data of the sound file in the FLAC format.

1.3. Results

The result of the screening process is the record in a sleeve with barcodes of the shelfmark of each side of the record on it.

The result of the inventory process is a PNG file with the record label and the barcode for each side of a record with the content of the label in the embedded meta data. These meta data are used for searching the inventory online.

The result of the digitization process is a FLAC file for each track of each side of the record with the corresponding label picture and its meta data embedded. The embedded meta data can be refined at any later time. The important items in the meta data are the date of publication and the list of all creators and contributors, which are needed for determining the copyright status in most jurisdictions.


II. SFPD Toolset

A tool set of Python programs has been developed to support the workflows described here. Each tool is described below at the point where it is used.


III. Multi-record Workflows

3.1. Cardboard Boxes

The cardboard boxes are procured from archivebox.com: Cardboard boxes

Whenever the current supply is used up, a new batch of boxes must be ordered. It is not obvious how to assemble the cardboard boxes, so it is recommended to keep an empty assembled box around as a model.

3.2. Record Sleeves

Most records in the archive have no record sleeves or old record sleeves unsuitable for long-term storage. In the process of screening they are put into new record sleeves. These are also procured from archivebox.com:

acid-free record sleeves
Art.-Nr.: 99656
Whenever the current supply is used up, a new batch of record sleeves must be ordered. We only buy a single - large - size for all records.

3.3. Barcode Labels

The labels with the shelfmark barcode are printed on HERMA labels (PREMIUM, 64,6 x 33,8 with 24 labels per sheet, No. 4262) which are certified for long-term storage.

We buy them from Office World Zürich (http://www.officeworld.ch/) but they could also be bought online directly from HERMA.

Whenever 12 records have been selected for inventory, the barcode labels for them are printed. The program prepc128.exe from the SFPD toolset is used for this purpose. It takes a text file as its argument, containing the text of each label on a separate line, and generates an .ods file (by default in the same folder and with the same name as the input file, but with an .ods extension) which can be opened in LibreOffice to print the barcode labels.
The program prepc128 computes the check character for each barcode and inserts the barcodes and barcode texts into the ODS template. The LibreOffice template makes use of the C128 barcode character set (embedded in the .ods file).
When printing the labels from LibreOffice, remember to choose manual paper feed and to insert the label sheet (face-down on our color laser jet printer).
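
The check character computation can be illustrated with a minimal sketch. It is not the source code of prepc128; it assumes that the labels are encoded in Code 128 code set B, which the text above does not state. In Code 128 the check value is the start symbol value plus the position-weighted sum of the symbol values, taken modulo 103. The function name code128_check_value is hypothetical.

```python
START_B = 104  # value of the Code 128 "Start Code B" symbol


def code128_check_value(text: str, start_value: int = START_B) -> int:
    """Return the Code 128 check symbol value (0..102) for printable ASCII text.

    Assumption: every character is encoded as one code set B symbol,
    whose value is its ASCII code minus 32.
    """
    total = start_value
    for position, char in enumerate(text, start=1):
        value = ord(char) - 32
        if not 0 <= value <= 94:
            raise ValueError(f"character {char!r} is not printable ASCII")
        total += position * value  # weight each symbol by its 1-based position
    return total % 103


if __name__ == "__main__":
    # Check value for a shelfmark as it might appear on a label.
    print(code128_check_value("18-000123A"))
```

The returned value still has to be mapped to the glyph that the barcode font uses for that symbol; this mapping depends on the font and is not shown here.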

3.4. SFPD Shelfmarks

SFPD shelfmarks are simple numbers assigned consecutively as the inventory proceeds. Each object (record or album) in the SFPD archive receives an SFPD shelfmark when it is selected for inventory. A basic SFPD shelfmark consists of a two-digit year followed by a hyphen and a six-digit consecutive number (e.g. '18-000123'). The SFPD root folder contains a subfolder for each year, and each year subfolder contains an object subfolder for each object inventorized that year.

SFPD
  └ 18
  ⎮   └ 000123
  └ 19

The names of files associated with an object start with the basic SFPD shelfmark of the object. If the object is an album, each individual record of the album is identified by a two-digit record number (e.g. '18-000061-03'). The sides of a record are indicated by appending an 'A' or 'B' to the shelfmark of the record (e.g. '18-000123A', '18-000061-02A'). If the recording of one side of a record is split into tracks, the track number is appended as a two-digit number (e.g. '18-000016A-02'). All files associated with an object are stored in its object subfolder:

SFPD
  └ 18
     └ 000016
     ⎮    └ 18-000016.png      (original record sleeve)
     ⎮    └ 18-000016A.flac    (digitized music from side A)
     ⎮    └ 18-000016A.gif     (animated GIF of rotating label of side A)
     ⎮    └ 18-000016A.png     (inventory photograph of label of side A)
     ⎮    └ 18-000016A-01.flac (first track of music from side A)
     ⎮    └ 18-000016A-02.flac (second track of music from side A)
     ⎮    └ 18-000016B.flac    (digitized music from side B)
     ⎮    └ 18-000016B.png     (inventory photograph of label of side B)
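
The naming scheme above can be expressed as a regular expression. The following is a minimal sketch, not part of the SFPD toolset; the names Shelfmark, parse_shelfmark and object_folder are hypothetical. It assumes that a track number only ever follows a side letter, as in the examples.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# year-number, optional album record number, optional side with optional track
SHELFMARK = re.compile(
    r"^(?P<year>\d{2})-(?P<number>\d{6})"
    r"(?:-(?P<record>\d{2}))?"
    r"(?:(?P<side>[AB])(?:-(?P<track>\d{2}))?)?$"
)


class Shelfmark(NamedTuple):
    year: str
    number: str
    record: Optional[str]
    side: Optional[str]
    track: Optional[str]


def parse_shelfmark(text: str) -> Shelfmark:
    """Split a shelfmark such as '18-000061-02A' into its components."""
    match = SHELFMARK.match(text)
    if match is None:
        raise ValueError(f"not an SFPD shelfmark: {text!r}")
    return Shelfmark(**match.groupdict())


def object_folder(root: Path, text: str) -> Path:
    """Return the object subfolder, e.g. <root>/18/000016."""
    mark = parse_shelfmark(text)
    return root / mark.year / mark.number
```

For example, object_folder(Path("SFPD"), "18-000016A-02") yields the folder SFPD/18/000016, in which the file 18-000016A-02.flac would be stored.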

IV. Screening Workflow

The process of inventory starts with screening the next batch of records from the store. Each record that should be inventorized is put into a sleeve.

4.1. Original Sleeve

During screening, a record may be photographed with its original sleeve if the sleeve is particularly interesting. However, this artwork is not systematically collected by the SFPD archive. Whenever a record is inventorized, the original sleeve is discarded, as it is usually very dusty and not suitable for long-term archival.

4.2. The Good into the Pot, the Bad into the Crop

If a record is obviously so badly damaged that it cannot be digitized, it is put into a separate box of damaged records and is not inventorized.

If a record is part of an album, it is provisionally excluded from inventory and put into a separate box for partial albums. In this box, the records are kept ordered by label and catalog number. The records belonging to the same album usually have consecutive catalog numbers, so it is possible to detect when a complete album has accumulated. Whenever that happens, all records of the album are selected for inventory. Partial albums will be inventorized after the inventory of all other records is complete.
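
Spotting accumulated albums amounts to finding runs of consecutive catalog numbers within one label. The sketch below only illustrates that idea and is not part of the SFPD toolset. It assumes that the numeric part of each catalog number has already been extracted by hand, and it cannot know how many records an album should contain, so the final decision remains with the person screening. The function name consecutive_runs is hypothetical.

```python
from itertools import groupby


def consecutive_runs(catalog_numbers):
    """Group integer catalog numbers of one label into runs of consecutive values."""
    ordered = sorted(set(catalog_numbers))
    runs = []
    # Within a run of consecutive numbers, (value - index) stays constant.
    for _, group in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        runs.append([number for _, number in group])
    return runs


if __name__ == "__main__":
    # Made-up numbers: three consecutive records and one isolated record.
    for run in consecutive_runs([2121, 2123, 2122, 3000]):
        print(run)
```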

Each record selected for inventory is put into a new sleeve. Whenever 12 records have been selected for inventory, the barcode labels for them are printed and pasted onto the sleeves.
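
A batch of 12 records with two sides each needs 24 labels, which matches one sheet of the HERMA labels described in section 3.3. The following sketch generates the label texts for such a batch, one per line, as input for prepc128. It is an illustration only: it assumes single records (no albums), one label per side, and that the label text is the plain shelfmark. The function name batch_label_texts is hypothetical.

```python
def batch_label_texts(year: int, first_number: int, count: int = 12) -> list[str]:
    """Return the A and B side shelfmarks for `count` consecutive single records."""
    labels = []
    for number in range(first_number, first_number + count):
        base = f"{year % 100:02d}-{number:06d}"  # e.g. '18-000123'
        labels.extend([base + "A", base + "B"])
    return labels


if __name__ == "__main__":
    # One label text per line, ready to be saved as a text file.
    print("\n".join(batch_label_texts(18, 123)))
```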

V. Inventory Workflow

The scanning station with camera, light etc. was made available to SFPD free of charge by Bruno Jehle (BJ Institute).

5.1. Photography

The two labels of each record are photographed together with the corresponding barcode on the sleeve. The image window must be chosen such that the whole label, including the matrix number, as well as the barcode is visible in the picture.

This means that the first side must be identified from the label or the matrix number and photographed with the barcode for the A side. Then the record must be turned over and the other barcode must be placed on the right side of the label. The most frequent error in this step is forgetting to turn over the sleeve with its barcode label together with the record.

Whenever the records are handled, gloves are used to protect the originals.

For photography suitable light is used. No true color reference is photographed as the purpose of the inventory is not preserving the artwork but only the information on the label.

After the record has been photographed it is put into the sleeve and into the cardboard box currently being filled.

At the time of writing we use a Nikon camera with aperture 11 and shutter speed 1/30 s. The pictures are recorded using the highest quality (4352 x 2868 pixels, RGB) in the Nikon "raw" format as .NEF files.[2] Each resulting NEF file is about 10 MB in size.

The storage medium of the camera holds 16 GB (i.e. more than 1000 images). It is nevertheless advisable to transfer the NEF files to a PC after a few batches. Quality control will discover a few errors, and these can be remedied easily and promptly if the corresponding cardboard box is still accessible.

Whenever a cardboard box is full, the last barcode label of the last record/album in it is pasted on its front below the first barcode label that was pasted there initially.

Then a new cardboard box is assembled and the first barcode label of the first record/album in it is pasted on its front.


(picture)

The completed cardboard box is transferred to the SFPD storeroom after quality control (see below) has been passed for all photos of its contents.

5.2. First Visual Inspection

On the PC all new NEF files are put into an import folder. It makes sense to examine each file visually before proceeding. For this one may have to use IrfanView, because common image processing programs do not support reading NEF raw files.

The visual inspection checks whether:

  • all barcodes of the batch appear,
  • the photographer has forgotten to associate a new barcode with a new side,
  • the barcode is completely and clearly visible.

The errors are recorded together with the barcode numbers and reported to the photographer with the request to repeat the photographs.

5.3. Image Import

Next the program importlabels.exe of the SFPD toolset is used to import the photographs into the SFPD repository.

It is called with the following arguments:

importlabels.exe -r <SFPD root> -i <import> -b <backup> -f <failed> 

where:

<SFPD root>
is the root folder of the SFPD repository where the PNG files are to be stored.
<import>
is the import folder containing the NEF files to be imported. If it is not given explicitly, it is assumed to be a folder named import under the root folder.
<backup>
is the backup folder to which the NEF files are moved after successful importing. If it is not given explicitly, it is assumed to be a folder named backup under the root folder.
<failed>
is the folder to which NEF files are moved after importing failed. If it is not given explicitly, it is assumed to be a folder named failed under the root folder.
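
The default folders described above can be sketched with argparse. This is an illustration of the described behaviour, not the source code of importlabels; in particular, treating -r as a required option and the attribute names import_dir, backup_dir and failed_dir are assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the four folder options and fill in the documented defaults."""
    parser = argparse.ArgumentParser(prog="importlabels")
    parser.add_argument("-r", dest="root", type=Path, required=True,
                        help="root folder of the SFPD repository")
    parser.add_argument("-i", dest="import_dir", type=Path,
                        help="folder containing the NEF files to be imported")
    parser.add_argument("-b", dest="backup_dir", type=Path,
                        help="folder for NEF files that were imported successfully")
    parser.add_argument("-f", dest="failed_dir", type=Path,
                        help="folder for NEF files whose import failed")
    args = parser.parse_args(argv)
    # Folders that were not given explicitly default to subfolders of the root.
    if args.import_dir is None:
        args.import_dir = args.root / "import"
    if args.backup_dir is None:
        args.backup_dir = args.root / "backup"
    if args.failed_dir is None:
        args.failed_dir = args.root / "failed"
    return args
```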

This program:

  • loops over files in the import folder,
  • for each NEF file it
    • analyzes the file and extracts RGB data as well as meta data,
    • recognizes the barcode in the picture,
    • creates rudimentary Cultlib meta data for the picture,
    • stores the picture in PNG format in the appropriate folder and subfolder of the repository,
    • moves the original NEF file from the import folder to a backup folder.

This process may reveal some more quality problems (unrecognizable barcode). After the import the failed folder contains the problematic files and the import folder has been emptied.
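
The file handling of the steps above can be sketched as follows. The sketch is not the implementation of importlabels: decoding the NEF file, recognizing the barcode and writing the PNG are replaced by a caller-supplied function read_barcode, which is a hypothetical stand-in returning the shelfmark or None. Only the loop and the moves to the backup and failed folders are shown.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, Optional


def import_labels(import_dir: Path, backup_dir: Path, failed_dir: Path,
                  read_barcode: Callable[[Path], Optional[str]]) -> Dict[str, Optional[str]]:
    """Move every NEF file to the backup or failed folder and report its shelfmark."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    failed_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    # Collect the file list first, because files are moved while looping.
    nef_files = sorted(p for p in import_dir.iterdir() if p.suffix.lower() == ".nef")
    for nef in nef_files:
        shelfmark = read_barcode(nef)  # None means the barcode was not recognized
        # A real import would convert the picture to PNG and store it
        # in the object subfolder of the repository at this point.
        target = backup_dir if shelfmark else failed_dir
        shutil.move(str(nef), str(target / nef.name))
        results[nef.name] = shelfmark
    return results
```

After such a run the import folder contains no NEF files any more, and every file without a recognized barcode is found in the failed folder.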

5.4. Image Meta Data

The import process has initialized the meta data embedded in each PNG file. More meta data are added with the help of pngmeta.exe of the SFPD toolset.

This program takes a PNG file as argument, displays the record label and opens an editor for the meta data. If it is associated with the .png extension, it is sufficient to double-click the PNG file to start it.

Cultlib Meta Data

For SFPD the Cultlib meta data described elsewhere[3] are used. These are based on the principles that:

  • all meta data of a file are embedded in the file,
  • few clearly defined meta data fields are better than many,
  • it makes sense to have the same structure for meta data for any type of object,
  • each object has an abstract "work" id and a concrete digest of the primary data,
  • only unchanging facts related to the object are to be stored in them.

In this inventory step, the meta data of the photograph (not the music on the record!) need to be recorded. Therefore the publication date is the time of inventory, no creators or contributors are listed, and the license for all photographs is CC-0.[4]

5.5. Meta Data fields for Record Labels

The only part of the meta data that needs to be entered is the content, consisting of language, title, subject and description.


Generally the text content of the record label is to be entered in the meta data.

Language

The language of the content description must be chosen. This should match the language of the record label.

Title

The exact title on the record label is to be entered here.

Subject

The subject field is important for the identification and management of the inventorized objects. It is prefilled with the SFPD shelfmark recognized from the barcode. The label and catalog number as well as the matrix number must be added. Between SFPD shelfmark and label name and between catalog number and matrix number a slash ("/") must be entered.

Example: SFPD:18-000067-03B / DECCAN ERA K2121 / AR12159

The matrix number identifies the side of the record uniquely. Therefore a side indication (e.g. "A" or "-2") is dropped from the catalog number printed on the label. The matrix number is often printed on the label. It is also often found scratched into the inner ring of the shellac.

Some Odeon records have two catalog numbers, an original one e.g. "O-6030" and one assigned to a reissue of the old recording e.g. "AA 79459". In that case the original catalog number is to be used: SFPD:18-000002A / ODEON O-6030 / xxBo 8059.

If no matrix number exists, which is rare but possible with relatively new records, only the label and catalog number (this time with the side indication "A" or "2" as printed on the label) are added.
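
The layout of the subject field can be summarized in a short sketch. The function build_subject is hypothetical and not part of pngmeta; it only joins the parts with the slash separator described above and omits the matrix number when there is none.

```python
from typing import Optional


def build_subject(shelfmark: str, label_and_catalog: str,
                  matrix: Optional[str] = None) -> str:
    """Join shelfmark, label with catalog number and matrix number with ' / '."""
    parts = [f"SFPD:{shelfmark}", label_and_catalog]
    if matrix:  # records without a matrix number get only two parts
        parts.append(matrix)
    return " / ".join(parts)


if __name__ == "__main__":
    print(build_subject("18-000002A", "ODEON O-6030", "xxBo 8059"))
```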

Description

All the remaining information on the picture of the record label is to be copied here.

Nota Bene: Clipboard!

When pngmeta.exe is closed, it saves the meta data entered as embedded meta data in the PNG file. It also saves the Cultlib meta data in the clipboard. When pngmeta.exe opens a PNG file and finds Cultlib meta data from a previous editing session in the clipboard, it judiciously replaces empty fields in the Cultlib meta data by the corresponding fields from the clipboard. These must usually be overwritten!

However, this treatment of the clipboard saves a lot of typing, as a large portion of the meta data of the B side is the same as the meta data on the A side.

Using the switch -n for "new" or "no clipboard" suppresses the automatic clipboard copying.
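
The clipboard behaviour can be pictured as a merge that only fills empty fields. The sketch below illustrates this reading of the description; it is not the code of pngmeta.exe, whose exact rules are not documented here. The field names and the function merge_from_clipboard are hypothetical, and the parameter use_clipboard stands in for the effect of the -n switch.

```python
def merge_from_clipboard(current: dict, clipboard: dict,
                         use_clipboard: bool = True) -> dict:
    """Fill empty fields of `current` with values from a previous session."""
    merged = dict(current)
    if not use_clipboard:  # corresponds to starting with the -n switch
        return merged
    for field, value in clipboard.items():
        if not merged.get(field):  # only empty or missing fields are replaced
            merged[field] = value
    return merged


if __name__ == "__main__":
    side_a = {"language": "de", "title": "Title of side A", "subject": "SFPD:18-000123A"}
    side_b = {"language": "", "title": "", "subject": "SFPD:18-000123B"}
    # The prefilled subject of side B is kept; language and title come from side A.
    print(merge_from_clipboard(side_b, side_a))
```

In this example the title copied from side A would still have to be overwritten by hand, which is the caveat noted above.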

5.6. Publishing the Inventory

The new photos of the record labels are published to the web site of the public domain project. This web site is in the process of revision at the time of writing. The meta data of all photos are inserted into a MySQL database. Users can enter full text search strings in order to find all objects in the inventory with matching text in the record label.
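
The kind of text search described here can be illustrated with a small stand-in. The sketch uses an in-memory SQLite database and a simple LIKE match instead of the MySQL database of the web site, whose schema and search implementation are not described in this paper. The table name labels, its columns and the function names are hypothetical.

```python
import sqlite3


def create_inventory(rows):
    """Create an in-memory table of label meta data (hypothetical schema)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE labels "
        "(shelfmark TEXT PRIMARY KEY, title TEXT, subject TEXT, description TEXT)"
    )
    connection.executemany("INSERT INTO labels VALUES (?, ?, ?, ?)", rows)
    return connection


def search_labels(connection, text):
    """Return the shelfmarks of all labels whose text contains the search string."""
    pattern = f"%{text}%"  # note: % and _ in the search string act as wildcards
    rows = connection.execute(
        "SELECT shelfmark FROM labels "
        "WHERE title LIKE ? OR subject LIKE ? OR description LIKE ? "
        "ORDER BY shelfmark",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Titles and descriptions are placeholders; the subjects are the examples from section 5.5.
    inventory = create_inventory([
        ("18-000002A", "Example title", "SFPD:18-000002A / ODEON O-6030 / xxBo 8059", ""),
        ("18-000067-03B", "Another title", "SFPD:18-000067-03B / DECCAN ERA K2121 / AR12159", ""),
    ])
    print(search_labels(inventory, "odeon"))
```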

The availability of the photos of the inventory enables the SFPD to prioritize the order of digitization of the music. Users can also "vote" for having a particular record digitized soon.

5.7. Time Estimate

If the inventory of a record takes two minutes, one can handle 30 in an hour and 150 in a five-hour day. Inventorizing 70'000 records will then take 467 days, or about two years, taking into account holidays etc.
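
The arithmetic behind this estimate can be checked with a few lines. The constants are the figures given above; the conversion of working days into years is left out because the number of working days per year is not stated.

```python
import math

MINUTES_PER_RECORD = 2
HOURS_PER_DAY = 5
TOTAL_RECORDS = 70_000

records_per_hour = 60 // MINUTES_PER_RECORD          # 30 records
records_per_day = records_per_hour * HOURS_PER_DAY   # 150 records
working_days = math.ceil(TOTAL_RECORDS / records_per_day)

if __name__ == "__main__":
    print(records_per_hour, records_per_day, working_days)
```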


References

  1. We use the word shelfmark instead of label, because the word label is already overused for record labels and barcode labels, although the SFPD shelfmark does not strictly refer to a shelf. It is, however, related to the physical storage of the record.
  2. Contrary to common belief, the Nikon "raw" files are not very raw but are already compressed, similar to the JPEG format. The only reason why we use the .NEF format is that the JPEG format produced by the camera does not record the time when the picture was taken. The conversion from .NEF to JPEG is part of the SFPD toolset and takes quite long. Whenever that appears to be unacceptable, the camera could save the pictures in the JPEG format instead.
  3. http://www.cultlib.ch/de/Langzeiterhaltung_IV.pdf
  4. https://creativecommons.org/publicdomain/zero/1.0/