
Learning patterns/Uploading bibliographical or artwork metadata using OpenRefine

From Meta, a Wikimedia project coordination wiki
A learning pattern for GLAM
Uploading bibliographical or artwork metadata using OpenRefine
Problem: How to successfully prepare, process, and batch-upload metadata to Wikidata using the tool OpenRefine.
Created on: 19:13, 8 March 2020 (UTC)

What problem does this solve?

Upload a large bibliographical dataset, such as literary works, author biodata, or artwork metadata, for integration into Wikidata, as part of a content donation from an institution or some other project. OpenRefine offers advanced features such as reconciliation, which helps avoid duplicate items. This learning pattern can be especially useful for Wikidata projects such as Sum of all Art and Sum of all Authors.

What is the solution?

Install OpenRefine

  1. Before installing, make sure that the latest version of Java (or a JDK) is installed on your computer. Download Java here: https://www.java.com/download/
  2. Download OpenRefine from https://openrefine.org/download.html and save the installer file (for example, the .dmg file on macOS).
  3. Make sure your device runs a supported operating system: OpenRefine is available for Windows, Linux, and macOS.

Create a spreadsheet to be filled in

Prepare the dataset to be uploaded in one of these formats: CSV file, web addresses (URLs), or Google Spreadsheet.

  • (Note: It is essential to correct any spelling errors in the names of the items to avoid creating duplicate items.)
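
The spelling check described in the note can be partly automated before the file is loaded into OpenRefine. The following is a minimal sketch, not part of OpenRefine itself: the function names, the column name `author`, and the sample rows are all assumptions made for illustration. It only catches values that differ in letter case or spacing; genuine misspellings still need a manual check.

```python
import csv
import io
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Return a comparison key: Unicode-normalized, single-spaced, case-folded."""
    text = unicodedata.normalize("NFKC", name)
    return " ".join(text.split()).casefold()


def find_spelling_variants(rows, column):
    """Group values of `column` that differ only in case or spacing."""
    groups = defaultdict(set)
    for row in rows:
        value = row[column]
        groups[normalize_name(value)].add(value)
    # Keep only the keys that were written in more than one way.
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """author,title
Rabindranath Tagore,Gitanjali
rabindranath  tagore,Gora
Amrita Pritam,Pinjar
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
variants = find_spelling_variants(rows, "author")
for key, spellings in variants.items():
    print(key, "->", spellings)
```

Each group printed by the loop is a set of spellings that would probably be treated as different names during reconciliation, so it is worth unifying them in the spreadsheet first.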

Getting Started

Pattypan - selection of description fields

  1. Open OpenRefine.
  2. Click Choose file and navigate to a file on your computer, enter the link of a website or Google Drive file containing the dataset you would like to upload, or simply paste the dataset from the clipboard.
  3. Click Begin.
  4. On the top right, give the project a name and click Start.
  5. Make sure you are logged in with your username.


OpenRefine Reconcile Matching Wikidata Identifiers

In OpenRefine terminology, reconciliation is the process of linking free-text tabular cells to identifiers in knowledge bases. OpenRefine's built-in reconciliation capabilities make it a versatile tool to reconcile tabular data to a wide range of databases, including Wikidata. Use multiple columns in your dataset and match them against values of properties in Wikidata, which refines the reconciliation score and acts as a tiebreaker between namesakes.
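
To make the idea of matching on several columns more concrete, here is a rough sketch of the kind of query a reconciliation client sends for one row. OpenRefine builds these queries for you, so you never need to write them by hand. The helper name `build_reconciliation_query` is hypothetical, and the field names (`query`, `type`, `properties`, `pid`, `v`) are written from the general shape of reconciliation service queries, so treat them as an assumption and check the service documentation before relying on them.

```python
import json


def build_reconciliation_query(name, type_id=None, extra_properties=None):
    """Build one reconciliation query for a cell value (illustrative only)."""
    query = {"query": name}
    if type_id:
        # Restrict candidates to one type, e.g. Q5 (human).
        query["type"] = type_id
    if extra_properties:
        # Values from other columns refine the score and help tell namesakes apart.
        query["properties"] = [
            {"pid": pid, "v": value} for pid, value in extra_properties.items()
        ]
    return query


# One query per row; "q0" is just a label for the first row.
queries = {
    "q0": build_reconciliation_query(
        "Rabindranath Tagore",
        type_id="Q5",                      # human
        extra_properties={"P569": "1861"},  # date of birth, taken from another column
    ),
}
payload = json.dumps(queries, indent=2)
print(payload)
```

The more supporting columns you supply, the easier it is for the service to rank the right candidate first.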

  1. Click the arrow next to the column name.
  2. Click Reconcile.
  3. Click Start reconciling.
  4. Select Wikidata (en). Alternatively, you can install a version of the Wikidata reconciliation service for your language. Open the reconciliation dialog and click Add Standard Service. The URL is https://tools.wmflabs.org/openrefine-wikidata/pa/api where "pa" is replaced by your language code. When reconciling using this interface, items and properties will be displayed in your language if a translation is available.
  5. Select the type to reconcile against, so that each cell is matched to an entity of that type (for example: human, written work, or location), and/or choose the appropriate option: Reconcile against type, Reconcile against no particular type, or Auto-match candidates with high confidence.
  6. The Wikidata reconciliation service processes about 3 rows per second, so the waiting time depends on the amount of data. When the process finishes, you will see the reconciliation data in the cells.
  7. In the reconciled column, each cell shows one of two outcomes. If the cell was successfully matched, it displays a single dark blue link. Otherwise, a few candidates are displayed with light blue links, together with their reconciliation scores, and you need to pick the correct one manually. For each matching decision you make, you have two options: match this cell only, or also use the same identifier for all other cells containing the same unreconciled value. You can also use Search for match to find the correct Wikidata identifier; if no item matches your case, click Create new item.
  8. Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata, creating other columns in your dataset. If there are multiple claims for a given property, the values will be grouped as records in OpenRefine: they are stored in additional rows where the original reconciled column is blank. OpenRefine's record mode might therefore be more suitable for the later transformations you want to carry out on your table.
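
The record layout described in the last step can be pictured with a small sketch. This is not OpenRefine code; it only imitates, under simplifying assumptions, how rows with a blank value in the reconciled column are attached to the row above them. The column names and sample values are hypothetical.

```python
def group_into_records(rows, key_column):
    """Attach rows whose key column is blank to the preceding record."""
    records = []
    for row in rows:
        if row.get(key_column):
            # A non-blank key starts a new record.
            records.append([row])
        elif records:
            # A blank key continues the current record.
            records[-1].append(row)
        else:
            raise ValueError("the first row must have a value in the key column")
    return records


# Hypothetical table after pulling a multi-valued property from Wikidata.
rows = [
    {"author": "Author A", "occupation": "poet"},
    {"author": "", "occupation": "novelist"},
    {"author": "Author B", "occupation": "painter"},
]

records = group_into_records(rows, "author")
for record in records:
    name = record[0]["author"]
    occupations = [row["occupation"] for row in record]
    print(name, occupations)
```

Here three rows become two records, which is why transformations applied in row mode and in record mode can give different results.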

Dataset Augmentation with Schema
Dataset Schema
New issues show the problems in the Schema or Records automatically
Preview of the Wikidata Items

  1. Click Wikidata on the top right of the screen.
  2. Click Export Schema.
  3. The schema page opens. Add the required items relevant to your dataset.
  4. Any problems are listed automatically in the Issues tab. Check the issues and make changes accordingly in the records and the schema.
  5. Click Save Schema.
  6. Use the Preview tab before uploading to see how the dataset will appear as Wikidata edits and to inspect them manually.
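
Conceptually, a schema is a template that is applied to every row to produce edits, and the issues and preview are by-products of applying it. The sketch below illustrates that idea only; it is not OpenRefine's schema format. The dictionary layout, the column names, and the placeholder identifiers `Q_EXAMPLE_AUTHOR` and `Q_EXAMPLE_WORK` are assumptions, while P31 (instance of), P50 (author), and Q47461344 (assumed here to be "written work") are used as example values.

```python
# Hypothetical schema: one item per row, with one fixed and one column-based statement.
SCHEMA = {
    "item_column": "title_qid",
    "statements": [
        {"property": "P31", "value": "Q47461344"},   # instance of: written work (assumed)
        {"property": "P50", "column": "author_qid"},  # author, read from a reconciled column
    ],
}


def preview_edits(rows, schema):
    """Apply the schema to each row and return (edits, issues)."""
    edits, issues = [], []
    for index, row in enumerate(rows):
        # A blank identifier means the row would create a new item.
        subject = row.get(schema["item_column"]) or "NEW"
        for statement in schema["statements"]:
            value = statement.get("value") or row.get(statement["column"])
            if not value:
                issues.append(f"row {index}: no value for {statement['property']}")
                continue
            edits.append((subject, statement["property"], value))
    return edits, issues


# Placeholder rows; the identifiers are not real Wikidata items.
rows = [
    {"title": "Example Book", "title_qid": "", "author_qid": "Q_EXAMPLE_AUTHOR"},
    {"title": "Untitled", "title_qid": "Q_EXAMPLE_WORK", "author_qid": ""},
]

edits, issues = preview_edits(rows, SCHEMA)
for edit in edits:
    print("edit:", edit)
for issue in issues:
    print("issue:", issue)
```

In this sketch the second row has no reconciled author, so it produces an issue instead of a statement, which mirrors why the issues should be reviewed before uploading.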

Uploading to Wikidata

OpenRefine export to Wikidata

  1. Click Wikidata on the top right of the screen.
  2. Click Upload edits to Wikidata.
  3. A dialogue box appears with your dataset. Briefly describe the edits in the edit summary.
  4. Click Upload.

Things to consider

  1. Installation: Make sure that the latest version of Java is installed on your computer. Download Java here: https://www.java.com/download/.
  2. Data quality: Before creating new items, check for pre-existing items on Wikidata. Names may differ in spelling or case, so existing items may not show up during the reconciliation process. You can use Search for match to find the correct Wikidata identifier, or proactively find those identifiers and map them to Wikidata items to prevent the creation of duplicate items.
  3. After you finish working on the schema, analyze and fix any automatically raised issues before exporting to Wikidata. This avoids errors in the uploading process and reduces the chance that important information relevant to the items is missing from the exported dataset.
  4. This learning pattern is useful when simple datasets, such as a bibliographical database or author biodata, need to be mass uploaded and integrated into Wikidata. For other types of datasets, you may wish to consult: https://github.com/OpenRefine/OpenRefine/wiki/User-Guide.

When to use

This tutorial is useful when simple datasets, such as a bibliographical database or author biodata, need to be mass uploaded and integrated into Wikidata.

  • When uploading a batch of data to Wikidata (e.g. Artist, Photographer, Institution, License), and/or
  • When flexibility and control over descriptions is needed.
  • For GLAM projects: in order to create or mass import/export metadata to Wikidata.
  • The software can be downloaded for Windows, Linux, and Mac.


See also

Related patterns