WikiCite 2016/Report/Group 2

From Meta, a Wikimedia project coordination wiki

Group 2: Reference extraction and metadata lookup tools

Room 121, 4:00 - 6:00 pm • Etherpad: Room 212


Design or improve tools to extract identifiers and bibliographic data from Wikipedia citation templates, and to look up and retrieve the corresponding metadata.


  1. Aaron Halfaker (Wikimedia Research)
  2. Antonin Delpeuch (Dissemin)
  3. Cristian Consonni (Wikimedia Italia, Università degli Studi di Trento (University of Trento))
  4. Finn Årup Nielsen (Danmarks Tekniske Universitet (Technical University of Denmark))
  5. Jake Orlowitz (Ocaasi) (The Wikipedia Library)
  6. Jon Tennant (Imperial College London, ScienceOpen)
  7. Marin Dacos (CNRS - OpenEdition Lab)
  8. Patrice Bellot (Aix-Marseille Université - CNRS - LSIS / OpenEdition Lab)
  9. Philipp Zumstein (Universitätsbibliothek Mannheim (Mannheim University Library))
  10. Scott Chamberlain (rOpenSci)
  11. Sebastian Karcher (Qualitative Data Repository / Zotero, Citation Style Language (CSL))



There are several ways to bring bibliographic metadata and reference data into Wikidata, and they deserve further exploration. Tools are needed that make it easy to extract and save this data. Both manual editing workflows and batch editing workflows were considered.
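As a rough illustration of the extraction step, the following sketch pulls a few identifier parameters out of citation template wikitext with a regular expression. The function name `extract_identifiers`, the chosen parameter names (`doi`, `isbn`, `pmid`) and the sample template are assumptions made for this example, not tools produced by the group, and a regular expression only approximates real template parsing.

```python
import re

# Matches template parameters such as "| doi = 10.1000/182" or "|isbn=...".
# Assumption: identifiers appear as plain named parameters without nested templates.
PARAM_RE = re.compile(r"\|\s*(doi|isbn|pmid)\s*=\s*([^|}]+)", re.IGNORECASE)


def extract_identifiers(wikitext):
    """Return (parameter name, value) pairs found in citation templates."""
    found = []
    for name, value in PARAM_RE.findall(wikitext):
        value = value.strip()
        if value:  # skip parameters that are present but left empty
            found.append((name.lower(), value))
    return found


sample = "{{cite journal |title=Example |doi=10.1000/182 |pmid = 12345 }}"
print(extract_identifiers(sample))
```

The extracted pairs could then be passed to a lookup service or queued for a batch edit; both steps are outside the scope of this sketch.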


  • Existing tools to extract bibliographic metadata and references should be improved further and tailored more closely to Wikidata.
  • Further technical discussions, especially between people involved in Zotero translators and people involved in Citoid/WikiCite, seem fruitful for both sides.
  • ...


Several discussions and strands of work took place in parallel:

  • Lookup tools for common identifiers such as ISBN and DOI are essential. Such lookup services should return good-quality data and respond quickly, so that whole data dumps can also be processed.
  • Citoid-Wikidata integration: Wikipedia can already import any reference when a user simply types its URL into a form, with the help of Citoid and the Zotero translators. This capability was studied and extended to Wikidata. During the discussion it also became clear that rich bibliographic data in Wikidata requires improving some Zotero translators first, for example by adding more identifiers (e.g. OCLC ID), so that these can be saved in Wikidata.
  • The OABOT, which can add links to available free-to-read versions of papers to citations, was improved in several ways.
  • A library and command-line tool for extracting wikilinks from XML Wikipedia dump history files was considered.
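
The last item can be sketched as follows, assuming a dump in the MediaWiki XML export format with `page`, `title`, `revision`, `id` and `text` elements. This is a minimal illustration rather than the tool discussed by the group: the names `iter_revision_links` and `WIKILINK_RE` and the inline sample are hypothetical, and the regular expression handles only simple `[[Target]]` and `[[Target|label]]` links.

```python
import io
import re
import xml.etree.ElementTree as ET

# [[Target]] or [[Target|label]]; section anchors after "#" are dropped.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def local_name(tag):
    """Strip the XML namespace that the export format puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_links(stream):
    """Yield (page title, revision id, [link targets]) for each revision."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id = None
            text = ""
            for child in elem:  # direct children only, so contributor ids are ignored
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = child.text
                elif child_name == "text":
                    text = child.text or ""
            links = [target.strip() for target in WIKILINK_RE.findall(text)]
            yield title, rev_id, links
            elem.clear()  # release the revision text; history dumps are large


# Tiny hand-written sample standing in for a real history dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <id>1</id>
    <revision>
      <id>100</id>
      <text>See [[Zotero]] and [[Digital object identifier|DOI]].</text>
    </revision>
    <revision>
      <id>101</id>
      <text>See [[Zotero]] and [[Wikidata#History|history]].</text>
    </revision>
  </page>
</mediawiki>"""

for title, rev_id, links in iter_revision_links(io.BytesIO(SAMPLE)):
    print(title, rev_id, links)
```

Streaming with `iterparse` and clearing each revision after use keeps memory roughly constant, which matters because full history dumps do not fit in memory. A dedicated wikitext parser would be more robust than the regular expression for nested links and templates.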

Appendix: workgroup notes

Raw notes from group 2.