WikiCite 2016

For related legacy proposals, see Wikicite and WikiScholar.
WikiCite 2016 presentation

WikiCite 2016 was an event focused on designing data models and technology to improve the coverage, quality, standards-compliance and machine-readability of citations and source metadata in Wikipedia, Wikidata and other Wikimedia projects. Our goal in particular was to define a technical roadmap for building a repository of sources cited across Wikimedia projects in Wikidata.

About

Building a central repository of citations in Wikidata

There is currently a lot of momentum around citations and bibliographic metadata at the Wikimedia Foundation, in the movement, and across a number of open science, library tech, and open access organizations. The idea of building a repository to store all citations and source metadata across Wikimedia projects has been proposed in different forms for the past 10 years. Wikipedia is currently one of the most popular entry points into the scientific literature and ranks among the top 5 referrers of scholarly citations. However, as of today, references and source metadata are still second-class citizens in Wikimedia projects. References are the most fundamental building block of open knowledge, but:

  • they are still served by fragile mechanisms such as citation templates;
  • they are inconsistently represented across (and sometimes within) articles, languages and Wikimedia projects;
  • the data they store is curated in multiple places and often ill-formed, incomplete and not machine-readable;
  • references often fail to use identifiers such as DOIs, PubMed IDs, or ISBNs, even when such identifiers are available for the cited source.

Reusing sources across articles, languages or projects is still a complex task, and conducting research on the use of references and citations in Wikimedia projects requires sophisticated information extraction skills. In addition, an overwhelming number of statements in Wikidata are currently either not sourced at all or generically sourced to a Wikimedia site rather than to specific references.

Initiatives such as the Wikipedia Library have focused primarily on the outreach and programmatic angle, while efforts focused on infrastructure in Wikimedia projects (like WikiProject Source Metadata and WikiProject Open Access) have not been strategically aligned across all parties involved. We believe the time is ripe for aligning and innovating around the major efforts to develop the necessary infrastructure, data models, automated tools and user interfaces, in order to build a central repository of citations in Wikidata and support high-quality sourcing of free knowledge.

A timeline of previous efforts

  • Seminal work towards the design of a central repository of citations leveraging Wikidata started at the 2012 Wikimedia Hackathon in Berlin.
  • These efforts continued through a series of "citathons" hosted over the years at:
    • 2014: Wikimania London
    • 2014: Rich citations hackathon at the Public Library of Science
    • 2015: Wikimania Mexico
    • 2016: Wikimedia Developer Summit
  • In 2015, a Wikibase test instance called LibraryBase with a dedicated SPARQL endpoint was set up on Wikimedia Labs.
  • For a history of related initiatives predating Wikidata, see this page.

The event

Goals

Building the sum of all human citations (slides)

The event will be focused on the short-term goal of defining a roadmap and the technical requirements for developing a central Wikidata repository for references and bibliographic metadata. This will facilitate research on references across Wikimedia projects and provide much needed tools to support the sourcing work of volunteers and movement affiliates contributing open content to Commons and Wikidata.

We will also discuss a long-term vision aiming to facilitate the integration of citation and bibliographic metadata with other scholarly and linked data repositories. You can learn more about these goals from a joint presentation we gave in December at WMF in partnership with Crossref.

WikiCite is hosted by the Wikimedia Foundation and is subject to our friendly space policy, which we expect all participants to acknowledge.

Proposals

We're inviting WikiCite attendees to submit a proposal ahead of the event. Let us know what you're planning to work on by noon Pacific Time on Friday, May 20, and we'll make sure to cover your pitch in the intro session. You can browse a list of proposals for the event or submit a new one.

Participants

We are bringing together Wikidatans, Wikipedians, developers, data modelers, open access publishers, and information and library science experts from various organizations, as well as academic researchers from groups with experience working with Wikipedia's citations and bibliographic (linked open) data in general.

Participant list

Please add your name here after filling out the EventBrite registration form (you'll receive a link from the organizers).

  1. Thomas Arrow (ContentMine)
  2. Adam Becker (Open Journal, Freelance Astrophysicist)
  3. Patrice Bellot (Aix-Marseille Université - CNRS - LSIS / OpenEdition Lab)
  4. Terry Catapano (Plazi Verein / Columbia University Libraries)
  5. Scott Chamberlain (rOpenSci)
  6. Cristian Consonni (Wikimedia Italia, Università degli Studi di Trento (University of Trento))
  7. Karen Coyle (KarenCoyle.net)
  8. Marin Dacos (CNRS - OpenEdition Lab)
  9. Antonin Delpeuch (Dissemin)
  10. Eamon Duede (Knowledge Lab @ University of Chicago)
  11. Jonathan Dugan (organizer)
  12. Katie Filbert (Wikimedia Deutschland, Wikidata)
  13. Konrad Förstner (Universität Würzburg (University of Wurzburg))
  14. Marco Fossati (Fondazione Bruno Kessler (FBK))
  15. Susanna Giaccai (Wikimedia Italia)
  16. Aaron Halfaker (Wikimedia Research)
  17. James Hare (WikiProject X, Wikimedia DC)
  18. Lambert Heller (Technische Informationsbibliothek (TIB) (German National Library of Science and Technology))
  19. Erika Herzog (Wikimedia New York City)
  20. Markus Kaindl (Springer Nature)
  21. Alex Kalderimis (RefMe)
  22. Sebastian Karcher (Qualitative Data Repository / Zotero, Citation Style Language (CSL))
  23. John Kaye (Jisc)
  24. Chris Keene (Jisc)
  25. Daniel Kinzler (Wikimedia Deutschland, Wikidata)
  26. Jonas Kress (Wikimedia Deutschland)
  27. Nettie Lagace (National Information Standards Organization (NISO))
  28. Rachael Lammey (Crossref)
  29. Mairelys Lemus-Rojas (University of Miami Libraries)
  30. Luca Martinelli (Wikimedia Italia)
  31. Daniel Mietchen (National Institutes of Health (NIH), organizer)
  32. Jens Nauber (Die Sächsische Landesbibliothek – Staats- und Universitätsbibliothek Dresden (SLUB) (Saxon State and University Library Dresden (SLUB)))
  33. Finn Årup Nielsen (Danmarks Tekniske Universitet (Technical University of Denmark))
  34. Jake Orlowitz (Ocaasi) (The Wikipedia Library)
  35. Lydia Pintscher (Wikimedia Deutschland, Wikidata, organizer)
  36. Merrilee Proffitt (OCLC Research)
  37. Laura Rueda (DataCite)
  38. Diego Sáez-Trumper (Eurecat)
  39. Sébastien Santoro
  40. Till Sauerwein (Universität Würzburg (University of Wurzburg))
  41. Tobias Schönberg (Wikidata)
  42. Elizabeth Seiver (Public Library of Science (PLOS))
  43. Adam Shorland (Wikimedia Deutschland, Wikidata)
  44. Mike Showalter (OCLC)
  45. Chiara Storti (Wikimedia Italia, Rete bibliotecaria di Romagna e San Marino)
  46. Dario Taraborelli (Wikimedia Research, organizer)
  47. Jon Tennant (Imperial College London, ScienceOpen)
  48. Katherine Thornton (University of Washington)
  49. Marielle Volz (Wikimedia Foundation) (attending remotely)
  50. Andra Waagmeester (Micelio)
  51. Joe Wass (Crossref)
  52. Chris Wilkinson (eLife Sciences)
  53. Andrea Zanni (Wikisource) / Aubrey
  54. Jan Zerebecki (Wikimedia Deutschland)
  55. Philipp Zumstein (Universitätsbibliothek Mannheim (Mannheim University Library))

How to apply

Due to the limited capacity of the venue, attendance is restricted to registered participants. Applications are now closed.

  • if you were pre-invited and have already filled in a form, you will receive a separate note with a registration link from the organizers
  • if you have not been invited but you would like to participate, please submit an application to give us some information about you and your interest and expected contribution to the event. We'll send out notifications of acceptance by April 15.

Important dates

March 29, 2016
applications open
April 11, 2016
applications close
April 15, 2016
notifications of acceptance are issued (if you applied for a travel grant, we'll be able to confirm by this date if we can cover the costs of your trip)
May 20, 2016
deadline for submitting a proposal describing what you plan to work on
May 25-26, 2016
event takes place

Program

The focus of WikiCite will be on dialogue, not monologues, and on hands-on activities aimed at improving or establishing workflows or learning by doing. The preliminary agenda of the event is below. Breakfast and lunch will be served at the venue, while dinner will be hosted at a separate restaurant still to be announced. The venue will remain open late on Wednesday night to allow people to continue hacking.

Day 1

Wednesday May 25, 2016

9.00-9.15
Welcome / breakfast
9.15-10.00
Current state of the project (Dario)
A short introduction to Wikidata (Lydia)
10.00-12.00
Work session 1: intros, idea generation, first pitches for breakout groups
12.00-13.00
Lunch break
13.00-15.30
Work session 2: scope the work, formulate goals, define what each group expects to accomplish by the end of the event
15.30-16.00
Break
16.00-18.00
Work session 3
18.00-18.30
Informal check-in, group report-backs
18.30-19.30
Break before dinner
(attendees are welcome to return to the venue after dinner; the conference rooms will stay open for late work from 8 PM on)
19.30 - ...
Dinner
Where: Hofbräuhaus Berlin
Address: Karl-Liebknecht-Strasse 30 10178 Berlin (map)

Day 2

Thursday May 26, 2016

9.00-9.30
Breakfast, lightning presentations, short progress reports
9.30-12.00
Work Session 4
12.00-13.00
Lunch
13.00-15.00
Work Session 5
15.00-15.30
Check-in, group report-backs
15.30-16.00
Break
16.00-18.00
Work Session 6
18.00-18.30
Re-group, progress reports, wrap up
18.30-20.30
Dinner
Where: Café Krone
Address: Oderberger Strasse 38, 10435 Berlin (map)

Venue

GLS Campus Berlin, a few days before the event.
GLS Campus Berlin
address: Kastanienallee 82, 10435 Berlin Prenzlauer Berg (map)
phone: +49 (030) 780 089 550
email: info@gls-campus-berlin.de

Organizing committee

  • Dario Taraborelli
  • Jonathan Dugan
  • Lydia Pintscher
  • Daniel Mietchen
  • Cameron Neylon

You can contact the organizers via wikicite@wikimedia.org.

Funding

WikiCite is cohosted by the Wikimedia Foundation and Wikimedia Deutschland. It is generously supported by Crossref, the Gordon and Betty Moore Foundation, and the Alfred P. Sloan Foundation. Funding to cover the cost of the event has been approved by the Wikimedia Foundation Board of Trustees.

Outcomes

Report

We're drafting a complete report from the event. Meanwhile, you can browse the notes from each workgroup below.

Main workgroups

Group 1: Modeling bibliographic source metadata

Discuss and draft data models to represent different types of sources as Wikidata items
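
As an illustration of what such a data model might look like, here is a minimal sketch of a journal article expressed as Wikidata-style property/value pairs. It is not the model the workgroup produced. The `ScholarlyArticle` class and its `to_claims` method are hypothetical names introduced for this example. The property and item IDs used (P31 "instance of", Q13442814 "scholarly article", P1476 "title", P577 "publication date", P356 "DOI", P698 "PubMed ID") are existing Wikidata identifiers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ScholarlyArticle:
    """Hypothetical, simplified representation of a cited source."""
    title: str
    publication_date: str  # ISO 8601 date string, e.g. "2016-05-25"
    doi: Optional[str] = None
    pmid: Optional[str] = None

    def to_claims(self) -> Dict[str, str]:
        """Map the fields to Wikidata property IDs (flat, illustrative only)."""
        claims = {
            "P31": "Q13442814",            # instance of: scholarly article
            "P1476": self.title,           # title
            "P577": self.publication_date  # publication date
        }
        # Identifiers are optional: only emit them when present.
        if self.doi:
            claims["P356"] = self.doi      # DOI
        if self.pmid:
            claims["P698"] = self.pmid     # PubMed ID
        return claims


article = ScholarlyArticle(
    title="An example article",
    publication_date="2016-05-25",
    doi="10.1000/xyz123",
)
claims = article.to_claims()
```

A real Wikidata item would also carry labels, qualifiers, and references on each statement; the flat dictionary above deliberately leaves those out.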

Group 2: Reference extraction and metadata lookup tools

Design or improve tools to extract identifiers and bibliographic data from Wikipedia citation templates, and to look up and retrieve the corresponding metadata
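
To make the extraction step concrete, the following is a minimal sketch of pulling DOI, PubMed ID, and ISBN values out of citation templates in wikitext using regular expressions. It is not one of the tools discussed at the event. The function name `extract_identifiers` and the sample wikitext are assumptions made for illustration, and the approach assumes templates whose names start with "cite" and that contain no nested templates.

```python
import re
from typing import Dict, List

# Matches a flat {{cite ...}} template (no nested templates inside it).
TEMPLATE_PATTERN = re.compile(r"\{\{\s*cite[^{}]*\}\}", re.IGNORECASE)

# Matches "|doi=...", "|pmid=..." or "|isbn=..." parameters inside a template.
PARAM_PATTERN = re.compile(r"\|\s*(doi|pmid|isbn)\s*=\s*([^|}]+)", re.IGNORECASE)


def extract_identifiers(wikitext: str) -> List[Dict[str, str]]:
    """Return one dict of identifiers per citation template that has any."""
    results = []
    for template in TEMPLATE_PATTERN.findall(wikitext):
        ids = {
            name.lower(): value.strip()
            for name, value in PARAM_PATTERN.findall(template)
        }
        if ids:
            results.append(ids)
    return results


sample = (
    "Text.<ref>{{cite journal |title=Example |doi=10.1000/xyz123 "
    "|pmid=12345678}}</ref> More text.<ref>{{cite book |title=B "
    "|isbn=978-3-16-148410-0}}</ref>"
)
found = extract_identifiers(sample)
```

A regular expression is enough for a sketch like this, but production tools would more likely rely on a proper wikitext parser, since real citation templates can nest other templates and use aliased parameter names.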

Group 3: Representing citations and citation events

Discuss how to express the citation of a source in a Wikimedia artifact (such as a Wikipedia article, a Wikidata statement, etc.) and review alternative ways to represent it

Group 4: (Semi-)automated ways to add references to Wikidata statements

Improve tools for semi-automated statement and reference creation (StrepHit, ContentMine)

Group 5: Use cases for source-related queries

Identify use cases for SPARQL queries involving source metadata. Obtain a small, openly licensed bibliographic and citation graph dataset to build a proof of concept of the querying and visualization potential of source metadata in Wikidata. Includes work on Zika virus.

Additional workgroups

Group 6: Wikidata as the central hub for license information about databases

Add license information to Wikidata to make it the central hub for license information about databases

Group 7: Using citations and bibliographic source metadata

Merge groups working on citation structure and source metadata models and integrate their recommendations

Group 8: Citoid-Wikidata integration

Extend Citoid to write source metadata into Wikidata
