WikiCite 2016/Report/Group 4


Group 4: (Semi-)automated ways to add references to Wikidata statements

Room 124, 4:00 - 6:00 pm • Etherpad: Room 124

Goal

Improve tools for semi-automated statement and reference creation (e.g. StrepHit, ContentMine)

Participants

  1. Adam Shorland (Wikimedia Deutschland, Wikidata), Thursday
  2. Alex Kalderimis (RefMe), Wednesday
  3. Marco Fossati (Fondazione Bruno Kessler (FBK)), both days
  4. Scott Chamberlain (rOpenSci), Wednesday
  5. Thomas Arrow (ContentMine), Thursday
  6. Till Sauerwein (Universität Würzburg (University of Wurzburg)), both days

Summary

This work group had very specific goals that were met. The report is included in wikidata:Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements.

Introduction

Quoting from the StrepHit project page:

The trustworthiness of Wikidata assertions plays the most crucial role in delivering a high-quality, reliable Knowledge Base: in order to assess their truth, assertions should be validated against third-party resources, and few efforts have been carried out under this perspective. One form of validation can be achieved via references to external (i.e., non-wiki), authoritative sources. This has motivated the development of the primary sources tool: it will serve as a platform for users to either accept or reject new references and/or assertions coming from third-party datasets. We argue that there is a need for datasets which guarantee at least one reference for each assertion.

Recommendations

The StrepHit and ContentMine teams have a common vision for Wikidata. If we manage to join forces through the support of the Wikimedia Grants program (both StrepHit and ContentMine have proposals for project grants, cf. #Resources), we will produce protocols for Wikimedia that leverage semi-automated approaches to extract facts from reliable Web sources.

Discussion

The discussion focused on the usability of the primary sources tool, a platform for data curation in Wikidata. The outcomes can be browsed at wikidata:Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements. Previous discussion of the tool is available at wikidata:Wikidata_talk:Primary_sources_tool.

Resources

Appendix: workgroup notes

Raw notes from group 4.