Notes and links

See also: StrepHit 1.0 Beta Release

Goal

It would be great if you could add a statement of interest about ContentMine's potential data donation via the primary sources tool here (feel free to add a new section, of course): https://meta.wikimedia.org/wiki/Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Timeline

Instructions to upload a dataset to the primary sources tool:

1. Format your data in the QuickStatements syntax; documentation is at http://tools.wmflabs.org/wikidata-todo/quick_statements.php
2. Ping me for an API access token
3. Upload the dataset through the following API endpoint

As an alternative to points 2 and 3, you can just give the dataset to Hjfocs, who will upload it directly.

Data modeling, i.e., how to go from ContentMine extraction results to the QuickStatements dataset.

Each statement is composed of:

A. subject = given the extracted named entity, look up the subject Wikidata Item ID via

A.1. SPARQL

B. property = d:property:P248 'stated in'

C. value = item ID of the source, e.g., d:Q229883 for PubMed Central

D. reference URL = d:property:P854
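
The mapping in A to D could be sketched roughly as follows. This is only an illustration under several assumptions: the lookup in A.1 is assumed to go through the public Wikidata Query Service at https://query.wikidata.org/sparql with an exact label match, the row is assumed to follow the tab-separated QuickStatements syntax in which reference properties are written with an S prefix (S854 rather than P854), and the function names, the sample response, and the item ID Q4115189 are placeholders made up for the example. Please check the row format against the QuickStatements documentation linked above before relying on it.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service is used for the lookup in A.1.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_lookup_url(label, lang="en"):
    """Build a GET URL for a SPARQL query matching items by exact label."""
    # Escape backslashes and double quotes so the label is a valid SPARQL literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        'SELECT ?item WHERE { ?item rdfs:label "%s"@%s } LIMIT 5' % (escaped, lang)
    )
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def item_ids(sparql_json):
    """Extract item IDs (e.g. 'Q42') from a SPARQL JSON results document."""
    data = json.loads(sparql_json)
    # Each binding holds the full entity URI; the ID is its last path segment.
    return [
        binding["item"]["value"].rsplit("/", 1)[-1]
        for binding in data["results"]["bindings"]
    ]


def quickstatements_row(subject_id, source_item_id, reference_url):
    """Format one statement: subject, P248 'stated in', source item, S854 URL."""
    # String values such as URLs are wrapped in double quotes.
    return "\t".join(
        [subject_id, "P248", source_item_id, "S854", '"%s"' % reference_url]
    )


# Hypothetical response for one extracted entity; no network call is made here.
SAMPLE_RESPONSE = json.dumps(
    {
        "results": {
            "bindings": [
                {
                    "item": {
                        "type": "uri",
                        "value": "http://www.wikidata.org/entity/Q4115189",
                    }
                }
            ]
        }
    }
)

subject = item_ids(SAMPLE_RESPONSE)[0]
# Q229883 is the PubMed Central item from C; the URL is a placeholder.
print(quickstatements_row(subject, "Q229883", "https://example.org/article"))
```

An exact label match will often return several candidates or none, so in practice the lookup result would probably need disambiguation before a row is emitted.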

Side notes

A better property than 'stated in' would be 'mentioned in', but that proposal has been rejected: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Archive/45#mentioned_in

Adam: references collected from Microdata
- especially for movies
- Google Custom Search for specific microformats (cf. Sindice)