
September 2016

This is a newsletter (cross-posted to the wikicite-discuss and wikidata lists) with major updates on WikiCite-related initiatives.


All English Wikipedia references citing PMCIDs

Thanks to Daniel Mietchen, Wikidata items have been created for all references in English Wikipedia that carry a PubMed Central identifier (P932), based on a dataset produced by Aaron Halfaker using the mwcites library. As of today, over 110,000 items use this property.
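The mwcites library finds identifiers in wikitext with regular expressions. A minimal sketch of the PMCID case (the pattern below is illustrative, not mwcites' actual regex):

```python
import re

# Hypothetical pattern: PMCIDs appear in wikitext either as a template
# parameter like {{cite journal | pmc=1234567 }} or as a bare "PMC1234567".
PMCID_RE = re.compile(r"\bpmc\s*=?\s*(?:PMC)?(\d+)", re.IGNORECASE)

def extract_pmcids(wikitext: str) -> list[str]:
    """Return all PMCIDs found in a page's wikitext, normalized to 'PMC<digits>'."""
    return ["PMC" + m.group(1) for m in PMCID_RE.finditer(wikitext)]
```

Running such an extractor over a full English Wikipedia dump is what yields a dataset of pages and the PMCIDs they cite.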

Half a million citations using P2860

James Hare has been working on importing open access review papers published in the last five years, along with their citation graph. These review papers are critical to Wikimedia projects as sources of citations for Wikipedia articles and statements, and their contents are openly licensed, which will allow semi-automated statement extraction via text and data mining. As part of this project, the cites property (P2860), created during WikiCite 2016, has been used in over half a million statements representing citations from one paper to another. While this is a tiny fraction of the entire citation graph, it's a great way of making data available to Wikimedia volunteers for cross-linking statements, sources and the works they cite.
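These statements can be inspected through the Wikidata Query Service. A minimal Python sketch that builds such a request (only the property ID comes from the text; the query shape and helper are ours):

```python
from urllib.parse import urlencode

# Wikidata's public SPARQL endpoint.
WDQS = "https://query.wikidata.org/sparql"

# Count "cites" (P2860) statements: each one links a citing work to a cited work.
COUNT_CITES = """
SELECT (COUNT(*) AS ?statements) WHERE {
  ?citing wdt:P2860 ?cited .
}
"""

def sparql_url(query: str) -> str:
    """Build a GET request URL for the endpoint, asking for JSON results."""
    return WDQS + "?" + urlencode({"query": query, "format": "json"})
```

Fetching `sparql_url(COUNT_CITES)` with any HTTP client returns the current statement count; at the time of this newsletter it was over half a million.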

New properties

The Crossref funder ID property (P3153) can now be used to identify funders, which can in turn be linked to particular works (when available) via the sponsor property (P859). This will allow novel analyses of the sources used for Wikidata statements as a function of their funders.
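Joining the two properties makes the funder-to-work link queryable. A SPARQL sketch (the query shape is ours; only the property IDs come from the text):

```python
# SPARQL sketch: works linked via sponsor (P859) to funders that carry a
# Crossref funder ID (P3153). Illustrative only, not from the newsletter.
FUNDED_WORKS = """
SELECT ?work ?funder ?funderId WHERE {
  ?work wdt:P859 ?funder .
  ?funder wdt:P3153 ?funderId .
}
LIMIT 100
"""

# The query can be sent to https://query.wikidata.org/sparql with any HTTP client.
```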

The uses property property (P3176), which Finn Årup Nielsen conveniently dubbed the "selfie property", can now be used to identify works that mention specific Wikidata properties. The list of articles and papers using it keeps growing.

The OpenCitations bibliographic resource ID property (P3181) can be used to specify the bibliographic resource identifier for any publication in WikiCite that is also included in the OpenCitations Corpus.


WikiCite 2017

Our original plan to host the 2017 edition of WikiCite in San Francisco in the week of January 16, 2017 (right after the Wikimedia Developer Summit) fell through due to a major, Salesforce-style conference taking place that week, which will bring tens of thousands of delegates to the city. The WMF Travel team has blacklisted that week for hosting events or meetings in San Francisco, since hotel rates will go through the roof. We are now looking at alternative locations and dates in FY-Q4 (April-June 2017) in Europe: most likely Berlin (like this year), or piggybacking on the 2017 Wikimedia Hackathon in Vienna (May 19-21, 2017), which would give us access to a large number of volunteers as well as WMF and WMDE developers.


WikiCite 2016 Report

A draft report from WikiCite 2016 is available on Meta. It will be finalized in the coming days with the additional information required by the event's funders.

Book metadata modeling proposal

Chiara Storti and Andrea Zanni posted a proposal, with examples, that takes a pragmatic approach to the complex issues surrounding metadata modeling for books. If you're interested in the topic, please chime in.

Wikidata Primary Sources Tool RFC

The open request for comment centralizes feature requests, technical issues and general discussion on the primary sources tool, a data curation tool focused on adding references to Wikidata claims.


Recent WikiCite presentations:
- "Verifiable, linked open knowledge that anyone can edit" (VIVO '16)
- September 23, 2016: presentation at the NIH Frontiers in Data Science lecture series
- September 29, 2016: WMF Monthly Metrics and Activities presentation

WikiCite, Wikidata and Open Access publishing

On September 21, Dario Taraborelli gave an invited presentation (slides) on WikiCite in the Technology and Innovation panel at the 8th Conference on Open Access Scholarly Publishing (COASP 2016) in Arlington, VA. The presentation triggered a discussion on the availability of open citation data. In collaboration with Jennifer Lin (Crossref), we discovered that of the 999 publishers already depositing citation data to Crossref, only 28 (3%) make this data open. We urged publishers, particularly Open Access publishers and OASPA members, to release this data, which is critical to initiatives such as WikiCite.

Linking sources and expert curation in Wikidata: NIH lecture

On September 23, Dario Taraborelli also gave a longer presentation at the National Institutes of Health (NIH) in Bethesda, MD, as part of the NIH Frontiers in Data Science lecture series, mostly focused on the integration of expert-curated statements (such as those created by members of the Gene Wiki project) and source metadata in Wikidata (video, slides). This is a slightly modified version of the VIVO '16 closing keynote, targeted at the biomedical science community.

So what can we use WikiCite for?

Finn Årup Nielsen wrote a blog post showcasing different ways in which a repository of source metadata could be used. He also posted a list of possible use cases, comparing Wikidata to other research information/profile systems.

WikiCite at WMF Monthly Metrics

On September 29, a short retrospective on WikiCite was presented during the September 2016 Wikimedia Monthly Activities and Metrics meeting (video, slides).

Grant proposals

Three proposals closely related to WikiCite are applying for funding through Wikimedia Grants:


WikiFactMine is a proposal by the ContentMine team to harvest the scientific literature for facts and recommend them for inclusion in Wikidata.


Librarybase is a proposal to build an "online reference library" for Wikimedia contributors, leveraging Wikidata.


The StrepHit team submitted a grant renewal application to support semi-automated reference recommendation for Wikidata statements. The main goal is to make the primary sources tool usable.

Data releases

First release of the OpenCitations Corpus

The OpenCitations project announced the first release of the OpenCitations Corpus, an "open repository of scholarly citation data made available under a Creative Commons public domain dedication (CC0), which provides accurate bibliographic references harvested from the scholarly literature that others may freely build upon, enhance and reuse for any purpose, without restriction under copyright or database law." The project uses provenance records and SPARQL to track changes in the data.

Data on DOI citations in Wikipedia from Crossref

Crossref recently announced a preview of the Crossref Event Data user guide, which provides information on mentions of Digital Object Identifiers (DOI) across non-scholarly sources. The guide includes a detailed overview of how the system collects and stores DOI citations from Wikimedia projects, and how this data can be programmatically retrieved via the Crossref APIs.
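As a sketch of such programmatic retrieval, the following builds a request URL against the Event Data preview endpoint. The endpoint and parameter names (`obj-id`, `source`, `rows`) are assumptions based on the preview guide and may have changed; check the current guide before relying on them:

```python
from urllib.parse import urlencode

# Assumed preview endpoint for Crossref Event Data; the production API may differ.
EVENTS_API = "https://api.eventdata.crossref.org/v1/events"

def wikipedia_events_url(doi: str, rows: int = 20) -> str:
    """Build a request URL for events about one DOI originating from Wikipedia."""
    params = {"obj-id": f"https://doi.org/{doi}", "source": "wikipedia", "rows": rows}
    return EVENTS_API + "?" + urlencode(params)
```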

Code releases

Converting Wikidata entries to BibTeX

ContentMine fellow Lars Willighagen announced a tool combining Citation.js with Node.js which can, among other things, convert a list of bibliographic entries stored as Wikidata items into a BibTeX file.
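Citation.js is JavaScript, but the underlying idea is language-agnostic: map an item's statements to BibTeX fields. A toy Python sketch with hard-coded input (the field mapping and helper are ours, not Citation.js's):

```python
def to_bibtex(key: str, fields: dict[str, str]) -> str:
    """Render one bibliographic record as a BibTeX @article entry."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"

# In practice the fields would come from a Wikidata item's statements
# (title P1476, publication date P577, etc.); here they are hard-coded.
entry = to_bibtex("Q18507561", {"title": "Wikidata: a free collaborative knowledgebase",
                                "year": "2014"})
```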


Finding news citations for Wikipedia

Besnik Fetahu (Leibniz University of Hannover) presented his research on news citation recommendations for Wikipedia at the Wikimedia Research showcase (slides, video). In his own words, "in this work we address the problem of finding and updating news citations for statements in entity pages. We propose a two-stage supervised approach for this problem. In the first step, we construct a classifier to find out whether statements need a news citation or other kinds of citations (web, book, journal, etc.). In the second step, we develop a news citation algorithm for Wikipedia statements, which recommends appropriate citations from a given news collection."

DBpedia Citation Challenge

Krzysztof Węcel (Poznań University of Economics and Business) presented his research (slides) in response to the DBpedia Citations and References Challenge, analyzing content in Belarusian, English, French, German, Polish, Russian, and Ukrainian, and showing how citation analysis can improve models of the quality of Wikipedia articles.