Research:Identification of Unsourced Statements

From Meta, a Wikimedia project coordination wiki
23:06, 18 October 2017 (UTC)
Duration:  2017-October — 2017-
citation needed, machine learning

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.

To guarantee reliability, Wikipedia's Verifiability policy requires inline citations for any material challenged or likely to be challenged, and for all quotations, anywhere in the article space. Although around 300K statements [4] have already been identified as unsourced, many more may remain undetected.

This project aims to help the community discover potentially unsourced statements, i.e. statements that need an inline citation to a reliable source, using a machine-assisted framework.


We will flag statements that might need the [citation needed] tag. This recommendation will be based on a classifier that can identify whether a statement needs a reference or not. The classifier will encode general rules on referencing and verifiability, drawn from existing guidelines [1][2] or from new observational studies that we will conduct.

More specifically, we propose to design a supervised learning classifier that, given examples of statements with citations (or needing citations) and examples of statements where citations are not required, learns to flag statements with the [citation needed] template.
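
The supervised setup described above can be illustrated with a minimal sketch. It is not the project's model: it assumes a bag-of-words Naive Bayes classifier as a stand-in, and the class name, label names, helper functions, and the six training statements are all hypothetical, invented purely for illustration. It also assumes that an existing inline citation appears as a bracketed number such as [3].

```python
import math
import re
from collections import Counter

# Hypothetical label names, chosen for this sketch only.
NEEDS = "needs_citation"
NO_NEED = "no_citation"


def tokenize(text):
    """Lowercase the statement and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesCitationClassifier:
    """Toy multinomial Naive Bayes with Laplace smoothing (stand-in model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {}          # label -> Counter of token frequencies
        self.label_counts = Counter()  # label -> number of training statements
        self.vocab = set()

    def fit(self, statements, labels):
        for text, label in zip(statements, labels):
            self.label_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        prior = self.label_counts[label] / sum(self.label_counts.values())
        score = math.log(prior)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / total)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.label_counts, key=lambda lab: self._log_score(tokens, lab))


def flag_unsourced(statements, clf):
    """Return statements that lack a [n] marker and are predicted to need one."""
    has_ref = re.compile(r"\[\d+\]")
    return [s for s in statements
            if not has_ref.search(s) and clf.predict(s) == NEEDS]


# Invented toy training data; a real dataset would come from Wikipedia itself.
train = [
    ("The study found that 40 percent of users abandoned the site.", NEEDS),
    ("Critics said the film was the worst of the decade.", NEEDS),
    ("According to the report, the population doubled in ten years.", NEEDS),
    ("Paris is the capital of France.", NO_NEED),
    ("The sky is blue.", NO_NEED),
    ("A week has seven days.", NO_NEED),
]
clf = NaiveBayesCitationClassifier().fit(
    [text for text, _ in train], [label for _, label in train]
)

candidates = [
    "Critics said the report was flawed.",
    "The sky is blue.",
    "Critics said the film was the worst of the decade [3].",
]
flagged = flag_unsourced(candidates, clf)
print(flagged)
```

In this sketch, a statement is flagged only when it has no existing citation marker and the classifier predicts that it needs one. Any model that maps a statement to one of the two labels could replace the Naive Bayes stand-in without changing `flag_unsourced`.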

Data Collection

Data Sources

Possible sources:

Guidelines for data collection (and modeling)

Feature Extraction


We aim to work closely with the Citation Hunt developers and the Wikipedia Library communities. We will pilot a set of recommendations, powered by the new citation context dataset, to evaluate whether our classifiers can help support community efforts to address the problem of unsourced statements.


Q1, Q2