Research:Identification of Unsourced Statements
To guarantee reliability, Wikipedia's Verifiability policy requires inline citations for any material challenged or likely to be challenged, and for all quotations, anywhere in the article space. Although around 300K statements have already been identified as unsourced, we might be missing many more.
We will flag statements that might need the "citation needed" tag. This recommendation will be based on a classifier that can identify whether a statement needs a reference or not. The classifier will encode general rules on referencing and verifiability, which will come from existing guidelines or from new observational studies we will put in place.
More specifically, we propose to design a supervised learning classifier that, given examples of statements that have (or need) citations and examples of statements where citations are not required, learns how to flag statements with the "citation needed" template.
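As a rough illustration of the supervised setup described above, the following minimal sketch trains a bag-of-words naive Bayes classifier on a handful of labeled statements. It is not the project's model: the choice of naive Bayes, the class name `NaiveBayesCitationClassifier`, the label strings, and the toy training sentences are all assumptions made up for this example, and a real system would be trained on the annotated data and guidelines listed on this page.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a statement and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesCitationClassifier:
    """Multinomial naive Bayes over word counts (hypothetical, for illustration)."""

    def fit(self, statements, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.total_docs = len(labels)
        # Per-class word frequencies gathered from the training statements.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(statements, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def _log_score(self, tokens, c):
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(self.doc_counts[c] / self.total_docs)
        denom = sum(self.word_counts[c].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, statement):
        tokens = tokenize(statement)
        return max(self.classes, key=lambda c: self._log_score(tokens, c))


# Toy, invented training data: NOT drawn from Wikipedia or any real dataset.
train = [
    ("Studies have shown that the drug reduces mortality by 30 percent.", "needs_citation"),
    ("According to critics, the policy was widely regarded as a failure.", "needs_citation"),
    ("The company reported record profits of 2 billion dollars in 2015.", "needs_citation"),
    ("He said that the war was the greatest mistake of the century.", "needs_citation"),
    ("Paris is the capital of France.", "no_citation"),
    ("The sky is blue.", "no_citation"),
    ("A triangle has three sides.", "no_citation"),
    ("Water is a liquid at room temperature.", "no_citation"),
]

clf = NaiveBayesCitationClassifier().fit(
    [text for text, _ in train], [label for _, label in train]
)

print(clf.predict("Studies have shown that the policy reduces crime by 20 percent."))
print(clf.predict("A square has four sides."))
```

With this toy data the first statement is flagged as needing a citation and the second is not. Eight sentences are far too few to learn general rules on verifiability, so the sketch only shows the shape of the training and prediction steps.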
- Manual annotation: WikiLabels (@Halfak)
- Manual annotation: Hypothesis.is
- 1lib1ref edits
- Automatic collection:
Guidelines for data collection (and modeling)
- Best Practices mined from Wikipedia citation guidelines: https://docs.google.com/a/wikimedia.org/spreadsheets/d/1nUc8WmtU8F97vcmNv9LnqNSmOiK2UU9AOAl2JBdRFBs/edit?usp=sharing
- Patterns of typical reasons why editors add the "citation needed" tag to a statement:
We aim to work in close contact with the Citation Hunt developers and the Wikipedia Library communities. We will pilot a set of recommendations, powered by the new citation context dataset, to evaluate whether our classifiers can help support community efforts to address the problem of unsourced statements.