Research talk:Identification of Unsourced Statements

From Meta, a Wikimedia project coordination wiki

Assumption on direction of mistakes

The text seems to assume that, if anything, we need more "citation needed" templates, but in reality we may just as well have too many. The same is true for the warnings at the top of articles. A common reason is that people who add sources, citations and footnotes, or even remove questionable text, don't remove the corresponding warnings. The test being proposed here is very complicated, whereas it would be rather simple to test existing templates and find out how many need to be removed. You could then probably find some reliable correlations (just as an article marked as a stub is unlikely to really be a stub if it has grown tenfold in size since being marked). Nemo 19:00, 18 December 2017 (UTC)

I believe the assumption is that we need a more scalable and dynamic way to identify sentences that probably need citations. The potential applications of the model (which are beyond the scope of the current project) could include a tool that lets people go around adding citation needed tags all over the place, but that isn't the point of any of the applications that have been sketched out so far. Jmorgan (WMF) (talk) 22:47, 20 December 2017 (UTC)
Thanks for this comment! This is a great observation. The tool could possibly be used to help reduce the backlog of sentences flagged with a 'citation needed' tag. If successful, the tool would consist of a classifier that, given a sentence, outputs 1) a positive (needing citation) or negative (not needing citation) 'citation_needed' label, and 2) a confidence score reflecting how likely it is that the statement actually needs a citation. When running the classifier on sentences already flagged as 'citation needed', we could then a) recommend tag removal when the classifier does not detect the sentence as needing a citation, and b) rank the 'positive' sentences according to the confidence score, thus surfacing the sentences that most clearly need to be sourced. --Miriam (WMF) (talk) 11:26, 21 December 2017 (UTC)
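The triage described above (recommend tag removal for negatives, rank positives by confidence) could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Prediction`, `triage_flagged` and `toy_classifier` are hypothetical, and the digit-counting classifier is a stand-in for the proposed model, not the model itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Prediction:
    needs_citation: bool  # positive / negative 'citation_needed' label
    confidence: float     # estimated likelihood that a citation is needed (0..1)


def triage_flagged(
    sentences: List[str],
    classify: Callable[[str], Prediction],
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Split already-flagged sentences into (a) tag-removal candidates
    and (b) a backlog ranked by descending confidence."""
    removal_candidates: List[str] = []
    backlog: List[Tuple[str, float]] = []
    for sentence in sentences:
        prediction = classify(sentence)
        if prediction.needs_citation:
            backlog.append((sentence, prediction.confidence))
        else:
            removal_candidates.append(sentence)
    # Highest-confidence sentences first, so they surface at the top.
    backlog.sort(key=lambda item: item[1], reverse=True)
    return removal_candidates, backlog


def toy_classifier(sentence: str) -> Prediction:
    """Hypothetical placeholder: more digits means a higher score.
    A real classifier would be a trained model."""
    digits = sum(ch.isdigit() for ch in sentence)
    score = digits / (digits + 1)
    return Prediction(needs_citation=score > 0.4, confidence=score)


flagged = [
    "The sky is blue.",
    "It opened in 2001.",
    "The city had 1200 residents in 1990.",
]
removal, ranked = triage_flagged(flagged, toy_classifier)
```

Here `removal` holds the sentences whose tag might be recommended for removal, and `ranked` lists the remaining sentences with the most likely citation-needing ones first. Any such recommendation would presumably still need human review before a tag is actually removed.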

English Wikipedia

Is this about the English Wikipedia only? I read "Wikipedia" but I only see English Wikipedia links. Nemo 19:00, 18 December 2017 (UTC)

See Research:Identification_of_Unsourced_Statements#Proposed_Solution. The model is intended to work across languages. Research:Identification_of_Unsourced_Statements/Labeling_Pilot_planning has more information. Jmorgan (WMF) (talk) 22:42, 20 December 2017 (UTC)
Please also see Research:Identification_of_Unsourced_Statements/Labeling_Pilot_planning for more details on how to annotate multilingual data for this task. --Miriam (WMF) (talk) 11:29, 21 December 2017 (UTC)