Research talk:Building automated vandalism detection tool for Wikidata

"Radical trust"

Wikis, in general, are based on the concept of radical trust; i.e., it is believed that individual participation, for the most part, includes correct information. Nevertheless, the identification of attempted fraud or vandalism is necessary.

Caminha, C., & Furtado, V. (2012, April). Modeling User Reports in Crowdmaps as a Complex Network. In Proceedings of the 21st International World Wide Web Conference. PDF

--EpochFail (talk) 18:36, 8 February 2016 (UTC)

This is really good, we should use it. Amir (talk) 19:52, 8 February 2016 (UTC)

Neis et al. notes

Reviewing:

Neis, P., Goetz, M., & Zipf, A. (2012). Towards automatic vandalism detection in OpenStreetMap. ISPRS International Journal of Geo-Information, 1(3), 315-332. PDF
  1. They include a discussion of the observed types of vandalism. We should report our own observations from the labeling of reverted-untrusted edits.

Otherwise, this paper falls badly short on evaluation. They cite past work that used machine learning to tune the signal, but then do their own tuning by hand without discussing how the weightings were actually arrived at. Were they eyeballed? And what of the fitness of the model? After a while, they only talk about edits that are predicted to be vandalism. What proportion were *actually* vandalism!? (A sketch of that kind of precision check is below.) I'm a little torn about citing this paper because it contributes so little (due to the lack of replicability and evaluation). --EpochFail (talk) 19:11, 8 February 2016 (UTC)

I have the same feeling too. Amir (talk) 19:52, 8 February 2016 (UTC)
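
For reference, here is a minimal sketch of the kind of evaluation missing from the paper: take a hand-labeled random sample of edits, compare the detector's predictions against the labels, and report precision (the share of flagged edits that really were vandalism) and recall (the share of actual vandalism that got flagged). The labels below are hypothetical placeholders, not data from Neis et al. or from our own labeling.

<syntaxhighlight lang="python">
# Minimal precision/recall check for a vandalism detector.
# Each entry is (predicted_vandalism, actually_vandalism) for one edit
# in a hand-labeled sample. These values are hypothetical placeholders.
labeled_sample = [
    (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False),
]

# Count the confusion-matrix cells we need.
true_pos = sum(1 for pred, actual in labeled_sample if pred and actual)
false_pos = sum(1 for pred, actual in labeled_sample if pred and not actual)
false_neg = sum(1 for pred, actual in labeled_sample if not pred and actual)

# Precision: of the edits predicted to be vandalism, how many actually were?
precision = true_pos / (true_pos + false_pos)
# Recall: of the actual vandalism, how much did the detector catch?
recall = true_pos / (true_pos + false_neg)

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
</syntaxhighlight>

In practice the labeled sample would come from our own labeling of reverted-untrusted edits rather than the toy list above.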