User:WereSpielChequers/spot checking

From Meta, a Wikimedia project coordination wiki

This is a draft of a grant proposal; other contributors are welcome to pitch in on this page or critique it on the talk page. If it gets to a viable point I will move it to the correct place for a grant application. You are especially welcome to pitch in if you fancy applying for the grant and doing the work yourself. I want this to happen, but I have neither the independence, the neutrality, nor some of the skill sets required to make it happen.

Purpose

Measure the accuracy of Wikipedia by checking what proportion of a random sample of factoids is true. Include subsets of data to test various common theories about Wikipedia.

Methodology

Select statistically valid random samples of "facts" from Wikipedia articles and check whether each one is true, false, out of date, a copyright violation (copyvio) or unverifiable. Where appropriate, remove the factoid, add a reference to it, or amend it and add a reference. Repeat the process annually on fresh samples in order to show how quality changes over time.
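As a rough illustration of the sampling and measurement step, here is a minimal Python sketch. It assumes the pool of candidate factoids has already been extracted and given identifiers, which this draft does not specify how to do. The function names `sample_facts` and `wilson_interval`, and the choice of the Wilson score interval for the margin of error, are assumptions made for illustration rather than part of the proposal.

```python
import math
import random


def sample_facts(fact_ids, k, seed=None):
    """Draw k distinct factoids uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(fact_ids), k)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Illustrative numbers only: suppose 90 of 100 sampled factoids checked out as true.
low, high = wilson_interval(90, 100)
print(f"Estimated accuracy: 90.0% (95% interval {low:.1%} to {high:.1%})")
```

With a sample of only 100 factoids the interval is about twelve percentage points wide, so the sample size would need to be chosen with the desired precision in mind, particularly for the smaller subsets.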

Theories to test

How accurate is Wikipedia?

There are various theories and assertions as to the accuracy of Wikipedia, and whether that accuracy is improving. This study would give an objective measure of quality and a series of benchmarks against which to measure progress, or the lack of it.

Which language version is more accurate?

Wikipedia has versions in over 200 languages, with differing policies and community sizes. At a minimum this study would run on at least one Wikipedia that uses a flagged revisions system, such as the German language one, and at least one that does not, such as the English language one. There is an unproven and disputed theory that the flagged revisions system is more effective at screening out vandalism, because every edit has to be either made by a whitelisted editor or approved by one. It should be possible to establish which system is the more effective at screening out vandalism, though this may require samples from several versions of Wikipedia running each of the two systems.
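One way such a comparison could be made is with a two-proportion z-test on the share of false factoids found in each sample. The sketch below is only an illustration under that assumption: the function name `two_proportion_z_test` and the counts are hypothetical, and a small p-value is evidence of a difference rather than proof of its cause, since language versions differ in more than their revision system.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1, x2 are counts of false factoids; n1, n2 are the sample sizes.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical counts: 30 false factoids in 1000 sampled from a flagged-revisions
# wiki, against 60 in 1000 sampled from a wiki without flagged revisions.
z, p = two_proportion_z_test(30, 1000, 60, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same test could be reused for the other paired comparisons in this proposal, such as cited against uncited factoids.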

Is older data more accurate?

Some people argue that Wikipedia peaked in about 2007 and has been gently declining since; others argue that quality has been steadily improving and that any drop in measures such as the number of volunteers is simply a cost of quality, since not everyone is prepared to work to the higher standards.

How much can we trust particular types of editor?

There are various groups of editors, as measured by registration status and by trust systems within the community. The prevailing theory in the core community is that whitelisted editors can be trusted, and that IP editors and newbies are less trustworthy. This study will not entirely answer that, because it looks only at edits that stick and not at reverted ones, but it should tell us what proportion of Wikipedia comes from established, casual and unregistered Wikipedians, and whether the content from some of those groups is more or less accurate.

Are cited edits more accurate than uncited ones?

One big change, at least on the English Wikipedia, is the increasing proportion of edits that are reverted simply for being uncited. The assumption behind this is that vandals are unlikely to back fake information with a fake citation. This study should give an objective measure of whether that assumption holds.