Research:Vandal fighter work load

From Meta, a Wikimedia project coordination wiki
This page in a nutshell: This research looks at the changing nature of EN Wiki's vandal fighting team since 2007. The study found a steeper decline in the number of vandal fighters than in the number of all editors, with the steepest decline among less active vandal fighters. The number of vandal reverts completed by individual fighters also appears to be declining, suggesting that, although the number of hours that need coverage hasn't changed, the overall workload of vandal fighters is decreasing.
This project page documents a research project currently in progress.
Information may be incomplete and can change rapidly as science advances.
Research project
Vandal fighter work load
Main contact EpochFail
Start 2011-06
End 2011-06
Status in progress
Open data This project has published open-licensed data
Open access This project has open access publications
WMF support
Wikimedia research projects

I'll be examining the workload of vandal fighters and how it has been changing over the years. I'll be using this analysis to check whether the growth of new users and interest in Wikipedia is increasing the work that Wikipedians have to do and therefore making them stressed.

Topic

How has the total work load of vandal fighters changed over time and how is the work distributed among editors?

Process

To assess the work load of vandal fighters, all identity reverts[1] were identified in the most recent dump of enwp by examining and comparing article text.
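The comparison of article text described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes one page's revisions arrive oldest first, compares SHA-1 checksums of the text rather than the text itself, and only looks back a limited number of revisions. The `Revision` and `Revert` types and the `radius` cutoff are hypothetical names introduced for the example.

```python
import hashlib
from typing import Iterable, Iterator, List, NamedTuple


class Revision(NamedTuple):
    rev_id: int
    user: str
    text: str


class Revert(NamedTuple):
    reverting_rev: int   # revision that restored an earlier state
    reverted_to: int     # earlier revision whose text was restored
    reverted: List[int]  # rev_ids of the revisions that were undone


def identity_reverts(history: Iterable[Revision], radius: int = 15) -> Iterator[Revert]:
    """Yield identity reverts in one page's history (oldest revision first).

    `radius` is an assumed cutoff on how many preceding revisions are searched.
    """
    recent = []  # (rev_id, checksum) pairs for the last `radius` revisions
    for rev in history:
        digest = hashlib.sha1(rev.text.encode("utf-8")).hexdigest()
        # Search backwards so the most recent matching state wins.
        for i in range(len(recent) - 1, -1, -1):
            if recent[i][1] == digest:
                undone = [rev_id for rev_id, _ in recent[i + 1:]]
                if undone:  # nothing in between means a null edit, not a revert
                    yield Revert(rev.rev_id, recent[i][0], undone)
                break
        recent.append((rev.rev_id, digest))
        recent = recent[-radius:]
```

For example, a history of "a", then "a!!!", then "a" again yields one revert in which the third revision restores the first and undoes the second.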

Identifying vandal fighting

To define vandal fighters, identity reverts were coded based on the comment left by the reverting editor. Work from Priedhorsky et al.[2] provides a regular expression with a known error rate that can be used to detect a large proportion of vandalism-related reverts.
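Coding reverts by their comment might look roughly like the sketch below. The pattern shown is a hypothetical placeholder chosen only to make the example runnable; it is not D_LOOSE or D_STRICT, which are defined in Priedhorsky et al. and should be substituted via the `pattern` argument.

```python
import re
from typing import Optional

# HYPOTHETICAL stand-in pattern for illustration only.
# It is NOT the D_LOOSE or D_STRICT expression from Priedhorsky et al.
PLACEHOLDER_VANDAL_RE = re.compile(r"\b(?:vandal\w*|rvv)\b", re.IGNORECASE)


def is_vandal_revert(comment: Optional[str], pattern=PLACEHOLDER_VANDAL_RE) -> bool:
    """Return True if the revert's edit comment matches the vandalism pattern."""
    if not comment:  # missing or empty comments cannot be coded as vandalism
        return False
    return pattern.search(comment) is not None
```

Keeping the pattern as a parameter lets the same function be run once with a loose expression and once with a strict one.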

For vandal fighter work load, the sheer number of identity reverts was used, regardless of whether they matched the D_LOOSE or D_STRICT regular expressions. This decision was made under the assumption that, if an editor is performing over 2000 reverts in one year, the majority of them should be for vandalism. After filtering out bots from the analysis, spot checking supports this assumption.

Definition of vandal fighter

Since the commonly used definition of "active editor" is a user who performs at least 5 revisions per month[3], I've defined vandal fighters as editors that perform at least 5 reverts for vandalism per month.
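As a minimal sketch of this definition, assuming the vandal reverts have already been reduced to (user, month) pairs, the vandal fighters for each month can be collected as below. The function name and the input shape are assumptions made for the example.

```python
from collections import Counter


def vandal_fighters_by_month(vandal_reverts, threshold=5):
    """Map each month to the set of users with at least `threshold` vandal reverts.

    `vandal_reverts` is an iterable of (user, "YYYY-MM") pairs, one per revert.
    """
    counts = Counter(vandal_reverts)  # (user, month) -> number of vandal reverts
    fighters = {}
    for (user, month), n in counts.items():
        if n >= threshold:
            fighters.setdefault(month, set()).add(user)
    return fighters
```

Raising `threshold` to 50 or 500 gives the stricter definitions of an active vandal fighter used in the regressions.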

See also

Results and discussion

Overall vandalism and reverts D_LOOSE/D_STRICT

Total reverts are plotted with the proportion that the D_LOOSE/D_STRICT regexp identified as being for vandalism.



The decreasing proportion of vandal fighters

Proportion of active editors who are vandal fighters (>= 5 D_LOOSE/D_STRICT reverts per month) with linear and logged (base 10) y axis.

Vandal fighters and active editors by month.
Logged vandal fighters and active editors by month. The y axis is logged to show variance in the number of vandal fighters.

Vandal fighters (editors with at least 5 vandal reverts per month) represent a very small proportion of active Wikipedia editors. Comparing the number of active editors with the number of vandal fighters in raw numbers is difficult because of the difference in magnitude between the two groups. For this reason, I've provided two plots. Vandal fighters presents the data on a linear y axis, whereas Logged vandal fighters presents the data on a logged (base 10) y axis (e.g., "2" = 10^2). With careful observation, the decreasing trend in both active editors and vandal fighters can be seen. In addition, the proportion of editors who fight vandalism is decreasing, which means that vandal fighters are declining faster than active editors overall.


To test for a change in the proportion of vandal fighters, regressions were performed to identify a trend. Proportion of active vandal fighters over time plots the proportion of vandal fighters per month with the fitted regression lines for three different definitions of active vandal fighter: >= 5, >= 50 and >= 500 vandal reverts per month. The supplied coefficients can be read as the expected decline in proportion per month. It's interesting to note that for particularly active vandal fighters (>= 500), the proportion does not appear to be decreasing significantly (p=0.075). Since this analysis relies on an imperfect approximation (D_LOOSE/D_STRICT), I also performed the analysis with the raw number of reverts.
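To illustrate how such a coefficient is read, here is a minimal sketch that assumes an ordinary least squares fit of the monthly proportion against a month index (0, 1, 2, ...). The source does not state which regression procedure was used, so this is an assumption, and the sketch does not compute p-values.

```python
def fit_line(proportions):
    """Fit proportion = intercept + slope * month_index by ordinary least squares.

    `proportions` holds one value per consecutive month, oldest first.
    The slope is the expected change in proportion per month.
    """
    n = len(proportions)
    if n < 2:
        raise ValueError("need at least two monthly observations")
    mean_x = (n - 1) / 2  # mean of the month indices 0..n-1
    mean_y = sum(proportions) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(proportions))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A negative slope indicates that the proportion is expected to fall by that amount each month.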


Proportion of active reverters over time plots the proportion of active reverters per month with fitted regression lines for three definitions of active reverter: >= 5, >= 50 and >= 500 reverts (vandalism or not) per month. As with the proportion of active vandal fighters over time, the downward trend weakens as the threshold of reverting activity increases.

Together, these results suggest that the proportion of vandal fighters has been decreasing, but that the decrease has occurred most strongly among editors who perform less vandal fighting activity. The ranks of the most active vandal fighters appear to be remaining relatively stable.


The top vandal fighters and their work load

A ranked list of editors by their vandal reverting activity is plotted for 2007-2010.

Top 50 vandal fighters by year presents the top 50 vandal reverters per year by the number of revisions reverted (with bots removed). Since the D_LOOSE/D_STRICT approach for detecting vandal reverts could bias the ranking against editors who used tools that weren't detected by the regular expression, it was discarded for this chart under the assumption that any editor that reverts more than 10,000 revisions in a year is most likely primarily reverting vandalism.
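A ranking of this kind could be produced along the lines of the sketch below, which assumes each revert is available as a (user, year, revisions reverted) record and that a set of bot account names is known. The function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict


def top_reverters_by_year(reverts, bots=frozenset(), n=50):
    """Rank non-bot users per year by the total number of revisions they reverted.

    `reverts` is an iterable of (user, year, reverted_count) records.
    Returns {year: [(user, total_reverted), ...]} with the largest totals first.
    """
    totals = defaultdict(Counter)
    for user, year, reverted_count in reverts:
        if user not in bots:  # bots are removed before ranking
            totals[year][user] += reverted_count
    return {year: counter.most_common(n) for year, counter in totals.items()}
```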


Average activity levels per editor for revisions, reverts and vandal reverts (D_LOOSE/D_STRICT) are plotted by month.

Work load per user over time depicts the average amount of activity per editor-month (without bots) over time for three measures of activity: revisions, reverting revisions and vandal reverting revisions. These three charts represent the focus of this sprint, the workload of the average vandal fighter. For both reverting revisions and vandal reverting revisions, the average amount of activity per editor (as measured) has been decreasing. The results are not substantially different even when examining only the reverting activity of the top 50 reverting editors. This suggests that the workload of these vandal fighters has been decreasing over time. Curiously, it looks like the average number of revisions per editor is increasing, which suggests that editors might be increasing the amount of work they do overall.
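The per-editor averages can be sketched as below. The sketch assumes that each action (a revision, a revert or a vandal revert) is available as a (user, month) pair and that the average is taken over editors with at least one such action in that month; the source does not spell out the denominator, so that choice is an assumption.

```python
from collections import defaultdict


def mean_activity_per_editor(events, bots=frozenset()):
    """Return {month: mean number of actions per active non-bot editor}.

    `events` is an iterable of (user, "YYYY-MM") pairs, one per action.
    """
    per_month = defaultdict(lambda: defaultdict(int))
    for user, month in events:
        if user not in bots:
            per_month[month][user] += 1
    return {
        month: sum(counts.values()) / len(counts)
        for month, counts in per_month.items()
    }
```

Running the same function separately on all revisions, on reverting revisions and on vandal reverting revisions gives the three series being compared.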


Summary

The activities of vandal fighters were examined with a focus on trends since 2007. A regular expression was applied to the comments of reverting editors to determine which reverts occurred for vandalism. From this study, there are three practical findings:

  1. The proportion of active editors who fight vandalism is decreasing, and this decrease seems to come primarily from the ranks of casual vandal fighters.
  2. The vandal fighting workload (and reverting in general) is decreasing per person.
  3. The overall work load per editor (as measured by number of edits) is increasing. This could be a symptom of the lowered newbie retention and the fall of the casual editor.

Future work

References

  1. A revert action that returns the article exactly to a previous state
  2. See D_LOOSE & D_STRICT from Priedhorsky et al. Creating, Destroying and Restoring Value..., GROUP '07
  3. http://commons.wikimedia.org/wiki/File:Definitions_of_Research_Terms.pdf