
Research talk:Autoconfirmed article creation trial/Work log/2017-09-03

From Meta, a Wikimedia project coordination wiki

Sunday, September 3, 2017


Today I'll wrap up the article creator analysis that I started on Friday by looking at how article deletion affects editor survival.

Q5: How does article deletion affect editor survival?


In our Sept 1 work log, we saw that accounts that start out by creating an article have a lower survival rate. Given that most of these articles get deleted, we'd expect that deletion leads to lower survival, but we need to confirm this.

We create a dataset with registration date, type of registration, whether the created article was deleted or kept, and the number of editors who edited in week 1 and in week 5 (where the latter implies the former). Because of the dataset issues we discussed earlier, we only use data for the past three years, from July 1, 2014 to July 1, 2017. We then calculate the survival rate by whether the article survived and by type of account creation (autocreated vs. others), and plot it by day in a 2x2 plot:

In the plot above, the "no" column means the article did not survive and "yes" means it did. Based on the plot, it seems fairly clear that having your article deleted is a demotivating experience. Let's explore that further by doing the same kind of contingency matrix analysis that we did previously. First, we'll use the raw numbers because they'll make it very clear what is going on:

| | Edited in week 1 | Edited in week 5 |
|---|---|---|
| Article deleted | 80,870 | 1,310 |
| Article survived | 24,289 | 1,929 |

The key thing to note here is that more accounts survive to week 5 when the article survived (1,929) than when it was deleted (1,310), even though the top left cell holds more than three times as many accounts as the bottom left cell. This difference is made even clearer when we calculate proportions per row:

| | Edited in week 1 | Edited in week 5 |
|---|---|---|
| Article deleted | 98.4% | 1.6% |
| Article survived | 92.6% | 7.4% |
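The row proportions follow directly from the raw counts. As a minimal sketch (the names `counts` and `row_proportions` are made up for illustration and are not from the original analysis), each cell is divided by its row total:

```python
# Counts copied from the raw-number table: (edited in week 1, edited in week 5).
counts = {
    "Article deleted": (80870, 1310),
    "Article survived": (24289, 1929),
}


def row_proportions(row):
    """Return each cell of a row as a percentage of the row total."""
    total = sum(row)
    return tuple(100 * cell / total for cell in row)


# Hypothetical helper structure: label -> (week 1 %, week 5 %).
proportions = {label: row_proportions(row) for label, row in counts.items()}

for label, (week1, week5) in proportions.items():
    print(f"{label}: {week1:.1f}% / {week5:.1f}%")
# Article deleted: 98.4% / 1.6%
# Article survived: 92.6% / 7.4%
```

Note that this treats the two columns as disjoint groups that together make up the row, which is how the percentages in the table add up to 100%.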

Unsurprisingly, a Chi-square goodness-of-fit test on these proportions is statistically significant: X² = 5551.9, df = 1, p < 0.001. Perhaps more interesting is the question of whether there is a difference between autocreated accounts and the rest. First, we calculate the overall numbers for autocreated accounts:

| | Edited in week 1 | Edited in week 5 |
|---|---|---|
| Article deleted | 886 | 123 |
| Article survived | 436 | 176 |

As we saw before, these numbers are an order of magnitude (or more) smaller than the overall numbers, and the pattern of a larger survival proportion for accounts whose article was not deleted remains. Because autocreated accounts make up such a small share of the total, there's no need to calculate the numbers for the non-autocreated accounts separately; the results there should remain the same. Instead, we verify our result for just the autocreated accounts using the same Chi-square goodness-of-fit test: X² = 156.9, df = 1, p < 0.001.
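The work log does not spell out how the goodness-of-fit tests were set up. The sketch below is one reading that seems consistent with the reported statistics: the "article survived" row is taken as the observed counts and tested against the proportions of the "article deleted" row. That reading is an inference from the numbers, not something the log states. The function name `goodness_of_fit` and the dictionary names are illustrative only. The sketch also assumes that the overall table includes the autocreated accounts, so the non-autocreated counts can be derived by subtraction.

```python
import math


def goodness_of_fit(observed, reference):
    """Pearson chi-square of `observed` against the proportions of `reference`.

    Returns (statistic, p_value). The p-value formula is only valid for
    two categories, i.e. df = 1.
    """
    n_obs = sum(observed)
    n_ref = sum(reference)
    stat = 0.0
    for obs, ref in zip(observed, reference):
        expected = n_obs * ref / n_ref  # expected count under reference proportions
        stat += (obs - expected) ** 2 / expected
    # For df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# (edited in week 1, edited in week 5) counts copied from the tables.
overall = {"deleted": (80870, 1310), "survived": (24289, 1929)}
autocreated = {"deleted": (886, 123), "survived": (436, 176)}

# Assumption: the overall table includes autocreated accounts, so the
# remaining accounts are the cell-wise difference.
others = {
    key: tuple(a - b for a, b in zip(overall[key], autocreated[key]))
    for key in overall
}

results = {}
for name, table in [("overall", overall), ("autocreated", autocreated), ("others", others)]:
    results[name] = goodness_of_fit(table["survived"], table["deleted"])
    stat, p_value = results[name]
    print(f"{name}: X2 = {stat:.1f}, df = 1, p = {p_value:.3g}")
```

Under this reading the statistics for the overall and autocreated tables round to the values reported above (5551.9 and 156.9). The derived non-autocreated counts show the same pattern, which fits the expectation that removing the small autocreated group barely changes the overall result.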

Thus it seems reasonable to conclude that accounts that start out by creating an article are more likely to survive if the article they create is not deleted. The question is whether we have reason to believe that these accounts are meaningfully different from the accounts that create an article but have it deleted. For example, it could be that those who create a surviving article exhibit other traits. I'll have to dig into that in the coming days.