Research talk:Autoconfirmed article creation trial/Work log/2018-02-03

Saturday, February 3, 2018

Today I'll work on wrapping up H14.

H14: Autoconfirmed article deletions

We saw in yesterday's work log that the proportion of articles deleted within 30 days appears to be higher during ACTRIAL than during the same periods of 2014, 2015, and 2016. At the same time, we saw an increasing trend in deletions prior to ACTRIAL, suggesting that ACTRIAL might not be the cause. First, let's build a contingency table and run a goodness-of-fit comparison of ACTRIAL against the same period in the three preceding years:

Period        Survived      %    Deleted      %    Row total      %
pre-ACTRIAL     64,241   86.8      9,808   13.2       74,049  100.0
ACTRIAL         15,658   80.6      3,775   19.4       19,433  100.0

A chi-square goodness-of-fit test, with the pre-ACTRIAL proportions as the expected distribution, indicates that this increase in proportion is statistically significant: χ² = 646.0, df = 1, p << 0.001. This result is unsurprising given the large number of creations and the large increase in the proportion that were deleted within 30 days.
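The test statistic above can be reproduced from the table counts. The following is a minimal sketch, not the original analysis: it assumes the expected proportions are taken from the pre-ACTRIAL row, and the function names `chi_square_gof` and `chi_square_p_value_df1` are illustrative helpers written for this example.

```python
import math

# Counts taken from the contingency table above.
pre_survived, pre_deleted = 64_241, 9_808
trial_survived, trial_deleted = 15_658, 3_775


def chi_square_gof(observed, expected_props):
    """Pearson chi-square goodness-of-fit statistic for observed counts
    against expected proportions (which must sum to 1)."""
    total = sum(observed)
    return sum(
        (obs - total * prop) ** 2 / (total * prop)
        for obs, prop in zip(observed, expected_props)
    )


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Assumption: the pre-ACTRIAL proportions serve as the expected distribution.
pre_total = pre_survived + pre_deleted
expected_props = [pre_survived / pre_total, pre_deleted / pre_total]

chi2 = chi_square_gof([trial_survived, trial_deleted], expected_props)
p_value = chi_square_p_value_df1(chi2)

trial_total = trial_survived + trial_deleted
print(f"pre-ACTRIAL deleted: {100 * pre_deleted / pre_total:.1f}%")
print(f"ACTRIAL deleted:     {100 * trial_deleted / trial_total:.1f}%")
print(f"chi-square = {chi2:.1f}, df = 1, p = {p_value:.3g}")
```

Under that assumption the statistic rounds to the reported 646.0, which suggests the test compared the ACTRIAL counts against the pre-ACTRIAL proportions rather than testing independence on the full 2x2 table.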

The result above does not take into account the increase in the proportion of deletions before ACTRIAL started. To see whether that has an effect, we calculate monthly data and use a forecasting model. First, let's look at the proportion per month. The graph looks like this:

In the graph above, the increase in the proportion prior to ACTRIAL seems clearer and suggests that the increase during ACTRIAL compared to previous years might not be caused by the trial itself. We analyze the time series and find that an ARIMA(0,1,0)(0,1,0)[12] model is the best fit. Using it to forecast the first three months of ACTRIAL gives the graph below:

As we can see in the graph, the actual proportion of deletions during ACTRIAL is in line with the forecast. This suggests that ACTRIAL did not cause the increase in proportion; instead, there appears to be some event prior to ACTRIAL that resulted in the increase. Further study is needed to understand why the increase occurred. In general, we find that H14 is supported.
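For reference, the forecasting step can be illustrated with a small sketch. An ARIMA(0,1,0)(0,1,0)[12] model has no autoregressive or moving-average terms, so its point forecast is fully determined by the first and seasonal differences: each forecast equals the previous value plus the change seen over the same month one year earlier. The code below is a hand-written illustration of that recursion, not the software used for the analysis; the function name `forecast_sarima_010_010` and the synthetic series are assumptions made for the example, and prediction intervals are not computed.

```python
def forecast_sarima_010_010(series, steps, period=12):
    """Point forecasts from an ARIMA(0,1,0)(0,1,0)[period] model.

    With no AR or MA terms the model is (1 - B)(1 - B^period) y_t = e_t,
    so each forecast is y_{t-1} + y_{t-period} - y_{t-period-1}.
    """
    if len(series) < period + 1:
        raise ValueError("need at least period + 1 observations")
    history = list(series)
    forecasts = []
    for _ in range(steps):
        next_value = history[-1] + history[-period] - history[-period - 1]
        forecasts.append(next_value)
        history.append(next_value)  # later steps build on earlier forecasts
    return forecasts


# Hypothetical monthly values (NOT the trial data): a linear trend plus a
# repeating yearly pattern, used only to show how the recursion behaves.
seasonal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
synthetic = [t + seasonal[t % 12] for t in range(36)]

# Forecast three months ahead, matching the three-month horizon in the text.
forecast = forecast_sarima_010_010(synthetic, steps=3)
print(forecast)
```

On a series that is exactly trend plus a fixed seasonal pattern, the forecast simply continues that pattern. With real monthly proportions, the observed ACTRIAL values would be compared against these point forecasts and their prediction intervals.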