Research talk:Autoconfirmed article creation trial/Work log/2018-02-01

From Meta, a Wikimedia project coordination wiki

Thursday, February 1, 2018

Today I'll continue the analysis of Draft creations and AfC submissions.

AfC submission rate

Yesterday's analysis of AfC submissions from the Draft namespace resulted in the following graph of submissions per day from mid-2014 onwards:

From this it seems fairly clear that AfC submissions are greatly affected by ACTRIAL. The rate of submissions is fairly stable over time, with dips mainly around the holidays, and it appears to have increased markedly during ACTRIAL. We'll substantiate that using two approaches. First, we'll compare the rate of submissions per day during ACTRIAL to the same time period in previous years, as we did for deletion rates in Main and Draft (see the Jan 22 work log for an example).

We compare the number of submissions per day for September 15 to November 15, 2017 (ACTRIAL) to the same period of the three years prior to the trial. As shown in the graph above, the rate of submissions per day was fairly stable during those periods prior to ACTRIAL. We visually inspect a histogram of the distribution and find that while somewhat skewed, it does not strongly violate the assumption that the rate follows a Normal distribution around a mean. For pre-ACTRIAL the mean is 90.0 submissions per day, the median 95, and the IQR [70.5,108.5]. During ACTRIAL the mean is 209.1, median 204, and IQR [171,245]. The statistics do reflect the slight skew in the distribution, but as we see the mean and median are fairly close.
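As a minimal sketch of how summary statistics like these can be computed, the Python snippet below calculates the mean, median, and quartiles of a list of daily counts. The function name `summarize` and the example values are made up for illustration and are not the trial data; the "inclusive" quantile method is an assumption, chosen because it matches the default quantile definition in R.

```python
import statistics


def summarize(daily_counts):
    """Return the mean, median, and (Q1, Q3) quartiles of daily counts."""
    # method="inclusive" interpolates between observed values, which is
    # an assumption here; other quantile definitions give slightly
    # different quartiles on small samples.
    q1, _, q3 = statistics.quantiles(daily_counts, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(daily_counts),
        "median": statistics.median(daily_counts),
        "iqr": (q1, q3),
    }


# Hypothetical illustration values, NOT the actual submission counts.
example = [70, 85, 90, 95, 100, 110, 120]
print(summarize(example))
```

A mean that sits close to the median, as in the figures above, is one quick indication that the skew is mild.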

Given the large difference in means, it is unsurprising that a t-test finds a statistically significant difference: t=-17.8, df=69.9, p << 0.001. In other words, there is a significant increase in the number of AfC submissions per day compared to what we would expect to see.
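The non-integer degrees of freedom reported above are consistent with Welch's unequal-variance t-test, which is also the default in R's t.test. The sketch below shows how the test statistic and the Welch-Satterthwaite degrees of freedom are computed. It is an illustration under that assumption, not the code used for the analysis, and the two samples are made-up values.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va = statistics.variance(sample_a) / na
    vb = statistics.variance(sample_b) / nb
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical daily counts, NOT the actual data.
pre_trial = [80, 90, 100, 95, 85]
during_trial = [200, 210, 190, 220, 205]
t_stat, dof = welch_t(pre_trial, during_trial)
print(t_stat, dof)
```

With the pre-trial sample passed first, a negative t statistic means the pre-trial mean is lower than the mean during the trial, which matches the sign of the reported result.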

To further substantiate this, we also calculate the number of monthly submissions and train a forecasting model to predict the number of submissions. The number of submissions per month from July 2014 onwards is shown in the graph below:

The graph above shows that the number of AfC submissions has been fairly stable over time, at around 3,000 submissions per month. There's a dip in activity in mid-2016 and a large increase in mid-2017. Lastly, we see the number of submissions take off during ACTRIAL, where the number of submissions per month has more than doubled.

We remove the months where ACTRIAL is active (September, October, and November 2017) from the time series and calculate the autocorrelation and partial autocorrelation functions in order to understand the time series' periodicity and whether it is stationary:
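For reference, the sample autocorrelation function can be sketched in a few lines of Python. This is an illustrative implementation of the standard estimator, not the code behind the plots, and the function name `acf` and the input series are hypothetical. The partial autocorrelation function is not covered here.

```python
def acf(series, max_lag):
    """Return sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    # Lag-0 sum of squares; normalizing by it makes acf[0] equal 1.
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical series, NOT the monthly submission counts.
print(acf([1, 2, 3, 4, 5], 2))
```

For a monthly series, a pronounced spike at lag 12 would point to yearly seasonality, and autocorrelations that decay only slowly across lags are a common sign that the series is not stationary.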

The ACF and PACF plots suggest the time series is stationary, but an Augmented Dickey-Fuller test does not reject its null hypothesis that the series is non-stationary (has a unit root). This leads us to check models both with and without differencing when looking for candidates. The output from R's auto.arima function indicates that we get an improvement in log-likelihood with a model that includes differencing. We also check the performance of an ARIMA(1,0,1) model (as suggested by the ACF & PACF graphs above), but find no improvement over the one found through the automatic process. We therefore choose the model suggested by R, an ARIMA(0,1,0) model with a first-order integrated and autoregressive seasonal component with a 12-month period. The graph below shows the result of using that model to forecast the submissions for September, October, and November 2017 compared to the actual values:
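Reading the description above as ARIMA(0,1,0)(1,1,0) with a 12-month period, and assuming no drift term, the chosen model can be written out as follows. The symbols are introduced here for illustration: y_t is the number of submissions in month t, B is the backshift operator (B y_t = y_{t-1}), \Phi_1 is the seasonal autoregressive coefficient, and \varepsilon_t is white noise.

```latex
(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t = \varepsilon_t
```

In words: the series is differenced once month-to-month and once year-over-year, and the resulting values are modelled as depending on their own value twelve months earlier.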

The actual number of AfC submissions during ACTRIAL is much higher than what we would expect. The forecast does reflect the increase in submissions in mid-2017, but the observed increase during ACTRIAL is much larger than what the model predicts.

Combining the results from the t-test and the forecasting model, we find that H16 (the rate of new submissions at AfC will increase) is supported.