Research talk:Autoconfirmed article creation trial/Work log/2018-02-20


Tuesday, February 20, 2018

Today I'll work on an updated analysis of H16, aim to tackle H17 since it uses the same dataset, and finalize our analysis of H20.

H16: The rate of new submissions at AfC will increase.

We did an initial round of data gathering and analysis for H16 in our February 1 work log. While working on H17, which concerns the size of the AfC backlog, I discovered that the AfC dataset did not track submissions reliably. Review timestamps were not captured correctly, making drafts appear to go unreviewed until they were either published or deleted. Using a sample of recently created pages with a fair number of submissions (recent pages were chosen so that they would still exist), I went through their edit histories to establish the actual submission history. Comparing that to what our code was capturing, I fixed bugs such as not storing the review timestamp, not capturing the reviewer, and handling submission status tags (the first parameter in the submission template) inconsistently. Once those were fixed, I deleted the existing datasets and gathered new data.
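For illustration, here is a minimal sketch (not the project's actual gathering code) of extracting the status tag from an {{AfC submission}} template in raw wikitext. The helper name and the regex approach are ours; real templates (nested templates, template-name redirects, comments) would need more careful parsing:

```r
# Pull the status tag out of an {{AfC submission|...}} template. The status
# is the first positional parameter; common values include "" for a pending
# submission and "d" for a declined one.
extract_afc_status <- function(wikitext) {
  # Match the template name plus everything up to the next "|" or "}"
  m <- regmatches(wikitext,
                  regexpr("\\{\\{AfC submission\\|?[^|}]*", wikitext))
  if (length(m) == 0) return(NA_character_)
  # Strip the template name and optional pipe, keep the status (may be empty)
  trimws(sub("^\\{\\{AfC submission\\|?", "", m))
}

extract_afc_status("{{AfC submission|d|u=Example|ts=20170901120000}}")
# [1] "d"
extract_afc_status("{{AfC submission||ts=20170901120000}}")
# [1] ""
```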

There is one potential issue with the current data gathering process: it does not check whether a submission template has been deleted. In other words, a contributor can make an AfC submission and later delete it. We suspect this problem is uncommon, partly because most AfC submissions are reviewed quickly (more about that below), and partly because it is not in the contributor's interest to delete the template. Improving this in a future version would be beneficial if we continue to analyze AfC.
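A hypothetical check for this gap could flag revisions where a submission template was present in the previous revision but has disappeared. The data frame `revs` and its columns here are illustrative, not the actual dataset schema:

```r
library(dplyr)

# Assume `revs` has one row per revision with columns page_id, rev_ts,
# and wikitext, in chronological order within each page.
removals <- revs %>%
  arrange(page_id, rev_ts) %>%
  group_by(page_id) %>%
  mutate(has_tpl = grepl("{{AfC submission", wikitext, fixed = TRUE),
         # TRUE when the template was in the previous revision but not this one
         tpl_removed = lag(has_tpl, default = FALSE) & !has_tpl) %>%
  ungroup() %>%
  filter(tpl_removed)
```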

Before using the dataset to analyze the rate of submissions, we wrote R code to process the submissions and determine when each one was subsequently reviewed. This collapses repeated submissions of the same page, correctly matches them with a review, and ignores instances of the AfC submission template that were never actually submitted for review. We start out with 149,921 instances of AfC submission templates on Drafts created between July 1, 2014 and November 11, 2017. After processing, we end up with 76,116 reviewed submissions. Of these, seven appear to have negative review times (the review timestamp precedes the submission timestamp). These might be due to errors in the submission and review timestamps (e.g. a time stored as 18:00 that should be 06:00). All but one of these pages have been deleted, making it difficult to go through their edit histories and fix the timestamps, so we choose to ignore these seven data points.
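A simplified sketch of that processing step follows; it assumes a data frame `afc` with one row per template instance and hypothetical columns page_id, sub_ts (submission timestamp), and review_ts (NA when the instance was never reviewed), which is not necessarily the actual schema:

```r
library(dplyr)

reviewed <- afc %>%
  # Drop instances that were never submitted or never reviewed
  filter(!is.na(sub_ts), !is.na(review_ts)) %>%
  # Re-submissions that share the same review collapse to the earliest one
  group_by(page_id, review_ts) %>%
  slice_min(sub_ts, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(days_to_review = as.numeric(difftime(review_ts, sub_ts,
                                              units = "days"))) %>%
  # Discard the handful of rows with negative review times
  filter(days_to_review >= 0)
```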

We create an updated plot of the number of submissions per day, resulting in the following graph:

Looking at the graph prior to ACTRIAL, we can see that it follows the general trends of Wikipedia activity: drops around the winter holidays and the new year, a reduction in the summer months, and what appears to be a dip each weekend. Lastly, we see a substantial increase in the number of submissions once ACTRIAL starts.
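A plot along these lines could be produced with ggplot2, building on the `reviewed` data frame from the earlier sketch; the dashed line marks the start of ACTRIAL (September 14, 2017):

```r
library(dplyr)
library(ggplot2)

# Count reviewed submissions per day
daily <- reviewed %>%
  mutate(day = as.Date(sub_ts)) %>%
  count(day, name = "submissions")

ggplot(daily, aes(day, submissions)) +
  geom_line() +
  geom_vline(xintercept = as.Date("2017-09-14"), linetype = "dashed") +
  labs(x = "Date", y = "AfC submissions per day")
```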

How significant is the increase during ACTRIAL? We compare the first two months of ACTRIAL with similar time periods in 2014, 2015, and 2016, the same approach we have used for several other hypotheses. We find some indications of skewness in the prior years and therefore give more weight to the non-parametric Mann-Whitney test. For the years prior to ACTRIAL, the mean number of submissions per day is 53.9 and the median is 57, indicating a slight left skew in the data. During ACTRIAL the mean is 137 and the median 134, reflecting a much less skewed distribution. Unsurprisingly, this large difference is statistically significant (t-test: t=-18.35, df=67.98, p << 0.001; Mann-Whitney U test: W=22.5, p << 0.001).
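In R, each comparison is a one-liner; here `prior_years` and `actrial` are assumed to be numeric vectors of daily submission counts for the matching windows (the vector names are ours):

```r
# Welch two-sample t-test (unequal variances, matching the reported df)
t.test(prior_years, actrial)

# Mann-Whitney U test, more robust to the skew in the prior years
wilcox.test(prior_years, actrial)
```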

We also want to investigate this from a time series perspective. To do so, we move from counting AfC submissions per day to counting them per month. The graph of the number of submissions per month through November 2017 looks like this:

The graph again shows the drastic increase in the number of submissions during ACTRIAL. To determine whether this increase is expected, we train forecasting models on the time series up until September 2017. First, we examine the stationarity and seasonality of the time series, finding that it is non-stationary and seasonal. This is also reflected in the model selected by R's auto.arima function, an ARIMA(1,0,0)(1,1,0)[12] that allows for drift. We decide to check alternative models and find that one which also integrates the non-seasonal component fits the training data much better (this model does not allow for drift). Using that model to forecast the first three months where ACTRIAL was active gives the following result:

We can see that the forecast follows the increase in July and August 2017 and suggests a small increase in October and November. The true values are much higher, far outside the 95% confidence interval. This supports the previous result of a significant increase in the number of submissions, meaning that H16 is supported.
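For reference, here is a minimal sketch of the time series workflow described above, using the forecast package and building on the `reviewed` data frame from earlier. The exact end of the training window and the non-seasonal order of the alternative model are assumptions on our part, since the log states only that the training data runs up until September 2017 and that the alternative model's non-seasonal component is integrated:

```r
library(dplyr)
library(lubridate)
library(forecast)

# Aggregate reviewed submissions to monthly counts
# (assumes every month from July 2014 onward has at least one submission)
monthly <- reviewed %>%
  mutate(month = floor_date(as.Date(sub_ts), "month")) %>%
  count(month, name = "submissions")

subs_ts <- ts(monthly$submissions, start = c(2014, 7), frequency = 12)

# Training window: assumed to end with August 2017, before ACTRIAL
train <- window(subs_ts, end = c(2017, 8))

# auto.arima selected ARIMA(1,0,0)(1,1,0)[12] with drift on this series
auto_fit <- auto.arima(train)

# Alternative model with an integrated non-seasonal component, no drift;
# the (1,1,0) non-seasonal order is illustrative
alt_fit <- Arima(train, order = c(1, 1, 0), seasonal = c(1, 1, 0))

# Forecast the first three months of ACTRIAL and plot against observations
fc <- forecast(alt_fit, h = 3)
autoplot(fc)
```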