Research talk:Autoconfirmed article creation trial/Work log/2018-02-08

Thursday, February 8, 2018

Today I'll continue working on wrapping up analysis and getting results listed on the research page.

H6: The diversity of participation done by accounts that reach autoconfirmed status in the first 30 days is unchanged.

The preliminary analysis of historical data for H6 is found in the August 17 work log. In that analysis, we discovered that it was beneficial to examine accounts with at least one edit as otherwise we are reexamining H2. However, we note that H6 only concerns autoconfirmed accounts, meaning they will have to have done at least ten edits in the first 30 days. Secondly, it again makes sense to split the analysis by type of account, so we have one analysis for autocreated accounts, and one for non-autocreated accounts. We utilize two types of measures of diversity: number of namespaces edited, and number of pages edited.
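The two diversity measures can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: the input layout (a registration lookup plus `(user, timestamp, namespace, page_id)` edit tuples) and the name `diversity_measures` are assumptions made for the example, and only the ten-edit threshold mentioned above is applied.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)
MIN_EDITS = 10  # edit-count threshold named in the text


def diversity_measures(registrations, edits):
    """Return {user: (namespaces_edited, pages_edited)} for qualifying accounts.

    registrations: {user: registration datetime}  (hypothetical layout)
    edits: iterable of (user, timestamp, namespace, page_id) tuples
    """
    counts = defaultdict(int)
    namespaces = defaultdict(set)
    pages = defaultdict(set)
    for user, ts, ns, page_id in edits:
        registered = registrations.get(user)
        if registered is None or not (registered <= ts < registered + WINDOW):
            continue  # ignore edits outside the first 30 days
        counts[user] += 1
        namespaces[user].add(ns)
        pages[user].add(page_id)
    # Keep only accounts that made at least MIN_EDITS edits in the window
    return {
        user: (len(namespaces[user]), len(pages[user]))
        for user, n in counts.items()
        if n >= MIN_EDITS
    }
```

Splitting the result by account type would then only require a second lookup that says whether each user was autocreated.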

We first updated the historical graph for number of namespaces edited in the first 30 days so it only shows autoconfirmed accounts:

Looking at the second half of 2017, it appears that autocreated accounts behave roughly as in previous years. When it comes to non-autocreated accounts, there's an increase in activity around September, something we have also seen in other parts of our analysis. Focusing in on the most recent two years and adding trend lines makes this perhaps easier to see:

In this plot the variation in participation for non-autocreated accounts is more apparent. We can also see that the level of activity for autocreated accounts is more stable.

Secondly, we measure the mean number of pages edited in the first 30 days. Here we use a geometric mean due to skewness in the data. Otherwise we would be greatly affected by highly active accounts. We again create both a historical graph with data from 2009, and one of the two most recent years. First the historical graph:
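To illustrate why the geometric mean is the safer summary for skewed counts, here is a small sketch. The page counts are made up for the example and are not taken from the ACTRIAL data; the helper name `geometric_mean` is likewise just for illustration.

```python
import math
from statistics import mean


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed via logs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(mean(math.log(v) for v in values))


# Made-up page counts: most accounts edit a few pages, one edits very many.
pages_edited = [2, 3, 3, 4, 5, 6, 8, 2000]
arithmetic = mean(pages_edited)            # pulled far upwards by the outlier
geometric = geometric_mean(pages_edited)   # stays close to the typical account
```

With a single very active account in the sample, the arithmetic mean ends up in the hundreds while the geometric mean stays in single digits, which is the effect described above.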

There again appears to be some increased variation for non-autocreated accounts, but it is less apparent than when looking at number of namespaces. That might be due to the larger variation in autocreated accounts. Secondly, we can see that non-autocreated accounts tended to be more diverse in 2009–2013, whereas in the more recent years their diversity has settled down slightly above autocreated accounts. This is also seen in the graph of recent years:

Over the past two years, we see that both autocreated and non-autocreated accounts appear to have stable geometric means around 5 pages. There is again more variation in the autocreated accounts, which can be attributed to the low number of accounts reaching autoconfirmed status.

As we have done for some of the other hypotheses, we start by comparing the first two months of ACTRIAL to similar time periods in previous years. Based on the historical graph for the number of namespaces edited in the first 30 days, it appears that the variation in the second half of the year is larger in more recent years than it has been historically. This leads us to use the years 2014, 2015, and 2016 as comparison years for ACTRIAL. Note that this kind of comparison treats the first two months of ACTRIAL as a whole and compares them with similarly long time periods prior to ACTRIAL.

For autocreated accounts, we find that the average number of namespaces edited has increased from 2.04 to 2.28. This is a significant increase (t-test: t=-4.03, df=925.95, p << 0.001; Mann-Whitney U test: W=634440, p < 0.001). For non-autocreated accounts, we find that the average has increased from 2.31 to 2.56, which again is a significant increase (t-test: t=-15.91, df=17317, p << 0.001; Mann-Whitney U test: W=156080000, p << 0.001).
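The fractional degrees of freedom reported above are consistent with Welch's unequal-variance t-test, which is the default in R's `t.test`; that this variant was used is an assumption on our part. A minimal sketch of the statistic and its degrees of freedom follows (the p-value is left out because Python's standard library has no t distribution; the function name `welch_t` is illustrative):

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test of a vs. b."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = variance(sample_a) / n_a  # squared standard error of each mean
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation; usually not an integer
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df
```

Passing the pre-ACTRIAL sample first and the ACTRIAL sample second gives a negative t when the mean has increased, matching the sign of the statistics reported above.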

We run similar tests for the number of pages edited. Here we again find a significant change for autocreated accounts. The geometric mean number of pages edited pre-ACTRIAL is 4.18, during ACTRIAL it is 4.76. Both of our tests indicate that this is significant (t-test: t=-2.69, df=968.48, p=0.007; Mann-Whitney U test: W=655560, p=0.011). For non-autocreated accounts we find an increase in the geometric mean from 4.52 to 5.00 pages. Again both tests find significant differences (t-test: t=-9.33, df=17993, p << 0.001; Mann-Whitney U test: W=161210000, p << 0.001).
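The Mann-Whitney U test compares ranks rather than means, which makes it a useful companion to the t-test for skewed count data. The sketch below computes only the statistic, in the convention R's `wilcox.test` reports as W (rank sum of the first sample minus its minimum possible value), with tied values sharing their average rank. The function name is illustrative, and no p-value is computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the U statistic for sample_a, using midranks for ties."""
    pooled = sorted(
        [(v, 0) for v in sample_a] + [(v, 1) for v in sample_b],
        key=lambda pair: pair[0],
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a = len(sample_a)
    return rank_sum_a - n_a * (n_a + 1) / 2
```

Equivalently, U counts the pairs in which the value from the first sample exceeds the value from the second, with ties counting one half.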

As for other hypotheses, we also create models for forecasting monthly data. First, let's look at the monthly data:

In the monthly data, there appears to be a slow upwards trend in the more recent years, suggesting that the increases we've seen in the previous analysis are something we would expect if the trend continued. This is something the forecasting models might pick up.

We first model the time series of monthly data for autocreated accounts, measuring the average number of pages edited. Our analysis finds that the time series does not appear to be seasonal, in other words there is not a readily apparent yearly cycle. The time series is definitely not stationary as-is, but the first differences are. We use R's auto.arima function to train a model, and test competing models as well. The ACF and PACF of the first difference of the time series suggest an ARIMA(2,1,0), while R's automatic process results in an ARIMA(0,1,1)(1,0,1)[12] model. While the seasonal model is more complicated, its statistics are much better than those of a non-seasonal model, and we therefore choose to use the seasonal model. Forecasting for the first three months of ACTRIAL gives the following graph:
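The differencing and ACF steps mentioned above can be sketched without any modelling library. This is only an illustration of the diagnostics, not a replacement for R's `auto.arima`; the series is made up (a straight-line trend plus an alternating wobble) and the function names are illustrative.

```python
from statistics import mean


def difference(series, lag=1):
    """Lagged differences: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    centre = mean(series)
    deviations = [x - centre for x in series]
    denominator = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t - k] for t in range(k, len(series)))
        / denominator
        for k in range(max_lag + 1)
    ]


# Made-up trending series: a straight line plus an alternating wobble
monthly = [4.0 + 0.05 * t + (0.2 if t % 2 else -0.2) for t in range(48)]
raw_acf = acf(monthly, 3)                # dominated by the trend
diff_acf = acf(difference(monthly), 3)   # trend removed, wobble remains
```

On the raw series the trend keeps the low-lag autocorrelations high, which is the usual sign of non-stationarity; after first differencing the trend is gone and only the short-term structure remains. Calling `difference(monthly, lag=12)` would give the seasonal difference for monthly data.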

In the graph above, we can see that the forecast is around 5 pages, but with a wide confidence interval. There is a large amount of variance in the first half of 2017, and the graph shows a peak in August of almost 6 pages. This continues in September, before we have a drop down towards 5 pages. The latter can be seen as a regression towards the mean. All three months are within the 95% confidence interval, suggesting that the increase we have seen during ACTRIAL is not unexpected.

We then also examine this for non-autocreated accounts. The time series for that data is found to be both non-stationary and seasonal. Examining the ACF and PACF suggests AR(3) or MA(3) in the non-seasonal part of the model. We again use both R's auto.arima function as well as manually built models, finding that the auto-generated ARIMA(0,1,2)(1,0,1)[12] model shows the best performance in the training set. Using it to forecast the first three months of ACTRIAL gives the following graph:

This graph is somewhat similar to the forecast we had of survival of non-autocreated accounts in our February 7 work log, although in this case we see a regression towards the mean in October and November. Looking at the same time period in 2015 and 2016 shows similar trends of peaks in the second half. If we train a model on just the last two years, the true value ends up being within the 95% confidence interval of the forecast, again suggesting that the increase during ACTRIAL is not unexpected.

In conclusion, we see evidence that H6 is supported. As we saw for H5, we note that the activity levels of contributors who registered in September 2017 appear to be higher than what we have seen in previous years, and further work would be needed to determine the underlying causes.