Research talk:Autoconfirmed article creation trial/Work log/2018-02-06

From Meta, a Wikimedia project coordination wiki

Tuesday, February 6, 2018

Today I'll continue working on results for all our hypotheses.

H2: Proportion of newly registered accounts with non-zero edits in the first 30 days is reduced.

We made a preliminary analysis of historic data for H2 in our August 16 work log. At that point, ACTRIAL hadn't started, so we could draw no conclusions. Now that we have ACTRIAL data, we examine the first two months of the trial to see whether it has had an effect on participation.
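For readers who want to reproduce the H2 measure on their own data, a minimal Python sketch follows. It assumes a hypothetical `Account` record holding the registration time, an autocreated flag, and the account's edit timestamps. The work log does not describe the actual data pipeline, so this only illustrates the definition: an account counts if it made at least one edit within 30 days of registering.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Account:
    # Hypothetical record layout; the work log does not describe its schema.
    registered: datetime
    autocreated: bool
    edit_times: List[datetime] = field(default_factory=list)


def edited_within(account: Account, days: int = 30) -> bool:
    """True if the account made at least one edit within `days` of registering."""
    cutoff = account.registered + timedelta(days=days)
    return any(account.registered <= t < cutoff for t in account.edit_times)


def proportion_editing(accounts: List[Account], autocreated: bool) -> float:
    """Share of accounts of one type (autocreated or not) that edited in the window."""
    group = [a for a in accounts if a.autocreated == autocreated]
    if not group:
        return float("nan")
    return sum(edited_within(a) for a in group) / len(group)
```

Computing this per registration day gives a daily series; autocreated and non-autocreated accounts are kept apart because they behave very differently.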

First, let's revisit the plot of the proportion of accounts making edits, both historically and from 2016 onwards, to see whether ACTRIAL appears to have had an effect:

Looking at the historical data, we can see some large changes across time, but it is difficult to see any effect of ACTRIAL. There appears to be some reduction in the proportion for non-autocreated accounts in late 2017, but it is not obvious that it happened right around the start of ACTRIAL. The proportion of autocreated accounts editing appears to be completely unaffected by ACTRIAL. Let's focus on the last two years, add a vertical line that shows when ACTRIAL started, and plot the data up to November 15, our two-month cutoff:

In the more recent data, it appears clear that ACTRIAL has had no effect on the autocreated accounts, as their proportion shows little to no change. For the non-autocreated accounts, it looks like there was a drop at the start of ACTRIAL, but participation returns to "normal" levels shortly thereafter.

To further substantiate what has happened, we switch to monthly data and use forecasting models as we have done previously for other hypotheses. First, let's look at the historical plot of proportion of accounts editing:

The monthly plot makes the higher participation among non-autocreated accounts in the early years of the plot easier to spot. There is a shift downwards in the second half of 2012, after which the level stays fairly consistent at around 32.5%. As we mentioned in our preliminary analysis back in August, the SUL finalization project, as well as a call to register on the mobile app, greatly affected the proportion in 2014 and 2015. The proportion for autocreated accounts is generally stable, although perhaps trending slowly downwards overall. Lastly, it appears that ACTRIAL has had no effect on these proportions, because we see no significant changes in the most recent months. Our forecasting models should help determine whether that is the case.
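The switch from daily to monthly data could be done along the lines of the sketch below. It assumes hypothetical daily rows of `(day, registered, edited)` counts and pools the counts within each month before dividing, so days with many registrations carry more weight. Averaging the daily proportions instead is an equally plausible choice; the work log does not say which one was used.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_proportions(daily_rows: Iterable[Tuple[date, int, int]]) -> Dict[Tuple[int, int], float]:
    """Pool daily (day, registered, edited) counts into per-month proportions.

    Assumption: counts are summed within each month before dividing,
    so busier days weigh more than quiet ones.
    """
    registered = defaultdict(int)
    edited = defaultdict(int)
    for day, n_registered, n_edited in daily_rows:
        key = (day.year, day.month)
        registered[key] += n_registered
        edited[key] += n_edited
    # Months are returned in chronological order, skipping empty months.
    return {key: edited[key] / registered[key]
            for key in sorted(registered) if registered[key] > 0}
```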

We split the data into separate time series for autocreated and non-autocreated accounts, limit the data to the end of August 2017, and then build ARIMA forecasting models that we use to forecast the proportion for the first three months in which ACTRIAL was active (September, October, and November). Building a model for autocreated accounts was fairly straightforward due to the stability in the data. Examining the autocorrelation plots and taking seasonality into account, it was clear that a yearly cycle was present and that the model did not have to be very complex. Using R's auto.arima function, the best-fitting model was an ARIMA(1,0,1)(0,1,1)[12] model with a drift term. A drift term makes sense given the slowly decreasing proportion over time that we have seen in both the daily and monthly plots. Using this model, we get the following forecasting plot:

The plot above only shows the autocreated accounts, making the variation in the data more visible. In particular, we can see the increase in proportion in July and August, which the model then takes into account. The true proportion was lower, around 3.75%, but that is about where it was in 2016 and the first half of 2017. The true proportion is also within the 95% confidence interval of the forecast, suggesting that ACTRIAL has not significantly affected the proportion.
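Whether an observed month falls inside the forecast's 80% or 95% interval can be checked as in the sketch below. It assumes normally distributed forecast errors, with a point forecast and a standard error per month taken from the fitted model. The function names and inputs are hypothetical; the intervals in this work log were produced in R, not with this code.

```python
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple


def interval(point: float, se: float, level: float) -> Tuple[float, float]:
    """Two-sided normal prediction interval around a point forecast."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * se, point + z * se


def check_forecast(points: Sequence[float], ses: Sequence[float], actuals: Sequence[float],
                   levels: Sequence[float] = (0.80, 0.95)) -> List[Dict[float, bool]]:
    """For each forecast month, whether the actual value lies inside each interval."""
    results = []
    for point, se, actual in zip(points, ses, actuals):
        inside = {}
        for level in levels:
            lo, hi = interval(point, se, level)
            inside[level] = lo <= actual <= hi
        results.append(inside)
    return results
```

For example, with a hypothetical forecast of 0.040 and standard error 0.002, `check_forecast([0.040], [0.002], [0.037])` reports the value as outside the 80% interval but inside the 95% interval.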

Next, we apply the same approach to the non-autocreated accounts. Building a forecasting model for that data was slightly more complicated. The autocorrelation function plots revealed a peak at , but auto.arima selected a simple ARIMA(1,1,1) model. The residuals of that model showed the same peak, and the model failed a Ljung-Box test for remaining autocorrelation. We therefore switched to manually testing models to find ones that passed the Ljung-Box test, and then selected the one with the lowest BIC. This process resulted in an ARIMA(0,1,8)(0,1,1)[12] model. Using it to forecast the proportion during ACTRIAL results in the following plot:

The forecast plot shows that the actual values are what we would expect given the historic data. Based on the stability in the proportion in 2016 and 2017, this should not be much of a surprise.
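The model-selection procedure described for non-autocreated accounts (keep candidates whose residuals pass a Ljung-Box test, then take the lowest BIC) can be sketched in plain Python as below. This is an illustration, not the authors' R code: fitting the candidate models is not shown, the chi-squared tail uses a simple series expansion that is only adequate for moderate statistics, and `alpha = 0.05` is an assumed significance level. As in R's `Box.test`, `fitdf` reduces the degrees of freedom by the number of estimated ARMA coefficients.

```python
import math
from typing import Iterable, List, Optional, Tuple


def acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag, normalized as in R's acf()."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def chi2_sf(q: float, df: int) -> float:
    """P(X > q) for X ~ chi-squared(df), via the lower incomplete gamma series.

    Adequate for moderate q; a production version would use a library routine.
    """
    a, x = df / 2.0, q / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= x / (a + n)
        total += term
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def ljung_box(residuals: List[float], lags: int, fitdf: int = 0) -> Tuple[float, float]:
    """Ljung-Box Q statistic and its p-value on (lags - fitdf) degrees of freedom."""
    n = len(residuals)
    r = acf(residuals, lags)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf(q, lags - fitdf)


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_model(candidates: Iterable[Tuple[str, float, float]], alpha: float = 0.05) -> Optional[str]:
    """Name of the lowest-BIC candidate whose Ljung-Box p-value exceeds alpha.

    candidates: (name, bic_value, ljung_box_p_value) tuples; alpha is an assumed level.
    """
    passing = [c for c in candidates if c[2] > alpha]
    return min(passing, key=lambda c: c[1])[0] if passing else None
```

Filtering on the Ljung-Box test first means a model with a slightly worse BIC can win if the lower-BIC alternatives leave autocorrelation in their residuals, which is what happened with the auto.arima choice above.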

In conclusion, we find that H2 is not supported. The proportion of accounts making edits during the first 30 days after registering appears to be unaffected by ACTRIAL.

H3: Proportion of accounts reaching autoconfirmed status within the first 30 days since account creation is unchanged.

We made a preliminary analysis of historic data for H3 in our August 16 work log. In that analysis, we noticed that it made the most sense to measure the proportion of accounts that make at least one edit and reach autoconfirmed status, as otherwise we would largely just be measuring the proportion of accounts making edits. We also found that the proportion has been fairly stable over time.
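The H3 measure can be sketched as follows. The thresholds are an assumption not stated in this log: English Wikipedia grants autoconfirmed status once an account is at least four days old and has made at least ten edits, which fits the four-day median discussed under H4. The sketch also assumes, as one reading of the restriction above, that the denominator is accounts with at least one edit in their first 30 days. Input records are hypothetical `(registered, edit_times)` pairs.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumption: English Wikipedia's thresholds (account at least 4 days old
# and at least 10 edits); this work log does not restate them.
MIN_AGE = timedelta(days=4)
MIN_EDITS = 10
WINDOW = timedelta(days=30)


def autoconfirmed_at(registered: datetime, edit_times: List[datetime]) -> Optional[datetime]:
    """Moment both thresholds are met, or None if the account has too few edits."""
    edits = sorted(edit_times)
    if len(edits) < MIN_EDITS:
        return None
    # Status arrives at the later of the 4-day mark and the 10th edit.
    return max(registered + MIN_AGE, edits[MIN_EDITS - 1])


def proportion_autoconfirmed(accounts: Iterable[Tuple[datetime, List[datetime]]]) -> float:
    """Among accounts with >= 1 edit in their first 30 days, the share that
    reach autoconfirmed status within those 30 days (assumed denominator)."""
    editors = reached = 0
    for registered, edit_times in accounts:
        in_window = [t for t in edit_times if t < registered + WINDOW]
        if not in_window:
            continue
        editors += 1
        # Only in-window edits count, and the 4-day mark is inside the window,
        # so any non-None result here falls within the first 30 days.
        if autoconfirmed_at(registered, in_window) is not None:
            reached += 1
    return reached / editors if editors else float("nan")
```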

As for H2, we update our historical plot with recent data to see whether there is an inflection point at the start of ACTRIAL. The plot going back to 2009 looks as follows:

In this plot there appears to be an increase in the proportion for non-autocreated accounts around the start of ACTRIAL, but it is not clear-cut. Let's focus on the last two years and add a vertical line for ACTRIAL:

Looking at the trend line for non-autocreated accounts, it appears that there was a similar increase in the proportion in September 2016, suggesting that the increase in 2017 is not related to ACTRIAL. For autocreated accounts, it appears that the proportion tends to increase when Wikipedia activity decreases, e.g. over the summer months and the winter holidays. As in the previous analysis, we move to measuring the data on a monthly basis and see whether ACTRIAL appears to have made a difference:

Based on the monthly plot, the proportion for autocreated accounts appears fairly stable over time, while there is more variation for non-autocreated accounts. A further analysis of the time series for autocreated accounts finds that it is stationary and has a yearly seasonal component, meaning it can be modeled by an ARIMA(0,0,0)(0,1,1)[12] model. R's auto.arima function does not suggest this model and instead prefers a non-seasonal model, but the non-seasonal model fails standard diagnostic tests such as the Ljung-Box test. Using the seasonal model to forecast the first three months in which ACTRIAL was active gives the following graph:

In the graph above, we can see that the actual data follow a downward trend after the summer months, while the forecast instead follows the increasing trend from previous years. That being said, the forecast and actual values are close to each other, and the actual values are within the 80% confidence interval of the forecast. This suggests that ACTRIAL has not had any effect on the autocreated accounts.
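To make the ARIMA(0,0,0)(0,1,1)[12] specification concrete, the sketch below fits it in plain Python. The model says that a month's value equals the same month's value one year earlier, plus a new shock, plus a fraction `theta` of last year's shock. As a simplification, it estimates `theta` by grid search on the conditional sum of squares rather than by the maximum-likelihood fit R's `Arima` performs by default, so its estimates and intervals will not exactly match the graph discussed above.

```python
from statistics import NormalDist
from typing import List, Tuple

SEASON = 12  # monthly data with a yearly cycle


def residuals(y: List[float], theta: float) -> List[float]:
    """Conditional residuals of (1 - B^12) y_t = (1 + theta B^12) e_t.

    Shocks before the first full season are taken to be zero.
    """
    e = [0.0] * len(y)
    for t in range(SEASON, len(y)):
        e[t] = (y[t] - y[t - SEASON]) - theta * e[t - SEASON]
    return e


def fit_theta(y: List[float]) -> float:
    """Grid search for theta in (-1, 1) minimizing the conditional sum of squares."""
    grid = [i / 100 for i in range(-99, 100)]
    return min(grid, key=lambda th: sum(v * v for v in residuals(y, th)[SEASON:]))


def forecast(y: List[float], horizon: int = 3,
             level: float = 0.95) -> Tuple[float, List[Tuple[float, float, float]]]:
    """theta and (point, lower, upper) for each of the next `horizon` months."""
    if not 1 <= horizon <= SEASON:
        raise ValueError("this simple variance formula holds only for 1..12 steps ahead")
    if len(y) < 2 * SEASON:
        raise ValueError("need at least two years of monthly data")
    theta = fit_theta(y)
    e = residuals(y, theta)
    used = e[SEASON:]
    sigma = (sum(v * v for v in used) / len(used)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    out = []
    for h in range(1, horizon + 1):
        base = len(y) - SEASON + h - 1  # same calendar month one year earlier
        point = y[base] + theta * e[base]
        # Up to 12 steps ahead, the only unknown term is the new shock, so the
        # interval half-width is z * sigma for every month.
        out.append((point, point - z * sigma, point + z * sigma))
    return theta, out
```

Because the forecast for September is anchored on last September plus a correction, a model like this carries last year's seasonal pattern forward, which is consistent with the forecast above following the increasing trend from previous years.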

We next analyzed the proportion of non-autocreated accounts that make at least one edit and reach autoconfirmed status within 30 days. In this case, the time series is more complex and shows differing yearly cycles; for example, there is an increase in the second half of 2015 and 2016 that is not found in 2013 and 2014. As a result, R's auto.arima function again selects a problematic model, and finding a good model for this data turns out to be difficult. Through systematic tests, we find that an ARIMA(1,1,1)(1,1,1)[12] model appears to provide reasonably good results. There might be a better model, but finding it is outside the scope of this project. Using this model to forecast non-autocreated accounts during ACTRIAL gives the following graph:

We can see in the graph above that the actual values are outside the 80% confidence interval, and in September also outside the 95% confidence interval. This would suggest that ACTRIAL has had an effect. However, the trends from 2015 and 2016 show a similar increase in the fall of those years, indicating that the increase in 2017 is not caused by ACTRIAL.
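The comparison with earlier years used in this argument can be made explicit by measuring, for each year, how far the proportion moved from August into the fall months. The sketch below assumes a hypothetical dictionary of monthly proportions keyed by `(year, month)`. A 2017 change of similar size to those of 2015 and 2016 would support reading the fall increase as seasonal rather than as an effect of ACTRIAL.

```python
from typing import Dict, Iterable, Tuple


def fall_changes(monthly: Dict[Tuple[int, int], float], years: Iterable[int],
                 base_month: int = 8,
                 months: Tuple[int, ...] = (9, 10, 11)) -> Dict[int, Dict[int, float]]:
    """Per year, the change in a monthly proportion from August to each fall month.

    Years missing any of the needed months are skipped.
    """
    out = {}
    needed = (base_month,) + tuple(months)
    for year in years:
        if not all((year, m) in monthly for m in needed):
            continue
        base = monthly[(year, base_month)]
        out[year] = {m: monthly[(year, m)] - base for m in months}
    return out
```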

In conclusion, we find that H3 is supported. During ACTRIAL, the proportion of accounts making edits and reaching autoconfirmed status within 30 days after registration is unchanged.

H4: The median time to reach autoconfirmed status within the first 30 days is unchanged.

Our initial analysis of historical data for H4 was done in our August 17 work log. In the plots there, we found that accounts that reach autoconfirmed status typically do so in four days. Updating the plots with recent data shows that this continues to be the case. Below is an updated historical graph:

For autocreated accounts, there are days where the median time to autoconfirmed status is well above four days. This might be driven by a low number of accounts reaching the threshold on a given day. We can also see that for non-autocreated accounts, the median is generally stable at four days. In the graph, there appear to be days around the start of ACTRIAL where the median is above four days, so we make another graph looking at just 2016 and 2017, adding a vertical line at the start of ACTRIAL to make it easy to spot:

There appear to be some days in the 2016/17 plot with a median above four days after the start of ACTRIAL, but there is no consistent trend. To investigate, we compare the first two months of ACTRIAL with the same time of year in 2014, 2015, and 2016. As there appears to be some difference between autocreated and non-autocreated accounts, we split the dataset into two. We also look at the mean and third quartile to understand whether there are shifts during ACTRIAL:

Table: Summary statistics of days to autoconfirmed status.

Account type      Period        Median  Mean   Third quartile
Autocreated       Pre-ACTRIAL   4       8.30   10.66
Autocreated       ACTRIAL       4       8.41   10.96
Non-autocreated   Pre-ACTRIAL   4       7.67    8.18
Non-autocreated   ACTRIAL       4       8.02    9.26

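As we can see, the median time to autoconfirmed status is four days both before and during ACTRIAL. The mean and third quartile are slightly higher during ACTRIAL, most noticeably for non-autocreated accounts, but the unchanged median indicates that H4 is supported.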
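The summary statistics in the table can be computed along the lines of the sketch below. The comparison window (September 15 to November 15 in each year) is an assumption based on the November 15 two-month cutoff mentioned under H2, and the `(registered_date, autocreated, days_to_autoconfirmed)` record layout is hypothetical. The work log does not say how its quartiles were computed; `method="inclusive"` in Python's `statistics.quantiles` interpolates the same way as R's default quantile definition (type 7).

```python
from datetime import date
from statistics import mean, median, quantiles
from typing import Dict, Iterable, Optional, Tuple

# Assumption: compare mid-September to the November 15 cutoff in every year.
WINDOW_START = (9, 15)  # (month, day)
WINDOW_END = (11, 15)
PERIODS = {2014: "Pre-ACTRIAL", 2015: "Pre-ACTRIAL", 2016: "Pre-ACTRIAL", 2017: "ACTRIAL"}


def period_of(registered: date) -> Optional[str]:
    """Comparison period for a registration date, or None if outside every window."""
    if not WINDOW_START <= (registered.month, registered.day) <= WINDOW_END:
        return None
    return PERIODS.get(registered.year)


def summarize(records: Iterable[Tuple[date, bool, float]]) -> Dict[Tuple[str, str], Tuple[float, float, float]]:
    """(median, mean, third quartile) of days to autoconfirmed per (account type, period)."""
    groups: Dict[Tuple[str, str], list] = {}
    for registered, autocreated, days in records:
        period = period_of(registered)
        if period is None:
            continue
        kind = "Autocreated" if autocreated else "Non-autocreated"
        groups.setdefault((kind, period), []).append(days)
    summary = {}
    for key, values in groups.items():
        # "inclusive" interpolates like R's default quantile (type 7);
        # before Python 3.13 it needs at least two values per group.
        q3 = quantiles(values, n=4, method="inclusive")[2]
        summary[key] = (median(values), mean(values), q3)
    return summary
```

Reporting the mean and third quartile next to the median shows whether the slower tail of accounts shifts even when the typical account does not.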