Research talk:Autoconfirmed article creation trial/Work log/2018-02-21

From Meta, a Wikimedia project coordination wiki

Wednesday, February 21, 2018

Today I'll work first on H17, then check our analysis of H18 and H19 after improving our data gathering (ref this discussion on our talk page).

H17: The backlog of articles in the AfC queue will increase faster than expected.

We can use our dataset of AfC reviews to estimate the size of the queue of pending reviews at AfC. We choose to estimate this once a day, at midnight, so a page that is submitted on one day and reviewed the next is counted as pending at the midnight between those two days. This is equivalent to checking the size of the queue every midnight. Our calculation assumes that the queue is empty at the start, and we sanity check this assumption by comparing our results against the best source of data we have been able to find, User:Enterprisey's dataset of the size of the AfC pending submissions category.
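A minimal Python sketch of the midnight estimate, under stated assumptions: each submission is a hypothetical `(submitted, reviewed)` pair of datetimes, `reviewed` is `None` while a page is still pending, a review never precedes its submission, and, as in our calculation, the queue is empty before the first submission in the data. The function name `midnight_queue_sizes` is illustrative, not the code we ran.

```python
from bisect import bisect_left
from datetime import date, datetime, timedelta


def midnight_queue_sizes(submissions, start, end):
    """Estimate the number of pending AfC submissions at midnight for each day.

    submissions: (submitted, reviewed) datetime pairs; reviewed is None while
    the page is still pending. Assumes reviewed >= submitted and an empty
    queue before the first submission in the data.
    """
    submitted = sorted(s for s, _ in submissions)
    reviewed = sorted(r for _, r in submissions if r is not None)
    sizes = {}
    day = start
    while day <= end:
        midnight = datetime(day.year, day.month, day.day)
        # Pending at midnight = submitted before midnight - reviewed before midnight.
        sizes[day] = bisect_left(submitted, midnight) - bisect_left(reviewed, midnight)
        day += timedelta(days=1)
    return sizes


subs = [
    (datetime(2017, 9, 1, 10), datetime(2017, 9, 2, 9)),   # reviewed the next day
    (datetime(2017, 9, 1, 12), datetime(2017, 9, 1, 18)),  # reviewed the same day
    (datetime(2017, 9, 2, 8), None),                       # still pending
]
sizes = midnight_queue_sizes(subs, date(2017, 9, 1), date(2017, 9, 3))
print(sizes)
```

Here the first page is counted only at the midnight starting September 2, and the second is never counted because it was reviewed before any midnight passed. Counting with two sorted lists and `bisect` avoids rescanning every submission for every day, which matters when the range covers several years.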

First, we calculate our estimate and plot it from July 1, 2014 through the first two months of ACTRIAL:

Based on the graph above, the size of the queue appears to vary greatly. There is a large drop from late 2014 into the first quarter of 2015, where the queue appears to shrink by about 75%. It then grows again until mid-2015 before dropping by about 75% once more. The second half of 2015 is less variable, although the queue grows somewhat during the busier fall months (e.g. August, September, and October). In 2016 and 2017, the queue often grows and then drops sharply. Lastly, we see sharp increases in the second half of 2017.

Before we move forward, we want to know whether our estimate is in the ballpark of the actual size of the queue. To check, we compare our estimate against the second dataset mentioned earlier, which we captured in mid-December 2017, before the data was erased. Because the data was erased due to a bug with category sizes, we expect it to be inaccurate for large values. In addition, we know our estimate does not capture AfC submissions from the User namespace, so there should always be some difference between the two datasets. We are therefore more interested in whether we capture growth and reduction relatively well than in whether we have the right value at a specific time. Plotting the two datasets together gives the following graph:

Given these observations, we do not expect the two datasets to match exactly, and the graph confirms that they do not. However, our estimate follows the general trend in the other dataset well, which suggests that our calculations can tell us how the AfC queue is behaving during ACTRIAL.
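One hypothetical way to quantify "follows the general trend", which this log does not report using, is to count how often the two series move in the same direction from one data point to the next. The sketch assumes both series are lists of queue sizes aligned on the same dates; the name `direction_agreement` and the toy numbers are illustrative only.

```python
def direction_agreement(ours, theirs):
    """Share of step-to-step changes where both series move the same way (up, down, or flat)."""
    if len(ours) != len(theirs) or len(ours) < 2:
        raise ValueError("need two aligned series with at least two points each")

    def sign(x):
        return (x > 0) - (x < 0)

    steps = len(ours) - 1
    agree = sum(
        1
        for i in range(steps)
        if sign(ours[i + 1] - ours[i]) == sign(theirs[i + 1] - theirs[i])
    )
    return agree / steps


# Toy numbers, not real AfC data.
ours = [100, 120, 110, 130]
theirs = [150, 175, 160, 158]
print(f"{direction_agreement(ours, theirs):.2f}")  # prints 0.67
```

A measure like this ignores the constant offset caused by the missing User namespace submissions, which is the property we care about here.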

Next, let's look at 2016 and 2017, with a dotted line showing when ACTRIAL started:

In this more focused plot, we see general growth in 2016, followed by a substantial reduction in December and January. The queue grows a lot in the first half of 2017 and then decreases very sharply around June/July. Another large growth period follows before the start of ACTRIAL, and yet another occurs during ACTRIAL. The question is whether this growth during ACTRIAL is larger than expected, as that is what H17 is concerned with.

To measure the growth, we first calculate the slope of the trend in the queue, using the same approach as we did for the New Page Patrol queue in our February 16 work log. The slope is calculated over 7-day periods, and when plotted looks like this:

When the slope is positive, the queue grew over the 7-day period for which it was calculated; when it is negative, the queue shrank. We can see periods of large decreases in the graph, and the one in June/July 2017 particularly stands out. We can also see that growth after ACTRIAL started has been high, at times higher than in any previous period.
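A minimal Python sketch of a 7-day slope, assuming it is an ordinary least-squares fit of queue size against day number within each trailing 7-day window. The February 16 approach may differ in detail (for example, taking the difference between the last and first day of the window and dividing by the number of days between them), and `rolling_slope` is a hypothetical helper, not our actual code.

```python
def rolling_slope(values, window=7):
    """Least-squares slope (change in queue size per day) for each trailing window.

    values: daily queue sizes in date order. Returns one slope per complete
    window, so the first slope belongs to day index window - 1.
    """
    xs = range(window)
    x_mean = (window - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slopes = []
    for end in range(window, len(values) + 1):
        ys = values[end - window:end]
        y_mean = sum(ys) / window
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)
    return slopes


# A queue growing by exactly 3 submissions per day yields a slope of 3.0.
print(rolling_slope([10 + 3 * i for i in range(10)]))  # prints [3.0, 3.0, 3.0, 3.0]
```

A least-squares fit uses all seven days in the window, so a single unusual day moves the slope less than it would with a first-to-last difference.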

Looking at the graph, it appears fairly clear that ACTRIAL differs from the same period in earlier years. To further substantiate that, we calculate the average slope for the first two months of ACTRIAL and compare it to the same time period in 2015 and 2016. In previous years the queue decreased on average, with an average slope of -1.6 submissions per day. During ACTRIAL, we instead find a positive average slope of 11.9 submissions per day. This difference is statistically significant (t-test: t=-8.63, df=111.96, p << 0.001), which suggests that H17 is supported.
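The fractional degrees of freedom (df=111.96) suggest Welch's unequal-variance t-test, the default of R's `t.test`; we assume that variant in this sketch, which computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. With the earlier years' slopes as the first sample and the ACTRIAL slopes as the second, a negative t matches the sign reported above. For a p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test. The helper name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for samples a and b."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy samples, not the actual 7-day slopes.
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
print(f"t={t:.2f}, df={df:.2f}")  # prints t=-2.25, df=5.52
```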

Furthermore, we note the rapid increase in the queue between July 2017 and the start of ACTRIAL, and compare the two periods. Prior to ACTRIAL, the smallest queue size was 314 submissions and the largest was 944, a difference of 630, and this increase took 57 days. During ACTRIAL, the smallest queue size was 846 submissions and the peak was 1628, a difference of 782, which occurred over 48 days. The average increase for the two periods was therefore 11.1 and 16.3 submissions per day, respectively. Because the two periods fall at different times of the year we do not compare them directly, but the increase during ACTRIAL does appear atypical. However, we should also take into account that the number of submissions per day has gone up drastically during the trial (ref our analysis of H16). Given the increase in submissions, and perhaps especially the growth just prior to ACTRIAL, is the increase during ACTRIAL larger than we would expect? Further analysis could look into this to provide a better estimate of what "expected" means. In the meantime, the approach we have used finds that H17 is supported.
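The per-day averages follow from dividing each period's increase by its length. A quick check of the arithmetic, using only the figures reported above (`increase_per_day` is an illustrative helper):

```python
def increase_per_day(smallest, largest, days):
    """Average growth of the queue, in submissions per day."""
    return (largest - smallest) / days


before = increase_per_day(314, 944, 57)   # 630 / 57, about 11.05
during = increase_per_day(846, 1628, 48)  # 782 / 48, about 16.29
print(f"{before:.1f} {during:.1f} ratio {during / before:.2f}")  # prints 11.1 16.3 ratio 1.47
```

By this measure the queue grew roughly one and a half times as fast per day during ACTRIAL as in the period just before it, without adjusting for the higher number of submissions.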

Revisiting H18 and H19

After rerunning our data gathering earlier today, I exported a new dataset from our database on Toolforge. There appear to be no significant changes in the per-reason counts for CSDs, but there are changes for PROD, AfD, and other. These changes are most likely due to the improved capturing of redirects. There are also no significant changes to the general conclusions (e.g. deletions in Main have decreased, while those in Draft have increased). I'll go through and document the new data tomorrow, and reference it from the research page instead of the previous work logs.