Research talk:Autoconfirmed article creation trial/Work log/2018-01-22
Monday, January 22, 2018
Today I'll continue the analysis of deletions in Main, then apply the same type of analysis to the User namespace, which should wrap up that analysis. I'll then start writing the code to gather quality predictions for the Draft namespace.
Main namespace deletions
Two parts of the analysis of deletions in the Main namespace remain: investigating changes in the reasons why pages get deleted, and studying the change in the rate of deletions. Let's look at the changes in reasons for deletion first.
Measuring changes in the reasons for deletions continues our work from Friday, January 19. That work log has a table of the total usage of the various approaches to deleting articles. Now we are interested in how the usage of these approaches has changed with ACTRIAL. To understand that, we compare usage during the first month and a half of ACTRIAL with usage during the same period of the year in each of the five years prior (2012–2016). As mentioned in Friday's work log, we chose those years because the total number of deletions in the Main namespace has been fairly stable across all of them. We then get the following changes in deletions:
|Reason||Deletions/day pre-ACTRIAL||Deletions/day ACTRIAL||Delta deletions/day||Delta deletions/day (%)||Proportion pre-ACTRIAL (%)||Proportion ACTRIAL (%)||Delta proportion (%)|
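As a minimal sketch of how the columns in the table above could be derived from per-reason deletion counts: the function name `reason_deltas`, the dictionary keys, and the example counts below are all hypothetical and are not taken from our dataset. The sketch also assumes that "Delta proportion (%)" is a difference in percentage points, which is an interpretation rather than something stated above.

```python
import math


def reason_deltas(pre_counts, trial_counts, pre_days, trial_days):
    """Compute per-reason rates, proportions, and deltas for two periods.

    pre_counts / trial_counts map a deletion reason to the total number of
    deletions in the period; pre_days / trial_days are the period lengths.
    """
    pre_total = sum(pre_counts.values())
    trial_total = sum(trial_counts.values())
    rows = {}
    for reason in sorted(set(pre_counts) | set(trial_counts)):
        pre_n = pre_counts.get(reason, 0)
        trial_n = trial_counts.get(reason, 0)
        pre_rate = pre_n / pre_days          # deletions/day pre-ACTRIAL
        trial_rate = trial_n / trial_days    # deletions/day during ACTRIAL
        delta_rate = trial_rate - pre_rate
        # The relative change is undefined if the reason was unused before.
        delta_rate_pct = 100 * delta_rate / pre_rate if pre_rate else math.nan
        pre_prop = 100 * pre_n / pre_total
        trial_prop = 100 * trial_n / trial_total
        rows[reason] = {
            "pre_per_day": pre_rate,
            "trial_per_day": trial_rate,
            "delta_per_day": delta_rate,
            "delta_per_day_pct": delta_rate_pct,
            "pre_prop_pct": pre_prop,
            "trial_prop_pct": trial_prop,
            # Assumption: difference in percentage points.
            "delta_prop_pct": trial_prop - pre_prop,
        }
    return rows


# Made-up counts, for illustration only.
example = reason_deltas(
    pre_counts={"speedy": 300, "PROD": 100},
    trial_counts={"speedy": 50, "PROD": 50},
    pre_days=10,
    trial_days=10,
)
for reason, row in example.items():
    print(reason, row)
```

Because the pre-ACTRIAL period pools five years, `pre_days` would be roughly five times `trial_days` in practice; dividing by the number of days is what makes the two rates comparable.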
Change in rate of deletions
We use the same dataset to compare ACTRIAL with the same time period in the years prior to it. Here we find a large drop in the number of deletions per day, from a median of 382 to a median of 163 (the geometric mean changes from 393.2 to 173.7), or about 220 fewer pages per day. This change is statistically significant (t=19.31, df=77.2, p << 0.001).
Most of this change comes from a decrease in the number of speedy deletions. If we remove PROD, AFD, and "other" from the dataset and make the same comparison, the median changes from 278 to 64, a drop of 214 pages per day. The geometric mean changes from 277.1 to 67.2, a drop of about 210 pages per day. This change is also statistically significant (t=32.27, df=71.1, p << 0.001).
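The comparisons above can be sketched in a few lines. The fractional degrees of freedom and the use of geometric means suggest a Welch (unequal variance) t-test on log-transformed daily counts, but that is an assumption on our part here, not something the numbers above confirm. The function names and the sample counts are hypothetical, and the sketch reports only t and the degrees of freedom, since the standard library has no t-distribution for the p-value.

```python
import math
import statistics


def geometric_mean(xs):
    """Geometric mean of strictly positive values."""
    return math.exp(statistics.fmean([math.log(x) for x in xs]))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variances divided by sample sizes (squared standard errors).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def compare_daily_counts(pre, trial):
    """Summarise two series of daily deletion counts (all counts > 0)."""
    # Assumption: the test is run on log counts, matching the geometric means.
    t, df = welch_t([math.log(x) for x in pre], [math.log(x) for x in trial])
    return {
        "median_pre": statistics.median(pre),
        "median_trial": statistics.median(trial),
        "geomean_pre": geometric_mean(pre),
        "geomean_trial": geometric_mean(trial),
        "t": t,
        "df": df,
    }


# Made-up daily counts, for illustration only.
pre_days = [410, 395, 372, 388, 401, 360, 379]
trial_days = [170, 158, 181, 149, 166, 175, 160]
print(compare_daily_counts(pre_days, trial_days))
```

The speedy-deletion comparison would use the same function after dropping the PROD, AFD, and "other" rows from the daily counts.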