Research talk:Autoconfirmed article creation trial/Work log/2018-01-18
Thursday, January 18, 2018
Today I'll present an update to the Research Group, continue analyzing the deletion data for the Main and User namespaces, and work on getting a dataset of article quality for Draft and AfC.
Main namespace deletions
I approach the analysis of the Main namespace deletions in our dataset slightly differently from how I handled Draft deletions yesterday. First, I'll look at the overall level of deletions across time. Then I'll look into specific reasons and plot those. In both cases I'll look at the general history before focusing on the period around ACTRIAL.
We plot the number of deletions per day in the Main namespace from Jan 1, 2009 to July 1, 2017, adding a LOESS-smoothed line to show the trends as we've done in previous plots:
In the graph above, we can see the decline in activity until the second half of 2012. This decline corresponds with the similar decline in creations seen in our Sept 4 work log. From then on the activity is fairly stable, at what appears to be around 500 deletions per day. There are some peaks, which are likely different types of cleanup processes (e.g. redirects). Lastly, we see a dip in late 2017, which corresponds with the introduction of ACTRIAL. Let's look more closely at the data from January 1, 2017 onwards:
Here we can see more clearly the drop in activity per day when ACTRIAL starts. There is a peak of deletions towards the end of September: out of 5,300 deletions that day, 5,000 are cleanup of redirects. We also see an upward trend at the start of the new year. We do not find a corresponding uptick in page creations in the Main namespace in our dashboard. Since we know that NPP is doing a backlog drive during this time, it would be reasonable to conclude that the increase in deletions is related to that activity.
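The log does not say which tool produced the LOESS-smoothed line in the plots above. As a rough illustration of the idea only, here is a minimal pure-Python sketch of degree-1 LOESS with tricube weights. The function name `loess`, the `frac` parameter, and the sample counts are all assumptions made for this example, not the code or data behind the actual plots.

```python
def loess(xs, ys, frac=0.3):
    """Degree-1 LOESS: a weighted linear fit around each point (sketch only)."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # neighbours used in each local fit
    smoothed = []
    for x0 in xs:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        # Tricube weights: points at or beyond the bandwidth get weight 0.
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # Weighted least-squares line through the neighbourhood.
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx else 0.0
        # Evaluate the local line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Synthetic daily counts (made up for illustration, not the real deletion data).
days = list(range(30))
counts = [500 + (60 if d % 7 == 0 else -10) for d in days]
trend = loess(days, counts, frac=0.5)
print([round(t) for t in trend[:5]])
```

A larger `frac` gives a smoother line; a smaller one follows short-lived peaks, such as the redirect cleanup day, more closely.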
Next, we look at reasons for deletions. As discussed in yesterday's log, we consider all Article and General CSD tags, PROD, and AfD, and put everything else into an "Other" category. Sorting them by total number of deletions gives the following table:
Category | Reason | Number of deletions | % of deletions |
---|---|---|---|
Other | All other reasons | 522,029 | 24.7 |
A7 | No indication of importance | 477,544 | 22.6 |
G11 | Unambiguous advertisement or promotion | 171,926 | 8.1 |
G6 | Technical deletions | 152,275 | 7.2 |
AfD | Articles for Deletion | 148,749 | 7.0 |
G8 | Depends on nonexistent/deleted page | 116,406 | 5.5 |
G3 | Pure vandalism and blatant hoaxes | 96,322 | 4.6 |
G12 | Unambiguous copyright infringement | 74,269 | 3.5 |
G7 | Author requests deletion | 61,541 | 2.9 |
A3 | No content | 50,353 | 2.4 |
A1 | No context | 44,749 | 2.1 |
G10 | Attack pages | 44,321 | 2.1 |
G2 | Test pages | 31,355 | 1.5 |
G5 | Creations by banned or blocked users | 28,420 | 1.3 |
A10 | Duplicates existing topic | 24,655 | 1.2 |
G1 | Patent nonsense | 22,195 | 1.1 |
G4 | Recreated deleted page | 15,762 | 0.7 |
PROD | Proposed deletion | 12,114 | 0.6 |
A11 | Obviously invented | 7,892 | 0.4 |
A9 | No indication of importance (music) | 7,365 | 0.3 |
A2 | Foreign language | 2,739 | 0.1 |
A5 | Transwikied article | 500 | 0.0 |
G13 | Abandoned draft/AfC | 40 | 0.0 |
G9 | Office action | 0 | 0.0 |
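A table like the one above can be produced by mapping each deletion log comment to a category and then tallying. The sketch below is one way to do that, not the code used for this analysis: the regular expressions, the precedence order (CSD code first, then AfD, then PROD), and the sample comments are all assumptions about how such comments tend to look.

```python
import re
from collections import Counter

# Categories taken from the table; anything else is counted as "Other".
CSD_CODES = {"A1", "A2", "A3", "A5", "A7", "A9", "A10", "A11"} | {
    f"G{i}" for i in range(1, 14)
}

# Hypothetical patterns for log comments (assumptions, not the real rules).
CSD_RE = re.compile(r"\b([AG]\d{1,2})\b")
AFD_RE = re.compile(r"Articles for deletion", re.IGNORECASE)
PROD_RE = re.compile(r"\bPROD\b|proposed deletion", re.IGNORECASE)


def classify(comment):
    """Return the category for one deletion log comment."""
    for code in CSD_RE.findall(comment):
        if code in CSD_CODES:
            return code
    if AFD_RE.search(comment):
        return "AfD"
    if PROD_RE.search(comment):
        return "PROD"
    return "Other"


def shares(counts):
    """Percentage of the total for each category, largest count first."""
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.most_common()}


# Made-up sample comments for illustration only.
samples = [
    "[[WP:CSD#A7|A7]]: No indication of importance",
    "[[WP:CSD#G11|G11]]: Unambiguous advertising or promotion",
    "[[Wikipedia:Articles for deletion/Example]]",
    "Expired [[WP:PROD|PROD]], concern was: not notable",
    "housekeeping",
]
tally = Counter(classify(c) for c in samples)
print(shares(tally))
```

Checking the CSD code first means a comment that mentions both a CSD tag and an AfD discussion is counted under the CSD tag; a different precedence would shift some counts between categories.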