Research talk:Autoconfirmed article creation trial/Work log/2018-01-18

From Meta, a Wikimedia project coordination wiki

Thursday, January 18, 2018

Today I'll present an update to the Research Group, continue analyzing the deletion data for the Main and User namespaces, and work on getting a dataset of article quality for Draft and AfC.

Main namespace deletions

I approach the analysis of the Main namespace deletions in our dataset slightly differently from how I handled Draft deletions yesterday. First, I'll look at the overall level of deletions over time. Then I'll look into specific deletion reasons and plot those. In both cases I'll look at the general history before focusing on the period around ACTRIAL.

We plot the number of deletions per day in the Main namespace from Jan 1, 2009 to July 1, 2017, adding a LOESS-smoothed line to show the trends as we've done in previous plots:

In the graph above, we can see the decline in activity until the second half of 2012. This decline mirrors the decline in creations seen in our Sept 4 work log. From then on the activity is fairly stable, at roughly 500 deletions per day. There are some peaks, which likely reflect different types of cleanup processes (e.g. redirects). Lastly, we see a dip in late 2017, which corresponds with the introduction of ACTRIAL. Let's look more closely at the data from January 1, 2017 onwards:

Here we can see more clearly the drop in daily activity when ACTRIAL starts. There is a peak of deletions towards the end of September: of the 5,300 deletions that day, 5,000 are cleanup of redirects. We also see an upward trend at the start of the new year. We do not find a corresponding uptick in page creations in the Main namespace in our dashboard. Since we know that NPP is running a backlog drive during this time, it is reasonable to conclude that the increase in deletions is related to that activity.
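The daily counts and smoothed trend line behind the plots above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used for the actual plots: the function names `daily_counts` and `loess`, the `span` parameter, and the input format (a list of deletion timestamps) are all assumptions made for the example. The smoother is a basic local linear regression with tricube weights, which is the core idea of LOESS, without the robustness iterations that statistical packages often add.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(timestamps, start, end):
    """Count events per calendar day from start to end (inclusive).

    Days without any deletions are kept as zeros so the series has no gaps.
    """
    counts = Counter(ts.date() for ts in timestamps)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return days, [counts.get(d, 0) for d in days]


def loess(y, span=0.3):
    """Smooth an evenly spaced series with local linear regression.

    `span` (hypothetical parameter name) is the fraction of points used in
    each local fit. Points are weighted with the tricube kernel.
    """
    n = len(y)
    k = max(2, min(n, int(round(span * n))))  # points per local fit
    smoothed = []
    for i in range(n):
        # For evenly spaced data the k nearest neighbours of i form a
        # contiguous window, clamped at the edges of the series.
        lo = max(0, min(i - k // 2, n - k))
        idx = range(lo, lo + k)
        dmax = max(abs(j - i) for j in idx) or 1
        w = [(1 - (abs(j - i) / dmax) ** 3) ** 3 for j in idx]
        sw = sum(w)
        mx = sum(wj * j for wj, j in zip(w, idx)) / sw
        my = sum(wj * y[j] for wj, j in zip(w, idx)) / sw
        sxx = sum(wj * (j - mx) ** 2 for wj, j in zip(w, idx))
        sxy = sum(wj * (j - mx) * (y[j] - my) for wj, j in zip(w, idx))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the weighted least-squares line at position i.
        smoothed.append(my + slope * (i - mx))
    return smoothed


# Made-up example timestamps, only to show the call pattern.
example = [datetime(2017, 1, 1, 5), datetime(2017, 1, 1, 9), datetime(2017, 1, 3, 0)]
days, counts = daily_counts(example, date(2017, 1, 1), date(2017, 1, 3))
trend = loess(counts, span=1.0)
```

Filling in zero-count days before smoothing matters here: if missing days were simply dropped, quiet periods would be compressed on the time axis and the trend line would overstate activity.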

Next, we look at reasons for deletions. As discussed in yesterday's log, we consider all Article and General CSD tags, PROD, and AfD, and put everything else into an "Other" category. Sorting them by total number of deletions gives the following table:

| Category | Reason | Number of deletions | % of deletions |
|---|---|---|---|
| Other | All other reasons | 522,029 | 24.7 |
| A7 | No indication of importance | 477,544 | 22.6 |
| G11 | Unambiguous advertisement or promotion | 171,926 | 8.1 |
| G6 | Technical deletions | 152,275 | 7.2 |
| AfD | Articles for Deletion | 148,749 | 7.0 |
| G8 | Depends on nonexistent/deleted page | 116,406 | 5.5 |
| G3 | Pure vandalism and blatant hoaxes | 96,322 | 4.6 |
| G12 | Unambiguous copyright infringement | 74,269 | 3.5 |
| G7 | Author requests deletion | 61,541 | 2.9 |
| A3 | No content | 50,353 | 2.4 |
| A1 | No context | 44,749 | 2.1 |
| G10 | Attack pages | 44,321 | 2.1 |
| G2 | Test pages | 31,355 | 1.5 |
| G5 | Creations by banned or blocked users | 28,420 | 1.3 |
| A10 | Duplicates existing topic | 24,655 | 1.2 |
| G1 | Patent nonsense | 22,195 | 1.1 |
| G4 | Recreated deleted page | 15,762 | 0.7 |
| PROD | Proposed deletion | 12,114 | 0.6 |
| A11 | Obviously invented | 7,892 | 0.4 |
| A9 | No indication of importance (music) | 7,365 | 0.3 |
| A2 | Foreign language | 2,739 | 0.1 |
| A5 | Transwikied article | 500 | 0.0 |
| G13 | Abandoned draft/AfC | 40 | 0.0 |
| G9 | Office action | 0 | 0.0 |
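
The categorisation behind the table above could be sketched along these lines. This is only an illustration: the regular expressions, the function names `classify` and `tabulate`, and the assumed comment formats (for example `[[WP:CSD#A7|A7]]: ...`) are hypothetical and not taken from the actual analysis code. The order in which the rules are tried (CSD first, then AfD, then PROD) is likewise an assumption.

```python
import re
from collections import Counter

# Criteria that appear in the table: the Article CSDs plus G1 to G13.
KNOWN_CSD = {"A1", "A2", "A3", "A5", "A7", "A9", "A10", "A11"} | {
    f"G{i}" for i in range(1, 14)
}

# Hypothetical patterns; real deletion log comments vary a lot more.
CSD_RE = re.compile(r"\bCSD[#\s]*([AG]\d{1,2})\b", re.IGNORECASE)
AFD_RE = re.compile(r"Articles for deletion|\bAfD\b", re.IGNORECASE)
PROD_RE = re.compile(r"\bPROD\b|proposed deletion", re.IGNORECASE)


def classify(comment):
    """Map one deletion log comment to a CSD code, 'AfD', 'PROD' or 'Other'."""
    comment = comment or ""
    match = CSD_RE.search(comment)
    if match and match.group(1).upper() in KNOWN_CSD:
        return match.group(1).upper()
    if AFD_RE.search(comment):
        return "AfD"
    if PROD_RE.search(comment):
        return "PROD"
    return "Other"


def tabulate(comments):
    """Return (category, count, percent) rows sorted by count, descending."""
    counts = Counter(classify(c) for c in comments)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


# Made-up comments, only to show the call pattern.
rows = tabulate([
    "[[WP:CSD#A7|A7]]: No indication of importance",
    "[[WP:CSD#A7|A7]]: Article about a band",
    "[[Wikipedia:Articles for deletion/Example]]",
    "housekeeping",
])
```

Anything the patterns fail to recognise falls into "Other", so with rules like these the size of that category depends heavily on how well the patterns cover the comment formats actually used in the log.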