Research talk:The Speed of Speedy Deletions

From Meta, a Wikimedia project coordination wiki

Notified but not tagged

I'm curious about the phenomenon of people being notified about a speedy deletion but the article not being tagged for speedy deletion. Are these instances where the admin has simply deleted an article and then subsequently informed the editor? If so, it makes a very big difference what sort of articles these were. If they were unsourced attack pages then this is perfectly OK; if they were good-faith then this is possibly more problematic, especially if they were only just created. Alternatively, could these possibly be instances where the article and its tag were deleted, but the article has since been recreated without the tag being restored? WereSpielChequers 14:12, 29 June 2011 (UTC)

warning after deletion

Re the 18.8% who were given a CSD warning after their article was deleted

There are two common reasons why a notification could take place after deletion. Firstly, and I believe usually uncontentiously, a lot of speedy deletions will be done by admins without others having previously tagged the article. In these cases the deletion and warning will usually take place within seconds of each other and it would be better to treat them as simultaneous. Secondly, there are a number of editors who don't notify authors, and as a result there are bots that follow up after them, notifying authors. This is something of a contentious area; I've tried and failed in the past to get a policy change to the effect that notification is part of the speedy deletion process. Opposition ranges from it being policy creep to the inadvisability of telling volunteers they have to do anything. Increasing the frequency with which the bots query the speedy deletion category could reduce the number of people who are notified after the deletion, but you don't want to make it too quick or you tread on the toes of the taggers. WereSpielChequers 09:42, 5 September 2011 (UTC)

Hmm, interesting. Thanks WSC. I've left a short parenthetical note to that effect in the results. Steven Walling (WMF) • talk 22:57, 5 September 2011 (UTC)
Among the admins who occasionally help out at NPP, many very often CSD tag articles for other admins to physically delete. I generally do this too, only immediately deleting articles that must be removed very quickly such as blatant vandalism, spam, copyvio, and attacks. In these cases, I apply a post-deletion warning template on the creator's talk page, usually within seconds. --Kudpung 04:41, 6 September 2011 (UTC)
Twinkle notifies the page creator upon tagging and deleting, no? Σ 00:04, 7 September 2011 (UTC)
Twinkle does indeed, though I don't know if anyone can override the defaults. My experience is that it is generally non-Twinklers who tag but don't notify. WereSpielChequers 20:19, 7 September 2011 (UTC)
Unfortunately - as we have discussed before - NPP is generally staffed mainly by the most inexperienced of all users; hence many of them don't even know what Twinkle is or how to install it. In hindsight, probably not a bad thing. — The preceding unsigned comment was added by Kudpung (talk)
Hi Kudpung, the newbies who do this are, in my experience, usually amenable to a little advice, at least when it comes to notifying. I'm more concerned about the old hands and some of the IPs who know it isn't compulsory and choose not to notify. I think the inexperienced editors are more likely to make incorrect tags. WereSpielChequers 19:01, 8 September 2011 (UTC)

Method/Sample size

In order for the research statistics to be used effectively, particularly in discussions and other research relating to improvement in new user recruitment, user retention, New Page Patrolling, new article creation control, and other policy changes, I would suggest:

  • Basing the sample on a more recent time scale, for example August 2009 through August 2011, in order to reflect current trends in new page creation, new page patrolling, volume of deletions, type of deletions, and new account creations.
  • Significantly increasing the sample size, perhaps even to take into account all items of research criteria during that period rather than a very small random sample.
  • Providing graphs, bar and pie charts of these stats.

Note: A similar set of statistics is being prepared by en.Wiki editors for use as metrics for a trial that is shortly to be implemented. Perhaps some liaison and pooling of resources might be helpful. --Kudpung 05:11, 6 September 2011 (UTC)

Hey Kudpung. Just to clarify one part: the table of criteria cited as a percent of CSD and of all deletions is for complete English Wikipedia data, not a sample. (I'll note that more clearly in the results.) Steven Walling (WMF) • talk 16:38, 6 September 2011 (UTC)
One trend we found during the Summer of Research is that there has been a long-term steady increase in spam - still small compared to good-faith editing or vandals, but much bigger than it once was. I would anticipate that a recent CSD sample would have a higher ratio of spam deletions than stats from a longer time period. Of course it's also possible that the community interpretation of "blatant spam" has drifted over the years. WereSpielChequers 19:08, 8 September 2011 (UTC)
I'd think that, as Wikipedia has grown, more people/organizations have tried to spam about themselves, which would lead to an increase in G11s and G12s. The Blade of the Northern Lights (話して下さい) 21:44, 8 September 2011 (UTC)
Yeah, just for context, in the work WSC is referring to we classified things that would be tagged G11 as spam, so it's not just linkspam. Steven Walling (WMF) • talk 22:02, 8 September 2011 (UTC)
Yes, that's broadly the pattern as I see it - spam is related to readership, as audience size is the metric that spammers go for. Of course not everything written by the marketing dept gets tagged as spam; I'm pretty sure that a lot of our A7 tags are written by PR people rather than fans. I'm not so sure about the G12 ones though; I think a lot of those are by kids from the cut and paste generation. WereSpielChequers 22:12, 8 September 2011 (UTC)
I personally define spam as any new page or content which enhances a non-notable subject's chances of personal or corporate gain, such as company listings, charities, petitions, non-notable politicians during the run-up to elections, commercial websites, and publishers promoting their books/authors, etc. Many of these pages are created by professional marketing and SEO agencies. It's difficult to know if they do this in ignorance of our rules or in blatant disregard for them. In some, but far from all cases, we find out when the pages are repeatedly recreated in spite of warnings. Because we have to AGF, a lot of spam articles are deleted as A7 rather than G11. Many NPPers don't understand or make the distinction anyway, and deleting admins often do not correct the chosen criterion. Kudpung 01:47, 9 September 2011 (UTC)
That's a somewhat broader interpretation of spam than current policy, and it would bring in the work of a lot of good-faith editors. For example, lots of writers of sports articles get caught out by the rule that someone needs to have actually played for the team to be notable, so they create articles on newly signed players for major teams. WereSpielChequers 08:36, 9 September 2011 (UTC)
I certainly don't deny that a lot of inappropriate pages are created in good faith in ignorance of the rules, but it does not alter the fact that those pages are not wanted. Wikipedia is 'the encyclopedia anyone can edit' but that does not absolve new page creators from observing the rules, nor does it excuse new page patrollers for not knowing them and implementing them correctly. Unfortunately that leaves us in the unenviable position of having to review pages before they can go live. This would only take a moment, and it's practiced by most blogs and Internet forums. Kudpung 14:19, 9 September 2011 (UTC)
Sadly we lost the flagged revisions/pending changes debate; the equivalent for new pages would be to noindex them until patrolled. I suspect that we would probably both support that. But surely you'd agree that judging the success of the system in terms of the efficiency with which it deletes articles that merit deletion, whilst ignoring those articles that are incorrectly deleted, is to miss a key metric. WereSpielChequers 12:32, 17 September 2011 (UTC)
The German Wiki uses a system similar to flagged revisions and it seems to work for them - at least there are no highlighted new pages at all in their special:new_pages list. However, they only have an average of 11,000 or so new pages per month to contend with, and they do have some stringent qualifications for the user right of 'reviewer'. I certainly agree that it is essential to judge the efficiency of any new page patrolling system not only on the pages that get correctly deleted, but also on the ones that should be deleted but get passed as fit for service by the patrollers. This latter metric is practically impossible to make a script for, and relies heavily on the empirical findings of the few experienced editors and admins who are monitoring the work of the patrollers. I'm not sure that many of the wrong pages actually get deleted - if they do, then it would be time to seriously review the quality of our admins. --Kudpung 13:03, 17 September 2011 (UTC)
I recently saw a page speedied as A7 that had previously survived AFD, so yes, we have a problem with incorrect deletions. EN:WP:NEWT was nearly two years ago now, but one of the big lessons from it was that some admins do indeed delete pages that they shouldn't, including one deleted for poor formatting. I suspect that most of the incorrect speedy deletions are articles that would be deletes or close calls at AFD, but my suspicion is that any admin who starts deleting "articles that would probably be deleted at AFD" is bound to make more serious mistakes, and amongst every batch of articles that would probably be deleted at AFD will be some that actually wouldn't. That isn't to dispute that most deletions are within policy, but we should remember that the babies are worth far more than the bathwater. WereSpielChequers 12:47, 27 November 2011 (UTC)


In view of the discussions that are taking place at http://www.mediawiki.org/wiki/Talk:New_Page_Patrol_Zoom_Interface, is this research ongoing and is it to be concluded? --Kudpung 09:17, 26 September 2011 (UTC)

Hey, sorry if the status seemed cryptic, but basically you can consider any of the projects labeled as part of the summer research project over. We left it to the individual researchers whether they wanted to mark them officially closed or not, since some of them may continue them as part of their academic research outside the WMF. Steven Walling (WMF) • talk 20:13, 26 September 2011 (UTC)
Thanks Steve. I was just curious because I don't think the stats are very helpful for what we are working on now, such as user retention, and NPP. Because Wikipedia has grown and evolved enormously since 2001, trends will have changed over time and the stats are colored by too many different periods in the sample.
I would like to see a table based on the complete data from the English Wikipedia database for only the period Sept 2009 through Sept 2011. Is this available, or can we get hold of the guy who did this and ask him to extract that data for us - or provide us with the script he used to search the db? Cheers, --Kudpung 22:19, 26 September 2011 (UTC)
Yeah, I'll ask him (it's User:EpochFail) if he can add it to the Research:Query Library or here. Steven Walling (WMF) • talk 17:30, 27 September 2011 (UTC)
Thanks again; I hope he will respond. It would greatly help us to more accurately define today's New Page Patroller profile, discover what we can best do for or about them, and ultimately lead to more focussed development of the Zoom and the other solutions we are looking at for new page quality control, and new user retention. Kudpung 23:31, 27 September 2011 (UTC)