Research:Projects/2011 Summer of Research/New User Participation in Deletion Processes
This week's sprint builds on the previous week's work on the speed of speedy deletions, which examined what happens when new users create an article that is subsequently tagged for speedy deletion. That study found that the median time between an article's creation and its tagging for speedy deletion was two minutes, that the median time between creation and deletion was 34.5 minutes, and that about 95% of the new users' articles which were tagged for speedy deletion were subsequently deleted.
This week's sprint will continue to examine last week's dataset of speedy deletions in order to answer further questions about what kinds of rationales administrators are giving for deleting such articles, as well as the extent to which users edit in various spaces (including the tagged article and its talk page) between when the article is created, tagged, and deleted. In addition, this week's sprint will begin an analysis of the Articles for Deletion (AfD) process, examining these variables, with an emphasis on understanding the extent to which users (and specifically new users) participate in AfDs when an article they created is nominated for deletion.
Within the Summer of Research
These sprints are part of an overarching project building on earlier quantitative and qualitative research focusing on the experiences of new users. In those studies (on community participation and deletion notifications), it was found that many good-faith users are immediately thrust into high-level processes such as article deletions, image copyright issues, username concerns, conflict of interest allegations, and edit wars. Many of these processes require a working understanding of not just Wikipedia's encyclopedic standards (such as NPOV or reliable sources), but also the social norms and procedures around the administration of the encyclopedia project. Furthermore, participation in such spaces requires a technical understanding, as users must edit specific pages in a rigidly-defined manner in order to properly make themselves visible to Wikipedia's administrative corps.
Within Wikipedia Research
Previous research on deletion processes has been done by Dario and Giovanni on herding effects, and by Lam et al. on decision quality. Those studies focused on the specific outcomes of AfDs and on the factors which predicted the result of these decisions. This sprint is instead focused on relative levels of participation, examining AfDs as a case not of decision-making but of new user participation.
Broadly, what are the various ways in which new users encounter and interact with Wikipedia's deletion processes, specifically speedy deletions and Articles for Deletion?
- How many new, tagged/nominated for deletion articles are kept versus deleted?
- How long does it take for a newly-created article to be tagged/nominated for deletion, for a user to be warned of this, and for the article to be deleted?
- Are users editing articles and talk pages -- both of the tagged/nominated article and of other articles -- after a deletion tag and subsequent deletion?
- How many users are receiving speedy/AfD deletion notifications (as a raw count and as a percentage of those who create new articles)?
- Are users responding to speedy deletion notices, and if so, how?
- Are all these figures different for new users compared to active, highly active, and longstanding users?
Additional questions which can be easily answered using this dataset:
- Who are the most active AfD nominators?
- Who are the users who get their articles nominated for deletion the most?
- Do these users have their articles deleted at a higher rate than others?
- Length of the deleted page
- How many re-create the article
- Filter out spam
- What kinds of articles are created/deleted over time?
- Deletion sorting logs
- Reclosing AfDs
- Out of all new articles created in a month, what % of those articles are AfDed within a month?
- Out of all new articles created by new users in a month, what % of those articles are AfDed within a month?
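The last two questions reduce to a simple proportion. A minimal sketch, assuming each article is represented as a (created, nominated) pair of timestamps, with `nominated` set to `None` when the article was never sent to AfD, and assuming "within a month" means within 30 days of creation. The function name `share_afd_within` and the input shape are hypothetical, chosen only for illustration:

```python
from datetime import datetime, timedelta


def share_afd_within(articles, window=timedelta(days=30)):
    """Return the percentage of articles nominated at AfD within `window` of creation.

    `articles` is an iterable of (created, nominated) datetime pairs;
    `nominated` is None for articles that were never nominated.
    The 30-day window is an assumption standing in for "a month".
    """
    articles = list(articles)
    if not articles:
        return 0.0
    hits = sum(
        1
        for created, nominated in articles
        if nominated is not None and timedelta(0) <= nominated - created <= window
    )
    return 100.0 * hits / len(articles)
```

Running the same function on all new articles and then on the subset created by new users gives the two figures to compare.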
This sprint will involve two distinct methodologies due to the differences in the Articles for Deletion and speedy deletion processes. The questions regarding AfDs can be analyzed statistically and en masse, while speedy deletions require a more qualitative coding of new user experiences.
Articles for Deletion
This section of the sprint involves a comprehensive analysis of Articles for Deletion discussions, as well as the articles which have been nominated for deletion. A script has queried the API and downloaded a list of all AfDs (all subpages of Wikipedia:Articles for Deletion), which was then manually cleaned to exclude logging, archive, maintenance, and other non-AfD pages. Another script is currently downloading the contribution history of both the AfD debate and the page which has been nominated for deletion. If the page has been deleted, the deleted contributions are retrieved instead.
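The listing step can be sketched roughly as follows. This is not the script used in the sprint; it only shows how the MediaWiki API's `list=allpages` query could enumerate AfD subpages. The endpoint URL, the helper names, and the `Log/` exclusion rule are assumptions for illustration, and the cleaning described above was done by hand rather than by a filter like this one:

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint (English Wikipedia)
AFD_PREFIX = "Articles for deletion/"  # subpage prefix inside the project namespace


def allpages_params(continue_from=None):
    """Build the query parameters for one batch of AfD subpage titles."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": "4",  # namespace 4 is the project ("Wikipedia:") namespace
        "apprefix": AFD_PREFIX,
        "aplimit": "max",
        "format": "json",
    }
    if continue_from is not None:
        # Value taken from the "continue" element of the previous response.
        params["apcontinue"] = continue_from
    return params


def request_url(params):
    """Return the full GET URL for a set of query parameters."""
    return f"{API_URL}?{urlencode(params)}"


def is_probable_debate(title, excluded=("Log/",)):
    """Hypothetical pre-filter: drop titles that are clearly not single debates.

    The default exclusion list is illustrative only; it does not reproduce
    the manual cleaning described in the text.
    """
    full_prefix = "Wikipedia:" + AFD_PREFIX
    if not title.startswith(full_prefix):
        return False
    subpage = title[len(full_prefix):]
    return bool(subpage) and not subpage.startswith(tuple(excluded))
```

A caller would fetch `request_url(allpages_params())`, collect the returned titles, and repeat with the continuation value until the response no longer contains one.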
This script is being used to generate the following variables:
- Users who participated in the AfD discussion, and the number of times they edited the AfD page
- The user who created the page, and if they edited the AfD page
- The user who nominated the page for deletion (assuming they created the AfD page)
- The date when the page was nominated for deletion (creation of the AfD page)
- The date when the article in question was created
- Whether the article was deleted or not (articles which become redirects, etc. and do not require deletedcontribs histories are marked as kept)
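A minimal sketch of how these variables could be derived once both histories are in hand. It assumes each history is a list of dictionaries with `user` and `timestamp` keys (the shape returned when revisions are requested with those two properties) and that the deleted/kept status is supplied separately. The function name `afd_variables` is hypothetical:

```python
from collections import Counter
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as '2011-06-01T10:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def afd_variables(article_revs, afd_revs, deleted):
    """Derive per-AfD variables from the article's and the debate's histories.

    Both arguments are non-empty lists of {"user": ..., "timestamp": ...} dicts.
    The first editor of the AfD page is treated as the nominator, which is
    the same assumption stated in the variable list.
    """
    article_revs = sorted(article_revs, key=lambda r: parse_ts(r["timestamp"]))
    afd_revs = sorted(afd_revs, key=lambda r: parse_ts(r["timestamp"]))
    first_article, first_afd = article_revs[0], afd_revs[0]
    edits_per_user = Counter(r["user"] for r in afd_revs)
    creator = first_article["user"]
    return {
        "participants": dict(edits_per_user),  # user -> number of edits to the AfD page
        "creator": creator,
        "creator_edited_afd": creator in edits_per_user,
        "nominator": first_afd["user"],
        "nominated_at": parse_ts(first_afd["timestamp"]),
        "created_at": parse_ts(first_article["timestamp"]),
        "deleted": deleted,
    }
```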
With an additional script (which can be used to analyze speedy deletions below, given timestamps), the following variables will be collected:
- The number of edits by the nominating author between when the article was created, nominated, and deleted
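The counting step might look like the sketch below, which takes a user's edit timestamps and the three event timestamps and buckets the edits by phase. The function name, the bucket labels, and the handling of articles that were never deleted are all assumptions. Because it only needs timestamps, the same function applies to speedy deletions by passing the tagging time in place of the nomination time:

```python
from datetime import datetime


def edits_by_phase(edit_times, created, nominated, deleted=None):
    """Count a user's edits in each phase of an article's deletion timeline.

    Edits before creation are ignored. If `deleted` is None (the article was
    kept), every edit after the nomination counts as 'nominated_to_deleted'.
    """
    counts = {"created_to_nominated": 0, "nominated_to_deleted": 0, "after_deletion": 0}
    for t in edit_times:
        if t < created:
            continue
        if t < nominated:
            counts["created_to_nominated"] += 1
        elif deleted is None or t < deleted:
            counts["nominated_to_deleted"] += 1
        else:
            counts["after_deletion"] += 1
    return counts
```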
This project will continue using the previous dataset of 2,600 randomly-selected new users who registered between July 2004 and December 2010 (inclusive), made at least one edit, and received at least one edit to their user talk page. Currently, there are around 300 users left to be coded in the sample, 150 each from the second half of 2009 and the first half of 2010. I will seek to finish coding each of these 3,000 users based on the schema previously used in studies of first messages and deletion notifications -- specifically whether a user's first message was a template, personalized, welcome, warning, or deletion notification. (Note: This will enable a re-analysis of the previous weeks' questions, as well as create a comprehensive corpus of new users for further projects.)
In addition, I will seek to finish the first round of coding new user experiences, which involves marking whether or not a user was notified of a speedy deletion on their user talk page. At present, there are 258 new users in the sample of 2,100 users from other time periods who received a speedy deletion notice; 125 of these are coded. The following variables are hand-coded in this analysis:
- Time of the user's first speedy deletion notification
- Time when the page was first tagged for speedy deletion (if any)
- Time when the user created the page
- Time when the article was deleted (if at all)
- Whether the article was userfied instead of deleted
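Once these timestamps are coded, intervals such as creation-to-tagging or creation-to-deletion can be summarized by their median, as in the previous sprint. A minimal sketch, assuming each coded user is a dictionary of `datetime` values with `None` for events that never happened; the key names and the function name `median_minutes` are hypothetical:

```python
from datetime import datetime
from statistics import median


def median_minutes(records, start_key, end_key):
    """Return the median gap in minutes between two coded timestamps.

    Records missing either timestamp (for example, articles that were never
    deleted) are skipped. Returns None if no record has both.
    """
    gaps = [
        (r[end_key] - r[start_key]).total_seconds() / 60
        for r in records
        if r.get(start_key) is not None and r.get(end_key) is not None
    ]
    return median(gaps) if gaps else None
```

For example, `median_minutes(records, "created", "tagged")` would give the median time from page creation to the first speedy deletion tag across the coded sample.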
Results and discussion
- Less than 10% of article authors participate in their article's AfD [precise figure coming]
- When anonymous users were allowed to create articles, their deleted articles were re-created at a higher rate than those of registered users.