Research:Autoconfirmed article creation trial

From Meta, a Wikimedia project coordination wiki

This page documents a proposed research project.
Information may be incomplete and may change before the project starts.

The goal of this study is to run an experiment on English Wikipedia where we examine the effects of disabling article creation for non-autoconfirmed newly registered editors.

Research Questions

We are interested in understanding the effects this change in permissions will have on newly registered accounts, the English Wikipedia's quality assurance processes (in particular New pages patrol), and the quality of Wikipedia's articles. These three main themes are also reflected in the following three research questions:

RQ-New accounts
How does requiring autoconfirmed status to create new articles affect newly registered accounts, and what impact will it have long-term on the quality of the wikis overall?
RQ-Quality Assurance
How does requiring autoconfirmed status affect Wikipedia's quality assurance processes?
RQ-Content quality
How does requiring autoconfirmed status affect the quality of Wikipedia's articles?

Hypotheses

RQ-New accounts

H1: Number of accounts registered per day will not be affected.

Because the system is designed so that restrictions and limitations are communicated and placed at the time of action, rather than described up front, we should see little to no change in the number of accounts that are registered.

H2: Proportion of newly registered accounts with non-zero edits in the first 30 days is reduced.

Some proportion of new accounts start out by creating a new article, a path that will no longer be available. Instead, they will be required to make other types of edits to reach autoconfirmed status, which might affect the number of accounts that make any kind of edit. We also expect that some of the accounts registered before the experiment are so-called "single purpose accounts", where someone registers only to be able to create an article. Because these accounts will no longer be able to create an article that is instantly visible on the English Wikipedia, we expect that some proportion of them will give up and leave, leading to a reduction in the number of accounts that have made any edits.
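The measure behind H2 can be sketched as follows. This is a minimal illustration, not the project's analysis code: the input shape (a registration timestamp paired with a list of edit timestamps), the function name, and the sample data are all assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # observation window after registration


def proportion_with_edits(accounts):
    """Share of accounts with at least one edit within WINDOW of registering.

    `accounts` is an iterable of (registered_at, edit_timestamps) pairs
    (a hypothetical input shape chosen for this sketch).
    """
    accounts = list(accounts)
    if not accounts:
        return 0.0
    active = sum(
        1
        for registered, edits in accounts
        if any(registered <= ts < registered + WINDOW for ts in edits)
    )
    return active / len(accounts)


# Made-up sample: two of the four accounts edit within 30 days.
reg = datetime(2000, 1, 1)
sample = [
    (reg, [reg + timedelta(days=2)]),
    (reg, []),
    (reg, [reg + timedelta(days=45)]),
    (reg, [reg + timedelta(hours=1), reg + timedelta(days=10)]),
]
print(proportion_with_edits(sample))  # 0.5
```

Comparing this proportion for cohorts registered before and during the trial would be one way to test the hypothesis.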

H3: Proportion of accounts reaching autoconfirmed status within the first 30 days since account creation is unchanged.

We have so far not found any existing research on the proportion of users who reach autoconfirmed status within a specific time period. Studies of Wikipedia contributors that mention autoconfirmed status have looked at, for example, spam attack vectors[1] and page protection.[2] Two studies of roles in Wikipedia[3][4] did not distinguish between accounts with and without autoconfirmed status, instead combining both into a "new user" group.

There is presently no requirement to make a certain number of edits/contributions in order to unlock privileges on the English Wikipedia, and the site is not gamified in the way that, for example, Stack Overflow is.[5] We do not know to what extent newly registered contributors seek to unlock these privileges. First, if a large proportion of newly registered accounts make edits that are subsequently deleted (e.g. because they created a non-encyclopedic article), and this behaviour leads to their accounts not reaching autoconfirmed status, then those who do reach that status can be regarded as productive members of the community. Second, H2 proposes that the proportion of accounts with non-zero edits is reduced, suggesting that these newly registered accounts will never reach autoconfirmed status. In combination, there are reasons to hypothesize that the proportion of accounts reaching autoconfirmed status within 30 days of registering will be unchanged.

H4: The median time to reach autoconfirmed status within the first 30 days is unchanged.

Similarly, we hypothesise that the time it will take for a newly registered account to reach autoconfirmed status does not change.
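As a rough sketch of how the H4 measure could be computed, the example below takes the time to autoconfirmed status as the earliest moment at which an account satisfies both an age threshold and an edit-count threshold. The thresholds used here (4 days, 10 edits) are an assumption about the English Wikipedia's configuration rather than something stated on this page, as are the function names and the input shape. Whether deleted edits count toward the threshold is also left open.

```python
from datetime import datetime, timedelta
from statistics import median

MIN_AGE = timedelta(days=4)   # assumed account-age threshold
MIN_EDITS = 10                # assumed edit-count threshold
WINDOW = timedelta(days=30)   # only count accounts that qualify within 30 days


def time_to_autoconfirmed(registered, edits):
    """Return the time from registration until both thresholds are met,
    or None if that does not happen within WINDOW."""
    edits = sorted(edits)
    if len(edits) < MIN_EDITS:
        return None
    reached = max(registered + MIN_AGE, edits[MIN_EDITS - 1])
    elapsed = reached - registered
    return elapsed if elapsed <= WINDOW else None


def median_days_to_autoconfirmed(accounts):
    """Median number of days to autoconfirmed among qualifying accounts."""
    times = [
        t
        for t in (time_to_autoconfirmed(r, e) for r, e in accounts)
        if t is not None
    ]
    if not times:
        return None
    return median(t.total_seconds() for t in times) / 86400


# Made-up sample: one account qualifies on day 4, one on day 10,
# one never makes ten edits, and one qualifies only after day 30.
r = datetime(2000, 1, 1)
fast = (r, [r + timedelta(hours=h) for h in range(1, 11)])
slow = (r, [r + timedelta(days=d) for d in range(1, 11)])
few = (r, [r + timedelta(days=1)])
late = (r, [r + timedelta(days=4 * d) for d in range(1, 11)])
print(median_days_to_autoconfirmed([fast, slow, few, late]))  # 7.0
```

The share of accounts for which `time_to_autoconfirmed` returns a value is the proportion that H3 refers to.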

H5: The proportion of surviving new editors who make an edit in their fifth week is unchanged.

When it comes to measuring editor retention, the WMF research staff has at least two definitions: Returning new editor, and Surviving new editor. The two differ in what they measure (edit sessions and edits, respectively) and in the timespan they use. Given our other measures and hypotheses, the "returning new editor" measure is most likely covered elsewhere, so we are mainly concerned with surviving new editors. Due to the time constraints of the trial, we propose measuring retention as the proportion of new editors who make at least one edit in their first week and who also make at least one edit in their fifth week (as that week crosses the 30 day threshold after registration). From the previous hypotheses, it follows that the proportion of surviving new editors should not change.
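The retention measure proposed for H5 can be illustrated with a short sketch. It assumes weeks are counted in 7-day blocks from the moment of registration, so the fifth week covers days 28 to 34; the function names and input shape are hypothetical.

```python
from datetime import datetime, timedelta


def edited_in_week(registered, edits, week):
    """True if any edit falls in the given 1-based week after registration."""
    start = registered + timedelta(weeks=week - 1)
    end = start + timedelta(weeks=1)
    return any(start <= ts < end for ts in edits)


def survival_proportion(accounts):
    """Among accounts that edit in week 1, the share that also edit in week 5."""
    cohort = [(r, e) for r, e in accounts if edited_in_week(r, e, 1)]
    if not cohort:
        return None
    survivors = sum(1 for r, e in cohort if edited_in_week(r, e, 5))
    return survivors / len(cohort)


# Made-up sample: the third account never edits in week 1,
# so it is excluded from the cohort entirely.
r = datetime(2000, 1, 1)
accounts = [
    (r, [r + timedelta(days=1), r + timedelta(days=30)]),  # survives
    (r, [r + timedelta(days=1)]),                          # does not survive
    (r, [r + timedelta(days=29)]),                         # not in the cohort
]
print(survival_proportion(accounts))  # 0.5
```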

H6: The diversity of participation done by accounts that reach autoconfirmed status in the first 30 days is unchanged.

There are no restrictions on where to make the edits needed to reach autoconfirmed status. A newly registered account might make five copy edits to articles, a couple of edits to create a user page to describe themselves, and some edits to discussion pages (e.g. the Teahouse) in order to reach that limit, while another account might make ten article edits. We argued earlier that the set of newly registered accounts that reach autoconfirmed status will not change significantly, meaning that the diversity (e.g. number of different pages edited) of work done by these accounts will also not change.
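One simple way to quantify the diversity described above is to count the distinct pages and distinct namespaces an account has edited. The sketch below is only an illustration of that idea; the page names are invented, and representing each edit as a (namespace, title) pair is an assumption of the example.

```python
def participation_diversity(edits):
    """Count distinct pages and namespaces among (namespace, title) edits."""
    pages = set(edits)
    namespaces = {ns for ns, _ in pages}
    return {"distinct_pages": len(pages), "distinct_namespaces": len(namespaces)}


# Mirrors the two example accounts in the text, with invented page names.
mixed = (
    [("Main", f"Article {i}") for i in range(5)]   # five copy edits
    + [("User", "Example user")] * 2               # two user page edits
    + [("Wikipedia", "Teahouse")] * 3              # three Teahouse edits
)
focused = [("Main", "Article 0")] * 10             # ten edits to one article

print(participation_diversity(mixed))
print(participation_diversity(focused))
```

Both accounts make ten edits, but the first touches seven pages across three namespaces while the second touches a single page.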

H7: The average number of edits in the first 30 days since registering is reduced.

We hypothesized that the proportion of accounts with a non-zero number of edits will be reduced because some of these accounts will be abandoned. This will likely also affect the average number of edits made in the first 30 days since registering. Note that in this case we will also count edits to pages that are subsequently deleted (e.g. non-encyclopedic articles).

H8: The number of requests for "confirmed" status is increased.

The English Wikipedia has a page where users can request "confirmed" status: Requests for permissions/Confirmed. A newly registered account might want to go there to request said status in order to not have to meet the requirements for autoconfirmed status. At the same time, someone who is new to Wikipedia might not know that this possibility exists. We hypothesized earlier that some newly registered accounts have a single purpose, and we might expect some proportion of these to seek out venues other than reaching autoconfirmed status, particularly if the venue is perceived as low cost. This can lead to a small but significant increase in the number of requests for "confirmed" status.

RQ-Quality Assurance

H9: The workload of New Page Patrollers is reduced.

Data gathered and analyzed during the discussion of this trial in 2011, as well as prior to the design phase of the current experiment, indicate that accounts that have not reached autoconfirmed status create about 400–500 articles every day.[6][7] On average, 75% of these articles are deleted, and the vast majority of them (87%) are speedily deleted, meaning they contain content that is obviously not fit for the encyclopedia. During the experiment these articles cannot be created, which should reduce the number of articles entering the New Page Patrol (NPP) queue, and subsequently the workload of the contributors patrolling this queue.
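To give a sense of scale, the figures above can be turned into rough daily counts. This is back-of-the-envelope arithmetic only: it reads the 87% as a share of the deleted articles and assumes both percentages apply uniformly across the 400–500 range, which the cited data may not support exactly.

```python
CREATED_PER_DAY = (400, 500)  # low and high estimates from the text
DELETED_PCT = 75              # share of those articles that are deleted
SPEEDY_PCT = 87               # share of deleted articles that are speedily deleted


def daily_deletions(created):
    """Return (deleted, speedily deleted) articles per day for a creation count."""
    deleted = created * DELETED_PCT / 100
    speedy = deleted * SPEEDY_PCT / 100
    return deleted, speedy


for created in CREATED_PER_DAY:
    deleted, speedy = daily_deletions(created)
    print(f"{created} created/day: about {deleted:.0f} deleted, "
          f"of which about {speedy:.0f} speedily")
```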

H10: The size of the backlog of articles in the New Page Patrol queue will decrease faster than expected.

The reduction in the workload of New Page Patrollers will free up resources to go through and process the backlog of articles that still need reviewing. This does to some extent happen already (e.g. because patrollers coordinate and focus on reducing it), but we expect the rate of change to be higher than typical, even without coordinated efforts.

H11: The survival rate of newly created articles by autoconfirmed users will remain stable.

When it comes to article survival, we are obviously only concerned with the survival of articles created by autoconfirmed users. As we earlier hypothesized that these types of users will continue to be productive members of the community, we also hypothesize that the survival rate of their articles will remain stable.

H12: The rate of article growth will be reduced.

The previously mentioned statistics on article creation and survival[6][7] indicated that around 50–100 surviving articles are created each day by non-autoconfirmed users. As before, we have hypothesized that it is unlikely that newly registered accounts will do the work necessary to reach autoconfirmed status, meaning that these articles will no longer be created, which will reduce the rate of article growth on the English Wikipedia.

H13: The rate of new submissions at AfC will increase.

We expect some proportion of newly registered accounts to make a failed attempt at creating an article, and upon doing so will choose to use the Article Wizard instead, meaning their proposed article will be added to the review queue at Articles for Creation (AfC). This should result in an increase in the rate of proposed articles reaching AfC.

H14: The backlog of articles in the AfC queue will increase faster than expected.

The process of handling AfC submissions is similar to NPP in that it requires human intervention. Since we proposed that the NPP backlog will decrease due to the lower rate of influx of articles there, we must similarly propose that the backlog of articles at AfC will increase.

H15: The reasons for deleting articles will remain stable.

When it comes to reasons for why articles get deleted, we again refer to our previous hypothesis that newly created accounts that reach autoconfirmed status will be productive members of the community. This means that article creations will remain stable and the reasons for why articles get deleted will not change significantly.

H16: The reasons for deleting non-article pages will change towards those previously used for deletion of articles created by non-autoconfirmed users.

When it comes to reasons for why non-article pages get deleted, we expect to see a change as newly registered accounts can create pages outside the main (article) namespace (e.g. AfC as mentioned above, or their user page or sandbox). Some of this content will likely be candidates for speedy deletion due to copyright infringement (G12) or "Blatant misuse of Wikipedia as a web host" (U5). In other words, we expect to see a change in the reasons for deletions of non-article pages.

RQ-Content quality

H17: The quality of articles entering the NPP queue will increase.

The quality of articles entering the NPP queue will change due to the removal of articles created by non-autoconfirmed users. Articles in the queue that were originally created by non-autoconfirmed users will have gone through the AfC process, meaning they should be of higher quality than earlier, leading to an increase in the average quality of articles in the NPP queue.

H18: The quality of newly created articles after 30 days will be unchanged.

Most of the English Wikipedia's articles are lower quality (Stub- and Start-class) with low readership,[8] and over time the articles that are created are increasingly on niche topics.[9] We therefore expect the quality of newly created articles after 30 days to remain unchanged.

H19: The quality of articles entering the AfC queue will be unchanged.

Little is currently known about the quality of content entering the AfC queue. Research on AfC has instead focused on whether the process leads to greater success in publishing articles or retaining contributors.[10][11] We have hypothesized that the influx of new drafts into AfC will increase somewhat due to newly registered accounts not being able to create articles in the main namespace, but we do not know to what extent those who choose to do so are those who create higher quality content to begin with. Given this uncertainty about the current state of articles entering AfC as well as the creators of these drafts, we hypothesize that the quality of articles entering AfC will not change.

Methods

todo

Results

todo

References

  1. Andrew G. West, Jian Chang, Krishna Venkatasubramanian, Oleg Sokolsky, and Insup Lee. 2011. Link Spamming Wikipedia for Profit. In 8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference, 152–161.
  2. Benjamin Mako Hill and Aaron Shaw. 2015. Page Protection: Another Missing Dimension of Wikipedia Research. In Proceedings of OpenSym.
  3. Ofer Arazy, Oded Nov, and Felipe Ortega. 2014. The [Wikipedia] World is Not Flat: on the organizational structure of online production communities.
  4. Ofer Arazy, Felipe Ortega, Oded Nov, Lisa Yeo, and Adam Balila. 2015. Functional Roles and Career Paths in Wikipedia. In Proceedings of CSCW.
  5. Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2013. Steering user behavior with badges. In Proceedings of WWW.
  6. User:Scottywong/Article_creation_stats
  7. User:MusikAnimal_(WMF)/NPP_analysis
  8. M. Warncke-Wang, V. Ranjan, L. Terveen, and B. Hecht. 2015. Misalignment Between Supply and Demand of Quality Content in Peer Production Communities. In Proceedings of ICWSM. See also: Signpost/Research Newsletter coverage.
  9. Shyong (Tony) K. Lam and John Riedl. 2009. Is Wikipedia growing a longer tail? In Proceedings of GROUP.
  10. Jodi Schneider, Bluma S. Gelley, and Aaron Halfaker. 2014. Accept, decline, postpone: How newcomer productivity is reduced in English Wikipedia by pre-publication review. In Proceedings of OpenSym.
  11. Research:Wikipedia article creation

See also