Research:Autoconfirmed article creation trial
The goal of this study is to run an experiment on English Wikipedia where we examine the effects of disabling article creation for non-autoconfirmed newly registered editors.
- 1 Research Questions
- 2 Hypotheses
- 2.1 RQ-New accounts
- 2.1.1 H1: Number of accounts registered per day will not be affected.
- 2.1.2 H2: Proportion of newly registered accounts with non-zero edits in the first 30 days is reduced.
- 2.1.3 H3: Proportion of accounts reaching autoconfirmed status within the first 30 days since account creation is unchanged.
- 2.1.4 H4: The median time to reach autoconfirmed status within the first 30 days is unchanged.
- 2.1.5 H5: The proportion of surviving new editors who make an edit in their fifth week is unchanged.
- 2.1.6 H6: The diversity of participation done by accounts that reach autoconfirmed status in the first 30 days is unchanged.
- 2.1.7 H7: The average number of edits in the first 30 days since registering is reduced.
- 2.1.8 H8: The number of requests for "confirmed" status is increased.
- 2.2 RQ-Quality Assurance
- 2.2.1 H9: Number of patrol actions will decrease.
- 2.2.2 H10: Number of active patrollers will decrease.
- 2.2.3 H11: The distribution of patrolling activity evens out.
- 2.2.4 H12: The number of Time-Consuming Judgement Calls will decrease.
- 2.2.5 H13: The size of the backlog of articles in the New Page Patrol queue will remain stable.
- 2.2.6 H14: The survival rate of newly created articles by autoconfirmed users will remain stable.
- 2.2.7 H15: The rate of article growth will be reduced.
- 2.2.8 H16: The rate of new submissions at AfC will increase.
- 2.2.9 H17: The backlog of articles in the AfC queue will increase faster than expected.
- 2.2.10 H18: The reasons for deleting articles will remain stable.
- 2.2.11 H19: The reasons for deleting non-article pages will change towards those previously used for deletion of articles created by non-autoconfirmed users.
- 2.3 RQ-Content quality
- 3 Methods
- 4 Results
- 5 References
- 6 See also
We are interested in understanding the effects this change in permissions will have on newly registered accounts, the English Wikipedia's quality assurance processes (in particular New pages patrol), and the quality of Wikipedia's articles. These three main themes are also reflected in the following three research questions:
- RQ-New accounts
- How does requiring autoconfirmed status to create new articles affect newly registered accounts?
- RQ-Quality Assurance
- How does requiring autoconfirmed status affect Wikipedia's quality assurance processes?
- RQ-Content quality
- How does requiring autoconfirmed status affect the quality of Wikipedia's articles?
H1: Number of accounts registered per day will not be affected.
Because the system is designed so that restrictions and limitations are communicated and placed at the time of action, rather than described up front, we should see little to no change in the number of accounts that are registered.
H2: Proportion of newly registered accounts with non-zero edits in the first 30 days is reduced.
Some proportion of new accounts start out by creating a new article, a path that will no longer be available. Instead, they will be required to make other types of edits to reach autoconfirmed status if they wish to create articles. This can be regarded as an increased barrier to entry similar to those studied by Drenner et al., who found that requiring more effort from new users resulted in a lower completion rate. We expect some of the newly registered accounts to be "single purpose accounts", registered only in order to create an article. Because these accounts are unable to create articles, we expect them to no longer make any edits. In line with Drenner et al.'s work, we expect some of the remaining accounts to abandon their work and leave, which will also contribute to a reduction in the proportion of accounts with non-zero edits.
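As an illustration of the measure behind H2, the following is a minimal sketch of how the proportion of accounts with at least one edit in their first 30 days could be computed. The input layout (a mapping from user name to registration time and edit timestamps) and the function name `proportion_with_edits` are assumptions made for this example, not the study's actual data pipeline.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # observation window after registration


def proportion_with_edits(accounts):
    """Share of accounts with at least one edit in their first 30 days.

    `accounts` is a hypothetical mapping of
    user name -> (registration datetime, list of edit datetimes).
    Edits to pages that were later deleted should be included in the lists.
    """
    if not accounts:
        return 0.0
    active = sum(
        1
        for registered, edits in accounts.values()
        if any(registered <= ts < registered + WINDOW for ts in edits)
    )
    return active / len(accounts)


if __name__ == "__main__":
    reg = datetime(2017, 9, 1)
    sample = {
        "A": (reg, [reg + timedelta(days=1)]),   # edits on day 1
        "B": (reg, []),                          # never edits
        "C": (reg, [reg + timedelta(days=45)]),  # first edit after the window
    }
    print(proportion_with_edits(sample))
```

Comparing this proportion for cohorts registered before and during the trial would give one view of whether H2 holds.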
H3: Proportion of accounts reaching autoconfirmed status within the first 30 days since account creation is unchanged.
We have so far not found any existing research on the proportion of users who reach autoconfirmed status within a specific time period. Studies on Wikipedia contributors that mention autoconfirmed status have looked at, for example, spam attack vectors and page protection. Two studies of roles in Wikipedia did not distinguish between accounts with and without autoconfirmed status, instead combining both into a "new user" group.
There is presently no requirement to make a certain number of edits/contributions in order to unlock privileges on the English Wikipedia, and the site is not gamified in the way that, for example, Stack Overflow is. We do not know to what extent newly registered contributors seek to unlock these privileges. First, if a large proportion of newly registered accounts make edits that are subsequently deleted (e.g. because they created a non-encyclopedic article), and this behaviour leads to their accounts not reaching autoconfirmed status, then those who do reach said status can be regarded as productive members of the community. Second, H2 proposes that the proportion of accounts with non-zero edits is reduced, suggesting that these newly registered accounts will never reach autoconfirmed status. In combination, there are reasons to hypothesize that the proportion of accounts reaching autoconfirmed status within 30 days of registering will be unchanged.
H4: The median time to reach autoconfirmed status within the first 30 days is unchanged.
Similarly, we hypothesise that the time it will take for a newly registered account to reach autoconfirmed status does not change.
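To make the measures in H3 and H4 concrete, here is a hedged sketch of how the time to reach autoconfirmed status could be derived from edit timestamps. It assumes the English Wikipedia thresholds of ten edits and an account age of four days; the constants, function names, and input layout are illustrative assumptions rather than the study's implementation.

```python
from datetime import datetime, timedelta
from statistics import median

MIN_EDITS = 10                # assumed edit-count threshold
MIN_AGE = timedelta(days=4)   # assumed account-age threshold
WINDOW = timedelta(days=30)   # observation window after registration


def time_to_autoconfirmed(registered, edits):
    """Time from registration until both thresholds are met.

    Returns None if the account does not qualify within WINDOW.
    """
    ordered = sorted(ts for ts in edits if ts >= registered)
    if len(ordered) < MIN_EDITS:
        return None
    tenth_edit = ordered[MIN_EDITS - 1]
    # Both conditions must hold, so the later of the two moments counts.
    reached = max(tenth_edit, registered + MIN_AGE)
    elapsed = reached - registered
    return elapsed if elapsed <= WINDOW else None


def median_days_to_autoconfirmed(accounts):
    """Median days to autoconfirmed among accounts that qualify.

    `accounts` is a hypothetical iterable of (registered, edits) pairs.
    """
    times = []
    for registered, edits in accounts:
        elapsed = time_to_autoconfirmed(registered, edits)
        if elapsed is not None:
            times.append(elapsed.total_seconds())
    if not times:
        return None
    return median(times) / 86400  # seconds per day
```

The share of accounts for which `time_to_autoconfirmed` returns a value corresponds to the proportion in H3, and the median over those values to H4.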
H5: The proportion of surviving new editors who make an edit in their fifth week is unchanged.
When it comes to measuring editor retention, the WMF research staff has at least two definitions: Returning new editor, and Surviving new editor. The difference between the two is what they measure (edit sessions and edits, respectively), and what kind of timespan is defined. Given the other measures and hypotheses we have, the measure of "returning new editor" is most likely covered elsewhere, meaning we are mainly concerned with surviving new editors. Due to the time constraints of the trial, we propose measuring retention as the proportion of new editors who make at least one edit in their first week and who also make at least one edit in their fifth week (as that week crosses the 30 day threshold after registration). From the previous hypotheses, it follows that the proportion of surviving new editors should not change.
We would like to analyze surviving new editors further by segmenting this population into those who start out by creating a new article, and those who start out by editing existing articles. When looking at historical data it would also be useful to split the former group depending on whether their article survives or not. Taken together, these measures should tell us more about the extent to which newly registered accounts stick around, and what the effect of new article creation is on contributions and retention.
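The retention measure proposed for H5 can be sketched as follows. The example treats the fifth week as days 28 to 34 after registration, which is one reading of the definition above; the helper names and the input layout are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def edited_in_week(registered, edits, week):
    """True if any edit falls in the given 1-indexed week after registration."""
    start = registered + timedelta(weeks=week - 1)
    end = registered + timedelta(weeks=week)
    return any(start <= ts < end for ts in edits)


def survival_proportion(accounts):
    """Share of first-week editors who also edit in their fifth week.

    `accounts` is a hypothetical iterable of (registered, edits) pairs.
    Returns None when no account edited in its first week.
    """
    cohort = [(r, e) for r, e in accounts if edited_in_week(r, e, 1)]
    if not cohort:
        return None
    survivors = sum(1 for r, e in cohort if edited_in_week(r, e, 5))
    return survivors / len(cohort)
```

Segmenting by how an account started out (creating an article versus editing existing ones) would amount to calling `survival_proportion` separately on each subgroup.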
H6: The diversity of participation done by accounts that reach autoconfirmed status in the first 30 days is unchanged.
There are no restrictions on where to make the edits needed to reach autoconfirmed status. A newly registered account might make five copy edits to articles, a couple of edits to create a user page to describe themselves, and some edits to discussion pages (e.g. the Teahouse) in order to reach that limit, while another account might make ten article edits. We argued earlier that the population of newly registered accounts that reach autoconfirmed status will not change significantly, meaning that the diversity (e.g. number of different pages edited) of work done by these accounts will also not change.
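One possible way to quantify diversity of participation is sketched below. The number of distinct pages edited comes from the text above; the namespace count and the Shannon entropy over namespaces are additional measures assumed here for illustration, not measures the study has committed to.

```python
from collections import Counter
from math import log2


def participation_diversity(edits):
    """Summarise how widely an account's edits are spread.

    `edits` is a hypothetical list of (namespace, page_title) tuples,
    one tuple per edit.
    """
    pages = set(edits)
    ns_counts = Counter(ns for ns, _ in edits)
    total = sum(ns_counts.values())
    # Shannon entropy (in bits) of the namespace distribution;
    # 0.0 means every edit was made in a single namespace.
    entropy = sum(
        (n / total) * log2(total / n) for n in ns_counts.values()
    )
    return {
        "distinct_pages": len(pages),
        "distinct_namespaces": len(ns_counts),
        "namespace_entropy": entropy,
    }
```

Under this sketch, an account making ten edits to a single article scores lower on all three measures than one that splits its edits between articles, its user page, and discussion pages.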
H7: The average number of edits in the first 30 days since registering is reduced.
We hypothesized that the proportion of accounts with a non-zero number of edits will be reduced because some of these accounts will be abandoned. This will likely also affect the average number of edits made in the first 30 days since registering. Note that in this case we will also count edits to pages that are subsequently deleted (e.g. non-encyclopedic articles).
H8: The number of requests for "confirmed" status is increased.
The English Wikipedia has a page where users can request "confirmed" status: Requests for permissions/Confirmed. A newly registered account might want to go there to request said status in order to not have to meet the requirements for autoconfirmed status. At the same time, someone who is new to Wikipedia might not know that this possibility exists. We hypothesized earlier that some newly registered accounts have a single purpose, and we might expect some proportion of these to seek out venues other than reaching autoconfirmed status, particularly if the venue is perceived as low cost. This can lead to a small but significant increase in the number of requests for "confirmed" status.
H9: Number of patrol actions will decrease.
The number of articles created per day will be reduced since non-autoconfirmed users can no longer create them. This leads to a reduction in the influx of articles into the New Page Patrol queue, and a subsequent reduction in the number of patrol actions.
The ratio of patrol actions to created articles is unchanged. In other words, patrollers notice that there is less work to do and adjust their efforts accordingly.
H10: Number of active patrollers will decrease.
With a lower influx of articles there is less work to do, leading some patrollers to stop patrolling. Given the lower workload, it is unlikely that new patrollers will be recruited.
The ratio of active patrollers to created articles is unchanged. This follows logically from H10.
H11: The distribution of patrolling activity evens out.
Because of the reduction in the number of articles that enter the patrol queue, there is less need for the most active patrollers to do as much "heavy lifting". This in turn means the patrol work is distributed more evenly across the active patrollers.
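Whether patrol work "evens out" needs an inequality measure. The source does not name one, so the sketch below uses the Gini coefficient over patrol actions per patroller purely as an assumed example: a value of 0 means all active patrollers do the same amount of work, while values near 1 mean a few patrollers do almost all of it.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    `counts` is a hypothetical list with one entry per active patroller,
    giving that patroller's number of patrol actions in the period.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on the sorted values.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))   # perfectly even workload
    print(gini([0, 0, 0, 20]))  # one patroller does everything
```

A lower coefficient during the trial than before it would be consistent with H11.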
H12: The number of Time-Consuming Judgement Calls will decrease.
The analysis of New Pages Patrol by the Wikimedia Foundation describes the job of patrolling certain types of newly created articles as "Time-Consuming Judgement Calls" (TCJC). These articles are characterized as follows:
- They're probably notable but badly written
- They're well-written but have questionable notability
- They're a weird mix of both (because life is complicated)
From the WMF's analysis we know that 15% of the articles that are not quickly patrolled are created by non-autoconfirmed users. Because these users are no longer able to create new articles, we can hypothesize that the number of TCJCs will be reduced. We expect the influx of TCJCs created by autoconfirmed users to remain stable.
H13: The size of the backlog of articles in the New Page Patrol queue will remain stable.
It is not certain whether the reduction in the workload of New Page Patrollers will free up resources to go through and process the backlog of articles that still need patrolling, or if the patrollers will focus on other areas of the encyclopedia. Because we hypothesized in H9 and H10 that the number of patrol actions and patrollers would decrease, also relative to the number of created articles, it would follow that the size of the backlog remains stable.
H14: The survival rate of newly created articles by autoconfirmed users will remain stable.
When it comes to article survival, we are only concerned with the survival of articles created by autoconfirmed users. As we earlier hypothesized that these types of users will continue to be productive members of the community, we also hypothesize that the survival rate of their articles will remain stable. We adapt the definition of survival used by Schneider et al., meaning an article survives if it is not deleted during the first 30 days of its life.
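The 30-day survival definition can be expressed as a short sketch. The input layout, with a creation time and an optional deletion time per article, and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta

SURVIVAL_WINDOW = timedelta(days=30)


def survived(created, deleted):
    """True if the article was not deleted during its first 30 days.

    `deleted` is None for articles that have never been deleted.
    """
    return deleted is None or deleted - created >= SURVIVAL_WINDOW


def survival_rate(articles):
    """Share of surviving articles.

    `articles` is a hypothetical iterable of (created, deleted) pairs.
    Returns None for an empty input.
    """
    articles = list(articles)
    if not articles:
        return None
    return sum(1 for c, d in articles if survived(c, d)) / len(articles)
```

Articles created fewer than 30 days before the data was collected have not yet had a full window in which to be deleted, so they would need to be excluded before calling `survival_rate`.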
H15: The rate of article growth will be reduced.
The previously mentioned statistics on article creation and survival indicated that around 50–100 surviving articles are created each day by non-autoconfirmed users. As before, we have hypothesized that it is unlikely that newly registered accounts will do the work necessary to reach autoconfirmed status, meaning that these articles will no longer be created, which will reduce the rate of article growth on the English Wikipedia.
We are also interested in understanding more about how articles come to be. To what extent are they created as drafts in the user namespace and then moved into the article namespace? Secondly, are creations and moves done by recently autoconfirmed accounts? The proportion of articles created as moves might increase due to it becoming an alternative approach to article creation. Since we hypothesized that the proportion of accounts reaching autoconfirmed status in 30 days will remain unchanged, we would also hypothesize that creations and moves by recently autoconfirmed accounts would remain stable.
H16: The rate of new submissions at AfC will increase.
We expect some proportion of newly registered accounts to make a failed attempt at creating an article and then choose to use the Article Wizard instead, meaning their proposed article will be added to the review queue at Articles for Creation (AfC). This should result in an increase in the rate of proposed articles reaching AfC.
H17: The backlog of articles in the AfC queue will increase faster than expected.
The process of handling AfC submissions is similar to NPP in that it requires human intervention. Since we proposed that the NPP backlog will decrease due to the lower rate of influx of articles there, we must similarly propose that the backlog of articles at AfC will increase.
H18: The reasons for deleting articles will remain stable.
When it comes to reasons for why articles get deleted, we again refer to our previous hypothesis that newly created accounts that reach autoconfirmed status will be productive members of the community. This means that article creations will remain stable and the reasons for why articles get deleted will not change significantly.
H19: The reasons for deleting non-article pages will change towards those previously used for deletion of articles created by non-autoconfirmed users.
When it comes to reasons for why non-article pages get deleted, we expect to see a change as newly registered accounts can create pages outside the main (article) namespace (e.g. AfC as mentioned above, or their user page or sandbox). Some of this content will likely be candidates for speedy deletion due to copyright infringement (G12) or "Blatant misuse of Wikipedia as a web host" (U5). In other words, we expect to see a change in the reasons for deletions of non-article pages.
H20: The quality of articles entering the NPP queue will increase.
The quality of articles entering the NPP queue will change due to the removal of articles created by non-autoconfirmed users. Those articles entering the queue that were originally created by non-autoconfirmed users will have gone through the AfC process, meaning they should be of higher quality than before, leading to an increase in the average quality of articles in the NPP queue.
H21: The quality of newly created articles after 30 days will be unchanged.
Most of the English Wikipedia's articles are lower quality (Stub- and Start-class) with low readership, and over time the articles that are created are increasingly on niche topics. We therefore expect the quality of newly created articles after 30 days to remain unchanged.
H22: The quality of articles entering the AfC queue will be unchanged.
Little is currently known about the quality of content entering the AfC queue. Research on AfC has instead focused on whether the process leads to greater success in publishing articles or retaining contributors. We have hypothesized that the influx of new drafts into AfC will increase somewhat due to newly registered accounts not being able to create articles in the main namespace, but we do not know to what extent those who choose to do so are those who create higher quality content to begin with. Given this uncertainty about the current state of articles entering AfC as well as the creators of these drafts, we hypothesize that the quality of articles entering AfC will not change.
- Sara Drenner, Shilad Sen, and Loren Terveen. 2008. Crafting the initial user experience to achieve community goals. In Proceedings of RecSys.
- Andrew G. West, Jian Chang, Krishna Venkatasubramanian, Oleg Sokolsky, and Insup Lee. 2011. Link Spamming Wikipedia for Profit. In Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference, 152–161.
- Benjamin Mako Hill and Aaron Shaw. 2015. Page Protection: Another Missing Dimension of Wikipedia Research. In Proceedings of OpenSym.
- Ofer Arazy, Oded Nov, and Felipe Ortega. 2014. The [Wikipedia] World is Not Flat: on the organizational structure of online production communities.
- Ofer Arazy, Felipe Ortega, Oded Nov, Lisa Yeo, and Adam Balila. 2015. Functional Roles and Career Paths in Wikipedia. In Proceedings of CSCW.
- Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2013. Steering user behavior with badges. In Proceedings of WWW.
- Jodi Schneider, Bluma S. Gelley, and Aaron Halfaker. 2014. Accept, decline, postpone: How newcomer productivity is reduced in English Wikipedia by pre-publication review. In Proceedings of OpenSym.
- M. Warncke-Wang, V. Ranjan, L. Terveen, and B. Hecht. 2015. Misalignment Between Supply and Demand of Quality Content in Peer Production Communities. In Proceedings of ICWSM.
- Shyong (Tony) K. Lam and John Riedl. 2009. Is Wikipedia growing a longer tail? In Proceedings of GROUP.
- Research:Wikipedia article creation