Research talk:Wikipedia article creation

From Meta, a Wikimedia project coordination wiki

Work log

Archive

2013

2014


Discussion

Pre or post-hoc assessment

Thank you for including a point about investigating other languages. There is something that makes the current text sound a bit weak to me, but I'm not sure what it is, yet. I've followed the breadcrumbs this way:

  • "make sure that new article creators are not subject to unnecessary deletion" hmm, if a page must be deleted it must be deleted,
  • and indeed "improve the tools for creating a quality new article" is not the point if you should not be creating that article in the first place,
  • so the origin of the problem may be in that "make article creation a rewarding experience": not e.g. for vandals, please. ;-)

The problem you are proposing to measure is that users understand what is (or is not) an acceptable new article only after creating it, which is inefficient; ideally, they should (mostly) understand that before creating it, so that we reduce friction for those who do and we attract more of those who could create a good article but don't know how. This makes perfect sense. However, if we read those three bullets as being about "pre-publishing article creation education" of users, they could use some rewording, because "entry points" and "tools for creating" contain assumptions about what such education could look like, and "unnecessary deletion" misses the point (when an article is rightly deleted, it is the creation that was unnecessary, not the deletion).
This brings me to a more general point: what we really want to achieve here is an increase in good article creation; the rest are just correlates. The hidden assumption is that we can achieve that by reducing the positive energy wasted in fruitless activities (a waste which generates vicious circles). So what you should measure is how much positive energy we are wasting (in potentially good new articles which get deleted and are not worked on any longer, but possibly also in articles which are never created, e.g. because you first need to register) and what makes a good-faith user more likely to understand and succeed in creating an acceptable new article. I understand, of course, that this is not easy, but it should be translatable into concrete metrics and actionable questions (where do acceptable creations currently come from? what previous editing experience do good creators have, if any? what understanding of the project do they have and how did they acquire it? what does an acceptable first article look like? etc.).
Practically speaking, to mention a specific issue, I'm fairly sure that en.wiki has more "noise" from clearly vandalistic or self-promotional new articles than all the other wikis; if you include all this noise in your measures, you will not see what's really going on in en.wiki and you won't understand where the differences lie in other wikis. --Nemo 10:48, 27 September 2013 (UTC) P.s. Sorry, I didn't even want to write this comment and it ended up a poem.

Just responding to your bullet points... I will try to clarify the language of the draft. Let me put it this way: we know for a fact that many many good faith contributors are simply confused by our process for article creation, and don't understand what makes a good first article. For this work, that's the kind of new editor I want to focus on helping. I'm not interested in transforming deletion criteria to lower the quality bar, or changing the experience for people who know what they're doing. Thanks for the comments Nemo. Steven Walling (WMF) • talk 21:30, 27 September 2013 (UTC)
Yes, I know you aren't, of course. :) Thanks for trying to make sense out of the train of thought above. --Nemo 06:21, 28 September 2013 (UTC)

Some thoughts from the field

In my experience, the most embarrassing moments are deletions during edit-a-thons, and I have tried to avoid these by explaining that your first article should look as much as possible like an existing article in the same category. This way, the user does not need to "understand what is (or not) an acceptable new article", but is just filling in the blanks, using the other article as a template. Usually in a themed edit-a-thon, you are working on a few categories, and this narrows the scope of your "how-to" speech considerably. The new article wizard should work like an interactive search, presenting the user along the way with existing articles that meet the criteria. This way you could avoid duplicate biographies and place names created under the different spellings used in search, and the end result could be a new redirect. Jane023 (talk) 07:52, 6 November 2013 (UTC)
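Jane's idea of a wizard that behaves like an interactive search can be sketched with Python's standard `difflib` and `unicodedata` modules. This is a minimal illustration, not how any existing article wizard works: `normalize_title`, `suggest_existing` and the 0.8 similarity cutoff are hypothetical choices. Folding case and diacritics first lets a title typed without accents still find the accented article.

```python
import difflib
import unicodedata


def normalize_title(title: str) -> str:
    """Fold case, strip diacritics and collapse whitespace so variant spellings compare closely."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def suggest_existing(proposed: str, existing_titles: list[str],
                     cutoff: float = 0.8, limit: int = 3) -> list[str]:
    """Return existing titles that look like spelling variants of the proposed title.

    `cutoff` and `limit` are hypothetical tuning values, not taken from any real wizard.
    """
    by_normalized = {normalize_title(t): t for t in existing_titles}
    matches = difflib.get_close_matches(normalize_title(proposed), list(by_normalized),
                                        n=limit, cutoff=cutoff)
    return [by_normalized[m] for m in matches]


# A wizard could show these before the user starts writing; a hit might become a redirect.
print(suggest_existing("Johannes Vermer", ["Rembrandt", "Johannes Vermeer", "Jan Steen"]))
```

Matching on normalized titles keeps the similarity score focused on genuine spelling differences; whether a near match should block creation or only offer a redirect is a design question the sketch leaves open.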

@Jane023: Thanks for commenting here, I really appreciate it. The idea of having a template to "fill in the blanks" with is a good one. This follows a usability principle that says that the best way to deal with potential missteps by a user is to simply design a system that prevents them from making major mistakes (as opposed to generating warnings after they make a mistake). Steven Walling (WMF) • talk 05:05, 7 November 2013 (UTC)
@Steven (WMF): I'm going to echo Jane's suggestion to use a fill-in-the-blanks template for article creation. One way to do this would be to use the Manual of Style suggestions at en:WP:MOS and their sub-pages. For example, a new article on a TV show could use a template based on the suggestions at en:MOS:TV. Since the Manual of Style pages for various article types are fairly detailed, it should be relatively easy to set up the various templates. 64.40.54.179 04:10, 8 November 2013 (UTC)
One issue with a subject-based approach to templates for page creation, though, is that it scales poorly. This method can be quite specific to large wikis like English, German, French, and Spanish that have WikiProjects, many versions of the Manual of Style, and so on. It doesn't adapt well to smaller Wikipedias, or to edge cases regarding subjects we don't cover heavily. In that way, it can also potentially reinforce our systemic bias, by making it extra easy to create certain article subjects while disadvantaging others. If we can, I would prefer to consider what a "minimum viable article" might look like regardless of subject, and shoot for making sure all newly-created pages (or at least the majority) meet this basic standard. Steven Walling (WMF) • talk 05:20, 8 November 2013 (UTC)
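A subject-independent "minimum viable article" could start as a handful of mechanical checks. The sketch below is purely illustrative: the 300-character prose threshold, the reference test and the regex-based markup stripping are assumptions, not an agreed standard, and `check_minimum_viable` is a hypothetical name. A real checker would parse wikitext properly rather than rely on regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical threshold for illustration only; not an agreed community standard.
MIN_PROSE_CHARS = 300
# Matches "<ref>", "<ref name=...>" or "<ref/>", but deliberately not "<references/>".
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)
# Crude markup stripper: flat templates, category links and HTML-like tags.
# Nested templates are not handled correctly.
MARKUP = re.compile(r"\{\{.*?\}\}|\[\[Category:[^\]]*\]\]|<[^>]+>", re.DOTALL | re.IGNORECASE)


@dataclass
class ViabilityReport:
    ok: bool
    problems: list[str] = field(default_factory=list)


def check_minimum_viable(wikitext: str) -> ViabilityReport:
    """Apply a few subject-independent checks to a draft's wikitext."""
    problems = []
    prose = MARKUP.sub("", wikitext).strip()
    if len(prose) < MIN_PROSE_CHARS:
        problems.append("too short")
    if not REF_TAG.search(wikitext):
        problems.append("no references")
    return ViabilityReport(ok=not problems, problems=problems)
```

Checks of this kind are the same for every subject, which is the property the comment above asks for; anything that depends on the topic (notability, style) is left to human reviewers.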
Categories exist everywhere. I'm a fan of docs, but people mostly learn by imitation as Jane said. --Nemo 08:08, 8 November 2013 (UTC)
@Steven (WMF): Sorry I am so slow to react. I meant to link a small recap of an edit-a-thon I uploaded to Commons here. There I show how translating articles from one language Wikipedia to another entails translating the metadata (categories, references, pictures, etc.). You could create an article wizard for translation of cultural heritage topics this way. I was thinking of starting with (short) article translations from non-English Wikipedias to the English Wikipedia, notifying the relevant country portals on their talk pages of each new article added. Also, I have been working on a matchup list (10,000 names and counting) for the BBC Your Paintings project. It has brought me some interesting insights into the world of fine art (at least half of all 1000 painters in the collection of the Rijksmuseum of Amsterdam are also listed in public collections in the UK, and of that 1000, only 24 are women). The en:List of painters in the collection of the Rijksmuseum shows a bunch of redlinks, and for this group, at least half are also represented in public collections in the UK. So for the group of painters in the Your Paintings list, it would be nice to create an article wizard as a test pilot for other dedicated article wizards. Jane023 (talk) 11:21, 12 January 2014 (UTC)
  • One thought from the field: When AfC-ing, the reviewer really has three options. Accept, reject, or pass. It's very tempting to pass.
  • Clear rejections are easy. One can skim through submissions, and get about a rejection done per minute with AFCH.
  • Clear acceptances are rarer ("diamonds in the rough"). It probably takes about 3-5 minutes to accept a perfect draft.
  • A great many articles are quite borderline, and the width of this border depends on the reviewer's confidence. They may require a lot of research to determine whether they are notable or not. They may be on generally notable topics but in a poor state of style and/or language. The incentive for the reviewer to be bold is low: they do not necessarily want to add an item to their watchlist that they will then need to shepherd for a week, and there's quite a potential for flak from submitters for a "harsh" rejection, or from the wider community for a "lenient" acceptance. It is very tempting to pass on reviewing these.
I'm not proud of my sheepishness but I do this frequently. It is very clear from AfC's in-tray that a lot of reviewers do this. The 0 and 1 day old lists are full of very-easy-to-reject articles. Those left in the 7+ days old categories are almost all borderline. They've almost certainly been read by at least one reviewer who decided to pass.
A product of this is that writers of terrible articles get a much quicker review than those who write halfway-decent drafts. It is the latter who AfC should really be helping, as they're getting close to getting a good article out. However they end up with week(s) between comments on their drafts, which is not a very supportive environment.
--LukeSurl (talk) 15:25, 12 November 2013 (UTC)
@LukeSurl: Thanks for the comments. I did a bit of AfC reviewing over the weekend, and I can see what you mean. Do you think this case of borderline articles could be improved by notifying reviewers with some knowledge of the subject? We've kicked around the idea of having people sign up in some way to review only articles in a certain topic, like medicine, history, or sports. That way, WikiProject members or others interested could be notified when there's a new article in their areas of expertise, and they could give an article more attention. I've certainly seen that people have a hard time knowing what's notable or not in my favorite area, which is WikiProject Agriculture. Steven Walling (WMF) • talk 07:49, 13 November 2013 (UTC)
AfC drafts aren't categorised, so I'm guessing you're looking at a "refer to expert" button in AFCH? Could be worth experimenting with perhaps.
The simple "solution" to AfC's issues would be more manpower. If there were more volunteers working on the project, the backlog would dissipate, we could give more detailed and useful reviews, and could spend more time improving the articles that get accepted to mainspace. At the moment the high volume of submissions and the small number of reviewers mean that reviewers have to work at maximum possible efficiency, which means mostly just clicking a button on AFCH and moving to the next one. --LukeSurl (talk) 21:51, 13 November 2013 (UTC)
Proto-articles in Draft-space can and should have relevant WikiProject banners placed on their talk pages as soon as possible. That way Projects will be (passively) notified of the draft's existence: the page is simply added to the project's list. To change this passive notification into an active one ("Attention! Here is a new proto-article within the scope of this project, please help to make it fit for mainspace"), the way the "class=Draft" banner parameter is handled needs to be changed. I have tried a number of times in recent weeks to argue that the class=Draft parameter needs to be one of the standard set used by all WikiProjects instead of being optional as at present. If the Draft parameter is recognised by all WikiProjects, then the mechanisms used by projects to sort by class can be used to generate an actual notification. However, my attempts at raising this issue have been met with responses ranging from misunderstanding through indifference to active rejection. If such a parameter is implemented, it would be an easy matter to combine it with a "subject specialist needed" tag to attract the attention of the relevant editors. Dodger67 (talk) 12:23, 2 March 2014 (UTC)
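If every WikiProject recognised `class=Draft`, turning passive banners into active notification lists could be as simple as grouping drafts by project. A rough sketch, assuming banners are templates whose names begin with "WikiProject " and carry a `class` parameter; `draft_projects` and `notification_lists` are hypothetical names, and the regular expressions cannot handle nested templates (a bot would more likely use a dedicated wikitext parser such as mwparserfromhell).

```python
import re
from collections import defaultdict

# Assumes banners are templates named "WikiProject ..." with a "class" parameter.
# Regexes cannot handle nested templates; this is a simplification.
BANNER = re.compile(r"\{\{\s*(WikiProject [^|{}]+)((?:\|[^{}]*)?)\}\}", re.IGNORECASE)
CLASS_PARAM = re.compile(r"\|\s*class\s*=\s*([^|}]*)", re.IGNORECASE)


def draft_projects(talk_wikitext: str) -> list[str]:
    """Names of WikiProject banners on a talk page that carry class=Draft."""
    projects = []
    for match in BANNER.finditer(talk_wikitext):
        name, params = match.group(1).strip(), match.group(2)
        cls = CLASS_PARAM.search(params)
        if cls and cls.group(1).strip().lower() == "draft":
            projects.append(name)
    return projects


def notification_lists(talk_pages: dict[str, str]) -> dict[str, list[str]]:
    """Group draft talk pages by WikiProject: the list each project could be alerted about."""
    lists = defaultdict(list)
    for title, text in talk_pages.items():
        for project in draft_projects(text):
            lists[project].append(title)
    return dict(lists)
```

The grouping step is where a "subject specialist needed" flag could also be read, so that each project's list highlights the drafts that most need expert attention.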

En's Articles for creation question

@Steven (WMF): asked the English Wikipedia's Articles for creation team for help in measuring how many articles come through the AFC process.

en:Wikipedia:WikiProject Articles for creation and the pages it links to at the top, especially en:Wikipedia:WikiProject Articles for creation/Submissions and en:Wikipedia:WikiProject Articles for creation/Showcase, may be helpful.

Please be advised that these statistics may or may not include pages that are deleted. Copyright violations and certain en:WP:Biographies of living persons issues such as attack pages are deleted immediately. "Abandoned" submissions are routinely deleted after 6 months under en:WP:Criteria for speedy deletion#G13. These deletions may seriously disrupt statistics based on time-frames more than 6 months ago.
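The six-month G13 rule means that a statistics window starting too far back may already be missing abandoned submissions. A minimal check, assuming the six-month figure from the paragraph above and treating any window that begins before that cutoff as potentially incomplete; `months_before` and `window_may_miss_g13_deletions` are hypothetical helpers, and since G13 concerns drafts left unedited, an old window is only *possibly* affected.

```python
import calendar
from datetime import date

# Rule of thumb from the paragraph above: abandoned AfC submissions are deleted after
# about six months (G13). Check the current policy text before relying on this number.
G13_MONTHS = 6


def months_before(day: date, months: int) -> date:
    """Same calendar day `months` earlier, clamped to the end of a shorter month."""
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def window_may_miss_g13_deletions(window_start: date, today: date) -> bool:
    """True if part of the window is old enough for abandoned drafts to be gone already."""
    return window_start < months_before(today, G13_MONTHS)
```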

For further help, please ask on en:Wikipedia talk:WikiProject Articles for creation. Davidwr/talk 16:39, 7 November 2013 (UTC)

  • There are about 200 submissions every day, give or take. Many articles are counted more than once because they are resubmitted several times. There are about 10-20 accepted submissions per day. Anything that can discourage people from wasting reviewers' time with stuff that just doesn't belong, without discouraging them from submitting anything with real possibilities, would be good. Things like extremely short submissions, submissions with no references at all, copyright violations, and blatant advertising just waste time. Automated ways to either discourage editors from making such submissions or to categorize them for quick screening by editors who specialize in such things would be helpful.
One issue that really can't be automatically detected is the problem of vanity submitters who are writing about a borderline-maybe-notable-maybe-not subject whose initial draft is too promotional but not blatantly so. The reviewers have to simultaneously decide if the topic is notable and spend time trying to convince the submitter to make the submission more encyclopedic in tone. Davidwr/talk 03:58, 8 November 2013 (UTC)
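The figures given here and in LukeSurl's list above allow a rough worked estimate. Assuming about 200 submissions a day with 10-20 accepted, roughly one minute per clear decline and 3-5 minutes per acceptance, the sketch computes the acceptance rate and an optimistic floor on daily reviewing time. The function names are hypothetical, and the model ignores borderline drafts entirely, which is exactly where the comments say the real cost lies, so the result is only a lower bound.

```python
def acceptance_rate(accepted: int, submissions: int) -> float:
    """Share of daily submissions that are accepted (resubmissions inflate the denominator)."""
    return accepted / submissions


def review_minutes_floor(submissions: int, accepted: int,
                         minutes_per_decline: float = 1.0,
                         minutes_per_accept: float = 3.0) -> float:
    """Optimistic daily reviewing time: assumes every non-accepted submission is a
    clear, one-minute decline, so borderline drafts needing research are ignored."""
    declined = submissions - accepted
    return declined * minutes_per_decline + accepted * minutes_per_accept


low = review_minutes_floor(200, 10, minutes_per_accept=3.0)   # 190 + 30
high = review_minutes_floor(200, 20, minutes_per_accept=5.0)  # 180 + 100
print(acceptance_rate(10, 200), acceptance_rate(20, 200), low, high)
```

With these inputs about 5-10% of submissions are accepted, and the floor is 220-280 reviewer-minutes (roughly 3.7-4.7 hours) per day before any borderline research is done.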
I firmly believe that the AfC team already knows how best to address its issues of backlogs and quality of reviewing: 1) that it needs some kind of proof of competency for reviewing, and 2) a new 'draft' namespace that will enable the reviewing process to be more coherent. The research proposed (or already begun) by the Foundation is surplus to those requirements. Kudpung (talk) 05:00, 9 November 2013 (UTC)
Yes, the problem is behavioral, not technical, and needs to be addressed by improving the quality of work at enWP. We know what we need to do--what is unknown is whether we have enough concerned editors with high standards to do it. Unless the policy on open editing changes, we cannot keep people from submitting impossibly unacceptable articles. Although I am certainly prepared to accept the possibility and even desirability of research to elucidate editor behavior and provide information that will be helpful to our work, I am not at all sure the Foundation has the capability to provide research that will lead to improvements in editor or article quality. DGG (talk) 05:50, 10 November 2013 (UTC)
If you think the problem is social, then the community has the power to change social constructs around article creation at any time. I would urge you to go forth and solve whatever social problems you think do not require technical intervention. However, I do not think the problem is social; I think it has everything to do with the structure and tools afforded to new authors and existing experienced editors. On the one hand, we have thousands of first time authors, who are on the whole well-intentioned. On the other hand, we have hundreds or thousands of people who want to review content and make sure it meets a basic quality standard. With these two groups with the best of intentions, why does the process fall down? Because we are doing these social interactions via software, not face to face, and if the software is broken then no amount of social good will can improve things. I intend to test improvements to the platform for creating and reviewing new articles (and/or new drafts), with this understanding in mind. Steven Walling (WMF) • talk 00:19, 11 November 2013 (UTC)

Categories

@Halfak (WMF): like I mentioned today, I think the three top-level categories we're interested in are: Steven Walling (WMF) • talk 23:18, 8 November 2013 (UTC)

  1. Category:AfC submissions by date
  2. Category:Declined AfC submissions, which is segmented by reason for decline
  3. and Category:Accepted AfC submissions, which is actually a list of Talk pages. This is because accepted AfC submissions have a Talk page template on them that marks them as such.
Thanks Steven Walling (WMF). I'll need to get those categories historically which is going to take some substantial text processing. I'll look into getting some preliminary results based on a sample. --Halfak (WMF) (talk) 23:29, 8 November 2013 (UTC)
Yeah, if we just do the last month, maybe two, of submissions, that's fine. Steven Walling (WMF) • talk 00:05, 9 November 2013 (UTC)
Comment about statistics: Although many of the declined submissions to Afc are deleted, the stored copies of the deleted material should still contain the category tags and Afc templates that they had before being deleted, so it should be possible for those who have access to this material to search and compile data about the deleted submissions as well as the still active submissions and the accepted articles, and have a fairly complete picture of AfC. The only pages that may be missed are those which have been copied and pasted manually by impatient or disappointed submitters from Afc into the encyclopedia without being accepted. Sometimes the AfC submissions left over after this has been done end up being history-merged into the mainspace articles, but there are no corresponding Articles for creation project banners on the resulting articles, since they were never accepted. Some of these end up being sent to Afd by the New Pages Patrol and disappear. Anne Delong (talk) 03:00, 13 November 2013 (UTC)
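Anne's point suggests the AfC status of deleted submissions can be recovered from their stored text. A minimal sketch of that text processing, assuming the status is encoded in the first positional parameter of the `{{AFC submission}}` template; the codes in `STATUS_CODES` are assumptions to check against the template's documentation, `afc_statuses` and `tally` are hypothetical names, and access to deleted revisions is outside the sketch.

```python
import re
from collections import Counter

# Assumption: the first positional parameter of {{AFC submission}} encodes the status.
# These codes are illustrative; verify them against the template documentation.
STATUS_CODES = {"": "pending", "d": "declined", "r": "under review", "t": "unsubmitted draft"}
AFC_TEMPLATE = re.compile(r"\{\{\s*AFC[ _]submission\s*\|([^|}]*)", re.IGNORECASE)


def afc_statuses(wikitext: str) -> list[str]:
    """Status of every AfC submission template found in one revision's text."""
    return [STATUS_CODES.get(code.strip().lower(), "unknown")
            for code in AFC_TEMPLATE.findall(wikitext)]


def tally(revision_texts: list[str]) -> Counter:
    """Count template statuses across pages (e.g. the last stored text of each draft,
    deleted or not). Resubmitted drafts contribute one entry per template."""
    counts = Counter()
    for text in revision_texts:
        counts.update(afc_statuses(text))
    return counts
```

Pages that were copied into mainspace by hand, as described above, carry no such template after the move, so this approach would miss them unless the draft's own text is still available.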
How often would you say the manual copy/paste move scenario happens a month? Less than 10? Dozens? Hundreds? If it is very common, it's something we'd need to account for not just in our metrics, but in building a Draft namespace per the Village Pump proposal. Steven Walling (WMF) • talk 07:45, 13 November 2013 (UTC)

Top 8 Wikipedias

How were those 8 wikis chosen? They don't follow the order of Top 10 Wikipedias. --Nemo 11:05, 20 January 2014 (UTC)

Nemo: Fair question. I'm having issues extracting data from some wikis, so I have been updating this page as data comes in. We may not get all 10, but I think that's acceptable since we're just trying to get a sense for the diversity in top wikis. As for how the set was selected, User:Steven (WMF) will need to comment. --Halfak (WMF) (talk) 20:49, 22 January 2014 (UTC)
They don't follow the order within the top 10, but they are all in the top 10. As Aaron says, we may not get all top ten because of limitations of some of the slave databases we use for research purposes. The point isn't to be comprehensive necessarily but to get a representative sample from our larger projects, so we're not just focusing on enwiki, which has some bizarre characteristics like AFC. Steven Walling (WMF) • talk 01:30, 23 January 2014 (UTC)
Ah, ok, then a line to note you had issues extracting data for some of the selected wikis will suffice (though I hope it.wiki, in s2 like pl.wiki, doesn't present any peculiar resistance to being analysed in the future!): thanks for the clarification. The page already notes that you ordered by number of new articles (manually?) created, so no problem on that side. --Nemo 12:08, 23 January 2014 (UTC)
Thanks for updating the graphs with the missing wikis. --Nemo 15:42, 13 February 2014 (UTC)
Np Nemo. I should have pinged you when it was done. We're working hard on changing our analytics infrastructure so that we won't have so much trouble doing this cross wiki work again. --67.6.71.132 16:03, 13 February 2014 (UTC)

"desirable new users tend to leave the wiki when their work is deleted"

I don't see any proof of this yet, only a correlation. I've not read the paper cited as source again, but I remember it didn't show this at all. --Nemo 11:09, 20 January 2014 (UTC)

The results are conclusive. The cited paper is my own work. I believe that I made this conclusion clear, but if you have specific concerns, I'll try to address them. --Halfak (WMF) (talk) 20:51, 22 January 2014 (UTC)
I'm rather sure the paper didn't include any manual (or even automated) check of the quality of the deleted pages, but it's been a long time since I read it so I'll need to read it again to comment. Anyway that's offtopic here, would the talk of Research:Newsletter/2012/September be a good venue for you? --Nemo 12:08, 23 January 2014 (UTC)
We manually reviewed the quality of newcomers' edits including the creation of pages that were deleted. I disagree with you. This topic is relevant here. --Halfak (WMF) (talk) 15:53, 23 January 2014 (UTC)

User sandbox example

The sandbox example (en:Fredrick_Kúmókụn_Adédeji_Haastrup) was actually moved from a userspace draft to AfC, then finally to main. The current text makes it seem like it's a direct move. Superm401 | Talk 01:47, 29 January 2014 (UTC)

I don't see that as relevant to the point being made. Is there a concern that I'm missing? --EpochFail (talk) 17:09, 11 February 2014 (UTC)

Survival rate for more experienced users

Is "the survival rate of userspace drafts created by slightly more experienced newcomers ("day-week" & "week-month") and experienced editors have a surprising low survival rate -- lower than even direct article creations" referring to German Wikipedia? Superm401 | Talk 04:33, 29 January 2014 (UTC)

Yup. Fixed. --EpochFail (talk) 17:12, 11 February 2014 (UTC)
This probably means that experienced editors create articles as user pages first when they're at risk/controversial, and more often than not they end up being deleted anyway? There might also be some major missing pattern here though (like deleted articles (re)created in user subpages and then (re)deleted or something). --Nemo 16:09, 13 February 2014 (UTC)

"In German Wikipedia, the survival of newcomer articles has been rising steadily since 2008"

I suppose you mean the anons here?

In Image:Survival_prop.smoother.by_original_namespace_and_tenure.dewiki.svg one rather sees ups and downs, despite a quick rise after 2008. The increase for anons can be seen in other wikis; what seems special is the stable survival rate for articles by (0,1] days old users. In all the other languages, except English which can't be compared, it goes down (and is particularly bad in it.wiki and pt.wiki?). However:

  • what's really missing here is some absolute number (though there are some current ones in "What is the success rate of articles created by new editors?"), which as in the case of en.wiki AfC may uncover something hidden by proportions, e.g. a possible decrease of new article creation by newcomers after FlaggedRevs was enabled;
  • the editor classes may make less sense for de.wiki due to FlaggedRevs again, more important being their FlaggedRevs permissions.

Thanks for all these cross-language comparisons, I love them and I think they are the way to go. I really missed seeing some more of them, after Felipe Ortega's thesis which had plenty. --Nemo 16:09, 13 February 2014 (UTC) P.s.: Misaligned y axis/scale for it.wiki here, the 0 is outside the image and the rest is not in proportion.
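Nemo's request for absolute numbers alongside proportions can be met by reporting counts next to every survival rate. The sketch below groups page creations by original namespace and creator tenure; the bucket edges, the `Creation` record and the meaning of "survived" are assumptions chosen to echo the labels on the research page, not the study's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Creation:
    original_namespace: str  # where the page started, e.g. "main", "user" or "afc"
    tenure_days: float       # creator's account age, in days, when the page was created
    survived: bool           # still an article at the end of the follow-up period


# Hypothetical bucket edges echoing the labels used on the research page.
BUCKETS = [(1, "(0,1] days"), (7, "day-week"), (30, "week-month")]


def tenure_bucket(days: float) -> str:
    for upper, label in BUCKETS:
        if days <= upper:
            return label
    return "experienced"


def survival_table(creations):
    """Map (namespace, tenure bucket) to (survived, total, proportion)."""
    cells = defaultdict(lambda: [0, 0])
    for c in creations:
        cell = cells[(c.original_namespace, tenure_bucket(c.tenure_days))]
        cell[0] += int(c.survived)
        cell[1] += 1
    return {key: (s, n, s / n) for key, (s, n) in cells.items()}
```

Keeping `total` in every cell makes it visible when a stable proportion hides a falling number of creations, the kind of effect Nemo suspects around FlaggedRevs.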

Hey @Nemo:. Sorry to miss this note. Re. raw page creators, see Media:New_editor_article_page_creators.relative.funnel_props.dewiki.svg. It looks like there was a dip in the proportion of new editors who create pages in 2010, but if anything, there was a corresponding rise in the proportion of new page creators who published articles. Do you know when tagged revisions was enabled and how it might have affected the process of creating new articles?
Re. "Misaligned y axis/scale for it.wiki here", I disagree. I think that, if you are plotting points and lines, you get to trim the y axis and not include the zero marker. The only time where this becomes a problem is when plotting values with a barchart. My school of thought on visualization of data is based on en:Edward Tufte's work on the subject. --Halfak (WMF) (talk) 14:57, 4 March 2014 (UTC)
Wikimedia Research & Data Showcase - February 2014

Hello Halfak (WMF) and Nemo! I want to second Nemo: Thank you very much for these cross-language comparisons, they really are the way to go! :-) I think this will often save us from superficial interpretations of data based only on enWP. Great work! And thanks for the video presentation! Some observations:

  • 1) I think I can explain the German anomaly of a 79.5% survival rate for userspace drafts by (0,1] days old users. A brand-new user on deWP can't "move" pages; only after 4 days/96 hours does he get this feature/"move dropdown" (de:Hilfe:Seite verschieben#Wer kann verschieben?, "autoconfirmed"). The new user can request to move his draft to the main article space at de:Wikipedia:Verschiebewünsche (in October 2013 there were 25 requests in all, by different users and for different reasons, de:Wikipedia:Verschiebewünsche/Archiv-2013-2), or somewhere else (mentorship program etc.), or just wait. So these drafts are probably moved to main space by more experienced users. The high survival rate could thus be explained by a) the more experienced users perhaps moving only drafts of reasonable quality to main space, or b) those moved drafts "escaping" the more rigorous and more experienced quality control at the "new article feed". (I actually worry that this is exploited by paid editors.)
  • 2) Re: FlaggedRevs. I think Nemo is generally right to point at FlaggedRevs to explain differences on deWP; FlaggedRevs has a huge impact on new users and IPs. But in this case I think he's not right ;-P . FlaggedRevs does not fully apply to newly created articles, meaning: the new article is immediately visible ("live"), regardless of user status. And FlaggedRevs doesn't apply to user space. FlaggedRevs/"gesichtete Versionen" (sighted versions) was enabled on deWP in 2008. A new user needs a lot of accepted edits (50-150 edits) and at least 30 days to get autoreviewer status ([1]), so this is outside of the new user categories of this analysis. FlaggedRevs on deWP means: if a new user makes an edit to an existing article, this is not immediately visible, but needs to be reviewed. On the other hand, if a new user creates a new article it is immediately "live" (but it appears in the feed/lists of pages needing review). So how does FlaggedRevs have an impact? Does it incentivize creating new articles instead of editing existing ones? Or maybe if an experienced user reviews/"sichtet" the new article, it is less likely to get nominated for deletion/fast-track deletion? I think if there is some effect of FlaggedRevs, it will be indirect, like a "yougottadealwithit, reject or accept" mindset.
  • 3) Re: IPs. I think there is a higher rate of experienced regular IP editors on deWP than on enWP. Why? a) deWP has fewer readers (= fewer drive-by IP editors). b) IP addresses "feel" anonymous. In Germany almost all private internet access uses dynamic IPs (reassigned every 24 hours); in Austria there are also pseudo-static IP addresses. Some may feel IPs have more privacy. IP editors don't get a lot of interaction/communication and some people hate "socializing". Some are banned editors or in self-imposed exile. Some don't want to log in at work, etc. So, I think some very prolific regular IP editors may affect the German IP survival rate (more than on enWP... I got carried away there: that makes no sense for new enWP articles, obviously).
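The dewiki thresholds described in points 1 and 2 can be written as two simple checks. This sketch assumes the 96-hour move threshold and the 30-day minimum age for autoreview as stated above; because the edit requirement is given as a 50-150 range, `min_edits` is left as a parameter. The functions are hypothetical and do not query the live wiki configuration.

```python
from datetime import datetime, timedelta

# Thresholds as described in the comment above (dewiki, 2014); the live
# configuration may differ, so treat these as assumptions.
MOVE_RIGHT_AFTER = timedelta(hours=96)
AUTOREVIEW_MIN_AGE = timedelta(days=30)


def can_move_pages(registered: datetime, now: datetime) -> bool:
    """Whether an account is old enough to move its own userspace draft to main."""
    return now - registered >= MOVE_RIGHT_AFTER


def autoreview_eligible(registered: datetime, now: datetime,
                        accepted_edits: int, min_edits: int) -> bool:
    """Age plus edit-count check; `min_edits` is left open because the comment
    gives a range (50-150) rather than a single number."""
    return now - registered >= AUTOREVIEW_MIN_AGE and accepted_edits >= min_edits
```

Under these assumptions an account in the "(0,1] days" bucket can never move its own draft, which fits the suggestion that such drafts reach the main namespace through more experienced users.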

I'd love to hear how Polish Wikipedians explain their "anomalies" ;-P --Atlasowa (talk) 16:13, 5 March 2014 (UTC)

Hi Atlasowa! Thanks for your comments on dewiki. It seems like your explanation for why userspace drafts created by editors with more than a day of experience have a lower survival rate, once moved to main, than articles created directly in the main namespace could apply to enwiki as well. In enwiki, it looks like drafting articles in userspace or AfC clearly increases the survival rate of articles created by all but the most experienced editors, and even then the survival rate of drafts is still high.
There's another thing I'm curious to get your thoughts on. As Nemo points out, there's a bit of fluctuation happening in the survival rate of articles for all classes of newcomers. From 2008 to mid-2009, there seems to be a dip in survival followed by a rise in survival rates that continues until at least 2011 (see File:Survival prop.smoother.by original namespace and tenure.dewiki.svg plot). All of the non-de/en wikis I looked at had steady declines in the survival rates of newcomer-created articles over the observation period. I suspect the reason that enwiki's survival rate is rising is that a lot of the articles that might have been deleted were lost to AfC's backlog. However, dewiki doesn't have AfC. I'd be very curious about what strategies dewiki is employing to keep newcomer article survival rates high without incurring a backlog (or are they?). --Halfak (WMF) (talk) 21:00, 5 March 2014 (UTC)
(Note that I just came in to reword my last comment so that it made more sense.) --Halfak (WMF) (talk) 23:37, 5 March 2014 (UTC)
Slowly, please, Halfak (WMF) ;-) I reread the first part of your comment several times and I don't understand. That's not what I explained, or is it? This is really confusing.
I have tried to challenge German users to come up with an explanation of the anomaly; it didn't really work. That was a discussion about how an editor left Wikipedia because of deletions, and there is a widespread contempt for all those other terrible "deletionists" (while "...BTW why don't we delete those porn star and soccer stubs..."). Pretty bad discussion. Anyway, that may be "the" cause that you are looking for: the scarring, epic deletionist-inclusionist conflict of ~2008(?). There were a lot of RfCs and discussions. There has been a 1-hour waiting time for deletion requests (obvious speedies excepted) since 2010. And the German deletion stats confirm that trends changed:
Article lifetime. The density of time between first and last edits is plotted for deleted articles created between 2008 and 2013 in the English Wikipedia.
For English article lifetime/deletion time you have provided this figure, showing "a strong cluster between one minute and one hour. While some deletions take more than one year, 87.3% of deletions occurred within one month." Did you compare this to German stats? It may be very different: a regular deWP AfD starts at the earliest 1 hour after article creation, and it takes 1 week minimum until an admin decision (up to a month, in summer?). Speedies are very often turned into regular AfDs. It's not uncommon to see new articles tagged both for quality control by a bot and for deletion by a user. The AfD has nicknames: 1) "Löschhölle"/"deletion hell" because of the not so friendly/inviting atmosphere, formalism, unintelligible acronyms and brutally direct comments ("delete. not relevant."), and 2) "Turbo Qualitätssicherung"/"quality control on steroids", because while regular QS (quality assurance) is often apathetic, some users try to "save" many AfD articles during the 7-day discussion. HTH --Atlasowa (talk) 17:54, 11 March 2014 (UTC)
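Two quantities from this comment can be computed directly from a list of deleted pages' lifetimes: the share deleted within a threshold (the 87.3%-within-a-month figure for English Wikipedia is of this kind) and how many deletions happened faster than a regular dewiki deletion discussion could possibly conclude. A sketch, assuming the one-hour-plus-one-week floor described above; the function names are hypothetical, and deletions below the floor presumably went through a faster route such as speedy deletion.

```python
from datetime import timedelta

# From the comment above: a regular dewiki deletion discussion opens no earlier than
# one hour after creation and runs at least a week. Treated here as an assumption.
REGULAR_DELETION_FLOOR = timedelta(hours=1) + timedelta(days=7)


def share_deleted_within(lifetimes: list[timedelta], threshold: timedelta) -> float:
    """Fraction of deleted pages whose lifetime is at most `threshold`."""
    if not lifetimes:
        raise ValueError("no lifetimes given")
    return sum(1 for t in lifetimes if t <= threshold) / len(lifetimes)


def split_by_floor(lifetimes: list[timedelta]) -> tuple[int, int]:
    """(count faster than a regular deletion discussion allows, count at or beyond it).
    The fast ones presumably went through a quicker route such as speedy deletion."""
    fast = sum(1 for t in lifetimes if t < REGULAR_DELETION_FLOOR)
    return fast, len(lifetimes) - fast
```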
Hey Atlasowa. Sorry for the confusion. I was trying to be a bit too explicit about what was actually measured. I was trying to say was this. Your explanation for the low survival rate of drafts created by "week-" and more experienced editors could be applied to English Wikipedia, but we don't see the same pattern there.
On comparing the time to deletion in dewiki, I think that is a good idea. I've started the work to look at time to deletion on a per-wiki basis. I'll ping here once I have the data ready. --EpochFail (talk) 14:30, 13 March 2014 (UTC)
Woops. ^ That's me posting from my personal account. --Halfak (WMF) (talk) 16:19, 13 March 2014 (UTC)


OK! I've got some new figures on time to deletion for En and De. @Atlasowa and Nemo bis: see below.

Article lifetime (enwiki). The density of article lifetime by original namespace is plotted for articles in English Wikipedia by the namespace from which they originated.
Article lifetime (dewiki). The density of article lifetime by original namespace is plotted for articles in German Wikipedia by the namespace from which they originated.

In enwiki, drafts don't get deleted right away. In dewiki, even userspace drafts are more likely to be deleted quickly. --Halfak (WMF) (talk) 18:00, 17 March 2014 (UTC)
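The peaks discussed in this thread (seconds, ~1 minute, ~1 hour, ~1 day, ~1 year) span several orders of magnitude, which is why lifetimes are easiest to compare on a logarithmic time scale. As a rough, hypothetical sketch of how such a comparison could be tabulated, the Python below bins each deleted page's lifetime (time between first and last edit, as in the figure captions) into coarse log-spaced bins per original namespace, and computes the share of pages deleted within a given time (like the 87.3% within one month quoted above). The record layout, the bin boundaries and the sample data are all assumptions for illustration; this is not the code behind the figures, which show smoothed densities rather than bins.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Coarse, roughly log-spaced bins (upper bounds in seconds). These boundaries are
# an illustrative assumption, not the ones used for the figures above.
BINS = [
    ("< 1 minute", 60),
    ("1 minute - 1 hour", 60 * 60),
    ("1 hour - 1 day", 24 * 60 * 60),
    ("1 day - 1 month", 30 * 24 * 60 * 60),
    ("1 month - 1 year", 365 * 24 * 60 * 60),
]


def lifetime_seconds(first_edit: datetime, last_edit: datetime) -> float:
    """Article lifetime: time between a page's first and last edit."""
    return (last_edit - first_edit).total_seconds()


def bin_label(seconds: float) -> str:
    """Return the first bin whose upper bound exceeds the lifetime."""
    for label, upper in BINS:
        if seconds < upper:
            return label
    return "> 1 year"


def lifetime_histogram(deleted_pages):
    """Count deleted pages per lifetime bin, grouped by original namespace.

    `deleted_pages` holds (original_namespace, first_edit, last_edit) tuples;
    this record layout is a hypothetical stand-in for the real dataset.
    """
    histogram = defaultdict(Counter)
    for namespace, first_edit, last_edit in deleted_pages:
        histogram[namespace][bin_label(lifetime_seconds(first_edit, last_edit))] += 1
    return histogram


def share_deleted_within(deleted_pages, max_seconds: float) -> float:
    """Fraction of deleted pages whose lifetime is at most `max_seconds`."""
    lifetimes = [lifetime_seconds(first, last) for _, first, last in deleted_pages]
    return sum(t <= max_seconds for t in lifetimes) / len(lifetimes)


if __name__ == "__main__":
    t0 = datetime(2013, 1, 1)
    sample = [  # made-up pages: namespace 0 = main, 2 = User
        (0, t0, t0 + timedelta(seconds=30)),
        (0, t0, t0 + timedelta(minutes=5)),
        (2, t0, t0 + timedelta(hours=20)),
        (0, t0, t0 + timedelta(days=400)),
    ]
    for namespace, counts in sorted(lifetime_histogram(sample).items()):
        print(namespace, dict(counts))
    print("deleted within 30 days:", share_deleted_within(sample, 30 * 24 * 60 * 60))
```

Comparing the per-namespace histograms of two wikis side by side is then a matter of running the same function on each wiki's records.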

Wow, that is quite impressive, Halfak (WMF)! A bit shocking actually. What do we see there? For deWP:
  • There is a first "article death" peak after seconds: probably recent changes/new pages patrollers, and speedy deletion of obvious non-articles by admins.
  • The second "article death" peak is after ~1 minute: maybe a speedy deletion nominated by a non-admin new page patroller and carried out swiftly by an admin.
It gets really interesting with the third "article death" peak.
  • At ~1 hour there is no peak but rather a tide. That is not surprising and should be expected, because that would be the timeframe for regular AfD nominations (followed by a 7-day discussion).
  • The third "article death" peak for "direct to main space" is after ~1 year! Hmm? Hard to explain. Those articles must have passed the creation process and maybe the semi-automatic quality control funnel (they may have been wikified, categorized) and were nominated for deletion much later? Hm.
  • The third "article death" peak for "user space created articles" is <1 day. Interesting how different this is from "direct to main space". That looks like a timeframe that could be explained by the semi-automatic quality control funnel and gesichtete Versionen (sighted versions, i.e. FlaggedRevs), with the deletion nomination coming from WikiProject members and users who do Sichtungen (reviews). Just a guess, though.
This opens up even more questions, hehe ;-)
Average review backlog (Sichtungsrückstand) in days, 2009-2013 (graphs of the version-flagging statistics (dewiki), dapete/toolserver)
If you look at the gesichtete Versionen patrol backlogs in deWP (Sichtungsrückstand), there is a big difference between the backlogs for IP edits (gesichtet, i.e. reviewed, after some hours on average) and for registered users (reviewed after an average of days, with high fluctuation). Have a look at de:Benutzer:Atlasowa/gesichtete_Versionen#Grafiken_Galerie. New pages show up at "Seiten, die noch nie gesichtet wurden" ("pages that have never been reviewed"), so users who do Sichtungen would either accept the article (sichten) or ... leave it in the backlog? Or nominate it for deletion (I doubt that)? Or tag it (tagging is not popular)? Or move it to the user space of the article creator? I don't know. --Atlasowa (talk) 09:37, 19 March 2014 (UTC)

Article creation process / funnel in deWP

Let me try to describe article creation processes in deWP:

  • Let's take person A, who wants to write a deWP article and registers an account as newbie A. He writes his article in his user space (user:newbieA/draft). At some point he realizes that his article is not yet "in Wikipedia" (for example via Google, or thanks to de:Hilfe:Artikelentwurf, created in April 2013) and that it needs to be moved into the article namespace. But newbie A can't move the article himself, because his account is less than 4 days old (not yet "autoconfirmed"). So what happens? Another, experienced ("autoconfirmed") user X could move the draft to article space: if X is asked (through the mentor program?), if newbie A makes a "request to move" (de:Wikipedia:Verschiebewünsche, "move requests", may be hard for newbies to find, though), or if newbie A asks on some high-traffic page (like de:WP:Fragen von Neulingen, "questions by newbies", or de:WP:Fragen zur Wikipedia, "questions about Wikipedia", a pseudo-village pump for questions), etc. If the draft is not crap, user X might then move it to article space.
  • Person B registers as newbie B, writes his draft in user space and can't move it into main space. Newbie B finds the "move"-link 3 days later and moves his user space draft into article space.
  • Person C registers as newbie C, writes his draft in user space and, again, can't move it into main space. Newbie C just leaves his article helplessly in user space (and leaves Wikipedia disgruntled).
  • Person D registers as newbie D, writes his draft in user space and, again, can't move it into main space. Newbie D therefore later copies his user-space draft and pastes it into main space as a new article.
  • Person E registers as newbie E and creates his article directly in main space/article space.
  • Reader F decides to write an article and creates this article as IP editor directly in article space.
  • Reader G clicks on a redlink, de:whatever was redlinked, is confronted with an edit window (to his surprise) and writes "Badbadbad. Did I just break Wikipedia?" into it and saves his, ahem, "article".
  • Experienced IP editor H writes another article and creates it directly in article space.
So we now have new articles in main space by newbie A, newbie B, newbie D, newbie E, IP editor F, IP editor G and IP editor H. (We "lost" the forgotten draft of newbie C?) These new articles all show up at de:Spezial:Neue_Seiten (new pages). Except for newbie A's article, which experienced user X may already have reviewed, these articles are all "ungesichtet" (not reviewed via FlaggedRevs), so they also show up at de:Spezial:Ungesichtete_Seiten, the list of articles that have never been reviewed.
So, what happens to the new articles?
  • Really crappy new "articles" like the redlink vandalism get a speedy deletion, de:Wikipedia:Schnelllöschantrag.
  • Uncategorized new articles are pushed to Quality Control by bot de:Benutzer:MerlBot/AutoQS: "Please wikify, categorize etc."
  • New categorized articles are pushed to wikiprojects/portals by bot.
  • Articles tagged for quality control may also get tagged for deletion/AfD.
  • Of a mean of ~1,200 new "articles" per day, this leaves ~350-400 actual new articles.
One big question is: how many new articles are moved back to user space? I don't know. According to de:Benutzer:FzBot/Statistiken, "moving to user space" has increased as an AfD outcome over the years. --Atlasowa (talk) 17:54, 11 March 2014 (UTC)
For enwiki and dewiki, I consider moves that remove an article from the main namespace to be "deletions". The plots I just posted respect this assumption. For other wikis, I couldn't track moves, so the page actually has to be deleted in order to be considered a "deletion". --Halfak (WMF) (talk) 18:02, 17 March 2014 (UTC)
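The deletion definition above can be stated as a small predicate. This is a minimal sketch only: the `PageOutcome` record, its fields and the use of the page's final namespace are hypothetical simplifications (for instance, a page moved out of main space and later moved back is treated as surviving here), not the study's actual code.

```python
from dataclasses import dataclass
from typing import Optional

MAIN_NAMESPACE = 0
# Wikis where moves were tracked in this study (per the comment above);
# elsewhere only actual deletions count.
MOVE_TRACKED_WIKIS = {"enwiki", "dewiki"}


@dataclass
class PageOutcome:
    """Hypothetical summary of what eventually happened to a new article."""
    wiki: str
    deleted: bool
    final_namespace: Optional[int] = None  # namespace after the last recorded move


def counts_as_deletion(outcome: PageOutcome) -> bool:
    """Apply the 'deletion' definition described above (simplified sketch)."""
    if outcome.deleted:
        return True
    if outcome.wiki in MOVE_TRACKED_WIKIS and outcome.final_namespace is not None:
        # Moving the page out of the article namespace counts as a deletion.
        return outcome.final_namespace != MAIN_NAMESPACE
    return False
```

For example, a dewiki article moved back to its creator's user space (namespace 2) counts as a deletion, while the same move on a wiki without move tracking does not.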

Article redirects?

I have yet another question: How did you count redirects? Article death or survival? If someone creates a new 2-sentences-article about "Institution X branch in Hamburg" it may happen to become a redirect to existing article "Institution X" and the 2 new sentences added to this existing article. Same for article "Product Y of company Z" merged to "company Z" etc. Article death or survival? And how about new "redirect pages" created by synonyms-OCD-users ;-) ? New article or not? --Atlasowa (talk) 09:39, 19 March 2014 (UTC)

Hi Atlasowa. Sorry I missed this question. I didn't treat redirects differently in this study. Sadly, it's nearly impossible to get a page's redirect status historically: I'd have to process the text of every revision in Wikipedia (which I can get from the XML dumps) and of every deleted revision (which I'd have to get from the API). For future work, it might be a good idea to set up an mw:Extension:EventLogging schema to capture edits that convert pages to and from redirect status. We've got new Schemas to help capture some of the data that was complex or impossible to extract during this analysis, so that it will (hopefully) be easier in the future (see Schema:PageCreation, Schema:PageMove, Schema:PageDeletion and Schema:PageRestoration). I'll bring up the potential for a redirect Schema with the other analysts so that we can do better in future analyses. --Halfak (WMF) (talk) 20:32, 9 April 2014 (UTC)
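Processing every revision's text, as described above, boils down to checking each revision for redirect status and recording the switches. A minimal sketch of that per-revision check, assuming revisions arrive as chronologically ordered (rev_id, wikitext) pairs (a hypothetical layout, e.g. after parsing an XML dump). The regex is a simplification: it recognizes only the English #REDIRECT keyword and the German alias #WEITERLEITUNG, whereas MediaWiki accepts whatever localized aliases a wiki has configured.

```python
import re
from typing import Iterable, Iterator, Tuple

# Simplified redirect check: leading whitespace, the magic word (case-insensitive),
# an optional colon, then a [[link]]. Only two aliases are handled here.
REDIRECT_RE = re.compile(r"^\s*#(?:REDIRECT|WEITERLEITUNG)\s*:?\s*\[\[", re.IGNORECASE)


def is_redirect(wikitext: str) -> bool:
    return bool(REDIRECT_RE.match(wikitext))


def redirect_transitions(revisions: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, bool]]:
    """Yield (rev_id, became_redirect) whenever a page switches redirect status."""
    previous = None
    for rev_id, text in revisions:
        current = is_redirect(text)
        if previous is not None and current != previous:
            yield rev_id, current
        previous = current
```

For a page like "Institution X branch in Hamburg" that is later turned into a redirect to "Institution X", the transition into redirect status would be reported at the converting revision, which is the kind of event a redirect Schema could log directly.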

Cutoff date for comparison?

Surviving articles per new page creator by month (enwiki)

Hi Halfak (WMF), I was just directed to this study. I'm glad to see WMF has looked into these issues, and wish the results were better known! I'm curious about the choice of a date in 2011 as the cutoff date, for comparing "pre- and post-AFC" phenomena. I realize there have been several milestones relating to AFC and the draft feature; but AFC has been around in some form or another since 2004, and the Draft feature was announced in late 2013. Can you comment on your choice of a date in 2011 to make this distinction -- and perhaps more generally on what you see as the significant milestones?

(I see from the file revision history that there was some adjustment to the date already, by EpochFail; while I'm curious about that, it seems like a smaller detail than the 2011 vs. 2013 question.) -Pete F (talk) 18:43, 23 April 2015 (UTC)

To clarify my question -- you state "Figure #Surviving article per new page creator plots this proportion with loess fits for before and after newcomers were directed toward AfC." And from the accompanying image, I do see spikes correlating to that. But what I'm really curious about is this -- in what specific way were "newcomers directed" to AFC? I know how it works in the post Draft: space world, but I don't know what change was made in 2011 to direct newbies that way. If you have links to discussions around the changes, that would be especially helpful. -Pete F (talk) 18:51, 23 April 2015 (UTC)
Wizard, before Draft: namespace (?)
Trying to answer my own question here. Trying to leave a trail of breadcrumbs, which you needn't follow -- Halfak (WMF) my main question is at the end below:
Question: I am guessing that in 2011, a change was made so that when a user went to create an article, a new Wizard gave them a prominent option to use AfC, more or less like the image at right. That's what you referred to, "newcomers were directed toward AFC." Then, in 2013, the Draft: namespace was introduced, and the Wizard was adjusted to first point toward Draft:, and then direct users from there toward AFC. Is that an accurate history of how this went, and why you chose 2011 as the significant date? Thanks for any insights -- I'm trying to make sense of a very complicated discussion history, and its opacity makes it tough to know how to interpret this research! I think I've finally got what I need, but your confirmation (or clarification) would be much appreciated. -Pete F (talk) 20:36, 23 April 2015 (UTC)
IIRC, the cutoff date is simply the point at which users were directed away from main namespace action=edit. The main take away (again IIRC) is that editing the main namespace directly, both for new and existing pages, is more productive and constructive. --Nemo 00:25, 24 April 2015 (UTC)

Indeed. I based that dashed line on a sudden change in the proportion of newcomer articles that were created directly in the main namespace. I've looked at edits to the article wizard to try to find what caused the change in behavior, but I haven't found an edit that lines up with the change. It looks like the start of the "AfC" period was actually dominated by draft creations in the user space, so it's not totally fair to call it "AfC", but AfC does take over by October 2011. See my quick and dirty query of the datasets, which shows non_main article creations beginning to spike in mid-2011.

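SELECT
  LEFT(created, 6),                          -- creation month as YYYYMM
  SUM(original_namespace != 0) AS non_main,  -- created outside the article namespace (0)
  SUM(original_namespace = 2) AS user,       -- created in User space (2)
  SUM(original_namespace = 5) AS afc,        -- created in Wikipedia talk (5), i.e. AfC
  COUNT(*) AS creations                      -- all newcomer article creations
FROM nov13_article_page
INNER JOIN nov13_creation USING (page_id)
INNER JOIN nov13_user_stats USING (user_id)
WHERE DATEDIFF(created, user_registration) <= 30  -- created within 30 days of registration
  AND account_creation_action = "create"          -- self-registered accounts
  AND created BETWEEN "2011" AND "2012"            -- 2011 timestamps (string comparison)
GROUP BY 1;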
+------------------+----------+------+------+-----------+
| LEFT(created, 6) | non_main | user | afc  | creations |
+------------------+----------+------+------+-----------+
| 201101           |      591 |  547 |   40 |     18618 |
| 201102           |      718 |  652 |   58 |     18308 |
| 201103           |      648 |  579 |   57 |     19378 |
| 201104           |      600 |  524 |   67 |     17620 |
| 201105           |      595 |  536 |   51 |     17063 |
| 201106           |      793 |  732 |   56 |     16222 |
| 201107           |      931 |  873 |   51 |     15886 |
| 201108           |     1127 | 1067 |   51 |     15688 |
| 201109           |      922 |  865 |   42 |     15533 |
| 201110           |      470 |  294 |  168 |     13991 |
| 201111           |      407 |  146 |  242 |     12721 |
| 201112           |      354 |  218 |  129 |     12153 |
+------------------+----------+------+------+-----------+
12 rows in set (1 min 32.77 sec)

Note that AfC (Wikipedia_talk namespace) seems to take over non_main draft creations in Oct. 2011. I think that might be due to this edit that removed the user-space draft option from the article wizard. However, I don't know what change initially drove new article creators out of the main namespace in the first place. --Halfak (WMF) (talk) 00:47, 24 April 2015 (UTC)
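To make the table easier to read, the shares behind it can be computed directly. The Python below only re-uses the counts printed above; the column meanings (namespace 0 = main, 2 = User, 5 = Wikipedia talk, where AfC drafts lived) follow the query, and expressing them as shares is an illustrative choice, not part of the original analysis.

```python
# (month, non_main, user, afc, creations), copied from the query output above.
ROWS = [
    ("201101", 591, 547, 40, 18618),
    ("201102", 718, 652, 58, 18308),
    ("201103", 648, 579, 57, 19378),
    ("201104", 600, 524, 67, 17620),
    ("201105", 595, 536, 51, 17063),
    ("201106", 793, 732, 56, 16222),
    ("201107", 931, 873, 51, 15886),
    ("201108", 1127, 1067, 51, 15688),
    ("201109", 922, 865, 42, 15533),
    ("201110", 470, 294, 168, 13991),
    ("201111", 407, 146, 242, 12721),
    ("201112", 354, 218, 129, 12153),
]


def monthly_shares(rows):
    """Return (month, non-main share of all creations, AfC share of non-main creations)."""
    return [
        (month, non_main / creations, afc / non_main)
        for month, non_main, _user, afc, creations in rows
    ]


if __name__ == "__main__":
    for month, non_main_share, afc_share in monthly_shares(ROWS):
        print(f"{month}  non-main: {non_main_share:5.1%}  AfC of non-main: {afc_share:5.1%}")
```

On these counts, non-main creations go from about 3% of newcomer creations in January 2011 to a peak of about 7% in August, while AfC's share of those non-main creations jumps from roughly 5-11% (January-September) to about 60% in November.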

Thank you both -- especially for that diff, Halfak (WMF). I'm having trouble following the numbers and what they mean, but the diff does help me see where deliberations were taking place, and I had not found that page -- very helpful. I guess I have a followup question -- while I understand you were approaching this from an outsider's perspective similar to mine, do you know who at WMF would be most qualified to describe the arc of software development around AfC, the Draft space, page curation? What I'm trying to do, basically, is put together a timeline -- this change in the user experience occurred on this date, prompted by this decision-making process. I'm getting closer, but it's hard to have confidence that I've got it right -- and I feel like I'm reproducing work that somebody (at least I would hope) has already done. -Pete F (talk) 02:15, 24 April 2015 (UTC)
There's a bit of background about the Draft namespace in these quarterly review minutes (see also talk page). Regards, Tbayer (WMF) (talk) 06:00, 30 April 2015 (UTC)
Thanks Tbayer (WMF)! -Pete F (talk) 00:04, 2 May 2015 (UTC)