Grants:IdeaLab/Fast and slow new article review

From Meta, a Wikimedia project coordination wiki
Fast and slow new article review
Concerns about the introduction of spam into Wikipedia have led Wikipedians to implement high-speed new article review/curation processes. The speed at which editors tag articles for deletion via these processes is great for dealing with spam, but it might also be faster than good-faith new article creators can build their articles. We could build a machine learning classifier that is tuned to detect spammy article drafts. This would allow the new pages queue to be split into a high-speed review of spammy articles and a low-speed review that allows creators time to make a better first draft.
idea creator: EpochFail
this project needs...
community organizer
created on: 20:18, 29 February 2016 (UTC)

Project idea

What is the problem you're trying to solve?

The average time between edits when an editor is "in-session" is 7 minutes, but the median time to deletion tagging of a new article is 2 minutes. The most common reason for deletion tagging (in English Wikipedia) is A7: "No indication of importance". It seems likely that newcomer article creators are *adding* a credible assertion of importance in a second edit that is blocked by an edit conflict with the deletion-tagging edit. Research suggests that this early, negative feedback is one of the leading predictors that a newcomer will stop editing Wikipedia entirely.

What is your solution?

The reason we need to review new page creation so quickly is to get rid of spam and egregious vandalism. Most other types of potentially undesirable new articles would not cause damage were they to be left alone for a little while -- enough time to allow the creator to finish their initial sequence of edits. We can split the feed of newly created pages using a machine learning classifier so that we can have two review backlogs: one for fast review of spam and egregious vandalism and another for slower review of all other new articles.

The ORES service would be a great place to build and host such a model, and the Research:Revision scoring as a service project team would be interested in providing support and advice.
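
A minimal sketch of the two-backlog split described above, in Python. Everything named here is hypothetical: `toy_spam_score` is a keyword-counting placeholder standing in for a trained classifier, and the 0.5 threshold is an arbitrary illustration, not a tuned value. A real model (for example, one hosted on ORES) would replace the scorer; only the routing logic is the point.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# Hypothetical cut-off: drafts scoring at or above this go to fast review.
FAST_REVIEW_THRESHOLD = 0.5


@dataclass
class ReviewQueues:
    fast: List[str] = field(default_factory=list)  # likely spam or egregious vandalism
    slow: List[str] = field(default_factory=list)  # all other new articles


def toy_spam_score(text: str) -> float:
    """Placeholder scorer, NOT a real model: share of words that are promotional markers."""
    markers = {"buy", "cheap", "discount", "click"}
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in markers)
    return hits / len(words)


def split_feed(
    pages: Iterable[Tuple[str, str]],
    score: Callable[[str], float] = toy_spam_score,
    threshold: float = FAST_REVIEW_THRESHOLD,
) -> ReviewQueues:
    """Route each (title, text) pair into the fast or slow review backlog."""
    queues = ReviewQueues()
    for title, text in pages:
        if score(text) >= threshold:
            queues.fast.append(title)
        else:
            queues.slow.append(title)
    return queues


if __name__ == "__main__":
    new_pages = [
        ("Spam page", "Buy cheap discount pills! Click now!"),
        ("Band stub", "A band formed in 2015 in Leeds."),
    ]
    queues = split_feed(new_pages)
    print("fast review:", queues.fast)
    print("slow review:", queues.slow)
```

Where the threshold sits is the main design choice: a lower value sends more good-faith drafts to fast review, while a higher value leaves more spam waiting in the slow backlog.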


Get Involved

About the idea creator

I'm a Senior Research Scientist at the Wikimedia Foundation. I've personally performed research studies around the retention of newcomers, the quality of new article creations, and the design of machine learning tools for supporting curation practices.


  • I have made multiple attempts to get this problem fixed over the years; this was my 24-hour pause suggestion in 2009. I think that extra time for good-faith new articles is a good idea, as is the idea of creating a safe space for articles to be in some sort of draft status, not very visible outside Wikipedia but with the benefits of mainspace collaboration. None of the attempts to create a safe and collaborative space outside of mainspace have worked; all have failed for various reasons, including that the gnomes who help improve articles either search in mainspace or look at mainspace categories that aren't allowed outside it. I still think we could create a safe space for article creation in mainspace on Wikipedia, but it would need all non-patrolled new articles to be NoIndex, perhaps even only visible to logged-in editors. That would take the incentive away from spammers, whilst still including new articles in the collaborative editing of mainspace. Though to really make a difference we also need to tackle the bigger problem of edit conflicts. WereSpielChequers (talk) 20:45, 2 March 2016 (UTC)
  • Volunteer Willing to do anything that can be done as a volunteer to review and cover up any gaps:) Yulun5566 (talk) 02:05, 15 March 2016 (UTC)
  • As one of the major proponents for getting the en.Wiki Page Curation and its New Page Feed developed by the WMF, I am still very concerned about its use by very inexperienced users. I have made several attempts to address this but they were rejected by the involved WMF staff (all of whom have now left, leaving an important gap in the Foundation's institutional memory). I am currently about to launch an RfC on en.Wiki that will address some of the points, and I will be at Wikimania 2016 and hoping to facilitate a cross-Wiki discussion on the topic. Kudpung (talk)
  • If you do this, I'll help you judge how well it classifies, but I think it will only be very rough; it will need to be supplemented by manual checking of each article, but it might possibly still help as a preliminary step. DGG (talk) 02:22, 19 March 2016 (UTC)
  • Volunteer Anywhere you would need me. TJH2018 (talk) 18:19, 20 March 2016 (UTC)


  • endorse yes we need to automate spam response and curate / coach new article creation, without templates. Slowking4 (talk) 15:39, 1 March 2016 (UTC)
  • endorse I love the idea that we can build in structures to reduce trigger-finger deletionism. I like the idea of building queues: these look like spam, these people need to be welcomed as new editors, etc., though I wonder if one editor's spam (i.e. non-notable person) is another editor's valuable contribution (i.e. a new editor working in good faith, or a subject with few references, or a gray area). But I think at least in en.wp this is a huge barrier for new editors. -- phoebe | talk 20:45, 1 March 2016 (UTC)
  • Isn't this remedied by Article Incubator? Why not send all new articles to the incubation first? SSneg (talk) 12:21, 2 March 2016 (UTC)
    • Incubator is a defunct process; we now have en:Wikipedia:Drafts, but I wouldn't force good-faith newbies into it. Maybe if we had an accurate definition of spam we could send possible spammers to drafts. WereSpielChequers (talk) 19:45, 2 March 2016 (UTC)
    • AfC (and incubator) is an epoch fail. it is a Prod for all new articles. we counsel newbies to stay far away, put it in sandbox, and we will review; but this is a workaround. cultural fail, need to train reviewers to collaborate rather than reject with templates. there is some movement, but minuscule fraction. Slowking4 (talk) 16:11, 3 March 2016 (UTC)
  • endorse Please also see my proposal, Grants:IdeaLab/Bot to detect and tag advocacy editing, which has similar aims but was aimed more at individual edits, not whole articles. I wonder if one bot could do both things? Jytdog (talk) 22:20, 14 March 2016 (UTC)
  • endorse great idea of filtering out the pages so that in the end each page can be created accurately and with enough coverage Yulun5566 (talk) 02:04, 15 March 2016 (UTC)
  • endorse Make it a filter within NewPagesFeed, and let it speedy-tag the most obvious garbage; an admin still has to agree before deleting. Swpb (talk) 22:19, 17 March 2016 (UTC)
  • endorse Smallbones (talk) 20:24, 22 March 2016 (UTC)
  • Comment: This is all well and good, but this grant will be just throwing money after another palliative. Please see my long comment on the talk page. Kudpung (talk) 09:15, 19 March 2016 (UTC)

Expand your idea

Would a grant from the Wikimedia Foundation help make your idea happen? You can expand this idea into a grant proposal.
