
Grants:IEG/Automated Notability Detection: Difference between revisions

From Meta, a Wikimedia project coordination wiki

Revision as of 07:38, 17 September 2014

status: draft
title: Automated Notability Detection
summary: Using machine learning to determine whether articles are notable, to help Draft reviewers and New Page Patrollers (NPPs) make better, faster, easier decisions.
target: Currently English Wikipedia, but it should be easily adaptable to many languages.
strategic priority: increasing participation
amount: 6,300 USD
grantee: Bluma.Gelley
contact: bgelley@nyu.edu
volunteers: Jodi.a.schneider, EpochFail
this project needs: volunteer, advisor
created on: 22:24, 9 September 2014 (UTC)
round: 2014 round 2



Project idea

What is the problem you're trying to solve?

There is a large volume of articles added to Wikipedia, in both the Main namespace and the new Draft namespace. All of these articles require some form of review, and one of the major elements of that review is determining whether or not the article is about a notable topic. Notability is difficult to judge without significant familiarity with the notability guidelines, and sometimes without domain expertise. In general, there are not enough reviewers, particularly in the Draft namespace (many of whose articles were formerly in Articles for Creation), and reviewing is a slow, laborious process.

What is your solution?

We propose to use machine learning to automatically determine whether or not articles are notable. Our determination (perhaps as a binary yes/no, perhaps as a score against a threshold) will be presented to human reviewers to help them make better and faster decisions.
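The two output styles mentioned above can coexist: a classifier produces a score, and a threshold turns that score into a yes/no. A minimal sketch in Python, where the feature names, weights, and bias are entirely hypothetical (real weights would be learned from the hand-labeled training set described below):

```python
import math

# Hypothetical, illustrative weights -- NOT a real model of notability.
WEIGHTS = {"num_references": 0.9, "num_inlinks": 0.5, "length_kb": 0.1}
BIAS = -2.0

def notability_score(features):
    """Map an article's features to a score in (0, 1) via a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def is_notable(features, threshold=0.5):
    """Binary yes/no view of the same score; the threshold is tunable."""
    return notability_score(features) >= threshold
```

Presenting the raw score (rather than only the binary decision) lets reviewers see how confident the classifier is and apply their own judgment near the threshold.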

Project goals

We hope that making reviewing easier will both attract more reviewers and decrease the workload and stress of the current reviewers.

Project plan

Activities

  • Create a sample of articles and/or drafts and present them to Wikipedians (found via WikiProjects relevant to the articles' content) to determine if they are notable or not.
  • Identify features that can model notability and build a classifier using those features and the articles hand-coded in the previous step.
  • Run the classifier on a held-out test set and evaluate the results.
  • Eventually, we would like to see this integrated into Wikipedia as a tool for reviewers.
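The evaluation step above depends on keeping some hand-labeled articles out of training entirely. A small sketch of such a holdout split (the fraction and seed are arbitrary choices for illustration):

```python
import random

def split_holdout(examples, holdout_frac=0.2, seed=0):
    """Shuffle labeled examples and set aside a fraction for evaluation.

    The classifier is trained only on the first part; the held-out part is
    never seen during training, so performance measured on it estimates how
    the classifier will behave on articles it has not seen before.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_frac)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (train, held-out)
```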

Budget

  • Graduate student salary for BG for the duration of the research: 20 USD/hour for 20 hours a week * 12 weeks: 4,800 USD
  • Travel to CSCW 2015 in March to present the results to the Wiki academic community and get input and advice: 1,500 USD

Community engagement

We plan to ask members of various WikiProjects to help in constructing a training set. These volunteers would read a number of articles and mark them as notable or not notable. In this way, we can create a gold standard data set based on expert judgement. We will also continuously solicit help and advice from the community as to what they would like to see in such a tool; we would love if members of the community would suggest possible features for the classifier. These would be based on their experience as to what aspects of an article they look at when determining notability.
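One common way to turn several volunteers' judgments into a single gold-standard label is a strict majority vote; articles with no majority go back out for more judgments. A sketch (the label strings and the tie-handling policy are assumptions, not a decided design):

```python
from collections import Counter

def gold_label(judgments):
    """Aggregate volunteer judgments ("notable" / "not notable") for one article.

    Returns the strict-majority label, or None when no label has a majority,
    in which case the article would be sent to additional volunteers.
    """
    label, count = Counter(judgments).most_common(1)[0]
    return label if count > len(judgments) / 2 else None
```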

Sustainability

We hope that by the time the grant period ends, we will have a working, robust classifier that can be easily implemented in code. We hope to find volunteers to implement the classifier as an on-Wiki tool or extension to help reviewers make better decisions. We will provide detailed documentation of how the system works and open-source the code so that it can be improved by anyone who wishes. We will also solicit help from the community in continuing to label articles as notable or not so we can keep expanding our training set to make more accurate predictions. (This is the paradigm used by ClueBot NG; see here.)

Measures of success

We will be using machine learning, so we can measure our success by the precision and recall of our classifier. Since this is a hard problem, we will consider accuracy of around 75% to be good; that is more than high enough for a first step in the review process that helps reviewers with their work.
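For concreteness, these three metrics can be computed from the held-out labels and the classifier's predictions as follows (a standard definition, not project-specific code; 1 means "notable"):

```python
def evaluate(y_true, y_pred):
    """Compute precision, recall, and accuracy for binary labels (1 = notable).

    Precision: of the articles predicted notable, how many really are.
    Recall: of the truly notable articles, how many were found.
    Accuracy: fraction of all predictions that were correct.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return precision, recall, accuracy
```

Note that with imbalanced classes (e.g. far more non-notable drafts than notable ones), precision and recall are more informative than raw accuracy, which is why the proposal tracks all three.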

Get involved

Participants

Bluma Gelley : I am a PhD student at New York University and have done a significant amount of research on Wikipedia. In particular, my research looked at some of the problems with the deletion/New Page Patrol process, and with the Articles for Creation/Draft process. Both these processes could be improved by making automated notability detection available to those reviewing/vetting articles.

I have previously published a paper (available here) about automatically detecting articles for deletion. In that paper, I attempted to detect notability, but I suspect that I was successful only in predicting deletion. I would like to expand on that work using a better training set (see above) and better features. I already have the framework for the classifier, so part of the work is done already.

Jodi Schneider has done research on the deletion process and problems with AfC/Draft process. She brings the perspective of qualitative research, which enhances the proposed quantitative work.

Aaron Halfaker is a Research Scientist at the Wikimedia Foundation and has done extensive research on problems related to deletion, AfC, and newcomer socialization in general.

Community Notification

Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?


Endorsements

Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. (Other constructive feedback is welcome on the talk page of this proposal).

  • Community member: add your name and rationale here.
  • As a long-term participant and closer at AfD, and a frequent participant at AfC on ENWIKI, I have long experienced the grave difficulties involved both in working to our notability guidelines and in explaining them to new editors. This effort holds promise for moving us in the direction of assisting reviewers and new editors at a process I currently describe as a "whisper chipper for new editors." I do not mean to suggest that I expect an immediate panacea here, but this is the right first step towards what I hope will be a longer series of efforts leveraging whatever technologies we can into solving some of Wikipedia's biggest user engagement hurdles. I strongly endorse this effort. --Joe Decker (talk) 15:35, 16 September 2014 (UTC)
  • I am supportive of this idea, as I would like to start seeing more formal ways to judge notability on our site outside of individual ideas on notability. Kevin Rutherford (talk) 02:04, 17 September 2014 (UTC)
  • Support the idea. Additionally, if this tool could also pick out unsourced articles (which often go hand-in-hand with being non-notable) and be integrated into the helper script, it could massively improve the reviewing workflow. Mdann52 (talk) 07:38, 17 September 2014 (UTC)