Research talk:Automated classification of article importance/Work log/2017-04-21


Friday, April 21, 2017

Today I will continue the work on identifying active WikiProjects, using the datasets I gathered yesterday as well as an additional dataset on article activity. I will also start developing a pipeline for handling global data.

WikiProjects

I wrote a Python script to gather edit statistics for all importance-rated articles in WikiProjects where we have data on both their article categories and edits to their WikiProject pages. There are 1,043 such projects, of which fewer than 50 appear to have a significant amount of activity, but that number might change when we look at article statistics as well. Another thing I noticed while looking at the statistics is that many projects have a large number of articles without importance assessments. That is something we should be able to help with, as we already know that we can do well at the WikiProject level.
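
As a rough illustration of this kind of aggregation (not the actual script), here is a minimal sketch. It assumes the data has already been flattened into `(project, importance, num_edits)` tuples, with `importance` set to `None` for unassessed articles. The function name `summarize_projects` and the `min_edits` activity threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def summarize_projects(article_rows, min_edits=100):
    """Aggregate per-project article statistics.

    article_rows: iterable of (project, importance, num_edits) tuples,
    where importance is None for articles without an assessment.
    min_edits: hypothetical cutoff for calling a project "active".
    """
    stats = defaultdict(lambda: {"rated": 0, "unrated": 0, "edits": 0})
    for project, importance, num_edits in article_rows:
        entry = stats[project]
        if importance is None:
            # Track unassessed articles separately; they are candidates
            # for predicted importance ratings.
            entry["unrated"] += 1
        else:
            # Edit statistics are gathered for importance-rated articles only.
            entry["rated"] += 1
            entry["edits"] += num_edits
    for entry in stats.values():
        entry["active"] = entry["edits"] >= min_edits
    return dict(stats)


# Made-up example rows, for illustration only.
rows = [
    ("WikiProject A", "Top", 80),
    ("WikiProject A", "Low", 30),
    ("WikiProject A", None, 5),
    ("WikiProject B", "Mid", 10),
]
summary = summarize_projects(rows)
```

Keeping the unrated count next to the activity flag makes it easy to list projects that are both active and have many articles still missing an importance assessment.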