Research talk:Autoconfirmed article creation trial/Work log/2017-08-21

From Meta, a Wikimedia project coordination wiki

Monday, August 21, 2017[edit]

Today I'll be working on instrumenting the page creation log data, documenting the various historic data sources and datasets we've gathered, and continue our data gathering process.

Page creation log data[edit]

Page creation events are now logged, ref T150369. We want to use this data to gather datasets and visualize it in order to better understand page creation. It will also be useful for H9, H10, H14, and H15.

Our main concern is the main namespace, and it appears that the table only logs creations. We will therefore have to augment this data with data on page moves into the main namespace in order to better track "article publications". Since we're already doing that, it's not a problem.

The table allows us to do some useful segmentation:

  • page_is_redirect column allows us to filter out all articles that are created as a redirect (e.g. when a page is moved)
  • performer_user_groups allows us to identify users who have "autoconfirmed" or "autopatrolled" status, and conversely identify users without those rights.
  • performer_user_is_bot allows us to remove any bot-created articles
  • rev_minor_edit flags the edit as a minor edit (this flag appears to never be in use for page creations, though)

In our case, I think we are mainly interested in the following metrics (all regarding pages created in the main namespace (page_namespace=0), where the user is not flagged as a bot (performer_user_is_bot=0)):

  • Total number of pages created
  • Number of pages created that are not redirects
  • Number of non-redirect pages created where the user has autoconfirmed status
  • Number of non-redirect pages created where the user has autopatrolled status
  • Number of non-redirect pages created where the user is neither autoconfirmed nor autopatrolled

I'll write a SQL query to get that.

 SELECT DATE(rev_timestamp) AS log_date,
 COUNT(*) AS num_created_pages,
 SUM(IF(page_is_redirect = 0, 1, 0)) AS num_non_redirects,
 SUM(IF(page_is_redirect = 0 AND LOCATE('autoconfirmed', performer_user_groups) > 0, 1, 0)) AS num_autoconfirmed,
 SUM(IF(page_is_redirect = 0 AND LOCATE('autopatrol', performer_user_groups) > 0, 1, 0)) AS num_autopatrolled,
 SUM(IF(page_is_redirect = 0 AND LOCATE('autoconfirmed', performer_user_groups) = 0, 1, 0)) AS num_non_autoconfirmed
 FROM mediawiki_page_create_2
 WHERE `database`='enwiki'
 AND page_namespace = 0
 AND performer_user_is_bot = 0
 GROUP BY log_date;