Research talk:Automated classification of draft quality/Work log/2016-09-28

From Meta, a Wikimedia project coordination wiki

Wednesday, September 28, 2016[edit]

OK looks like my last queries have failed, so I'm going to run them directly on labsDB in PAWS.

https://quarry.wmflabs.org/query/12795 -- Surviving articles https://quarry.wmflabs.org/query/12796 -- Deleted articles

--EpochFail (talk) 17:26, 28 September 2016 (UTC)[reply]


Here's what I've got:

SELECT
  page_title,
  rev_id AS creation_rev_id,
  rev_timestamp AS creation_timestamp,
  FALSE AS archived,
  "OK" AS draft_quality
FROM revision 
INNER JOIN page ON
  rev_page = page_id 
WHERE
  rev_timestamp BETWEEN "20150827" AND "20160827" AND
  rev_parent_id = 0 AND
  page_namespace = 0

UNION ALL

SELECT
  ar_title,
  ar_rev_id AS creation_rev_id,
  ar_timestamp AS creation_timestamp,
  True AS archived,
  IF(log_comment REGEXP "WP:CSD#G3\\|", "vandalism",
       IF(log_comment REGEXP "WP:CSD#G10\\|", "attack",
       IF(log_comment REGEXP "WP:CSD#G11\\|", "spam",
       IF(log_comment REGEXP "WP:CSD#A11\\|", "hoax", "OK")))) AS draft_quality
FROM archive 
LEFT JOIN logging speedy_delete ON
  log_namespace = ar_namespace AND
  log_title = ar_title AND
  log_type = "delete" AND
  log_action = "delete" AND
  log_comment LIKE "[[WP:CSD#%" AND
  log_comment REGEXP "WP:CSD#(G3|G10|G11|A11)\\|" AND
  log_timestamp > ar_timestamp
WHERE
  ar_timestamp BETWEEN "20150827" AND "20160827" AND
  log_timestamp BETWEEN "20150827" AND "20160827" AND
  ar_parent_id = 0 AND
  ar_namespace = 0;

It should theoretically run on labsDB, but I think it might have more of a chance to finish, so I'll be running on both labs and analytics databases. --EpochFail (talk) 17:34, 28 September 2016 (UTC)[reply]


Well, that's still running. So I guess I'll just let it go overnight and see what we've got in the morning. --EpochFail (talk) 22:33, 28 September 2016 (UTC)[reply]