Jump to content

Research talk:Wikipedia article creation/Work log/Tuesday, December 10th

Add topic
From Meta, a Wikimedia project coordination wiki

Tuesday, December 10th

[edit]

I'm rushing to capture my progress today. I was working on metrics standardization stuff this morning, so I got a late start today.

First, I want to know when page moves started to be recorded in revision comments.

> select left(timestamp, 4) AS year, count(*) FROM nov13_move group by 1;
+------+----------+
| year | count(*) |
+------+----------+
| 2005 |    15907 |
| 2006 |   517014 |
| 2007 |   554246 |
| 2008 |   565643 |
| 2009 |   585088 |
| 2010 |   534849 |
| 2011 |   551479 |
| 2012 |    85404 |
+------+----------+
8 rows in set (7.43 sec)

It looks like it started before 2008 which is all I really care about. Next up, I want to use this move table to track where pages originally started.

SELECT
	page.page_id,
	page.page_namespace,
	page.page_title,
	IFNULL(first_move.from_namespace, page.page_namespace) AS original_namespace,
	IFNULL(first_move.from_title, page.page_title) AS original_title
FROM halfak.nov13_page AS page
LEFT JOIN (
    SELECT
        page_id,
        page_namespace,
        page_title,
        MIN(rev_id) AS rev_id
    FROM halfak.nov13_move
    GROUP BY 1,2,3) AS first_move_id USING (page_id, page_namespace, page_title)
LEFT JOIN halfak.nov13_move first_move USING (rev_id);

Yuck. I just figured out that I wasn't properly extracting page namespace from the move titles. I just restarted that. Hopefully, I'll have my original titles and namespaces ready within an hour. --Halfak (WMF) (talk) 23:02, 10 December 2013 (UTC)Reply