Research talk:Are the bots really fighting/Work log/2017-03-15

From Meta, a Wikimedia project coordination wiki

Wednesday, March 15, 2017[edit]

I'm doing some data cleanup today. I have two problems.

  1. I was only including bot reverts that included bots that currently are flagged.
  2. I was not grabbing the last reverted edit before the reverting edit but rather the most distant reverted edit from the reverting edit.

So let's fix both!

First, I grabbed all of the usernames from en:Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots and combined that with a list that Staeiou curated, the list of bots flagged in enwiki.user_groups and the bots that used to be flagged from enwiki.user_former_groups.

dbstore staging -e "SELECT user_id, user.user_name FROM enwiki.user INNER JOIN tmp_geiger_bot_usernames_enwiki_20170315 g ON user.user_name = g.user_name UNION SELECT user_id, user.user_name FROM enwiki.user INNER JOIN tmp_unflagged_bot_usernames_enwiki_20170315 u ON user.user_name = u.user_name UNION SELECT user_id, user.user_name FROM enwiki.user INNER JOIN enwiki.user_groups ON ug_user = user_id WHERE ug_group = 'bot' UNION SELECT user_id, user.user_name FROM enwiki.user INNER JOIN enwiki.user_former_groups ON ufg_user = user_id WHERE ufg_group = 'bot';"

This gave me a dataset of 1417 bots (compared to the 314 I had before).

I also cleaned up the bot2bot query.

SELECT reverted.*, page.page_namespace AS page_namespace
FROM staging.enwiki_reverted_20140820 AS reverted
INNER JOIN enwiki.page ON rev_page = page_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverted_bot ON
        rev_user = reverted_bot.user_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverting_bot ON 
        reverting_user = reverting_bot.user_id
WHERE 
        rev_user != reverting_user AND
        rev_revert_offset = revisions_reverted
UNION ALL
SELECT reverted.*, archive.ar_namespace AS page_namespace
FROM staging.enwiki_reverted_20140820 AS reverted 
INNER JOIN enwiki.archive ON rev_id = ar_rev_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverted_bot ON
        rev_user = reverted_bot.user_id
INNER JOIN staging.enwiki_unified_bot_20170315 AS reverting_bot ON
        reverting_user = reverting_bot.user_id
WHERE   
        rev_user != reverting_user AND
        rev_revert_offset = revisions_reverted;

The trick is in the lines that say rev_revert_offset = revisions_reverted. This makes sure that the reverted edit we select is the *last* edit in the string of reverted edits. The query is running now. I should have data soon. --EpochFail (talk) 22:30, 15 March 2017 (UTC)[reply]