Research talk:Teahouse long term new editor retention/Work log/2015-10-01

From Meta, a Wikimedia project coordination wiki

Thursday, October 1, 2015[edit]

Let's gather a sample of all of the edits that these 14766 editors made between registration and when they were invited to the Teahouse (or would have been invited, in the case of the control sample).

First, all of there edits to pages that currently exist on the wiki, and a bunch of metadata about those edits. Include deleted edits.

#138856 rows
create table th_retention_sample_preinvite_edits
select * from enwiki.revision r join jmorgan.th_retention_sample trs on r.rev_user = trs.user_id join enwiki.page p on r.rev_page = p.page_id where r.rev_timestamp < DATE_FORMAT(trs.sample_date, '%Y%m%d%H%i%s') order by user_id, rev_id asc;

Then add in edits they made to pages that were subsequently deleted.

#21425 rows
insert into jmorgan.th_retention_sample_preinvite_edits (rev_id, rev_page, rev_text_id, rev_comment, rev_user, rev_user_text, rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1, rev_content_model, rev_content_format, user_registration, sample_date, sample_group, page_namespace, page_title) select ar_rev_id, ar_page_id, ar_text_id, ar_comment, ar_user, ar_user_text, ar_timestamp, ar_minor_edit, ar_deleted, ar_len, ar_parent_id, ar_sha1, ar_content_model, ar_content_format, user_registration, sample_date, sample_group, ar_namespace, ar_title from enwiki.archive a join jmorgan.th_retention_sample trs on a.ar_user = trs.user_id where a.ar_timestamp < DATE_FORMAT(trs.sample_date, '%Y%m%d%H%i%s') order by user_id, ar_rev_id asc;