Research talk:Productive new editor/Work log/Monday, January 27th

Monday, January 27th

I'm trying to get a sense of what productive newcomers look like over time. Since it takes a lot of time to identify reverts, I'll be down-sampling aggressively. Also, I'm working around an issue that resulted in missing user_registration values for users who registered before Dec. 26th, 2005. You'll notice that I am making use of the "approx_registration" field. I'll have more written up on that later. Suffice it to say that I'm interpolating approximate registration dates from the dates on which editors made their first edits.
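A minimal sketch of one way such an interpolation could work; it is not necessarily the method used for "approx_registration". It assumes that user IDs are assigned in registration order, so a user cannot have registered later than the first edit of any user with an equal or higher ID. The function name and the input layout are hypothetical.

```python
def approximate_registrations(first_edits):
    """
    Estimate an upper bound on each user's registration time.

    first_edits: list of (user_id, first_edit_timestamp) pairs, where the
    timestamp is any sortable value (e.g. a "YYYYMMDDHHMMSS" string).

    Assumption (not from the work log): user_id increases with registration
    time, so user i registered no later than the earliest first edit among
    users with id >= i.
    """
    estimates = {}
    running_min = None
    # Walk from the highest user_id down, carrying the earliest first edit seen.
    for user_id, first_edit in sorted(first_edits, reverse=True):
        if running_min is None or first_edit < running_min:
            running_min = first_edit
        estimates[user_id] = running_min
    return estimates


# Hypothetical data: user 2 first edited later than user 3 did.
sample = [(1, "20010120"), (2, "20010301"), (3, "20010215"), (4, "20010402")]
print(approximate_registrations(sample))
```

In this toy input, user 2's estimate is pulled back to user 3's first edit, because user 2 must already have existed when user 3 registered.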

Now on to the query that samples users. I settled on a monthly stratified sample: up to 300 randomly chosen self-created accounts per month. This will let me make inferences about the number of productive new editors that start editing per month and look for trends.

Stratified sample SQL
/* 
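Produces a stratified sample of staging.new_user_info: each sub-select draws
up to 300 self-created accounts at random from one month of approx_registration.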

Note that users sampled before 2007 are not filtered because the requisite 
logging was not in place at that time. 
*/
/*********************************** 2001 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200101" AND "200102" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200102" AND "200103" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200103" AND "200104" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200104" AND "200105" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200105" AND "200106" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200107" AND "200108" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200109" AND "200110" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200111" AND "200112" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2002 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200201" AND "200202" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200202" AND "200203" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200203" AND "200204" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200204" AND "200205" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200205" AND "200206" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200207" AND "200208" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200209" AND "200210" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200211" AND "200212" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2003 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200301" AND "200302" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200302" AND "200303" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200303" AND "200304" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200304" AND "200305" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200305" AND "200306" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200307" AND "200308" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200309" AND "200310" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200311" AND "200312" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2004 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200401" AND "200402" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200402" AND "200403" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200403" AND "200404" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200404" AND "200405" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200405" AND "200406" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200407" AND "200408" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200409" AND "200410" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200411" AND "200412" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2005 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200501" AND "200502" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200502" AND "200503" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200503" AND "200504" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200504" AND "200505" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200505" AND "200506" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200507" AND "200508" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200509" AND "200510" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200511" AND "200512" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2006 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200601" AND "200602" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200602" AND "200603" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200603" AND "200604" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200604" AND "200605" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200605" AND "200606" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200607" AND "200608" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200609" AND "200610" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200611" AND "200612" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2007 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200701" AND "200702" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200702" AND "200703" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200703" AND "200704" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200704" AND "200705" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200705" AND "200706" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200707" AND "200708" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200709" AND "200710" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200711" AND "200712" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2008 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200801" AND "200802" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200802" AND "200803" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200803" AND "200804" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200804" AND "200805" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200805" AND "200806" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200807" AND "200808" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200809" AND "200810" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200811" AND "200812" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2009 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200901" AND "200902" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200902" AND "200903" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200903" AND "200904" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200904" AND "200905" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200905" AND "200906" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200907" AND "200908" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200909" AND "200910" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "200911" AND "200912" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2010 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201001" AND "201002" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201002" AND "201003" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201003" AND "201004" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201004" AND "201005" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201005" AND "201006" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201007" AND "201008" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201009" AND "201010" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201011" AND "201012" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2011 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201101" AND "201102" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201102" AND "201103" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201103" AND "201104" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201104" AND "201105" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201105" AND "201106" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201107" AND "201108" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201109" AND "201110" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201111" AND "201112" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2012 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201201" AND "201202" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201202" AND "201203" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201203" AND "201204" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201204" AND "201205" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201205" AND "201206" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201207" AND "201208" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201209" AND "201210" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201211" AND "201212" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
/*********************************** 2013 *************************************/
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201301" AND "201302" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201302" AND "201303" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201303" AND "201304" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201304" AND "201305" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201305" AND "201306" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201307" AND "201308" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201309" AND "201310" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
UNION
    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info
    WHERE approx_registration BETWEEN "201311" AND "201312" AND creation_type = "self"
    ORDER BY RAND() LIMIT 300)
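
Since every block in the query has the same shape, the text could be generated rather than written by hand. The sketch below is one way to do that; the function names are hypothetical. Note that it emits a window for all twelve months of each year, whereas the query as listed above contains eight month windows per year (those starting in January through May, July, September and November), so its output is not identical to that query.

```python
def month_windows(first_year, last_year):
    """Yield (start, end) "YYYYMM" prefixes for every calendar month."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # The window ends at the following month, rolling over in December.
            next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
            yield f"{year}{month:02d}", f"{next_year}{next_month:02d}"


def stratum_sql(start, end, limit=300):
    """One parenthesised sub-select in the same shape as the blocks above."""
    return (
        "    (SELECT user_id, user_name, approx_registration FROM staging.new_user_info\n"
        f'    WHERE approx_registration BETWEEN "{start}" AND "{end}" AND creation_type = "self"\n'
        f"    ORDER BY RAND() LIMIT {limit})"
    )


def stratified_sample_sql(first_year, last_year, limit=300):
    """Join one sub-select per month with UNION."""
    blocks = [stratum_sql(s, e, limit) for s, e in month_windows(first_year, last_year)]
    return "\nUNION\n".join(blocks)


# Print the twelve strata for a single year.
print(stratified_sample_sql(2001, 2001))
```

The per-stratum limit is a parameter, so the down-sampling rate can be changed in one place.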

Now, I've run a script over these users that aggregated activity for two values of the time window: 1 day and 1 week.
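
The script itself is not shown here, so the following is only a sketch of the kind of check it might perform for each user. It assumes, as a simplification, that a productive edit is one that was not reverted; the function name and the (timestamp, was_reverted) layout are hypothetical.

```python
from datetime import datetime, timedelta


def made_productive_edit(registration, edits, window):
    """
    Return True if the user saved at least one non-reverted edit within
    `window` of registering.

    edits: iterable of (timestamp, was_reverted) pairs (hypothetical layout).
    """
    cutoff = registration + window
    return any(
        registration <= timestamp <= cutoff and not was_reverted
        for timestamp, was_reverted in edits
    )


# The two time windows compared in this log entry.
WINDOWS = {"1 day": timedelta(days=1), "1 week": timedelta(weeks=1)}

# Hypothetical user: one reverted edit on day one, one surviving edit on day four.
registration = datetime(2008, 3, 1, 12, 0)
edits = [
    (datetime(2008, 3, 1, 13, 0), True),
    (datetime(2008, 3, 4, 9, 30), False),
]
for label, window in WINDOWS.items():
    print(label, made_productive_edit(registration, edits, window))
```

A user like this one counts as productive under the 1 week window but not under the 1 day window, which is the kind of difference the two settings are meant to expose.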

Productive new editor proportion. The proportion of newly registered users who make a productive edit is plotted monthly with a LOESS fit.
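
For readers who want to reproduce a series like this, here is a rough sketch of the two steps: a monthly proportion, then a LOESS-style smoother. It is written in plain Python and is not the code behind the plot; the input layout, the function names, and the default span of 0.75 are all assumptions.

```python
def monthly_proportions(users):
    """
    users: iterable of (approx_registration, is_productive) pairs, where
    approx_registration starts with "YYYYMM" (hypothetical layout).
    Returns a sorted list of (month, proportion) pairs.
    """
    counts = {}
    for approx_registration, is_productive in users:
        month = approx_registration[:6]
        total, productive = counts.get(month, (0, 0))
        counts[month] = (total + 1, productive + int(is_productive))
    return [(m, p / t) for m, (t, p) in sorted(counts.items())]


def loess(xs, ys, span=0.75):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(xs)
    k = max(2, min(n, int(round(span * n))))  # points in each neighbourhood
    fitted = []
    for x0 in xs:
        distances = sorted(abs(x - x0) for x in xs)
        bandwidth = distances[k - 1] or 1.0  # guard against a zero bandwidth
        weights = [max(0.0, 1 - (abs(x - x0) / bandwidth) ** 3) ** 3 for x in xs]
        # Weighted least squares for y = a + b * (x - x0); the fit at x0 is a.
        sw = sum(weights)
        sx = sum(w * (x - x0) for w, x in zip(weights, xs))
        sy = sum(w * y for w, y in zip(weights, ys))
        sxx = sum(w * (x - x0) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - x0) * y for w, x, y in zip(weights, xs, ys))
        denom = sw * sxx - sx * sx
        if abs(denom) < 1e-12:
            fitted.append(sy / sw)  # fall back to a weighted mean
        else:
            fitted.append((sy * sxx - sx * sxy) / denom)
    return fitted


# Hypothetical sample rows, not real data.
users = [("20080115", True), ("20080120", False), ("20080203", True)]
series = monthly_proportions(users)
smooth = loess(list(range(len(series))), [p for _, p in series])
print(series, smooth)
```

Because each stratum is sampled separately, the proportion within a month is a direct estimate for that month and needs no reweighting.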

So the proportion of productive new editors has been remarkably constant over time. Further, it looks like the choice of time window doesn't matter very much. I wonder what it looks like if I only measure the proportion among users who actually edit (i.e., new editors).

Productive new editors per new editor. The proportion of new editors who make a productive edit is plotted monthly with a LOESS fit.

That's surprising. The proportion of new editors who edit productively dropped between 2006 and 2008. This could be related to the increasing use of counter-vandalism tools. Evidence from Research:The Rise and Decline, specifically Media:Desirable_newcomer_reverts_over_time.png, suggests that good-faith new editors were about 4 times more likely to be reverted in 2008 than in 2006.

But wait a second... if the proportion of registered users who edit productively hasn't changed, but the proportion of new editors who edit productively has changed, we should see a change in the proportion of new users who edit.
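
The argument above rests on a simple decomposition. It holds under the assumption that every productive new editor also counts as a new editor, that is, a productive edit is itself an article edit within the same window:

```latex
\[
\underbrace{\frac{\text{productive new editors}}{\text{newly registered users}}}_{\text{roughly constant}}
=
\underbrace{\frac{\text{new editors}}{\text{newly registered users}}}_{\text{must have changed}}
\times
\underbrace{\frac{\text{productive new editors}}{\text{new editors}}}_{\text{dropped, 2006 to 2008}}
\]
```

If the left-hand side stays flat while the last factor falls, the first factor on the right has to rise over the same period to compensate.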

New editors per newly registered user. The proportion of newly registered users who make an article edit in their first day (new editors) is plotted monthly with a LOESS fit.

Sure enough, there it is. One thing to consider here is that I'm filtering autocreated users out after 2008, but sadly, I can't filter them out before then. This could be related to the transition that we see, but if it were causing a substantial effect, we'd expect to see something like a step right at the beginning of 2008. --Halfak (WMF) (talk) 18:21, 27 January 2014 (UTC)