User:EpochFail/Journal/2011-08-09

Tuesday, Aug. 9th

I just got back from a trip to Minnesota. Lots of good conversations with my lab mates. I got them excited about the work that we are doing here and played some squash.

Now back to work. Today I am finishing up some loose ends for RCom (CMU study) and the first edit session stuff. I want to check something with my survival metric (possible wrongness) and in order to do that, I have to go back to the database. I've already modified a script and I'm just about to kick it off. I highly doubt there will be any change to my outcomes, but I must know. For Science!!

After I am done with this, User:Staeiou, Steven, Maryana and I are going to kick off some qualitative coding of huggled editor outcomes. This should feed directly into our Huggle analysis. We are looking at something on the order of 800-1500 codings (possibly 800+1500). So... this might take a while. --EpochFail 17:44, 9 August 2011 (UTC)

The new survival script is running and I'm gathering the new hugglings that happened since last Thursday. I'd like to start thinking about how to understand Fabian's result about the 2007 cohort taking over the Wikipedia namespace, so I'll try to get a little thinking about that done while I wait.

So... It looks like editors who joined in 2007 dominate the Wikipedia namespaces (4 and 5) even today (~90% of content added). I want to make sure that:

  1. This isn't some rogue bot that we didn't catch.
  2. This isn't the result of AutoWikiBrowser or a huge amount of manual, janitorial work.

I want to look for:

  1. Whether this activity is representative of the proportion of admins from the cohort.
  2. The top byte contributors in the most recent month for which we have data, to see what they are doing.
  3. Where are the bytes being added? Policy pages? Essays? Discussions? AIV/AfD/ANI? WikiProjects? There might be a nice way to see how the overwhelming activity of 2007ers affects each group of Wikipedia namespace articles.

I wish I had time to code a massive number of edits to the Wikipedia namespaces by editors in post-2007 cohorts to see why they don't stick around and keep editing. I suspect that newer editors are having a bad initial experience when they get involved in the Wikipedia namespaces and that's why they aren't sticking around. I also suspect that the old technologies being used (IRC, mailing lists, etc.) are overwhelming to new would-be Wikipedians. --EpochFail 18:08, 9 August 2011 (UTC)

I just finished checking my first_session/survival stuff and, as expected, nothing changed. I'm still waiting to get to hand coding huggled editor outcomes, so I'm going to start poking at 2007 Wikipedians to figure out what they are doing. --EpochFail 20:43, 9 August 2011 (UTC)

I just produced a result that refutes Fabian's result about the activity level of editors in the Wikipedia namespaces (i.e., 4 and 5). It looks like a bug in bot detection caused the problem.

The following query sums bytes added (by non-reverting edits), grouped by bot status and editor cohort (first_edit_year), for namespaces 4 and 5 in December 2010.

SELECT 
    b.user_id IS NOT NULL as bot, 
    first_edit_year, sum(len_added) 
FROM halfak.fabian f 
LEFT JOIN halfak.bot_20110711 b 
    ON f.user_id = b.user_id 
WHERE rev_year = 2010 
AND   rev_month = 12 
AND   namespace IN (4,5) 
GROUP BY b.user_id IS NOT NULL, first_edit_year;
+-----+-----------------+----------------+
| bot | first_edit_year | sum(len_added) |
+-----+-----------------+----------------+
|   0 |            2001 |          52070 |
|   0 |            2002 |         461852 |
|   0 |            2003 |        1475028 |
|   0 |            2004 |        4486109 |
|   0 |            2005 |       14997503 |
|   0 |            2006 |       21268645 |
|   0 |            2007 |       14691717 |
|   0 |            2008 |        9066249 |
|   0 |            2009 |        6959055 |
|   0 |            2010 |       14841890 |
|   1 |            2004 |            142 |
|   1 |            2005 |         189807 |
|   1 |            2006 |        7766474 |
|   1 |            2007 |      127051748 |
|   1 |            2008 |       20689724 |
|   1 |            2009 |        2261342 |
|   1 |            2010 |        5629154 |
+-----+-----------------+----------------+
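To see how much the bot rows move each cohort's share, the table above can be re-tallied with and without them. This is only a sketch: the rows are copied by hand from the query output, and `cohort_shares` is a helper name made up for illustration. It is not the method Fabian used, and his measure may differ from a simple share of `sum(len_added)`.

```python
# Rows copied from the query output above: (is_bot, first_edit_year, bytes_added).
ROWS = [
    (0, 2001, 52070), (0, 2002, 461852), (0, 2003, 1475028),
    (0, 2004, 4486109), (0, 2005, 14997503), (0, 2006, 21268645),
    (0, 2007, 14691717), (0, 2008, 9066249), (0, 2009, 6959055),
    (0, 2010, 14841890),
    (1, 2004, 142), (1, 2005, 189807), (1, 2006, 7766474),
    (1, 2007, 127051748), (1, 2008, 20689724), (1, 2009, 2261342),
    (1, 2010, 5629154),
]


def cohort_shares(rows, include_bots):
    """Return each cohort's fraction of all bytes added (hypothetical helper)."""
    totals = {}
    for is_bot, year, added in rows:
        if is_bot and not include_bots:
            continue  # drop accounts flagged as bots
        totals[year] = totals.get(year, 0) + added
    grand_total = sum(totals.values())
    return {year: added / grand_total for year, added in totals.items()}


with_bots = cohort_shares(ROWS, include_bots=True)
without_bots = cohort_shares(ROWS, include_bots=False)

# Print the two shares side by side for every cohort.
for year in sorted(without_bots):
    print(f"{year}: {with_bots[year]:6.1%} with bots, "
          f"{without_bots[year]:6.1%} without")
```

With the bot rows counted as editors, 2007 is the largest cohort by a wide margin. With them removed, 2006 is the largest cohort and 2007 is roughly level with 2005 and 2010.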

Wednesday, Aug. 10th

I started today by picking up a list of project namespace articles for which Steven, Maryana and Zack want the top contributors. I loaded their list into db42 and checked how complex it would be to gather the relevant pages for the listed titles. There are a couple of problems; for example, "No_original_research/Noticeboard" is a subpage of "No_original_research". Stuart is helping me discover which titles should not include subpages and which ones should. It seems to me that all projects shouldn't have their subpages included. We'll see. --EpochFail 17:44, 10 August 2011 (UTC)
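The subpage question comes down to whether a listed title should match only the page itself or also anything under "Title/". Here is a minimal sketch of that rule, assuming titles are plain strings with underscores. `matches_listed_title` and the `include_subpages` flag are hypothetical names, and "No_original_research_examples" is an invented title used only to show why the trailing slash matters.

```python
def matches_listed_title(page_title, listed_title, include_subpages):
    """Decide whether page_title belongs to listed_title (hypothetical helper)."""
    if page_title == listed_title:
        return True
    # A subpage is the listed title followed by a slash, not merely a
    # title that happens to start with the same characters.
    return include_subpages and page_title.startswith(listed_title + "/")


listed = "No_original_research"
candidates = [
    "No_original_research",
    "No_original_research/Noticeboard",
    "No_original_research_examples",  # invented title, not a subpage
]

# Pages gathered for the listed title under each policy.
with_subpages = [t for t in candidates
                 if matches_listed_title(t, listed, include_subpages=True)]
without_subpages = [t for t in candidates
                    if matches_listed_title(t, listed, include_subpages=False)]
print(with_subpages)
print(without_subpages)
```

Keeping the flag per listed title would allow some titles to pull in their subpages while others do not.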

Still waiting to get to coding so I am working on nice Welcomers--editors who welcome future Wikipedians. As I'm generating this table, I'm thinking about how I want to document datasets. I'm going to try creating a page documenting a dataset. --EpochFail 20:25, 10 August 2011 (UTC)