Research talk:The Co-op pilot study/Work log/2015-04-04

From Meta, a Wikimedia project coordination wiki

Saturday, April 4, 2015[edit]

Today, I want to look at the learners who have signed up at the co-op and take some basic measurements of their behavior.

It looks like I can use en:Category:Co-op_learner to gather a set of learners who have completed a profile page. --EpochFail (talk) 17:26, 4 April 2015 (UTC)[reply]


OK... well that required more cleanup than expected, but I think I'm effectively gathering all of the legitimate learners. I found 39.

SQL
CREATE TEMPORARY TABLE staging.learner_profile_names
SELECT
  page_id,
  CONCAT(UCASE(LEFT(CONVERT(profile_title USING utf8), 1)),
         REPLACE(SUBSTRING(profile_title, 2), "_", " ")) AS profile_name
FROM (
  SELECT
    page_id,
    SUBSTRING_INDEX(page_title, "/", -1) as profile_title
  FROM categorylinks
  INNER JOIN page ON page_id = cl_from
  WHERE
    cl_to = "Co-op_learner" AND
    page_namespace = 4 AND
    page_title LIKE "Co-op/%"
) AS profile_titles;
CREATE TEMPORARY TABLE staging.learner_profile_names2
SELECT * FROM staging.learner_profile_names;

SELECT * FROM (
  SELECT
    profile_name,
    user_id,
    user_name,
    user_registration,
    "profile name" AS method
  FROM staging.learner_profile_names
  INNER JOIN user ON user_name = profile_name
  UNION ALL
  SELECT
    profile_name,
    user_id,
    user_name,
    user_registration,
    "first edit" AS method
  FROM (
    SELECT page_id, min(rev_id) AS first_rev_id, profile_name
    FROM staging.learner_profile_names2
    LEFT JOIN user ON profile_name = user_name
    INNER JOIN revision ON page_id = rev_page 
    WHERE user_name IS NULL
    GROUP BY page_id
  ) AS missing_profile_pages
  INNER JOIN revision ON first_rev_id = rev_id
  INNER JOIN user ON rev_user = user_id
) AS learners 
WHERE 
  profile_name NOT IN ("Learner profile", "Learner preload") AND
  user_name NOT IN ("Jtmorgan", "Jmorgan (WMF)") AND
  user_registration >= "20150304";
+--------------------+----------+--------------------+-------------------+--------------+
| profile_name       | user_id  | user_name          | user_registration | method       |
+--------------------+----------+--------------------+-------------------+--------------+
| MColton3           | 24324556 | MColton3           | 20150305051617    | profile name |
| LLRungegordon      | 24316702 | LLRungegordon      | 20150304132432    | profile name |
| Maximus2929        | 24334881 | Maximus2929        | 20150306035009    | profile name |
| NXKMartialArts     | 24368037 | NXKMartialArts     | 20150309100430    | profile name |
| Blue Gnome Tree    | 24374357 | Blue Gnome Tree    | 20150309213439    | profile name |
| MLODROB            | 24362877 | MLODROB            | 20150308204407    | profile name |
| IngridBetancourt   | 24391736 | IngridBetancourt   | 20150311123359    | profile name |
| Saundra monterroso | 24393463 | Saundra monterroso | 20150311154134    | profile name |
| Jackheart314       | 24376996 | Jackheart314       | 20150310035227    | profile name |
| Bossofbossess      | 24393171 | Bossofbossess      | 20150311151112    | profile name |
| Md.Saiful Azad     | 24400513 | Md.Saiful Azad     | 20150312080114    | profile name |
| Gfrederic          | 24393246 | Gfrederic          | 20150311151925    | profile name |
| Koopa24            | 24404941 | Koopa24            | 20150312163816    | profile name |
| Tfan101            | 24408313 | Tfan101            | 20150312232356    | profile name |
| Acad1989           | 24418291 | Acad1989           | 20150313223248    | profile name |
| Jnav7              | 24417269 | Jnav7              | 20150313200414    | profile name |
| EditorRazu         | 24393601 | EditorRazu         | 20150311155638    | profile name |
| Harvey1257         | 24408217 | Harvey1257         | 20150312230948    | profile name |
| Anand kotwal       | 24503064 | Anand kotwal       | 20150322045411    | profile name |
| Shiteshsachan      | 24495812 | Shiteshsachan      | 20150321110632    | profile name |
| Kat.pambid         | 24512909 | Kat.pambid         | 20150323033330    | profile name |
| JamusDoore         | 24528575 | JamusDoore         | 20150324130911    | profile name |
| Brotothextrem      | 24510647 | Brotothextrem      | 20150322213109    | profile name |
| Pmahon628          | 24332327 | Pmahon628          | 20150305212118    | profile name |
| Mothusi Masibi     | 24539122 | Mothusi Masibi     | 20150325131032    | profile name |
| Newrunner769       | 24544211 | Newrunner769       | 20150325211804    | profile name |
| Morg4kele          | 24502509 | Morg4kele          | 20150322031751    | profile name |
| Lauraejpress       | 24563560 | Lauraejpress       | 20150327175117    | profile name |
| Vhernandez20       | 24586977 | Vhernandez20       | 20150330030700    | profile name |
| Daniellecart1      | 24560324 | Daniellecart1      | 20150327112233    | profile name |
| Bodb Derg          | 24590083 | Bodb Derg          | 20150330111337    | profile name |
| Grokjtrip          | 24602657 | Grokjtrip          | 20150331142032    | profile name |
| CityofGlendale2015 | 24615102 | CityofGlendale2015 | 20150401172529    | profile name |
| Jachterberg9       | 24629277 | Jachterberg9       | 20150403121404    | profile name |
| Swasthikprabhu     | 24621495 | Swasthikprabhu     | 20150402141802    | profile name |
| SAVIOR59           | 24368925 | Savior59           | 20150309120420    | first edit   |
| Eugene Issac       | 24380164 | Eugene issac       | 20150310115721    | first edit   |
| Uzi Oz             | 24555588 | UziOz              | 20150326222509    | first edit   |
| Robco311           | 24613350 | BeeCeePhoto        | 20150401141350    | first edit   |
+--------------------+----------+--------------------+-------------------+--------------+
39 rows in set (0.05 sec)

Registration date[edit]

I wonder if there were bursts within which learners joined the coop.

> SELECT LEFT(user_registration, 8) AS date, COUNT(*) FROM staging.coop_learner GROUP BY 1;
+----------+----------+
| date     | COUNT(*) |
+----------+----------+
| 20150304 |        1 |
| 20150305 |        2 |
| 20150306 |        1 |
| 20150308 |        1 |
| 20150309 |        3 |
| 20150310 |        2 |
| 20150311 |        5 |
| 20150312 |        4 |
| 20150313 |        2 |
| 20150321 |        1 |
| 20150322 |        3 |
| 20150323 |        1 |
| 20150324 |        1 |
| 20150325 |        2 |
| 20150326 |        1 |
| 20150327 |        2 |
| 20150330 |        2 |
| 20150331 |        1 |
| 20150401 |        2 |
| 20150402 |        1 |
| 20150403 |        1 |
+----------+----------+
21 rows in set (0.00 sec)

It looks like the co-op got a steady stream of 1-2 learners per day except for a time period between March 13th and March 22nd. What happened there? --EpochFail (talk) 17:35, 4 April 2015 (UTC)[reply]


Time to profile creation[edit]

OK. So I want to include the profile page creation timestamp as well, so I'm going to update the table generating query I have above.

SQL
CREATE TABLE IF NOT EXISTS staging.learner_profile (
  profile_page_id INT,
  profile_name VARBINARY(255),
  profile_creation VARBINARY(14),
  user_id INT,
  user_name VARBINARY(255),
  user_registration VARBINARY(14),
  method VARCHAR(50)
);
TRUNCATE TABLE staging.learner_profile;
INSERT INTO staging.learner_profile
SELECT 
  page_id, 
  profile_name, 
  min(rev_timestamp) as profile_creation,
  user_id,
  user_name,
  user_registration,
  method
FROM (
  SELECT
    page_id,
    profile_name,
    user_id,
    user_name,
    user_registration,
    "profile name" AS method
  FROM staging.learner_profile_names
  INNER JOIN user ON user_name = profile_name
  UNION ALL
  SELECT
    page_id,
    profile_name,
    user_id,
    user_name,
    user_registration,
    "first edit" AS method
  FROM (
    SELECT page_id, min(rev_id) AS first_rev_id, profile_name
    FROM staging.learner_profile_names2
    LEFT JOIN user ON profile_name = user_name
    INNER JOIN revision ON page_id = rev_page
    WHERE user_name IS NULL
    GROUP BY page_id
  ) AS missing_profile_pages
  INNER JOIN revision ON first_rev_id = rev_id
  INNER JOIN user ON rev_user = user_id
) AS learners
INNER JOIN revision ON rev_page = page_id
WHERE
  profile_name NOT IN ("Learner profile", "Learner preload") AND
  user_name NOT IN ("Jtmorgan", "Jmorgan (WMF)") AND
  user_registration >= "20150304"
GROUP BY page_id;

Now to ask the last question, but about profile creation -- not registration.

> SELECT LEFT(profile_creation, 8) AS date, COUNT(*) FROM staging.learner_profile GROUP BY 1;
+----------+----------+
| date     | COUNT(*) |
+----------+----------+
| 20150306 |        2 |
| 20150308 |        1 |
| 20150310 |        3 |
| 20150311 |        2 |
| 20150312 |        2 |
| 20150313 |        4 |
| 20150314 |        2 |
| 20150315 |        2 |
| 20150316 |        1 |
| 20150317 |        1 |
| 20150323 |        2 |
| 20150324 |        3 |
| 20150326 |        3 |
| 20150328 |        1 |
| 20150329 |        1 |
| 20150330 |        2 |
| 20150331 |        3 |
| 20150402 |        2 |
| 20150403 |        2 |
+----------+----------+
19 rows in set (0.00 sec)

OK. Basically the same answer. I wonder how long between registration and profile creation.

> SELECT DATEDIFF(profile_creation, user_registration) AS days, COUNT(*) AS learners FROM staging.learner_profile GROUP BY 1;
+------+----------+
| days | learners |
+------+----------+
|    0 |        5 |
|    1 |       14 |
|    2 |       12 |
|    3 |        2 |
|    4 |        2 |
|    5 |        2 |
|    6 |        1 |
|   21 |        1 |
+------+----------+
8 rows in set (0.01 sec)

Looks like most profile creations happen 1-2 days after registration. I wonder about those profile creations that happened less than a day after registration. HostBot shouldn't allow for that.

> SELECT user_name, TIMEDIFF(profile_creation, user_registration) FROM staging.learner_profile WHERE DATEDIFF(profile_creation, user_registration) < 1;
+------------------+-----------------------------------------------+
| user_name        | TIMEDIFF(profile_creation, user_registration) |
+------------------+-----------------------------------------------+
| IngridBetancourt | 06:19:30.000000                               |
| JamusDoore       | 04:41:57.000000                               |
| Vhernandez20     | 16:45:47.000000                               |
| Grokjtrip        | 03:07:46.000000                               |
| Jachterberg9     | 05:08:58.000000                               |
+------------------+-----------------------------------------------+
5 rows in set (0.00 sec)

It looks like en:User:Jachterberg9 got a post from HostBot [1] at 2015-04-03 17:11:25 (UTC) --EpochFail (talk) 18:00, 4 April 2015 (UTC)[reply]