Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-02-21

Friday, February 21st

Finally back to work on this project. I've been away for a couple of days to attend CSCW[1]. OK. So last time I worked out a query that handles the first and last parts of the funnel: registration and making edits. Now I need to set up the two middle parts. What kind of page do newly registered users get redirected back to, and what kind of CTA are they seeing when they get there? Luckily, these bits of the data are from EventLogging, which means that all language wikis live together (in harmony) in one database. That makes my life much easier.

So, in order to figure out which namespace new users are redirected to, I get to make use of the ServerSideAccountCreation schema.

> select event_returnTo from ServerSideAccountCreation_5487345 ORDER BY RAND() limit 10;
+-------------------------------------------------------------+
| event_returnTo                                              |
+-------------------------------------------------------------+
| Estadio Alberto Jacinto Armando                             |
| Wikipedia:Portada                                           |
| Wikipedia:Sandbox                                           |
| NULL                                                        |
| Emmanuel Adebayor                                           |
| NULL                                                        |
| Звенящие кедры России (движение)                            |
| File:Ahafez.jpg                                             |
| Nobru                                                       |
| Value at risk                                               |
+-------------------------------------------------------------+
10 rows in set (0.04 sec)

No. No no no no no. Why are we storing the namespace with the title? *facepalm* Looks like we need to solve this problem. So, I need to extract the namespace in order to figure out if it is a special page, talk page, etc. that the user is being returned to. Bah!

> select event_returnTo from ServerSideAccountCreation_5487345 WHERE wiki = "eswiki" ORDER BY RAND() limit 10;
+----------------------+
| event_returnTo       |
+----------------------+
| Fórmula química      |
| Especial:Buscar      |
| NULL                 |
| Estaciones del año   |
| Especial:Seguimiento |
| Especial:Libro       |
| Sistema heterogéneo  |
| Correo electrónico   |
| JonBenét Ramsey      |
| Wikipedia:Portada    |
+----------------------+
10 rows in set (0.04 sec)

So, notice the "Especial:Libro". I assume that is in the "Special" namespace. In order to be able to process these namespaces, I'm going to need to know how each wiki says "Talk", "Special" and "Main" (which is a special case: a title is in Main only if its prefix matches no other namespace, so I need to know all of the other namespaces). Time to go fix this problem so that it doesn't come back to bite me later. --Halfak (WMF) (talk) 21:41, 21 February 2014 (UTC)
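A rough sketch of the prefix-splitting step described above, in Python. The `ESWIKI_NAMESPACES` mapping and the `split_namespace` helper are made up for illustration; the mapping is deliberately incomplete and the real list of names and aliases would have to come from each wiki.

```python
# Illustrative, incomplete mapping from lowercased namespace prefix to
# namespace ID. This is an assumption for the example; the real set,
# including aliases, has to be looked up per wiki.
ESWIKI_NAMESPACES = {
    "especial": -1,
    "discusión": 1,
    "usuario": 2,
    "wikipedia": 4,
}


def split_namespace(return_to, namespaces):
    """Return (namespace_id, title) for an event_returnTo value.

    None (no returnTo recorded) stays None. A title whose prefix is not
    a known namespace is treated as Main (0), which is why the full
    namespace list is needed before anything can be called Main.
    """
    if return_to is None:
        return None
    prefix, sep, rest = return_to.partition(":")
    if sep:
        # Treat underscores as spaces and ignore case when matching.
        key = prefix.strip().replace("_", " ").lower()
        if key in namespaces:
            return namespaces[key], rest.strip()
    # No colon, or the text before the colon is not a namespace name.
    return 0, return_to


examples = ["Especial:Libro", "Wikipedia:Portada", "Fórmula química", None]
results = [split_namespace(t, ESWIKI_NAMESPACES) for t in examples]
```

Note that a main-namespace title that happens to contain a colon falls through to Main only because its prefix is absent from the mapping, so a missing alias would silently misclassify pages.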


OK. Change made. See [2]. Now, in the meantime, I'm not going to be able to make use of what we have. That means I need to know the full set of namespaces for each of these wikis including aliases. --Halfak (WMF) (talk) 21:56, 21 February 2014 (UTC)


So, this query will get the set of namespaces that we need: [3]

Time for some Python. --Halfak (WMF) (talk) 21:56, 21 February 2014 (UTC)
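One way the Python side could start, as a sketch only. It assumes the namespace data arrives in the shape of a MediaWiki API siteinfo response (`meta=siteinfo` with `siprop=namespaces|namespacealiases`, legacy JSON format where the local name sits under `"*"`). `SAMPLE` is a hand-written, trimmed stand-in and not real output, and `build_namespace_lookup` is a hypothetical helper name.

```python
def build_namespace_lookup(siteinfo):
    """Map every lowercased local name, canonical name and alias to its ID.

    The main namespace has an empty name, so it is skipped; anything that
    fails to match this lookup is treated as Main by the caller.
    """
    lookup = {}
    query = siteinfo["query"]
    for ns in query["namespaces"].values():
        for name in (ns.get("*"), ns.get("canonical")):
            if name:
                lookup[name.lower()] = ns["id"]
    for alias in query.get("namespacealiases", []):
        lookup[alias["*"].lower()] = alias["id"]
    return lookup


# Hand-written sample in the assumed response shape (not real API output).
SAMPLE = {
    "query": {
        "namespaces": {
            "-1": {"id": -1, "*": "Especial", "canonical": "Special"},
            "0": {"id": 0, "*": ""},
            "1": {"id": 1, "*": "Discusión", "canonical": "Talk"},
            "4": {"id": 4, "*": "Wikipedia", "canonical": "Project"},
        },
        "namespacealiases": [{"id": 4, "*": "WP"}],
    }
}

lookup = build_namespace_lookup(SAMPLE)
```

Building one such lookup per wiki and keying them by the `wiki` column would cover the local names, the canonical English names and the aliases in a single pass.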