Schema talk:MobileWebWikiGrokResponse

From Meta, a Wikimedia project coordination wiki
Maintainer:Jon Katz & Sam Smith
Team:Reading - Web
Project:Mobile microcontributions
Status:inactive
Purge:Auto-purge pageId + eventCapsule PII after 90 days

String type for userId

Since events logged with this schema are for logged-in users only, why is the userId a string and not an integer? – Phuedx (WMF) (talk)

@Phuedx (WMF): the rationale was to make the schema agnostic about the user class, so that we wouldn't have to change the type and bump the schema ID in production if and when this goes live for anons. Since nobody knows when this is going to happen, I'm fine switching to integer for now. cc Kaldari, Maryana (WMF) --Dario (WMF) (talk) 17:00, 20 October 2014 (UTC)
@Dario (WMF): Hmm, actually, a few of us just met to discuss possible ways to get more eyeballs on this in the alpha/beta testing phase, and showing it to logged-out users was on the table... We may not be able to do it in stable for quite some time, but we could in alpha/beta for testing purposes, so let's not paint ourselves into the logged-in-only corner if possible. Maryana (WMF) (talk) 00:22, 21 October 2014 (UTC)
@Maryana (WMF): agreed. --Dario (WMF) (talk) 15:12, 21 October 2014 (UTC)
Done
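
For illustration, here is a minimal sketch of why a string-typed userId stays agnostic about the user class. The helper name encode_user_id and the idea that a logged-out user would be identified by some opaque token are assumptions made for this example; neither is part of the schema or its instrumentation.

```python
from typing import Union


def encode_user_id(identifier: Union[int, str]) -> str:
    """Return a user identifier as a string, whatever the user class.

    Hypothetical helper: a logged-in user is assumed to arrive as a
    positive integer id, a logged-out user as a non-empty opaque token.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(identifier, bool):
        raise TypeError("identifier must be an int or a str, not a bool")
    if isinstance(identifier, int):
        if identifier <= 0:
            raise ValueError("numeric user ids are expected to be positive")
        return str(identifier)
    if isinstance(identifier, str) and identifier:
        return identifier
    raise ValueError("identifier must be a positive int or a non-empty str")
```

With a string field, both kinds of identifier fit the same schema revision, so no type change or schema ID bump would be needed when anons are added.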

Do we really need the full host?

Rather than the full host (the hosts are, granted, all well known), could we use the dbname, e.g. enwiki, dewiki, frwiki? – Phuedx (WMF) (talk)

@Phuedx (WMF): both fields are logged by default in the Schema:EventCapsule and should not be included in this schema. I'll go ahead and remove the host field. cc Kaldari --Dario (WMF) (talk) 17:02, 20 October 2014 (UTC)
To be clear, this assumes server-side instrumentation on the project that receives the response, not Wikidata. If the receiving host is different from what the capsule captures, we would obviously need to log it explicitly. --Dario (WMF) (talk) 17:04, 20 October 2014 (UTC)
Done

valueSelected

The schema assumes valueSelected as required and boolean. This is because we're not storing "not sure" responses (in the binary question version) and we're only storing true and false values otherwise (tagging task). Any kind of "missing" or "unsure" or "I'm bored" or "what's this" response should be instrumented in the event log with dedicated actions, as you guys see fit. cc Kaldari, Phuedx (WMF), Maryana (WMF), MSyed (WMF). --Dario (WMF) (talk) 17:16, 20 October 2014 (UTC)[reply]

Per discussion with Kaldari and Moiz, we turned the field into an optional boolean and added an explanation of which values specific designs should store --Dario (WMF) (talk) 00:26, 22 October 2014 (UTC)
Done
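
As a rough sketch of what "optional boolean" means for a consumer of the log: an event may omit valueSelected entirely, but if the field is present it has to be true or false. The function name is hypothetical and the events are assumed to be plain dictionaries; this is not the validation EventLogging itself performs.

```python
from typing import Any, Dict


def has_valid_value_selected(event: Dict[str, Any]) -> bool:
    """Check valueSelected under the 'optional boolean' rule.

    Missing field: valid (e.g. a design that stores no answer).
    Present field: valid only if it is exactly True or False.
    """
    if "valueSelected" not in event:
        return True
    return isinstance(event["valueSelected"], bool)
```

Responses such as "not sure" would fail this check, which matches the intent above of recording them as dedicated actions in the event log instead.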

Event lookup for users with a completed response

Given that taskToken is a key shared between this log and Schema:MobileWebWikiGrok, it will be possible to associate a userId from the (public) response log with the corresponding (private) set of events, not just for the same task (as identified via the taskToken) but also for all other WikiGrok interactions by the same user (assuming the userToken is persistent), including page impressions. Is this desirable or intended? If not, I'd like to hear from you guys how we could further tighten these logs to avoid it. --Dario (WMF) (talk) 18:22, 20 October 2014 (UTC)

@Maryana (WMF) and Dario (WMF): Hmm, this is indeed a problem. Since the taskToken is shared between the tables it will allow mapping a userToken (MobileWebWikiGrok) to a userId (MobileWebWikiGrokResponse), and thus allow constructing page impression history for users who have submitted responses. The only 2 solutions I can think of are:
  1. Decouple the 2 tables and record response data as a JSON blob in the MobileWebWikiGrok table in addition to here. This will make any kind of deep analysis of user behavior very difficult/expensive.
  2. Use a userToken for both tables. This will make analyzing behavior by user characteristics impossible unless we explicitly include the user characteristics we want in this table as well (e.g. user registration date).
The 2nd solution is probably preferable for now. Any thoughts? Kaldari (talk) 00:31, 22 October 2014 (UTC)
@Kaldari and Dario (WMF): Definitely option 2 sounds more attractive. I discussed this a bit with Kaldari in person. I think it's ok for now to not have access to userid for post-hoc/long-term analysis because:
  • at this stage this isn't being thought of as a long-term retention feature
  • we might go ahead and open it up to anons in alpha/beta during the coming sprint to get more eyeballs 'groked, so we're already moving beyond the userid world
  • all we really need right now is edit count as a quick proxy for "is this a new-to-Wikipedia person or an old-timer?"
The only outstanding issue is how we filter out staff test edits, which at this point potentially comprise a non-trivial chunk of the events being fired. Kaldari proposed adding an is_test parameter that looks for use of the query string (see bottom of the schema); that works for me, and maybe as an extra level of data QA the team could also spend 10 minutes firing off events while logged in so we can get a quick tally of our own internal user hashes to filter out later. @Phuedx (WMF): I believe Kaldari left comments on your patch with all of this stuff, so hopefully it's clear how to move forward (yay round-the-clock development!). Maryana (WMF) (talk) 01:52, 22 October 2014 (UTC)
@Maryana (WMF): excellent. Do you expect we will want to segment respondents by other dimensions (like registration date)? If not – and assuming Baha, Kaldari and Sam have no other concerns – let's freeze the schema as it is. I had a chat with Kaldari regarding the testing field; it's a hack, but it should do the job. I am posting a Bugzilla ticket to ask Analytics Dev to add proper "debug" support without having to tweak a field in the schema --Dario (WMF) (talk) 04:19, 22 October 2014 (UTC)
Done
@Dario (WMF): re: segmenting by reg date, I thought about that, but it doesn't seem all that important for now (esp. with regard to the second bullet point – if we start showing WG to readers, they'll be responsible for the bulk of the usage, so that'll be moot anyway). Maryana (WMF) (talk) 17:54, 22 October 2014 (UTC)
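
To make the linkage concern in this thread concrete, here is a small sketch of the join it describes. It assumes both logs are available as lists of dictionaries; the field names taskToken, userToken and userId come from the discussion, while the action field, the sample values and the function name are invented for the example.

```python
from typing import Dict, List

Row = Dict[str, str]


def link_events_to_user_ids(responses: List[Row], events: List[Row]) -> List[Row]:
    """Attach a userId to every event whose userToken can be resolved."""
    # Step 1: the response log maps each taskToken to a userId.
    task_to_user_id = {r["taskToken"]: r["userId"] for r in responses}

    # Step 2: any event sharing a taskToken reveals which userToken
    # belongs to which userId.
    token_to_user_id: Dict[str, str] = {}
    for event in events:
        user_id = task_to_user_id.get(event["taskToken"])
        if user_id is not None:
            token_to_user_id[event["userToken"]] = user_id

    # Step 3: if the userToken is persistent, every other event by the
    # same user (including impressions for other tasks) is now linked.
    return [
        dict(event, userId=token_to_user_id[event["userToken"]])
        for event in events
        if event["userToken"] in token_to_user_id
    ]


# Hypothetical sample data.
responses = [{"taskToken": "t1", "userId": "42"}]
events = [
    {"taskToken": "t1", "userToken": "u-abc", "action": "response"},
    {"taskToken": "t2", "userToken": "u-abc", "action": "impression"},
    {"taskToken": "t3", "userToken": "u-xyz", "action": "impression"},
]
linked = link_events_to_user_ids(responses, events)
```

In this sample, both events carrying userToken u-abc end up associated with userId 42, although only one of them shares a taskToken with the response log. Under option 2, the response log would hold a userToken instead of a userId, so step 1 would no longer yield anything that identifies an account.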

Test mode

See this bug for a proposal to handle test events without having to hardcode a testing field in a schema. --Dario (WMF) (talk) 16:53, 22 October 2014 (UTC)[reply]
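
Until something like that exists, the interim approach from the thread above could be applied at analysis time roughly as follows. This is only a sketch: it assumes each event is a dictionary carrying a boolean flag named is_test (the actual field name in the schema may differ) and that the tally of internal user tokens has been collected into a set; the function name is hypothetical.

```python
from typing import Any, Dict, FrozenSet, List


def drop_test_events(
    events: List[Dict[str, Any]],
    internal_tokens: FrozenSet[str] = frozenset(),
) -> List[Dict[str, Any]]:
    """Remove events flagged as tests or fired by known internal users."""
    return [
        event
        for event in events
        # Events without the flag are treated as real traffic.
        if not event.get("is_test", False)
        and event.get("userToken") not in internal_tokens
    ]
```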