Research talk:Newcomer survival models

From Meta, a Wikimedia project coordination wiki

Edits vs. Edit session

Really cool work, Aaron. I'm curious about the correlation between edits, sessions, and time spent, and how these lenses can tell us more when viewed together than separately.

For example, there is a relationship between the number of edits and the number of sessions. If every edit corresponded to one session, then the edit hazard plot would be identical to the session hazard plot. So when we say that there is only a 42.9% drop-off after the second session, shouldn't the comparison to the edits metric be the drop-off after [average edits/session] * [2 sessions] edits? What's the additional insight we get from the sessions metric that we don't get from the edits metric? Howief (talk) 00:24, 11 October 2013 (UTC)

That's a good question, Howief. There are two ways to explain the benefit: statistical and intuitive. So, let's start with intuitive. The only place that a newcomer can be lost is between sessions. This is, of course, true of the between-edits measure as well, but my work with Research:Metrics/edit sessions suggests that the more time between edits, the more likely it is that the user has walked away from the computer (or at least Wikipedia). The divide between edit sessions is intended to capture these special gaps between edits. Now, from a statistical point of view, a mean doesn't really exist. It's really just the center of some error distribution. While I can give you an average number of edits per session, that doesn't mean that a session is very likely to end once that number of edits has been completed. In fact, the error in the number of edits per session is quite wide (and log-normally distributed), so counting edits is a pretty bad way of determining session cutoffs. What this means is that the divide between edits 2 and 3 is far blurrier (in terms of statistical error) than the divide between sessions 2 and 3. Right now, it's hard to say from the measurements I have taken thus far whether this (theoretical) benefit will be of any use to us. Yet, with a better measuring device, we ought to be able to measure more nuanced (not necessarily small) effects. I plan to make use of these survival measures in the next round of Growth experimentation to determine which metrics measure with the highest consistency. --Halfak (WMF) (talk) 22:04, 11 October 2013 (UTC)
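
For anyone following this thread: one way to read the "drop-off after the n-th session" figures is as a discrete hazard, i.e. of the newcomers who reached session n, the share who never started session n+1. The sketch below assumes that reading. The function name and the sample counts are made up for illustration and are not taken from the study. The same function could be fed edits-per-user instead of sessions-per-user to compare the two lenses.

```python
from collections import Counter

def discrete_hazard(totals):
    """Map each count n to the share of users who stopped at exactly n,
    among the users who reached n.

    `totals` holds one number per user: how many sessions (or edits) that
    user completed in total. Hypothetical helper, not the study's code.
    """
    stopped_at = Counter(totals)
    at_risk = len(totals)  # every user reaches the smallest observed count
    hazard = {}
    for n in sorted(stopped_at):
        hazard[n] = stopped_at[n] / at_risk
        at_risk -= stopped_at[n]  # these users never reach a higher count
    return hazard

# Illustrative data only: total sessions per newcomer.
sessions_per_user = [1, 1, 1, 1, 2, 2, 3, 5]
hazard = discrete_hazard(sessions_per_user)
print(hazard)
```

Note that this simple version treats every user as fully observed. Real data would be censored at the end of the observation window, which this sketch ignores.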

Edit session time

@Halfak (WMF): this is great work and very informative. I have a question. In Research:Edit session you pick an edit session cutoff duration of 60 minutes (= 1 hour). In this paper, both Figure 9. Population proportion by session duration and Figure 10. Hazard by session duration show an edit session duration of 85.68 minutes. I expect these are equivalent from a statistical standpoint, but I thought I'd ask about it just to see if I'm missing something. Thanks for your input. 64.40.54.29 20:12, 24 November 2013 (UTC)

Good question, and I'm sorry that this isn't clear. A session "cutoff" has nothing to do with the duration. The cutoff refers to the maximum time between consecutive edits within a session (see Research:Edit_session#The_arbitrary_value_of), whereas the "duration" refers to the time between the first and last edits in the session (plus an estimated time buffer [1]). In other words, the cutoff is about clustering edits together into sessions, and the duration is the time spent editing during a session. --Halfak (WMF) (talk) 18:06, 25 November 2013 (UTC)
Ahhh, I see. I was confused. Thanks very much for the clarification. Best. 64.40.54.90 04:20, 26 November 2013 (UTC)
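
The cutoff/duration distinction above can be sketched in a few lines of Python. This is only an illustration, not the code behind Research:Edit session: the function names are hypothetical, the 60-minute cutoff is the value quoted in this thread, treating a gap exactly equal to the cutoff as in-session is an assumption, and the buffer is left at zero as a placeholder because its estimated value is not given here.

```python
from datetime import datetime, timedelta

def sessionize(timestamps, cutoff=timedelta(hours=1)):
    """Cluster edit timestamps into sessions.

    A gap longer than `cutoff` between consecutive edits starts a new
    session. Hypothetical helper for illustration only.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= cutoff:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # gap exceeded the cutoff: new session
    return sessions

def session_duration(session, buffer=timedelta(0)):
    """Time from first to last edit in a session, plus a buffer.

    The buffer defaults to zero here as a placeholder (assumption).
    """
    return session[-1] - session[0] + buffer

# Illustrative edits: gaps of 50, 50 and 140 minutes.
edits = [
    datetime(2013, 11, 24, 10, 0),
    datetime(2013, 11, 24, 10, 50),
    datetime(2013, 11, 24, 11, 40),
    datetime(2013, 11, 24, 14, 0),
]
sessions = sessionize(edits)
durations = [session_duration(s) for s in sessions]
print(len(sessions), durations)
```

In this made-up example the first session lasts 100 minutes even though no gap inside it exceeds the 60-minute cutoff, which is why a session duration can be longer than the cutoff.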

Termination events

Nice work. I was wondering if you'd looked at things that trigger a departure? I'm assuming blocks, deletion/rejection of content and edit conflicts will be the main triggers. Though it seems that the less we want to lose an editor, the more difficult it is to identify the cause of losing them; we don't seem to log edit conflicts, for instance. WereSpielChequers (talk) 15:09, 10 April 2017 (UTC)