Research talk:HHVM newcomer engagement experiment

Time spent editing

"measure time spent editing and edits per session" What's good? Spending more time or less? Doing more edits or doing more edits per time spent? --Nemo 09:21, 9 October 2014 (UTC)[reply]

Nemo: From a scientific point of view, my goal is to understand. We can decide what's good afterwards. --Halfak (WMF) (talk) 21:33, 17 October 2014 (UTC)
That gets problematic, however, if "afterwards" you discover that the data you collected doesn't give any information useful to determine whether A is better than B. --Nemo 09:46, 25 October 2014 (UTC)
Well, I think we are collecting data that will have important signal. The point of this exercise is not to determine whether A is better than B. That would be an A/B test. We're conducting an experiment. --2607:EA00:104:3C00:B8FB:E8D2:820D:9D67 19:43, 4 November 2014 (UTC) (Woops. I'm Halfak (WMF) (talk))
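For concreteness, here is a minimal sketch (in Python, not from the study itself) of how the "time spent editing" and "edits per session" measures quoted above could be derived from a user's edit timestamps. The one-hour inactivity cutoff and the one-minute allowance for each session's final edit are assumptions for illustration, not the study's actual parameters.

```python
from datetime import datetime, timedelta

SESSION_CUTOFF = timedelta(hours=1)          # assumed inactivity threshold
LAST_EDIT_ALLOWANCE = timedelta(minutes=1)   # assumed time credited to a session's final edit

def sessions(timestamps):
    """Group a user's edit timestamps into sessions separated by long gaps."""
    grouped, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > SESSION_CUTOFF:
            grouped.append(current)
            current = []
        current.append(ts)
    if current:
        grouped.append(current)
    return grouped

def session_metrics(timestamps):
    """Return (total time spent editing, mean edits per session)."""
    groups = sessions(timestamps)
    time_spent = sum(((g[-1] - g[0]) + LAST_EDIT_ALLOWANCE for g in groups), timedelta())
    edits_per_session = sum(len(g) for g in groups) / len(groups)
    return time_spent, edits_per_session

# Example: three edits in one sitting, then one edit the next day.
stamps = [datetime(2014, 10, 9, 9, 0), datetime(2014, 10, 9, 9, 12),
          datetime(2014, 10, 9, 9, 40), datetime(2014, 10, 10, 18, 5)]
print(session_metrics(stamps))  # (42 minutes in total, 2.0 edits per session)
```

The sketch only shows how the raw quantities can be computed; whether "good" means more time, less time, or more edits per unit of time is exactly the interpretive question raised above.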

Why just newcomers?

Seems it would be very worthwhile to study the impact of HHVM on productivity of existing contributors.--Erik Moeller (WMF) (talk) 08:08, 14 October 2014 (UTC)

Erik Moeller (WMF), I agree. We didn't decide against experimenting with existing contributors so much as we decided to focus on new editors for this first study. It's much easier to deploy a controlled experiment to newcomers in which we know that they have experienced the treatment; existing registered users can browse the site without leaving an activity trace to let me know they were there. Also, the experience of newcomers is much easier to control for, since the entirety of their logged-in experience would be within the treatment. Another reason for focusing on newcomers is that I have built up a collection of strategies for reasoning about newcomer engagement that I know are robust.
TL;DR: It's easier to run this experiment and do a good job of it, so we chose to start here. We should totally run more later. --Halfak (WMF) (talk) 21:33, 17 October 2014 (UTC)
  • Going back to my original Signpost article from last August, and specifically the comment that Erik Zachte had "pointed to the late 2014 speed-up of editing on the Wikimedia sites as a potential contributor to the increase. Implementing HHVM speeded up the saving of edits, which should logically have more impact on wiki gnomes doing lots of small edits than on editors who make just a few saves per hour." I don't now remember whether the second sentence was Erik's opinion or my extrapolation of it, and apologies to him that I didn't make that clear at the time. This experiment adds weight to the idea that speeding up saves would have little effect on those who make few edits, but it doesn't test the theory that speeding up saves would have more impact on wiki gnomes doing lots of saves per hour. That, as far as I'm concerned, remains the most plausible explanation for the ongoing rally in editing by very active editors. WereSpielChequers (talk) 10:04, 22 February 2016 (UTC)

Experiment setup

Something went wrong. Users PRIYANKA_SAHU_PS (22781155) and EvM-Susana (22802661) should have both been in the PHP5 bucket, but all of their edits are tagged HHVM. I'm investigating why. --Ori.livneh (talk) 03:35, 25 October 2014 (UTC)
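For reference, a consistency check along these lines can be run over the experiment's logs. The sketch below is hypothetical: the `assignments` mapping, the edit records, and their field names are assumed structures, not the experiment's actual schema.

```python
def find_bucket_mismatches(assignments, edits):
    """Report edits whose HHVM tag contradicts the user's assigned bucket.

    assignments: {user_id: "php5" or "hhvm"}           (hypothetical structure)
    edits: iterable of dicts like {"user_id": ..., "rev_id": ..., "tags": [...]}
    """
    mismatches = []
    for edit in edits:
        bucket = assignments.get(edit["user_id"])
        if bucket == "php5" and "HHVM" in edit["tags"]:
            mismatches.append((edit["user_id"], edit["rev_id"]))
        elif bucket == "hhvm" and "HHVM" not in edit["tags"]:
            mismatches.append((edit["user_id"], edit["rev_id"]))
    return mismatches

# The two user IDs mentioned above, with illustrative (made-up) revision IDs.
assignments = {22781155: "php5", 22802661: "php5"}
edits = [
    {"user_id": 22781155, "rev_id": 1001, "tags": ["HHVM"]},
    {"user_id": 22802661, "rev_id": 1002, "tags": ["HHVM"]},
]
print(find_bucket_mismatches(assignments, edits))  # both edits flagged
```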

Ori: What happened with this? --MZMcBride (talk) 02:29, 9 November 2014 (UTC)
MZMcBride, sorry to miss this question. We ran a follow-up experiment. We ended up reaching the same conclusion -- no significant effects on editing behavior were observed. So, now we're iterating on a next step. Right now, I'm working on a strategy to look at different types of activity to see if there was an effect. I have a hypothesis related to en:Activity theory that I think is worth testing. I'll get that proposal written up ASAP and ping you here when it's ready. --Halfak (WMF) (talk) 20:22, 3 December 2014 (UTC)
Cool, thanks for the update! --MZMcBride (talk) 23:19, 6 December 2014 (UTC)
MZMcBride. Sorry for the lack of an update here. I'm currently blocked on engineering support, so I haven't been able to move forward on this project. In the meantime, allow me to share my hypothesis and proposal.
I think the reason we are seeing no effect on edit rate is that edits occupy a temporal rhythm (1-7 minutes) (see edit session) that is qualitatively unaffected by a ~1.5 second decrease in page save time. It turns out that there are strong regularities in the temporal rhythms of human behavior[1] which I think suggest that goal-directed human activity has a natural hierarchical structure (see en:Activity Theory). If we look at the lowest level of the hierarchy ("operations" in Activity Theory), they seem to correspond with events that take place on the scale of 1-15 seconds -- the timescale where a 1.5 second speed-up could have dramatic effects. It's in those places that I'd like to look for evidence. However, there are not many places in MediaWiki where HHVM would speed up such actions (they need to use server CPU, and most user "operations" are client-side or hit the caching servers).
This got me considering power tools (e.g. Huggle, Twinkle, etc.) that use the API to gather information and save edits. It turns out that HHVM has yet to roll out to api.php, so we still have an opportunity to run controlled experiments as limited deployments of HHVM. --Halfak (WMF) (talk) 17:31, 15 December 2014 (UTC)
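To make the "operations" idea concrete, one possible first look is to measure the gaps between consecutive logged actions (for example, API requests issued by a tool like Huggle) and ask what share of them fall in the 1-15 second band, where a ~1.5 second speed-up is a large relative change. This is only a sketch of the general approach; the data source, thresholds, and example numbers are assumptions.

```python
def operation_scale_share(action_timestamps, low=1.0, high=15.0):
    """Fraction of inter-action gaps (seconds) falling in the 1-15 s
    "operation" band, where a ~1.5 s speed-up is a large relative change.

    action_timestamps: seconds since some epoch, one per logged action.
    """
    ts = sorted(action_timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    in_band = [g for g in gaps if low <= g <= high]
    return len(in_band) / len(gaps)

# Hypothetical vandal-patrol burst: actions a few seconds apart,
# then a longer pause before the next batch.
stream = [0, 4, 9, 12, 19, 140, 145, 151]
print(operation_scale_share(stream))  # ~0.857 (6 of 7 gaps are in the 1-15 s band)
```

If a meaningful share of a tool user's inter-action gaps sits in that band, then api.php latency plausibly gates their working rhythm, which is what a limited HHVM deployment to api.php could test.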

Re: activity sessions and actions

This part made me wonder (in relation to my question above) whether this is really suitable for a synchronous A/B test. Imagine two examples.

  1. Bob always keeps two edit tabs open; when saving or previewing an edit, he switches to the next tab and works on that different page, then goes back to the previous tab, and so on.
  2. Alice only has about 60 min to spend editing each evening after dinner and before bedtime. She has a watchlist which she's able to follow in this time; when the watchlist goes beyond her time "budget", she removes some items.

So, if you make editing much faster, what happens?

  1. Bob doesn't even notice: he doesn't check the "active" tab for many seconds, or sometimes minutes, until he's done with the next tab. Perhaps, if the action became instant, then he wouldn't need to spread himself across multiple tabs and we'd see an increase in productivity; but until he changes his workflow, whether a page takes 3, 30 or 90 seconds to save doesn't matter to him.
  2. Alice will initially not notice that she's doing things faster. Maybe she notices towards the end of her daily watchlist check that she has a minute or two to spare, and she just goes more slowly. Maybe after several weeks she notices that she's consistently completed her daily "work" earlier, and decides to use the newly available time to expand her activity, so she ends up doing more edits (see the arithmetic sketch below).
  3. Alice does some more edits and ends up touching pages Bob follows, which had not been edited in a while. Bob, because opening a page is now faster, decides to open two tabs instead of one, including Alice's diff, and he makes a follow-up edit which he wouldn't otherwise have made. (Cf. Group Size and Incentives to Contribute.)

Am I being unreasonable? No idea how to test all this though. --Nemo 14:12, 6 January 2015 (UTC)
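As a rough back-of-the-envelope check of the Alice scenario (point 2 above), the arithmetic below shows how little a save-time reduction from 3 to 1.5 seconds changes the number of edits that fit into a fixed 60-minute budget when each edit takes a few minutes to prepare. The five-minute preparation time is an assumed figure, not measured data.

```python
def edits_per_budget(budget_s, prep_s, save_s):
    """Edits that fit in a fixed time budget when each edit costs
    preparation time plus save time (all values in seconds)."""
    return budget_s / (prep_s + save_s)

budget = 60 * 60   # Alice's 60-minute evening budget
prep = 5 * 60      # assumed ~5 minutes to prepare each edit
print(edits_per_budget(budget, prep, save_s=3.0))   # ~11.88 edits per evening
print(edits_per_budget(budget, prep, save_s=1.5))   # ~11.94 edits per evening
```

Under these assumptions the immediate difference is a small fraction of one edit per evening, consistent with the expectation that any effect would surface only after Alice changes how she spends the reclaimed time.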

I think this is quite a reasonable set of hypotheses. I initially proposed that we not look at experienced editors because I expected such effects to be delayed (as you speculate), whereas newly registered users would, presumably, be experiencing Wikipedia without the caching servers for the first time. --Halfak (WMF) (talk) 17:49, 6 January 2015 (UTC)
Also on the matter

Relative time save

The other question was "do 3 seconds matter compared to X minutes to prepare an edit?". I think this might be tested right now, with the data you have:

  1. group the edits/sessions by the time required for parsing (e.g. edit[or]s of w:Barack Obama vs. edits to microstubs), and see if the 50% speedup mattered more on the slower pages (a sketch of such a grouping appears below);
  2. divide the users by their latency (by geography as an approximation of network latency? by navigation timing data as an approximation of their computer's slowness?), and see if the speedup mattered more to those with lower latency.

Then of course this might be inconclusive as well, because

  1. the grouping will have its own bias;
  2. we can't really know what share of the latency is due to the users and their platform; additionally, it's possible that a few seconds of slowness don't matter that much if your system is very fast and compensates, and vice versa, if you have a very slow system you may not even notice the added slowness on the site's side.

--Nemo 14:25, 6 January 2015 (UTC)
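One way to operationalise point 1 would be a stratified comparison: bin edits by some proxy for parse cost (page size, or save time measured on the PHP5 side) and compare the two backends within each bin. The sketch below assumes a pandas DataFrame with hypothetical columns `bucket`, `page_bytes`, and `save_time_s`; the real schema and the choice of proxy would differ.

```python
import pandas as pd

def speedup_by_parse_cost(edits: pd.DataFrame, n_bins: int = 4) -> pd.DataFrame:
    """Median save time per backend within quantile bins of page size,
    using page size as a rough proxy for parse cost.

    Assumed columns: 'bucket' ('php5' or 'hhvm'), 'page_bytes', 'save_time_s'.
    """
    edits = edits.copy()
    edits["size_bin"] = pd.qcut(edits["page_bytes"], q=n_bins, duplicates="drop")
    summary = (edits.groupby(["size_bin", "bucket"], observed=True)["save_time_s"]
                    .median()
                    .unstack("bucket"))
    summary["relative_speedup"] = 1 - summary["hhvm"] / summary["php5"]
    return summary
```

If the relative speed-up, or the corresponding difference in editing behaviour, grows with page size, that would support the idea that the saved seconds only matter where parsing dominates; a flat profile would point back at the caveats listed above.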

Correction for multiple comparisons

Halfak (WMF), may I ask what correction for multiple comparisons was used? HLHJ (talk) 00:48, 3 April 2018 (UTC)
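For context while this question is open: a standard way to handle a family of outcome metrics is to adjust the per-comparison p-values, for example with the Holm-Bonferroni procedure. The sketch below uses statsmodels and made-up p-values purely to illustrate the mechanics; it does not reflect the values actually reported by the study.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values for several outcome metrics (not the study's actual numbers).
p_values = {
    "edits_in_first_week": 0.04,
    "time_spent_editing": 0.30,
    "retention_after_4_weeks": 0.12,
}

reject, p_adjusted, _, _ = multipletests(list(p_values.values()),
                                         alpha=0.05, method="holm")
for metric, p_adj, rej in zip(p_values, p_adjusted, reject):
    print(f"{metric}: adjusted p = {p_adj:.3f}, reject null: {rej}")
```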