Wikimedia Foundation metrics and activities meetings/Quarterly reviews/Analytics/June 2014

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's Analytics team, June 23, 2014, 9:30 AM - 11:30 AM PDT.

Present: Lila Tretikov, Erik Möller, Toby Negrin, Dario Taraborelli, Kevin Leduc, Jessie Wild, Leila Zia, Tilman Bayer (taking minutes)

Participating remotely: Andrew Otto, Aaron Halfaker, Oliver Keyes, Nuria Ruiz, Erik Zachte, Dan Andreescu, Christian Aistleitner

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Agenda:
Introduction & Strategic Goals - Toby
Research and Data - Dario
Development - Kevin
Summary and Q & A - Toby

Presentation slides (full set)

Introduction and strategic goals

Toby: Welcome
this is the review for Q4
(attendee introduction round)
agenda
Asks of audience
this is a bit different from previous quarterly reviews:

  • team is becoming more mature
  • new ED

new strategy over last 1-2 months
reflections from my first 2 quarterly reviews:
first one - I didn't yet understand WMF; what I brought from industry was wrong here
2nd review: excited, but lots to do
now: really making progress
group structure: 2 teams: development / research and data
Lila: breakdown?
Toby:
3 devs, 1.5 techops, 1 PM in Dev
5 researchers/developers in R&D
so roughly 50:50
that ratio will probably persist
R&D is primary stakeholder
development team is somewhat protected from our customers
teams collaborate on e.g. standardization, privacy
revisiting R&D model
followup on Q3
impact: ...
focus has always been a problem for us; both teams do a lot of work in many areas
good quarter for community area: hackathons, maintain high level of relations with external researchers
had great offsite, good for collaboration on e.g. standardization
took place near Zurich, with entire team (except Oliver) - first time all were together (team is very distributed)
spent time on team values, privacy, future
Balancing Privacy and Understanding:
Some community members are very passionate about it
I think it's something we can lead the industry on, but we also need to balance it with data needs
I see privacy policy (worked on it with Legal) as kind of the bedrock for us, but working beyond that
Proposed direction:
In past month, became clear that we need to take leadership on practices and training
Kevin worked on this with e.g. Dario, Howie/Product...
made good progress
Not a roadmap change, but need to put in more specific intent
Be more assertive, help people reach their goals
Be more involved in the beginning
what we found: a lack of consistent goals, metrics and techniques, makes it hard to compare and communicate
Help standardize things through outreach, training, instrumentation
I like models ;)

(image: DIKW-diagram.png)

Classic model: data - info - knowledge - wisdom (DIKW); maps to our work (slide)

Systems/services overview (slide)
left - software
right - knowledge and services
would like to have fewer systems, make things clearer

Editor Model

thing that stuck out: had lots of metrics, quite granular, hard to compare
big metric at top: active editors
I spent some time in gaming industry - not a great industry, but they did have a good model
Lila: have we looked at whether active editors is the right metric?
Toby: not necessarily, but it's the person we brought to the dance ;) worked with it for ~3 years, certainly not the only/defining measure for a successful encyclopedia, but it is the problem WMF has set itself to solve
Model (slide):

  • goal (active editors)
  • metrics (acquisition, activation, retention)
  • levers (e.g. new registrations)

made it much easier for product teams (e.g. VE, Growth) to understand impact
Lila: from observation, some teams have a hard time mapping instrumentation into these
e.g. VE team doesn't seem to have good grasp on what needs to be monitored
Toby: that's what I mean by Analytics team becoming more assertive, training
Lila: talk about this later, but could think about embedded model
Toby: totally, and recall from the Growth quarterly review that they took these boxes and applied them to their work
Lila: yes, they are closest I've seen to that
Should go into teams' Q1 goals
they go into blue boxes...
Dario: (some teams) focused on low tier metrics, have not realized how/that they are contributing to higher tiers
e.g. UploadWizard
Erik: teams like VE and Flow have big underlying hypothesis, harder to chunk
causes a bias to focus on milestones, granular metrics
e.g. current Teahouse work for Flow: smaller scale, but more immediate
on the other hand, Growth does smaller projects
Lila: I think everyone should try to do this
what's important is not the answer to the question, but that the team asks it
Toby: I got traction on this with Product people, works for them
progress:
Dario and Aaron did good work on pivoting with metrics
Talked with Grantmaking (Anasuya/Jessie): this has value outside Product, and outside WMF
Still to do: release planning, visualization, post MVP metrics, community engagement
Jessie: we already used that model (acq/act/retention) to categorize our grantmaking, starting with IEG
people understand this

--break--

Kevin:
I'm the new PM for the development team
started shortly before Q4
Lila: so were your assumptions wrong too? ;)
Kevin: I didn't have any ;)
Lila: reporting to Toby? yes
(Kevin:)
biweekly commitments for sprints
more predictability
backlog
we lost two team members shortly after I joined
still got a lot of work done
stopped using Mingle, experimented with Phabricator for a while
now using Scrumbugs (built on top of Bugzilla)
and Etherpad for realtime collab
spreadsheets for daily work
got good at planning sprints
next step: release planning
most of team is remote
team overview: ...
3 devs, 1.5 ops, Andrew Otto and Jeff Gage
Toby: wanted to reduce the burden on Andrew, as he is the only ops person working on analytics
he was single point of failure
Erik: also, need...[?]
Toby: hope we can get Jeff integrated
Toby: our meetings are hard to attend for WMF-based folks
Dario: also have used quite a bit of time of Sean Pringle..
Lila: if there are specific tasks, they should be on [other teams'] todo list
Kevin: ... impact on team and its bandwidth
Ori will be joining 50%
Lila: so still looking for these two open reqs?
Toby: yes
Kevin:
what we planned in Q4 (slide from last time)
epics (not products, but groups of features)
...
Lila: that is the problem with epics, they are broad and are hard to fully solve
what is the pageview API?
Toby: we make logs available, community creates various APIs
Kevin: working on pageview definition
metrics on what we did (# of stories, Bugzilla bugs)
Lila: typical story complexity?
Kevin: something that can be achieved in 2 weeks
Lila: do you have a velocity in mind?
Kevin: yes
Toby: did 8 story points per sprint, team decided to move up to 13
had so many production issues (that held up work)
then people started to grab other things on the side ;)
Lila: set goals you can achieve, then a high watermark
Toby: we were aiming for predictability, can start to push now
Kevin: ...
Lila: do you size bugs?
Kevin: we track hours in every standup
also on non-planned work, often production issues
Lila: that's fine, as long as you are tracking running averages
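A running average of the kind Lila mentions could be tracked with something like the minimal Python sketch below. The function name `running_average`, the window size and the sample numbers are illustrative assumptions, not the team's actual tooling or data.

```python
def running_average(values, window=3):
    """Trailing mean over the last `window` values (fewer at the start)."""
    averages = []
    for i in range(len(values)):
        # Take up to `window` most recent values, ending at position i.
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


if __name__ == "__main__":
    # Hypothetical story points completed per sprint (not real team data).
    points_per_sprint = [8, 8, 13, 13]
    print(running_average(points_per_sprint, window=2))
```

The same function could be applied to hours spent on unplanned work per sprint, so that planned and unplanned effort are compared on the same footing.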
Kevin: example for production issues:

  • datacenter migration caused issues with slaves
  • dashboards
  • ...

Vital Signs:
Wikimetrics: user-friendly tool generating reports, originally created for Grantmaking
wanted to avoid people creating their own metrics and then comparing apples and oranges
Wikimetrics enhancements this quarter:

  • recurring reports
  • public reports
  • expand from user cohorts to entire projects (e.g. enwiki, dewiki)

new model metrics:

  • newly registered user
  • 3 others ready for implementation

Mobile usage (Oliver)
EventLogging transition (took over from Platform)
goal: make/keep it fully operative, no new features
Metrics Definition Standardization
Lila: there is a lot of stuff here, but you need to describe the benefits of all this work for the WMF, the team, the community
Kevin: Vital Signs will benefit entire org
Dario: and community
Kevin: accurate pageviews for WP0
work hasn't started, dependencies on Hadoop
Toby: WP0 needs these for marketing to carriers, and for evaluating success of program in general
Kevin:
Refinery: (slide)
Erik: images (upload.wikimedia.org) too?
Aaron: planned
Lila: what do the colors mean?
Andrew: "messages" = single web HTTP request to WMF servers
Lila: so an AJAX type request could be several messages?
Andrew: yes
most of what we count as pageview is probably in "text"
Erik: are we already generating metrics from this? not yet
Toby: already used it to debug some requests, resolve puzzling issues
Oliver: most of my recurring code still uses sampled logs
but e.g. session analysis stuff came from Hadoop already
Kevin:
req for geowiki work
Toby: Anasuya forwarded this, so we will be able to work on it already

Q1 goals

Lila: Q1 goals seem quite abstract
Kevin: Vital Signs: complete MVP (dashboard and metrics for lifecycle model)
Lila: should list the graphs everyone should be looking at
Toby: this is an extension of the Editor Model
Kevin:

  • EventLogging: operationalization, geocoding IPs

Lila: what am I going to get out of it?
Erik: this is not for reach (pageviews)
Dario: for now ;)
Erik: ... but for editor engagement, has been used there for a while
geocoding is just for understanding geographical differences in editor behavior
Lila: OK, trying to get at the need for this
correlate activity of an editor throughout workflow?
Erik: yes, in aggregate
Hadoop is for big data stuff, e.g. uniques
Toby: EventLogging enables arbitrary instrumentation
Erik: e.g. dashboards for VE
a very generalized system
Lila: what does operationalization mean here, what are the benefits?
Toby: Ori basically managed EventLogging for himself
Erik: Ori was the EventLogging alert system ;)
Toby: so this is for keeping it up and running
Dario: Vital Signs: don't have a consistent way of generating metrics across projects
Kevin: (continues on goals)

  • Refinery: new Hadoop release,...
  • Wikistats: enhancements and bugfixes
  • support existing/legacy systems

Still prioritization needed for asks from other teams: WP0, Geowiki
Lila: need to get teams together on this
Jessie: not mentioned: talked about WikiMetrics further development

Research and Data

Research & Data section of the slides

Dario:
team: 5 people - a year ago, we were 2, grown over last quarter
goal is to produce knowledge and support decisionmaking
primarily quantitative, but also work on UX
Lila: anything apart from user behavior?
(Erik, Tilman and others: digression on editor surveys)
Dario:
Q4: standardization, topic research, ...
had monthly active editor metric
expand to rolling monthly
add levers: ...
Lila: why Italian Wikipedia?
Dario: interesting large project, different pattern
Toby: acquisition is not on this slide
Lila: are you happy with the breakdown?
Dario: ...
Toby: striving for simplicity
Erik: help us iterate on which are the right ones
Lila: little upwards trend in enwiki new active editors at the end, is that an artifact?
Aaron: no, but not significant yet (?)
Dario: topical research:
Mobile: laid down groundwork for breaking down traffic by regions, device[?]
chart for tablet redirect pageview impact last week - nice example that we can now react quickly
Mobile acquisition
Lila: only a small number of editors will register...
Toby: needed for editing on Mobile
Dario:
editor acquisition - where do new editors come from
Lila: where did the 5 edit threshold come from? did we run a learning algorithm back then?
Dario: legacy definition
Lila: so it was arbitrary, right?
Erik Zachte: We chose that for Wikistats early on to restrict to people who seriously try to add/edit content, not just testing out site
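The threshold-based definition discussed here (an editor counts as active once they reach a minimum number of edits in a calendar month) can be illustrated with a minimal Python sketch. The function name `active_editors`, the `(user, date)` input format and the sample data are assumptions made for illustration; this is not the Wikistats or Wikimetrics implementation.

```python
from collections import Counter
from datetime import date


def active_editors(edits, year, month, threshold=5):
    """Return the users with at least `threshold` edits in the given month.

    `edits` is an iterable of (user, date) pairs, a hypothetical format.
    """
    counts = Counter(
        user for user, day in edits
        if day.year == year and day.month == month
    )
    return {user for user, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Made-up sample: alice edits 6 times in May, bob 4 times in May
    # and once in June.
    edits = [("alice", date(2014, 5, d)) for d in range(1, 7)]
    edits += [("bob", date(2014, 5, 3))] * 4
    edits += [("bob", date(2014, 6, 1))]
    print(active_editors(edits, 2014, 5))               # 5+ edits
    print(active_editors(edits, 2014, 5, threshold=1))  # any edit
```

Keeping the threshold as a parameter makes it easy to compare lower cutoffs with the legacy 5+ definition.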
Dario: activation rate: highest contribution from those who come from specific topic, less from generic internal referrals
rate has been quite flat on these large projects over recent years
Toby: smaller projects can be quite different
Dario: also, more different (e.g. eswiki) on smaller thresholds, converge on 5+
Leila worked a lot on editor survival
first time we do this kind of analysis
active editor migration - do editors leave or migrate between different WMF projects?
Lila: do we care?
Dario: more holistic understanding, migration patterns. ErikZ worked on that
Jessie: Global South: people say editors migrate to local Wikipedia, test that hypothesis
Dario:
anon editor acq (signup CTA)
Lila: so we can make people register, but revert rate is higher?
Aaron: can dramatically increase regs, but get fewer productive edits
interruption by CTA had unintended impact on productivity
Lila: how do we explain that?
Aaron: there is a group of anon editors who continue to do good work, and we interrupt them
Lila: figure out why this is happening, and how we can make it do what we want it to do
Toby: working on this with Growth, i.e. we are not the only people thinking about this
Dario:
Article survival: how AfC workflows impact content growth
this doesn't necessarily feed into editor model
Lila: need to understand that in the end, goal is knowledge provided and consumed (not # of editors)
Erik: question for Growth team is more general, article creation is just one part of editor activities
Toby: do we share this with the people who run AfC?
Aaron: yes, been interacting with them
Toby: has it led to changes?
Aaron: some people disagree with some of the study's conclusions, while many confirm the conclusions or argue that they don't go far enough. The discussion is ongoing, and relevant changes are happening.

Dario: focus areas for support in last four quarters
e.g. Growth had full support
Mobile getting good support now
Fundraising: hiring
Lila: baseline support?
Erik: also, there is the basic org-level prioritization: editor engagement, mobile, ...
but there might sometimes be a generally low priority level area with particularly high needs
Erik: Multimedia is a good model
but could have helped them drill a bit more on overall success metrics (in addition to more granular metrics)
Toby: e.g. talked to them a few days ago and helped to understand how it ties into TAE, very good impact compared to how much time we needed to spend
Dario:
Community support: e.g. ptwiki, GLAM toolset
Outreach:
research showcase, conference presentations, ...
Toolkits and documentation

Goals for Q1

embedded model is working well
can do even more on integrating with UX research
Metrics standardization - second big focus
Lila: want to arrive at baseline metrics for each team
you can be prescriptive about that, after working with them
Dario: yes, as Toby said, want to become more assertive (after consultation of course)
Toby: with Growth and Multimedia, we've got two good case studies
Dario: also want content curation metrics: edit funnel, deletion/reverts affecting new users
Lila: are the metrics that you delivered already fully instrumented across teams?
Dario: still need to do some socializing
Toby: the green metrics are there, filling out the blue boxes now
Lila: can we quantify these goals (for the Analytics team)?
Toby: can definitely do that
Erik: my suspicion is that we will spend a lot of Q1 on that socialization, not get much further
Lila: need resourcing plan
Dario:
Topical research: same areas as in previous quarters
Mobile push for apps - track adoption
Leila will work on editor modeling
new part: traffic. definition for readership metrics
partly because of limitations of comScore data, partly because we need inhouse definition
formal collaborations, for outsourcing:

  • Knowledge graph (for Flow): GroupLens, UNM
  • traffic aggregation: LANL

Toby: GroupLens did interesting research on the effectiveness of content acquisition campaigns
Erik: VE will restart engagement with enwiki, needs help with data
Dario:
Staffing: reqs for FR, traffic research
i.e. up to 7 FT at the end of 2nd quarter
Toby:
Challenges:

  • Focus
  • lower level tasks, legacy support
  • development transparency (team was understaffed)
  • community engagement

Asks:

  • we have stolen resources to do PM, still lack management capacity. need project manager/scrum master
  • techops support
  • exec support for standardization
  • interns for operational research (we are working with Design team on this)