Wikimedia Foundation metrics and activities meetings/Quarterly reviews/Analytics/June 2014
Present: Lila Tretikov, Erik Möller, Toby Negrin, Dario Taraborelli, Kevin Leduc, Jessie Wild, Leila Zia, Tilman Bayer (taking minutes)
Participating remotely: Andrew Otto, Aaron Halfaker, Oliver Keyes, Nuria Ruiz, Erik Zachte, Dan Andreescu, Christian Aistleitner
Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.
Introduction & Strategic Goals - Toby
Research and Data - Dario
Development - Kevin
Summary and Q & A - Toby
Introduction and strategic goals
this is the review for Q4
(attendee introduction round)
Asks of audience
this is a bit different from previous quarterly reviews:
- team is becoming more mature
- new ED
new strategy over last 1-2 months
reflections from my first 2 quarterly reviews:
first one - I didn't yet understand WMF; what I brought from industry was wrong here
2nd review: excited, but lots to do
now: really making progress
group structure: 2 teams: development / research and data
3 devs, 1.5 techops, 1 PM in Dev
5 researchers/developers in R&D
so roughly 50:50
that ratio will probably persist
R&D is primary stakeholder
development team is somewhat protected from our customers
teams collaborate on e.g. standardization, privacy
revisiting R&D model
followup on Q3
focus has always been a problem for us, both teams do a lot of work in many areas
good quarter for community area: hackathons, maintain high level of relations with external researchers
had great offsite, good for collaboration on e.g. standardization
took place near Zurich, with entire team (except Oliver) - first time all were together (team is very distributed)
spent time on team values, privacy, future
Balancing Privacy and Understanding:
Some community members are very passionate about it
I think it's something we can lead the industry on, but also need to balance with data needs
In past month, became clear that we need to take leadership on practices and training
Kevin worked on this with e.g. Dario, Howie/Product...
made good progress
Not a roadmap change, but need to put in more specific intent
Be more assertive, help people reach their goals
Be more involved in the beginning
what we found: a lack of consistent goals, metrics and techniques, makes it hard to compare and communicate
Help standardize things through outreach, training, instrumentation
I like models ;)
Classic model: data - info - knowledge - wisdom (DIKW); maps to our work (slide)
Systems/services overview (slide)
left - software
right - knowledge and services
would like to have fewer systems, make things clearer
thing that stuck out: had lots of metrics, quite granular, hard to compare
big metric at top: active editors
I spent some time in gaming industry - not a great industry, but they did have a good model
Lila: have we looked at whether active editors is the right metric?
Toby: not necessarily, but it's the person we brought to the dance ;) worked with it for ~3 years, certainly not the only/defining measure for successful encyclopedia, but the problem WMF has set itself to solve
- goal (active editors)
- metrics (acquisition, activation, retention)
- levers (e.g. new registrations)
made it much easier for product teams (e.g. VE, Growth) to understand impact
Lila: from observation, some teams have a hard time mapping instrumentation into these
e.g. VE team doesn't seem to have good grasp on what needs to be monitored
Toby: that's what I mean by Analytics team becoming more assertive, training
Lila: talk about this later, but could think about embedded model
Toby: totally, and recall from the Growth quarterly review that they took these boxes and applied them to their work
Lila: yes, they are closest I've seen to that
Should go into teams' Q1 goals
they go into blue boxes...
Dario: (some teams) focused on low-tier metrics, have not realized how/that they are contributing to higher tiers
Erik: teams like VE and Flow have big underlying hypothesis, harder to chunk
causes a bias to focus on milestones, granular metrics
e.g. current Teahouse work for Flow: smaller scale, but more immediate
on the other hand, Growth does smaller projects
Lila: I think everyone should try to do this
what's important is not the answer to the question, but that the team asks it
Toby: I got traction on this with Product people, works for them
Dario and Aaron did good work on pivoting with metrics
Talked with Grantmaking (Anasuya/Jessie): this has value outside Product, and outside WMF
Still to do: release planning, visualization, post MVP metrics, community engagement
Jessie: we already used that model (acq/act/retention) to categorize our grantmaking, starting with IEG
people understand this
I'm the new PM for the development team
started shortly before Q4
Lila: so were your assumptions wrong too? ;)
Kevin: I didn't have any ;)
Lila: reporting to Toby? yes
biweekly commitments for sprints
we lost two team members shortly after I joined
still got lot of work done
stopped using Mingle, experimented with Phabricator for a while
now using Scrumbugs (built on top of Bugzilla)
and Etherpad for realtime collab
spreadsheets for daily work
got good at planning sprints
next step: release planning
most of team is remote
team overview: ...
3 devs, 1.5 ops (Andrew Otto and Jeff Gage)
Toby: wanted to reduce the burden on Andrew as he is the only ops person working on analytics
he was single point of failure
Erik: also, need...[?]
Toby: hope we can get Jeff integrated
Toby: our meetings are hard to attend for WMF-based folks
Dario: also have used quite a bit of time of Sean Pringle..
Lila: if there are specific tasks, they should be on [other teams'] todo list
Kevin: ... impact on team and its bandwidth
Ori will be joining 50%
Lila: so still looking for these two open reqs?
what we planned in Q4 (slide from last time)
epics (not products, but groups of features)
Lila: that is the problem with epics, they are broad and are hard to fully solve
what is the pageview API?
Toby: we make logs available, community creates various APIs
Kevin: working on pageview definition
metrics on what we did (# of stories, Bugzilla bugs)
Lila: typical story complexity?
Kevin: something that can be achieved in 2 weeks
Lila: do you have a velocity in mind?
Toby: did 8 story points per sprint, team decided to move up to 13
had so many production issues (that held up work)
then people started to grab other things on the side ;)
Lila: set goals you can achieve, then a high watermark
Toby: we were aiming for predictability, can start to push now
Lila: do you size bugs?
Kevin: we track hours in every standup
also on non-planned work, often production issues
Lila: that's fine, as long as you are tracking running averages
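The running averages Lila mentions could be tracked along these lines. This is a minimal sketch, not the team's actual tooling: the function name, the 3-sprint window and the sprint totals are all assumptions made for illustration.

```python
from collections import deque


def running_average(points_per_sprint, window=3):
    """Return the trailing mean over the last `window` sprints, one value per sprint."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` totals
    averages = []
    for points in points_per_sprint:
        recent.append(points)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical completed story points per sprint (not real team data).
completed = [8, 8, 6, 13, 10]
print(running_average(completed))
```

The same function could be applied to tracked hours of non-planned work, so that planned and unplanned effort are compared over the same window.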
Kevin: example for production issues:
- datacenter migration caused issues with slaves
Wikimetrics: user-friendly tool generating reports, originally created for Grantmaking
wanted to avoid that people create own metrics and then compare apples/oranges
Wikimetrics enhancements this q:
- recurring reports
- public reports
- expand from user cohorts to entire projects (e.g. enwiki, dewiki)
new model metrics:
- newly registered user
- 3 others ready for implementation
Mobile usage (Oliver)
EventLogging transition (took over from Platform)
goal: make/keep it fully operative, no new features
Metrics Definition Standardization
Lila: there is a lot of stuff here but you need to describe the benefits of all this work for the WMF, the team, the community
Kevin: Vital Signs will benefit entire org
Dario: and community
Kevin: accurate pageviews for WP0
work hasn't started, dependencies on Hadoop
Toby: WP0 needs these for marketing to carriers, and for evaluating success of program in general
Erik: images (upload.wikimedia.org) too?
Lila: what do the colors mean?
Andrew: "messages" = single web HTTP request to WMF servers
Lila: so an AJAX type request could be several messages?
most of what we count as pageview is probably in "text"
Erik: are we already generating metrics from this? not yet
Toby: already used it to debug some requests, resolve puzzling issues
Oliver: most of my recurring code still uses sampled logs
but e.g. session analysis stuff came from Hadoop already
req for geowiki work
Toby: Anasuya forwarded this, so we will be able to work on it already
Lila: Q1 goals seem quite abstract
Kevin: Vital Signs: complete MVP (dashboard and metrics for lifecycle model)
Lila: should list the graphs everyone should be looking at
Toby: this is an extension of editor Model
- EventLogging: operationalization, geocoding IPs
Lila: what am I going to get out of it?
Erik: this is not for reach (pageviews)
Dario: for now ;)
Erik: ... but for editor engagement, has been used there for a while
geocoding is just for understanding geographical differences in editor behavior
Lila: OK, trying to get at the need for this
correlate activity of an editor throughout workflow?
Erik: yes, in aggregate
Hadoop is for big data stuff, e.g. uniques
Toby: EventLogging enables arbitrary instrumentation
Erik: e.g. dashboards for VE
a very generalized system
Lila: what does operationalization mean here, what are the benefits?
Toby: Ori basically managed EventLogging for himself
Erik: Ori was the EventLogging alert system ;)
Toby: so this is for keeping it up and running
Dario: Vital Signs: don't have a consistent way of generating metrics across projects
Kevin: (continues on goals)
- Refinery: new Hadoop release,...
- Wikistats: enhancements and bugfixes
- support existing/legacy systems
Still prioritization needed for asks from other teams: WP0, Geowiki
Lila: need to get teams together on this
Jessie: not mentioned: talked about WikiMetrics further development
Research and Data
team: 5 people - a year ago, we were 2, grown over last quarter
goal is to produce knowledge and support decisionmaking
primarily quantitative, but also work on UX
Lila: anything apart from user behavior?
(Erik, Tilman and others: digression on editor surveys)
Q4: standardization, topic research, ...
had monthly active editor metric
expand to rolling monthly
add levers: ...
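As an illustration of moving from a calendar-month count to a rolling one, here is a rough sketch. The 30-day trailing window, the function name and the sample edits are assumptions; only the 5-edit threshold comes from the existing active editor definition discussed in this meeting.

```python
from collections import Counter
from datetime import date, timedelta


def rolling_active_editors(edits, as_of, window_days=30, min_edits=5):
    """Count editors with at least `min_edits` edits in the trailing window.

    `edits` is an iterable of (editor_name, edit_date) pairs.
    The window spans `window_days` days ending on `as_of`, inclusive.
    """
    start = as_of - timedelta(days=window_days - 1)
    counts = Counter(editor for editor, day in edits if start <= day <= as_of)
    return sum(1 for n in counts.values() if n >= min_edits)


# Hypothetical edit log (not real data).
edits = (
    [("A", date(2014, 6, d)) for d in range(1, 6)]  # 5 edits in early June
    + [("B", date(2014, 5, 20))] * 5                # 5 edits in late May
    + [("C", date(2014, 6, 10))] * 2                # only 2 edits
)
print(rolling_active_editors(edits, as_of=date(2014, 6, 15)))
print(rolling_active_editors(edits, as_of=date(2014, 6, 30)))
```

Unlike a calendar-month count, this value can be recomputed for any day, so editor B counts as active on 15 June but drops out once the window has moved past the May edits.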
Lila: why Italian Wikipedia?
Dario: interesting large project, different pattern
Toby: acquisition is not on this slide
Lila: are you happy with the breakdown?
Toby: striving for simplicity
Erik: help us iterate on which are the right ones
Lila: little upwards trend in enwiki new active editors at the end, is that an artifact?
Aaron: no, but not significant yet (?)
Dario: topical research:
Mobile: laid down groundwork for breaking down traffic by regions, device[?]
chart for tablet redirect pageview impact last week - nice example that we can now react quickly
Lila: only a small number of editors will register...
Toby: needed for editing on Mobile
editor acquisition - where do new editors come from
Lila: where did the 5 edit threshold come from? did we run a learning algorithm back then?
Dario: legacy definition
Lila: so it was arbitrary, right?
Erik Zachte: We chose that for Wikistats early on to restrict to people who seriously try to add/edit content, not just testing out site
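To make the effect of the threshold concrete, a small sketch follows that counts editors at several minimum-edit levels for one month. The thresholds 1 and 100 and the per-editor counts are illustrative assumptions; only the 5-edit level is the definition discussed here.

```python
from collections import Counter


def editors_at_thresholds(edit_counts, thresholds=(1, 5, 100)):
    """Map each threshold to the number of editors with at least that many edits."""
    return {t: sum(1 for n in edit_counts.values() if n >= t) for t in thresholds}


# Hypothetical edits per editor in one month (not real data).
monthly_edits = Counter(["A"] * 1 + ["B"] * 5 + ["C"] * 7 + ["D"] * 120)
print(editors_at_thresholds(monthly_edits))
```

Running the same count per project makes it possible to see how much the choice of threshold changes the picture from one wiki to another.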
Dario: activation rate: highest contribution from those who come from specific topic, less from generic internal referrals
rate has been quite flat on these large projects over recent years
Toby: smaller projects can be quite different
Dario: also, projects differ more (e.g. eswiki) at smaller thresholds, but converge at 5+
Leila worked a lot on editor survival
first time we do this kind of analysis
active editor migration - do editors leave or migrate between different WMF projects?
Lila: do we care?
Dario: more holistic understanding, migration patterns. ErikZ worked on that
Jessie: Global South: people say editors migrate to local Wikipedia, test that hypothesis
anon editor acq (signup CTA)
Lila: so we can make people register, but revert rate is higher?
Aaron: can dramatically increase regs, but get fewer productive edits
interruption by CTA had unintended impact on productivity
Lila: how do we explain that?
Aaron: there is a group of anon editors who continue to do good work, and we interrupt them
Lila: figure out why this is happening, and how we can make it do what we want it to do
Toby: working on this with Growth, i.e. we are not the only people thinking about this
Article survival: how AfC workflows impact content growth
this doesn't necessarily feed into editor model
Lila: need to understand that in the end, goal is knowledge provided and consumed (not # of editors)
Erik: question for Growth team is more general, article creation is just one part of editor activities
Toby: do we share this with the people who run AfC?
Aaron: yes, been interacting with them
Toby: has it led to changes?
Aaron: some people disagree with some of the study's conclusions, while many confirm the conclusions or argue that they don't go far enough. The discussion is ongoing, and relevant changes are happening.
Dario: focus areas for support in last four quarters
e.g. Growth had full support
Mobile getting good support now
Lila: baseline support?
Erik: also, there is the basic org-level prioritization: editor engagement, mobile, ...
but there might sometimes be a generally low priority level area with particularly high needs
Erik: Multimedia is a good model
but could have helped them drill a bit more on overall success metrics (in addition to more granular metrics)
Toby: e.g. talked to them a few days ago and helped to understand how it ties into TAE, very good impact compared to how much time we needed to spend
Community support: e.g. ptwiki, GLAM toolset
research showcase, conference presentations, ...
Toolkits and documentation
Goals for Q1
embedded model is working well
can do even more on integrating with UX research
Metrics standardization - second big focus
Lila: want to arrive at baseline metrics for each team
you can be prescriptive about that, after working with them
Dario: yes, as Toby said, want to become more assertive (after consultation of course)
Toby: with Growth and Multimedia, we've got two good case studies
Dario: also want content curation metrics: edit funnel, deletion/reverts affecting new users
Lila: are the metrics that you delivered already fully instrumented across teams?
Dario: still need to do some socializing
Toby: the green metrics are there, filling out the blue boxes now
Lila: can we quantify these goals (for the Analytics team)?
Toby: can definitely do that
Erik: my suspicion is that we will spend a lot of Q1 on that socialization, not get much further
Lila: need resourcing plan
Topical research: same areas as in previous quarters
Mobile push for apps - track adoption
Leila will work on editor modeling
new part: traffic. definition for readership metrics
partly because of limitations of comScore data, partly because we need inhouse definition
formal collaborations, for outsourcing:
- Knowledge graph (for Flow): GroupLens, UNM
- traffic aggregation: LANL
Toby: GroupLens did interesting research on the effectiveness of content acquisition campaigns
Erik: VE will restart engagement with enwiki, needs help with data
Staffing: reqs for FR, traffic research
i.e. up to 7 FT at the end of 2nd quarter
- lower level tasks, legacy support
- development transparency (team was understaffed)
- community engagement
- we have stolen resources to do PM, still lack management capacity. need project manager/scrum master
- techops support
- exec support for standardization
- interns for operational research (we are working with Design team on this)