Wikimedia monthly activities meetings/Quarterly reviews/Analytics/March 2014

From Meta, a Wikimedia project coordination wiki

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's Analytics team, March 31, 2014, 9:30 am to 11:10 am

Present: Carolynne Schloeder, Dan Andreescu, Kevin Leduc, Leila Zia, Oliver Keyes, Erik Moeller, Sue Gardner, Dario Taraborelli, Tilman Bayer (taking minutes), Toby Negrin, Jessie Wild, Howie Fung
Participating remotely: Aaron Halfaker, Andrew Otto, Charles Salvia, Christian Aistleitner, Erik Zachte, Mark Bergsma (until 11am), Nuria (until 10:30), Tomasz (from 10:30)

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Presentation slides (full deck including Research and Data part)

Agenda

  • Introduction (5 mins)
  • Follow up to Strategic Plan (10 mins)
  • Research & Data/Q&A (40 mins)
  • Break (5 mins)
  • Development/Q&A (35 mins)
  • Prioritization (15 mins)
  • Conclusions/General Q&A (10 mins)

Introduction

Toby:
Welcome
Six months ago, presented strategic plan (mission) to Sue and Engineering directors
I will talk a bit about problems and opportunities
then Dario on Research and Data, then break, discussion, prioritization

Group structure:

  • development team builds infrastructure, collects data
  • research and data team (R&D) analyzes and presents conclusions

Strategic plan

As mentioned in last review: After three months, I realized everything I thought was wrong ;)

Values: transparency, strategic alignment, flexibility, collaborative

Turnaround plan (after looking at academic literature)
Now a good performing team (if not yet high performing), everyone stepped up
Focus area updates:

  • Staffing:
    • research and data team has critical mass now
    • development team staffing complete, Kevin product manager now, taking this off my (Toby's) plate
  • Scope:
    • When I came in from other industries, I was surprised about the focus on participation (as opposed to readership/monetization)
    • Took over support for EventLogging - written by Ori, with Dario as kind of product manager
  • Execution:
    • established operating model for R&D
    • developing Agile production model

Q4 (might be a little fluffy):

  • Impact
  • Focus, balance (quickly/well, WMF/community, product/research, ...)
    • We tried to fix things by being very responsive, but this also robbed us of focus, getting longterm work done
  • Collaboration between the two teams
    • Will have first team offsite in Zurich
  • Integrate with community
    • first "incident": complaints about missing pageviews API, now at least supporting Henrik with new server
  • High priority initiatives - ErikM asked for two specific epics in next Q:
    • Editor Engagement Vital signs
    • Mobile Metrics

ErikM: High level: ED transition coming up, new ED will want to become familiar with data very quickly
Monthly progress reports are still focused on slow-moving metrics like active editors
E.g. still don't have good data about apps traffic portion
need more on e.g. editor acquisition funnels - have a few dashboards here and there, but not integrated
A time horizon of a few months into the past is not sufficient for someone who tries to get a deeper understanding of the organization
Also missing basic annotation feature for dashboards ("this is what happened in Sept 2013")
Toby: let's discuss this at the end of the meeting, but we have already been working on both epic areas, good timing
Vital Signs: ... building visualization in spring, ...
Mobile metrics: did deliver our first report to Mobile team this morning ;) (on browser share), had some hiccups on the way, but it's working
ErikM: so is Oliver the first person who has been working with this new infrastructure? yes
Oliver: Seems reliable, doesn't yet contain apps data (they use the desktop API, and so aren't included. Switching to mobile API, and so will be.)
Toby: questions about this intro? (none)

Presentation slides (Research and Data)

Research and Data

Dario:
this will be about Q3 plus part of Q2
(last review happened halfway in Q2)
metrics standardization:
Vital Signs - many stakeholders, in particular editor engagement
capture important phenomena
e.g. "what is the current survival rate of new editors on ARWP in the last week" - currently not answerable without running ad hoc queries
specify what we want to measure, come up with definition based on data we have
careful to compare apples with apples (projects have differing policies...)
ErikM: so not actually creating dashboards yet, but the definitions these will build on? yes
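
To illustrate what a standardized metric definition might look like, here is a minimal Python sketch of one possible "new editor survival rate". The definition used (a newly registered editor "survives" if they make at least one edit between `trial_days` and `trial_days + survival_days` after registration), the function name, and the parameters are all assumptions for illustration, not the definition the team adopted.

```python
from datetime import datetime, timedelta


def survival_rate(registrations, edits, trial_days=30, survival_days=30):
    """Share of a registration cohort that edits in the survival window.

    registrations: dict mapping user name -> registration datetime
    edits: iterable of (user name, edit datetime) pairs
    The window definition is a hypothetical one, chosen for illustration.
    """
    window_start = timedelta(days=trial_days)
    window_end = timedelta(days=trial_days + survival_days)

    survivors = set()
    for user, timestamp in edits:
        registered = registrations.get(user)
        if registered is None:
            continue  # edit by someone outside the cohort
        age = timestamp - registered
        if window_start <= age < window_end:
            survivors.add(user)

    if not registrations:
        return 0.0
    return len(survivors) / len(registrations)
```

Writing the definition down as a function with explicit windows is one way to make sure the same measurement is applied to every project, which is the "apples with apples" concern raised above.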
Focus areas remain the same - Growth and Mobile
team basically Dario, Aaron, ErikZ part-time
support Growth team with GettingStarted, especially as they extend to different languages
next Q focus on anon editor acquisition
may be one of the most productive segments to target
article creation and deletion trends, also outside ENWP
Mobile user acquisition, did two A/B tests for Mobile team
New data source: unsampled mobile logs, can ask new questions
Oliver did research on mobile browsing sessions
Toby: also, tune up privacy transparency, explain to people how we use data
Dario:
Other projects:

  • 2013 traffic trends analysis (several people worked on this for weeks), produced a good understanding of page view trends, at least on desktop
  • Performance A/B testing for Platform
  • supported Legal on privacy/data retention policies
  • Fundraising research knowledge transfer (Sahar, ...)
  • And a lot of other small support for other teams and community

Staffing basically doubled in two weeks, needed to work on coordination
Toby: ...
Dario:
now have solid structure
team coordination happens on Trello, standups, weekly group meetings. Started monthly showcase, became public in February

Q3 retrospective:
delivered stage 1 metrics
focus areas
worked on unanticipated major projects and team integration

Q4:
standardization:
stage 1 was new users
Q4: stage 2: community (editors, active editors, ...)
topical research:
build a body of shared knowledge, not siloed but cross-team and for new ED

a. mobile use

country/project/device/...
also for WP0, Growth
Carolynne: users means readers and editors? yes
(Dario:)
have bits of research about Mobile we haven't yet consolidated
Carolynne: marketing to people who don't yet know what WP is, interested in what we can find about them

b. Growth outside enwiki

Dario: haven't actually run a thorough analysis since Editor Trends Study
e.g. know eswiki is taking off with regards to mobile acquisition
Toby: also, reframe "Wikipedia is dying" meme - this could help people understand better what's going on

c. editor trajectories and retention

don't yet have good understanding of lifecycles and how they affect retention
which new users are the best target of acquisition/activation workflows?
Sue: definition of power user would be good
have been using "experienced editors" as a loose phrase encompassing both power users and long-tenured editors
Dario: this will be tackled in stage 2 of metrics standardization

d. anon editors

can hopefully support many more research requests in the future
Some of the dashboards currently scattered around limn instances will be consolidated as part of the Vital Signs project
Staffing: about to hire new f/t FR researcher (replacement for Sahar)

-- break --

Development

Toby:
every team member is either remote or new or both
speaks to team's ability to collaborate
Jeff is helping out 50% on Techops side
Q3 focused on:

  • standardization
  • WP0
  • vital signs
  • mobile metrics (ErikZ)
  • Wikistats
  • Kafka/Hadoop

got a lot done in Q3
some onboarding and task switching costs which could have been anticipated
Epics we worked on (list)
other things we worked on (list) - picture clearly shows team is working hard, but I'd like a little more strategy
Dan is Scrummaster
Agile helped understand our velocity:
less than 50% of time was spent on committed tasks.
Dan: or more like around a third
to some extent this is OK, but product manager can do more to protect time
Toby: Agile really helped us

Hadoop/Kafka

basically all our ability to understand pageviews is based on this
Kafka transports pageview info from servers to Hadoop, which folks like Oliver have been using for their research, will be our data warehouse
Mobile logs: pretty scalable, handling a lot of requests
Original architecture (when I came in): diagram
webrequests log stream was unreliable, turned it off
ErikZ has parallel infrastructure for sampled logs, used together with dumps, will work with him to replace/sync that
simplified structure, took out some pieces
all open source, comes from LinkedIn, know some folks there
The Ulsfo and esams datacenters transfer all data to eqiad

Challenges:

  • Andrew Otto did phenomenal job, as did the rest of the team
    • but he had to do this largely by himself, difficult
  • critical infrastructure has been built (e.g. used by Search now)
    • built legacy interface in case anyone else still needs UDPlogs
  • Network problems (as always when doing things across the ocean...), suspecting switches in esams

Course correction:
putting some more resources now into wikistats, but we need to continue building stable infrastructure
Asks, in particular from Ops: resourcing, desktop data, esams issues, capacity planning

Questions on Kafka/Hadoop/data? (none)

Wikimetrics

Dan on high level reasoning
Dan: sat together with Dario to think about the overall picture
current process for feeding dashboards: basically randomly started sql queries, ...
expose results publicly
Toby: worked with Jessie, some nice synergy, which kind of justified the extra time spent
Dan: Wikimetrics focused on publicly available data, so publish results too
kind of in the home stretch, except maybe performance (time needed for processing requests)

Q4 goals

Toby:
Epics (list)
ErikZ worked on Wikistats features like search, deployed soon (it has been hard to find stuff there, this will improve things)
(preview: http://www.infodisiac.com/cgi-bin/search_portal.pl?search=views )
goal/challenge: stay focused on what we commit to

Prioritization

Toby (shows list slide): ErikM's asks on top of list
production issues always priority 0
new: EventLogging (Ori pressing for this)
Pageview API - community asking for this, everyone on the team would love to work on this, but some other things need to be done first
accurate WP0 pageviews, Limn simplification - might fall off
Hard choices for all of us, I believe this is the right prioritization, but happy to talk with Jessie and Carolynne, maybe some low-effort things we can do
Jessie: #1 priority for our team, kind of discouraging to see this deprioritized quarter after quarter
Carolynne: unsure about how to assess the importance of the WP0 accuracy improvements
question now about country-level data, don't think this is too hard?
Toby: we're at the point where we can make more accurate predictions
ErikM: distinction between RD and dev = often the distinction between readability / one-off ...
Oliver: Geolocation is very, very slow except in low-level languages... which requires dev involvement. Example: geolocating ~9 million IPs has so far taken 56 hours, and is still working. Zero has more than 9 million hits a week.
Toby: in this case, had to make some dashboards private
Dario: investing into Mobile overall, also standardization across projects
these don't immediately address your needs, but ultimately they will yield e.g. country-level data
compromise, but will be valuable for you
Jessie: agrees
Carolynne: we are committing to partners that we will have dashboards, worked with Christian
Toby: Christian got dashboard creation process simplified, but may still need to be batched
Carolynne: continue conversation in later meeting
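
For a sense of scale, the geolocation figures Oliver quoted in this discussion can be turned into a rough throughput estimate. This is a back-of-the-envelope sketch only: the function names are hypothetical, and because the job was still running at 56 hours, the computed rate is an upper bound rather than a measured speed. Hits and unique IPs are also not the same thing, so the comparison with weekly Zero traffic is approximate.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_WEEK = 7 * 24


def max_rate_per_second(items, elapsed_hours):
    """Upper bound on items processed per second after elapsed_hours."""
    return items / (elapsed_hours * SECONDS_PER_HOUR)


def share_of_week(elapsed_hours):
    """Fraction of one week taken up by elapsed_hours."""
    return elapsed_hours / HOURS_PER_WEEK


# Figures from the minutes: ~9 million IPs, 56 hours so far.
rate = max_rate_per_second(9_000_000, 56)
print(f"at most {rate:.1f} IPs per second")                   # at most 44.6 IPs per second
print(f"{share_of_week(56):.0%} of a week elapsed so far")   # 33% of a week elapsed so far
```

At that rate, a single week of traffic takes at least a third of a week to geolocate, which is consistent with the point that a faster implementation would need developer involvement.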

Conclusions/Q&A

Toby: (continuing challenges)

  • Community Engagement
  • dev transparency

Sue:
blockers have included integrating FR research, Ops
Toby: this is the best Ops team I have worked with, but Ops resourcing is challenging
Sue: plan for that
ErikM: Ops has hired 4 people in last few months, lots of onboarding work for them
Sue: so just a matter of time for them
Toby: I worked on lots of unreliable systems in the past, so I'm still wary
e.g. Oliver's work is still kind of skunkworks stuff
Sue: other consistent issue since even before the team's founding has been people being pulled in various directions, spread too thin to deliver their big projects - that's where magic Kevin comes in ;)
Toby: tradeoff between being everybody's friend and getting things done ;)
Dan: I sort of got thrown into the Scrummaster role, took a while to get good at it
Sue: not criticizing the team, everyone did the best they could, delivered a lot
I see this as ongoing conversation about increasing focus etc.
I know that Carolynne and Jessie are a bit sad now ;)
but better than team getting distracted and not delivering anything
Editor metrics foundational for everybody's work
Carolynne: don't know how big our dashboards ask is
ErikM: didn't talk about privatization stuff (country-level)
talked about this with Anasuya, something for next FY
Dan: figure out privacy policy/90days issue for pageviews... this took a lot of time
Toby: wanted to go above and beyond, show how a website should treat (esp mobile) user data
Sue: working with Luis on new privacy policy?
Toby: yes, Luis and Michelle