Wikimedia monthly activities meetings/Quarterly reviews/Analytics, UX, Team Practices and Product Management, April 2015

Notes from the Quarterly Review meeting with the Wikimedia Foundation's Analytics, User Experience, Team Practices and Product Management teams, April 16, 2015, 9:30 - 11:00 PDT.

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Present (in the office): Erik Moeller, Kevin Leduc, Dario Taraborelli, Leila Zia, Abbey Ripstra, Damon Sicore, Lila Tretikov, Garfield Byrd, Jared Zimmerman, Tilman Bayer (taking minutes), Terry Gilbey, Toby Negrin; participating remotely: Marcel Ruiz Forns, Andrew Garrett, Arthur Richards, Dan Andreescu, Aaron Halfaker, Ellery Wulczyn, Joseph Allemandou

Erik: welcome
Toby: welcome

Analytics Engineering

Kevin:

[slide 3]

objectives summary

[slide 4]

successes
unique clients: novel method without privacy downsides
(shows dashboard)
Toby: this is great work; some inconsistencies with wikitext data remain
Dan: with this instrumentation, can see more different types of actions, such as intent
e.g. shows spike that indicates failures of captcha challenge
caveats: wikitext instrumentation uses multiple session IDs (per edit[?]), but see same user saving several pages with only one ID
60% anon editors doing... does not match other data
can compare success/failure rates
graph is still wrong because of this anon identification issue
Erik: so we should not treat any data as canonical yet? or just wikitext?
Dan: would need to ask Aaron for that [i.e. can't confirm solely from analytics engineering perspective]
Terry: we can't say VE is ready if data is not ready
Kevin: this is the most complex instrumentation we have ever done, may need to be revisited
Lila: revisited in an incremental way, or completely?
Aaron: working with engineers on Editing team to resolve bugs, making progress
Lila: ETA for when data will be clean?
Dario: tight timeline for A/B test
Terry: will we make it?
Dario: lots of other dependencies
Aaron: we are working to adjust timeline today
Lila: let's try not to shift too much, a lot hinges on this
Terry: ...
Toby: need to go into launch feeling comfortable
Damon: should give team room to [do what's necessary]
Lila: why do we catch this now, instead of two weeks ago?
Damon: started focusing on this two months ago, finding data quality issues now
Lila: we should flag such issues earlier
Aaron, Damon: notified other teams
Terry: we [execs] want to help, better to flag possible issue three weeks earlier [even in case it turns out to be solvable without exec support]
Lila: especially if it puts ship date at risk
escalate as needed
Arthur: communicate to other teams only, or to execs too?
Lila: no, escalate only when no reaction (about either allocating resources, or adjusting dates)
Dan[?]: at beginning of March, we knew about anon problems, shortly after that, we [communicated(?)]
solved a lot of issues but some still remain
then emailed people saying we needed help
from my POV, did everything we could to communicate
Terry: of course we are playing Monday-morning quarterback here, with hindsight
Lila: not blaming you (team)
Dan: need better process to allocate resources in such a situation
Terry: good point
Dario: data quality is important
in 2013 pre-beta launch VE experiment, interactions were not optimal
Abbey: there are also still usability issues
--> Lila, Terry: flag such issues [data quality problems in this case] earlier, communicate with other teams about either allocating resources or adjusting ETAs, escalate to execs when not resolved
--> Toby: should do retrospective
Erik: starting today there is also a regular VE pre-launch meeting series, with execs
Lila: that's what should happen
Erik: a lot of work earlier in the quarter was focused on perf improvements, which meant that this work got pushed back

[slide 5]

misses
Kevin:
unique clients blocked on Ops, not getting prioritized right now
Toby: in Mark's defense, he is under-resourced in this function
came up in middle of quarter
--> Terry: we as execs need to dive deeper in our regular meeting [to identify such issues where one team is blocked on another]
Lila: breakdown in process when prioritizations don't match ...
Terry: agree, it's both bottom-up and top-down
Abbey: I need milestones / dates in Phabricator for me to prioritize my team's work
--> Lila: publish milestones / dates by function [e.g. in Phabricator]
Toby: Analytics team uses logging platform that was once written by Ori as a side project, now serves in a central function
lacking Ops support was serious drag for a while
now hiring devops within Analytics
--> Lila: I want to give teams freedom to allocate headcount in this way [e.g. Analytics hiring their own devops person]
not anybody's fault (e.g. Mark's) here, but a structural issue
--> teams: don't be shy, call a meeting, manage us [execs] as much as we manage you
Damon: see this pattern emerge widely in startups where Analytics / data science team needs to rely on devops [too much], hire themselves
Toby: of course there is a certain efficiency in not everyone maintaining their own Hadoop infrastructure
--> Lila: can centralize later when we see multiple teams doing the same thing - but not upfront
Toby: Analytics works a lot to provide self-service tools to other teams
Lila: dashboard is really valuable

Research and Data

[slide 6]

Dario: ...
Lila: Analytics dev team seems to be doing similar things?
Dario: yes, they do infrastructure side, we do research side
Toby: Aaron came up with the "last visited" idea that allows us to do uniques without privacy concerns
Dario: data has been crash-tested for 2 quarters by now, we are confident about it
Toby: available on cluster, other teams can use it with their own definitions
Dario: provided all data that WikiGrok team needed for their decisions
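
The "last visited" idea mentioned by Toby can be illustrated with a minimal sketch. It is not the team's implementation and rests on assumptions: each client sends back only the date of its previous visit (no user ID), a request counts as a new unique when that date is missing or falls before the start of the counting period, and the names `handle_request` and `period_start` are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def handle_request(
    last_access: Optional[date], today: date, period_start: date
) -> Tuple[bool, date]:
    """Decide whether a request counts as a new unique client.

    last_access is the only thing the client sends back: the date of its
    previous visit, or None if it has no such value stored. No identifier
    is stored, so individual clients cannot be tracked across requests.
    (Hypothetical helper for illustration only.)
    """
    # First visit ever, or first visit within the current counting period.
    counts_as_unique = last_access is None or last_access < period_start
    # The client is told to store today's date for its next request.
    return counts_as_unique, today


if __name__ == "__main__":
    period_start = date(2015, 4, 1)
    stored = None  # value held on the client side
    uniques = 0
    for day in (date(2015, 4, 2), date(2015, 4, 9), date(2015, 4, 16)):
        is_unique, stored = handle_request(stored, day, period_start)
        uniques += is_unique
    print(uniques)  # one client, three visits in the period
```

A known limitation of this kind of scheme is that clients which never store the value are counted on every request, so they would need separate handling.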

[slide 7]

Fundraising [research] as minimization problem: better methods that allow reducing the amount of testing while keeping the same level of information
Ellery took lead on this
big progress
Lila: implementation on their roadmap?
Dario: (yes, as...)
Lila: wow
Dario: other part postponed
Toby: Ellery is doing a lot of other work (interested in applying knowledge gained about pageviews elsewhere)
Dario: started our first formal collabs with academia earlier
one of these is completing now, delivering funnel analysis
but later (May)
Toby: context: this is about scaling research. Dario, Leila and Aaron have a lot of questions [that outside researchers could be interested in working on]
had workshops at research conferences

[slide 8]

(Dario:)
success
have powerful solution
Lila: this is weak AI? yes
would like to see roadmap of different algorithms we might be working on
--> have a 1h meeting to align research priorities with org as a whole, in the next 3 weeks or so
Dario: Ellery also started looking at recommender services

[slide 9]

misses
Leila: link recommendation: wikilinks [to other wiki pages] mostly added manually currently
Dario: extracting browsing traces would allow adding links automatically, or provide suggestions to editors
Toby: also interesting info on how long readers stay on page (until they click link)
Dario: (commitments this q)
decide whether or not we want to productize this, and work with community
Toby: for productization, e.g. putting link recommender in VE, might need higher quality
and haven't done this kind of thing in the past
Dario: ... because we haven't worked in such areas before
Toby: ...
Damon: push this kind of decision making process to earlier stages
(so that this doesn't become a gatekeeper function, with a yes/no only after the work has already been done)
Toby: to push back a little bit, I want to give team room for exploration
Damon: yes, they will always have that room
Lila: decision should be between Leila and Trevor (ask whether it would be helpful for Editing team, ...)
right stakeholders should be in the room
Jared: trace work is just as relevant outside Editing though
Lila: focus on the stakeholder
e.g. this is the reader for (a link improvement feature), for Analytics team it's other teams
managers (here Dario) need to buffer that
--> Lila, Damon: decision process about which research ideas to productize should be moved to earlier stages
--> right stakeholders should be in the room, e.g. Leila and Trevor deciding whether to implement an Editing feature
Damon: Legal has some great ideas about workflows, might be relevant here
Leila: Research is moving pretty fast, would need more infrastructure support going forward
unique clients work is related to this link work
Dario: other responsibilities, e.g. provide data for reporting

Team Practices

Arthur:

[slide 11]

[slide 12]

successes
major success that I'm proud of: supported Call to Action
in health check, identified two major issues across teams:
way too much tech debt, not enough automated testing

[slide 13]

misses
cross-team issues in Engineering
Phabricator enabled our success with Editing team, beyond expectations, but Burndown chart functionality limited, IMO misleading
Terry: where are we in getting Phabricator fixed?
Arthur: talked with Greg about this
plan conversations with various stakeholders, including WMDE who provided current burndown feature
also at Lyon Hackathon in May
Terry: are we able to get this done before...
Damon: recall we added some resources for improving Phab
if we now know they are needed before end of year...
Arthur: don't have capacity right now
there is a team member in RelEng who is working on Phabricator tech, but he cautions that making local changes involves very taxing work
Terry: I thought we had allocated resources for two members some time ago?
Damon: takes time
Lila: this is perfect for outsourcing
Erik: Arthur, your team is primary customer for Phabricator as PM tool, I would recommend you act as customer [for outsourcing]
background: that sprint extension [for burndowns] was developed by intern at WMDE within a few weeks
should not be a lot of work to build something better
--> Lila: will forward Arthur vendor list for outsourcing of work to fix burndown functionality of Phabricator, [you] talk to authors of extensions, we will give you resources

[slide 15]

VE burndown chart
[slide 16]
Erik: health table - good tool for org overview

User Experience

[slide 18]

Jared:
combined teams in this review; Abbey will weigh in a bit too
big success: living style guide
get things that are currently in extensions into core, e.g. dropdown menus
made promise[?] about accessibility
new metaphor for progress buttons
thanks to work with community members like TheDJ, can now simulate accessibility issues like various color-blindness types right inside guide
worked with Trevor and his team to get set of icons
instead of individual set of sprite assets, one comprehensive icon set
dynamically color them, saves resources
Lila: which designer did this?
(Jared:) May with intern, and Michelle
Lila: and the dynamic style guide?
Jared: Prateek, Andrew G and me
Erik: have UX engineer?
Jared: no, that's why not in core yet
Damon: "not in core" meaning?
Jared: means many things are still in extensions
Lila: when making a change in style guide, do updates happen automatically?
Jared: that's what is exciting about this: things changed in core propagate elsewhere
Andrew(?) made visual regression testing
i.e. guide does not just reflect how things should look, but how they actually look
Aaron [on hangout chat]: Huge +1 [on value of this]. I'd rather use OOjs UI and have it look like Wikipedia does
Damon: can people extend these elements?
Jared: will work to pull [it out like that, yes]
UX team members will act as gatekeepers (do you need something completely new? etc.)
Lila: what if someone overrides this?
Jared: will be reflected in regression tests
Damon: detect overriding in e.g. CSS, JS too?
Andrew, Jared: maybe not changes that far down the line
we believe that giving people this system will remove need for many custom-made things
Jared: Hovercards: in the works for 8 months or so, launched Hovercards [as default] yesterday on two wikis (Greek and Catalan Wikipedia)
Lila: are there community reactions already?
Jared: yes, mostly positive or constructive
this was an ask by local admins
had 2-3 months trial
needed to evaluate non-English lang feedback
check with Ori that load on servers is manageable
Toby: great feature, but serious implications for fundraising
Jared: knew from day 1 that this feature will decrease pageviews (similar to MediaViewer)
look at engagement metric instead of pv metric
see whether increase in engagement offsets decrease in pvs
Lila: how do we measure?
Jared: how often opening hovercard, clicking links, accidental hovercard views, opt-outs
Damon: noting trace of hovercards is going to be really important for understanding reader behavior/information needs
Jared: decrease in pvs will mean fewer displays of FR banners
Lila: are we measuring time spent on site?
Jared: looking into that, but not yet
Erik: note that Hovercards are a desktop only experiment so far
Jared: mobile [apps?] will try a tapping version of them
Damon: how do they relate to Collections?
Jared: connecting with that will be next step

[slide 19]

successes

[slide 20]

misses
Abbey: REFLEX (benchmarking tasks for specific users)
need testing environment
currently have three mirrors
need to run pilots to choose right tools
Jared: allows us to do qualitative measurements in a quantitative way (i.e. at large scale)
working with Legal team to ensure vendors' terms of use are compatible with ours

Product Management

[slide 22]

Erik:
group did not have specific goals, but the objective of implementing product development methodology
ended up focusing most of attention on reorg at executive level instead
so this work ended up being deferred
started using Phabricator to manage roadmap
pluses and minuses on that - Phab does not have concept of time (deadlines), requires workarounds like time columns

[slide 23]

successes

[slide 24]

misses
product management is severely understaffed right now, only a few people available for keeping things running (e.g. Jon in Mobile)

General discussion

Lila: feedback on meeting/structure?
Jared: need to make room for Research team's explorative work, great source of ideas
Aaron: haven't been able to work directly with Abbey and Daisy, this separation is artificial and detrimental
--> Lila: integrate research teams [e.g. UX research and Research&Data] more; you guys should be both nose and tail (sniff out new things, and verify in the end that they work after implementation)
--> all researchers should be in same areas, need better lines of communication
--> Leila: need help understanding goals for research, expected outcomes for projects
--> Terry: set something up, would love to talk about this
--> Toby: team should produce proposal
Dario: working on some, reorg has been taxing
Lila: need work to set goals in streamlined way
Toby: Analytics struggled with Ops for unacceptably long time (not their fault)
comes at expense of e.g. self-service systems
Terry: this is our (execs') responsibility
Lila: will give resources, Ops too reactive right now [cf. action item in Analytics Engineering section]
Erik: some functions like database admin need to be centralized
Lila: yes, having them in other teams would likely be short-term solution