Wikimedia monthly activities meetings/Quarterly reviews/Research, Design Research, Analytics, and Performance, July 2016

From Meta, a Wikimedia project coordination wiki

Notes from the Quarterly Review meeting with the Wikimedia Foundation's Technology I: Research, Design Research, Analytics Engineering, Performance teams, July 14, 10:00 - 11:30 AM PT.

Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Present (in the office): Dario, Ellery, Leila, Michelle, Ori, Tilman, RobLa, Jaime, Madhu, Grace, Katherine; participating remotely: aotto, Dan, Darian, Faidon, Gilles, Greg, halfak, Jonathan, Maggie Dennis, Marcel Ruiz Forns, Mark, nuria, Samantha, Wes

Technology Quarterly Review - Q4 FY15-16- Research and Data, Design Research, Analytics, Performance.pdf

Backup Datacenter

  • Wes: Great rollout. We had to make adjustments and moved it out a quarter to be prepared and do this well. Good learning and improvements.
  • Katherine: Reiterating the last session: a huge accomplishment for the team; community members were very positive; confidence in the results; the board appreciated the work.

Research and Data

Objective: Revscoring in production

  • Dario: First goal, ORES: the most important milestone for this project. Both the service (API) and the client interface are in the production cluster and available to users. Big thanks to Amir Sarabadani and others from Release Engineering, Operations, and our volunteers.

Objective: Revscoring in production - Successes and Misses

  • Dario: Related updates on ORES: a Wikimania session by Amir to socialize the changes in ORES.
  • Dario: 14 new models; substantial performance improvements.

Objective: Discussion modeling

  • Dario: Also related to the application of machine learning methods. The discussion modeling project (Detox) aims to provide insights into talk page discussions, specifically the interactions that may drive people away. This is a collaboration among different stakeholders: the community department, Research and Data, and Jigsaw. The first results became available this quarter. The dataset has been released, but there has not been an announcement yet; we will make one once all diffs have been scored by the algorithm. Ellery also analyzed blocked users to understand why they are blocked.

Other successes and misses

  • Dario: The project is divided into 3 steps: understanding harassment, designing interventions, [what is the third step? --Leila]

Objective: Research outreach

  • Dario: We hosted 2 major outreach events: the Wiki Research workshops at the WWW and ICWSM conferences, co-organized with Stanford University. We also co-hosted WikiCite with WMDE. Organizing these events took a lot of effort, and the results have paid off.
  • Learnings: how should we organize the funding we get for events such as this one? We had no network for the events we organized.

Other successes and misses

  • Dario: The Wiki Workshop was the largest concentration to date of Wikimedia research scientists. A report on WikiCite for the community and the funders is planned for the coming quarter.

Objective: Reader segmentation

  • Dario: Carry-over goal. We concluded that this was a miss, as it is not yet completed.
  • We need to figure out next steps; there are still a lot of interesting results that could be drawn from this data.
  • Katherine: Partnership with the Reading team?
  • (Dario:) Yes, it started as a collaboration at the technical level; in the past 2 quarters we have shared findings with them.

Other successes and misses

  • Dario: Wikistats 2.0. Wikistats had been maintained quietly by Erik Z, but he will focus on other tasks now. As of this quarter, official maintenance of Wikistats has been handed over to the Analytics team.
  • Dario: Released the Wikipedia navigation vectors dataset, which we expect to increase research on understanding Wikipedia readers and logs.
  • Dario: Published a Research FAQ at the request of the FDC.

Core workflows and metrics

  • Dario: We did not host a showcase last quarter. You should see showcases coming back on a regular basis starting this month.

Appendix

  • Katherine: The workshop (about harassment at Wikimania[?]) went really well. Question: other than attendees, what do you see as outcomes from it? Conversations, research, projects?
  • Leila: Conversations around three projects. It empowers us when we want to start other projects: people get together, and they may start something that comes out a year from now; we may be able to announce it 6 months to a year from now. It's hard to quantify concrete outcomes of such workshops, but the relationships built are really important and enable collaboration.
  • Dario: We organize the Developer Summit every year, and the research community is as important as the developer community. We want tighter integration between researchers' priorities and community needs.
  • Maggie: We heard from the German community just last week that the harassment research project is helpful work.
  • Katherine: I appreciated the documentation, so people know who is doing what in research. I know that's not nearly as fun as working on projects, but it's important.

Design Research

  • Jonathan: Filling in for Abbey, who is in a workshop.

Objective: Personas

  • Jonathan: Not completed, for several reasons: most importantly, the loss of partners in the org, in particular Kaity (design); we also had to make room for New Readers research (Mexico, etc.).

Personas

  • Jonathan: Going forward, the emphasis is on integrating these personas into product research and design (rather than creating new personas or refining existing ones).

Objective: Evaluative Design Research

Evaluative Research

  • Screenshot credit: Pau Giner

Objective: Deep dive / contextual inquiries

  • Collaboration with Reboot; we're prioritizing getting the information published

New Readers Contextual inquiries

  • Each deep dive took approximately 2 weeks

Objective: Collaboration with UW on Survey

  • Research prompted by a question asked by Trevor: how do people learn? The collaboration deployed a survey to students to find out how they find information. A summary was provided at last month's metrics meeting. This was Design Research's first

Collaboration with UW on Survey

  • noting UW's publication of data

Objective: Benchmarking / Tooling

Benchmarking / Tooling

  • New user testing platform.
  • Jonathan: The relationship with the previous vendor was not working, for several reasons. The primary goal was to get something in place to meet the product teams' needs right now, so we re-scoped to just securing a contract. It meets our current needs and, we think, our future needs.
  • Wes: Thanks to the Legal team.
  • Jonathan: Manprit and (?)
  • Jonathan: We have 3 seats for 3 researchers. We can use them for any research, not just the 3 verticals. Our previous contract
  • Jonathan: We're phasing out some of my old responsibilities; Chris Schilling is taking over some of them.
  • Katherine: Thank you.

slide 25

Analytics Engineering

  • Nuria presenting
  • Our team: Madhu is transitioning to Ops. We work quite a lot with Ops.
  • We use velocity as a KPI; it went down this quarter because several team members took vacation, and Wikimania.

Objective: Public by default

We also track data by country, but only release aggregates publicly, for privacy reasons.

(data)

  • Nuria: Wikistats transition: the old version, which has been up for the better part of a decade, has so far relied on XML dumps.

Objective: Wikistats 2.0

Public events stream

Druid (screenshot)

  • Nuria: A tool for, e.g., some people in Reading (JonK, Tilman): a much easier way to consume our data; queries take seconds in Druid instead of minutes in Hive. WMDE loves this too.
  • Dario: For Wikidata?
  • Nuria: Yes.
  • Dario: E.g., Russia is the top traffic source for Wikidata.

Objective: Better Data Access

Objective: Operational Excellence

Our team has a big operational component, e.g., Cassandra scaling issues.

Other successes and misses

  • Nuria: Varnish/Varnishkafka upgrades
  • Nuria: Launched http://analytics.wikimedia.org: one place for our tools
  • Michelle: I'm so impressed with your emphasis on privacy. I'm curious about the API response rate: what is the practical impact on API users?
  • Nuria: It's impacted by caching; only heavy users of the tools see the impact. Overall, if you request data, you aren't hitting storage.
  • Dan: Most people access it via the PageView tool by MusikAnimal; he's handling it from the community side.
  • Dario: It was all really great this quarter. Please demo Druid.
  • Katherine: Making the unique devices dataset public is fantastic. Do we have it for all projects?
  • Nuria: Yes, we did it originally for enwiki, but then for all projects. Also projects that are too small to yield
  • Katherine: Overall de-duplicated?
  • Nuria: Yes, it's something Reading/Tilman have requested. We need to investigate the performance/traffic considerations of an additional cookie: https://phabricator.wikimedia.org/T138027
  • Katherine: Is that a goal currently?
  • Nuria: Not a goal this quarter, perhaps next quarter; it depends on Ops/Performance feedback.
  • Katherine: I recognize it's not something we would use in product decisions, but it's important, e.g., for the board (important vs. urgent vs. informational).
  • Nuria: banner consultation[?]
  • Katherine: Of interest to external audiences.

Dashboards and Data analysis

Performance team

KPI: First paint time

Ori: Theory: traffic is moving to mobile, but we don't know.

KPI: Page save time

Ori: Regressed during the quarter; one major regression (AuthManager).

KPI: Page save time

Ori: This graph goes up to today; we're actually better now.

Objective: Thumbor

Ori: Tight coupling to MediaWiki. The goal was a production deployment; we have a VM deployment. The team decided to package it for Debian, which is coming along quite well. Filippo provided enormous time and assistance.

Objective: Performance Inspector

Ori: Not deployed; we asked Peter to help the Reading team with lazy loading. Looking good for next quarter.

Multi-DC

Ori: We talked a lot about that earlier.

Other successes and misses

  • Ori: Running MediaWiki from the secondary datacenter without writing; reducing slave lag (read slide).
  • Ori: Aaron Schulz made optimistic saving work for the majority of edits.
  • Ori: We expect inlining CSS to have a substantial impact on first paint time.
  • Katherine: So optimistic saving is live?
  • Ori: Yes.
  • Katherine: Slave lag: is that in addition to the multi-DC work?
  • Ori: Yes. We don't want to serve stale content; this imposes a lag tax.
  • Katherine: Does this (metric) mean the team will work further on this?
  • Ori: It's a huge win already, but... we've made a lot of progress in the past 3 months, yet 5 seconds of slave lag is still a problem for a modern web application.
  • Dario: As a top KPI?
  • Ori: It's a longer bet. Re the switchover to Dallas: one of the biggest payoffs of continuously serving from both DCs is that caches remain warm, which in the future should allow automatic switchover and improve performance, e.g., in South America (closer to Dallas). But the bigger payoff: we can open more DCs this way.
  • Katherine: I'm interested in the progression on lag time.
  • Ori: It's mostly social rather than technical. A lot of this work is done exclusively by Aaron Schulz, which is going to be problematic. It needs to be widely understood in the org, otherwise it will be difficult to work on. Aaron has been collaborating with Stas and documenting the work on mw.org; ideally we would reallocate
  • Katherine: When we first started, paint time increases may have been due to the transition to mobile. In the next quarter, do you feel you have the resources to do what you want? Tradeoffs?
  • Ori: The work that Timo is doing will make a substantial impact. But even if metrics substantially improve next quarter, e.g., by including CSS, and it will be great to have compensated for that, it will still not be fully satisfactory: it doesn't answer the question of why the regression happened in the first place. I don't feel adequately resourced.
  • Katherine: What is your purpose for next quarter?
  • Ori: Jumping between projects; drop in project management.
  • Wes: Ori, thanks for pushing forward performance and making the site more efficient; I've heard lots of compliments from other teams about your help supporting them.
  • Katherine: I really appreciate the time we have together. It helps me understand the arc of the org, and I appreciate the learning, since this isn't my core experience. So thank you very much.


General

  • Katherine: Comment about presentation structure: it makes sense to rotate the order so VE doesn't always get smushed at the end.