Wikimedia monthly activities meetings/Quarterly reviews/Architecture, Operations, Release Engineering, Services, and Security, October 2016

Notes from the Quarterly Review meeting with the Wikimedia Foundation's Technology II: Architecture, Operations, Release Engineering, Services, Security teams, July 14, 8:00 - 9:30 AM PT.

Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Present (in the office): RobLa, Michelle Paulson, Zhou, Gabriel; participating remotely: Maggie Dennis, Aeryn, akosiaris, Andrew Bogott, Brandon Black, Darian Patrick, Emanuele, Eric E., Faidon, Filippo G., Giuseppe, Greg, Jaime V., Jaime Crespo, Katherine Maher, Mark, Petr, Sarah R., Wes

Technology Quarterly Review - Q1 FY16-17- Architecture, Technical Operations, Release Engineering, Services, Security.pdf

Architecture

Objective: Support Wikimedia Security

  • Rob: See Slide
  • Rob: Working with Darian and Brian W. to help out. Darian really stepped up. I look forward to his management of security going forward.

Objective: Develop Fellowships program

  • Clarifying relationships between Brion V and Tim S. Not done this quarter because of staffing issues in T&C and Executive Staff. This should be something to work on with the upcoming CTO.

Other successes and misses

  • Rob: Worked with Kevin S. with TPG to improve documentation about how ArchCom works. Creating continuity for the architecture committee. What are the essential components and what are manager specific?
  • Rob: Archcom meets weekly. We also have well-attended IRC meetings.
  • Rob: DevSummit is coming together with Quim Gil and his team. Quim is chairing and RobLa is taking an active role.

Wikitext: Parsoid and Parsing have been working on documenting wikidom(?) and wikitext (which has been in use for 15 years). Parsing discussed this extensively at their offsite, and it will be a focus for the upcoming quarter.

  • Rob: Misses: would like to work more with colleagues in tech.

Release Engineering

  • Greg: Team Size is 6
  • Greg: KPI went down about 8% (about a minute from Q4)
  • Katherine: That seems a significant gain? Reasons?
  • Greg: It's hard to tell. Part of it may be that we moved away from Nodepool, which we were migrating to, and went back to permanent slaves, which don't have that cost associated with them. We lose the benefits of isolation, but the upside is that quick tests run faster. That is informing a lot of what we talk about at our offsite and the decisions we have made.

Time spent

  • Greg: Consolidated graphical view of time spent.
  • Greg: The team fills out a spreadsheet based on what people remember from the week before. It relies on memory, but it gives a general snapshot of where time is going.

Time Spent by Category

  • Greg: All categories that we use to track time allocation.

Objective: Phase out Ubuntu Precise

  • Greg: Objectives: We have a meta objective of phasing out Ubuntu Precise (see slide)
  • Greg: Done with strong collaboration with Ops
  • Greg: Learning: We made some assumptions about Ops-level items that we should not have made, and left questions open for Ops, which created some confusion

Objective: Reduce Tech Debt

Stretch Goals

Successes and Misses

Core workflows and metrics

Technical Operations

  • Mark: 19 staff members
  • Mark: 3 people joined this quarter
  • Mark: The DBA position was finally filled
  • Mark: Madhu from analytics joined
  • Mark: Ricardo also joined -- he will work specifically on automation
  • Mark: Main KPI is availability

Objective: Puppet

  • Mark: Puppet was the first goal. We were running an old, slow version. It wasn't present in the new data center yet, so if we had lost the original we could have had some problems. This had been put off for a while, so we made it a focus this quarter. It is now running on multiple machines. We spent a bit of time on it, and there was some frustration for the engineers: a lack of documentation slowed things down, but we finished two or three weeks before the end of the quarter.
  • Mark: Puppet runs are less than 20. This is not the be-all and end-all solution, but in general this is a good thing.

Objective: Prometheus Metrics

  • Metrics monitoring. We have many systems across Wikimedia and we keep adding more. Many have problems. In the last year we did some work scaling Graphite, but our systems couldn't keep up and there weren't solutions for that. Prometheus was one of the software packages we decided to experiment with, using our data in production. We deployed it across multiple servers in both data centers from the start. It was met with praise and is efficient with bandwidth and storage: several orders of magnitude more efficient than Graphite.
  • Additional flexibility. We will move ahead in the next quarter.

Objective: Openstack Horizon

  • A custom program was written by a Labs engineer years ago, and over the last few quarters we have been migrating management of Puppet classes to the new interface (OpenStack).
  • We did have a snag. One of our draft goals was posted as a goal but was not meant to be. The team decided to not move forward with the goal as written but the goal as discussed. Next time, we will explicitly check the official posted goal on the wiki. Transitioning to a new system takes a lot of coordination and prep work and this should be done separately from the actual goal.

Objective: Varnish 4

  • Part of a year-long effort. This was a big migration. We have a lot of traffic and content on Varnish. The storage backend has some issues. There is a solution, but it is not open source, so it is not available to us. We are trying to decide whether to stay with the open source version or migrate away.

Objective: Object invalidation with X-Key

  • With the current Varnish servers we can only purge one page at a time, and only if we know the content. It is not efficient. Right now we have several thousand purges per second per server, and there is a new way to optimize that. It is called XKey, and we will need coordination from the org.
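
The difference between purging one page at a time and purging by key can be illustrated with a toy in-memory cache. This is only a sketch of the general idea of tag-based invalidation, not Varnish code or its API; the class `TaggedCache`, its method names, and the example URLs and keys are all hypothetical.

```python
class TaggedCache:
    """Toy cache (hypothetical) where each object carries a set of keys."""

    def __init__(self):
        self._objects = {}       # url -> cached body
        self._keys_by_url = {}   # url -> set of keys attached to that object
        self._urls_by_key = {}   # key -> set of urls tagged with that key

    def store(self, url, body, keys):
        """Cache a body under a URL and index it under every key."""
        self.purge_url(url)  # drop any stale index entries first
        self._objects[url] = body
        self._keys_by_url[url] = set(keys)
        for key in keys:
            self._urls_by_key.setdefault(key, set()).add(url)

    def get(self, url):
        """Return the cached body, or None on a miss."""
        return self._objects.get(url)

    def purge_url(self, url):
        """Classic purge: remove exactly one object. Returns 0 or 1."""
        if url not in self._objects:
            return 0
        del self._objects[url]
        for key in self._keys_by_url.pop(url):
            self._urls_by_key[key].discard(url)
        return 1

    def purge_key(self, key):
        """Key-based purge: remove every object tagged with the key."""
        urls = list(self._urls_by_key.get(key, ()))
        purged = sum(self.purge_url(url) for url in urls)
        self._urls_by_key.pop(key, None)
        return purged


if __name__ == "__main__":
    cache = TaggedCache()
    cache.store("/wiki/A", "page A", ["template:Infobox", "page:A"])
    cache.store("/wiki/B", "page B", ["template:Infobox"])
    # One request invalidates every page that used the shared template.
    print(cache.purge_key("template:Infobox"))
```

With URL-only purging, a change to a widely shared resource needs one purge per affected page; with keys, the cache itself tracks which objects depend on what, so a single purge request covers all of them.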

Objective: Kubernetes

  • Managing containers. We did not finish but we felt it was important to get started. It did not get started until the last two weeks but it will be a focus next quarter.

Successes and Misses

Core workflows and metrics

  • Gradually expanding across quarters
  • Katherine: Thanks, that was great. I appreciated the discussion of availability.

Services

Objective: Improve Services Platform

  • Gabriel: Goal: Improve Services Platform (see slide for details)
  • Gabriel: Goals from last quarter and for next quarter.
  • Gabriel: Next quarter: we have several use cases in the pipeline with Editing and Reading

Objective: Improve Services and Security

  • Protect sensitive user info (See Slide)
  • Not as successful this quarter. It is a multi-team collaboration with dependencies on teams. This goal was de-prioritized and will be revisited.

Objective: Overhaul Legacy Systems

  • Provide a maintainable, cost-effective PDF generation service for offline/mobile use. (See slide)
  • Wikimedia Germany has agreed to be product owner on this. We will prepare for deployment and this will go out in this quarter.
  • Wikimedia DE was interested in table support and this was a community goal as well

Core Workflows

  • Support offline, improve performance, increase flexibility

Scorecard

Security

  • Darian: Team is the same size at about 1.5 people
  • Darian: 90% of time on core focus addressing security bugs and working with other teams
  • Darian: 2 Critical, 59 High security bugs

Objectives

Successes and Misses

Core Workflows

  • Katherine: Curious about hiring.
  • Darian: We have several solid candidates. A contractor, Sam Reed, who has worked with us before will be working with us.

Bugs

session wrapup

  • Wes: We had a Product and Technology onsite this quarter, and there were many great outcomes. One big thing that came out of it was looking at improvements to QA and the Beta Cluster, and working with Ops. For those watching the presentation, Gabriel presented in a new format that we are testing out. If you have feedback, please let me or Kristen Lans (from Subteam) know, to help improve the process.