Wikimedia monthly activities meetings/Quarterly reviews/Release Engineering/September 2014

From Meta, a Wikimedia project coordination wiki

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's Release Engineering and QA team, September 3, 2014, 11AM - 12:30PM PDT.

Present: Erik Möller, Greg Grossmeier, Tilman Bayer (taking minutes), James Forrester, Dan Duvall, Rummana Yasmeen, Chris Steipp, Lila Tretikov, Rob Lanphier, Juliusz Gonera (from 11:50)

Participating remotely: Bryan Davis, Chris McMahon, Andre Klapper, Mark Bergsma, Željko Filipin, Dan Garry, Mukunda Modell, Antoine Musso (from 12:15)

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Agenda

Presentation slides from the meeting

Greg:
Welcome
this is the first quarterly review after reorganization, becoming official team after having been virtual team within other teams

last quarter

deployment tooling (resolving pain points identified in January meeting Dev/Deploy)
scap and trebuchet - our two main deployment tools for cluster
scap is for MediaWiki and extensions
trebuchet for e.g. Parsoid
Lila: as a general note (about these meetings): tell me the value of such things ;)
** ACTION: add value statement with each goal in next quarter's presentation
Greg: enable us to deploy faster than our current cadence
Lila: also to assess effort vs. value, picking low-hanging fruit first
Greg: that's what we did in that January meeting
HHVM now on beta cluster, updated every 5-10min
and on one production machine (one of the job runners, hand-built)
puppet defaults now mean new app servers will automatically be on HHVM
maybe some future incremental performance improvements
Lila: do we have a goal how many we want to switch?
RobLa: MW core team goal: everything by end of this quarter
Erik: you can already activate it for your user account
(Greg:)
scap has been used forever, recently re-written by us (mostly Bryan) in Python
using fork of Trebuchet (also in Python)
wrapper around another tool
RobLa: around 1-2y ago, trebuchet was "the future", migrated Parsoid
the things that are complicated about scap are the things that are complicated about our infrastructure
scap rsyncs code to our production servers
and (e.g.) all the i18n strings
Lila: so only used for our own infrastructure?
Greg: yes, would be surprised to learn about reusers
RobLa: idea: scap as support library for trebuchet
Lila: what is the overlap? (value of maintaining both?)
Greg: scap is good enough for MW, sucks for e.g. Parsoid
e.g. diagnostics when deployment goes wrong
Bryan: need root privileges for such things :(
Greg: also, e.g. Mathoid coming
Gabriel has been champion for Debian packages
Lila: sounds messy, need to agree on must-have requirements
Greg: slow conversation because it's very cross-team
RobLa: also, ownership clarified only recently
when Phabricator comes, perhaps Mukunda could take this
Erik: is there an RfC on deployment systems?
** ACTION: create RFC around deployment systems
Greg: no, but we interviewed some people like Gabriel and search team on must-haves etc
Erik: so for non-MW deploys, we don't ever use scap? no
Bryan: it does not pretend to be a general purpose tool
Greg: idea was that trebuchet is calling scap for missing bits
Bryan: take git repository and provision it to multiple servers
Erik: ...
Greg: originally had scap and trebuchet integration as goal...
Lila: other people (organizations) might have the same problem (with trebuchet)?
Greg, Bryan: Ryan Lane was main developer, no longer with us
Lila, Erik, RobLa: (discussion about deploy for developers)
RobLa: right now, all done from command line
a lot of other orgs have a web interface where you push a button to deploy
Erik: scap is actually quite good in monitoring / reverting deploy to a cluster
Greg: ...
RobLa: have been making incremental progress to making beta cluster more similar to production
Greg: ~115 segments in config that are conditional on beta
ChrisM: it's a shared test environment
DanG: e.g. to test on actual content, had to import articles, but then miss templates, ... Got to like 90%
Erik: already had a lot of attention going to it. it's just that doing a staging environment for WP *is* complicated
JamesF: also, local communities keep changing things like interface messages that we need to keep sync'ed to test
Greg: complicating factors: also want to have non-English WP wikis
Erik: OpenStack wouldn't be the right answer either
Lila: is production running OpenStack? no
Greg:
(last q goals:)
HHVM deployment tooling
Swift cluster in beta - not yet, was stretch goal
RobLa: Swift is basically an open source equivalent of Amazon S3 (object storage)
Mark: Andrew Bogott did some work on Swift in Labs
Greg: Antoine and Bryan working on that, but not much available now
MW release (for third-party reuse):
supported release of MW 1.23, done by contractors (Mark & Markus)
decided a while ago that third-party releases are not WMF's core responsibility, so had first RFP. Now second RFP
RobLa: they are finding other funding sources
Lila: could this be automated?
Erik, RobLa: it's part of their task
RobLa, ChrisS: Jenkins has tags, they can download tarball from there
RobLa: they then do testing, bug fixing, add support for things we don't care about, e.g. PostgreSQL, non-Linux OSs
we are focused on our own infrastructure
Erik: there is an element of conservatism about third-party reusers
Greg: considered more part of ECT now (Quim)

(last q, what we said:)

QA

  • features on master are newer than production, so tests would fail - account for that now
  • retire Cloudbees Jenkins
  • integrate WMF Jenkins with new WMF Sauce Labs account

Lila: testing infrastructure for MW?
ChrisM: think of these as browser-level tests for features, e.g. MobileFrontend
cover paths through the applications that are important for users
Lila: but do we have unit tests?
Greg: yes, but we are very browser test focused now
reported test coverage: 4% (~7% on includes/ per https://integration.wikimedia.org/cover/mediawiki-core/master/php/ )
Lila: :(
ChrisM: some of the newer components have unit tests
Erik: they are more integration than unit tests (e.g. test if build fails)
easier for new components that are built from scratch
RobLa: some of MW's tech debt makes it actively hostile to unit tests
ChrisM: mobile team had policy that every new feature would have at least one unit test[?]
Lila: we need all three (integration, browser, unit tests)
Erik: infrastructure is there, but coverage is low
Lila: at Sugar, we enforced "no commit without tests"
DanD: also in design phase
Lila: yes, that's another thing we did: PM and QA working at the same time: while drafting use cases write tests at the same time (difficult though)
ChrisM: language engineering has been doing that
DanD: ...
Greg: ask: we need champion for testing
Erik: and more drive from devs themselves
Juliusz: how can one write tests for software that can't be tested ;)
Lila: had the same problem at Sugar. had an architect go in and review, then refactored
pretty much stopped people from using global variables
have to do it piecemeal
Erik: that's the issue with tech debt
ChrisM: Željko and I worked for 2 years to set the infrastructure up
Lila etc.: congratulations
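
The point made above about global variables getting in the way of unit tests can be illustrated with a minimal Python sketch. Everything in it (`SITE_CONFIG`, `greeting_global`, `greeting`) is a hypothetical name invented for illustration; it is not MediaWiki code, and it only shows the general idea of passing a dependency in explicitly so a test can supply its own value.

```python
# Hypothetical module-level state, standing in for a global variable.
SITE_CONFIG = {"language": "en"}


def greeting_global() -> str:
    """Hard to unit test: the result depends on hidden global state."""
    return "Hello" if SITE_CONFIG["language"] == "en" else "Hallo"


def greeting(config: dict) -> str:
    """Easier to unit test: the configuration is an explicit argument."""
    return "Hello" if config["language"] == "en" else "Hallo"


if __name__ == "__main__":
    # A test can exercise both branches without touching shared state.
    print(greeting({"language": "en"}))
    print(greeting({"language": "de"}))
```

Refactoring of this kind can be done one function at a time, which matches the "piecemeal" approach mentioned in the discussion.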

next quarter

Greg:
Phabricator
Lila: what's the benefit?
Erik: we have numerous different project management tools
Bugzilla good at bug tracking
but not great at project managing a team
so people looked at Mingle, Trello, ...
(alternative would have been:) Mozilla introduced a (hackish) PM tool inside Bugzilla: scrumbugs
Phabricator: good usability, decent bug tracker, ...
good collaboration with them
2 y ago, Phabricator was in its infancy, so we went with Gerrit instead
proliferation of tools costs us visibility (what is team X working on now?), difficult to navigate all the different tools that are in use
migration will be painful, but afterwards it will be wonderful ;)
Lila: and it will also be external, so community benefits too? yes
Greg: metric: how many teams migrating

deployment tooling

Jenkins
team helps with requests for new test jobs

Beta cluster:
add Swift, monitoring (Yuvi helping with Labs monitoring)
(to regard beta as just) Yet Another Cluster: want to remove these 150 conditional statements for beta
Hiera is a puppet tool that helps with that
Lila: stable versions?
...
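
As a rough illustration of how layered configuration data can replace beta-specific conditionals (the idea behind using Hiera, as mentioned above), here is a minimal Python sketch. It is not Hiera's actual API or data format; the `lookup` function, the layer dictionaries, and all keys and host names are assumptions made up for this example.

```python
from typing import Any, Mapping, Sequence

# Hypothetical data layers; keys and values are invented for illustration.
COMMON = {"cache_backend": "memcached", "search_hosts": ["prod-search.example"]}
BETA = {"search_hosts": ["beta-search.example"]}


def lookup(key: str, layers: Sequence[Mapping[str, Any]]) -> Any:
    """Return the value from the first (most specific) layer defining key."""
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)


# Each environment is just an ordered list of layers, most specific first.
production_layers = [COMMON]
beta_layers = [BETA, COMMON]

if __name__ == "__main__":
    print(lookup("search_hosts", beta_layers))
    print(lookup("cache_backend", beta_layers))
```

With this shape, the code that consumes the configuration contains no "if beta" branches; the difference between clusters lives only in the data layers.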

Browser testing:
Željko did one-on-one pair programming with volunteers, but has been low-level skill transfer only
so switch to workshops
improve best practices documentation
pairing with other WMF teams

browser test window before weekly branch cut/ deploys

Vagrant (virtualized environment using Puppet, allows quick setup of new instance)
get MobileFrontend running in Vagrant
ChrisM: invested a lot in Beta Labs, it's very valuable now, but can only use it after code is already in the master branch
people can use Vagrant to test before they commit
Lila: who is responsible for advocating use of Vagrant, best practices?
DanD: I pair a lot with VE, MobileFrontend, but also contribute to Vagrant based on what I see there
JamesF: I think the majority of in-house devs now use Vagrant to create their test instances
We've advocated it well at e.g. hackathons
Erik: Arthur did a lot of that, he's familiar with such things
I see his role as extension of VPE role
coaching
doing team survey
Greg: we'll talk with Arthur to coordinate efforts

Hiring:

  • another QA tester - to be shared with Editing and Mobile

Lila: manual tests?
Erik: yes, combination makes sense
Lila: exploratory testing is fine, but need use cases for edge cases
Erik: do you make scripts?
Rummana: not really
Lila: there is room for both, but manual testing should be structured too
JamesF: we do test fundamental tasks, during deployment several times a day
ChrisM: can/should do very structured exploratory testing
Erik: until Rummana joined, all our QA was focused on automation and manual testing by Product
used outsourced QA testing, poor signal/noise ratio
just building out human testing now
Lila: ok, but I'm looking for a process
1.: ... 2.: ...hard to get use cases where one is following scripts
RobLa: once the new person is in, we'll have a team discussion about that
** ACTION: Rob/Greg/Rummana/Chris team discussion about scripted testing, especially with new QA Tester starting
** ACTION: revisit/discuss role of scripted testing vs exploratory testing at next quarterly review

(Greg:)
Dependencies
from Ops:

  • Swift in Beta cluster
  • Beta cluster monitoring
  • Yet Another Cluster

Greg: ...cluster that only alpha uses[?]
Mark: haven't considered that use case, let's talk about that
** ACTION: Greg to bring up use case for bare metal test cluster

from MW Core:

  • HHVM: talked to Ori about how to support them better
  • need knowledge (transfer) on deployment tooling

Questions

Juliusz:
on behalf of Mobile team: for user testing (A/B), could use a separate beta cluster where no (other) experiment is going on[?]
Greg: ...
RobLa: but would need to define what's out of scope for that other cluster, or someone is always going to ruin it
real solution would be isolated test environment
that's what plain old Labs is for
make it easier to create a full stack *cluster* in Labs?
ChrisM: broke rule of already needing to have software in production before you can use Labs
Lila: Thanks all! I need to leave

Juliusz: who has ownership of MW Vagrant?
more and more people use it, recently had problems with it
Greg: first stop would be DanD, Bryan, and Ori
DanD: there are various aspects to Vagrant, I can help with some
RobLa: anyone can put in things that cause problems with Vagrant for all
JamesF: cost of using Vagrant: more areas where things can go wrong.

JamesF: for Editing tests (that change the state, in our case after clicking save), would like to have system that resets itself after a trigger
ChrisM: can be done in API
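
The request above (a system that resets itself after a state-changing test) can be sketched in Python as follows. This is only an in-memory stand-in under stated assumptions: `FakeWiki` and `restored_after` are hypothetical names, and the sketch does not use the MediaWiki API that ChrisM refers to. It just shows the snapshot-and-restore pattern a test harness might follow.

```python
import copy
from contextlib import contextmanager


class FakeWiki:
    """Hypothetical in-memory stand-in for a wiki under test."""

    def __init__(self):
        self.pages = {"Sandbox": "initial text"}

    def save(self, title, text):
        self.pages[title] = text


@contextmanager
def restored_after(wiki):
    """Snapshot page state, run the test body, then restore the snapshot."""
    snapshot = copy.deepcopy(wiki.pages)
    try:
        yield wiki
    finally:
        # Runs even if the test body raises, so later tests see clean state.
        wiki.pages = snapshot


if __name__ == "__main__":
    wiki = FakeWiki()
    with restored_after(wiki) as w:
        w.save("Sandbox", "edited by test")
    print(wiki.pages)
```

Against a real wiki, the snapshot and restore steps would have to be replaced by whatever mechanism the test environment actually offers for reverting edits.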