IRC office hours/Office hours 2013-04-30

From Meta, a Wikimedia project coordination wiki

4:56 PM — Coren waves.
4:56 PM <Lydia_WMDE> hey Coren
4:58 PM → Silke_WMDE_, petan and sumanah joined
5:00 PM <Lydia_WMDE> alright peoples :)
5:00 PM <Lydia_WMDE> who is here for the office hour?
5:00 PM <Silke_WMDE_> o/
5:00 PM <Silke_WMDE_> :)
5:00 PM <sumanah> Hi
5:00 PM <Silke_WMDE_> hi sumanah!
5:00 PM → Nettrom joined (6641@wikipedia/Nettrom)
5:01 PM <sumanah> Hi Silke_WMDE_!
5:01 PM → mindspillage joined
5:01 PM <Matthew_> Lydia_WMDE: Meeeeeee!
5:01 PM <Lydia_WMDE> wohoooo!
5:01 PM → auduwage joined
5:01 PM <Lydia_WMDE> someone with questions i hope :P
5:01 PM <Matthew_> Yeah, a couple :)
5:01 PM <Coren> Not me! I'm just here to lurk... oh. :-)
5:02 PM <Lydia_WMDE> awesome
5:02 PM — Lydia_WMDE pokes Coren :P
5:02 PM <Lydia_WMDE> Matthew_: shoot then
5:02 PM <halfak> Lydia_WMDE: Me too. I was helping auduwage get set up.
5:03 PM <Lydia_WMDE> halfak: cool
5:03 PM <JohannesK_WMDE> hi all!
5:03 PM <Lydia_WMDE> so here's how we usually do this:
5:03 PM <auduwage> I see issues with toolserver lately, I have sent a job and it's been at qw stage for the last 12 hours.
5:03 PM <Lydia_WMDE> i collect questions and then Silke_WMDE_ and Coren will answer them in the order i give them to them
5:03 PM <Matthew_> ... Huh, OK. That took me by surprise...
5:04 PM <halfak> Lydia_WMDE: Do we PM you?
5:04 PM <Lydia_WMDE> halfak: that works but here is fine too
5:04 PM <Coren> Hello auduwage, strictly speaking, we can't really give support for the toolserver - but Petan here is an admin here and might be able to help later. :-)
5:04 PM sankarshan → sankarshan_away
5:05 PM <Lydia_WMDE> (silke is typing)
5:05 PM <Matthew_> So, my tools are a little interesting. I'm building a system that requires both a bot and web-based tools to have access to the same database. I've been told there's going to be database sharing between the web-based tools project and the labs project. When can we expect that? or is there some way I have to specifically request that? Sorry if it's a stupid question, but it's kinda important for my tools.
5:05 PM <Silke_WMDE_> auduwage: Yeah, we are rather here to talk about issues and questions concerning the migration from toolserver to Tool Labs. We are not toolserver admins
5:05 PM <auduwage> Thanks Coren
5:05 PM <halfak> Shall we start asking questions now or is there more intro?
5:06 PM <Lydia_WMDE> halfak: just give questions to me, yes
5:06 PM <Coren> Matthew_: In the tools labs, there isn't an actual distinction between bots and web-tools; every tool is allowed to implement both and they have access to a DB. So yes, what you are discussing (bot and web tool speaking to the same DB) is actually the default.
5:06 PM <halfak> My only blocker for moving from toolserver to labs is mysql access. What's the ETA on that feature?
5:07 PM <halfak> Lydia_WMDE: ^^
5:07 PM <Lydia_WMDE> halfak: noted
5:07 PM <Coren> Matthew_: That said, there is also nothing that prevents different tools from giving each other access to their databases; they have control with grant option.
5:07 PM <Platonides> auduwage, 5 queues were in error state, I just reset them
5:07 PM → K4-713 joined (~khorn@wikimedia/katiehorn)
5:07 PM <Matthew_> Coren: OK, but I thought they were in separate projects?
5:07 PM <Matthew_> Whoa, that'll be helpful!
5:08 PM <Lydia_WMDE> alright
5:08 PM <Lydia_WMDE> next one:
5:08 PM <Lydia_WMDE> [18:07:02] <halfak> My only blocker for moving from toolserver to labs is mysql access. What's the ETA on that feature?
5:08 PM <Matthew_> ^
5:08 PM <Coren> That depends on what you mean by 'mysql access'. If you mean databases for your own tools (i.e.: not the replicas from the WMF projects) then it's already there.
5:08 PM ⇐ Alchimista quit (~alchimist@wikipedia/Alchimista) Quit: Konversation terminated!
5:09 PM <halfak> I need a replica of some tables from enwiki.
5:09 PM → multichill joined
5:09 PM <Coren> halfak: Ah, that will come in the next month or so; we expect before the Amsterdam Hackathon. We're at the "select what columns/tables we can replicate" stage right now, which is nearly the last step.
5:10 PM → qgil joined
5:10 PM <multichill> Is the log somewhere online? I didn't notice this until just now
5:10 PM <Lydia_WMDE> multichill: i'll post it later
5:10 PM → Ryan_Lane joined (~Ryan_Lane@wikimedia/Ryan-lane)
5:10 PM <sumanah> multichill:
5:11 PM <halfak> Thanks Coren
5:11 PM <Lydia_WMDE> next one:
5:11 PM <auduwage> Thanks Platonides, I guess it's better to kill the job and re-run it.
5:11 PM <Lydia_WMDE> [18:09:40] <Nettrom> Some bots/tools, e.g. SuggestBot which I run, are in a production state, running as a service to the community 24/7 (on four Wikipedias). It’s also being actively developed, so it requires a development environment. How does Tool Labs support this kind of setup?
5:11 PM <sumanah> halfak: have you already seen ?
5:12 PM <Coren> Nettrom: We have, in fact, two projects: "tools", which is the stable environment where 'live' tools are expected to run, and "bots" which is meant to be a more experimental environment for development and prototypes.
5:12 PM <multichill> Ah, summertime, right :-)
5:12 PM <halfak> sumanah: I hadn't. Thanks!
5:12 PM <sumanah> Coren: hey, you ought to update because right now it says the Redactatron is not done yet.
5:12 PM <Coren> Nettrom: Both projects offer the same facilities, but "tools" is a bit more restrictive in its setup to ensure stability.
5:12 PM <Coren> sumanah: I thought binasher updated it. :-) I will.
5:12 PM <Silke_WMDE_> BTW here is the draft for a roadmap for migration, in case you don't know it already:
5:13 PM <Nettrom> Coren: so I'd be a member of both projects a migrate over to "tools" once development becomes stable?
5:13 PM <Nettrom> *and migrate
5:13 PM <multichill> Silke_WMDE_: Based on DAB's email. What are you going to do to improve the Toolserver cluster?
5:13 PM <Ryan_Lane> Nettrom: not necessarily migrate
5:13 PM <scfc_de> Nettrom: You can develop very well in the Tools project alone.
5:14 PM <Coren> Nettrom: You can have both running in parallel, really.
5:14 PM <Nettrom> oh, ok
5:14 PM <Ryan_Lane> Nettrom: one would be your dev environment, the other would be production
5:14 PM <Silke_WMDE_> multichill give me a moment to type...
5:14 PM → giftpfla1ze joined
5:14 PM addenergyless → addshore
5:15 PM <Nettrom> sounds like I've been a bit confused about how projects and members and the whole thingamajig works... will go read some documentation, thanks!
5:16 PM → PPena_ joined ← Ironholds left
5:16 PM <Ryan_Lane> Nettrom: yw. a brief overview: a project has members and admins. it holds resources like virtual machines, storage, public IP addresses, firewall rules, etc. it's mostly a security separation that allows a community to manage their own infrastructure.
5:17 PM ⇐ jorm quit (~bharris@wikimedia/jorm) Quit: jorm
5:17 PM <Ryan_Lane> so, you join two "communities". one of those is meant for development, the other is meant for production
5:17 PM ⇐ giftpflanze quit (~GVMBot@wikipedia/Giftpflanze) Ping timeout: 248 seconds
5:17 PM <Ryan_Lane> Labs is comprised of a bunch of these communities, each aimed at a specific real world project
5:17 PM <scfc_de> Ryan_Lane: You *can* join two communities, you're not required to :-).
5:18 PM <Ryan_Lane> indeed
5:18 PM <Ryan_Lane> you can join as many or as few as you'd like
5:18 PM ⇐ PPena quit • PPena_ → PPena
5:18 PM <MF-W> so one tool = one labs project?
5:18 PM <Silke_WMDE_> multichill: One thing is: We don't have any more rack space or electricity. DaB asked if it's possible to add disk to virtualize. For me it's hard to find out about the exact situation...
5:19 PM <Silke_WMDE_> ... of electricity because none of us is on-site.
5:20 PM <Silke_WMDE_> multichill: It's NOT that we would invest some more money.
5:20 PM <multichill> Silke_WMDE_: And how are you going to solve this problem?
5:20 PM Raylton → Raylton_Away
5:20 PM <sumanah> and may help people out here
5:20 PM <addshore> Or even
5:21 PM <varnent> James_F: ping
5:21 PM <scfc_de> MF-W: No, "project" = (for example) "Tools" or "Bots" or ... Tools (= plural of tool) then are directories, code, data, etc. on Tools.
5:21 PM <multichill> Silke_WMDE_: If you want to know the current usage I can just drop by and check the racks. If that's keeping you from improving things.....
5:21 PM <Lydia_WMDE> (can we please discuss one question after another?)
5:22 PM <Silke_WMDE_> multichill: I hope the members' assembly will give me some direction here. they ought to say if we still want to start contracts with a data centre we never had a contract with a year before leaving etc.
5:22 PM <Platonides> I expect that the way the racks are used is fully documented...
5:22 PM <scfc_de> Silke_WMDE_: So currently the contract is with WMF? Don't they have documentation?
5:23 PM <multichill> afaik everything is in racktables
5:23 PM <Silke_WMDE_> scfc_de: The contract is with WMF.
5:23 PM <multichill> Just throw out some old servers saving power and replace them with newer ones
5:25 PM <multichill> What is your power capacity per rack? 2 * 16A? How much are you using? What is the power usage per server? etc
5:25 PM <Lydia_WMDE> multichill: assuming the toolserver will go away - what do you need from labs to move there?
5:26 PM <multichill> Lydia_WMDE: That's a distant future, I would like to see the Toolserver fixed in the meantime
5:26 PM <multichill> The new product isn't ready yet and the old one is falling apart
5:26 PM <Lydia_WMDE> but you still have not said what is missing to make it ready
5:27 PM <Lydia_WMDE> that would really help
5:27 PM <Lydia_WMDE> then maybe we can give you a timeline
5:27 PM <multichill> Fully replicated databases with user databases
5:27 PM <multichill> That's Wikipedia + Commons + Wikidata + user databases on one server for joins
5:28 PM <scfc_de> multichill: Seconded.
5:28 PM <addshore> As far as I knew this was planned for May.?
5:28 PM <Lydia_WMDE> (coren is answering)
5:29 PM ⇐ Matthew_ quit (~matthew_@wikipedia/matthewrbowker) Read error: Connection reset by peer
5:29 PM <multichill> addshore: WMF and planning ;-)
5:29 PM → eyoung and Matthew_ joined
5:29 PM <Ryan_Lane> yes, it was planned for May. eqiad migration took priority
5:30 PM <Coren> multichill: Long story short: replicating databases is happening soon (within the month). Replicating multiple copies of commons and wikidata isn't going to happen that way; it needs to be built into application logic or done using federated tables. Almost /all/ of the problems the TS has had with replication were caused by that redundancy and trying to keep it synced.
5:30 PM <multichill> Coren: So you're basically saying Toollabs is useless for me
5:30 PM <Platonides> Template:Ref needed
5:30 PM <Coren> We're all more than happy to help you (and any other maintainer) with adapting your tools to work in that setup.
5:30 PM <scfc_de> Coren: In that case, Tools will not be able to replace Toolserver.
5:31 PM <giftpfla1ze> what are federated tables btw?
5:31 PM <Ryan_Lane> AFAIK toolserver will also have this limitation at some point
5:31 PM <scfc_de> Ryan_Lane: What do you mean?
5:31 PM <multichill> Coren / Ryan_Lane : You want to replace the product. We give specifications of the product. You change the specs and say we're complaining
5:32 PM <Coren> giftpfla1ze: Basically, a "symlink" between databases. You can join with it, provided you are a little careful about the kind of joins you do (so that you don't copy the database at every query!)
5:33 PM <scfc_de> Coren: Have you looked at the queries that are typically run on Toolserver?
5:33 PM <multichill> Platonides: +1. Toolserver databases are running on ancient hardware
5:33 PM <giftpfla1ze> Coren: can you give examples for joins that can/can't be done?
5:34 PM <Coren> scfc_de: Those I can see publicly (through the query service). Almost all would work with little to no adaptation.
5:34 PM sankarshan_away → sankarshan
5:35 PM <JohannesK_WMDE> giftpfla1ze: +1. i am still not entirely sure what differences will exist in db replication toolserver vs labs, could you please go into some more detail Coren?
5:35 PM <Coren> giftpfla1ze: They can all /be/ done, but some will be horribly slow. The easy way to know if it'll work well is this: if you are picking rows in the federated table from a 1:1 match over an indexed column, it'll work very well. SELECT fed.a, fed.b, local.a, local.b from local, fed where fed.a=local.a; would be 100% supported at speed.
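Coren's rule of thumb (a 1:1 match over an indexed column) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses SQLite's ATTACH DATABASE as a stand-in for a MySQL federated table, which the log does not show, and the schema name `other`, the sample rows, and the helper names are hypothetical. Only the table names `local` and `fed` and the shape of the query come from the discussion.

```python
import os
import sqlite3
import tempfile


def build_demo(tmpdir):
    """Create a separate 'remote' database file, then open a local database that attaches it.

    Assumption: the attached SQLite file plays the role of the federated table.
    """
    remote_path = os.path.join(tmpdir, "remote.db")
    remote = sqlite3.connect(remote_path)
    # fed.a is the primary key, so lookups on it are indexed.
    remote.execute("CREATE TABLE fed (a INTEGER PRIMARY KEY, b TEXT)")
    remote.executemany(
        "INSERT INTO fed VALUES (?, ?)",
        [(1, "page-1"), (2, "page-2"), (3, "page-3")],
    )
    remote.commit()
    remote.close()

    conn = sqlite3.connect(":memory:")
    # Attach first, before any data is written on this connection.
    conn.execute("ATTACH DATABASE ? AS other", (remote_path,))
    conn.execute("CREATE TABLE local (a INTEGER, b TEXT)")
    conn.executemany(
        "INSERT INTO local VALUES (?, ?)",
        [(1, "note-1"), (3, "note-3")],
    )
    conn.commit()
    return conn


def joined_rows(conn):
    """Run the join from the discussion: each local row picks one remote row by key."""
    return conn.execute(
        "SELECT fed.a, fed.b, local.a, local.b "
        "FROM local, other.fed AS fed "
        "WHERE fed.a = local.a "
        "ORDER BY fed.a"
    ).fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        conn = build_demo(tmpdir)
        for row in joined_rows(conn):
            print(row)
        conn.close()
```

The join touches only the remote rows whose key matches a local row, which is the case described as working at speed; a join that filters the remote table on an unindexed column would instead have to scan it.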
5:37 PM <halfak> Would all replicated tables be federated?
5:37 PM <scfc_de> Coren: You can see some more for example at
5:38 PM <Lydia_WMDE> ok how about Coren has a look at this in detail after this meeting and we move on to the next question?
5:38 PM <Lydia_WMDE> we have only 20 mins left
5:39 PM <Lydia_WMDE> the next question i have is:
5:39 PM <Lydia_WMDE> [18:30:35] <scfc_de> What's the status of the additional Toolserver admin?
5:40 PM <Silke_WMDE_> scfc_de: This should be clear and finished in the next 10 days or so.
5:40 PM <Silke_WMDE_> This is going to be a paid admin with 15-20 hours per week
5:40 PM <Silke_WMDE_> on the toolserver
5:40 PM <multichill> What is going to be the availability of tool labs? Is it production grade? When is the maintenance window? How long in advance are you announcing it? Is it supported 24x7 or less?
5:41 PM <scfc_de> Silke_WMDE_: "Clear and finished" = "hired"? (Taking no chances here :-).)
5:41 PM → jvandavier and lwelling joined ↔ Matthew_ nipped out
5:42 PM <Silke_WMDE_> scfc_de: Hopefully, yes. The thing is that I am the person here to make the ultimate decision on my own. (This is why it has taken a while.)
5:42 PM <Silke_WMDE_> ^^not
5:42 PM awjr_away → awjr
5:42 PM <sumanah> multichill: ok, that's several questions, it's taking a moment for Coren & Ryan_Lane & me to answer :)
5:42 PM <Silke_WMDE_> as in I can't hire people on my own.
5:43 PM <multichill> sumanah: It's basically what SLA to expect ;-)
5:43 PM <Coren> multichill: It's production grade. It's been set up so that it's even possible to do rolling restarts of the entire cluster with migration of jobs to keep them up, for instance. Maintenance windows should be announced at least a week in advance, and would generally not interrupt tools for more than the very brief periods while they move from one node to another.
5:43 PM <sumanah> yup
5:43 PM <Ryan_Lane> multichill: It's semi-production
5:43 PM <sumanah> multichill: Tool Labs, as part of Wikimedia Labs, is not production grade, but then again we have really really high standards for what "production grade" means!
5:43 PM <sumanah> Ryan_Lane: would you feel comfortable talking about predicted uptime numbers now?
5:45 PM <Matthew_> Please?
5:45 PM <multichill> Did you guys document what we can expect in terms of availability somewhere?
5:45 PM <halfak> Wait a second. Coren says Labs is production grade, Ryan says it is semi-production and Sumanah says it is not production grade. Huh?
5:45 PM <sumanah> multichill: also, when you say "supported 24x7" you're asking "is there going to be a person pingable and online 24x7"?
5:45 PM <Coren> Terminology confusion. :-)
5:45 PM <sumanah> halfak: I understand your confusion
5:45 PM <sumanah> halfak: yeah, it's all about what "production grade" means
5:45 PM <Silke_WMDE_> This is a question of definitions I think
5:46 PM <sumanah> compared to the vast majority of services available on the internet - and certainly compared to what the Toolserver is now -- Tool Labs is and is going to be reliable.
5:46 PM <multichill> It's all about the expectations. If shit hits the fan this should be clear. What I'm reading here is that you don't even know it for yourself
5:46 PM <sumanah> Compared to - no it's not going to be held to THAT standard of reliability.
5:46 PM mwalker|away → mwalker
5:46 PM <sumanah> So I'm with Ryan_Lane, as I suspect Coren is, that "semi-production" is the right term to use. Hope that helps halfak
5:46 PM <multichill> If something breaks on Sunday morning, is someone getting out of bed to fix it?
5:46 PM <Ryan_Lane> multichill: yes
5:47 PM <Coren> I'm talking about expectation of uptime, Ryan is talking about SLA and I'm not. :-) Semi-production is better indeed, and matches WMF terminology.
5:47 PM <Ryan_Lane> we aren't going to provide support to the level of other wikimedia projects, but yes, we have people that will fix labs when issues occur
5:48 PM <multichill> *please* document this somewhere. This sounds like what we used to call the "Super Best-Effort SLA": We'll do our best, but we're not promising anything to you
5:48 PM <Ryan_Lane> when wikipedia breaks half of the operations team shows up on IRC
5:48 PM <Ryan_Lane> that's not going to happen for labs
5:48 PM ⇐ andre__ quit (~andre@wikimedia/aklapper) Quit: andre__
5:48 PM <Matthew_> multichill: AKA what Toolserver does now :P
5:48 PM <Ryan_Lane> assume the level of support will be better than TS
5:49 PM <Silke_WMDE_> I propose to document this: what level of production grade do Labs/ Tool Labs have and what does that exactly mean. Should be written down somewhere in the docs.
5:49 PM <Coren> Matthew_: The difference is that *someone* will show up, maybe just not half of opsen from around the world at once. :-)
5:49 PM <Lydia_WMDE> (fyi: 10 mins left)
5:49 PM <Matthew_> Coren Ryan_Lane: OK, makes sense.
5:49 PM → lizzard joined
5:50 PM <sumanah> Ryan_Lane: If Tool Labs breaks, someone will be woken up and will work on fixing it and WMF will not pause till it's fixed (barring pathological catastrophe) -- is that fair to say?
5:50 PM → Kolossos joined ↔ Matthew_ nipped out
5:51 PM <Ryan_Lane> sumanah: ideally. yes.
5:51 PM ⇐ Matthew_ quit (~matthew_@wikipedia/matthewrbowker) Remote host closed the connection
5:52 PM <sumanah> also Ryan_Lane if we can point to the Tiers of Support (which WMF services are wake-everyone-up things and which are high-priority and which are lower-priority) that would be great.
5:52 PM <Ryan_Lane> historically this has been the case
5:52 PM <multichill> Do you keep backups of the homedirectories etc?
5:52 PM <sumanah> I understand if it's too complicated to be broken out this way
5:53 PM <multichill> When will ipv6 be supported?
5:54 PM ⇐ HaeB quit (~quassel@wikipedia/HochaufeinemBaum) Ping timeout: 256 seconds
5:54 PM <Silke_WMDE_> (Ryan and Coren are about to answer)
5:54 PM <Ryan_Lane> multichill: we'll be bringing up a zone in eqiad, which will be a new openstack deployment. that zone will start with having IPv6. Once that is working correctly, we'll look at bringing it into the pmtpa zone
5:54 PM <Coren> multichill: We have automatic snapshots of the past 3 hours, 3 days, and 2 weeks on the new fileserver. There are no off-site backups of maintainer data, however. Good practice would have you store important code and config in git or other source control.
5:55 PM <multichill> Block based or file based backups?
5:55 PM → brion joined (~brion@wikipedia/pdpc.professional.brion)
5:55 PM <Ryan_Lane> eqiad = ashburn datacenter, pmtpa = tampa datacenter
5:55 PM <Ryan_Lane> multichill: LVM snapshots
5:55 PM <Ryan_Lane> so, it's on the same fileserver
5:55 PM <Coren> multichill: They are timetravel snapshots; implemented at the block and filesystem level.
5:56 PM <multichill> ok
5:57 PM <Silke_WMDE_> OK, one more question!
5:58 PM <Silke_WMDE_> multichill: Come and meet us at the Hackathon!
5:58 PM <Silke_WMDE_> Who else will be there?
5:58 PM <addshore> I will :)
5:58 PM <Lydia_WMDE> \o/
5:58 PM <sumanah> why it's nice to meet in person :)
5:58 PM <multichill> Silke_WMDE_: You know that I'm organizing the hackathon?
5:58 PM <JohannesK_WMDE> i will be there
5:58 PM <Silke_WMDE_> multichill: I just learned that's you from Lydia
5:59 PM <Lydia_WMDE> alright - our hour is unfortunately over
5:59 PM <Lydia_WMDE> if you have more questions please do send them to the mailing list
5:59 PM <halfak> Thanks for your time folks!
5:59 PM <Lydia_WMDE> i will publish the log but probably tomorrow unless someone beats me to it
5:59 PM <Silke_WMDE_> thanks for coming!
5:59 PM — Nettrom seconds halfak, thanks!
6:00 PM <halfak> How can I best keep tabs on new developments?
6:00 PM <halfak> Sorry. Meta-question.
6:00 PM <Silke_WMDE_> halfak: Coren sends updates on toolserver-l and labs-l
6:01 PM <Silke_WMDE_> halfak: I'll gather a list of wiki pages that are regularly updated
6:01 PM <Silke_WMDE_> I don't have all of them here, now
6:01 PM → everton137 joined ↔ jorm popped in
6:02 PM <sumanah> halfak: labs-l is good
6:02 PM <sumanah>
6:02 PM → jorm joined (~bharris@wikimedia/jorm)
6:02 PM <sumanah> halfak: is good