Wikimedia monthly activities meetings/Quarterly reviews/Analytics/January 2015
Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.
Present (in the office): Leila Zia, Erik Moeller, Abbey Ripstra, Kevin Leduc, Ellery Wulczyn, Dario Taraborelli, Rob Lanphier, Garfield Byrd, Toby Negrin, Andrew Otto, Bob West, Carolynne Schloeder, Terence Gilbey, Tilman Bayer (taking minutes); participating remotely: Dan Andreescu, Aaron Halfaker, Oliver Keyes, Jonathan Morgan
Let's skip the "what we said" part of the slides
What we did
pageviews in vital signs, broken down by the site visited - mobile/desktop
ErikM: should demo the dashboards:
https://metrics.wmflabs.org/static/public/dash --> "add metrics" --> daily pageviews
Damon: how much do we trust this?
Toby: this is legacy data, but the Analytics team stands behind data that we publish
Kevin: Labs hasn't been reliable or performant enough
need another person from Ops to help with database issues; right now it's Sean part-time, need a DB expert
prototype using Pentaho (open source software)
ErikM: so Pentaho uses new definition?
Kevin: yes, still researching differences
Toby: differences weren't huge
Dario: will talk about that in my part
Kevin: Wikimetrics for Grantmaking
Marcel joined team, worked exclusively on this
datasets for community to consume
hit a bottleneck regarding how many events EventLogging can consume; batch inserts removed a bottleneck
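To illustrate why batching can relieve this kind of bottleneck, here is a minimal sketch. It is not EventLogging's actual code: the table name, the event fields, and the use of SQLite are assumptions made so the example is self-contained. It groups events into batches and writes each batch with one `executemany` call and one commit, rather than committing once per event.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_events(conn, events, batch_size=1000):
    """Insert (schema_name, payload) tuples in batches.

    Returns the number of batches written. The `events` table layout is
    hypothetical and only serves this illustration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (schema_name TEXT, payload TEXT)"
    )
    batches = 0
    for chunk in batched(events, batch_size):
        # One statement and one commit per batch instead of per event.
        conn.executemany(
            "INSERT INTO events (schema_name, payload) VALUES (?, ?)", chunk
        )
        conn.commit()
        batches += 1
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [("HypotheticalSchema", '{"n": %d}' % i) for i in range(2500)]
    print(insert_events(conn, sample, batch_size=1000))
```

The batch size here is arbitrary; in practice it would be tuned against the database and the acceptable delay before events become visible.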
ErikM: so these are still Limn dashboards?
What we learned
Christian has done a lot of EventLogging maintenance, but he's leaving now
Toby: yes, that was an Ops task we took on
Kevin: unique clients - this needs community outreach first
Toby: team feels strongly this should not be done without community consultation
Toby: want to call out that as the Analytics team, we use numbers to evaluate our own work too ;)
ErikM: in case of these [EventLogging outages] we were always able to backfill though
Toby: yes, we can backfill database from log file
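A backfill of this kind could look roughly like the sketch below. The log format (one JSON object per line), the `uuid` field used for de-duplication, and the SQLite table are all assumptions for illustration; the minutes do not describe the real log or database layout. Events already in the database are skipped, so the replay can be run more than once without creating duplicates.

```python
import json
import sqlite3


def backfill(conn, log_lines):
    """Replay logged events into the database, skipping ones already stored.

    Returns the number of rows actually added. The `uuid` key and the
    table layout are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (uuid TEXT PRIMARY KEY, payload TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    rows = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed log lines
        if not isinstance(event, dict) or "uuid" not in event:
            continue
        rows.append((str(event["uuid"]), json.dumps(event, sort_keys=True)))

    # INSERT OR IGNORE leaves existing rows untouched, so reruns are safe.
    conn.executemany(
        "INSERT OR IGNORE INTO events (uuid, payload) VALUES (?, ?)", rows
    )
    conn.commit()

    after = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return after - before
```

Usage would be along the lines of `backfill(conn, open(path))` for a log file at some `path`, since a file object iterates line by line.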
Research and Data
What we said/did
[skipping "What we said" slides]
this was one of the most productive quarters for Research and Data
in this quarter, turned the pageview (PV) definition draft into an implementation
session-based metrics mostly for mobile team
then handed off to developers
this is our general pipeline
ErikM: regarding data trust:
old definition does not distinguish crawler traffic from "human" traffic, was ambiguous
e.g. presentation at December metrics meeting was based on new definition
comfortable about that
big remaining issue: unique users; we still rely on comScore for that
Dario: in summer, the majority of PVs in the US were from automated traffic
old definition would not have caught that
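The idea of separating crawler traffic from "human" traffic can be sketched as follows. The markers below are placeholders, not the rules of the new pageview definition, which these minutes do not spell out; treating an empty user agent as automated is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical markers for illustration only; NOT the actual rules of the
# new pageview definition.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def classify(user_agent):
    """Label a request as 'automated' or 'human' from its user agent string."""
    # Assumption: a missing or empty user agent is counted as automated.
    if not user_agent or BOT_PATTERN.search(user_agent):
        return "automated"
    return "human"


def count_pageviews(user_agents):
    """Count pageviews per traffic class for an iterable of user agents."""
    return Counter(classify(ua) for ua in user_agents)
```

A count without this split would report only the total, which is how a surge of automated requests could go unnoticed.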
for FR (fundraising), at the beginning of the quarter we weren't sure we could use Ellery's new tooling already
but used it successfully
Andrew: What's a traffic researcher?
Toby, Dario: about readers, e.g. "how many social media referrals?"
Other key accomplishments
revscoring: already exists, want to move from standalone to service, also used by community
Toby: for FR, instead of maximizing money, minimize annoyance (e.g. measured by impressions by client) for a given goal
Toby: Ops issues
Damon: Labs, or Ops?
Toby: Ops including Labs
threat to stability and accuracy of our data
Labs is great environment, just need to make it more stable, or we will need to ...
Dumps are important for us, maintained by just one person, bus factor
Damon: where is search in team's work?
Toby: did one project
ErikM: get external search monitoring going in the next few weeks
search analytics goes back to general issues
Damon: my priorities:
- need to understand users
- make sure VE is successful
- learn about search
Dario: for apps, we did some search analytics
Leila: will have some in ...
Damon: e.g. "what types of searches are happening?"