Wikimedia Foundation metrics and activities meetings/Quarterly reviews/Discovery, October 2015
Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.
Present (in the office): Lila Tretikov, Tomasz Finc, Greg Grossmeier, Terry Gilbey, Dan Garry, Max Semenik, Tilman Bayer (taking minutes), Stephen LaPorte, Kevin Leduc, Rachel diCerbo, Moiz Syed; participating remotely: Luis Villa, Katherine Maher, Wes Moran, Trevor Parscal, Arthur Richards
Discovery team is interested in how people discover things on Wikimedia sites
e.g. via maps too [not just search]
only started measuring these KPIs, so no year-over-year (YOY) comparison yet
Lila: can we start monitoring referrer traffic? yes, goal for this q
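A toy sketch of what such referrer monitoring could look like (the categories and matching rules are illustrative assumptions, not the team's actual report):

```python
# Classify a request's Referer header into rough traffic buckets.
# Hypothetical rules, for illustration only.
from urllib.parse import urlparse

SEARCH_ENGINES = ("google.", "bing.", "yahoo.", "duckduckgo.", "baidu.")

def classify_referrer(referer):
    if not referer:
        return "none (direct)"
    host = urlparse(referer).netloc.lower()
    if any(se in host for se in SEARCH_ENGINES):
        return "external search engine"
    if host.endswith((".wikipedia.org", ".wikimedia.org")):
        return "internal"
    return "other external"

print(classify_referrer("https://www.google.com/search?q=tower+bridge"))  # external search engine
print(classify_referrer("https://en.wikipedia.org/wiki/Tower_Bridge"))    # internal
print(classify_referrer(""))                                              # none (direct)
```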
Lila: no movement on the zero results rate? we'll get to that
chose to focus on zero results rate as a proxy for user satisfaction (assuming people are not happy with zero results)
found that bots account for a lot of the search traffic
the objective was correct, but this was not the right measure
Lila: long story short - 30% does include bots? yes
what's the number without bots?
Dan: don't know exact number
Lila: expect it to be higher or lower with bots filtered out?
Dan: lower, e.g. on Dutch Wiktionary 99% of traffic was due to one bot
Wes: filter bot traffic on everything; this was part of the learning
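The zero results rate itself reduces to a simple ratio over the search request logs; a minimal sketch of computing it with a naive user-agent bot filter (the field names and the filter heuristic are assumptions, not Discovery's actual pipeline):

```python
# Zero results rate = searches returning no results / all searches considered.
def zero_results_rate(requests, include_bots=False):
    def is_bot(req):
        ua = req.get("user_agent", "").lower()
        return any(marker in ua for marker in ("bot", "crawler", "spider"))

    considered = [r for r in requests if include_bots or not is_bot(r)]
    if not considered:
        return 0.0
    zero = sum(1 for r in considered if r["num_results"] == 0)
    return zero / len(considered)

logs = [
    {"user_agent": "Mozilla/5.0", "num_results": 0},
    {"user_agent": "ExampleBot/1.0", "num_results": 0},  # bot hammering zero-result queries
    {"user_agent": "Mozilla/5.0", "num_results": 12},
]
print(zero_results_rate(logs, include_bots=True))  # ~0.67: bots inflate the rate
print(zero_results_rate(logs))                     # 0.5: humans only, lower, as Dan expects
```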
Lila: where do we still have prefix search?
Dan: for search box on top of page, also on apps
because it's fast
Lila: don't necessarily need A/B
can switch completely for like 2h, observe effect
Dan: probably not ready for putting it to all users yet
Lila: goal for this q on this?
Dan: won't directly try to impact it
focus mainly user satisfaction
if the zero results rate is still 33% for humans, it absolutely needs to be a goal
Wes: decided to take other goals like language support as primary goals
(Dan:) WDQS [Wikidata Query Service]: e.g. a list of US presidents born before X
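A query of that shape translates directly into SPARQL against the public WDQS endpoint; a minimal sketch (the query is illustrative, using Wikidata's "position held" and "date of birth" properties):

```python
import requests

# "US presidents born before 1900" as a WDQS SPARQL query.
QUERY = """
SELECT ?president ?presidentLabel ?born WHERE {
  ?president wdt:P39 wd:Q11696 ;   # position held: President of the United States
             wdt:P569 ?born .      # date of birth
  FILTER(YEAR(?born) < 1900)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?born
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["presidentLabel"]["value"], row["born"]["value"])
```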
Terry: so launched a new service, still exploring usage
is there a go/no go criterion?
Dan: no, just going to maintain it this q
then look at usage (qualitative / quantitative)
and feature requests (this is still a stripped-down version)
Terry: want to avoid feature creep
Lila: who is the customer?
Dan: not sure
initially motivated by WikiGrok (which was shelved)
were 95% there anyway, decided to roll out and see
Wes: we also had Wikidata team as stakeholder
--> Lila: work with Toby's team to see if we can service some of this [WDQS] data
internal need for this
Wes: had meetings with them
Lila: figure out Venn diagram between Wikidata and infoboxes on WP
Wes: also, natural language queries
found in our surveys etc. that people prefer natural language queries
Lila: yes, Google has trained people to do that
Dan: a prototype demoed at hackathon takes nat lang queries and translates them into WDQS
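The minutes don't describe the prototype's approach; as a rough illustration of the idea, a toy translation layer could match a question pattern and fill a SPARQL template (everything here is hypothetical, not the hackathon code):

```python
import re

# SPARQL template for "US presidents born before <year>"; doubled braces
# escape the literal braces for str.format().
TEMPLATE = """
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P39 wd:Q11696 ;
        wdt:P569 ?born .
  FILTER(YEAR(?born) < {year})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""

def to_sparql(question):
    m = re.match(r"us presidents born before (\d{4})", question.lower())
    if m:
        return TEMPLATE.format(year=m.group(1))
    raise ValueError("question not understood")

print(to_sparql("US presidents born before 1900"))
```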
Greg: do you have a policy on operational support for your project from your team members? (man hours, etc.)
Tomasz: should not take website down
ours are tier 3 services
--> Lila: might be a good thing to ask Mark to produce a document on Tier 1 vs Tier 2 vs Tier 3 services (support criteria, ...)
Wes: yes, need to specify that
Wikivoyage maps on Labs
Lila: how hard is it to integrate a map into article?
Tomasz: we have a working prototype for VE [VisualEditor]
Lila: before that, cluster needs to be ready for usage in production, does Mark know?
Wes, Tomasz: yes
Lila: time frame?
Wes: need to know how it's being used outside WMF
before we can assess full prod usage
e.g. do we build a template, etc.
Lila: understand there is complexity in detail, but need to know when it's production ready
need a quarter to figure this out?
Wes: probably yes
Mark: (Ops) needs preparation time
Lila: this is a critical feature. everything on WP should have a map
Terry: already seeing increase in engineering productivity, but still should get better at notifying people who will need to be involved downstream
Wes: agree, already understanding usage inside and outside much better
Terry: don't need to think way ahead before we can innovate (fill out 15 pages of capital requests ... ;)
need to be able to experiment
Tomasz: agree, I think we hit this many times historically
Dan: search satisfaction KPI
metric is now at 50% (?), which seems very low
Lila: dashboarding is awesome, want this for everything
but how is satisfaction defined?
Dan: first hypothesis was clicks on results
but need to include bounce rate
should still validate this by asking people if they found what they want
Lila: so what's [the definition now]
Dan: click-through at 50%, plus bounce rate
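A minimal sketch of that working definition over per-session data; the field names and the ten-second bounce threshold are assumptions, and as Dan notes below, the metric itself still needs validation:

```python
# A session counts as "satisfied" if the user clicked a result and did not
# bounce straight back to the results page.
def satisfaction_rate(sessions, bounce_threshold_s=10):
    satisfied = sum(
        1 for s in sessions
        if s.get("clicked_result") and s.get("dwell_time_s", 0) >= bounce_threshold_s
    )
    return satisfied / len(sessions) if sessions else 0.0

sessions = [
    {"clicked_result": True, "dwell_time_s": 45},   # clicked and stayed
    {"clicked_result": True, "dwell_time_s": 2},    # clicked but bounced
    {"clicked_result": False, "dwell_time_s": 0},   # never clicked
    {"clicked_result": True, "dwell_time_s": 120},
]
print(satisfaction_rate(sessions))  # 0.5
```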
(Lila:) Goal for this q?
Dan: validate this metric
Lila: fine, but also need goal for the metric itself
Wes: if we don't actually understand how it works, we won't succeed
took longer than we thought
Lila: I want one goal - improve our search results ;)
looking at external solutions too? e.g. don't think we need to build our own typeahead feature
Dan: that's built into Elasticsearch
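Elasticsearch's built-in typeahead support is its completion suggester; a minimal sketch of such a query (the index name "pages" and field "title_suggest" are hypothetical, the request shape follows Elasticsearch 5+, and this is not CirrusSearch's actual configuration):

```python
import json
import requests

# Ask the completion suggester for up to 10 title completions of "san fr".
body = {
    "suggest": {
        "title-suggest": {
            "prefix": "san fr",
            "completion": {"field": "title_suggest", "size": 10},
        }
    }
}
resp = requests.post(
    "http://localhost:9200/pages/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(body),
)
for option in resp.json()["suggest"]["title-suggest"][0]["options"]:
    print(option["text"])
```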
--> Lila: Can you link to your goals and other stuff from the dashboard? becoming a central place
--> Lila: can we get performance into KPI?
(Dan:) other successes and misses
Wes: we publicize tests before we run them
Lila: can other teams like Readership use this A/B infrastructure?
Dan: this in particular is built into search, but yes in principle
--> Lila: Wes, can you sync up with Gabriel and Toby on this
Dan: core workflows
dynamic scripting: security issues, been working on this for ages, had to change extensions that relied on this (e.g. the Translate extension)
Terry: how much effort now in improving Elasticsearch?
Dan: maybe 20-30% of the four engineers who work on search
Dan: core workflows - wikipedia.org
proper code review for that site
at Wikimania talked to the community member who is most involved in maintaining this
Lila: plans for this next q?
Moiz: first run some small A/B tests
e.g. improve typeahead, change size of search box
then move all the knowledge engine stuff there
--> Lila: make small but frequent changes, get used to that
Moiz: yes, we intend to perhaps have 10-20 week-long tests
Dan: Moiz did a lot of mockups for this, both on small changes and more strategic things
did not succeed in adding EventLogging to it
--> Lila: need evaluation of how index is going to be run - e.g. will Wikidata be indexed itself, or serve as index
that decision will need to be taken on engineering level
e.g. "fork" of Wikidata?
--> Lila: get the goals on the satisfaction metric in two weeks from now
Lila: how many of the zero results are the result of a missing article? (on that wiki)
so we can see how much we could seed with automatic translation
Wes: would be very curious about that too
Tomasz: many people search the English site for things available in other languages