IRC office hours/Office hours 2016-03-30
People present (lines said)
- CKoerner_WMF (45)
- JustinO (16)
- tfinc (11)
- YairRand (8)
- Deskana (6)
- MaxSem (5)
- wm-labs-meetbot (3)
- yurik (2)
- gehel (1)
- dcausse (1)
- legoktm (1)
Chat during office hours and on BlueJeans about what's new in Discovery
30 March 2016
18:30 - 19:30 UTC
18:29:05 <CKoerner_WMF> #startmeeting What's new in Discovery | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
18:29:05 <wm-labs-meetbot> Meeting started Wed Mar 30 18:29:05 2016 UTC and is due to finish in 60 minutes. The chair is CKoerner_WMF. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:29:05 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:29:05 <wm-labs-meetbot> The meeting name has been set to 'what_s_new_in_discovery___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
18:29:41 * MaxSem waves
18:29:45 <dcausse> o/
18:29:50 <tfinc> CKoerner_WMF: thanks, and greetings to all of those who have joined us
18:30:17 * tfinc waves back to MaxSem and dcausse
18:30:18 <YairRand> (I don't suppose there's any way to access the feed without giving the site webcam access?)
18:30:48 <CKoerner_WMF> YairRand You can disable your webcam, but I can't remember if that's before or after you connect.
18:31:17 <legoktm> o/
18:31:24 <MaxSem> also, scotch
18:31:36 <CKoerner_WMF> Reminder for those who'd like to join us via video/audio, we have a meeting setup here: https://bluejeans.com/388063933/
18:31:46 <MaxSem> as in, tape on your webcam, not whiskey :P
18:32:28 <MaxSem> anyway? questions?
18:33:16 <gehel> https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q4_Goals
18:33:32 <tfinc> First question on blue jeans is about Q4 goals --^
18:34:06 <tfinc> it is a bit weird to be split between video and irc link here
18:34:12 <tfinc> but we'll make the best of it
18:34:12 <CKoerner_WMF> one of the first goals is looking into language detection
18:34:38 <tfinc> specifically using https://www.mediawiki.org/wiki/TextCat
18:34:44 <CKoerner_WMF> A/B test this quarter in language detection.
18:35:03 <CKoerner_WMF> Also, this quarter, look into upgrading Elasticsearch
18:35:44 <CKoerner_WMF> improve stability and performance
18:36:27 <YairRand> geospatial queries, that means searching for things within a certain geographic area?
18:37:18 <CKoerner_WMF> YairRand I'll tee that up for an answer
18:37:42 <MaxSem> YairRand, yes: https://www.mediawiki.org/wiki/Extension:GeoData#list.3Dgeosearch
18:40:54 <tfinc> CKoerner_WMF: i'm always torn between transcribing all the good discussion on the video link vs treating these two as different streams
18:41:05 <CKoerner_WMF> Now talking about clarifying the record of work for Discovery.
18:41:24 <CKoerner_WMF> Deb's talking a little about describing the work on the Portal: https://www.mediawiki.org/wiki/User:DTankersley_(WMF)/Proposals/Wikipedia_Portal_Update
18:41:44 <CKoerner_WMF> As an example of plans for a particular area
18:42:17 <CKoerner_WMF> (also another thing Discovery is working on!)
18:43:07 <CKoerner_WMF> Justin O. wants more metrics! :)
18:44:15 <tfinc> YairRand: do you have any other questions about GeoSpatial ?
18:44:22 <YairRand> tfinc: nope
18:45:02 <YairRand> actually, is this stuff going to work with geoshapes, once that's available?
18:46:17 <YairRand> that's a planned property datatype in wikidata
18:46:22 <CKoerner_WMF> YairRand Do you mean polygons on interactive maps?
18:46:54 <tfinc> YairRand: what's the use case and do we have a phab task for it? we'll need that to prioritize
18:46:57 <CKoerner_WMF> reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/geo-shapes.html
18:47:14 <CKoerner_WMF> Use cases and phab tickets are welcome to help prioritize!
18:47:37 <YairRand> (searches around on phab)
18:47:58 <Deskana> YairRand: I'm not aware of any Phab tickets for this. :-)
18:48:44 <CKoerner_WMF> I think I see one that YairRand mentioned (planned property datatype in Wikidata) here: https://phabricator.wikimedia.org/T57549
18:49:00 <Deskana> There we go.
18:52:12 <CKoerner_WMF> Now talking about http://www.gdal.org and how we might interpret geospatial data
18:53:59 <CKoerner_WMF> Max is talking about the need for a central repository for such data, if not in wikidata itself.
18:54:37 <CKoerner_WMF> The current plate of work is full, but this would be something to look into in the future.
18:54:49 <CKoerner_WMF> Short term maps plan - get maps on Wikipedia!
18:55:27 <JustinO> how many people are on search?
18:56:07 <Deskana> JustinO: About 3.5 engineers.
18:56:10 <JustinO> k
18:56:38 <tfinc> JustinO: + ops + PM
18:56:44 <Deskana> There is a team breakdown here: https://www.mediawiki.org/wiki/Wikimedia_Discovery#The_team
18:56:49 <JustinO> thx
18:58:00 <CKoerner_WMF> Max is reminding us that if you have an interest in something the team is working on join the related task(s) in Phabricator.
18:58:44 <CKoerner_WMF> For those who are possibly new to Phabricator: https://www.mediawiki.org/wiki/Phabricator
18:59:14 <CKoerner_WMF> Now we're talking about making repositories public.
18:59:22 <CKoerner_WMF> And now on to recall issues
18:59:34 <CKoerner_WMF> me/ What is 'recall' in relation to search?
19:01:26 <CKoerner_WMF> Never mind. There's a wiki article for that. https://en.wikipedia.org/wiki/Precision_and_recall
19:04:18 <CKoerner_WMF> Justin O. suggests taking queries with zero results and running them against other search engines to see what they return. Why do they differ?
19:04:31 <JustinO> take ZRP queries, run them on google/bing/ddg (theQuery site:wikipedia.org). if google returns a reasonable answer, figure out why.
19:05:25 <JustinO> recall: the ability to bring the correct result on to the search page. precision: placing the result in the best place on the page
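JustinO's two definitions can be made concrete with the standard measures from the Precision and recall article linked above. This is a minimal sketch, not the team's metric code: result lists and relevance judgments are assumed to be sets of page titles, and because JustinO's "precision" is about placement, reciprocal rank is swapped in as one common placement measure. All function names are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(results: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for page in top if page in relevant) / len(top)


def recall_at_k(results: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant pages that show up in the top-k results."""
    if not relevant:
        return 0.0
    found = {page for page in results[:k] if page in relevant}
    return len(found) / len(relevant)


def reciprocal_rank(results: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result (0.0 if none); rewards good placement."""
    for rank, page in enumerate(results, start=1):
        if page in relevant:
            return 1.0 / rank
    return 0.0


results = ["Paris", "Paris, Texas", "Paris Hilton", "France"]
relevant = {"Paris, Texas", "Texas"}
print(precision_at_k(results, relevant, 3))  # 1 of the top 3 is relevant
print(recall_at_k(results, relevant, 3))     # 1 of 2 relevant pages was found
print(reciprocal_rank(results, relevant))    # first relevant hit is at rank 2
```

Recall only asks whether the right page made it onto the results page; reciprocal rank also penalizes it for sitting lower down.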
19:05:37 <YairRand> *crickets*
19:05:52 <Deskana> https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Why_People_Use_Search_Engines
19:06:24 <JustinO> other recall ideas:
19:06:25 <JustinO> find top queries w/ a low SAT or click rate. categorize them & count the categories. is there an easy category w/ high impact to attack?
19:06:26 <JustinO> run top queries w/ a low SAT or click rate on google & compare.
19:06:26 <JustinO> look at sessions that begin on wikipedia, then go to google and back to wikipedia.
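The first two ideas (top queries with a low SAT or click rate, then comparing them on another engine) can be sketched as a single pass over a query log. This assumes a hypothetical log of `(query, clicked)` pairs, one per search, and uses click-through rate as a stand-in for a SAT rate; `top_n` and `max_ctr` are arbitrary placeholders. The `site:wikipedia.org` string mirrors the comparison JustinO describes; actually submitting it to an external engine is left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants are counted together."""
    return " ".join(query.lower().split())


def low_ctr_top_queries(
    events: Iterable[Tuple[str, bool]], top_n: int = 100, max_ctr: float = 0.2
) -> List[Tuple[str, int, float]]:
    """Among the top_n most frequent queries, return those with click-through <= max_ctr."""
    searches: Counter = Counter()
    clicks: Counter = Counter()
    for query, clicked in events:
        q = normalize(query)
        searches[q] += 1
        if clicked:
            clicks[q] += 1
    flagged = []
    for q, n in searches.most_common(top_n):
        ctr = clicks[q] / n
        if ctr <= max_ctr:
            flagged.append((q, n, ctr))
    return flagged


def external_check_query(query: str) -> str:
    """Site-restricted query to paste into another engine for comparison."""
    return f"{query} site:wikipedia.org"


log = [("Foo", False), ("foo ", False), ("foo", True),
       ("bar", True), ("bar", True), ("baz", False)]
for q, n, ctr in low_ctr_top_queries(log, top_n=3, max_ctr=0.5):
    print(q, n, round(ctr, 2), "->", external_check_query(q))
```

The flagged list is the input for the manual step JustinO mentions: categorizing the queries and counting the categories.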
19:06:32 <tfinc> I'd love any questions/comments/concerns on https://www.mediawiki.org/wiki/Wikimedia_Discovery/FDC_Proposal
19:06:47 <CKoerner_WMF> https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes#Survey_of_Zero-Results_Queries
19:07:59 <JustinO> recall: for SAT sessions of len >= 2, look at first query & final query. figure out why the first query wasn't successful.
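Pulling out the first and final queries of satisfied multi-query sessions is a simple filter. A minimal sketch, assuming a hypothetical `Session` record that holds the queries in order plus a SAT label computed elsewhere; neither the record nor its fields come from the team's tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Session:
    session_id: str
    queries: List[str]  # queries in the order the user issued them
    satisfied: bool     # SAT label computed elsewhere (hypothetical)


def first_and_final_queries(
    sessions: List[Session], min_len: int = 2
) -> List[Tuple[str, str]]:
    """(first query, final query) pairs from satisfied sessions with >= min_len queries."""
    return [
        (s.queries[0], s.queries[-1])
        for s in sessions
        if s.satisfied and len(s.queries) >= min_len
    ]


sessions = [
    Session("a", ["mona lisa painter"], True),                          # too short
    Session("b", ["eifel tower height", "eiffel tower height"], True),  # misspelled first query
    Session("c", ["jaguar speed", "jaguar car top speed"], False),      # not SAT
]
print(first_and_final_queries(sessions))
```

Each pair is then reviewed by hand (or diffed automatically) to see why the first query did not succeed.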
19:08:50 <CKoerner_WMF> Question from video chat "Have you looked into other ways to determine satisfaction with search results?"
19:09:06 <CKoerner_WMF> dwell times are one way
19:09:14 <CKoerner_WMF> but what about things like
19:09:14 <CKoerner_WMF> copying the URL
19:09:17 <CKoerner_WMF> or content snippets?
19:09:24 <CKoerner_WMF> signifying that they got the information that they needed?
19:09:55 <CKoerner_WMF> Right now our user satisfaction metrics are desktop only
19:10:06 <yurik> CKoerner_WMF, should i try to join the video chat?
19:10:06 <CKoerner_WMF> looking in the future to do something with mobile (as it's growing)
19:10:15 <CKoerner_WMF> yurik Feel free!
19:10:46 <CKoerner_WMF> web dwarfs apps for search. That's interesting.
19:11:04 <CKoerner_WMF> More fun with numbers here: http://discovery.wmflabs.org/metrics/
19:12:18 <yurik> CKoerner_WMF, apps are like 1% of all of our traffic, maybe less
19:13:00 <CKoerner_WMF> comparing second queries to first queries when folks have unsatisfied results. Look at the distance between them (frequency of words)
19:13:12 <Deskana> TFIDF: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
19:13:21 <CKoerner_WMF> ^related to search satisfaction
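One way to measure the "distance" between a first and second query with TF-IDF is cosine distance between their weighted term vectors. This is a minimal sketch under several assumptions: whitespace tokenization, IDF computed over a small corpus of past queries, and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` (one common variant, chosen here for illustration). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def idf_table(corpus: Sequence[str]) -> Dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1, computed over a corpus of queries."""
    n = len(corpus)
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(text: str, idf: Dict[str, float], unseen_idf: float) -> Dict[str, float]:
    tf = Counter(tokenize(text))
    return {term: count * idf.get(term, unseen_idf) for term, count in tf.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_distance(first: str, second: str, corpus: Sequence[str]) -> float:
    """1 - cosine similarity: near 0 for matching weighted terms, 1 when no terms are shared."""
    idf = idf_table(corpus)
    unseen_idf = math.log(1 + len(corpus)) + 1.0  # same formula with df = 0
    return 1.0 - cosine(tfidf(first, idf, unseen_idf), tfidf(second, idf, unseen_idf))


corpus = ["the eiffel tower", "eiffel tower height", "the louvre", "tower of london"]
print(query_distance("eiffel tower height", "eiffel tower height meters", corpus))
print(query_distance("eiffel tower", "louvre opening hours", corpus))
```

A small distance suggests the user refined the same query; a distance of 1 suggests they changed topic entirely.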
19:14:30 <CKoerner_WMF> Dan's sharing some stats: http://discovery.wmflabs.org/metrics/#survival
19:17:49 <JustinO> Defining SAT: (what is search success on wikipedia?)
19:17:49 <JustinO> dwell >30s
19:17:49 <JustinO> copy url / snippet contents
19:17:49 <JustinO> click external link & dwell >30s on click
19:17:49 <JustinO> click & click sub-page & dwell >30s
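JustinO's list reads as an OR of four signals. A minimal sketch, assuming each click from a results page is logged as a hypothetical `ResultVisit` record; the field names are invented for illustration and are not an existing event schema. "Dwell >30s" is read as strictly greater than 30 seconds.

```python
from dataclasses import dataclass, field
from typing import List

DWELL_THRESHOLD_S = 30.0  # the ">30s" cutoff from the list above


@dataclass
class ResultVisit:
    """One click from a search results page (hypothetical event shape)."""
    dwell_s: float                       # seconds spent on the clicked result
    copied_url_or_snippet: bool = False  # copied the page URL or text from it
    external_dwell_s: List[float] = field(default_factory=list)  # dwell after external-link clicks
    subpage_dwell_s: List[float] = field(default_factory=list)   # dwell on pages reached by further clicks


def is_satisfied(visit: ResultVisit, threshold: float = DWELL_THRESHOLD_S) -> bool:
    """True if any of the four proposed SAT signals fires."""
    return (
        visit.dwell_s > threshold
        or visit.copied_url_or_snippet
        or any(d > threshold for d in visit.external_dwell_s)
        or any(d > threshold for d in visit.subpage_dwell_s)
    )


visits = [
    ResultVisit(dwell_s=45),
    ResultVisit(dwell_s=5, copied_url_or_snippet=True),
    ResultVisit(dwell_s=8, subpage_dwell_s=[12.0]),
]
print([is_satisfied(v) for v in visits])
```

Keeping the threshold as a parameter makes it easy to test other cutoffs against the survival data linked above.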
19:19:07 <YairRand> one day someone needs to figure out how to get meetings like this on-wiki without losing any features
19:19:14 <CKoerner_WMF> If there are no more questions we're going to wrap up this chat.
19:19:48 <CKoerner_WMF> #endmeeting
Meeting ended at 19:19:48 UTC.