IRC office hours/Office hours 2014-11-20

From Meta, a Wikimedia project coordination wiki

Structured Data

Log

Time: 19:30-20:30 UTC
Channel: #wikimedia-office
Timestamps are in UTC.


[19:31:21] <Lydia_WMDE> anyone here for the office hour?
[19:31:26] <Lydia_WMDE> about structured data on commons?
[19:31:26] <James_F> Heya Lydia_WMDE.
[19:31:31] <Lydia_WMDE> jo James_F!
[19:31:36] <guillom> _o/
[19:31:37] <James_F> I'm just hanging out.
[19:31:43] <Lydia_WMDE> that's cool
[19:31:45] <Lydia_WMDE>  ;-)
[19:32:00] * Lydia_WMDE waits for all the people with questions to show up
[19:32:03] <guillom> And I'm lurking to see if I can recruit people for the cleanup drive!
[19:32:11] * Stryn is just listening here, hi all :)
[19:32:18] <guillom> Hello Stryn!
[19:32:27] <Lydia_WMDE> hey Stryn!
[19:32:46] * AntoineIsaac Antoine (from Europeana) also lurking around
[19:32:48] <fabriceflorin> Hello everyone! Good to reconnect with you all :)
[19:32:56] <guillom> hey fabriceflorin :)
[19:33:13] * marktraceur is here
[19:33:20] <Lydia_WMDE> alright so there are two things on my agenda for today then if no-one has questions already
[19:33:22] * aude waves
[19:33:29] <fabriceflorin> This session is about structured data for multimedia. You can read more about this project here: https://commons.wikimedia.org/wiki/Commons:Structured_data
[19:33:30] <Lydia_WMDE> 1) data access to wikidata for Commons
[19:33:38] <Lydia_WMDE> 2) the hackathon in amsterdam from last weekend
[19:33:58] <Lydia_WMDE> maybe we should start with the hackathon
[19:34:07] <Lydia_WMDE> the focus was glam and wikidata
[19:34:16] <Lydia_WMDE> it was a pretty good hackathon
[19:34:25] <Lydia_WMDE> thanks again to the organizers!
[19:34:35] <fabriceflorin> hello aude, guillom, marktraceur :)
[19:34:48] <jheald> an excellent hackathon
[19:34:49] <Lydia_WMDE> and a lot of cool stuff came out of it
[19:35:17] <fabriceflorin> jheald: So glad you could join the hackathon! Thanks for all you are doing on this project :)
[19:35:20] <Lydia_WMDE> the most impressive thing for me is this: http://sum.bykr.org/
[19:35:50] <Lydia_WMDE> it takes data about paintings from wikidata and the image file from commons and some text from wikipedia and then puts it into a nice view
[19:36:03] <Lydia_WMDE> pretty slick if you ask me for a weekend's work
[19:36:11] <thedj> so finally we can get information OUT of wiki :)
[19:36:18] <Lydia_WMDE> :D
[19:36:51] <Lydia_WMDE> another cool thing for me was an API wrapper for wikidata that makes it easier to work with its data if you want to build tools like the one i just mentioned
[19:37:04] <Lydia_WMDE> for the other people who attended the hackathon: what was your highlight?
[19:37:55] <marktraceur> Seeing how much awesome data got pulled into the system was great for me
[19:38:00] <Lydia_WMDE> yes!
[19:38:05] <fabriceflorin> thedj: I heard from tgr that you did some amazing work, can you tell us about it?
[19:38:58] <thedj> i like that gilles coded up the changes to improve statistics for image viewing. on the spot, after Erik figured he needed them
[19:39:25] <aude> gergo worked on https://github.com/creative-work-metadata/creative-work-metadata :)
[19:39:52] <aude> uses wikibase data model / values to represent commons data (from templates)
[19:40:03] <fabriceflorin> thedj: yeah, gilles is a pretty resourceful guy :) (he’s sorry he couldn’t join tonight, but sends his greetings)
[19:40:55] <Lydia_WMDE> any other highlights to share?
[19:41:03] <thedj> http://etherpad.wikimedia.org/p/glamdata
[19:41:07] <Lydia_WMDE> or questions about it?
[19:41:12] <Lydia_WMDE> thx thedj!
[19:41:32] <Josve05a> "Table 1: Getting shitloads of wiki data in Wikidata" :P
[19:41:37] <Lydia_WMDE> hehe
[19:42:09] <jane023> I was happy to see so much work on items for paintings, but also happy to see that people are also willing to talk about items for engravings
[19:42:27] <Lydia_WMDE>  :)
[19:42:29] <fabriceflorin> Here are some hackathon highlights which Gilles shared with us at our weekly multimedia meeting: http://etherpad.wikimedia.org/p/multimedia-weekly-meeting-2014-11-19
[19:42:51] <fabriceflorin> (in the second section from the top)
[19:43:11] <marktraceur> Gosh tgr, way to be late
[19:43:15] <marktraceur>  :)
[19:43:28] <jheald> I was knocked out by the people, the environment, the chance to talk to people, the training day on Friday. I had a really useful conversation with Bas, not re wikidata, that will really help me & learnt lots from Fae too
[19:43:46] <Lydia_WMDE> sweet! that's why we do them, right?
[19:43:55] <jheald> But simply the chance to meet everybody, and talk.
[19:43:56] <fabriceflorin> Kudos to Keegan|Away for cleaning up 250 files lacking {{information}} on Commons (he’s been sick for the past few days, which is why he can’t join us now, sorry)
[19:44:10] <guillom> fabriceflorin: Oh? I didn't even know!
[19:44:19] <jheald> If Keegan has the same cold that I have, it's miserable
[19:44:27] <fabriceflorin> guillom: that’s how the word spreads out :)
[19:44:27] <Lydia_WMDE> get well soon!
[19:44:36] <marktraceur> Y'all had better not have gotten me sick
[19:44:45] <marktraceur> My vengeance will be swift
[19:44:47] <jheald> sorry mark
[19:44:52] <fabriceflorin> jheald: sorry you also got the same cold. Hope you feel better soon :(
[19:45:07] <Lydia_WMDE> alright. anything else to share wrt the hackathon or should we move on to the next topic?
[19:45:13] <guillom> fabriceflorin: indeed!
[19:45:42] <marktraceur> !second move on
[19:45:50] <Lydia_WMDE> cool
[19:46:00] <Lydia_WMDE> so second topic i have is enabling data access on commons
[19:46:21] <Lydia_WMDE> I've been asked by jheald and others when Commons will get access to the existing data on Wikidata
[19:46:30] <Lydia_WMDE> so things like date of birth of an artist
[19:46:48] <Lydia_WMDE> and i think we can do that on december 2nd
[19:46:58] <jheald> cool!
[19:47:03] <jane023> cool
[19:47:04] <multichill> Lydia_WMDE: Direct or random access?
[19:47:07] <Lydia_WMDE> there will still be some big limitations to this data access but it's a good start
[19:47:20] <Lydia_WMDE> multichill: direct - let me explain
[19:47:23] <multichill> Because now you make it sound cooler than it actually is :P
[19:47:50] <Lydia_WMDE> so the limitation will be that you can only access data that is on the item that is directly connected to the page you are on via a sitelink
[19:48:03] <Lydia_WMDE> we're still working on a feature we call arbitrary access
[19:48:09] <Lydia_WMDE> that'll come in january or february
[19:48:16] <Lydia_WMDE> with that you can get data about any item on Wikidata
[19:48:41] <Lydia_WMDE> but i think the limited access over christmas is a good thing to get started
[19:48:49] <Lydia_WMDE> and then expand early next year
[19:48:55] <Lydia_WMDE> how does that sound?
[19:49:20] <Lydia_WMDE> so to be clear: this is not about file metadata like the license of an individual file - but about data about broader topics
[19:49:33] <jane023> Hmm - so a painting file on Commons will be able to get data about the painting, but not about the institution that holds it in its collection or about its creator
[19:49:34] <multichill> Good to say that in the first stage we'll have access to everything on the connected items and the statements. From the connected items through statements we can access the labels and the sitelinks
[19:49:37] <Lydia_WMDE> file metadata will come later and they will be stored directly on commons
[19:49:40] <marktraceur> Lydia_WMDE: So more like using Wikidata to populate the {{artwork}} template data.
[19:49:52] <jheald> jane023: not even that
[19:50:03] <multichill> So templates like https://commons.wikimedia.org/wiki/Template:Amsterdam could be replaced with one Wikidata call
[19:50:06] <Lydia_WMDE> marktraceur: yes though that needs arbitrary access then for all i know
[19:50:10] <marktraceur> Ah.
[19:50:14] <tgr> direct access will be useful mostly on category pages, I imagine
[19:50:27] <multichill> tgr: Exactly
[19:50:30] <Lydia_WMDE> *nod*
[19:50:50] <tgr> not so much on templates because once you include them the linked page will be the one linked to the including page
[19:50:57] <Lydia_WMDE> right
[19:50:58] <tgr> ie. nothing, most of the time
[19:50:59] <jheald> mark -- we can't populate the artwork template generally, because it is used on File pages, that are not linked
[19:51:07] <thedj> gtg
[19:51:12] <Lydia_WMDE> cu thedj
[19:51:14] <jheald> So it's only on gallery pages they will work
[19:51:19] <multichill> thedj: Yes, won't work for https://commons.wikimedia.org/wiki/Creator:Rembrandt , but for {{Amsterdam}} you only need labels and sitelinks
[19:51:46] <Lydia_WMDE> and keep in mind that january/february is coming soon...
[19:51:48] <Lydia_WMDE>  ;-)
[19:52:09] <tgr> arbitrary access in Jan/Feb sounds awesome
[19:52:15] <marktraceur> Yeah, whoa, it's almost December already.
[19:52:21] <Lydia_WMDE> indeed!
[19:52:21] <jheald> multichill -- interesting: so labels *can* be accessed arbitrarily?
[19:52:40] <multichill> Yup, otherwise all your statements would be really boring
[19:52:40] <jane023> got it
[19:52:41] <Lydia_WMDE> jheald: no but they are connected in the item that is directly connected so that works
[19:52:57] <Lydia_WMDE> so not arbitrary but for those linked through statements yes
[19:53:45] <multichill> Maybe good to mention we connected a lot of creator pages and institution pages to Wikidata and from Wikidata back to Commons
[19:53:49] <Lydia_WMDE> if there is no major ohnoesdon'tdothat here now i will go ahead and announce that in the appropriate channels
[19:54:04] <multichill> ohyespleasedothatnow!
[19:54:08] <Lydia_WMDE> haha
[19:54:12] <Lydia_WMDE> ai ai sir
[19:54:40] <jheald> I'll be filling in some of the data to creator items from creator templates soon too
[19:54:47] <Lydia_WMDE> sweet
[19:55:11] * multichill really likes http://sum.bykr.org/
[19:55:11] <jheald> the dj has gone, but are there any more thoughts on the gadget to go on category pages?
[19:55:18] <multichill> Creators on the right
[19:55:22] <jheald> I think Commons people would linke that
[19:55:26] <jheald> like
[19:56:14] <Lydia_WMDE> jheald: i think it is best to ping thedj about it when he is back
[19:56:19] <Lydia_WMDE> i don't know more about it
[19:56:24] <fabriceflorin> multichill: Yeah, the Sum of All Paintings is a really impressive use of wikidata for multimedia :)
[19:56:26] <jheald> okay
[19:56:44] <Lydia_WMDE> any other questions on the topic of data access?
[19:56:49] <Lydia_WMDE> or any other topics?
[19:56:54] <multichill> fabriceflorin: And no technical debt to start with makes it easier ;-)
[19:57:00] <Lydia_WMDE> hehe
[19:57:07] <jheald> One other thing on the data side. We've currently got ~75,000 article-like items that sitelink to Commons cats
[19:57:10] <fabriceflorin> multichill: hehe :)
[19:57:22] <jheald> (compared to 10x that number using P373)
[19:57:36] <jheald> Is this urgent to clear up before Dec ?
[19:57:39] <multichill> Don't forget the category items that link to Commons jheald
[19:57:46] <Lydia_WMDE> jheald: hmmm sounds like a task for a bot? and no i don't think it is
[19:58:39] <jheald> multichill: about 250,000 cat-like items that link to Commons cats
[19:58:53] <jheald> personally, I would be very slow to create any more
[19:59:28] <multichill> Yeah, we shouldn't create new items for just Commons categories. That's out of scope anyway
[19:59:49] <Lydia_WMDE> no issue with taking it slow at all
[19:59:54] <Lydia_WMDE> i think that's a good approach
[20:00:11] <jheald> probably better ultimately handled with items on Commons for commonscats
[20:00:13] <jane023> unless those are painter cats for whom we need items still...
[20:00:26] <jheald> but that's still open, I think
[20:00:36] <fabriceflorin> A quick update on the documentation front: we’ve been adding more notes and pages on this development page, based on our Berlin bootcamp in October:
[20:00:37] <fabriceflorin> https://commons.wikimedia.org/wiki/Commons:Structured_data/Development
[20:01:09] <fabriceflorin> We updated this page to add new sections like Design, Research and Rights, as well as edited the Roadmap section.
[20:01:19] <multichill> fabriceflorin: Can you talk a bit more about the timeline?
[20:01:28] <fabriceflorin> The Structured Data slides have also been updated, and we added slides for the shorter presentation made at the Monthly Metrics meeting on Nov. 6, which include an overview of the Metadata Cleanup Drive.
[20:01:59] <fabriceflorin> multichill: Yes, we had to push back development a bit to address critical issues that the multimedia and wikidata teams have to deal with right now.
[20:02:13] <multichill> define a bit
[20:02:30] <multichill> hours/days/weeks/months/years?
[20:02:38] <fabriceflorin> So we want to use the time to get more community input about the proposals on the development page, as well as other ideas for implementing structured data on Commons.
[20:03:19] <fabriceflorin> This roadmap gives a sense of the next stages ahead of us: https://commons.wikimedia.org/wiki/Commons:Structured_data/Development#Roadmap
[20:03:39] <jheald> jane023: it's hard to tell. Of the 250,000 Commons cats, about 100,000 have no as yet identifiable article-like item.
[20:04:03] <multichill> I know the roadmap, but I would like to know when you start driving and what speed
[20:04:08] <fabriceflorin> We are still in stage 1 right now, and would like to get more community input on the data model and user expectations before serious development can start.
[20:04:50] <fabriceflorin> We’re discussing resource allocations right now at the Foundation, and folks like Eloquence and Damon can address these questions better than I can.
[20:05:04] <jane023> we need the creator items for all the artworks in "sum of all paintings"
[20:05:23] <Lydia_WMDE> from the wikidata team's side we are working on cleaning up the underlying code (assumptions we've made in the past about items for example) and arbitrary access. at the same time we are helping with finding a suitable query backend for the data in wikidata that will also be needed for commons
[20:06:03] <AntoineIsaac> fabriceflorin: giving you input on the data model is certainly still on our (Europeana) agenda!
[20:06:11] <Lydia_WMDE> and performance improvements which are sorely needed for anything really
[20:06:21] <jheald> jane023: wikidata items for the creators, presumably, not necessarily Commons creator templates
[20:06:44] <fabriceflorin> And the multimedia team is focused on fixing major bugs on the upload pipeline for the next month or so, before we can switch our attention to other projects. So structured data remains in planning stages for the rest of the year.
[20:07:14] <jane023> jheald: exactly
[20:07:18] <jheald> So, what is the timescale we're thinking of to lock up things like what things will have items, and where?
[20:07:41] <dennyvrandecic> hi, sorry, late to the party - if we switch on direct access to wikidata on commons, what would stop us to add license data there?
[20:07:55] <fabriceflorin> tgr: Are you planning to start a discussion of data models in coming days? I think the plan was to post notes on the data model we discussed in Berlin, and invite more comments from a wider community.
[20:07:59] <Lydia_WMDE> dennyvrandecic: nothing but convention
[20:08:03] <marktraceur> dennyvrandecic: Because license data should live in Commons, I think
[20:08:11] <multichill> dennyvrandecic: Creating items for files is out of scope
[20:08:20] <jheald> dennyvrandecic: there would be no access from file pages
[20:08:32] <dennyvrandecic> so what's the benefit of switching on direct access?
[20:08:52] <multichill> dennyvrandecic: You're late! Scroll back ;-) (for example the template Amsterdam example)
[20:08:57] <jheald> gallery pages, category pages, and being able to try stuff out
[20:09:18] <multichill> Less local i18n hacks, more Wikidata hacks
[20:09:19] <dennyvrandecic> the template amsterdam example won't work because the template included will be on another page
[20:09:49] <dennyvrandecic> gallery and category pages would abuse the system even more than we would do it if we would allow file metadata
[20:10:03] <jheald> that's true, re template amsterdam
[20:10:30] <jheald> gallery & category pages actually don't abuse the system that much (see numbers above)
[20:10:33] <fabriceflorin> AntoineIsaac: Glad you’re open to giving more input on the data model :) I encourage you to connect with tgr multichill and Lydia_WMDE to follow up, once tgr updates our documentation.
[20:11:04] <jheald> & if the dj's toy could be got to work, that would remove most of the motivation for abuse from Category pages
[20:11:27] <dennyvrandecic> i mean, if the community would decide that this is all going too slow, and since we can have structured data about the files now accessible through direct access, no one would stop them, right?
[20:11:35] <multichill> dennyvrandecic: Can't I just do mw.wikibase.getEntityObject( 'Q42' ) and use the labels? This always confuses me
[20:11:53] <multichill> The Wikidata community would kill all the file items
[20:11:57] <jheald> there is no direct access, because there are no wikidata items for files
[20:12:29] <dennyvrandecic> but there could be, and half of the stuff we try to achieve would be working out of the box
[20:12:48] <AntoineIsaac> fabriceflorin: we were planning to try and give input through https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVjmQqs/ should we wait?
[20:13:11] <fabriceflorin> Hi folks, I have to leave now, but wanted to give a big thanks to all of you who are driving this project forward! It’s a real pleasure to see this initiative expand as a real partnership between the community and the foundation. Your work is really inspiring and we’re grateful for it :)
[20:13:34] <Lydia_WMDE> AntoineIsaac: that seems fine
[20:13:49] <AntoineIsaac> Lydia_WMDE: ok thx!
[20:13:54] <dennyvrandecic> the most powerful use of direct access seems to create the entities for the file-pages and add metadata about them
[20:14:14] <aude> it's a denny :)
[20:14:19] <fabriceflorin> And the collaboration with the wikidata team is a wonderful experience, thanks in large part to Lydia_WMDE ’s guidance. Always a pleasure working with you :)
[20:14:21] <dennyvrandecic> aude: hi :)
[20:14:26] <Lydia_WMDE> yes it would be but my hope is that we're all clear on not wanting that but instead having the file data on commons later
[20:14:32] <Lydia_WMDE> which also seems to be what commons wants
[20:15:04] <jheald> (Correction: there are currently a total of eight items sitelinked to files https://www.wikidata.org/wiki/Wikidata:WikiProject_Structured_Data_for_Commons/Phase_1_progress/Links/File )
[20:15:16] <dennyvrandecic> well, changing that later would be easy since it is structured data
[20:15:51] <dennyvrandecic> they could go ahead, create the file entities now, and do most of the roadmap for this project
[20:15:51] <jane023> I agree file data should stay on commons
[20:15:52] <jheald> insane to start linking things up before we've designed the data structure
[20:16:19] <dennyvrandecic> jheald: disagree. schema last is a pretty good principle imho :)
[20:16:37] <jane023> jheald: but that is the wiki way
[20:16:42] <dennyvrandecic> yep
[20:16:56] <dennyvrandecic> +1 to jane
[20:17:19] <jheald> ... at least the basic questions of what things can have Commons items
[20:17:35] <jheald> eg: Commons categories ?
[20:17:53] <jheald> eg: Information shared between multiple files ?
[20:18:05] <dennyvrandecic> it's up to the use cases really
[20:18:28] <jheald> things for which full wikidata items are not necessary/appropriate
[20:18:29] <jane023> institutions, creators, ...
[20:18:53] <jane023> art collectors
[20:18:58] <jheald> denny: it's up to use cases, but there is also whether it will be technically enabled
[20:19:14] <fabriceflorin> Bye for now … You are all invited to leave comments, questions and suggestions on the Structured Data talk page: https://commons.wikimedia.org/wiki/Commons_talk:Structured_data
[20:19:22] <jheald> jane023: all of those WILL have items on Wikidata
[20:19:24] <jane023> do we need rules?
[20:19:26] <Lydia_WMDE> cu fabriceflorin :)
[20:19:41] <marktraceur> Lydia_WMDE: Why, is he socking? :P
[20:19:47] <Lydia_WMDE> lol
[20:19:52] <Lydia_WMDE> who knows!
[20:19:55] <jheald> ... but there are things that won't / shouldn't
[20:19:58] <fabriceflorin> Thanks again, you all. Over and out.
[20:20:01] <jane023> jheald:I have a list of 900 categories on commons that do not yet have items on Wikidata
[20:20:05] <marktraceur> fabriceflorin === Fae, obvs
[20:20:13] <marktraceur> I've never seen them in the same room.
[20:20:21] <dennyvrandecic> what I wanted to point out is that direct access so early might lead to usage patterns that are not desirable
[20:20:40] <jheald> jane023: I have a list of 100,000
[20:20:50] <dennyvrandecic> the technical infrastructure would enable plenty of cool things, but we would say "but you are not allowed to use it this way"
[20:21:16] <Lydia_WMDE> dennyvrandecic: yes definitely but my impression is that the wikidata community understands that it shouldn't store data about files and that the commons community doesn't want to give up control of that data. so we should be fine
[20:21:24] <jheald> denny: that is a discussion that at some point we need to have
[20:21:25] <jane023> but do we have an idea of undesirable usage patterns? Maybe that is a conversation worth having
[20:21:44] <Lydia_WMDE> jane023: storing data about files on wikidata
[20:21:55] <multichill> jane023: Creating items for files is one
[20:21:56] <Lydia_WMDE> that is the one undesirable usage pattern there
[20:22:31] <dennyvrandecic> maybe just skip the direct access enabling - it just invites frustration
[20:22:39] <dennyvrandecic> one or two months later, we will have random access
[20:22:39] <jheald> what about items on Commons which relate to multiple files (no single file?)
[20:22:44] <dennyvrandecic> which makes much more sense
[20:23:00] <jane023> I tend to agree
[20:23:00] <multichill> jheald: That doesn't work anyway without random access
[20:23:16] <dennyvrandecic> jheald: what multichill says
[20:23:44] <jheald> multichill: obviously. but we need to think about the structures we're headed towards
[20:24:00] <dennyvrandecic> direct access is dangerously seductive I am afraid, because it seems to promise so much, but "everyone" agrees that it is "a bad thing"
[20:24:10] <multichill> First step in a long run. Limited access focused on improving i18n
[20:24:23] <jheald> denny: an RfC on Commons was unanimous in requesting it
[20:24:33] <multichill> dennyvrandecic: You can create items for user pages too
[20:24:34] <jane023> yes - at first I thought it would be good, but not anymore
[20:24:49] <multichill> That's not allowed either and I see it rarely happen
[20:25:34] <multichill> "People might do something stupid with it, let's not do it" is really not the wiki way
[20:25:53] <jane023> true
[20:26:00] <jheald> https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#RfC:_Should_we_request_Wikidata_Phase_2_to_be_activated_on_Commons.3F
[20:26:08] <multichill> But I do agree that communication is important here and we should address this clearly
[20:26:53] <multichill> So we need to explain Andy that creator templates still won't work with phase2 enabled
[20:27:04] <dennyvrandecic> jheald: the RFC seems mostly confused and there is no discussion about possible wrong incentives
[20:27:08] <jheald> Andy Mabbett ?
[20:27:11] <multichill> Phase1, phase2, we're talking vpn's :P
[20:27:21] <jane023> yes pigsonthewing
[20:27:36] <multichill> Yes, last line in the rfc
[20:27:46] <dennyvrandecic> I would disagree that always do what the community wants is best for the community :)
[20:27:52] <DanielK_WMDE> hm, i guess i'm just in time for the wrap-up ;)
[20:28:18] <jheald> I think it's just Andy saying he wants phase 3 a.s.a.p.
[20:28:19] <multichill> DanielK_WMDE: {{int:lang}} replacement?
[20:28:38] <multichill> Would be good for the Commons community too :-)
[20:29:09] <marktraceur> phase2/Wikidata.php
[20:29:11] <DanielK_WMDE> multichill: "replacement"?
[20:29:11] <dennyvrandecic> direct access enables so many use cases that there is the potential of people wanting to try them out, which might lead to a lot of energy poured into discussions and conflict
[20:29:15] <multichill> Lydia_WMDE: Probably best to do the next office hour somewhere in January
[20:29:27] <marktraceur> Maybe we should rewrite it to use multiple files.
[20:29:48] <multichill> Yes, we're using {{int:lang}} in LUA right now. I thought you were also working on some possible replacement
[20:30:17] <DanielK_WMDE> multichill: to get content in the user language, you mean?
[20:30:18] <dennyvrandecic> but then again, fortunately I am not making the call :) and it is temporarily only. it will all end up well. it is just that it would be nice to reduce possible issues on the way
[20:30:21] <multichill> yup
[20:30:34] <Lydia_WMDE> i am happy to wait until we have arbitrary access and as i said the timeline for that is january/february as it looks now. but i am willing to trust the community on not abusing this (otherwise i'm quick to turn it off again...)
[20:30:37] <DanielK_WMDE> talked about it with hoo, yes
[20:30:41] <DanielK_WMDE> nothing definite, yet
[20:30:48] <DanielK_WMDE> makes sense for multilingual wikis i guess
[20:31:10] <DanielK_WMDE> but splitting the parser cache by user language may be a problem for high traffic sites
[20:31:14] <multichill> Combination of that and proper Wikidata access would be a good reason to rebuild i18n at Commons
[20:31:29] <multichill> That's no problem at Commons
[20:31:47] <DanielK_WMDE> yea, i know, but different modes for different wikis is always a bit icky.
[20:32:15] <DanielK_WMDE> anyway - improved lua support, client-side usage tracking, and arbitrary access should be available Really Soon Now.
[20:32:18] <multichill> Other wiki's didn't bother to create the int:lang in the first place
[20:32:47] <multichill> DanielK_WMDE: I'll buy you a beer in SF if you manage to get it enabled by then ;-)
[20:32:47] <DanielK_WMDE> but if I say a date, Lydia_WMDE will Not Be Amused ;)
[20:32:49] <Lydia_WMDE> dennyvrandecic: the other thing i am worried about if we turn it on only once we have arbitrary access: performance implications. i'd rather see this build up
[20:32:58] <Lydia_WMDE> DanielK_WMDE: :D
[20:33:01] <Lydia_WMDE> nah do it
[20:33:08] <DanielK_WMDE> multichill: heh... could work :P
[20:33:23] <multichill> Otherwise you have to wait until May before you get beer from me
[20:33:34] <DanielK_WMDE> i was going to say early next year. End of January is a challenge, but doable,
[20:33:37] <dennyvrandecic> Lydia_WMDE: what performance implications would commons bring beyond what the wikipedias are already bringing?
[20:33:55] <aude> probably we can have arbitrary access on commons first? before wikipedias?
[20:33:58] <Lydia_WMDE> dennyvrandecic: no one has arbitrary access but wikidata
[20:34:01] <aude> e.g. as a 'beta' tester
[20:34:11] <Lydia_WMDE> yeah
[20:34:13] <DanielK_WMDE> aude: i'd like that
[20:34:14] <aude> wikidata is w/o usage tracking
[20:34:22] <dennyvrandecic> yeah, that makes sense
[20:34:25] <tgr> AntoineIsaac: sorry for being so unresponsive about metadata schema issues, any feedback on them is certainly greatly welcome, but it will take a week or two before I can incorporate it as we have to work on some other projects right now
[20:34:28] <DanielK_WMDE> we should try out tracking on wikidata first
[20:34:32] <aude> +1
[20:34:41] <DanielK_WMDE> not enough data there to do db profiling, though
[20:34:43] <dennyvrandecic> but it still does not imply to introduce direct access first
[20:34:48] <DanielK_WMDE> too few pages use it
[20:34:56] <aude> rolling it out everywhere at once is somewhat scary because of performance implications
[20:34:59] <aude> (e.g. enwiki)
[20:35:01] <dennyvrandecic> having random access first on commons (after wikidata itself) does make sense though
[20:35:26] <multichill> Wikidata -> Commons -> Meta -> Incubator -> ....
[20:35:43] <tgr> although if others are interested in acting as editors for that document, I can move it to the wiki so it is more open to collaboration - I don't mean to be the owner of it
[20:35:46] * DanielK_WMDE is thinking about mediawiki.org
[20:35:53] <multichill> ...... -> enwiki
[20:36:01] <aude>  :)
[20:36:05] <dennyvrandecic> oh come on, Wikipedia has to be much higher on that list
[20:36:13] <dennyvrandecic> Wikidata, Commons, Wikipedia
[20:36:14] <AntoineIsaac> tgr: I understand. Anyway having this timing info is very important for our planning. Thx!
[20:36:43] <DanielK_WMDE> arbitrary access is so useful, and unproblematic in most cases, but... SOMEONE is going to abuse it in HORRIBLE ways.
[20:36:44] <Lydia_WMDE> dennyvrandecic: probably start with smaller wikipedias again though
[20:36:45] <DanielK_WMDE> ugh
[20:36:53] * multichill raises hand
[20:36:58] <DanielK_WMDE> hehehe
[20:37:01] <Lydia_WMDE> lol
[20:37:08] <dennyvrandecic> Lydia_WMDE: agreed with that. But not enwiki after meta :P
[20:37:15] <Lydia_WMDE> hehe ok
[20:37:20] <aude> meta doesn't even have phase1
[20:37:22] <multichill> And now I have two extra months to think about all the horrible ways ;-)
[20:37:31] <DanielK_WMDE> multichill: as long as the access to "other" items is "narrow", it should be fine.
[20:37:32] <Lydia_WMDE> Oo
[20:37:58] <DanielK_WMDE> hm, we may have to broaden the definition of "label usage" to "label in any language" instead of "label in content language".
[20:38:15] <DanielK_WMDE> at least for multilingual wikis
[20:38:24] <Lydia_WMDE> DanielK_WMDE: for usage tracking?
[20:38:30] <multichill> Did anyone start the office hour bot?
[20:38:34] <Lydia_WMDE> nope
[20:38:37] <multichill> lol
[20:38:41] <dennyvrandecic> usage tracking would be useful even for label lookup
[20:39:08] <multichill> jane023: Want to see if we can reduce the painter list to zero? ;-)
[20:39:11] <dennyvrandecic> the only reason we started label lookup without usage tracking is it would have delayed that too far otherwise :P
[20:39:20] <Lydia_WMDE> alright folks. we should wrap up. can i get a final yay or nay on direct access on dec 2nd before i sleep on it for another night?
[20:39:34] <jheald> yay>
[20:39:39] <multichill> +!
[20:39:41] <multichill> +1
[20:39:42] * marktraceur +1's
[20:40:16] <multichill> Oh, btw, JeanFred is importing artworks from Commons ;-)
[20:40:18] <dennyvrandecic> I would say "nay, but since it will happen anyway, introduce it quietly and explicitly with 'this is only for testing and in preparation of greater things'"
[20:40:26] <Lydia_WMDE> heh
[20:40:27] <Lydia_WMDE> ok
[20:40:41] * multichill agrees with Denny
[20:41:00] <dennyvrandecic> as said, I am afraid it can create an incentive structure that will lead to pain and frustration
[20:41:08] <multichill> See it as the Commons Wikidata baby. We're going to make it crawl before it will walk
[20:41:12] <jane023> yes and yay?
[20:41:13] <jheald> multichill: *which* artworks ?
[20:41:25] <tgr> AntoineIsaac: also, there will be a long period (months) in which the schema will be still open to changes, it's just the 0.2 version that I hope to publish in 1-2 weeks so that's not a deadline on feedback
[20:41:26] <Lydia_WMDE> lol multichill
[20:41:31] <dennyvrandecic> i am not saying that it is not useful. just dangerous for the community :(
[20:41:44] <multichill> jheald: See https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_sum_of_all_paintings#All_of_Commons_artwork interesting ones. Mainly French ones from museums that didn't properly open up their data yet
[20:41:55] <Lydia_WMDE> dennyvrandecic: will add big disclaimer
[20:42:05] <marktraceur> multichill: We can't just chuck it in the deep end of the pool and see what happens?
[20:42:05] <dennyvrandecic>  :) thanks
[20:42:12] <dennyvrandecic> sorry for being a PITA
[20:42:14] <jheald> denny: we can track files & categories linked to article-items very easily & reverse links that get made. It really shouldn't be a problem.
[20:42:15] <marktraceur> That sounded less terrible in my head
[20:42:21] <Lydia_WMDE> that's why we love you dennyvrandecic ;-)
[20:42:46] <dennyvrandecic> jheald: I am not saying that it cannot be easily fixed with tools
[20:43:04] <multichill> haha, makes me think of Nevermind
[20:43:16] <dennyvrandecic> jheald: I am saying that people might easily be misled in what they can do and put a lot of effort into doing something, and others will put a lot of effort in stopping them
[20:43:30] <dennyvrandecic> and then good people will have friction with other good people
[20:43:46] <dennyvrandecic> which in general is something to be avoided in a collaborative project
[20:44:03] <AntoineIsaac> tgr: alright. We'll try not to use this info as an excuse for sending late feedback though ;-) we'd like you to feel that we're supporting the process.
[20:44:13] <jheald> So we make clear it's only articles<->articles, categories<->categories, and files to nothing. That's a lesson ppl have to learn anyway
[20:44:55] <jane023> Maybe we will get lots more galleries on Commons this way
[20:45:00] <dennyvrandecic> jheald: and enforcing that lesson is exactly the painpoint I was trying to avoid... :)
[20:45:40] <jheald> Which is why the dj's reasonator-link toy would be useful
[20:46:27] <jheald> because it deals with one at least of the 2 main incorrect use-cases
[20:46:55] <dennyvrandecic> without direct access there is basically no incentive for a files to items mapping.
[20:47:08] <dennyvrandecic> which makes enforcing that lesson much easier
[20:47:35] <tgr> dennyvrandecic: can't you just namespace-limit direct access?
[20:47:41] <tgr> in the software, I mean
[20:48:00] <multichill> I'm happy to see that a lot of the people who signed up at https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings are active here ;-)
[20:48:05] <dennyvrandecic> tgr: dunno. that would also be a solution. ^Lydia, DanielK_WMDE, aude?
[20:48:11] <multichill> Maybe more people want to help? :-D
[20:48:23] <Lydia_WMDE> dennyvrandecic: i think we already have an abuse filter for that even
[20:48:30] <Lydia_WMDE> if not it should be easy enough to create
[20:48:51] <dennyvrandecic> abusefilter for not allowing to link to file namespace?
[20:48:52] <jheald> there's 8 file <-> item sitelinks at the moment
[20:48:55] <Lydia_WMDE> yeah
[20:48:56] <aude> dennyvrandecic: we can
[20:49:03] <aude> i think we have a setting for that
[20:49:10] <aude> err tgr
[20:49:10] <Lydia_WMDE> or that
[20:49:16] <dennyvrandecic> tgr: sounds reasonable to me then
[20:49:21] <Lydia_WMDE> either is fine with me
[20:49:39] <dennyvrandecic> yeah in that case i would drop my objections
[20:49:49] <aude> we might already disallow file namespace (e.g. on wikipedias)
[20:50:05] <Lydia_WMDE> ok let's make sure that's the case then before we enable it
[20:50:14] <Lydia_WMDE> i will tesst
[20:50:17] <Lydia_WMDE> test even
[20:50:28] <jane023> I didn't even think about file namespace on Wikipedias
[20:50:41] <aude> yep, setting already there
[20:50:51] <Lydia_WMDE> alright folks. we're running way over tie but i think this was useful. so yay :)
[20:50:56] <Lydia_WMDE> aude: sweet
[20:51:01] <Lydia_WMDE> *time
[20:51:08] <jane023> OK bye
[20:51:18] <Lydia_WMDE> any remaining things?
[20:51:26] <Lydia_WMDE> now's the time to raise them :D
[20:51:38] <jheald> the q about timescale
[20:51:58] <jheald> for where an item can & can't be to be decided ?
[20:52:07] <jheald> Or is that way into the future ?
[20:52:28] <Lydia_WMDE> jheald: from my side it is pretty clear that all items should go on wikidata tbh. everything else would be extremely confusing and hell to manage
[20:52:54] <Lydia_WMDE> because the line is extremely hard to draw
[20:53:10] <Lydia_WMDE> and the benefit would be small imho
[20:53:28] <jheald> the trouble is information shared between pics, that it is hard to define a WD item for
[20:53:43] <jheald> I don't think it is that rare
[20:54:23] <jheald> & I think such data fits better with Commons to manage, socially
[20:54:54] <jheald> I know Daniel disagrees with me, but Gilles and Gergo were more open to it
[20:56:06] <jheald> there's also the problem of tying together contributors-dates-contribution-rights for contribution stages
[20:56:37] <jheald> which the item-property-qualifier hierarchy is 1 step too shallow to cope with
[20:57:51] <jheald> so it's not clear to me that the proposed structure can support the high-level API advertised
[20:58:17] <Lydia_WMDE> items: i see the issue but i think we need to solve this in another way. i have to think more about this. but i am pretty convinced that we shouldn't make things super confusing by allowing items on commons and wikidata
[20:59:03] <jheald> it makes sense to have items for Commons categories
[20:59:09] <Lydia_WMDE> as for data structure: we'll have to start playing with this and see specific use cases once that's possible. i think we're not getting further if we discuss in the abstract there.
[20:59:10] <jheald> (Commons items)
[20:59:59] <Lydia_WMDE> from my side that's a case of schema last as denny said
[21:01:01] <jheald> schema last is only possible if you ensure the flexibility to try different options
[21:01:19] <jheald> & see what works
[21:01:33] <Lydia_WMDE> yeah and that's my intention atm
[21:02:19] <Lydia_WMDE> let's see how far i get with that with everyone ;-)
[21:02:35] <jheald>  :-)
[21:03:34] <jheald> thanks for everything, and I really enjoyed A'dam
[21:03:41] <Lydia_WMDE> me too!
[21:03:49] <Lydia_WMDE> and thanks everyone for showing up :)
[21:04:07] <jheald> night all
[21:04:44] <Stryn> thanks all and night! :)