IRC office hours/Office hours 2018-10-04
[17:01:46] <Keegan> So we have structured data on commons in the works. The team has some stuff going on it can talk about - multilingual captions come out soon, we're working on depicts design
[17:01:56] <Keegan> There will be a search prototype discussion going up later today
[17:02:09] <Keegan> What else? We're looking at changing how old versions of file pages are viewed
[17:02:32] <Keegan> Is there anything that someone would like to talk about to start?
[17:02:45] <Keegan> Otherwise risler might have some things to add
[17:03:24] <apergos> I'm interested in the format in which the data will be stored, and the progress made on using multi-content revisions for it
[17:03:37] <yannf> yes, notability is the most important issue IMO
[17:04:07] <multichill> Notability of what yannf?
[17:04:14] <Keegan> Okay, a question about MCR and a point about notability. I think marktraceur can talk about MCR for a moment.
[17:05:24] <Keegan> and yes some more information about notability would be good for context, yannf
[17:06:01] <marktraceur> apergos: It'll be stored in a federated Wikibase repository on Commons, and yes, multi-content revisions are being used to store and render the file page and structured data together. The work on the MCR part is almost done, and we'll have some working versions of captions stored in Wikibase, rendered with MCR, up sometime over the next month or so.
[17:06:54] <Keegan> (information and links for structured data and community involvement: https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved )
[17:07:08] <apergos> is there a phab task I can follow and/or documentation of the format of the content as it will be stored? (i.e. JSON something-or-other, or a dict with these fields and values,...?)
[17:08:44] <rfarrand> multichill :)
[17:08:47] <marktraceur> apergos: There are many, many phab tasks. What work in particular are you hoping to follow? As for format of the data, any differences between this data and Wikidata are basically trivial (except for the MCR changes)
[17:09:11] <Keegan> Here's a link to our labs instance: https://federated-commons.wmflabs.org/wiki/Main_Page
[17:09:19] <Keegan> You'll notice it has search!
[17:09:24] <apergos> ah so I should be expecting a wikidata entity or something similar? good enough
[17:09:40] <Keegan> It's the search prototype, there will be a post about it later today
[17:09:51] <Keegan> But if you want to try it out now, feel free. We can discuss it here.
[17:10:05] <marktraceur> apergos: Basically, they'll be stored the same way, yes. It's just a Wikibase repository, federated to Wikidata, with some chrome on top.
[17:10:21] <apergos> gotcha. thanks (may have followup qs later offline)
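(Editor's sketch, not part of the log.) To illustrate apergos's question about "a dict with these fields and values": marktraceur's answer is that the data will be stored essentially as Wikidata stores it. The following is a minimal sketch of that kind of entity, assuming the publicly documented Wikidata JSON layout (labels keyed by language, claims keyed by property ID). The entity ID "M1", the label text, and the helper `claim_targets` are hypothetical and chosen only for illustration. P180 is "depicts" and Q144 is "dog" on Wikidata. The exact serialization used on Commons may differ.

```python
# Hypothetical entity in the general shape of Wikidata's JSON format.
# The ID "M1" and the label are placeholders, not real Commons data.
entity = {
    "id": "M1",
    "labels": {
        "en": {"language": "en", "value": "A puppy on a lawn"},
    },
    "claims": {
        "P180": [  # P180 is "depicts" on Wikidata
            {
                "type": "statement",
                "rank": "normal",
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P180",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        # Q144 is the Wikidata item for "dog"
                        "value": {"entity-type": "item", "id": "Q144"},
                    },
                },
            }
        ]
    },
}


def claim_targets(entity, prop):
    """Return the item IDs that `entity` points to via property `prop`.

    Hypothetical helper: it only handles item-valued statements.
    """
    targets = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement["mainsnak"]
        if snak["snaktype"] == "value":
            targets.append(snak["datavalue"]["value"]["id"])
    return targets


print(claim_targets(entity, "P180"))
```

A consumer that already parses Wikidata entities could, under this assumption, reuse the same traversal for the Commons repository.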
[17:10:33] <marktraceur> bd
[17:13:06] <yannf> i.e. issue raised recently about increase of Wikidata size with items only used in Commons
[17:14:16] <yannf> jheald1, I think you raised this issue, right?
[17:14:27] <yannf> I can't find it back
[17:14:45] <yannf> now the discussion is spread on several pages...
[17:15:29] <abittaker> is anyone participating in the wikidata-commons property creation discussions? How is the process going? https://www.wikidata.org/wiki/Wikidata:WikiProject_Commons
[17:16:29] <Keegan> Something I need to do is copy our property table over to Wikidata https://commons.wikimedia.org/wiki/Commons:Structured_data/Properties_table
[17:17:22] <Keegan> yannf: I'm not sure what this size issue is in particular
[17:19:45] <yannf> ok, I will try to make a summary
[17:20:59] <multichill> yannf: I don't think the number of items on Commons would need to increase a lot for structured data on Commons. If you don't agree, what kind of extra items would be created that have notability issues?
[17:21:34] <yannf> with SD, the number of items in Wikidata could increase severalfold, so is WD ready to deal with that?
[17:21:39] <pigsonthewing> Hi, sorry I'm late
[17:21:47] <multichill> yannf: Why would it?
[17:21:57] <multichill> What would the new items be about
[17:22:00] <multichill> ?
[17:22:09] <Keegan> hello pigsonthewing, no worries
[17:22:41] <yannf> it seems the consensus is to have one WD entry for each depicted subject on Commons
[17:22:51] <pigsonthewing> Yannf: The intention is to run a separate Wikibase installation, not add items to Wikidata
[17:23:28] <yannf> but up to now, the idea was to add these items in WD
[17:23:49] <yannf> that's where the separation is not clear
[17:24:10] <mpeel> there are now more Wikidata items than there are media files on Commons, you know? (also, hi!)
[17:24:32] <multichill> But the subjects would for example be buildings or people. If these are notable they probably already have an item on Wikidata, right?
[17:24:36] <yannf> if we store these items in a separate base, there will be many duplicate entries
[17:24:51] <mpeel> on the commons category side, there are now ~2 million sitelinked from Wikidata, so would need ~4 million more items to describe the rest of them (<10% of Wikidata’s current size)
[17:25:23] <multichill> A large percentage of categories are intersected categories so no new subjects for those
[17:25:59] <Keegan> While this point is being sorted out, are there any other topics people are interested in talking about?
[17:26:20] <multichill> mpeel: Forget categories, that's the old way of doing it
[17:26:47] <pigsonthewing> I've been travelling, so am not up to speed with plans for/work on SDoC properties; can someone please summarise?
[17:27:01] <yannf> if each person or object or building gets an item, we could increase by several hundred million items
[17:27:01] <apergos> as I understand it (perhaps wrong), exif data from uploaded photos will go into this wikibase repository too; does each photo become an entity with various properties? or...?
[17:27:16] <multichill> ( https://twitter.com/CommonsCat )
[17:27:46] <Keegan> pigsonthewing: mentioned that right before you came in. We've got the properties table published on Commons, it needs copied over to WD https://commons.wikimedia.org/wiki/Commons:Structured_data/Properties_table
[17:27:58] <multichill> Adding to apergos question: Is that something you'll implement in software or do bots have to do that?
[17:28:19] <Keegan> pigsonthewing: and it looks like the old Wikiproject Commons on WD could be revived to work on this collectively https://www.wikidata.org/wiki/Wikidata:WikiProject_Commons
[17:28:56] <Keegan> apergos: good question
[17:29:05] <pigsonthewing> Thanks; I can see some gaps; for instance "Created with support by " could use "sponsor" (P859)
[17:29:32] <Keegan> apergos: multichill: risler will try to answer that
[17:29:44] <Keegan> marktraceur has had to leave us for another meeting
[17:29:55] <pigsonthewing> also "Derived from file / Extracted from file" could use "based on" (P144)
[17:30:02] <marktraceur> If there are still questions, I'm lurking
[17:30:08] <Keegan> pigsonthewing: By all means, edit the table :)
[17:30:31] <risler> in regards to EXIF, in the community-developed properties table, there are some proposals for new Wikidata properties to fully support EXIF fields
[17:30:51] <jheald_> test
[17:31:07] <Keegan> jheald_: your test worked
[17:31:10] <jheald_> EXIF: would be better for this info to be derived straight from file
[17:31:11] <risler> regardless of how that goes, the idea is to have structured EXIF on the "M Items" as we're calling them (the items on Wikibase@Commons)
[17:31:20] <jheald_> makes no sense to have this as a property
[17:31:35] <multichill> Can someone describe today's approach to depicts? I've seen several approaches and not sure what is the current one. Only one property with a lot of qualifiers?
[17:31:45] <jheald_> notability thread: see https://commons.wikimedia.org/wiki/Commons_talk:Structured_data/GLAM/CIDOC_CRM#Distinguish_digital_object_as_digital_object -- need timescale for answers
[17:31:51] <pigsonthewing> @Keegan: done!
[17:31:51] <yannf> found it back https://commons.wikimedia.org/wiki/Commons_talk:Structured_data/GLAM/CIDOC_CRM
[17:32:08] <Keegan> multichill: depicts can be described, one moment and risler will get to that
[17:32:12] <yannf> #Distinguish digital object as digital object
[17:32:14] <Keegan> pigsonthewing: thank you!
[17:32:27] <jheald_> @yannf : sorry have been muted last 30 mins, till I registered my nick here
[17:32:38] <yannf> no pb
[17:32:46] <jheald_> @yannf : have been shouting at screen ...
[17:33:42] <risler> For depicts, we are exploring options. Nothing has been decided yet (but will be this month probably). We just had a meeting to go over the feasibility of having something like a "this is also a" property as a qualifier to a "primary" depicts statement
[17:34:26] <jheald_> on same CIDOC page also, why "date of upload" should absolutely NOT be a CommonsData property
[17:34:48] <jheald_> needs to be visible for WDQS, but like EXIF data, should NOT be a property
[17:35:13] <risler> so, in the case of a photo of a German Shepherd puppy, instead of having several "loose" tags (German Shepherd, dog, puppy), have one main primary tag (German Shepherd) and add the other tags (dog, puppy) as qualifiers to that primary depicts tag
[17:35:58] <multichill> risler: That sounds like the overcategorization discussion all over again
[17:36:22] <risler> this was the approach several Commonists suggested to avoid overcategorization :)
[17:36:45] <risler> we thought about it early on but tabled it. it's back in prominence now.
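(Editor's sketch, not part of the log.) risler's German Shepherd example can be pictured as one primary depicts value carrying the broader tags as qualifiers, rather than three unrelated tags. The sketch below is only a toy model of that idea. The class `DepictsStatement`, its field names, and the `matches` helper are all hypothetical; nothing here reflects a decided design, which the team says is still open.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DepictsStatement:
    """Toy model: one primary depicts value plus "this is also a" qualifiers."""

    primary: str
    also: List[str] = field(default_factory=list)

    def all_tags(self) -> List[str]:
        # The primary tag comes first, followed by its qualifiers.
        return [self.primary, *self.also]


def matches(statement: DepictsStatement, query: str) -> bool:
    """Return True if `query` equals the primary tag or any qualifier."""
    return query in statement.all_tags()


# The photo from the discussion: a German Shepherd puppy.
photo = DepictsStatement(primary="German Shepherd", also=["dog", "puppy"])

print(photo.all_tags())
print(matches(photo, "dog"))
```

In this model a search for "dog" can still find the photo without relying on Wikidata's hierarchy to connect "German Shepherd" to "dog", which is the inconsistency concern raised in the discussion.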
[17:36:51] <mpeel> couldn’t ‘dog’ be found through wikidata there? Or at least ‘dog breed'...
[17:37:16] <jheald_> re: multichill 18:25 intersect categories -- absolutely necessary to have readable descriptions for these in terms of Q-numbers -- Wikidata items for these look *essential*, plus would support auto-infoboxes with internationalisation
[17:37:19] <yannf> jheald_, yes, right, that's the same for any information which should not be changed independently
[17:37:55] <risler> dog could be, in this specific case. but there are many such cases, and the inconsistency of Wikidata hierarchies makes it...problematic
[17:38:13] <jheald_> risler: good! (having been one of those Commonists :-) )
[17:38:21] <Keegan> And by many we mean thousands and thousands
[17:38:45] <multichill> jheald_: We haven't done that up till now. You could have done that in the last couple of years. Now structured data is here as a better replacement
[17:39:38] <mpeel> risler: for displaying information on the file page, it’s simple enough to pull out that info (basically think of the wikidata infobox in categories, but for files)… But for search it’s, of course, a different matter.
[17:39:43] <jheald_> multichill: it's not a replacement ... two will exist side by side ... need to be able to leverage SD description to identify how pic should be categorised
[17:39:53] <pigsonthewing> Wikidata has dog as "instance of common name"; I raised this issue recently on WD's project chat, but got nowhere
[17:40:34] <jheald_> multichill: plus *essential* to have good description of categories, to be able to translate that info into CommonsData statements on the files
[17:40:48] <apergos> re-upping multichill's followup: will the exif data be extracted and go in by software (mw code I guess) or will bots be needed to do the work?
[17:40:56] <pigsonthewing> Link: https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/08#What_heart_rate_does_your_name_have?
[17:41:35] <jheald_> pigsonthewing : see https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Commons_-_how_to_structure_coverage & links there for presentation of the difficulties
[17:42:27] <risler> apergos: we could go either way on that. taking community suggestions.
[17:42:36] <jheald_> multichill : & GLAMs, uploaders etc who don't properly categorise mass-uploads will get blocked
[17:43:16] <multichill> Categories are a way to organize our files. It's an outdated quite bad system. Why would you pollute the new system with it? Being able to add statements to files is the new way to organize
[17:43:28] <apergos> if there is a place that's being discussed, I'd like to follow along (exif data by bot or by mw code)
[17:43:29] <jheald_> risler: surely EXIF, upload date, etc shouldn't be CommonsData properties at all ?
[17:43:38] <multichill> We seem to be doing just fine with Wikidata items without any categories
[17:44:13] <risler> jheald_: that's up to the community to decide :) that's why we went through the properties discussion.
[17:44:15] <jheald_> multichill: they're what we've got. And often a lot more expressive than WD currently is.
[17:44:19] <multichill> apergos: I remember talking about "virtual" properties. So for the client it looks like it's structured data, but it's actually just a mirror of the exif
[17:44:54] <jheald_> risler : much better option is a SERVICE to provide the info in WDQS
[17:44:55] <risler> multichill: yes, we were just talking about virtual statements. the implementation on the Wikidata/Wikibase side isn't quite there yet though
[17:45:21] <jheald_> risler : ... or virtual statements, those would work too
[17:45:35] <pigsonthewing> @#jheald_ Thank you; that seems to be about a different issue. The one I described needs to be fixed on Wikidata.
[17:45:52] <apergos> multichill: right, pulling from the file each time vs asking the db each time (or some cache)...
[17:46:21] <Keegan> A little under 15 minutes remain
[17:46:30] <pigsonthewing> Don't forget that EXIF often includes junk
[17:46:50] <pigsonthewing> Example: people who never reset the clock on their camera
[17:46:58] <multichill> When does the Wikibase instance go live on Commons?
[17:47:10] <multichill> Yup, so you need an override option
[17:47:31] <jheald_> pigsonthewing : *Lots* of stuff needs to be fixed on Wikidata... and that was just the start of the problems... but agree that taxons have particular issues... discussed a bit in the phab ticket in the Commons discussion link I just gave you
[17:47:54] <yannf> Keegan, so what about the issue I mentioned?
[17:48:01] <jheald_> with 13 minutes left, can we discuss the notability issue now?
[17:48:48] <pigsonthewing> I'm doing my best to raise these issues on Wikidata, but I often feel like I'm a lone voice in doing so.
[17:49:02] <Keegan> multichill: Wikibase going live on Commons is to be determined. It was scheduled for this month and hopefully will still happen at the end of the month. However, operations put a hold on new database schemas during the server switchovers this month. It's delayed things.
[17:49:13] <Keegan> So - this month or next is the plan.
[17:49:26] <jheald_> We must come up with a clear decision about where information about real objects -- eg engravings etc that have been scanned -- is going to live
[17:50:04] <Keegan> yannf: Sure, I mean - notability, what deserves a Wikidata item and what does not, those are community decisions
[17:50:26] <jheald_> If I have, eg : Digital image -> Physical copy -> Recognised state or edition of work -> Work , then where do the items for each of these belong?
[17:50:41] <yannf> yes ^
[17:50:47] <Keegan> We're not going to force all things to be on Wikidata as well by software, that's a community workflow. Whatever y'all want to do, we'll support.
[17:50:52] <risler> pigsonthewing: we're trying to do what we can from our side on that issue as well. it's going to be a big problem people will constantly run into when trying to do depicts statements
[17:50:54] <jheald_> And this *can't* just be left to the community
[17:50:56] <multichill> Looking forward to it Keegan. Other question, how do I link https://commons.wikimedia.org/wiki/File:Carl_(Charles)_Ross_-_Die_Grotte_der_Nymphe_Egeria_bei_Rom_-_11590_-_Bavarian_State_Painting_Collections.jpg to https://www.wikidata.org/wiki/Q30064126 ?
[17:51:01] <Keegan> (ten minutes remaining)
[17:51:11] <multichill> When we have the structured data? What property would I use?
[17:51:44] <jheald_> If it's not on Wikidata, the relevant items on CommonsData simply will not have the depth to be able to describe them -- cannot put a qualifier on a qualifier
[17:52:11] <jheald_> Getting clear on this is absolutely critical to the property design process
[17:52:15] <yannf> Keegan, yes, there are decisions to be made by the project
[17:52:30] <Keegan> multichill: How would you like to link it? :)
[17:52:49] <jheald_> The current muddle and void from the team is an abandonment of responsibility, and seriously threatening the project
[17:53:04] <pigsonthewing> The community on Wikidata has, sadly, demonstrated an unwillingness - or inability - to tackle these issues.
[17:53:24] <multichill> The one depicts property seems to me a bad solution. I would expect a set of related properties that are more specific and to which you can add qualifiers when needed.
[17:54:14] <jheald_> multichill: as I understand it, "depicts" would be used. But agree that "digital manifestation of" would be better
[17:54:21] <yannf> multichill, actually the details do not matter at this stage
[17:54:26] <multichill> In the painting case a property like "digital reproduction of" (or some other fancy title shamelessly copied from Cidoc CRM) would work much better.
[17:54:44] <yannf> the issue is a decision regarding the design of the project
[17:54:44] <abittaker> @jheald_ would you like to expand on that? the ontology isn't something the WMF team did or could commit to doing.
[17:55:01] <multichill> These are not details, how to deal with multiple works and the relation to each other is quite important.
[17:55:33] <jheald_> abittaker: see presentation of the options at https://commons.wikimedia.org/wiki/Commons_talk:Structured_data/GLAM/CIDOC_CRM#Distinguish_digital_object_as_digital_object
[17:55:35] <multichill> We need that for crops, assemblies, 2D reproductions, FOP cases, etc.
[17:56:40] <jheald_> abittaker: There has to be clarity and consistency on this. All the options have issues. But "CommonsData alone" won't work -- items can't be deep enough
[17:56:44] <Keegan> With only a few minutes left, I'd like to remind folks that we have another one of these scheduled for Thursday, 1 November (a month from now)
[17:57:14] <Keegan> As things ramp up we'll be holding these more regularly, so keep an eye out for the announcements and reminders
[17:57:26] <yannf> Keegan, at least we need to know when the decision will be made
[17:57:58] <multichill> Technical and curation issues seem to be mixed up into one big complicated problem......
[17:58:13] <pigsonthewing> @Keegan Same time on 1 Nov? I'll be travelling, again. Please accept my apologies in advance.
[17:58:33] <jheald_> Keegan : we have to know where the items for particular sorts of things are going to live, otherwise you just don't have the basis to be able to discuss properties
[17:58:38] <Keegan> pigsonthewing: Same time, thanks for letting me know. Probably be another one about the same time a month later in December
[17:59:43] <Keegan> I think abittaker is writing out an answer to the last couple pings for me, then we have to wrap this up for now. One moment
[18:00:18] <Keegan> (or risler is writing)
[18:00:22] <jheald_> Also: when will there be a decision on not using depicts for everything, using eg "digital manifestation of" for scans? That should be decided now.
[18:01:25] <Keegan> We've already decided that depicts will not be used for all things, that's why the properties table exists :) Depicts is the first supported property, others will follow fairly quickly (relatively speaking)
[18:01:25] <risler> multichill: for whatever technical issues you see, you can use whatever communication methodology you like to provide those to us. Technical issues are indeed within our purview. For curation issues, we really need folks to rally up their fellow community members so that whatever decision is made has consensus
[18:01:56] <jheald_> risler : the community aren't up to it -- you've seen that
[18:01:56] <Keegan> Okay, have to wrap this up for now in an official way, feel free to keep chatting if you'd like. Thank you all for coming and participating.
[18:02:11] <Keegan> Please be on the lookout for the search prototype discussion, your participation is welcome!
[18:02:17] <Keegan> #endmeeting