IRC office hours/Office hours 2018-06-26
18:00:37 <Keegan> #startmeeting Structured Data on Commons | Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: https://wm-bot.wmflabs.org/logs/%23wikimedia-office/
18:00:42 <wm-labs-meetbot> Meeting started Tue Jun 26 18:00:37 2018 UTC and is due to finish in 60 minutes. The chair is Keegan. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:42 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:42 <wm-labs-meetbot> The meeting name has been set to 'structured_data_on_commons___channel_is_logged_and_publicly_posted__do_not_remove_this_note_____logs__https___wm_bot_wmflabs_org_logs__23wikimedia_office_'
18:00:49 <spinster> We're on! Welcome all
18:00:57 <Keegan> Silly bot
18:00:57 <Keegan> Anyway
18:00:57 <Keegan> Hi everyone
18:01:12 <Sakhalinio> Hi
18:01:12 <Keegan> We're going to start off with a recap of what we've been working on
18:01:29 <Keegan> And I'll handle keeping track of questions as they come up
18:01:42 <Keegan> Are there any questions that people have for later, while risler types?
18:01:53 <risler> Hello everyone! Here's a quick summary of where we are now:
18:02:25 <risler> a.) We are coding for multilingual file captions, which will still be the very first released feature (targeting October release now)
18:03:00 <risler> b.) We're preparing Community conversations about Wikidata properties needed for Commons, as well as a chat about Structured Licenses
18:03:00 <Keegan> There is a prototype available for captions, contact me if you haven't tried it out yet and would like to
18:03:38 <Keegan> https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Feedback_requests/Properties_for_Commons
18:04:01 <risler> c.) we're concurrently developing early work on the integration of depicts "tags" into Commons, starting with Search. Look for a prototype on that within the next month or so
18:04:20 <risler> and now I'll throw it over to Sandra for an update on GLAM-related work
18:04:53 <spinster> In the last months we had a first conversation on how GLAM metadata can be mapped to Wikidata and Commons - https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Feedback_requests/GLAM_metadata_and_ontologies_mapping
18:05:17 <spinster> And we are starting to talk about the first GLAM pilot projects that will use Structured Data on Commons for the first time!
18:05:43 <spinster> We had a well-attended workshop about possible pilot projects during the Wikimedia Conference in Berlin, where approx. 50 people attended.
18:06:01 <spinster> And we have a great longlist of projects which I will follow up on in the upcoming months.
18:06:01 <pigsonthewing> I'm active in proposing properties - and reviewing property proposals - on Wikidata; HMU if anyone needs help in that area.
18:06:04 <Jheald> any links for those GLAM pilots?
18:06:13 <Sakhalinio> Is User:elmacenderesi here?
18:06:28 <spinster> This was the session: https://meta.wikimedia.org/wiki/Wikimedia_Conference_2018/Program/47
18:06:38 <spinster> There are links to the report and the spreadsheet we collected
18:06:45 <Keegan> pigsonthewing: Thank you, will absolutely take you up on that at some point (or at least point others to you :) )
18:07:00 <spinster> This is just a start - if people are interested to do pilots, please get in touch
18:07:16 <spinster> I will also document this better on Commons, as that is still lacking :-)
18:07:27 <Keegan> Sakhalinio: that user is not in here, I think
18:07:56 <Sakhalinio> I am looking at photos
18:08:46 <Keegan> So, any questions about SDC so far?
18:08:51 <Keegan> Or about the future?
18:09:15 <Glrx> My understanding is there was a 3 year proposal, and we're more than 2 years into that. What's the progress?
18:09:35 <Jheald> Not so much structured data on files, but there are two big scanned-image projects (BHL, BL 1 million) that I'm working on Wikidata to build industrial-scale numbers of categories for on Commons (~100,000 + ~60,000)
18:09:56 <Keegan> Great question. abittaker is the Program Manager, she'll answer that Glrx
18:10:15 <Jheald> still cleaning up / creating the WD items, but hope to have the cats created soon
18:10:18 <abittaker> Hullo Glrx, we're 1.5 years into the program, and working hard on a) MediaWiki infrastructure and b) features for Commons
18:11:09 <spinster> Jheald What will the categories describe?
18:11:14 <Sakhalinio> Actually i don't know this SDC project but I can help you for wikidata and commons structuring
18:11:26 <abittaker> we expect feature development to continue through the project, but we expect people will be able to add captions in October and depicts properties on Commons in January
18:11:39 <abittaker> All other properties will follow soon after that, in Feb or March
18:11:43 <Keegan> Sakhalinio: You can learn more about the project (and how you can help) here https://commons.wikimedia.org/wiki/Commons:Structured_data
18:11:43 <Jheald> it will be cat'ing the images into books, books into authors
18:11:53 <Jheald> also local map categories
18:12:01 <Jheald> also subject areas of the books
18:12:05 <spinster> Jheald I see! :-)
18:12:09 <Sakhalinio> We had a GLAM project in Turkey (Pera Museum), but now the museum administrators have removed the square codes (QR codes) because of the Wikipedia block in Turkey
18:12:27 * Steinsplitter waves
18:13:22 <Sakhalinio> https://tr.wikipedia.org/wiki/Vikipedi:%C4%B0%C5%9F_birli%C4%9Fi_projesi/2016/WMTR-Pera_(ORK-KD:ERS)
18:13:26 <Jheald> Current statistics on the BHL wd items at https://www.wikidata.org/wiki/Wikidata:WikiProject_BHL w/ progress page on data augmentation
18:13:35 <spinster> Sakhalinio I have seen images from Pera Museum on Commons!
18:13:41 <Jheald> Something we ought to talk about is the Commons roll-out of Template:Wikidata Infobox, https://commons.wikimedia.org/wiki/Template:Wikidata_Infobox
18:13:51 <Jheald> Certainly the most high-profile, arguably most significant, integration of Commons with wikidata yet attempted
18:14:02 <Jheald> Number of uses is now approaching 1.2 million, with Mike Peel's Pi bot currently adding about 4000 a day https://commons.wikimedia.org/wiki/Category:Uses_of_Wikidata_Infobox
18:14:12 <Jheald> example: https://commons.wikimedia.org/wiki/Category:St._Paul%27s_Cathedral , down the right hand side
18:14:52 <Steinsplitter> The prototype seems fine. Maybe copyright status should be updated automatically based on edits to the file description. Or do we need a bot to import stuff into structured data? The easiest way would be to parse it from the file description.
18:14:53 <Steinsplitter> :)
18:14:54 <Jheald> these have turned up a lot in the last month... would be useful to know what ppl think, how they are being received
18:15:03 <spinster> Which also means that 1.2 million categories correspond with Wikidata items, which is great
18:15:25 <Jheald> No: 1.8 million categories correspond to Wikidata items
18:15:43 <spinster> Even better :-)
18:16:24 <pigsonthewing> the infobox seems (apart from minor bikeshedding, which is to be expected) very well received.
18:16:29 <Keegan> That's a good number for certain
18:16:36 <Jheald> latest stats: https://www.wikidata.org/wiki/Wikidata:WikiProject_Commons/Links_and_sitelinks/historical#9_June_2018
18:16:51 <Jheald> plus 100,000 pre-empted by galleries
18:17:21 <pigsonthewing> It was modified recently so it displays nicely when used in a category with no known Wikidata equivalent
18:17:56 <Jheald> but important to note that this is out of a total of 6.7 million Commons cats -- so that is 5 million not currently connected
18:18:02 <Keegan> A lot of these initiatives are being pretty well received. That's always a nice experience.
18:18:55 <pigsonthewing> many current Commons categories are intersections of two (or more) Wikidata items (cats in Paris, for example)
18:19:26 <pigsonthewing> Keegan: Makes a change from the response of some on enWP!
18:19:54 <Keegan> :X
18:19:56 <Jheald> A striking thing (to me) is how sanely/usefully the template performs on intersection categories, eg: https://commons.wikimedia.org/wiki/Category:Grade_I_listed_churches_in_Bedfordshire
18:20:28 <AntiComposite> I'm wondering if that box might display better horizontally
18:20:32 <Jheald> ... but at the moment it can only be used on Commons cats that have a corresponding WD item where the data is stored
18:20:46 <Keegan> The Wikidata matching for cats is great progress, it will help the goal of finding translations
18:20:49 * Steinsplitter is wondering if people saw his msg regarding prototype :)
18:21:03 <pigsonthewing> Jheald: Because, in that case, there is a single corresponding item
18:21:03 <Jheald> BIG question I think, is how to extend from 1.8 million cats to the full 6.7 million
18:21:25 <Keegan> Steinsplitter: which prototype are you discussing?
18:21:31 <Jheald> We *need* to think what we can do to push this forward
18:21:45 <Jheald> 2 main options:
18:21:49 <Steinsplitter> Keegan: The nice prototype on your betawiki
18:22:01 <susannaanas> I would be interested to know what kind of categories are hard to return to one or two Wikidata items, any documentation?
18:22:04 <Keegan> Jheald: Does Commons want/need exact 1:1 matching?
18:22:10 <Jheald> 1/ Create namespace for categories on CommonsData
18:22:27 <Jheald> 2/ Allow items for intersection cats on Wikidata
18:22:32 <pigsonthewing> What would you have it do, on https://commons.wikimedia.org/wiki/Category:Cats_in_Paris ?
18:22:37 <Jheald> Keegan: pretty much
18:22:43 <Keegan> Steinsplitter: Ah, I see. Yes, we're still working on pulling and displaying structured licensing and copyright
18:22:48 <Micru> Jheald, what about option 3/ allowing Q-items on Commons
18:22:55 <Steinsplitter> cool, thanks.
18:22:58 <Keegan> We'll be getting into that with community consultations starting in the next couple of months
18:23:24 <Jheald> pigsonthewing: needs an item on Commonsdata or Wikidata that can host the property: Combines topics: cats, Paris
18:23:47 <Keegan> In theory you can search for cats+Paris and get the same result
18:23:56 <Keegan> AIUI, risler
18:24:19 <pigsonthewing> Jheald: I don't see that going on Wikidata; maybe on Commonsdata?
18:24:31 <Jheald> Keegan: that doesn't give you an infobox, doesn't give the same link-clicky navigability
18:24:48 <Jheald> Keegan: doesn't help us document the categories in structured terms
18:25:03 <Jheald> it's about much more than just search
18:25:28 <pigsonthewing> But we need to be more precise than "Combines topics: cats, Paris". It's "cats *IN* Paris", not "cats from Paris" nor "cats called Paris"
18:25:37 <pigsonthewing> s/called/named after/
18:25:52 <Jheald> because documenting the categories in structured terms gives us ready made topics for the images
18:26:07 <nikki> some people have used qualifiers on the statements for that
18:26:25 <Jheald> pigsonthewing: see the usefulness of the Bedfordshire example above, which doesn't do that, but is still valuable
18:26:36 <Jheald> nature of the relationship can be added as a qualifier
18:26:49 <pigsonthewing> But if we document the images properly (depicts:cat; location:Paris) do we need (structured data about) the category?
18:27:23 <Glrx> If I add a picture of Notre Dame and link it to the Wikidata cathedral, will it pick up Paris?
18:27:43 <Micru> pigsonthewing, ideally there should be an easy way to tag images based on the category they are in
18:27:43 <Jheald> pigsonthewing: we have to get there. It's all very well to wish for unicorns, but the cats is a much more realistic prospect to work on with bots
18:27:57 <Jheald> then cascade the topics as suggestions down to the images
18:28:23 <Keegan> Micru: Could you expand on what you mean by "tag"?
18:28:39 <marktraceur> Is "topic" a useful thing to search for? A topic could be any aspect of the image, including where it was taken, what's in it, who took it, what time of day it was...
18:28:47 <pigsonthewing> Same issue with the Bedfordshire example though this may help: https://www.wikidata.org/w/index.php?title=Q24974914&diff=703044119&oldid=701200898
18:28:52 <risler> Glrx: we're working on that functionality right now. "statement traversal". will have more info on this in the coming months.
18:28:56 <Jheald> he means: add a *depicts* : *topic* stmt
18:29:19 <pigsonthewing> I don't think I'm "wishing for unicorns".
18:29:40 <Micru> Jheald, Keegan, recently there was a new tool on Wikidata that would show inferred statements on an item based on other items. I believe something similar could be offered in Commons. A way to show statements from Categories without actually having them on the file page itself
18:29:58 <Jheald> but Andy even without that, the infobox is still giving an internationalised localised description that's useful to users
18:30:51 <Keegan> (procedural note: 30 mins left, we're halfway done)
18:30:53 <Jheald> Andy: qualifier P273 Bedfordshire would be better
18:30:54 <pigsonthewing> Useful to users, yes - but the aim of SD is to be useful (meaningful) to computers
18:31:31 <Jheald> usefulness to users is a v important driver
18:31:53 <pigsonthewing> P273 == not found
18:31:59 <Keegan> Micru: I haven't seen this tool, do you have a link for the chat?
18:32:11 <Jheald> getting any structured handle / record on what the remaining 5 million cats are is a step forward
18:32:15 <Micru> Keegan, I will have to dig it
18:32:25 * Keegan nods
18:32:27 <Keegan> Thanks
18:32:54 <Jheald> andy: "located in administrative territory" -- haven't been working on places for a bit, may have mis-remembered the number
18:33:10 <pigsonthewing> Keegan: https://www.wikidata.org/wiki/User:Pasleim/derivedstatements.js
18:33:31 <Keegan> Much obliged for the link, one moment
18:33:33 <Jheald> I was 100% against creating items for these intersection cats on Wikidata
18:33:51 <spinster> Ah, yes, I have installed that tool on my volunteer account
18:33:55 <Jheald> I still think a structured space on CommonsData for categories would be better
18:34:40 <Jheald> But having seen how usefully the Template:Wikidata infobox template performs on intersection cats
18:35:05 <Jheald> I think items for them on WD is something that is now a necessary structural need
18:35:10 <Micru> Keegan, it was announced here: https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/06#Announcing_derivedstatements.js
18:35:30 <Glrx> Jheald: Why against it? Should the intersect category just fall out of two larger categories?
18:35:34 <nikki> I would suggest making a proposal to change the notability guidelines on wikidata then
18:35:53 <nikki> I know there was a lot of opposition in the past, but that was before there was any obvious use for them
18:36:04 <Jheald> & the more headway we can get documenting the remaining 5 million categories before 0-day for SD, the better
18:36:34 <Jheald> RfCs on WD tend to just die without ever any conclusion
18:37:01 <Jheald> It needs a group agreement that this is something we should do, and then to push it through
18:37:19 <Jheald> other issues with the infoboxes
18:37:49 <Jheald> 1) their use being blocked because the sitelink is taken by a gallery... is a pain
18:38:04 <Keegan> (responses are being written)
18:38:37 <Jheald> Need to review, and get a quick decision as to whether we're going to rethink this
18:39:02 <abittaker> micru, that is a super interesting tool, thank you for sharing! we will definitely share it with our designer and see if similar functionality should be part of the file page on Commons
18:39:02 <Glrx> How does a Russian user specify his picture is of a cat in Paris?
18:39:36 <Jheald> eg A) prefer WD item -> Commons cat sitelinks, leave galleries to twist in the wind (suggested by Ghouston, but not taken forward... yet?)
18:40:50 <Jheald> or B) confirm stick with present status: make new WD item for category, connect to main WD item via main-topic / main-cat pair of properties, and roll these out wholesale
18:41:44 <Jheald> it's crazy that we don't have infoboxes for the most high-profile things, because these are the very things most likely to have galleries
18:41:45 <risler> Glrx: that Russian user will add depicts statements (either in the UploadWizard, File Page, or some other method). Since both "cat" and "Paris" have labels in Russian, the "tags" will display in his language. We'll try to be as multilingual as the metadata in Wikidata allows us to be.
18:42:04 <Jheald> needs rapid decision one way or the other, then action
18:42:51 <Micru> Is there any way to link an image annotation to a depict statement?
18:43:05 <Jheald> Glrx: Wikidata is pretty much 100% language agnostic, and Russian in particular is very well developed. CommonsData / Commons presentation can inherit all of that
18:43:21 <Jheald> Issue 2) re infoboxes
18:43:37 <Jheald> (2) Real documentation shortfall
18:44:25 <Jheald> urgently need much better docs, directed at first-time Commons users trying to add/fix/improve an infobox
18:44:35 <risler> Glrx: that Russian user could also add a statement using other properties ("location" or "taken in" or some other properties the community decides upon). So the picture could "depict" a cat, but also have a different statement explicitly saying the location where the picture was taken.
18:44:37 <Jheald> eg: A) why does cat not have an infobox ?
18:44:50 <Jheald> B) why is the map wrong ? how do I fix it ?
18:45:03 <risler> Micru: There is not yet a link between image annotations and depicts statements, but there will be next year :)
18:45:10 <Keegan> 15 minutes remaining
18:45:27 <Jheald> C) why is the blue link going to the wrong sort of <X> (ie how to fix homophone issue on Wikidata)
18:45:46 <Micru> risler, that is great! how will it be done? with qualifiers containing the annotation ID?
18:45:56 <Glrx> There are many Template:Other versions/image123 templates to group derived-from images. Is there a SD property to compute such a group?
18:46:15 <Jheald> D) why is there black text for this term not a blue link (ie how to link wd item to commons if it exists, or create one if it doesn't)
18:47:15 <spinster> Glrx: It sounds very sensible to have such a property - it's up to the community to decide if that is wanted/needed and to create it and I would be all for it.
18:47:25 <Jheald> With luck, Commons users will really start taking to the blue-link to blue-link to blue-link navigation in infoboxes, will really want to start improving them
18:48:11 <Jheald> A huge boost for SD if they do, because this is exactly how SD info is going to be represented on files -- the very same vocabulary
18:48:42 <Keegan> risler is writing a reply to Micru. While he does that, I'd like to take a moment to promote the new page https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Feedback_requests/Properties_for_Commons
18:48:51 <risler> Micru: the exact implementation details are TBD. It will also be tied to statements related to IIIF spec, so that clients can crop/zoom to the specific thing/region of interest in the image. So we have a few use cases we need to account for. We'll have it all sorted out next year though :)
18:48:55 <Keegan> It's an exercise in figuring out what properties Commons will need
18:49:15 <Micru> risler, thanks for your answer!
18:49:29 <Keegan> You can use a file provided (or your own) and work through all the statements that might be possible for an image
18:49:35 <Jheald> So *Recommend* the SD project, particularly the community-interface co-ordinators, take on the infoboxes and their popularisation, and add this as a key task for right now
18:49:42 <Keegan> Or just list properties if you don't want to do the exercise, that works as well :)
18:50:26 <Micru> Keegan, I thought Commons would use Wikidata properties... if there is the need to create additional properties, cannot they be created on Wikidata?
18:50:37 <Keegan> Micru: Yes and yes
18:50:51 <Keegan> The question is, what properties will the software need to support initially?
18:51:00 <pigsonthewing> Great to see IIIF being invoked. For those not familiar: https://en.wikipedia.org/wiki/International_Image_Interoperability_Framework
18:51:22 <Jheald> Micru, Keegan: Presumably it will be like the Lexicographic properties -- use what WD already has, see what extra Commons needs
18:51:30 <spinster> And https://commons.wikimedia.org/wiki/Commons:International_Image_Interoperability_Framework
18:51:34 <Keegan> Jheald: Yes
18:51:57 <Micru> I agree, we cannot predict what is needed until we need it :)
18:52:32 <Jheald> Per IIIF: a gadget or API to export the contents of a category as an IIIF manifest would be ++valuable for building bridges with the IIIF community
18:52:32 <Keegan> Right so the idea is that we have a list of what we think we might need, at least. So we're not surprised when we need it, and we can get it created fairly easily, quickly, and painlessly
18:53:22 <Jheald> Some of the non-Wikimedia heavy users of MediaWiki v interested in IIIF import/export (the National Gallery in the UK for one)
18:53:45 <spinster> Yes, we are quite frequently contacted/asked about it
18:55:08 <Keegan> Okay, about five minutes left. Any remaining questions or comments for now?
18:55:31 <Jheald> Also on IIIF: we have a WD property to specify part of an image in a way the existing IIIF hack for Commons can display, but there is no URL-formatter currently on WD that can make the link for the property, so even when the data has been added, there is no blue-link currently from the WD page
18:55:44 <Keegan> This month you'll find the properties exercise that I linked to, it's open for participation for at least all of July
18:55:58 <Jheald> Is there any possibility of moving forward on that?
18:56:03 <Keegan> And coming later on this summer we'll have the Depicts prototype out for testing
18:56:30 <Glrx> How will Fae's copyright question be handled: e.g., file with PD image of Mona Lisa but copyrighted description of the image?
18:56:48 <pigsonthewing> Soone could write a tool (on Toolserver) to make the IIF URL using the data from Wikidata
18:57:00 <Micru> Keegan, I think there were some mockups about the depict prototype, can you please pass me the link?
18:57:03 <Jheald> It would be really really good, eg for motivating people to use Shonagon's tool for adding detail coords to WD
18:57:15 <Keegan> Glrx: good question. We don't know yet.
18:57:34 <pigsonthewing> s/soone/someone [I wish I could type!]
18:57:44 <Keegan> There will be a Structured Licensing and Copyright consultation towards the end of the summer, and the issue will be discussed then
18:57:51 <Jheald> Andy: Shonagon already has. Integrated part of Crotos. But a URL-formatter would be a huge boost to visibility & incentive
18:58:01 <Keegan> It's largely up for the community to decide, we'll have to bring the conversation together
18:58:08 <abittaker> Jheald, URL-formatters on Wikidata is something the Wikidata team might be able to handle
18:58:38 <Jheald> Keegan, Sandra: Any capacity at your end to help write HOWTO docs re fixing/adding WD infoboxes?
18:58:52 <Jheald> This is something that you could help with
18:59:01 <Keegan> About a minute remaining, so thank you all for coming out and participating
18:59:17 <pigsonthewing> Jheald: I don't think what's there is what I mean. If I missed something: URL, please?
18:59:30 <Jheald> abittaker: Needs the push, that this is valuable/needed
18:59:31 <pigsonthewing> Thank you, all.
18:59:42 <Micru> Thanks everybody
18:59:48 <spinster> Jheald: Not on that specifically, but I am writing documentation (on very popular request) on how to do current Commons uploads to make them SD-compatible
18:59:54 <Jheald> wd team have endless tickets to fix, it just gets lost in the mass, w/o a champion
19:00:04 <spinster> And that includes guidelines on linking categories with Wikidata items
19:00:11 <pigsonthewing> abitt. Or a user script?
19:00:15 <Keegan> Jheald: It's unlikely I'll have time to work on howto docs for the infobox.
19:00:21 <spinster> I recommend to keep categories simple for new uploads
19:00:50 <Keegan> Okay, thanks for coming out (again). I'm going to end the formal meeting, but conversations are free to continue
19:00:50 <Keegan> #endmeeting