IRC office hours/Office hours 2017-11-21
Structured Data on Wikimedia Commons office hour
21 November 2017
18:00 - 19:15 UTC
18:00:36 <spinster> #startmeeting Structured Data on Commons office hour
18:00:36 <wm-labs-meetbot`> Meeting started Tue Nov 21 18:00:36 2017 UTC and is due to finish in 60 minutes. The chair is spinster. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:36 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:36 <wm-labs-meetbot`> The meeting name has been set to 'structured_data_on_commons_office_hour'
18:00:45 <spinster> Welcome everyone!
18:00:47 <Zppix> Hi
18:00:52 <geniice> hmm
18:00:57 <Lucas_WMDE> hi!
18:00:57 <Steinsplitter> :-)
18:00:58 <ebernhardson> hi
18:01:04 <BrillLyle> Hi
18:01:19 <spinster> I'm Sandra Fauconnier, community liaison for Structured Commons.
18:01:19 <abittaker> hullo hullo!
18:01:19 <zhuyifei1999_> hi
18:01:19 <spinster> In the meeting we also have Ramsey Isler, and Amanda Bittaker.
18:01:28 <risler_> hello everyone
18:01:40 <spinster> The meeting will last 1 hour - we will talk about past and current work for 30-40 minutes, leaving ample time for your questions.
18:01:55 <spinster> If you have specific questions while we explain things, feel free to just ask them!
18:02:00 <yannf> I announced the meeting on #wikimedia-commons
18:02:05 <spinster> Thank you :-)
18:02:11 <Zppix> I announced it in #wikipedia-en
18:02:29 <spinster> OK. We'll start by explaining the work of the past months!
18:02:57 <abittaker> I want to start by saying we have a team!
18:03:07 <spinster> :D
18:03:08 <pigsonthewing> Can someone announce this in the Wikidata channel, please?
18:03:10 <debt> yay for a team! :)
18:03:22 <Lucas_WMDE> pigsonthewing: I just did
18:03:43 <pigsonthewing> Lucas: Thank you.
18:03:43 <abittaker> Sandra is our CL, I'm the program manager, Ramsey is the product manager for multimedia, and Pam Drouin will be our UI/UX designer
18:03:55 <BrillLyle> CL?
18:04:03 <Zppix> BrillLyle: community liaison
18:04:07 <BrillLyle> thx
18:04:25 <abittaker> You can see everyone working on the program here: #link https://commons.wikimedia.org/wiki/Commons:Structured_data/Development/Team
18:04:40 <Zppix> CL = Community Liaison, UI = User Interface
18:05:02 <Zppix> FYI abittaker, # commands have to be on a new line to take effect, AFAIK
18:05:09 <spinster> And this new team has worked, among other things, on a large roadmap for the project - it includes the work that will be done through the end of 2019
18:05:10 <abittaker> Thanks, all, sorry for the acronyms.
18:05:19 <spinster> We still have to get used to this format ;-)
18:05:30 <spinster> The roadmap in its current version can be found here:
18:05:32 <spinster> #link https://commons.wikimedia.org/wiki/File:Roadmap_for_Structured_Commons_development_-_version_2017-10-31.pdf
18:05:52 <spinster> I'll explain the big bits of work that are in there
18:06:29 <spinster> The first six months are pretty realistic - but afterwards it becomes harder to plan. So we expect to update this roadmap every now and then.
18:07:02 <spinster> We will also post milestones: specific moments when there will be changes on Wikimedia Commons.
18:07:11 <spinster> The rough plan is as follows:
18:07:36 <spinster> The first 'structured data thing' that you will see on Commons will be around Summer 2018
18:07:50 <spinster> A small feature that we call 'file captions'
18:08:26 <spinster> A short, translatable line of text that explains what is in an image. We'll explain it further later.
18:08:52 <spinster> Later in 2018 we will publish more structured data / MediaInfo features.
18:09:13 <Steinsplitter> Regarding "translatable line of text": may it be used as a file description, for example?
18:09:14 <spinster> Then tools that indicate what a media file 'depicts'
18:09:21 <spinster> Then structured licenses,
18:09:36 <spinster> Then functionalities for re-using and embedding media files.
18:09:46 <spinster> That's the rough order :-)
18:10:29 <spinster> At the same time, we support volunteer developers who build tools on Wikimedia Commons and related to structured data
18:10:42 <yannf> I thought the first feature would be categories?
18:10:57 <BrillLyle> So these are the things that end users will see as they get implemented? Is this the order of how they are being addressed from the data perspective?
18:11:13 <spinster> No, not categories. We can explain categories later.
18:11:27 <spinster> I'll explain in more depth a bit later.
18:11:54 <zhuyifei1999_> just wondering, will there be multilingual categories?
18:12:00 <Steinsplitter> > At the same time, we support volunteer developers who build tools on Wikimedia Commons and related to structured data <-- This is *really really* important, we have only a few tool developers on commons. :)
18:12:25 <jheald> Steinsplitter: my understanding is it /could/ be used as a file description, but might be rather a poor one, as would contain no links or mark-up
18:12:36 <pigsonthewing> As I keep saying; we need to clone Magnus ;-)
18:12:40 <abittaker> Steinsplitter, the file caption will be very short, more like a translatable file title than a full-length description
18:12:41 <geniice> Steinsplitter yes I imagine the poachers will be interested in that.
18:12:52 <Steinsplitter> pigsonthewing: haha, please! :)
18:12:57 <Zppix> spinster: I've seen complaints from Commons users that structured data is taking so long to roll out; is there a reason behind it that you could provide?
18:13:04 <Isarra> Why captions?
18:13:32 <BrillLyle> I see Commons more as a repository -- I am wondering if the functionality and options from the other projects are also being focused on - like integrating the structured data for dynamic Listeria tables and data visualization, i.e., Reasonator
18:13:53 <geniice> Isarra adding yet another form of metadata allows more opportunities for vandalism
18:13:55 <spinster> We'll answer questions about categories at the end if everyone is OK with that. We're making a list of them!
18:14:09 <jheald> BrillLyle: that will come with SPARQL integration
18:14:22 <BrillLyle> Okay. Thanks @jheald
18:14:39 <risler_> Hello folks! I'll chime in with some info about MediaInfo now
18:14:56 <BrillLyle> This is definitely jumping the gun but what about Creator templates that pull from Wikidata. Will Structured Data replace those? Facilitate those?
18:15:12 <risler_> MediaInfo is a key Wikibase extension that will help make all this work on Commons
18:15:41 <risler_> The Wikidata team has worked on this before. You can see a test here #link http://federated-commons.wmflabs.org/wiki/MediaInfo:M13
18:15:58 <spinster> BrillLyle: structured data CAN definitely replace Creator templates that are currently linked to Wikidata. It is up to the community to model and decide that, but we definitely provide the technology for it
18:16:03 <jheald> BrillLyle: MediaInfo is primarily for info related directly to file pages. Creator-type info will typically live on Wikidata.
18:16:23 <Zppix> Could someone answer my question above about overall rollout?
18:16:33 <risler_> The Multimedia team is now helping to finish some leftover tasks on MediaInfo, and getting experience with Wikibase
18:16:51 <spinster> Zppix: Because it's a complex project that we don't rush into.
18:16:52 <jheald> Zppix: have you seen the length & density of the roadmap - not trivial!
18:17:21 <risler_> An easy way to think of the end result of MediaInfo is that it will allow us to have M items for each piece of media on Commons, kind of similar to how Wikidata has Q items
18:17:26 <BrillLyle> When I click on http://federated-commons.wmflabs.org/wiki/File:LighthouseinDublin.jpg the M13 data does not display. Would it eventually display? And would it be linked to the original structured data?
18:17:38 <spinster> BrillLyle: yes!
18:17:54 <Steinsplitter> Does stuff have to be merged by bots into structured Commons, or will this be done by a server-side parser?
18:18:09 <BrillLyle> Would these be editable by the end user like Wikidata facts are on Wikipedia? With the brush?
18:18:31 <risler_> we are keeping track of your questions and will answer each :)
18:18:57 <spinster> BrillLyle: We're only getting started on the design. My current understanding is that it will be editable like it is now on Wikidata itself.
18:19:35 <spinster> Design proposals, when they are more advanced, will be published for review by the community.
18:19:42 <BrillLyle> I just want to make sure that current Wikimedia Commons end users who might not be comfortable with Wikidata have both a heads up on the interface change and have ease of updating metadata
18:19:51 <risler_> Continuing with the MediaInfo topic: the Wikibase instance on Commons will be able to link directly to items in Wikidata due to a thing we call federation
18:20:06 <risler_> Federation is the functionality that makes it possible to use items and properties from Wikidata in the Wikibase installation on Commons.
18:20:37 <risler_> The Wikidata team worked on that in the first half of this year and we also have a test version of that ready now at the same URL mentioned earlier: #link http://federated-commons.wmflabs.org/wiki/MediaInfo:M13
18:20:41 <BrillLyle> When uploading new images will end users on Commons have a menu of federated metadata to select from?
18:20:42 <spinster> BrillLyle: We're very aware that many people are not familiar with the Wikidata interface. We have just hired a good designer, Pam, who will take this into account.
18:20:56 <BrillLyle> thanks @spinster
18:21:28 <risler_> we have an image that we hope helps to illustrate how federation will work
18:21:39 <risler_> #link https://commons.wikimedia.org/wiki/File:Structured_Data_on_Commons_-_which_information_goes_where_-_version_2017-10-31.png
18:22:02 <spinster> BrillLyle: That, too, is something we still need to design for, but I can imagine that we would like such type of functionality, yes.
18:22:36 <risler_> here's a quick reference key for the image above
18:22:41 <Steinsplitter> (q) Does stuff have to be merged by bots into structured Commons, or will this be done by a server-side parser? Commons has millions of file description pages.
18:23:17 <BrillLyle> "non-notable contributor" seems like a problematic descriptor
18:23:34 <risler_> Blue zone: what lives on Commons
18:23:42 <risler_> Left column of blue zone = 'old' Wikitext. We are not taking anything away ourselves!
18:23:43 <jheald> Steinsplitter: my understanding is most will have to be done /by hand/
18:23:48 <geniice> risler_ you appear to be creating a great tool for finding rare wildlife to poach and to stalk Lydia Pintscher
18:23:50 <jheald> crowdsourcing
18:24:00 <risler_> Right column of blue zone = what will live on Commons via federated Wikibase
18:24:04 <spinster> Steinsplitter: And bots, and Magnus tools
18:24:10 <BrillLyle> Why isn't the File Name, Resolution, EXIF data in Wikidata as well?
18:24:12 <risler_> White block on the right = Wikidata. We will pull items and properties from there.
18:24:22 <Steinsplitter> jheald: this will be impossible, 40 million pages with only a handful of active users on commons ;)
18:24:34 <jheald> v hard to machine-interpret the info in existing descriptions etc. Game-type guess/confirmation may be as good as it gets
18:25:08 <jheald> St: /might/ be possible to systematically extract from categories -- more realistic than descriptions
18:25:44 <BrillLyle> probably a dumb question but is the problem that the metadata that exists on commons is either dirty, too complicated, or non-existent?
18:25:46 <Zppix> jheald: I don't find it appropriate to put so much of a workload on a community when it should be somewhat possible to at least semi-automate the process Steinsplitter is talking about
18:25:48 <jheald> St: but even that has huge potential for error, if no human verification
18:25:52 <risler_> BrillLyle - this isn't a final final mapping, but we do need to consider that some Commons info just doesn't fit with the Wikidata model
18:25:55 <abittaker> jheald, st: templates could be a good source for data to convert as well
18:25:59 <spinster> Steinsplitter: Also, templates that currently already link to Wikidata, e.g. via Jarekt's work, will be easier to convert in any case
18:26:00 <BrillLyle> Is structured data a way to solve and/or address these problems?
18:26:02 <Steinsplitter> BrillLyle: metadata (EXIF) is already stored in the database.
18:26:22 <risler_> some additional things to consider:
18:26:29 <BrillLyle> EXIF metadata seems like it would be a great fit for Wikidata
18:26:32 <jheald> st: problem is existing template coverage is minimal, as an overall %age
18:26:43 <BrillLyle> Ah. okay. Thx @Steinsplitter
18:26:55 <risler_> Commons will also keep hosting data itself. For instance captions and longer text descriptions will still live there.
18:27:07 <Steinsplitter> (in the image table, also replicated on Labs but serialized)
18:27:18 <risler_> Usernames will also not be transferred to Wikidata. We think about using smart URIs for this (perhaps a topic for later).
18:27:33 <BrillLyle> Can Wikidata handle this load?
18:27:52 <jheald> brill: a very good question
18:27:59 <pigsonthewing> We don't want a Wikidata item for every file on Commons; that's what this new database is for.
18:28:15 <BrillLyle> Ah, so this is a separate database -- that looks very similar to Wikidata?
18:28:32 <BrillLyle> would it look like the federated commons site?
18:28:38 <BrillLyle> I think that's very confusing to end users
18:28:43 <BrillLyle> Yikes
18:28:48 <Isarra> Isn't it just another wikibase instance?
18:28:58 <Zppix> Oh lord not another...
18:29:00 <Lydia_WMDE> Wikidata is growing and showing some growing pains but we are constantly working on improving that so as Structured Data on Commons becomes a reality we will handle the demand I believe :)
18:29:04 <jheald> Brill: think of it as CommonsData, Commons's very own wikibase
18:29:20 <abittaker> the federated commons site doesn't have integrated structured data and file pages yet, because multicontent revisions isn't ready yet
18:29:28 <BrillLyle> What will the editor see when they update the metadata?
18:30:16 <jheald> brill: maybe a section similar to Wikidata, initially. Likely to become slicker & flashier with time.
18:30:35 <pigsonthewing> spinster: I think we need a short, clear video explaining the concepts and showing a mockup
18:30:40 <risler_> BrillLyle we are doing early work on designs now, and the community will start to see some wireframes next quarter
18:30:48 <BrillLyle> I don't want slicker and flashier. I want functional. And less hopping from project to project for editors.... :-)
18:31:05 <spinster> pigsonthewing: Yes, I had thought about that already
18:31:17 <Isarra> Yeah, keep it simple, please.
18:31:26 <Isarra> Wikidata's is already a bit all over the place.
18:31:44 <Lydia_WMDE> So in terms of what people will see: we are working on integrating the structured data into the file page
18:31:53 <BrillLyle> I am not convinced videos are the answer. These graphics are fine. It's more about communicating the information clearly to me. Which seems to be happening
18:32:00 <Lydia_WMDE> among the benefits that will bring us is that edits will show up in the version history of the file page
18:32:19 <Lydia_WMDE> the underlying technology for that is called multi content revisions
18:32:49 <BrillLyle> So that versioning option isn't happening for the current Creator templates pulling from Wikidata, correct?
18:32:51 <abittaker> Multicontent revisions! So good
18:32:56 <BrillLyle> The versioning would stay on Commons?
18:33:06 <Isarra> Is that like multiple content models in one revision?
18:33:13 <Lydia_WMDE> Isarra: exactly
18:33:15 <jheald> Re the roadmap: it would be good to have a much better idea of the possible search/refine-search functionality earlier. Also, at least some back-of-envelope estimates as to likely load that would imply, & what is realistic. Current SPARQL searches would (IMO) be too slow for live end-users refining concepts. Has any early-stage analysis been done?
18:33:15 <Lydia_WMDE> BrillLyle: yes
18:33:18 <Isarra> Cooool.
18:33:20 <abittaker> It won't be visible to the end user, but it's a restructuring of the databases so that structured data and file info can exist on the same page
18:33:38 <BrillLyle> I want the information to be versioned but at the same time I want the information to be dynamic and automatically updating
18:33:55 <Isarra> To the end user it should just look like it's all the same page. Editing it... edits it, shows up on recentchanges, histories, all that. None of the madness you see with Flow and the like?
18:34:13 <Lydia_WMDE> Isarra: that is the idea indeed :)
18:34:22 <Isarra> Can you guys rewrite flow next?
18:34:23 * Isarra flees.
18:34:23 <abittaker> The wikidata team has been working on multicontent revisions since mid 2016 and it should be ready to deploy by mid-2018
18:34:35 <abittaker> A presentation by Daniel Kinzler about it:
18:34:49 <abittaker> #link https://commons.wikimedia.org/w/index.php?title=File%3AUnconference_-_Next_step_towards_multi-content_revisions.webm
18:35:02 <Lydia_WMDE> Isarra: here is more info if you are interested from the tech side: https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions
18:35:04 <Isarra> Does this affect protection, deletion, etc?
18:35:10 <spinster> Now, a few more updates on current work - less technical
18:35:14 <Isarra> Or does that look the same too?
18:35:32 <Lydia_WMDE> the idea is to have it really behave as one
18:35:48 <spinster> Jonathan Morgan, our design researcher, has interviewed staff members from cultural institutions (GLAMs) and done a survey with them to research their needs.
18:35:49 <ebernhardson> Isarra: the community will just hate the new thing too, because it's not the old things
18:35:49 <jheald> roadmap (ctd). Would like to see at least a cartoon of search-refinement much earlier than mid next-year. Also, rough estimates as to what it would require.
18:35:52 * ebernhardson flees
18:35:54 <BrillLyle> What will happen to current Creator templates?
18:36:09 <spinster> At the end of this calendar year we'll publish the results of that research.
18:36:26 <Isarra> ebernhardson: They'll get over it. Probably.
18:36:42 <spinster> Also, at this moment I'm making an inventory of all the important community tools that might benefit from being updated to work with structured data.
18:36:47 <jheald> (roadmap) Search functionality currently v bunched-up on the roadmap. I'd like to see some pre-planning, and believe it needs to be timetabled in.
18:36:58 <BrillLyle> I think if the end user tools consider their needs it will be fine. But that doesn't seem to be the focus right now. I get it but it's really what my priority is
18:37:19 <spinster> Please let us know if there are specific tools that are very important to you. We just published a page with a table for them
18:37:22 <pigsonthewing> BrillLyle: that question was answered earlier; at 18:16
18:37:25 <Steinsplitter> ebernhardson: no, not if the community is well informed. and remember: you are part of the community ;) at least i hope so
18:37:30 <spinster> #link https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Tools
18:37:35 <BrillLyle> Do you have a link?
18:37:42 <spinster> And a large spreadsheet where you can help prioritizing.
18:37:52 <spinster> #link https://docs.google.com/a/wikimedia.org/spreadsheets/d/1GVR0jghBWuAGqJaT7KVXigMYWWNzdnrnwI9nWqfJrCo/edit?usp=drive_web
18:38:20 <spinster> Also, I think that many of you have seen that we have a community focus group. If you haven't joined it yet, please do so.
18:38:24 <pigsonthewing> spinster: could that be moved on-wiki?
18:38:36 <jheald> Search deliverability to end-users will be make-or-break for the project. Desire for generality -- eg "Paintings by Dutch painters born 1740s" is much harder than most museum tag-based systems. Want to know whether this is realistically achievable
18:38:40 <spinster> That's the group of people who will be consulted very frequently
18:38:50 <spinster> #link https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Community_focus_group
18:39:07 <spinster> And a focus group of active GLAM people is being formed as well. I'm working on that at the moment.
18:39:20 <Steinsplitter> ebernhardson: if you build stuff together with the community, the community will love it. i am sure.
18:39:38 <Steinsplitter> the mp3 stuff, etc. was built together with the community. everybody is happy :)
18:39:55 <pigsonthewing> We also need to keep in mind that not all files are images: we have video, audio and, soon, 3D
18:40:03 <Steinsplitter> but spinster is doing a good job as community liaison, thus i am not concerned :)
18:40:09 <spinster> In the next months, Jonathan Morgan, our design researcher, is going to interview active Commons contributors as well, to make sure we include needs and pain points from you all too. You will hear more about that soon I think.
18:40:13 <BrillLyle> I know Pigs but the issue is that I am in the process of preparing to upload a batch of donated images and we had thought to use the Creator template. While this project is developing, it is still unclear to me how the Creator template will be utilized and how structured data we input should be effectively organized
18:40:20 <spinster> #link https://phabricator.wikimedia.org/T175185
18:40:35 <jheald> st: but need the hooks to build those tools on -- no categories on CommonsData is (IMO) a big issue
18:40:55 <pigsonthewing> BrillLyle: use templates and categories, as now.
18:41:16 <BrillLyle> And just know that not everyone who works with metadata is officially part of GLAM. I suspect that a lot aren't officially connected
18:41:32 <spinster> Other next steps: multilingual captions, as we mentioned earlier
18:41:47 <jheald> Brill: expect the Creator template to be around for at least ~4 years. And if you use it, it will make it *far* easier to convert the info to Structured Data
18:41:52 <BrillLyle> Pigs, that seems shortsighted in light of how much everything will change going forward. And I'm asking the structured data team for feedback on this
18:41:58 <Steinsplitter> jheald: as far as i know, categories will be kept?
18:42:22 <pigsonthewing> BrillLyle: OK, I won't answer any more of your Qs.
18:42:33 <abittaker> Alright, everyone, let's talk about categories
18:42:35 <BrillLyle> @jheald okay. that's helpful
18:43:10 <jheald> st: cats will be kept, but current plan is won't have CommonsData items -> won't be able to store structured data about them -> makes them much harder to machine-process collaboratively and at scale
18:43:10 <abittaker> So! There will be structured data fields that can be used to organize everything that categories currently organize
18:43:26 <jheald> abittaker: dubious
18:43:30 <BrillLyle> so the language neutral properties of Wikidata, will that be integrated in the structured data? That seems to be the biggest strength of Wikidata, in my opinion
18:43:40 <pigsonthewing> abittaker +1!
18:43:51 <jheald> abittaker: that's the kind of comment that ppl who don't use categories on Commons tend to make
18:44:08 <abittaker> But, because of the nature of categories, they are difficult to translate 1:1 to structured data
18:44:11 <BrillLyle> i'm sorry but categories are a nightmare. they aren't three dimensional and they are bulky and unhelpful
18:44:14 <pigsonthewing> jheald: I use categories on Commons every day.
18:44:16 <Steinsplitter> abittaker: as far as i know, categories will be kept? because cats are widely used, we have policies for that, etc.
18:44:37 <spinster> BrillLyle: Yes, on structured Commons we will be able to re-use the multilingual properties and items in Wikidata. How exactly is up to the community to model/decide.
18:44:46 <abittaker> So we'll leave categories as they are now, people can translate what they want, and we'll see how categories might evolve down the line when we know how structured data will work
18:44:48 <geniice> BrillLyle " three dimensional"
18:44:54 <BrillLyle> there has to be a higher level solution to categories.
18:45:05 <abittaker> Steinsplitter, yes, categories will be kept
18:45:26 <Steinsplitter> abittaker: it would be an idea to assign tags automatically/load the multilingual descriptions in the category header :)
18:45:33 <Steinsplitter> that would be excellent.
18:45:33 <jheald> pigs: yes they are horrible. But they will also survive, because people like to be able to offer that degree of curation & control
18:45:43 <abittaker> How does that sit with you all?
18:45:58 <pigsonthewing> so long as we have a way to group images in "sets" (not necessarily all images on a subject, but - for example - those taken consecutively, at one event, by one person)
18:46:20 <jheald> the big problem is that cats are the nearest thing we have now to structured data. But nowhere structured is planned to store information about them.
18:46:32 <pigsonthewing> abittaker: works for me
18:47:20 <Isarra> You can extrapolate more data out of categories without messing with what they're doing already.
18:47:23 <pigsonthewing> jheald: here's a cat I worked on today: https://commons.wikimedia.org/wiki/Category:Crosses_of_Nails_from_Coventry_Cathedral
18:47:33 <Isarra> ...maybe.
18:47:34 <Steinsplitter> yes, categories are widely used. if you look at the daily edit stats, there is a huge amount of cat changes; we also have a lot of CFD (https://commons.wikimedia.org/wiki/Commons:Categories_for_discussion) discussions going on every month. There are a number of users who have been working every day for years to keep the category stuff running :) We have bots, scripts, etc. for it.
18:47:59 <pigsonthewing> as structured data that would be, say, depicts=cross-of-nails; place-of-origin=Cov-Cathedral
18:48:03 <Steinsplitter> AND cats are also widely linked from wikipedias(!). at least from dewp.
18:48:05 <jheald> What is needed (IMO) are CommonsData items for categories. Wld be very helpful for translating the information across. Also a very helpful test-bed for the technology, de-coupling it from the complications of developing the file-page interface.
18:48:08 <spinster> We do have some unanswered questions that we will answer now, unless people object :-)
18:48:12 <risler_> steinsplitter: we are working on our migration plan for categories -> "tags" via the depicts property. it's a bit early to say what will or won't be in there, but we are exploring all viable options and that could be one.
18:48:22 <abittaker> Isarra, to your earlier question on "why captions?"
18:48:28 <Steinsplitter> 19:48:06 <jheald> What is needed (IMO) are CommonsData items for categories. <-- exactly! 100% agree. This is what the community wants :)
18:48:55 <jheald> pigs: we need to be able to record that, somewhere, in a structured way in order for the tools to be able to work on it
18:48:59 <pigsonthewing> We need a tool like Cat-a-Lot, but that adds properties to all (or selected) items in a category
18:49:02 <abittaker> We wanted to start with a small thing that would not break people's workflows, but that would still connect uploading, editing, and search
18:49:18 <Isarra> Okay, stupid question.
18:49:20 <Isarra> What are captions?
18:49:43 <yannf> IMO the biggest shortcoming on Commons is that categories are English only, so I hope there will be a solution for that
18:49:54 <BillLyle2> I am worried about the metadata -- and what end users and institutional providers will be required to provide to donate their metadata-rich images
18:49:55 <geniice> Isarra a born unmaintained vandalism opportunity.
18:49:56 <jheald> as a by-product CommonsData for categories would also make it easy to present internationalisations of category names, Commons's #1 request for ever
18:49:59 <Steinsplitter> risler_: i suggest you consult the community regarding this to prevent drama later ;) because category stuff is very sensitive on commons, and over the years we built policies etc. around it. they can't simply be replaced.
18:50:07 <Isarra> geniice: :D
18:50:11 <Steinsplitter> agree with yannf :)
18:50:18 <BillLyle2> categories are faux structured metadata. they are the WORST
18:50:22 <spinster> Steinsplitter: re categories, see also this Phabricator task and suggestion from Magnus about conversion, tools for that, etc https://phabricator.wikimedia.org/T180113
18:50:29 <Isarra> But seriously, what are they actually for? How is that different from the top of the description/filename?
18:50:36 <abittaker> Isarra, think of file captions as a short text field that can serve as a brief description of the image, similar to a multilingual file name. This new field can be used as 'alt' text for images and will aid with searching. Since the field will allow for multilingual alternative text strings, it will also be the project’s first effort at improving support for browsing content in multiple languages.
18:50:52 <Isarra> So basically the title?
18:51:01 <abittaker> Yup, but multilingual
18:51:06 <Isarra> Okay, thanks.
18:51:11 <abittaker> We wanted to start small :)
18:51:17 <pigsonthewing> No, no, no "alt" is not a caption!
18:51:25 <pigsonthewing> Sorry, but *bugbear*
18:51:29 <Isarra> What is it?
18:51:30 <geniice> abittaker "and will aid with searching" you don't get it do you?
18:51:34 <BillLyle2> Oh I was wondering if the caption could be alt text. That's very helpful
18:51:39 <BillLyle2> If it's similar to the Wikidata description wouldn't it automatically be multi-lingual? I think I am not understanding
18:52:02 <jheald> spinster: replied just before the session on that phab. Magnus's idea of draining info out of categories won't run. Parallel running will be needed; management of which wld be v much easier if categories had structured data
18:52:03 <risler_> steinsplitter: yes, agreed. this conversation is just one part of a lot of conversation to come on the topic :). there have also been chats in various Phab tickets that we are considering.
18:52:19 <abittaker> BrillLyle, yes, it's automatically multilingual
18:52:27 <geniice> abittaker the reason commons has such poor descriptive metadata isn't due to the lack of places to enter the metadata. It's because people uploading have no particular incentive to add metadata
18:52:39 <pigsonthewing> alt is for people who DO NOT see the image.
18:52:56 <Isarra> Given the multilingual nature, how do people stop vandalism when they're only seeing one or another version of it at a time?
18:53:09 <jheald> See also comments towards the end of this Commons VP thread on CommonsData for categories, and why objections don't hold water: https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2017/11#Structured_Commons_newsletter.2C_October_25.2C_2017
18:53:14 <geniice> abittaker if you want to get around that you need to look at making better use of the information you already have, like what articles the image is in
18:53:17 <BillLyle2> thnx
18:53:40 <BillLyle2> thanks for mansplaining Pigs. I did not know that!
18:53:46 <abittaker> pigsonthewing that is a good point, the caption *can* be used for alt text, but that doesn't mean it should necessarily be. we can develop some best practices and design for them.
18:53:49 <Isarra> With CollaborationKit, we just plonked all the different fields on the page, but that doesn't really seem viable when you've got an arbitrary number of different versions of each...
18:54:03 <Isarra> BillLyle2: 'mansplaining'? Really?
18:54:38 <pigsonthewing> Oh FFS. Do we have to put up with gendered insults on this list?
18:54:50 <jheald> abittaker: main limitation is that the caption won't contain links or mark-up, right ?
18:54:56 <geniice> pigsonthewing IRC channel but yes
18:55:15 <BillLyle2> try not doing what you always do then Pigs
18:55:16 <abittaker> jheald, that's right. and it will be short. maybe 80 characters?
18:55:20 <BillLyle2> I'm just asking questions here
18:55:28 <pigsonthewing> And now ad hominem?
18:55:44 <Isarra> BillLyle2: Please just don't. A question was asked, and he answered.
18:55:58 <BillLyle2> Whatever
18:56:43 <Steinsplitter> (btw, for irc nerds: there is also a structured commons channel on irc: #wikimedia-commons-sd) :)
18:56:48 <yannf> abittaker, you have to take into account UTF-8, i.e. 80 characters in Chinese is much less than in English
18:56:49 <spinster> We're looking at the last unanswered questions now
18:56:59 <geniice> abittaker character limits are kinda dicey cross language
18:57:09 <geniice> so Question
18:57:13 <BillLyle2> It wasn't clear.
18:57:23 <jheald> abittaker: so only a limited substitute for descriptions, even short ones. I accept semantic mark-up, even links, aren't appropriate - but for many uses, they add huge value
18:57:28 <geniice> Question:What makes you think anyone is going to fill any of this stuff in?
18:57:46 <jheald> did anyone pick up my Q about search & the roadmap ?
18:57:51 <Isarra> geniice: If it's easy to edit as part of the usual interface, people'll do it.
18:57:56 <Isarra> If not, welll... who knows!
18:58:07 <abittaker> totes, characters are tricky cross language. we don't yet know how we're solving for that, but we're aware of it.
18:58:10 <geniice> Isarra I've been editing since 2004. My experience is otherwise
18:58:15 <BillLyle2> Will existing Library Science ontologies be utilized so there is minimal reinvention of the wheel? I mean, is this model that is being attempted already existing elsewhere, like Internet Archives, etc.?
18:58:26 <Isarra> geniice: And you've seen some of what's come out of the WMF in the past, eh? >.>
18:58:28 <jheald> isarra: it won't be wikitext. It might be similar to some of the infobox magic editing on some wikis.
18:58:35 <Isarra> Fortunately the team on this is better than previous...
18:58:58 <Isarra> jheald: Doesn't need to be wikitext to be editable without extra steps.
18:59:09 <Isarra> That's really the key thing you need - not requiring extra effort to interact with it.
18:59:18 <jheald> Q: is there any room for a re-think, to allow CommonsData items for categories?
18:59:53 <BillLyle2> I am still unclear how new bulk upload projects should prepare themselves and their metadata to maximize integration with this project
18:59:55 <spinster> BillLyle2: Yes, in the GLAM team at WMF we're talking to various parties in that area. And on Wikidata we already have quite a few typical GLAM vocabularies (think the Getty ones)
19:00:04 <geniice> Isarra I'm not worried about the software working it I'm worried about lack of interest in using it. Commercial image databases get good metadata because uploaders have a direct financial interest in making their images find-able. Doesn't apply to commons. This is the core problem
19:00:21 <risler_> jheald: we will have *some* search designs available throughout early next year. not everything though. it will be an ongoing process, not one where we dump everything at once.
19:00:29 <abittaker> Hey all, it's 11, but we'll stick around and answer more questions
19:00:30 <BillLyle2> Oh good. the Getty thesauri are great tools
19:00:53 <Isarra> geniice: My point is if it's easy to fill in when uploading/editing the rest of it, people will. They won't need extra interest if it's not extra effort.
19:00:57 <Isarra> But otherwise, yeah, that's a definite concern.
19:01:20 <jheald> Bill: use existing methods, but as many standard templates as you can -- eg creator, institution etc ("poor man's structured data")
19:01:22 <Isarra> We'll probably have to see what interfaces they come up with, though, to really see...
19:01:51 <geniice> Question:structured commons with its GPS and species Data would appear to create an obvious hazard in terms of poaching rare wildlife. Have you spoken to anyone about this?
19:02:03 <BillLyle2> If we focus on uploading the metadata to Wikidata, will that suffice?
19:02:13 <jheald> risler_: and how much will end-users realistically be able to refine their concepts, in real time ?
19:02:21 <BillLyle2> I mean that's sort of the plan, but I don't want to go through massive amounts of manual creation of Creator templates unnecessarily.
19:02:28 <geniice> Isarra so are you going to break the upload form I use (specifically this one https://commons.wikimedia.org/wiki/Special:Upload )?
19:02:42 <abittaker> woah, geniice, i hadn't even thought of that. we'll have to consider that in the design. Thank you!
19:02:46 <jheald> risler_: has anyone done any assessment on its achievability?
19:03:04 <Steinsplitter> BillLyle2: as far i know, for you - as uploader - not much will change. It will still be easy to upload stuff.
19:03:08 <abittaker> Can we follow up with you about this?
19:03:09 <yannf> BillLyle2, creating a Creator template now is 2 clicks
19:03:38 <geniice> abittaker it also makes it easier to stalk users since you can use geolocation to work out where they likely live although determined stalkers can already can and will do that.
19:04:10 <risler_> jheald: when you say "their concepts", which concepts are you referring to?
19:04:14 <BillLyle2> If I have a lot of Creators that's not 2 clicks though. I am concerned about bulk uploads vs. a manual clicking process.
19:04:19 <abittaker> Geniice, that is definitely something to consider as well.
19:04:19 <jheald> bill: one thing you can do is make sure /Wikidata/ has good information for all the concepts/people/types of obects etc that are relevant
19:04:32 <spinster> geniice: Yes, good point to think very well about privacy issues.
19:04:37 <jheald> bill: that may be where you should start
19:05:32 <BillLyle2> I hope the pathway between Wikidata and Commons is easy for end users. That is a huge priority. Apologies, probably very self-evident
19:05:34 <jheald> bill: if Wikidata has good info, that makes Creator templates easier to make -- it's also what structured data will ultimately reference
19:05:53 <yannf> yes, privacy issues might be a concern, that's why I sometimes *remove* geolocation from my images
19:06:24 <jheald> bill: I think we can expect it to be significantly difficult, especially when trying to reference things that don't already have Wikidata items.
19:06:24 <BillLyle2> Okay, thanks for all of the information. Appreciate it. Still pretty unclear as to how this will improve things on Commons but hopeful.
19:06:28 <Isarra> geniice: I would absolutely love to break that.
19:06:43 <Isarra> But seriously, yeah, definitely keep bringing that one up. >.>
19:08:08 <geniice> Isarra but I'm still using it see?
19:08:39 <Isarra> geniice: So am I!
19:08:46 <Isarra> IT's terrible, and yet it's the best thing we have!
19:08:58 <jheald> bill: cf how hand-making citations on Wikidata isn't the easiest of things sometimes
19:08:58 <BillLyle2> @jheald Yes. Trying to work from the Wikidata as skeleton and then fill in from there but it gets very laborious. Had hoped that Structured Data might improve this.
19:08:58 <jheald> bill: no. Structured Data will /require/ that skeleton. Won't magically make it for you. (At least, not until people develop a shedload of new tools).
19:08:58 <BillLyle2> Citation on Wikidata yes. Awful really. But the form and lookup on Wikipedia Markup is fabulous. I would like to see something like that, a form that might be a Wikidata or Structured Commons interface, but the end user wouldn't need to know
19:08:58 <jheald> bill: the pain /will/ be worth it :-)
19:08:58 <BillLyle2> @jheald I think therein lies the problem.... I don't want magic, but my concern is end-users and editors.
19:09:09 <Isarra> All attempts to improve it seem to make it worse!
19:09:10 <jheald> but ppl: give us CommonsData for categories, so we can at least try to get some of the way with machine help
19:09:56 <geniice> Isarra eh the current system is better for new users but as someone who's built image upload systems I prefer something close to the metal
19:10:56 <jheald> Is anyone from the team still here, or have you all gone home now?
19:11:09 <risler_> we're all still here
19:11:09 <abittaker> we're still here :)
19:11:36 <Isarra> geniice: Well, sort of. It's pretty overwhelming, when most of the fields aren't even needed/won't let you put in the right information...
19:11:36 <Lydia_WMDE> jheald: wrt to creating items on commons itself: i really think we should not do it. the whole setup with items and properties on wikidata and statements on commons and so on is already super complex and i fear hard to understand for people
19:11:47 <Lydia_WMDE> if we add another layer to that... :(
19:11:57 <Lydia_WMDE> so I think we need to find other ways to solve that issue
19:12:08 <Lydia_WMDE> and I think the work you've done on category items on wikidata is a huge step
19:12:32 <jheald> It's not another layer on top. It's another very similar layer on the side. One that will be much easier to implement, evaluate and test.
19:12:43 <jheald> A much better place to shake out bugs
19:12:55 <spinster> So, everyone, before we conclude - we're curious how you found the format of this new IRC office hour. Any feedback, tips for improvements?
19:12:59 <jheald> And visible only to those that want to see it
19:13:11 <jheald> A very useful de-coupling
19:13:24 <Steinsplitter> spinster, thanks for holding this office hour. it was useful and informative.
19:13:57 <spinster> Seeing there are so many participants and questions, I think it will be useful to plan a next one not too far away. Happy that so many people showed up :-)
19:14:03 <pigsonthewing> Very useful discussion, thank you.. Pity about the (in)civility issues.
19:14:37 <Lydia_WMDE> jheald: it would still mean we have to make significant code changes to allow items on commons itself and items coming from wikidata to commons to exist side by side
19:14:40 <spinster> Do you all prefer regular IRC? We have also been considering a Google Hangout with IRC on the side. Then we could show more things visually
19:14:44 <Lydia_WMDE> that's a non-trivial undertaking
19:15:12 <Isarra> > We have also been considering a Google Hangout with IRC on the side. << please don't.
19:15:14 <Lucas_WMDE> spinster: I found it very informative, but also too chaotic at times. not sure what to do about that though
19:15:16 <pigsonthewing> IRC's better.
19:15:20 <Lucas_WMDE> I’d prefer IRC too
19:15:37 <Steinsplitter> +1
19:15:40 <spinster> Good. I agree it was a bit chaotic - it's still new for us too and we'll think about making it better next time.
19:15:43 <jheald> Lydia: easier to think how to make that possible now, than down the line, with a live codebase to alter
19:15:57 <Isarra> You guys handled it well.
19:16:14 <Isarra> Even amidst the chaos I didn't notice anything important getting lost.
19:16:14 <spinster> :-)
19:16:42 <abittaker> Thanks everyone :) It's so good to see you all here
19:16:54 <spinster> If you have further questions, you can also use our dedicated IRC channel where I'm camping most of the time. Pinging me actively is a good idea
19:17:02 <spinster> #wikimedia-commons-sd
19:17:13 <pigsonthewing> Bye, folks
19:17:16 <abittaker> Next time, perhaps we'll note when we see questions and that we'll answer them at the end
19:17:18 <spinster> Bye Andy!
19:17:45 <spinster> We'll keep you posted about the next edition for sure.
19:17:58 <jheald> Also the items would be limited -- 1:1 matched to CommonsCats -- not suggesting items for non-notable creators (though I have seen it proposed as a solution to some quite tricky issues in a phab ticket).
19:18:11 <spinster> Probably after the holidays.
19:18:52 <spinster> If there are no objections, I will close the meeting with the magic incantation now.
19:19:18 <spinster> #endmeeting