IRC office hours/Office hours 2014-10-16

From Meta, a Wikimedia project coordination wiki

Structured Data


Time: 18:00-19:00 UTC
Channel: #wikimedia-office
Timestamps are in UTC.

18:00:14 <Keegan> #startmeeting Structured data on Commons
18:00:14 <wm-labs-meetbot> Meeting started Thu Oct 16 18:00:14 2014 UTC and is due to finish in 60 minutes. The chair is Keegan. Information about MeetBot at
18:00:14 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:14 <wm-labs-meetbot> The meeting name has been set to 'structured_data_on_commons'
18:00:19 * multichill doesn't know
18:00:32 <Keegan> #link
18:00:56 <RandomDSdevel> multichill: That's…cold, man; real cold.
18:00:58 <Keegan> Welcome everyone, I'm Keegan. I'm the Community Liaison for the WMF working on this project
18:01:17 <Keegan> We have Fabrice and Lydia here, product managers with the WMF and WMDE respectively
18:01:25 <multichill> RandomDSdevel: Haha /me <whatever>
18:01:33 <Keegan> And gi11es as well, who's a senior software engineer at the WMF
18:01:38 <Keegan> plus others
18:01:40 * gi11es waves
18:01:52 <Steinsplitter> :)
18:01:55 <fabriceflorin> Hi everyone, good to meet you again :)
18:01:56 <Keegan> So what would we like to talk about related to structured data?
18:02:07 <Keegan> We had a meeting last week in Berlin, is anyone interested in hearing about that?
18:02:11 <RandomDSdevel> multichill: As Jar Jar would say, "How wude."
18:02:14 <PKM_> Yes
18:02:15 <Harmonia_Amanda> yes
18:02:26 <Josve05a> Sure!
18:02:37 * bawolff notes, shouldn't the topic indicate that this meeting is logged?
18:02:47 <QueenOfFrance> It should.
18:02:50 <Romaine> ''The force is with us''
18:02:51 <Keegan> bawolff: it should, bugger topic
18:02:53 <Keegan> fixing
18:02:55 <Josve05a> I noticed that as well
18:03:01 <Keegan> Someone talk about last week while I do so!
18:03:06 <Josve05a> (me too shy to say something)
18:03:07 <Keegan> Thanks multichill
18:03:11 <fabriceflorin> Hi guys, we had a first bootcamp about structured data last week in Berlin to discuss this project and explore possible solutions. Here's an overview of what was discussed and accomplished:
18:03:11 <Steinsplitter> :)
18:03:11 <fabriceflorin>
18:03:37 <Keegan> So instead of reading about it, tell 'em about it :)
18:03:55 <fabriceflorin> Some good ideas came out from this event, but many questions remain unanswered. We would now like to invite more community participation to help plan next steps for this project: hence this meeting.
18:04:02 * basile waves
18:04:22 <Keegan> multichill was in attendance, I'd like to hear his thoughts as a community participant eventually :)
18:04:27 <Keegan> Thanks James_F
18:04:58 <RandomDSdevel> Yeah; what all happened?
18:05:03 <bawolff> So umm, the page says documented at, but doesn't really contain any documentation :s
18:05:25 <RandomDSdevel> bawolff: I noticed that, too…
18:05:28 <Eloquence> has the most technical detail
18:05:56 <Lydia_WMDE> so one of the important points we discussed is the painpoints we currently have without structured data:
18:06:15 <Jheald> In the etherpad for the meeting last week, the Wikidata team committed to spending 50% of their time on Structured Data. My question is: can they spare these resources, rather than using them on eg (a) use tracking (b) arbitrary access and (c) better integration with clientwiki-based anti-vandal tools, which are all key blockers for deployment ?
18:06:15 <Lydia_WMDE> it'd be awesome if you could have a look at those and see if anything is missing or if you disagree with anything there
18:06:16 <fabriceflorin> bawolff: Our preliminary documentation is being added on this page linked above by eloquence. Also, these slides can give an overview of what we discussed:
18:06:19 <RandomDSdevel> Enough with the links! Jumping around web pages is annoying!
18:06:30 <RandomDSdevel> Describe what's there, already!
18:06:43 <bawolff> fabriceflorin: So should I change the link to point to the /Development subpage
18:06:48 <marktraceur> RandomDSdevel: text/html, mostly. :)
18:07:04 <fabriceflorin> bawolff: That would be wonderful, thanks so much!
18:07:20 <Eloquence> in a nutshell, this is about moving from lightly structured wikitext to using the wikidata backend to provide query-able metadata for files
18:07:32 <Lydia_WMDE> Jheald: i consider things like arbitrary access included in those 50%
18:07:35 <Lydia_WMDE> at least partially
18:07:38 <RandomDSdevel> marktraceur: But what does it say? What about the structured data proposal does it describe now that wasn't there when I last looked?
18:08:00 <multichill> An arbitrary percentage of 50% I presume Lydia_WMDE? ;-)
18:08:07 <fabriceflorin> RandomDSdevel: Today, information about media files on Wikimedia sites is stored in unstructured formats that cause a range of issues: for example, file information is hard to search, some of it is only available in English, and it is difficult to edit or re-use files to comply with their license terms.
18:08:14 <Lydia_WMDE> multichill: best effort guess :D
18:08:29 <Eloquence> RandomDSdevel, we fleshed out a lot of details of how the integration could actually work
18:08:40 <Eloquence> we started modeling out some real world files using wikidata properties that we created on
18:08:52 <RandomDSdevel> Eloquence: What are they real quick?
18:08:55 <Eloquence> you can see for example
18:08:55 <fabriceflorin> RandomDSdevel: the focus of this project is to investigate how to structure data on Wikimedia Commons, reusing the same technology as the one developed for Wikidata.
18:09:07 <Keegan> Yes yes, there are so many modeling possibilities
18:09:10 <RexxS> so if I were to include a commons file on en-wp, it would allow me to read in useful data about that media file?
18:09:20 <multichill> :-)
18:09:34 <Eloquence> RandomDSdevel, for example, how do we model that a photograph of a painting or sculpture has both a photographer and an artist
18:09:40 <Keegan> RexxS: Eventually, yes
18:09:53 <Jheald> Lydia: it would be useful to have progress thoughts about (a)(b)(c) above -- sometime, not necessarily now
18:10:01 <fabriceflorin> RexxS: Eventually, the structured data would be available on enwiki as well. For now, we’re focusing on Commons for a first implementation, but would love to hear from community members about this approach.
18:10:03 <Romaine> there are more fields in the file pages with difficulties
18:10:14 <RandomDSdevel> OK, so we're going to use Wikidata. I assume that entails extensibility?
18:10:14 <DanielK_WMDE> RexxS: and if you wanted to use it on your WordPress blog, as well.
18:10:17 <Eloquence> Romaine, throw us some and we'll see if we've thought about them
18:10:20 <Romaine> coords, date, contributors, rights holder
18:10:40 <Keegan> Indeed, the model is being built with portability to other wikis in mind. No sense in not doing so
18:10:46 <Romaine> and one main thing is that it is a chaos in current file pages as users have had the freedom
18:10:56 <Eloquence> Romaine, shows examples of coordinates and rightsholder information
18:10:58 <Steinsplitter> (Lydia_WMDE: i found a very useful document which answers all my open questions :) ,
18:10:59 <Romaine> including customized templates
18:11:07 <RexxS> well, the WP blog via an API would be nice, but I tend to write my blogs, not use automation
18:11:14 <Lydia_WMDE> Steinsplitter: awesome! that's good to hear
18:11:15 <RandomDSdevel> @Keegan: Will the data be available for use/transclusion across wikis?
18:11:16 <bawolff> One thing that I find slightly confusing. It's unclear if the scope of structured data includes data that's functionally dependent on the image. (I don't think it should include that, but I think it should be clarified either way)
18:11:21 <Romaine> there are two types of coords: location of object and location of the camera
18:11:27 <PKM_> Re paintings, depicts with greater granularity (man in black hat and bobbin lace collar, woman in frontage and pearl necklace )
18:11:34 <Eloquence> bawolff, what does functionally dependent on the image mean?
18:11:36 <Eloquence> EXIF?
18:11:37 <Jheald> Eloquence: it seems the use cases you're modelling so far are rather simple. More complicated ones, like eg the one I emailed to the list about 5 mins before this chat, would also be interesting
18:11:42 <NotASpy> what is the general thought about using the structured data to eliminate not safe for work images from search results (for those who choose a safe search type option in their preferences) ?
18:11:43 <gi11es> Romaine: do you feel that overall there is a convergence of templates over time or a divergence?
18:11:44 <RexxS> isn't the way it would be used to allow us to suggest captions and alt text?
18:11:45 <multichill> Romaine: For coords we already have two properties
18:11:53 <Keegan> RandomDSdevel: eventually, yes. But we have to get it working on Commons first :D
18:12:01 <fabriceflorin> Steinsplitter: Glad you found our slides useful … we worked on those collaboratively, and they were helpful in refining our thinking on this.
18:12:07 <bawolff> e.g. I heard somewhere someone suggesting storing perceptual hashes in this sort of thing
18:12:10 <Keegan> This is a very long term project, it is in the terms of years
18:12:10 <mvolz> forgive me if this is explained elsewhere, but why not have it actually *on* wikidata?
18:12:21 <Eloquence> mvolz, this was discussed extensively ...
18:12:30 * Romaine searches for convergence in dictionary
18:12:32 <Lydia_WMDE> bawolff: you mean things like exif data? we're thinking about providing a way to access them in similar ways but not store them again
18:12:38 <Lydia_WMDE> bawolff: does that make sense?
18:12:52 <bawolff> Lydia_WMDE: yes, that sounds good
18:12:54 <Steinsplitter> fabriceflorin: first i was a bit concerned, but after reading this helpful document looks OK to me :):)
18:12:58 <multichill> mvolz: A lot of the data is out of scope for Wikidata.
18:13:05 <Eloquence> mvolz, we will likely use to pull the properties, but have the entities stored in Commons, while linking out to Wikidata items for notable works
18:13:05 <mvolz> Eloquence: I figured it would have been, but do you have any links or something?
18:13:07 <RandomDSdevel> fabriceflorin: OK, then; what kinds of cross-wiki integration do you guys have in mind? Shared content like templates, maybe?
18:13:19 <Amir1> Just a question: does multichill's link mean you want to map File:something.jpg to Q1323? Isn't it better to make it separate (a non-content namespace?) in order to prevent making a mess in Q##? Obviously it's just my opinion
18:13:38 <multichill> Amir1: No, it's just an example. A plaything
18:13:47 <DanielK_WMDE> NotASpy: could be done in theory, but we don't have any plans to implement any kind of filtering based on the structured metadata.
18:13:47 <Amir1> oh okay
18:13:49 <fabriceflorin> Steinsplitter: Thanks :) Keep in mind that this is a living document, which will get updated as a result of community discussions like these … So nothing is cast in concrete, those are just first ideas for discussion purposes.
18:13:50 <DanielK_WMDE> search, yes
18:13:50 <mvolz> Also, in the cases where it is in scope, is there a clear guideline for what does where
18:13:52 <DanielK_WMDE> no filtering
18:13:53 <Lydia_WMDE> mvolz: also we want to keep the data close to the actual file
18:13:54 <Keegan> NotASpy: Your question was noted, just waiting a sec for things to slow down
18:13:54 <Eloquence> Amir1, current thinking is to not have an item number for Wikidata properties for a file, but to continue to use the filename as the unique identifier (icky but backwards compatible)
18:14:04 <mvolz> *what goes where
18:14:06 <gi11es> Romaine: basically I've heard that the commons community over time has tried to standardise those templates. given what you've just said I'm asking if you think that the situation is improving (more and more content being moved to standard templates and less people using/creating custom ones) or worsening (people keep creating custom templates faster than things get standardized)
18:14:26 <Amir1> Eloquence: Thanks
18:14:31 <Romaine> I have no good view on such
18:14:35 <Eloquence> Amir1, the Wikidata properties would be listed at the bottom of the file page initially, as shown here:
18:14:36 <bawolff> gi11es: Isn't that what meta templates are for?
18:15:04 <Romaine> but I do know that many uploads from 2005 and transfers of files from other wikis have terrible formats regularly
18:15:04 <gi11es> bawolff: I'm not sure if you're talking about templates I know of, do you have an example?
18:15:15 <Jheald> What about things that don't have an item on Wikidata, but don't relate to files on Commons in the simplest 1 <-> 1 way. What entities (and where) will information like that be stored on?
18:15:17 <Eloquence> (and by Wikidata, in this case, I mean "stored in Wikibase deployed to, properties defined in")
18:15:20 <multichill> gi11es: We might have some more custom templates, but these are usually based on standard templates so from a data point of view these are not that custom
18:15:25 <SarahStierch> I think it’s generally “ok” re: standardized templates. I don’t think it’s getting any worse. However, the variety of data is what might be an issue.
18:15:39 * aude waves
18:15:39 <DanielK_WMDE> Jheald: can you give an example?
18:15:41 <SarahStierch> Yeah, what multichill said - most of the customized templates are all the same just modified with data and style.
18:15:53 <Romaine> with the mediaviewer one complaint was that it did not take care of the customized templates, the same issue applies here
18:15:55 <bawolff> gi11es: I don't know on commons, but general idea is that people create a (meta) template that's used to create new templates, and the meta template handles all the tricky standardization stuff
18:15:57 <SarahStierch> but there is some really old crap out there…
18:16:08 <Amir1> Eloquence: you can make Wikimedia commons a Wikibase repo (just a thought) and people edit there
18:16:10 <Romaine> I hope having structured data coming that it would be more harmonized
18:16:11 <gi11es> bawolff: I see, thanks for the info, I'll look into those
18:16:25 <Lydia_WMDE> Amir1: that's the plan
18:16:28 <Eloquence> Amir1, indeed, that is the idea, with some customization to keep user-facing complexity manageable
18:16:40 <NotASpy> gi11es: and in any case, the information displayed in those templates will still need to be displayed, what I'd expect to happen is that instead of using {{cc-by-2.0}} on a file page, you'll select the licence in a drop down, the template or something similar will still be displayed
18:16:40 <Amir1> Lydia_WMDE Eloquence Coool
18:17:04 <Eloquence> Romaine, we need a high level API and canonical data model for basic file properties so third party users don't have to follow property changes
18:17:09 <gi11es> Romaine: from my point of view, structured data would make the task of harmonisation easier. and since the community has managed to standardise with templates, there's no reason it won't be able to achieve the same with a more powerful tool
18:17:15 <multichill> NotASpy: At some point in time we'll probably have a {{licence}} template that just fetches the correct licenses from Wikibase
18:17:27 <NotASpy> yeap
18:17:32 <multichill> It will be a step-by-step process
18:17:34 <Romaine> I think it is possible to achieve it yes
18:17:34 <RandomDSdevel> What about when somebody wants to mention data from Wikidata/Commons on another wiki in such a way that it stays up to date?
18:17:40 <PKM_> Selecting from drop downs takes a lot longer than typing in data, especially for repetitive tasks. What steps are being taken to support editors and advanced uploaders?
18:17:47 <Jheald> DanielK_WMDE eg an underlying work, that has many images. Or an image that contains many distinct contributing work-stages
18:17:47 <Eloquence> PKM_, great question
18:17:48 <DanielK_WMDE> Romaine: our goal is that templates will no longer be used as a way to store and maintain metadata. they would stay around for formatting/displaying data though. And maybe serve as a way to offer a customized editing interface (eventually, via the template editing mechanism in VE)
18:18:00 <mvolz> Eloquence: looking at those slides, to me the example used (van gogh painting) seems exactly the kind of thing that should be on wikidata
18:18:07 <Romaine> sounds fine to me
18:18:10 <Lydia_WMDE> RandomDSdevel: you mean including the data in a wikipedia article for example? we'll be working on that
18:18:15 <Romaine> that is what I would have expected as well
18:18:19 <Eloquence> PKM_, we're currently thinking of leveraging the TemplateData technology developer for VisualEditor to generate fast UIs that users can use to enter data. Templates won't disappear - they'll just pull the data from the backend.
18:18:21 <aude> RandomDSdevel:
18:18:27 <Eloquence> s/developer/developed/
18:18:34 <DanielK_WMDE> Jheald: if there are many images of a single work, i would imagine that this would make it ok to create a data item on wikidata.
18:18:38 <PKM_> Thanks
18:18:50 <DanielK_WMDE> Jheald: if there are many contributing works on commons, we can model this directly.
18:19:01 <DanielK_WMDE> if there are contributing works elsewhere... i'm not sure.
18:19:16 <RandomDSdevel> Lydia_WMDE: OK, cool! What about non-MediaWiki wikis, though? Can I ask about that here, or is that part of a separate project?
18:19:26 <DanielK_WMDE> we could use external URLs to reference such works. this is still unclear
18:19:41 <DanielK_WMDE> RandomDSdevel: non-MediaWiki, or non-Wikimedia?
18:19:54 <Amir1> DanielK_WMDE: and you can connect these items (in Wikidata and WM commons) but connecting two repos is challenging
18:20:09 <RandomDSdevel> DanielK_WMDE: Sorry; I meant the former.
18:20:09 <Jheald> DanielK_WMDE: the works I'm thinking about aren't separate files, they're separate stages in the chain that has led to the image -- with separate authors/dates/rights etc
18:20:13 <Lydia_WMDE> RandomDSdevel: in general for wikidata and related projects we also want to offer the ability to access that data in 3rd party wikis but that is further out unfortunately. other things are more pressing. but maybe we can find some student group or so to work on it sooner?
18:20:29 <multichill> mvolz: And we already have quite a few paintings on Wikidata, see for example
18:20:40 <multichill> But most of the works won't be notable enough for Wikidata
18:20:52 <RandomDSdevel> Lydia_WMDE: That would be nice. I'd help, but I don't really know how to code web pages.
18:21:05 <Krenair> RandomDSdevel,
18:21:11 <Lydia_WMDE> RandomDSdevel: that's ok :) if you know someone who'd be interested in coding on this let me know
18:21:24 <PKM_> Yes,Portrait of Man by Unknown Artist should not be in Wikidata except in rare cases.
18:21:31 <RandomDSdevel> Lydia_WMDE: OK.
18:21:38 <DanielK_WMDE> Amir1: what do you mean? Referencing wikidata items from inside commons should be pretty simple
18:21:40 <mvolz> multichill: I'm not so sure I agree with that. I think it makes sense for a wikidata item to exist for any photograph that is a faithful reproduction of a work
18:22:05 <mvolz> because that separates the layers of metadata about the photograph
18:22:09 <mvolz> and metadata about the work
18:22:20 <DanielK_WMDE> RandomDSdevel: generally, accessing our data from other kinds of software should be simple enough. what is tricky is receiving updates when something changes.
18:22:27 <mvolz> the creator of the work and the creator of the photograph are not the same entity
18:22:32 <DanielK_WMDE> we have some ideas, but nothing definite yet
18:22:35 <Jheald> multichill: how do we describe their rights/dates/authorship of stages in the development of an image in a way that is retrievable / filterable / searchable ?
18:22:35 <fabriceflorin> Jheald: At a high level, we have been discussing a data model where one file can have multiple works, a work can have multiple contributors and multiple licenses, as shown in this slide: … but it may be possible to keep track of different versions of a work if necessary for licensing purposes.
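
The data model described at 18:22:35 (one file can have multiple works, and a work can have multiple contributors and multiple licenses) could be sketched roughly as below. This is only an illustration in Python: the class names, field names, and sample values are hypothetical and are not taken from the project's documentation or from Wikibase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contributor:
    name: str  # a wiki user, a notable person, or just a plain name
    role: str  # hypothetical field, e.g. "artist" or "photographer"


@dataclass
class Work:
    title: str
    contributors: List[Contributor] = field(default_factory=list)
    licenses: List[str] = field(default_factory=list)


@dataclass
class MediaFile:
    filename: str  # the filename stays the unique identifier
    works: List[Work] = field(default_factory=list)


def all_licenses(media_file: MediaFile) -> List[str]:
    """Collect every license attached to any work in the file, without duplicates."""
    result: List[str] = []
    for work in media_file.works:
        for lic in work.licenses:
            if lic not in result:
                result.append(lic)
    return result


# A photograph of a painting: two works, each with its own contributor and license.
painting = Work("Painting", [Contributor("Some Artist", "artist")], ["Public domain"])
photo = Work("Photograph of the painting",
             [Contributor("Some Photographer", "photographer")],
             ["CC BY-SA 4.0"])
example_file = MediaFile("Example.jpg", [painting, photo])
```

Keeping the licenses on each work rather than on the file is what would let a re-user see which terms apply to the painting and which to the photograph.
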
18:22:54 <RandomDSdevel> DanielK_WMDE: Yeah, I suppose that might be somewhat tricky.
18:23:00 <Amir1> My question is: Wikimedia Commons is a client now but it'll be repo. and let's say there is an item for a painting in Wikidata and we want to connect them to several items (images of that painting) in commons but commons is repo
18:23:08 <Amir1> DanielK_WMDE: ^
18:23:09 <PKM_> In some cases, the work is a photograph. Can we use some other terminology for the-thing-we-upload?
18:23:16 <Lydia_WMDE> Amir1: it will be a client and a repo
18:23:19 <multichill> PKM_: File
18:23:35 <mvolz> thanks, file :)
18:23:50 <DanielK_WMDE> Jheald: if the "stages" are not "relevant" by themselves, the contributors would just be listed. Each contributor can have wikidata-style qualifiers though, stating a role or other extra info about the contribution.
18:23:54 <Amir1> Lydia_WMDE: interesting, it's like several layers of wikibase
18:24:01 <Lydia_WMDE> kind of yes
18:24:21 <Jheald> fabriceflorin: yes, I've been keeping an eye on the development of that API. But I'm not sure how it relates to concrete items on CommonsData and/or Wikidata. Does every "work" in the API get an item? It seems not.
18:24:32 <RandomDSdevel> We're talking about having as few duplicates as possible here, right?
18:25:12 <DanielK_WMDE> Amir1: is a client to wikidata, and will then be a repo. There are some technical challenges there, but conceptually, it's simple enough, because we will not have "data items" on commons, we will have "media info".
18:25:22 <DanielK_WMDE> different types of entities live on different repos
18:25:33 <tgr> Jheald: the current plan is that a work would be either mapped to another Commons image or a Wikidata item
18:25:54 <Jheald> DanielK_WMDE: The stages are relevant (i) because they may have rights attached to them, and (ii) because they may have dates attached to them, that would need to be searchable. We need to be able to identify what the rights or dates refer to.
18:26:11 <RandomDSdevel> tgr: What's the holdup on having both available simultaneously?
18:26:13 <tgr> if something does not have a file and is not notable enough to have a Wikidata item, then it's probably not important enough to track as a separate work
18:26:25 <Jheald> tgr: And what if a work doesn't fit either case?
18:26:33 <DanielK_WMDE> Jheald: yes, this can be done with qualifiers. are you familiar with qualifiers on wikidata?
18:26:42 <mvolz> So in wikidata we've recently introduced property manifestation of for modelling Functional Requirements for Bibliographic Records
18:26:46 <RandomDSdevel> tgr: Oh, wait; never mind, you addressed why this might not be necessary.
18:27:02 <mvolz> and the idea is anything would be notable in order to properly model those relationships
18:27:17 <Lydia_WMDE> anyone not familiar with qualifiers: <- look at the head of government part
18:27:25 <Jheald> DanielK_WMDE: Can you put the whole of what the API calls a "work" object into a qualifier ?
18:27:35 <PKM_> Is someone collecting edge-case user stories
18:27:40 <mvolz> and I worry that images and soundfiles are getting left out of that model by not having the information about them on wikidata
18:27:44 <PKM_> Or is it too early ?
18:27:53 <Lydia_WMDE> the qualifiers are for example "start date" and "end date". you can have a lot more of those
18:28:01 <DanielK_WMDE> Jheald: probably not all of it. and i do not think that would be necessary. how is that modelled now?
18:28:06 <mvolz> in terms of the work, not information about the file.
18:28:07 <Jheald> DanielK_WMDE: ie Author + Date + Nature of Contribution + Rights
18:28:09 <Lydia_WMDE> PKM_: which do you have in mind?
18:28:17 <DanielK_WMDE> Jheald: these things, yes.
18:28:32 <Eloquence> fabriceflorin, what's the canonical place for user story documentation at this point, in answer to PKM_'s question?
18:28:33 <DanielK_WMDE> Jheald: not one qualifier - one main value (author), three qualifiers
18:28:39 <gi11es> mvolz: for audio, isn't the distinction recording vs work/event?
18:28:49 <PKM_> Things like photos of fashions where you need the item, style, fabrics, trims, accessories, etc.
18:29:07 <DanielK_WMDE> Jheald: that basically models a contribution. for a full work, you'd need a Q-item or a separate file.
18:29:10 <Lydia_WMDE> PKM_: ah good, yeah. fabriceflorin hopefully has a link where you can add those
18:29:14 <Jheald> DanielK_WMDE: but if you have many dates, many authors, many rights, many contributions -- can you identify which belongs to which, if you're just hanging them on qualifiers?
18:29:32 <RandomDSdevel> How will Wikidata items be linked to Commons files?
18:29:36 <tgr> Jheald: and if the parent work also has a parent work, put that into a qualifier of a qualifier? that would get out of hand quickly
18:29:45 <DanielK_WMDE> Jheald: authors (contributors) have their contributions attached to them.
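
The qualifier approach discussed above (one main value, the author, with date, nature of contribution, and rights attached as qualifiers) could look roughly like this. It is a minimal Python sketch under stated assumptions: the `Statement` class, the property and qualifier labels, and the sample values are all hypothetical, and none of them are real Wikidata property IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    prop: str    # hypothetical property label, e.g. "contributor"
    value: str   # the main value, e.g. the author
    qualifiers: Dict[str, str] = field(default_factory=dict)


# Each contributor statement carries its own qualifiers, so a date or a
# rights note always belongs to exactly one contribution.
statements = [
    Statement("contributor", "Artist A",
              {"role": "painter", "date": "1650", "rights": "public domain"}),
    Statement("contributor", "Engraver B",
              {"role": "engraver", "date": "1720", "rights": "public domain"}),
]


def contributors_with_role(stmts: List[Statement], role: str) -> List[str]:
    """Return the main values of contributor statements whose role qualifier matches."""
    return [s.value for s in stmts
            if s.prop == "contributor" and s.qualifiers.get("role") == role]
```

Because the qualifiers hang on the individual statement, a query for the engraver returns only that contributor and its date. Nesting a parent work of a parent work this way is the case tgr warns would get out of hand.
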
18:30:04 <fabriceflorin> Eloquence: We plan to link all subpages about user stories to this section: — we don’t have one yet, but will have one soon.
18:30:11 <Susannnaanas> I would like to be able to tap into the discussion of dividing information between the Commons Base and Wikidata based on notability. Is there or will there be documentation?
18:30:14 <PKM_> Thanks
18:30:40 <Keegan> PKM_: for now you can put them on the talk page of that /Development page that fabriceflorin just linked to :)
18:30:48 <Jheald> DanielK_WMDE: Note that by author, I'm not necessarily meaning a Wikiuser - but eg a particular artist + engraver combination for that image perhaps
18:30:53 <Lydia_WMDE> RandomDSdevel: which part of the linking do you mean? that's not clear to me
18:30:54 <Keegan> I'll organize it appropriately as they start coming in :)
18:31:24 <Jheald> Next up: Most of us prefer not to use wizards, and choose not to edit through Visual Editor. How serious a priority is it for the team to build a text-based read/write interface to the stored data ?
18:31:33 <DanielK_WMDE> Jheald: yes, i'm aware of that. Authors can be wiki users, notable people with q-items, and perhaps people identified simply by a url (or even a plain name).
18:32:02 <Eloquence> Jheald, this is tricky, and we talked about it a bit but not extensively
18:32:03 <DanielK_WMDE> Jheald: we may need to introduce a special data type to cover that, or have several contributor properties for the different types.
18:32:13 <Eloquence> (plain-text based editing)
18:32:14 <fabriceflorin> We have been discussing the idea of adding a new 'data section' at the end of file pages, to migrate, view and edit structured data. What do you think of this idea, as visualized in these slides?
18:32:58 <Eloquence> Jheald, do you feel a form-based editing experience where you don't encounter VE does meet the description of plain text editing?
18:33:12 <Eloquence> because that could be generated from templatedata without invoking the full VE for users who don't need it
18:33:31 <PKM_> We really need the ability to copy an item to make a new item, and then make changes as needed.
18:33:33 <RandomDSdevel> Lydia_WMDE: How complicated would a link between a file on Commons and an item on Wikidata be? Could one Wikidata entry be associated with multiple files representing different versions of the same work? And what if somebody created an entry on Wikidata for a file on Commons without knowing that one of the former already existed? Would they get an error message, or could files be related to multiple data entries?
18:33:37 <VisitorQQQ> I have one question, will it be possible to enter: "File:Mona Lisa, by Leonardo da Vinci, from C2RMF retouched.jpg" manifestation of "Q12418"?
18:34:02 <Jheald> Eloquence: I'm really thinking of whether I can set several fields in one go with a copy & paste
18:34:10 <PKM_> Me too.
18:34:13 <DanielK_WMDE> PKM_: copy & modify is worth thinking about. would not be hard to do, but we have to make sure we don't generate a ton of dupes this way.
18:34:17 <Susannnaanas> fabriceflorin: Support!
18:34:29 <Lydia_WMDE> RandomDSdevel: ahhhh ok. so there generally should not be an entry on wikidata for a file - if at all there should be an entry for the work there which can be referenced in a statement on the file on commons
18:34:37 <Lydia_WMDE> RandomDSdevel: clearer now?
18:34:46 <Eloquence> Jheald, *nod* so there are two approaches to this - making forms smarter, or supporting some kind of limited plain text editing
18:35:12 <DanielK_WMDE> VisitorQQQ: it will be possible to represent that. Not to enter it as a text. Making a system intelligent enough to understand all possible sentences of that kind, in a ton of languages, would be extremely hard.
18:35:20 <Eloquence> I'd be inclined towards the former because validating any kind of plaintext format would be very hard
18:35:25 <RandomDSdevel> Lydia_WMDE: OK…but would the Wikidata entry show the files from which it has been linked to?
18:35:45 <fabriceflorin> Susannnaanas: Yes, good support will be key for this project. The current plan is to prototype this project on a separate beta site for many months, so we all have a chance to test these tools extensively, and refine both the workflow and the documentation for end-users.
18:36:01 <Lydia_WMDE> RandomDSdevel: something for the todo list - had not considered this yet
18:36:08 <Romaine> are categories from the file pages to be added to the structured data?
18:36:17 <Eloquence> Romaine, categories are a fun topic :)
18:36:22 <RandomDSdevel> fabriceflorin: If you mess with the text editor, make sure that you don't break wilEd in the process, 'kay?
18:36:38 <Lydia_WMDE> RandomDSdevel: wilEd?
18:36:48 <mvolz> Here's an example of a wikidata entry for a work, which has a file on commons:
18:36:49 <VisitorQQQ> DanielK_WMDE, what about the opposite? Will the property p18 (image) have some effect on Commons or will that connection be done with bots?
18:36:54 <Eloquence> All information that is represented in categories should be ideally represented as Wikidata properties, but that doesn't mean categories have to go away
18:37:07 <PKM_> Oh please make them go away
18:37:15 <RandomDSdevel> Lydia_WMDE: It's a gadget by…um, his name starts with a 'C,' I think?
18:37:17 <fabriceflorin> Susannnaanas: I’m sorry, I may have misunderstood you. Did you just say that you support the idea of a ‘data section’ at the bottom of the file page?
18:37:19 <Eloquence> We've discussed retaining the category system as a shortcut to add a lot of properties in one go
18:37:30 <Lydia_WMDE> RandomDSdevel: ok thx
18:37:43 <RandomDSdevel> Lydia_WMDE: You're welcome.
18:37:47 <PKM_> Eloquence: that makes sense. Parse the categories...
18:37:50 <Romaine> Eloquence: categories are the basic way of finding images on Commons
18:37:54 <Susannnaanas> fabriceflorin: Yes, but supporting generally is a good idea as well :)
18:38:00 <Eloquence> [[Category:Churches in Rome]] would add properties identifying it as a photograph of a church and as location: Rome
18:38:00 <multichill> We talked about a smooth transition from categories to Wikibase statements quite a lot, no easy solution
18:38:01 <DanielK_WMDE> VisitorQQQ: i could imagine turning that into an actual image usage, tracked like we track image usage on wikipedias. but there are no concrete plans yet.
18:38:19 <Eloquence> Romaine, yes. there's no need to remove them anytime soon, as far as I can tell.
18:38:30 <Eloquence> this can play out organically. if the new system offers greater benefits eventually they can be phased out.
18:38:45 <Jheald> In terms of UI, I saw a slide with tabs across the top. Is this still a live idea? It doesn't seem a good one to me: eg we want to make sure readers see the licensing, and credit information, it can't be hidden behind a tab they don't look at
18:38:48 <fabriceflorin> romaine: Here is a first visualization of how categories and topics could co-exist nicely in our user interfaces:
18:38:49 <Eloquence> the only tricky part is managing synchronization, which will probably require some bot work.
18:39:13 <Romaine> an alternative system can exist, but I think categories will stay needed
18:39:18 <Keegan> Jheald: it's not a live idea anymore, the tabs
18:39:24 <Eloquence> note that Wikidata items for categories can be used to create multilingual labels for them, so they don't have to stay English-only.
18:39:35 <Romaine> maybe categories can be organized in a more dynamic way
18:39:38 <PKM_> Eloquence: great
18:39:45 <Jheald> In my opinion categories probably need their own structured data -- to document Wikidata items that they relate to.
18:39:47 <Keegan> (these are all just ideas at this point)
18:40:06 <RandomDSdevel> Maybe categories and properties could be linked together in some way in order to ease the transition?
18:40:22 <Jheald> Keegan: good
18:40:31 <Eloquence> RandomDSdevel, yeah, that's what we just described earlier -- you'd have a wikidata item for a cat, which describes the properties that it implies
18:40:39 <Eloquence> so you can then use the category to "insta-add" properties
18:40:53 <PKM_> Eloquence: like that a lot, actually.
18:40:54 <Keegan> as well as find it in a language that is not English!
18:40:54 <Eloquence> there are some existing properties on for that, multichill may have a link handy
18:40:56 <fabriceflorin> Jheald: We will want to explore a number of possible user interface solutions to make it easy for folks to migrate, view or edit structured data. The tabs are just one possible method for doing this, there are others, which can be explored through user studies, iteratively.
18:41:00 <RandomDSdevel> Eloquence: That's good, but what about Jheald's concerns?
18:41:08 <Lydia_WMDE> <- one category with more information that we can use intelligently
18:41:08 <gi11es> Romaine: I expect that the part of categories that will survive the longest is the way they're used for people's workflows. when it comes to topics (what's depicted by the file), I think that structured data will be superior to the point of people not bothering to use categories for that anymore. but let's see how it pans out, there's definitely no urge on our end to touch categories
18:41:18 <Jheald> Eloquence: you should also be able to define properties for a category, to insta-add pictures to the category
18:41:29 <Lydia_WMDE> Jheald:
18:41:31 <Lydia_WMDE> :)
18:41:45 <PKM_> Can it work both ways, add cats based on properties?
18:41:56 <multichill> Eloquence: ?
18:41:58 <Eloquence> that's probably easier to do via bots for now
18:42:16 <bawolff> Perhaps I missed it, but is there a description on how categories and "topics" are different
18:42:16 <Eloquence> I expect that there'll be a lot of interesting problems to hack on for intrepid bot writers in all of this ;-)
18:42:20 <gi11es> and if any migration of information from categories to structured data (and back?) happens, it'll be done by the community
18:42:26 <DanielK_WMDE> Jheald: many categories are already described on wikidata
18:42:27 <bawolff> Or what a "topic" concretely is?
18:43:18 <fabriceflorin> Susannnaanas: Glad you support both the ‘data section’ idea AND more ‘support’ overall :)
18:43:18 <PKM_> You've mentioned tagging files. Will tags be properties or something different?
18:43:28 <Lydia_WMDE> bawolff: ok so a "topic" is something like "Berlin" in "picture taken in:Berlin". they are translatable via wikidata and more information about them can be found via wikidata
18:43:37 <Lydia_WMDE> bawolff: they are connections to wikidata items basically
18:43:45 <Eloquence> PKM_, properties, but we've discussed having a catch-all property that is exposed through UIs and users can then specialize if they want to
18:43:46 <multichill> PKM_: Property -> Topic
18:44:12 <Eloquence> so e.g. if you upload through upload wizard, you'd add a generic "topic" or "about" property, which other users can then specialize if needed
18:44:12 <Jheald> DanielK_WMDE: I know, I ran the stats. The overwhelming number aren't. And the big question is: do we *want* every cat to have a Q-number on wikidata?
18:44:25 <VisitorQQQ> Lydia_WMDE, I think this item should be represented as a query
18:44:31 <Lydia_WMDE> bawolff: so they are different from categories in that they are not free-text and that they are translatable and that you can find more data about them
18:44:39 <DanielK_WMDE> PKM_: to avoid confusion: "depicts" would be a property, "church" would be a topic that could be assigned to that property in order to "tag" the image.
18:44:41 <bawolff> So the presumption is that topics will be superior to categories, and thus preferred by users, due to their translatability?
18:44:52 <fabriceflorin> PKM_: The idea is that you could attach topics or categories as qualifiers for the ‘About’ property about a file, as visualized in this mockup for a future ‘data editor’:
18:44:55 <Lydia_WMDE> VisitorQQQ: in the end yes but this will help us enormously in the transition
18:44:57 <PKM_> So "16th century oil on panel paintings in the United Kingdom" is a category that says many things about a file.
18:44:58 <gi11es> and they can be specialised, it doesn't have to be just a generic "tag" or "topic"
18:45:09 <Romaine> Jheald: the difficulty is that there is still no solution for how to connect Commons in the two ways that are possible
18:45:19 <bawolff> Or are there other properties that make them special?
18:45:23 <DanielK_WMDE> Jheald: if that would be useful for something, i don't see why not.
18:45:32 <Lydia_WMDE> bawolff: in the long run yes
18:45:32 <Romaine> Category:Netherlands and article Netherlands both have a relation with the same category on Commons
18:45:33 <Eloquence> bawolff, categories are very difficult to use because of their hierarchy and highly specialized nature.
18:45:43 <fabriceflorin> What’s wonderful about supporting categories as part of structured data is that they contain ‘bundles of topics’, which makes them very practical for adding more qualifiers about a file.
18:45:57 <Eloquence> most new users cannot categorize files correctly, and even experienced users often get it wrong.
18:46:17 <bawolff> Eloquence: That's a social construct though, what's to stop the users from doing the same thing with topics
18:46:26 <Romaine> fabriceflorin: sounds interesting
18:46:34 <Eloquence> bawolff, autocompletion :-)
18:46:36 <VisitorQQQ> Eloquence, categories are not that difficult, they are just queries, but with a hierarchy
18:46:49 <Sannita> have you already worked out how to link the topic an image represents to the related item on wikidata? (sorry, I don't know if this question has already been asked, I was late and I tried to catch up)
18:46:54 <bawolff> So the killer feature is hotcat :P
18:47:04 <RandomDSdevel> Eloquence: I agree with VisitorQQQ on this one.
18:47:12 <Eloquence> bawolff, more like the Wikidata search box
18:47:18 <Sannita> i.e. Image represents X (link to Qxxx)
18:47:19 <Jheald> DanielK_WMDE: we're not worried about adding 3.5 million solely wiki-specific new items to Wikidata ?
18:47:28 <fabriceflorin> But it’s also wonderful to be able to add single topics about a file, because that will make search a lot more effective: we could then search using simple keywords like ‘cat’ ‘roof’, etc. — without having to know category names.
18:47:29 <Romaine> Eloquence: when it is possible to upload files directly from articles on Wikipedia with categories automatically added, would help much
18:47:31 <gi11es> Sannita: yes, most properties hosted on commons that we're talking about can just link to wikidata items
18:47:50 <DanielK_WMDE> Jheald: if they are so specific, then it's probably not a good idea. and i can't speak for the wikidata community.
18:47:56 <Sannita> gi11es: ok, but have you figured out how? (a bit interested in that)
18:48:01 <PKM_> The thing with categories is, how do you know which ones exist? Painted portraits or portrait paintings?
18:48:15 <Eloquence> Romaine, interesting idea - we should consider that when we do deeper editing integration of the upload workflow
18:48:21 <gi11es> Sannita: from a technical perspective?
18:48:25 <Sannita> yep
18:48:27 <DanielK_WMDE> Jheald: if there is a good reason to have some categories on wikidata, then we should. but not just for the sake of it
18:48:41 <RandomDSdevel> PKM_: Wouldn't those be candidates for a merger anyway?
18:48:41 <fabriceflorin> So being able to support both categories and topics seems like the best of both worlds.
18:48:53 <PKM_> RandomDSdevel: yep.
18:49:12 <VisitorQQQ> PKM_, they are user defined, if users agree that it is needed, then it is created
18:49:12 <Romaine> we now say to people: go to the article and click upload, the next development we'd like to have is that the file is directly added to the categories associated with that article
18:49:15 <Keegan> I'll hotcat one way and you autofill the other :)
18:49:22 <gi11es> Sannita: I think the idea is that the wikibase repo on commons will be aware of the syntax of wikidata identifiers
18:49:27 <Jheald> DanielK_WMDE: If they're so specific, it's useful to have *somewhere* to record (in terms of Ps and Qs) what it is that they specify
18:49:40 <gi11es> Sannita: is there a particular technical challenge that you are expecting?
18:49:52 <DanielK_WMDE> Romaine: in-context uploads would be nice (but could also lead to a flood of crap).
18:50:07 <Romaine> fabriceflorin: I agree, having both would give it the dynamics and the stability of categories and of tags
18:50:18 <Sannita> gi11es: it's rather long, don't want to get in the way of the office hour, talk about it later ;)
18:50:19 <DanielK_WMDE> Romaine: having an actual api for structured metadata would make that easier, but it's a project of its own, in my opinion.
18:50:27 <VisitorQQQ> Jheald, I think at some point some DB integration will be needed to be able to answer queries like "all works by author X", if the works only have an item in Commons or in Wikidata, it won't work
18:50:39 <Romaine> DanielK_WMDE: a link under "Upload file" with "Upload file about this topic" would be nice I think
18:50:43 <Romaine> in the sidebar
18:50:45 <PKM_> I see a huge documentation and data mapping opportunity
18:50:46 <Keegan> (ten minutes left)
18:50:54 <gi11es> Sannita: sure thing, just send an email to the wikidata and/or multimedia mailing lists
18:51:17 <Lydia_WMDE> Sannita: let's talk after the office hour about that
18:51:18 <Eloquence> DanielK_WMDE, we should think more about up-leveling the architecture for categories -- we will either have a lightly coupled system with bots doing sync work, or we can try to support/represent categories better using wikidata items natively
18:51:43 <DanielK_WMDE> Jheald: yes, i agree that this kind of mapping is useful in general. whether it means we should have q-items for *all* categories i can't say off-hand.
18:51:44 <fabriceflorin> Romaine: Glad you like the idea of supporting both categories and topics. It does seem to give us the most flexibility. In fact, one of the ways folks could contribute to this effort in coming months would be to start adding important categories as items on Wikidata, while we are busy working on some of the harder technical issues.
18:51:48 <Sannita> Lydia_WMDE: +1
18:51:53 <Keegan> (though I note there is not another meeting scheduled in here today, so conversation doesn't have to necessarily end
18:51:54 <Keegan> )
18:52:05 <Romaine> fabriceflorin: it gives both flexibility and stability
18:52:16 <RandomDSdevel> fabriceflorin: I can help with that!
18:52:24 <Romaine> categories provide the navigational structure on Commons
18:52:46 <DanielK_WMDE> Eloquence: lots of potential there. but Jheald is right that we have to manage the mapping somewhere. probably on wikidata.
18:52:47 <multichill> We're already connecting topic and category items on Wikidata. We can extend on that system
18:53:01 <fabriceflorin> Romaine: I agree with you on both points. (and I owe you an email response too, coming soon :)
18:53:14 <Romaine> :)
18:53:26 <PKM_> Is there a visual representation of the Wikidata structure somewhere?
18:53:34 <Keegan> For the non-tech folk like me, there's also this project starting up:
18:53:42 <multichill> See for example
18:53:52 <Lydia_WMDE> PKM_: you mean the data model?
18:53:55 <Lydia_WMDE> or something else?
18:54:00 <PKM_> Data model
18:54:00 <Keegan> An organized and systemic approach to adding or fixing machine-readable data across all Wikimedia projects
18:54:05 <Jheald> DanielK_WMDE: I just worry about filling up Wikidata with too much stuff that has not so much real-world relevance
18:54:06 <Lydia_WMDE> PKM_: yes - one sec
18:54:14 <Keegan> *systematic
18:54:19 <Keegan> Not systemic.
18:54:24 <PKM_> Keegan: I'm in
18:54:40 <Keegan> Excellent
18:54:46 <DanielK_WMDE> Jheald: commons is real-world. wikidata already models millions of wikipedia categories, disambiguation pages, policy pages, help pages, etc... i wouldn't worry too much
18:54:46 <RandomDSdevel> @Keegan: You 'ninja'd me!
18:54:47 <Lydia_WMDE> PKM_: has which is the most important stuff
18:55:01 <PKM_> Lydia_WMDE: thank you!
18:55:06 <Lydia_WMDE> you're welcome
18:55:10 <Romaine> Keegan: sounds interesting but difficult to find all the issues
18:55:26 <Romaine> I have done some harmonisation with the date fields etc on Commons
18:55:30 <RandomDSdevel> DanielK_WMDE: I think Jheald is worried about server space.
18:55:36 <Keegan> Romaine: Guillaume is building an awesome tool on labs that will be up and running very soon
18:55:55 <Jheald> One problem with Wikidata is that, with bots limited to 12 edits/minute, it can take months to add millions of properties
18:55:57 <bawolff> Disk space is cheap :)
18:55:58 <Romaine> interesting
18:56:00 <Keegan> It finds the files and helps fix them, like the add-information gadget does on commons
18:56:27 <DanielK_WMDE> RandomDSdevel: raw storage space is not an issue. scaling some of the database indexes might be once we go beyond 100 million items. that's still a long way off.
18:56:28 <Lydia_WMDE> Jheald: the development of the software will also take months though
18:56:39 <bawolff> And also insignificant compared to the size of files that get uploaded to commons
18:56:44 <RandomDSdevel> bawolff: On the order of…? And that's in bulk, right?
18:56:59 <Jheald> RandomDSdevel: I'm more worried about the size of dumps, and the ability for people to use clones of the WD database to do cool things, without getting all the wikicruft as well
18:57:10 <DanielK_WMDE> Jheald: going slow is good. there is no need to rush. the old stuff is not going to break.
18:57:21 <DanielK_WMDE> Jheald: going slow means people can check, backtrack, improve, etc
18:57:26 <fabriceflorin> One of the things we will want to firm up soon is the data model for some of the core properties used by the multimedia tools we maintain (e.g. Upload Wizard). So if anyone is interested in working with us on the data model, that would be really helpful. Perhaps we could even form a little workgroup, if enough folks are interested.
18:57:43 <Lydia_WMDE> Jheald: if things are marked as the right thing it is easy to exclude them from dumps for example
18:57:50 <Lydia_WMDE> like "exclude all templates"
18:58:03 <DanielK_WMDE> Jheald: we may need the ability to filter the dumps by some kind of item-type. e.g. "exclude wikimedia internal stuff".
18:58:09 <DanielK_WMDE> i like that idea
18:58:12 <bawolff> RandomDSdevel: I don't know specifics, it's just really not a sort of thing I'd worry about
18:58:17 <RandomDSdevel> Jheald: By the time this is all done, we probably won't need clones, right? We are working on cross-wiki data transfer, are we not? That would relegate data capture to some API or something…
18:58:41 <multichill> DanielK_WMDE: All our internal junk is instance of "Wikimedia Category Page" "Wikimedia ...." etc
18:58:45 <Keegan> Time to start wrapping up!
18:58:55 <DanielK_WMDE> multichill: yea. filtering dumps based on that would be nice
18:58:58 <Keegan> The hour will officially end in a couple of minutes
18:59:04 <Jheald> RandomDSdevel: I'm not talking about us. I'm talking about AI startups, Skynet, etc
18:59:06 <Romaine> DanielK_WMDE: indeed, slow is fine, the backlog to get fields currently in templates ready for structured data will take a long time
18:59:33 <Keegan> Like I said, this channel is not booked for the next hour so continued chatter is fine, but the official part will end and some will be going back to work :)
18:59:34 <Lydia_WMDE> any last burning questions?
18:59:37 <PKM_> Can someone add the important initiatives to the project page so people can sign up to participate?
19:00:07 <Jheald> Fabrice: Need data model for API. But also data model at a file / item level -- much more I think still to do on that
19:00:10 <RandomDSdevel> Jheald: But we could eventually just provide an API for data access instead of letting external, non-wiki sites make clones of the databases, right?
19:00:11 <Lydia_WMDE> PKM_: cleanup is already linked on the main page. we'll be adding more there as they come up
19:00:16 <Keegan> PKM_: Yes, we will be populating
19:00:18 <Lydia_WMDE> PKM_: and you can subscribe to the newsletter linked there
19:00:25 <PKM_> Lydia_WMDE: cool.
19:00:27 <Keegan> Okay, that's all folks!
19:00:30 <Keegan> #endmeeting
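
Editor's note: the "insta-add" idea discussed between 18:38:00 and 18:41:45 (a category such as "Churches in Rome" implying a set of properties, and PKM_'s question about going the other way) can be sketched roughly as below. This is only an illustration under stated assumptions: the CATEGORY_IMPLIES table, the property names ("depicts", "location", "instance of") and both helper functions are hypothetical and do not correspond to any existing Wikibase or Commons API.

```python
# Hypothetical table: each category name maps to the (property, value)
# statements it is assumed to imply. In the discussion this mapping would
# live on a Wikidata item for the category; here it is a plain dict.
CATEGORY_IMPLIES = {
    "Churches in Rome": {("depicts", "church"), ("location", "Rome")},
    "Paintings in the United Kingdom": {
        ("instance of", "painting"),
        ("location", "United Kingdom"),
    },
}


def statements_for(categories):
    """Category -> properties: union of the statements implied by each category.

    Categories without an entry in the table contribute nothing.
    """
    statements = set()
    for category in categories:
        statements |= CATEGORY_IMPLIES.get(category, set())
    return statements


def categories_for(statements):
    """Properties -> category: categories whose implied statements are all present."""
    present = set(statements)
    return sorted(
        category
        for category, implied in CATEGORY_IMPLIES.items()
        if implied <= present
    )
```

In this sketch, adding a file to "Churches in Rome" would yield the two statements Eloquence mentions, and a file that already carries both statements would be matched back to that category. Keeping the two directions in sync on a real wiki is the part the participants expect to need bot work.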