Grants talk:PEG/Europeana/GLAMwiki Toolset

From Meta, a Wikimedia project coordination wiki

Draft Status

This request was last edited in December 2013. As a matter of housekeeping, if there is no response to this comment by 14 March 2014, we will consider the request withdrawn and change the status accordingly. If you wish to reopen the draft, you can always change the status and resubmit it. Thanks. Alex Wang (WMF) (talk) 19:00, 11 March 2014 (UTC)

As the Wikimedia-liaison coordinator for Europeana, I am reactivating this grant, and moving it to the "drafts" section of Grants:PEG/Requests while it is being developed. Ping Alex Wang (WMF) and Asaf Bartov (WMF Grants). Wittylama (talk) 14:12, 22 December 2014 (UTC)
FYI Alex Wang (WMF) and Asaf Bartov (WMF Grants) - we (Europeana) intend to finalise and submit this grant request before the end of January 2015. OK? Wittylama (talk) 09:40, 15 January 2015 (UTC)
Hi Wittylama. Thank you for the notice. Let us know if you'd like to discuss the draft any time before then. Cheers, Alex Wang (WMF) (talk) 18:15, 15 January 2015 (UTC)
Yes Alex, that would be most helpful. I've spoken about this grant at length with Asaf at Wikimania London, and on-and-off with him and many other people over the months since. Would Monday be possible (Skype, Google Hangout...)? 10pm my time is 1pm your time. Wittylama (talk) 23:33, 15 January 2015 (UTC)
Hi Wittylama. I'll send you an email so we can coordinate the best time to talk. Alex Wang (WMF) (talk) 23:37, 15 January 2015 (UTC)

Question

"Currently, the primary users of the tool are people already familiar with Wikimedia Commons and have the technical capability for mass-upload using their own scripts." Are these two separate groups or are they one and the same? That is, are there any people using the tool who are NOT familiar with/using Commons? Whiteghost.ink (talk) 22:43, 12 February 2015 (UTC)

    • While there are SOME people who have used the tool who are NOT already familiar with Commons, they are a small minority. The sentence was meant to indicate that the two groups you saw are one and the same. I've changed it to "Currently, the primary users of the tool are those people already familiar with Wikimedia Commons who have the technical capability for mass-upload using their own scripts." I hope that makes more sense :-) Wittylama (talk) 15:07, 13 February 2015 (UTC)

CSV

Comma separation is a great technology, but how would this interact with our use of commas in categories and sometimes descriptions? Jonathan Cardy (WMUK) (talk) 15:27, 13 February 2015 (UTC)

CSV is actually a terrible format in my opinion (the escaping rules are weird and different people disagree on them; I like TSV much better), but there are standard ways to escape commas in CSV files, so that sort of thing is usually not too much of an issue. Bawolff (talk) 15:52, 13 February 2015 (UTC)
Jonathan Cardy (WMUK), if I understand your point correctly then I think you're confusing the input for the output. The "accept CSV" element of the grant is about what kind of document containing all the file's metadata will be accepted by the system - which is then converted during the metadata-mapping process. Currently the only thing the system accepts is a file in the XML format. This is not about adding commas into Wikimedia Commons at the end of the process. Does that help?
Bawolff - If the database that you're exporting from is worth its salt then it should be able to export in CSV or TSV at your discretion - and should do so either way in a consistent manner. There is always the intermediate step of manually mapping the metadata you upload (as XML, JSON, CSV...) to the relevant Commons template fields - so it's never going to automatically put the metadata in the wrong place. However, I don't know how that would work if there were commas in the originating metadata information itself (e.g. in the filename). That's something that would need more technical knowledge - which by your question I'm guessing you have, and by my answer, I'm sure that I don't! :-) Wittylama (talk) 19:40, 13 February 2015 (UTC)
There's an example of how commas in CSV work at w:Comma-separated_values#Example. But basically, if the field is in quote marks then the comma is not considered a delimiter, e.g. foo,"bar, baz" is two fields, not three. Bawolff (talk) 19:45, 14 February 2015 (UTC)
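
As a hedged illustration of the quoting rule described above, here is a minimal Python sketch using the standard library's csv module. The helper names parse_csv_line and write_csv_line are made up for this example and are not part of GWToolset; it only shows how a quoted comma survives a round trip.

```python
import csv
import io


def parse_csv_line(line):
    """Parse one CSV record into a list of fields (hypothetical helper)."""
    return next(csv.reader(io.StringIO(line)))


def write_csv_line(fields):
    """Serialise a list of fields as one CSV record without a line ending."""
    buffer = io.StringIO()
    # The default dialect quotes only fields that need it, such as fields
    # containing the delimiter or a quote character.
    csv.writer(buffer, lineterminator="").writerow(fields)
    return buffer.getvalue()


# A comma inside quote marks is data, not a delimiter: two fields, not three.
fields = parse_csv_line('foo,"bar, baz"')
print(fields)
print(write_csv_line(fields))
```

A metadata value such as a filename or category containing a comma would therefore need to be quoted by whatever system exports the CSV.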

I would argue for using JSON instead of CSV. Beyond what I outlined below, JSON is a well-defined format while CSV has lots of flavors which cause incompatibility problems. Some tools assume ISO-8859-1 character encoding, some assume UTF-8, some look for a BOM to autodetect UTF-8-ness. Some break if there is a BOM. Some use \" to escape quotes, some use "". Despite the name, some applications actually use a semicolon as the field separator. Sometimes the same application might alternate between one and the other, depending on region settings (Excel is infamous for doing that). --Tgr (WMF) (talk) 21:49, 16 February 2015 (UTC)
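
To make the "flavors" problem concrete, here is a small Python sketch, under the assumption that the importer knows (or asks the user for) the delimiter. The function name read_csv_bytes and its parameters are hypothetical. It relies on Python's "utf-8-sig" codec, which decodes UTF-8 input whether or not a BOM is present.

```python
import csv
import io


def read_csv_bytes(data, delimiter=",", encoding="utf-8-sig"):
    """Decode raw bytes and parse them as CSV rows (hypothetical helper).

    "utf-8-sig" strips a leading BOM if there is one and otherwise behaves
    like plain UTF-8. The delimiter is not guessed; the caller must state it.
    """
    text = data.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# A semicolon-separated export with a BOM, as some spreadsheet tools produce.
with_bom = b'\xef\xbb\xbftitle;creator\r\n"Vase; blue";Anon\r\n'
print(read_csv_bytes(with_bom, delimiter=";"))
```

Reading the same bytes with the default comma delimiter would parse without error but yield wrongly split rows, which is the kind of silent incompatibility the comment above warns about.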

Future of flat inputs in a Wikidata-based world

Right now GWT maps to MediaWiki template key-value lists, which is essentially a flat format. GWT only understanding flat inputs is an inconvenience, as it puts the burden of flattening on the user, but it does not limit its expressive power. Wikidata, though, has a tree format (statements can have qualifiers which are themselves statements, or refer to other data items which contain a list of statements) and some image metadata is inherently hierarchical (e.g. artwork -> derived artwork -> physical manifestation), so image metadata will probably take advantage of that tree structure. This means that metadata mapping will be the mapping of one tree to another tree; if GWT can only handle flat inputs, that will be a serious limitation. So the goal "Support non-flat metadata formats" seems like a dependency of "Prepare for structured metadata" to me.

For the same reason, you might want to consider making JSON the main goal and CSV the stretch goal in "Accept CSV dataformat for input". CSV is a flat format, JSON is a hierarchical one; it is easy to convert CSV to JSON with existing tools, while converting JSON to CSV is not really possible in general. --Tgr (WMF) (talk) 21:32, 16 February 2015 (UTC)
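
The asymmetry between the two formats can be sketched in a few lines of Python. This is an illustration only, not GWToolset code: csv_to_json and flatten are hypothetical helpers, and the dotted-path keys are just one possible flattening convention.

```python
import csv
import io
import json


def csv_to_json(text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(text, newline="")))
    return json.dumps(rows)


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths.

    This is the "burden of flattening": the tree shape survives only as a
    naming convention in the keys, and records of different shapes end up
    with different sets of columns.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Flat to hierarchical is mechanical.
print(csv_to_json("title,creator\nVase,Anon\n"))

# Hierarchical to flat needs an invented convention for the column names.
record = {"artwork": {"title": "Vase", "derived": [{"title": "Photo"}]}}
print(flatten(record))
```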

Maintenance and longevity

Hi :) My main concern with this grant comes from the simple fact of it being a grant, i.e. financing software with a one-off injection of money. I really believe there is a need for someone's job description to include "correct bugs of GLAMwiki Toolset" and for this job to be as perennial as Wikimedia Commons and GLAM partnerships. The fact is that, even with this grant, "my" bug (which is critical for me, since I can't use the GWT) will not be corrected yet. If we want the GWT to be serious, widely used software (and I want that), the Wikimedia movement really needs to put more money on the table than this. Léna (talk) 16:46, 18 February 2015 (UTC)

Just as a note, the bug has now been resolved. It appears that the issue was with WMF's beta cluster infrastructure, and not the GWToolset per se. However, the fact that it took this long to be looked into basically proves Léna's point. Bawolff (talk) 20:37, 21 February 2015 (UTC)

Withdrawn

As per the announcement I just made, this grant is now withdrawn. As mentioned in that email, we will return with a grant request for a proof of concept for the GWT as an independent system. Ping Alex Wang (WMF), Siko (WMF), and Asaf Bartov (WMF Grants). Wittylama (talk) 12:09, 24 February 2015 (UTC)

Thank you. We look forward to it. Asaf (WMF) (talk) 17:28, 25 February 2015 (UTC)

Following this, to complete the circle, here's the link to the subsequent announcement that we're pulling out of development altogether. Wittylama (talk) 14:56, 16 April 2015 (UTC)

Thanks. It's good we have the full documentation on this on-wiki. Asaf (WMF) (talk) 19:42, 16 April 2015 (UTC)