User talk:Yurik/Storing data

From Meta, a Wikimedia project coordination wiki

Data scope, and new hosting wiki

First, what kind of data are we talking about here? Simple locations are already on Wikidata, and geoshapes are planned, and will allow for useful qualifiers like start/end date for shapes. A lot of the content that would be used to fill in a graph is likewise already on Wikidata. Is this about more general agglomerations of existing data that are more presentation-relevant than content-relevant?

Re a new domain for hosting this data, if I recall correctly, there were some plans to create a new central wiki for global gadgets, Lua modules, and templates. Perhaps such a central wiki could also host this kind of data. --Yair rand (talk) 00:04, 18 February 2016 (UTC)

Yair rand, for maps, it will be great to have geoshapes in Wikidata, but there are a few issues: a geoshape relates to a single place, but to draw a map you need several of them. For example, you might need all countries, or all states, or districts, etc. Getting 200 country shapes one by one is bad, so we might have to build a service that combines the results of a SPARQL query and re-encodes them as TopoJSON to optimize the download (TopoJSON can reuse the same vertices for multiple objects). All this is doable, and in my view, even preferable to storing TopoJSON files as blobs. P.S. I would highly recommend using TopoJSON for retrieval and possibly storage: it is much more compact than GeoJSON and can be easily converted both ways.
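To illustrate the point about reusing vertices, here is a minimal sketch in Python. It is not the real TopoJSON library or file format, only a simplified imitation of its arc idea: shared borders are stored once as "arcs", and each ring is a list of arc indexes, where `~i` means arc `i` traversed in reverse. The shapes, the names (`geojson_rings`, `arcs`, `topo_rings`, `decode_ring`, `count_points`) and the coordinates are all made up for this example; real TopoJSON also quantizes and delta-encodes coordinates, which is not shown here.

```python
# Two adjacent rectangles, A (left) and B (right), sharing the border x = 1.

# GeoJSON-style storage: every ring lists all of its own coordinates,
# so the four points on the shared border appear in both rings.
geojson_rings = {
    "A": [[1, 0], [1, 1], [1, 2], [1, 3], [0, 3], [0, 0], [1, 0]],
    "B": [[1, 0], [2, 0], [2, 3], [1, 3], [1, 2], [1, 1], [1, 0]],
}

# TopoJSON-style storage: the shared border is stored once, as arc 0.
arcs = [
    [[1, 0], [1, 1], [1, 2], [1, 3]],  # arc 0: shared border, bottom to top
    [[1, 3], [0, 3], [0, 0], [1, 0]],  # arc 1: the rest of A
    [[1, 0], [2, 0], [2, 3], [1, 3]],  # arc 2: the rest of B
]

# Each ring is a list of arc indexes; ~i (one's complement) means
# "arc i, reversed". Note that ~0 == -1 in Python.
topo_rings = {"A": [0, 1], "B": [2, ~0]}


def decode_ring(arc_indexes, arcs):
    """Stitch a ring's arcs back into a plain list of coordinates."""
    ring = []
    for i in arc_indexes:
        points = arcs[i] if i >= 0 else arcs[~i][::-1]
        # The first point of each later arc repeats the previous arc's
        # last point, so it is skipped when stitching.
        ring.extend(points if not ring else points[1:])
    return ring


def count_points(list_of_lines):
    """Count how many coordinate pairs are stored in total."""
    return sum(len(line) for line in list_of_lines)


if __name__ == "__main__":
    for name, arc_indexes in topo_rings.items():
        # The arc-based form converts back to the original rings.
        assert decode_ring(arc_indexes, arcs) == geojson_rings[name]
    print("GeoJSON-style points:", count_points(geojson_rings.values()))
    print("Arc-based points:", count_points(arcs))
```

In this toy case the saving is small, but it grows with the length of the shared borders, which is the typical situation for neighbouring countries or districts.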
For data, the issue is even harder. There are tons of arbitrary data, both treelike and flat, needed for graphs and tables. Basically every piece of data that currently sits in a table in the wikis could be moved to a storage like that and shared across all wikis, used in Lua to generate tables, and used for Graphs, across multiple languages. For data examples, see graph demos - all of them would need it.
Lastly, a single new domain shared by both wiki code and data seems like a bad idea. They would have very different problems and need very different communities. Code will need to be checked by devs (that includes shared templates too), and data will mostly be subject to legal checks, e.g. are we allowed to host it, etc. So I'm not sure sharing would be good here. --Yurik (talk) 02:02, 18 February 2016 (UTC)
Working directly with Vega and data blobs is complicated. Tbh, having those curated in a dev-heavy environment might not be a bad idea. This might no longer make sense if we get some tools to greatly simplify editing data blobs, though.
In any case, to figure out whether these would fit on the same wiki, it's important to figure out how the editing dynamics and curation surrounding each of them would work. Presumably, the editing of data blobs, maps, and templates needs to be firmly integrated into the "client wikis". This is actual content. If the blobs aren't editable directly from the projects using them, they won't get centralized in the first place. (There will need to be at least an option to have these on client wikis in any case, for various reasons.) Templates, Lua modules, gadgets, some maps, and graphs will need localization, but I don't really know whether data blobs will. Data blobs and some templates might both need the help of some bots. Lastly, maps and data blobs will probably also require a community specifically dedicated to them.
Data blobs and maps share some parallels with Commons media, and curating these will probably involve some similar policies. However, diluting Commons' scope with non-media files may be harmful to the functioning of both.
How about an out-of-the-box option: Beta Wikiversity. The wiki seems to currently function as an incubator for new Wikiversity projects, and as a central hub for Wikiversities. Wikiversity, which is dedicated to "learning resources", might be just what's needed for graphs, maps, and similar pieces of content. Obviously, the project itself would have to approve this, but it seems much more within its mission than any of the other options here. (I'm also optimistic that this option might result in some much-needed development support for the neglected project. :) ) --Yair rand (talk) 19:18, 24 February 2016 (UTC)
Yair rand, interesting thought! Could you engage Wikiversity folks and see what they think? --Yurik (talk) 22:50, 24 February 2016 (UTC)
@Yurik: Posted here. Also notified wikiversity-l and the English Wikiversity colloquium. --Yair rand (talk) 17:12, 25 February 2016 (UTC)
@Yair rand: thanks, I posted a link to a related doc there. Also, milimetric (WMF) has expressed his desire to lead this effort, so connecting everyone :) --17:37, 25 February 2016 (UTC) (The preceding unsigned comment was added by Yurik (talk).)
(I think that ping might not have gone through due to sig typo. Re-pinging @Milimetric (WMF):, just in case.) --Yair rand (talk) 17:53, 25 February 2016 (UTC)
I got the second ping :) I am definitely going to work on this, but I have to finish some immediate commitments and incorporate that work in a way that will give the community veto power, just due to the total lack of structure that we're about to discover we're in as an organization. But yeah, I'm stoked about this, and will come back to this page at the latest when this quarter's work is done. Milimetric (WMF) (talk)

commons

Evad37, when you say that "Content has to be public domain or freely licensed in both the source country and the U.S.", do you mean this only applies to Commons? --Yurik (talk) 23:05, 23 February 2016 (UTC)

Yes - while WMF servers are located in the U.S., and so legally content only has to comply with U.S. law, the licensing policy on Commons requires compliance with the copyright laws of the source country too, which may be stricter than or otherwise different from those of the U.S. (c:Commons:Licensing#Interaction_of_US_and_non-US_copyright_law). English Wikipedia allows works which are public domain in the U.S. only (e.g. see w:en:Template:PD-USonly). Wikidata would want everything to be CC0 (from what I've read), so it would exclude even more data than Commons (e.g. CC-BY and CC-BY-SA data). A new domain could, as noted already, choose what rules/licencing it wants to use. I'm not sure about Meta or mediawiki.org, but perhaps they would follow U.S. laws only? - Evad37 (talk) 03:08, 24 February 2016 (UTC)