Talk:Beyond categories

From Meta, a Wikimedia project coordination wiki

Moving towards a structured storage and tagging system

A possible solution, which has been mentioned several times, would be to move Commons to a structured system for storing file metadata (similar to Wikidata, but not necessarily the same). Once this is done, it would be easier to get rid of categories, using Wikidata items as tags. --Micru (talk) 22:37, 5 May 2013 (UTC)

So what could we, the WM community, do to achieve this goal, and how? -- Kosboot (talk) 01:30, 6 May 2013 (UTC)
In the long run there ought to be a better way for the community to express technological concerns to developers. This is entirely a development problem, but the impetus for fixing it is community demand (and funding/staff). This is a large development problem, so if it is to be fixed, community support has to be behind it.
The role of the community is to describe the desired end product, talk things through, then create a request for exactly what we want done. Ideally, these requests would all go to the same public forum, into some kind of queue for standing proposals, but no such forum accessible to the community exists right now. The developers would probably be the ones to decide how they want to receive proposals of this sort. For now, the proposal could go on this page. Blue Rasberry (talk) 14:45, 6 May 2013 (UTC)
What about Tech? It seems to be pretty close to what you're describing. Klortho (talk) 16:01, 6 May 2013 (UTC)
Tech is cool. I think I would like some combination of that and Wikipedia:Perennial proposals. Tech seems to be a place to handle small one-time concerns. Blue Rasberry (talk) 16:39, 6 May 2013 (UTC)

Overall goals and scope

The previous discussion topic was exactly on target, I think; that's what I had in mind when I first made the inquiries to the mailing lists. But I don't think it should be restricted to Commons. Ideally, we'd be able to find a solution that would work across the wikis. I am going to try to write a few paragraphs on the content page about what I think the goals and scope of this effort should be, but they are very tentative. I am not sure how realistic they are, I am very fuzzy about the means to implement them, and I'm sure I will get a lot of the terminology wrong. So I'm just intending that to be a starting point; please feel free to make corrections and adjustments, and/or to discuss it in more depth here on the talk page. Klortho (talk) 02:59, 6 May 2013 (UTC)

Of course this should not just be on Commons. The tags can be translated, and they can form the basis for sharing information cross-wiki. There are probably hundreds of categories for monarchs, for example, and they could change at any time, so there is no reason to translate them definitively. But if we could translate just the word "monarch" without worrying about category intersection, and if we could tie that attribute to the Wikidata identifier for the article, then all languages would instantly get that information when it is applied in just one place. This must be what you are thinking.
This can never happen with the category system, because that system requires an individual translation of every category, since each category encodes an arbitrary intersection. Blue Rasberry (talk) 20:57, 6 May 2013 (UTC)
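A minimal sketch of the idea above, assuming tags are stored as language-independent item identifiers with one label per language. The IDs (`Q_MONARCH`, `Q_PAINTING`), the file name, and the function `tags_in_language` are hypothetical placeholders, not real Wikidata items or a MediaWiki API. The point is that a tag is applied once, by ID, and every language renders it from the label table.

```python
# Hypothetical tag store: each tag is a language-independent item ID
# (placeholder IDs, not real Wikidata items) with per-language labels.
LABELS = {
    "Q_MONARCH": {"en": "monarch", "de": "Monarch", "fr": "monarque"},
    "Q_PAINTING": {"en": "painting", "de": "Gemälde", "fr": "peinture"},
}

# Files (or articles) carry sets of item IDs, not translated strings.
TAGS = {
    "File:Portrait_of_a_king.jpg": {"Q_MONARCH", "Q_PAINTING"},
}


def tags_in_language(page, lang, fallback="en"):
    """Render a page's tags in `lang`, using the fallback label if one is missing."""
    labels = []
    for item in sorted(TAGS.get(page, set())):
        names = LABELS[item]
        labels.append(names.get(lang, names[fallback]))
    return labels


print(tags_in_language("File:Portrait_of_a_king.jpg", "de"))
print(tags_in_language("File:Portrait_of_a_king.jpg", "es"))  # no Spanish labels yet
```

Adding a Spanish label to `LABELS` once would make every file tagged with that item display in Spanish, with no per-category translation.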
For me, the big difference between applying this approach to Commons and to Wikipedia is that most articles on Wikipedia have (or should/could have) a related Wikidata page, whereas images on Commons will likely never be represented on Wikidata. As a result, we could be doing categorization of Wikipedia articles on Wikidata, but we cannot do that with images. User:Multichill proposed installing the Wikibase software on Commons and using it for categorization. --Jarekt (talk) 14:40, 7 May 2013 (UTC)
The ultimate solution will probably have some overlap, for translation if nothing else. After tags are translated into 250 languages for either the Wikipedias or Commons, I am sure there will be some way to migrate all those translations to the other projects, should they want to use any of the same tags. Blue Rasberry (talk) 18:07, 7 May 2013 (UTC)

Wikidata query potential

On the Wikidata mailing list we've been discussing how much data we'd need to replace the category pages with visual query pages. The user experience for current scenarios would remain essentially the same. You could scroll down to the bottom of the article for Arnold Schwarzenegger and click on 20th-century American actors, but instead of going to a category page you would go to a query page that shows you the same results. The advantage would be that a query page gives you many more options about what to do next than a category page does. For example, you could remove the American restriction from the query, which is the same as going to the 20th-century actors supercategory. But you could also add an extra politician restriction, and then you would be looking at 20th-century Americans who are both actors and politicians, which you can't do with the category system. An intuitive visual query page pulling from Wikidata would provide many more compelling user experiences than either a tag system or the current category system. An early prototype of querying Wikidata is here: http://toolserver.org/~magnus/ts2/wdq/ I'd estimate that Wikidata has currently imported about 1/4 to 1/3 of the data we'd need to start getting results comparable to the existing categories. Wakebrdkid (talk) 20:03, 6 May 2013 (UTC)
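The query-page idea can be sketched as a list of (property, value) constraints checked against each item's claims. This is a toy in-memory model, not how Wikidata or the WDQ prototype actually works; `Item`, `run_query`, and the sample items other than Arnold Schwarzenegger are hypothetical. Dropping a constraint broadens the result (like moving up to a supercategory), and adding one narrows it (the intersection a category page cannot offer).

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    label: str
    # Each property maps to a set of values, e.g. "occupation" -> {"actor", "politician"}.
    claims: dict = field(default_factory=dict)


def run_query(items, constraints):
    """Return sorted labels of items that satisfy every (property, value) constraint."""
    return sorted(
        item.label
        for item in items
        if all(value in item.claims.get(prop, set()) for prop, value in constraints)
    )


# Illustrative data; "Hypothetical actor B/C" are made-up placeholders.
ITEMS = [
    Item("Arnold Schwarzenegger", {"occupation": {"actor", "politician"},
                                   "citizenship": {"Austria", "United States"},
                                   "century": {"20th"}}),
    Item("Hypothetical actor B", {"occupation": {"actor"},
                                  "citizenship": {"United States"},
                                  "century": {"20th"}}),
    Item("Hypothetical actor C", {"occupation": {"actor"},
                                  "citizenship": {"United Kingdom"},
                                  "century": {"20th"}}),
]

# Equivalent of the category "20th-century American actors".
base = [("occupation", "actor"), ("citizenship", "United States"), ("century", "20th")]
# Remove the American restriction: like the "20th-century actors" supercategory.
broader = [c for c in base if c[0] != "citizenship"]
# Add a politician restriction: an intersection no single category provides.
narrower = base + [("occupation", "politician")]

print(run_query(ITEMS, base))
print(run_query(ITEMS, broader))
print(run_query(ITEMS, narrower))
```

Constraints are kept as a list of pairs rather than a dict so that the same property (here, occupation) can be constrained more than once.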

You seem to understand exactly what we want. What can we do to support this effort? Where has this been discussed on Wikidata? Blue Rasberry (talk) 20:49, 6 May 2013 (UTC)
If I'm not mistaken, he's referring to this same thread. (Is there any way to get a view of a whole threaded conversation from mailman archives, or can you only view one message at a time?) Klortho (talk) 03:38, 8 May 2013 (UTC)

Efficiency

In previous discussions where category intersection has come up, efficiency has been something of a sticking point. There are categories with over a million articles, and it's important that any category intersection system can scale to Wikipedia size, which the naive approach of using SQL joins (i.e. what DynamicPageList does) doesn't. Various people have suggested using Lucene/Solr to address the problem (see comments on bugzilla:5244). Perhaps the Wikidata people would know how best to make such a system scale properly. Bawolff (talk) 16:12, 7 May 2013 (UTC)
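One standard way search engines keep intersections cheap is to store, for each tag or category, a sorted list of page IDs (an inverted index) and to intersect starting from the shortest list, so the cost is driven mostly by the smallest set rather than the million-member one. The sketch below illustrates only that general technique; `intersect_postings` is a hypothetical helper, and this is not a description of Lucene/Solr internals or of any MediaWiki extension.

```python
from bisect import bisect_left


def intersect_postings(*postings):
    """Intersect sorted, duplicate-free lists of page IDs, shortest list first."""
    if not postings:
        return []
    lists = sorted(postings, key=len)
    result = list(lists[0])
    for other in lists[1:]:
        kept = []
        lo = 0  # search position only moves forward because `result` is sorted
        for page_id in result:
            lo = bisect_left(other, page_id, lo)
            if lo == len(other):
                break  # every remaining ID is larger than anything in `other`
            if other[lo] == page_id:
                kept.append(page_id)
        result = kept
        if not result:
            break  # intersection already empty; skip the remaining lists
    return result


# A huge category (all even page IDs below a million) and a tiny one.
big_category = list(range(0, 1_000_000, 2))
small_category = [3, 4, 10, 999_998, 1_000_001]
print(intersect_postings(big_category, small_category))
```

Each element of the shorter list costs one binary search in the longer list, so intersecting a small category with a very large one stays fast.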

I wonder how Flickr does it. For example, I intersected the tags "paris" and "1914" and got this list. I am no expert on databases, but I thought that databases in general can be optimized for specific types of queries, so they can perform them really fast. Or different database software (like Lucene/Solr), already optimized for a specific set of tasks, could be used. --Jarekt (talk) 16:48, 7 May 2013 (UTC)
I am not an expert either, but I don't think it's right to imagine these as SQL queries or as indexed full-text queries. If it's implemented using RDF triples (Semantic Web), then I'd think these would be closer to SPARQL, which is specifically optimized for these types of queries. But I don't know anything about Wikidata's specific implementation. Klortho (talk) 03:44, 8 May 2013 (UTC)
It looks like Wikidata will be using full-text indexing (e.g. Solr): http://lists.wikimedia.org/pipermail/wikitech-l/2013-May/069413.html Bawolff (talk) 18:45, 14 May 2013 (UTC)
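To make the triple-based view mentioned above concrete: in RDF-style storage each fact is a (subject, predicate, object) triple, and a SPARQL-like query is a set of patterns containing variables that must all match with consistent bindings. The naive matcher below is a hypothetical illustration of that idea (`match`, `unify`, and the sample data are made up); it is not Wikidata's implementation or a SPARQL engine, and real triple stores use indexes rather than scanning every triple.

```python
# Toy triple set: (subject, predicate, object). Sample data is illustrative.
TRIPLES = {
    ("Arnold Schwarzenegger", "occupation", "actor"),
    ("Arnold Schwarzenegger", "occupation", "politician"),
    ("Hypothetical actor B", "occupation", "actor"),
}


def unify(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None on a conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check an existing binding
            if result.get(term, value) != value:
                return None
            result[term] = value
        elif term != value:  # constant: must match exactly
            return None
    return result


def match(patterns, triples):
    """Return every variable binding satisfying all patterns (naive nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# "Who is both an actor and a politician?"
query = [("?person", "occupation", "actor"),
         ("?person", "occupation", "politician")]
print([b["?person"] for b in match(query, TRIPLES)])
```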

Classes

First of all, let me say that, having invested time and effort into the existing category system, I would really hate to see it done away with; however, I can see how it could be supplemented by a classification system in addition to the existing categorisation one.

These ideas came about from a discussion on Commons, in which the analogy of a library came up in two possible forms. In the first, every book is on display on the shelves, sorted so that it can be found but clearly on display so that it can be easily browsed. In the second, books on similar topics are placed in well-labelled boxes, with some boxes inside others, and the boxes sorted. At the moment Commons tends towards the things-in-boxes analogy, and it works well enough if you can understand the labels. Where it breaks down is when language gets in the way (e.g. English is not your first language) or naming conventions do (e.g. you're looking for Flemish paintings in the Netherlands, and the files you want are in the box Southern Netherlandish paintings in Amsterdam). If you understand both the naming conventions and how to arrange the boxes, you can of course place the second box in the first; however, if you are a casual user of the system this will not be obvious, and in any case you are looking for a file for a particular purpose, with no interest in helping to maintain Commons.

So can the all-books-on-display approach work? In a real library, every book is usually given a number based on the Dewey system and then placed on the shelves in accordance with that number. If every file on Commons were given a unique number based on a number of fields, then a file could be called up by requesting that identification number, or a user could make a request in a number of fields and the files that most closely match that request would be located.

The current description of categorisation at Commons is that of a tree, with each subcategory being a branch that eventually links back to the trunk. In fact, because of the way subcategories intersect, I think it's more akin to multiple vine plants with their vines intertwined.

A classification system would be best visualised not as a tree but as a multi-dimensional matrix, with rows and columns representing different fields, and the intersections of those fields describing a file in the Commons database. Take the example Churches in Amsterdam: rather than using Amsterdam as a description for a file, we could perhaps have one field holding the geocode of the file. The coordinates of the object are unique, and would do away with Churches in Amsterdam being a subset of Churches in the Netherlands, which is a subset of Churches in Europe, etc. One field might be what an object is, say 00001 for buildings, 00002 for vehicles, 00003 for artworks, 00004 for events. Subsequent fields might be descriptors: for field two, say 00001 for religious, 00002 for military, 00003 for government; and for field three, say 00001 for Adventists, 00002 for Baptists, 00003 for Calvinists. So our churches in Amsterdam, in addition to being categorised as such, would have an identification number along the lines of, for example, 1,00001-2,00001-3,00003-52°22′23″N 4°53′32″E.

Such a string would in itself be meaningless to anyone but the most nerdish of users; however, users would not search for a file by its numbers but by which member of each field it belongs to, and this can be realised in multiple languages. --KTo288 (talk) 11:44, 8 May 2013 (UTC)
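A rough sketch of the classification scheme described above, assuming each field is a table of codes with labels in several languages. The field tables, file names, and helper functions (`make_id`, `describe`, `code_for`, `search`) are hypothetical, and the geocode is treated as an opaque string rather than parsed. It shows both halves of the proposal: the cryptic ID is built mechanically from the codes, while users search by label in their own language.

```python
# Hypothetical code tables modelled on the fields described above.
# Only a few entries are shown; Dutch labels illustrate multilingual lookup.
FIELDS = {
    1: {"00001": {"en": "building", "nl": "gebouw"},
        "00002": {"en": "vehicle", "nl": "voertuig"}},
    2: {"00001": {"en": "religious", "nl": "religieus"},
        "00002": {"en": "military", "nl": "militair"}},
    3: {"00001": {"en": "Adventist", "nl": "adventistisch"},
        "00002": {"en": "Baptist", "nl": "baptistisch"},
        "00003": {"en": "Calvinist", "nl": "calvinistisch"}},
}


def make_id(codes, geocode):
    """Build an ID such as '1,00001-2,00001-3,00003-<geocode>' from field codes."""
    parts = [f"{field},{code}" for field, code in sorted(codes.items())]
    return "-".join(parts + [geocode])


def describe(codes, lang):
    """Translate each field code into a human-readable label."""
    return [FIELDS[field][code][lang] for field, code in sorted(codes.items())]


def code_for(field, label, lang):
    """Find the code whose label in `lang` matches what the user typed."""
    for code, labels in FIELDS[field].items():
        if labels.get(lang, "").lower() == label.lower():
            return code
    return None


def search(files, query):
    """Return names of files whose codes match every field in `query`."""
    return sorted(name for name, codes in files.items()
                  if all(codes.get(f) == c for f, c in query.items()))


# Hypothetical files and their field codes.
FILES = {
    "File:Example_church.jpg": {1: "00001", 2: "00001", 3: "00003"},
    "File:Example_fort.jpg": {1: "00001", 2: "00002"},
}

church = FILES["File:Example_church.jpg"]
print(make_id(church, "52°22′23″N 4°53′32″E"))
print(describe(church, "nl"))
# A Dutch speaker searches for religious buildings without ever seeing a code.
query = {1: code_for(1, "gebouw", "nl"), 2: code_for(2, "religieus", "nl")}
print(search(FILES, query))
```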

I think it would be more effective to simply have multilingual categories. Cryptic IDs should be hidden from users when possible. In a library you need a short identifier that can be ordered, since the book can only be in one place and you need to physically find that place quickly. On a computer there is no need to physically go to the location. Bawolff (talk) 04:26, 9 May 2013 (UTC)

de.Wikipedia.org

I believe the German Wikipedia has implemented a category system of the kind this page is talking about. Rather than complicating categories, which can lead to redundancy, theirs seems to work with words, as in tagging. -- Kosboot (talk) 15:48, 11 August 2013 (UTC)

IEG proposal on category systems in WMF wikis

I have submitted a proposal for an Individual Engagement Grant for the first phase of a project looking at the category systems in Wikimedia wikis. In this first phase I will research the nature of the English Wikipedia's category system, as the first step in designing ways to optimize category systems throughout WMF wikis. In later phases, I plan to

  • Research how readers and editors utilize the category system in the English Wikipedia.
  • Investigate the category systems in other language Wikipedias and in other WMF projects.
  • Explore the value and feasibility of using Wikidata as the basis for the category system across WMF wikis. If deemed appropriate by the community, work with the community to develop and implement this.
  • Utilize user-centered design methodologies to prototype various enhancements to the category system to improve the user experience. If deemed appropriate by the community, work with the community to develop and implement such enhancements.

If you would like to endorse this proposal, you can do so here. I would also appreciate any other feedback, pro or con, which can be posted here. Thanks! Libcub (talk) 06:53, 7 April 2014 (UTC)