Talk:Category math

From Meta, a Wikimedia project coordination wiki

I'm working on this

I have an alpha version on my wiki directory. The implementation of categories is a table that has a row for each category/page combination, so it's basically an index. I have the SQL pretty much nailed down to find pages belonging to 2 or more categories (category addition), or to find categories listed in pages with the categories you're interested in. I have a plan for a much cleaner, more intuitive interface, but haven't had time to write it yet. The plan is to start with an initial category, then have the option of clicking subcategories or intersecting categories, and as you refine, you continually get a list of pages with the categories you're specifying. Just gotta write the code... If anybody wants to help (or just see it) I'll post my source on my user page (make a sub-page, actually).--Aerik 08:50, 8 Apr 2005 (UTC)

Update Apr 25 2005: I have a working first draft of this at the address above that lists categories that intersect with a given list of categories (logical AND), categories belonging to the selected categories (subcategories), and pages belonging to the selected categories. I am now working on the interface and on excluding results from a given list of categories.--Aerik 08:18, 25 Apr 2005 (UTC)
Slight update: I posted my current source code at User:Aerik/Intersections_code - it's still under development and needs scrubbing, but is functional fyi.--Aerik 09:33, 5 May 2005 (UTC)
Aerik. I've tried it out, and it seems to work great!
Hope people don't mind, but I've pushed this to the top of the page, as the implementation of category math, and I've refactored the page a bit. Now that we have an implementation to discuss, I think we can move discussion and opinions onto this discussion page, and keep Category_math for 'this is what we have so far, and this is where we're going' type information. Does anyone have alternate implementations to link to? -- Harry Wood 14:46, 10 November 2005 (UTC)
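For readers who want to see what the index-table approach described above might look like, here is a minimal sketch using Python's built-in sqlite3 module. It is not Aerik's code: the table name `page_category`, its columns, the helper names and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per (page, category) pair, as described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_category ("
    "page TEXT, category TEXT, PRIMARY KEY (page, category))"
)
conn.executemany(
    "INSERT INTO page_category VALUES (?, ?)",
    [  # illustrative sample rows only
        ("W. B. Yeats", "Poets"),
        ("W. B. Yeats", "Irish people"),
        ("W. B. Yeats", "Nobel laureates"),
        ("Seamus Heaney", "Poets"),
        ("Seamus Heaney", "Irish people"),
        ("Seamus Heaney", "Nobel laureates"),
        ("John Keats", "Poets"),
        ("John Keats", "English people"),
    ],
)


def pages_in_all(conn, categories):
    """Return pages that belong to every given category (logical AND)."""
    categories = sorted(set(categories))
    if not categories:
        return []
    marks = ",".join("?" * len(categories))
    sql = (
        f"SELECT page FROM page_category WHERE category IN ({marks}) "
        "GROUP BY page HAVING COUNT(DISTINCT category) = ? ORDER BY page"
    )
    return [row[0] for row in conn.execute(sql, (*categories, len(categories)))]


def co_occurring_categories(conn, categories):
    """Return the other categories listed on pages that match all given ones."""
    pages = pages_in_all(conn, categories)
    if not pages:
        return []
    chosen = sorted(set(categories))
    page_marks = ",".join("?" * len(pages))
    cat_marks = ",".join("?" * len(chosen))
    sql = (
        "SELECT DISTINCT category FROM page_category "
        f"WHERE page IN ({page_marks}) AND category NOT IN ({cat_marks}) "
        "ORDER BY category"
    )
    return [row[0] for row in conn.execute(sql, (*pages, *chosen))]
```

With the sample rows, `pages_in_all(conn, ["Poets", "Irish people"])` gives the two pages tagged with both categories, and `co_occurring_categories` lists the categories you could refine by next. The GROUP BY / HAVING COUNT form is only one way to write the intersection; a self-join per category would also work.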

intersection of categories

I'd like to have a feature showing me the intersection of two or more categories. I'm working on a German wiki about school issues. It would be helpful to be able to search for, say, articles that are in both the category music and the category XI century.

Second, I propose to enable searches within a category: all the articles containing "Beatles" in the category "music". -- Thkoch2001 6 July 2005 15:12 (UTC)

So we have this now (intersections) if you look at Aerik's implementation.
Searching within a category would be handy too (probably more handy than some of the more advanced kinds of category math), although that's out of scope of this discussion. It may be discussed elsewhere already. -- Harry Wood 14:46, 10 November 2005 (UTC)

I think Intersections is by far the most important of the category math functions discussed here. I recently showed my workmates the category feature. They started using it, and quite quickly decided they wanted to do intersection queries. As such, it might be more useful to present intersection queries more prominently in the user interface, and make it easier to do a basic intersection query. Subtraction and Union are going to be used less often I think, so they can be tucked away on a special page for advanced users. Perhaps on the category pages themselves, we could have a little 'find intersection' link at the bottom. -- Harry Wood 14:46, 10 November 2005 (UTC)

User Interface

For me, this is the real bugaboo. I hope others have some good ideas.

For advanced users, it would be good to be able to type something like poet&(irish|british)&!writer, e.g. using & for and, | for or, ! for not. Or other symbols if people find that more useful, I'm just listing these because I'm a C programmer :)

For beginners, the 'search' box to the left needs an 'advanced' option, which goes to a full page with three or four text entry boxes, and between each of them is a pulldown menu listing the options 'and', 'or', 'and not', 'or not'. With a brief explanation below stating that 'or' means inclusive or (as opposed to xor/either-but-not-both, which is generally useless in searching) and whether A&B|C means (A&B)|C or A&(B|C).


One: Requesting something like "!A" would have the potential to put a lot of load on the server as it goes through all categories except A. You should probably disallow expressions like this (however, I can't think immediately of an algorithm to find all expressions like (!A)|B or (!A)&(!B).).
User entry: AND, OR, and NOT spelled out is standard, as are + for AND and - for NOT. Straight left-to-right precedence makes sense (so your example would be (A&B)|C, and !A&B is (!A)&B).
For creating fixed autocategory pages, a link like w:en:Autocategory:(Irish people) AND (Poets) would be no fun to type in. (The parentheses are to express that AND is not part of a category name.) The better solution would be to make an actual page at en:Category:Irish poets containing some text like:
Autocategory: (Irish people) AND (Poets)
Nickptar 23:44, 16 Apr 2005 (UTC)
I like the idea of the "predefined" page. This could work similar to red links: Each time a user enters a category name that doesn't exist yet it will provide UI to define it. It will be stored and expire after 12 months if nobody uses it. SebastianHelm 07:13, 15 Jun 2005 (UTC)
Disallowing expressions that would include articles by default instead of excluding them would be easy: replace every category with "false", so (!A)|B would become (!F)|F, which would become T|F, which would become T. (!A)&(!B) would also become T, but "proper" expressions would become F. Alternatively, just disallow the use of ! on its own and add an &! operator, which would do category subtraction. -- 23:14, 19 June 2006 (UTC)
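To make the thread above concrete, here is a rough sketch of how the &, | and ! syntax, the straight left-to-right precedence, and the "replace every category with false" check could fit together. Nothing here comes from an existing implementation: the function names, the tuple-based tree, and the sample membership data are assumptions for illustration only.

```python
import re


def tokenize(text):
    """Split into operators, parentheses and category names."""
    parts = re.findall(r"[&|!()]|[^&|!()]+", text)
    return [p.strip() for p in parts if p.strip()]


def parse(text):
    """Build a tree where & and | share one precedence, applied left to right."""
    tokens = tokenize(text)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "!":
            return ("not", operand())
        if tok == "(":
            node = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return ("cat", tok)

    def expression():
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            node = ("and" if op == "&" else "or", node, operand())
        return node

    tree = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return tree


def is_unbounded(node):
    """Substitute False for every category; True means 'includes pages by default'."""
    kind = node[0]
    if kind == "cat":
        return False
    if kind == "not":
        return not is_unbounded(node[1])
    left, right = is_unbounded(node[1]), is_unbounded(node[2])
    return (left and right) if kind == "and" else (left or right)


def matching_pages(node, members, all_pages):
    """Evaluate the tree over sets of page titles (members: category -> set)."""
    kind = node[0]
    if kind == "cat":
        return set(members.get(node[1], ()))
    if kind == "not":
        return set(all_pages) - matching_pages(node[1], members, all_pages)
    left = matching_pages(node[1], members, all_pages)
    right = matching_pages(node[2], members, all_pages)
    return left & right if kind == "and" else left | right


# Made-up membership data, purely for illustration.
MEMBERS = {
    "poet": {"p1", "p2", "p3"},
    "irish": {"p1", "p2"},
    "british": {"p3"},
    "writer": {"p2"},
}
ALL_PAGES = {"p1", "p2", "p3", "p4"}
```

A server could call `is_unbounded(parse(query))` first and refuse the query when it returns True, so that `matching_pages` only ever runs on expressions that start from at least one named category. With left-to-right precedence, `parse("A&B|C")` groups as (A&B)|C and `parse("!A&B")` as (!A)&B, matching the reading suggested above.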

UI directly on Category page

Alexandre Van de Sande proposed on Categorization to put the UI directly on the Category page. The problem with this is of course to make all possible categories easily accessible. This could work with a UI like the following. I'm aware that this is not trivial to program, but I wanted to describe an ideal solution. This is just brainstorming, after all!

Scenario: Alexis wants to find a XX century French painter.

(1) On the main page, she clicks on "Find by Category".

  • This opens a page with 5 - 12 widgets such as
    • "People" and "Abstraction" lists;
    • "Country", combo box;
    • "Year" text box with a checkbox that says "exact";
    • "Other" browse widget. (In order to find e.g. "Self mutilated nuts")
Note: The selection of the widgets could be defined by the category "Fundamental" or self-generated, based on usage. The decision which kind of widget is used could depend on simple heuristics, such as the number of elements and the length of the category names.
Optional: Lists could allow multi-select.

(2) Under "People", she selects "by occupation".

  • This rearranges the lists. "People" is replaced by "Occupation"; "Abstraction" disappears; "Year" is replaced by "Born" and "Died"; and maybe some new widgets.
Note: This behavior is controlled by what category entries occur in the articles. The information is stored in a table that is updated once a month (or when forced by the developer).
Optional: On the top, breadcrumbs allow Alexis to go back. (Maybe not necessary if we can rely on the browser.)

(3) Under "Occupation", she selects "painter". This opens a new list for "Style".
(4) Alexis selects "France" in the country list.

  • Now the number of results is less than 200. The page therefore displays a list of all articles that fit the criteria directly below the selection lists.

(5) Alexis decides to narrow the list down further. She enters "1950" in the "Died" box. Since the "exact" checkbox is unchecked by default, this does not filter articles but weights them depending on how close the date is to 1950.

  • Articles will be displayed in order of decreasing weight.
Note: Since years are arranged on a linear scale we could display articles in two columns, one for all articles with older dates, and the other one for articles with dates after 1950.
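As a rough illustration of step (5), the weighting could be as simple as sorting by distance from the entered year. The weight formula, the function names and the sample articles below are all invented for this sketch; the proposal above does not specify them. Which column an exact match belongs in is also not stated, so putting it in the second column is an assumption.

```python
def weight(year, target_year):
    """Hypothetical weight: 1.0 for an exact match, falling off with distance."""
    return 1.0 / (1 + abs(year - target_year))


def rank_by_year(articles, target_year, exact=False):
    """articles is a list of (title, year) pairs.

    With exact=True the year acts as a filter; otherwise articles are only
    reordered, closest year (highest weight) first.
    """
    if exact:
        return [a for a in articles if a[1] == target_year]
    return sorted(articles, key=lambda a: weight(a[1], target_year), reverse=True)


def two_columns(articles, target_year):
    """Split the ranked list into an 'older' column and a 'target year or later' column."""
    ranked = rank_by_year(articles, target_year)
    older = [a for a in ranked if a[1] < target_year]
    later = [a for a in ranked if a[1] >= target_year]  # assumption: exact matches go here
    return older, later


# Placeholder data, not real articles.
SAMPLE = [("Painter A", 1963), ("Painter B", 1954), ("Painter C", 1947), ("Painter D", 1950)]
```

Calling `rank_by_year(SAMPLE, 1950)` keeps all four entries and puts "Painter D" first, while `rank_by_year(SAMPLE, 1950, exact=True)` keeps only "Painter D".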

SebastianHelm 08:54, 15 Jun 2005 (UTC)

Naive UI- Pseudo Full Text Search

Agreed about the UI being a huge problem. First I really want to say this is a very, very important thing for some collections with disparate non-textual material (eg Commons).

Another approach for a simple UI is to present what looks like a full text search box. Maybe it just is the regular search box with a query checkbox ticked. Anyway, in this query mode, users type in what they are looking for, and the engine matches their words against known words in the category taxonomy. A disambiguation screen can be presented with pick lists of candidates. The matching is done by lists of words that the category authors encode. These are not synonym lists, but "words that are associated with this category" that a user might type in if they were not knowledgeable about the categorization taxonomy used by the wiki. (Which, let's face it, will almost always be the case for typical users- even advanced users for the classification taxonomy outside their domain of interest.) MakThorpe 19:39, 24 March 2006 (UTC)
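A small sketch of the matching step described above, assuming the "words associated with this category" lists are available as a plain mapping. The category names, word lists and function names are invented for illustration; how the lists would actually be stored on the wiki is left open.

```python
import re
from collections import defaultdict

# Hypothetical associated-word lists, as category authors might encode them.
ASSOCIATED_WORDS = {
    "US Navy battleships": {"battleship", "warship", "navy", "dreadnought"},
    "Naval actions": {"battle", "navy", "naval", "engagement"},
    "Lighthouses": {"lighthouse", "beacon"},
}


def build_index(associated_words):
    """Invert category -> words into word -> categories for fast lookup."""
    index = defaultdict(set)
    for category, words in associated_words.items():
        for word in words:
            index[word.lower()].add(category)
    return index


def candidates(query, index):
    """Map each recognised query word to its sorted candidate categories.

    Words with several candidates are the ones a disambiguation screen
    would need to show as a pick list; unrecognised words are skipped.
    """
    result = {}
    for word in re.findall(r"\w+", query.lower()):
        if word in index:
            result[word] = sorted(index[word])
    return result


INDEX = build_index(ASSOCIATED_WORDS)
```

For the query "navy battleship photos", the word "navy" maps to two candidate categories (so it would go to the pick list), "battleship" maps to one, and "photos" is ignored. The categories chosen this way could then be fed into an intersection query.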

Semantic Mediawiki

Maybe you should read about the Semantic MediaWiki project; it also has to deal with intra-category searches. HTH --surueña 14:07, 22 February 2006 (UTC)

relation to DPL

This may be a stupid question, but isn't this very similar to a DPL, just that it's a special page instead of an inline extension thingy? Bawolff 01:20, 27 March 2006 (UTC)

Not at all similar. Among other things, this would most importantly allow you to do set intersections. Eg: Picture one possible User Interface scenario in Commons- Clicking "Category: US Navy Battleships" and also clicking "Category:naval actions" from a pick list would give you a search result giving you the intersection of those two sets, or in boolean terms Battleship AND "naval actions". This has some very powerful applications, and affects taxonomy evolution, since it is not necessary to produce category hacks to achieve the same result, eg: US Navy Battleship Naval Actions. It also has implications for how categories are applied: That is, that each and every image should have categories applied. Although Commons' gallery articles will be delivered as search results, they will do so rarely since they will not have distinguishing particulars. An image would have to have both the battleship tag and the naval actions tag for the hit to be produced. Certainly looking way down the road, this sort of thing could be done with attribute inheritance as in OODBMS's, but the proposed operations have significant enough server load implications that we should not get too far ahead of ourselves. Besides, in terms of formalisms, it would be improper to inherit from an aggregation since the semantics of that aggregation are not explicit. Proper inheritance would be by ascending the category tree.
More than you asked, but just to give you an idea of the breadth of the implications of this sort of feature... I focussed on Commons, but it really is useful for any collection where lack of textual information associated with objects makes FT search an ineffective mechanism. I'd like to take this opportunity to once again cheer on Aerik and any other developer(s) who are now or in the future choosing to work on this implementation and associated functionality. MakThorpe 08:11, 27 March 2006 (UTC)

New proposal for user interface at English Wikipedia

We've designed a user interface for implementing "Category math". We are calling it "Category intersection". Please take a look. Thanks. -- SamuelWantman 23:49, 30 August 2006 (UTC)