
Categorization with field-value pairs

From Meta, a Wikimedia project coordination wiki
This is an essay. It expresses the opinions and ideas of some Wikimedians but may not have wide support. This is not policy on Meta, but it may be a policy or guideline on other Wikimedia projects. Feel free to update this page as needed, or use the discussion page to propose major changes.

One prominent application of field-value pairs is for automatically categorizing articles in Wikipedia or other MediaWiki sites. This proposal outlines how to make this happen.

Acknowledgements


Most of this work has already been done by Magnus Manske and is included in MediaWiki. He used links in a pseudo-namespace, Category:, so the syntax and the back end vary slightly, but the idea remains essentially the same.

Rationale


It's tedious to maintain long lists of articles pertaining to a particular category. This proposal would automate some of this work.

Design


Markup


Article editors would mark up articles with one or more fields indicating the category they belong to. For example, w:William Carlos Williams might have markup like this:

 [[category=American poets]]
 [[category=20th-century Americans]]

Articles for categories should, or at least could, have markup to indicate that they are categories, and may also have markup for the categories they themselves belong to. For example, w:American poets might have:

 [[type=category]]
 [[category=Americans]]
 [[category=poets]]

Articles for categories may also have Wikitext for explanations of the category or for describing the idea in general.

Category names are just article titles, so they can point into other namespaces, too. For w:Wikipedia:User preferences help, for example, we could have:

 [[category=Wikipedia:Help]]

(NOTE: Category names here are examples; good category naming and categorization is something to be set by policy in the MediaWiki installation, and probably also to grow organically from practice.)
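
As a rough illustration of the markup described in this section, the sketch below pulls field-value pairs out of article text. It is only a sketch under stated assumptions: the regular expression and the helper name extract_fields are hypothetical, not part of MediaWiki, and a real parser would need to handle edge cases (for example, an ordinary link whose title contains an equals sign).

```python
import re
from collections import defaultdict

# Hypothetical pattern for the proposed [[field=value]] syntax.
# This is an assumption for illustration, not MediaWiki's actual parser.
FIELD_RE = re.compile(r"\[\[\s*([A-Za-z_]+)\s*=\s*([^\[\]|]+?)\s*\]\]")


def extract_fields(wikitext):
    """Return (fields, body).

    fields maps each field name to the list of its values, in order;
    body is the article text with the field markup removed.
    """
    fields = defaultdict(list)
    for name, value in FIELD_RE.findall(wikitext):
        fields[name.lower()].append(value)
    body = FIELD_RE.sub("", wikitext).strip()
    return dict(fields), body


sample = """William Carlos Williams was an American poet.
[[category=American poets]]
[[category=20th-century Americans]]"""

fields, body = extract_fields(sample)
print(fields["category"])
print(body)
```

Because values are kept as plain article titles, a value such as Wikipedia:Help passes through unchanged, which matches the idea that categories can live in other namespaces.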

Article rendering


For all articles, the categories will be extracted and shown as links out-of-page, like interlanguage links. For example, for William Carlos Williams:

   Other languages: Deutsch Français Español
   Categories: [[American poets]] [[20th-century Americans]]

The links would be displayed the same way as in-page links; that is, broken links (those pointing to an article that does not exist) would appear in red or with a question mark.
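
The rendering rule above could be sketched as follows. The function name render_category_line and the set of existing titles are hypothetical stand-ins; the sketch marks a broken link with a trailing question mark, one of the two display options mentioned, and assumes the caller already knows which titles exist.

```python
def render_category_line(categories, existing_titles):
    """Build the out-of-page 'Categories:' line.

    Links whose target article does not exist get a trailing '?',
    mirroring how broken in-page links might be shown.
    """
    links = []
    for title in categories:
        if title in existing_titles:
            links.append(f"[[{title}]]")
        else:
            links.append(f"[[{title}]]?")
    return "Categories: " + " ".join(links)


# Illustrative data: only one of the two category articles exists.
existing = {"American poets"}
line = render_category_line(
    ["American poets", "20th-century Americans"], existing
)
print(line)
```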

Category rendering


In addition, articles that are tagged with [[type=category]] would have additional output after the article text to show the articles in that category, and show the sub-categories of that category. For en:American poets, we might see:

    American poets started a new tradition separate from their English
    counterparts in the early 18th century, and have continued... (etc.)
    Subcategories:
    * w:Language poets
    * w:Romantic poets
    * w:Beat poets
    Articles:
    * w:William Carlos Williams
    * w:Edgar Allan Poe
    * w:Elizabeth Bishop

Only articles that directly reference this category will be displayed. For example, if w:Allen Ginsberg were tagged only with [[category=Beat poets]], his name would appear on the Beat poets page but not on the American poets page. (If, however, his page were tagged with both categories, it would appear on both pages. This leaves flexibility for different kinds of article-category relationships.)
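
A minimal sketch of the direct-membership rule, assuming each page's fields are already available as a dictionary mapping a field name to its list of values. The function name category_listing and the sample data are hypothetical; the tagging of Beat poets as a member of American poets is an assumption made for the example.

```python
def category_listing(category, pages):
    """Return (subcategories, articles) that directly reference `category`.

    pages maps a title to its fields, e.g.
    {"category": ["American poets"], "type": ["category"]}.
    Membership is not transitive: a page tagged only with a
    sub-category is not listed under the parent category.
    """
    subcats, articles = [], []
    for title, fields in sorted(pages.items()):
        if category not in fields.get("category", []):
            continue
        if "category" in fields.get("type", []):
            subcats.append(title)
        else:
            articles.append(title)
    return subcats, articles


# Illustrative data only.
pages = {
    "American poets": {"type": ["category"], "category": ["Americans", "poets"]},
    "Beat poets": {"type": ["category"], "category": ["American poets"]},
    "William Carlos Williams": {
        "category": ["American poets", "20th-century Americans"]
    },
    "Allen Ginsberg": {"category": ["Beat poets"]},
}

subcats, articles = category_listing("American poets", pages)
print("Subcategories:", subcats)
print("Articles:", articles)
```

With this data, Allen Ginsberg is listed only when category_listing is called for Beat poets; tagging his page with both categories would list him on both pages.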

Note that category articles can also be in categories themselves, and their categories would be listed in the same space (by the interlanguage links) as any other page. So American poets might have:

     Other languages: Dansk Hindi
     Categories: w:poets w:Americans

Advantages

  • Categorization and site navigation become less of a chore.
  • Text for category articles stays relatively static; the articles in the category carry the category relationship information.
  • An article can belong to multiple categories.
  • Using fields means categories can be in different namespaces. This may be of limited utility outside Wikipedia: and similar namespaces.

Disadvantages

  • Inflexible rendering.
  • Rendering further clutters already crowded interlanguage-links area.
  • Categorizing the English Wikipedia would take a huge amount of manual work.
  • Lists of articles and sub-categories may get unreasonably long (although this can be ameliorated with careful categorization schemes).
  • "Invisible" category fields may be difficult for novice editors to grasp.
  • Unclear generalization of what the "type" field means. What other types would there be?
  • Unclear whether categories refer to the article or to the subject of the article. For example, would [[category=biography]] be appropriate for w:William Carlos Williams?
  • May get abused for purposes other than categorization where more specific fields may be more appropriate. For example, a status field for articles ([[status=stub]]) may be useful, but could be preempted by [[category=stub]] before the "status" field is implemented.
  • Fuzzy distinction between part-whole relationships and category-member relationships. For example, is w:Timeline of Quebec history (1608 to 1662) part of w:Timeline of Quebec history? Or is it in the category of w:Timeline of Quebec history? (See Series of articles for a visual way to show this distinction.)
  • Categories have to be tagged with [[type=category]], or the member listing won't work. The tag is an optimization that avoids extra database hits when showing an ordinary page.
  • Conflates the category-instance relationship with the category-subcategory relationship. In the example above, "William Carlos Williams" is an instance of "American poets", but "American poets" is a sub-category of both "poets" and "Americans". This may or may not be a real problem.
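
The database-hit point in the list above can be made concrete with a small sketch. The Wiki class, its in-memory pages dictionary, and the reverse_lookups counter are all hypothetical stand-ins for a real back end; the counter simply simulates the cost of the "which pages reference this title?" query.

```python
class Wiki:
    """Toy in-memory stand-in for the database (an assumption for illustration)."""

    def __init__(self, pages):
        self.pages = pages        # title -> fields dict
        self.reverse_lookups = 0  # counts simulated expensive queries

    def members_of(self, category):
        # The costly step: scan for every page that references `category`.
        self.reverse_lookups += 1
        return sorted(
            title
            for title, fields in self.pages.items()
            if category in fields.get("category", [])
        )

    def render(self, title):
        fields = self.pages[title]
        lines = ["Categories: " + " ".join(fields.get("category", []))]
        # Only pages that declare type=category pay for the reverse lookup.
        if "category" in fields.get("type", []):
            lines.append("Members: " + ", ".join(self.members_of(title)))
        return lines


wiki = Wiki({
    "American poets": {"type": ["category"], "category": ["Americans", "poets"]},
    "William Carlos Williams": {"category": ["American poets"]},
})

wiki.render("William Carlos Williams")
print(wiki.reverse_lookups)  # ordinary article: no reverse lookup performed
wiki.render("American poets")
print(wiki.reverse_lookups)  # category page: one reverse lookup
```

The trade-off is the one noted in the list: a category page that lacks the tag renders like an ordinary article and shows no members.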
