Case insensitivity of page names

From Meta, a Wikimedia project coordination wiki

Note to authors: please consider merging this page with Case_sensitivity.

(Please also let me know if this text is inappropriate; I'm a Wiki newbie.)

- MattEngland 21:53, 16 Apr 2005 (UTC)


A lot of people think it would be good for article titles to be case-insensitive, so that links to en:red dwarf, en:Red Dwarf and en:RED DWARF would all go to the same place. A nice ideal, probably, but there are some implementation details to consider:

  • It would be necessary to have a way to define the canonical case form of each title. Perhaps "move page" could be adapted, perhaps something new and exciting could be added.
  • If initial lowercase letters are then allowed, the canonical form of common nouns might be changed to be initial lowercase, which opens more consistency issues; a link at the beginning of a sentence will be capitalized, requiring someone to notice and fix it later if the article is created from that link.
  • Currently there are sets of pages whose titles conflict. In many cases, the extra pages are just redirects to a single canonical form. In others, there are multiple separate pages (as with red dwarf above), which would need to be manually reconciled in a conversion.
  • MySQL doesn't grok Unicode at present, so we can't just proclaim the title fields to be case-insensitive. We'd need to add a second field with a case-folded form (probably lowercase) and use that for comparisons. At this point we're restructuring the database, and it may be worth going ahead and making other massive structural changes for general efficiency, since conversion will require downtime (see below). This would take some effort and testing.
  • Conversion would require taking the wiki offline for a while. Hopefully we'd be back up within a day or so.
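
As a rough illustration of the second-field idea in the list above, here is a minimal sketch using Python's built-in sqlite3 module. It is not MediaWiki's actual schema: the table name `page`, the columns `title` and `title_folded`, and the helper names are all hypothetical, and `str.casefold()` merely stands in for whatever folding rule the software would adopt.

```python
import sqlite3


def fold(title: str) -> str:
    """Case-fold a title for comparison (assumption: Unicode casefold)."""
    return title.casefold()


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical page table."""
    conn = sqlite3.connect(":memory:")
    # title keeps the case as written; title_folded is used only for lookups.
    conn.execute(
        "CREATE TABLE page (title TEXT PRIMARY KEY, title_folded TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX page_folded ON page (title_folded)")
    return conn


def add_page(conn: sqlite3.Connection, title: str) -> None:
    """Store a page together with its folded form."""
    conn.execute(
        "INSERT INTO page (title, title_folded) VALUES (?, ?)",
        (title, fold(title)),
    )


def find_pages(conn: sqlite3.Connection, link_text: str) -> list:
    """Return every stored title that matches link_text when case is ignored."""
    rows = conn.execute(
        "SELECT title FROM page WHERE title_folded = ? ORDER BY title",
        (fold(link_text),),
    )
    return [row[0] for row in rows]


conn = open_db()
for t in ("Red Dwarf", "Red dwarf", "H2G2"):
    add_page(conn, t)

red_dwarf_hits = find_pages(conn, "RED DWARF")
h2g2_hits = find_pages(conn, "h2g2")
```

Here a link written as RED DWARF finds both existing red dwarf pages, which is exactly the kind of conflict that would have to be reconciled by hand before the folded column could be made unique.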

Comment

  • Some people think it's not a good idea to make titles case-insensitive. They claim that such flexibility would cause naming convention problems and hurt the uniformity that is currently forced by the limitation.
    • The only limitation on case in titles at present is that the first letter will always become uppercase in the displayed title. Making titles case-insensitive would create a new limitation, that no two titles that match except for case could exist as separate articles. Separately, it is possible to remove the restriction on lowercase first letters in displayed titles.
      • "Separately, it is possible to remove the restriction on lowercase first letters in displayed titles." Why isn't it also possible to enforce upper case on all lowercase first letters?
  • The title capitalisation could be specified in the wiki source, e.g. <title>h2g2</title> at en:H2G2/en:H2g2.
    • This wouldn't interact well with page moves, as the title tag may no longer reflect the title of the page.
  • One idea is that case sensitivity is an irritant mostly for users of the "Go" feature, so applying the case-folding mentioned above only to the "Go" and "Search" features might remove most of the irritation while involving a minimum amount of work on the articles.
  • I spend about 50 percent of my wiki editing time redirecting pages! This sure ain't wiki-quick. (rem: wiki is Hawaiian for quick) I despise case sensitivity for knowledge bases, and correcting capitalization errors is very difficult. Also, if you move a page with a /subpage link (with the /), the subpage doesn't get moved! Possible solution: have case sensitivity be enabled by category or namespace. So those chemists with capitalization requirements can have 'em, but the rest of us aren't stuck with a lot more work. AaronPeterson 11:22, 20 Jul 2004 (UTC)
  • Being forced into case sensitivity is a real nuisance for admins of projects that do not need it. In my opinion, having the opportunity to define all of a project's page titles as case-insensitive is not just a good idea, it is a necessity. You should be given the choice of configuring case sensitivity or insensitivity for page titles right in the install script. This should not be too hard to implement for a start. Of course, converting an existing project is way harder...
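
The "Go"-only idea from the comments above could look roughly like the following sketch. It is purely illustrative: the function name `go` and the in-memory set of titles are assumptions, and a real wiki would use an indexed folded column rather than scanning every title.

```python
def go(query, titles):
    """Pick the page a "Go" box should open, or None to fall back to search.

    titles is the set of existing page titles (hypothetical in-memory
    stand-in for the page table).
    """
    # 1. An exact, case-sensitive match always wins, so nothing changes
    #    for links and queries that already work today.
    if query in titles:
        return query

    # 2. Otherwise compare case-folded forms.
    folded = query.casefold()
    matches = sorted(t for t in titles if t.casefold() == folded)

    # 3. Jump only when the folded match is unambiguous.
    if len(matches) == 1:
        return matches[0]
    return None  # no match, or several candidates: show search results


titles = {"Red dwarf", "H2G2", "H2g2"}
```

Because article storage and links are untouched, nothing has to be converted; only the ambiguous cases (such as h2g2 here) still end up on a search results page.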

"Currently there are sets of pages which conflict".

Could a list be generated of such conflicts? I.e., pages A and B, which differ only in case, where A is not a redirect to B, B is not a redirect to A, and the two pages do not both redirect to the same page C. If a list were posted, they could be fixed... -MyRedDice

Someone will have to write a script to do that. --Brion VIBBER 16:17, 5 Sep 2003 (UTC)
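
For what it's worth, the core of such a script might look like the sketch below. It assumes the page data has already been dumped into a plain dictionary mapping each title to its redirect target (or `None` for a real article); that input format and the function name `find_case_conflicts` are hypothetical, and nothing here touches an actual wiki database.

```python
from collections import defaultdict
from itertools import combinations


def find_case_conflicts(pages):
    """List pairs of titles that differ only in case and are not reconciled.

    pages maps title -> redirect target title, or None if the page is an
    ordinary article (assumed input format).
    """
    # Group titles by their case-folded form.
    groups = defaultdict(list)
    for title in pages:
        groups[title.casefold()].append(title)

    conflicts = []
    for same_name in groups.values():
        for a, b in combinations(sorted(same_name), 2):
            target_a, target_b = pages[a], pages[b]
            # A redirects to B, or B redirects to A: already reconciled.
            if target_a == b or target_b == a:
                continue
            # Both redirect to the same third page C: also fine.
            if target_a is not None and target_a == target_b:
                continue
            conflicts.append((a, b))
    return sorted(conflicts)


pages = {
    "Red Dwarf": None,          # article
    "Red dwarf": None,          # separate article, same name ignoring case
    "RED DWARF": "Red dwarf",   # redirect to one of them
    "H2G2": None,
    "H2g2": "H2G2",             # harmless redirect
    "Foo": "Bar",
    "FOO": "Bar",               # both redirect to the same page
    "Bar": None,
}
conflicts = find_case_conflicts(pages)
```

Note that RED DWARF and Red Dwarf are reported as a pair too: under the definition above, a redirect that points at a different case variant than the one it is compared with still counts as a conflict.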

Language Ambiguity

Different languages have different rules about capitalization (and probably even specialized subsets, such as Physics and Psychiatric texts). Matching up multiple case use can get quite complicated, particularly in situations when multiple languages are mixed within one Wikipedia. Caseless collisions would be a reasonable way to force disambiguation.

Proposal

Why not change the way titles work a bit? One could define the displayed title AND additionally define synonyms. Example:

  • First set the main title: "The yearly turnover of US economy" (= the title that is displayed)
  • Then define synonyms: "turnover", "US turnover", "US economy" etc.
  • The main title and all synonyms should be editable (in an edit field) when one clicks on "Edit this page"

I personally cannot think of anything more confusing than case sensitivity. In the rare cases where case sensitivity is WANTED, it should be triggered by special checkboxes.
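
One possible reading of this proposal, sketched in Python. The class name `TitleIndex`, its methods, and the choice to compare names case-insensitively by default are all assumptions made for illustration; the proposal itself does not specify any of them.

```python
class TitleIndex:
    """Hypothetical index from names (main titles and synonyms) to pages."""

    def __init__(self):
        # folded name -> displayed main title of the page that owns it
        self._owner = {}

    def set_page(self, main_title, synonyms=()):
        """Register a page; return False if any name belongs to another page."""
        names = [main_title, *synonyms]
        for name in names:
            owner = self._owner.get(name.casefold())
            if owner is not None and owner != main_title:
                return False  # name clash: needs disambiguation by an editor
        for name in names:
            self._owner[name.casefold()] = main_title
        return True

    def resolve(self, name):
        """Return the displayed title for a name, ignoring case, or None."""
        return self._owner.get(name.casefold())


index = TitleIndex()
main = "The yearly turnover of US economy"
registered = index.set_page(main, ["turnover", "US turnover", "US economy"])
clash = index.set_page("Turnover (biology)", ["turnover"])
```

In this sketch a synonym that is already taken simply blocks the registration, which would push editors toward an explicit disambiguation step.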


Possible Solutions

A "unique filename" or "not unique capitalization-wise" flag could be used to indicate that there are no other articles by the same name...

All articles that are unique, and have English-style capitalization, should be stored in 100 percent capitals if the "unique" flag is on.

When loading an article to edit, the system should automatically toupper the search, and if the unique flag is not set, go to the page with the matching capitalization.

The result would be a "case preserving, case sensitive, case forgiving" system. (Oh, poo: a new field would have to exist to preserve the filename as intended.)

tions and either load that page up, or provide a warning, or an option to do so... and when the page doesn't exist, the system should search for other pages with alternate capitalization... all wiki pages should be stored in caps only if the "unique filename" flag is on, and that should indicate that it is _the_ article for that.

Transitioning

OK, so maybe there are current case conflicts in the wiki, but there's a simple way to transition once the new fields are added to the database. Any time a page is to be created, a check is done against the case-insensitive column for collisions. That way, no new collisions are created, and old ones can be phased out the same way syntax errors are: by creating a list of "collisions-to-be-dealt-with". -- Phyzome on the en: wiki
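
A minimal sketch of this transition scheme, with an in-memory dictionary standing in for the case-insensitive column. The class `Wiki` and its method names are made up for the example and do not correspond to any existing code.

```python
from collections import defaultdict


class Wiki:
    """Toy model: folded title -> set of existing titles with that folding."""

    def __init__(self, existing_titles):
        self.by_folded = defaultdict(set)
        for title in existing_titles:
            # Legacy pages are loaded as-is, collisions included.
            self.by_folded[title.casefold()].add(title)

    def create_page(self, title):
        """Create a page unless its folded title is already taken.

        Returns True if the page was created, False if it already exists
        or would collide with a page that differs only in case.
        """
        key = title.casefold()
        if self.by_folded.get(key):
            return False
        self.by_folded[key].add(title)
        return True

    def backlog(self):
        """The "collisions-to-be-dealt-with" list: legacy case conflicts."""
        return sorted(
            sorted(titles)
            for titles in self.by_folded.values()
            if len(titles) > 1
        )


wiki = Wiki(["Red Dwarf", "Red dwarf", "H2G2"])
created_first = wiki.create_page("Python")
created_second = wiki.create_page("PYTHON")  # differs only in case
todo = wiki.backlog()
```

The backlog can only shrink under this rule, since new pages are never allowed to add to it.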