Case sensitivity of page names

From Meta


Case sensitivity in MediaWiki is both a blessing and a curse.

Whether the trade-off is worth it is beside the point.

Sometimes case matters, and preserving case allows MediaWiki to handle those few cases where it does.


The problems

  • Awkward title capitalization
  • Awkward sentence embedding
  • Failed search queries

Problem pages

http://www.sosdg.org/~larne/pagenames.html

Possible solution 1

  • Automatically redirect to a page that has the same spelling but different capitalization (have the computer do the work of the disambiguation pages when a spelling doesn't match an existing page).

Negatives: Performance and possible search engine duplicate content penalties caused by MediaWiki's redirection mechanism.
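The fallback idea above (try the exact title first, then a same-spelling title with different capitalization) can be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: `resolve_title`, the result labels, and the `pages` set are all hypothetical names, and the linear scan over every title is exactly the kind of cost the performance concern refers to.

```python
def resolve_title(requested, existing_titles):
    """Resolve a requested title against the set of existing page titles.

    Returns one of (hypothetical labels, chosen for this sketch):
      ("exact", title)       - a page with exactly this capitalization exists
      ("redirect", title)    - exactly one page differs only by capitalization
      ("ambiguous", titles)  - several pages differ only by capitalization
      ("missing", None)      - no page with this spelling exists
    """
    # Case-sensitive lookup first, so existing behaviour is unchanged.
    if requested in existing_titles:
        return ("exact", requested)

    # Case-insensitive fallback: compare case-folded spellings.
    folded = requested.casefold()
    candidates = sorted(t for t in existing_titles if t.casefold() == folded)

    if len(candidates) == 1:
        return ("redirect", candidates[0])
    if candidates:
        return ("ambiguous", candidates)
    return ("missing", None)


# Hypothetical sample wiki.
pages = {"Main Page", "HTML", "Html", "Case sensitivity"}
```

With this sample data, `resolve_title("main page", pages)` redirects to "Main Page", while `resolve_title("html", pages)` is ambiguous and would need a generated disambiguation list.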

Possible solution 2

Manually create disambiguation pages, then switch the wiki to case-preserving, case-insensitive page names.

Negatives: lots of manual labor

Plusses: performance might increase because there would be fewer pages, and perhaps a MySQL setting could be toggled to let it search faster too (because a case-insensitive search would find no duplicates).

Possible solution 3

  • Implementation would go as follows: an all-lowercase database table would be keyed on the page name. If pages exist under more than one capitalization, the request should go to the page that matches exactly; if none matches, the wiki should show a list of pages with the same spelling, along with a way to create a new page with the requested capitalization.

MySQL could probably handle the merge between the "multiple document table" and the unique/main table quickly.

Minuses: hard to code?

Plusses: keeps case sensitivity for those who care.

There could be a bit of a speed-up, because for most pages the lowercase lookup would simply succeed.

Actually, this could allow for quicker searches: the table would be unique, so once a match is found, there is no need to look for any more. If there are multiple pages, a flag could be set on the main table to indicate that.
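A minimal in-memory sketch of this lowercase-index idea, assuming a plain dictionary stands in for the all-lowercase table. `TitleIndex`, its methods, and the result labels are hypothetical names for illustration; a real implementation would live in the database layer, and plain `lower()` is used here even though it is not sufficient for every language.

```python
from collections import defaultdict


class TitleIndex:
    """Maps a lowercased spelling to every capitalization that exists."""

    def __init__(self):
        self._by_lower = defaultdict(list)

    def add(self, title):
        # Register a page under its lowercased spelling.
        variants = self._by_lower[title.lower()]
        if title not in variants:
            variants.append(title)

    def has_multiple(self, title):
        # Plays the role of the "multiple pages" flag on the main table.
        return len(self._by_lower.get(title.lower(), [])) > 1

    def lookup(self, requested):
        variants = self._by_lower.get(requested.lower(), [])
        if not variants:
            # Nothing with this spelling: offer to create the page.
            return ("create", requested)
        if requested in variants:
            # Exact capitalization exists: go straight to it.
            return ("page", requested)
        if len(variants) == 1:
            # Common case: one page with this spelling, any capitalization.
            return ("page", variants[0])
        # Several capitalizations, none matching: list them (the caller
        # can also offer to create the requested capitalization).
        return ("choose", sorted(variants))


index = TitleIndex()
for name in ["Main Page", "HTML", "Html"]:
    index.add(name)
```

Here `index.lookup("main page")` resolves to "Main Page" with a single dictionary hit, which is the speed-up the common case relies on; only `index.lookup("hTmL")` falls through to the list of choices.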

Possible solution 4

  • A flag that says "title capitalization ok" could make the wiki renderer capitalize the words in the title (title capitalization).
  • Users could be nagged before creating a page without also creating one that follows the standard convention for titles.
  • The redirect pages could then be made.
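As a rough illustration of the flag and the nag, assuming the "standard convention" simply means capitalizing the first letter of every space-separated word (the page does not define it). The function names are hypothetical, and the sketch leaves the rest of each word untouched so that acronyms such as "HTML" survive.

```python
def title_case(title):
    """Capitalize the first letter of each space-separated word."""
    return " ".join(word[:1].upper() + word[1:] for word in title.split(" "))


def display_title(title, title_caps_ok):
    """Render the title, capitalizing only when the page's flag allows it."""
    return title_case(title) if title_caps_ok else title


def should_nag(title):
    """True when a new title does not follow the assumed convention."""
    return title != title_case(title)
```

A page such as "iPod" would leave the flag off so that `display_title("iPod", False)` keeps the lowercase first letter, while `should_nag("main page")` would prompt the user before the page is created.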

Possible solution 5

A script could automatically go around creating title-proper and all-lowercase redirect pages for all existing pages.
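What such a script would compute can be sketched like this. It only builds a mapping from redirect title to target and does not touch a wiki; `title_proper` and `build_redirects` are hypothetical names, "title proper" is assumed to mean capitalizing each word, and any first-letter normalization the wiki itself applies is ignored.

```python
def title_proper(title):
    """Capitalize the first letter of each space-separated word."""
    return " ".join(word[:1].upper() + word[1:] for word in title.split(" "))


def build_redirects(existing_titles):
    """Return {redirect_title: target_title} for the missing variants."""
    existing = set(existing_titles)
    redirects = {}
    for title in sorted(existing):
        for variant in (title_proper(title), title.lower()):
            # Never overwrite a real page or an earlier redirect.
            if variant not in existing and variant not in redirects:
                redirects[variant] = title
    return redirects


# Hypothetical sample wiki.
planned = build_redirects(["Main page", "HTML"])
```

When two pages differ only by capitalization, both want the same all-lowercase redirect; this sketch lets the first title in sorted order win, which is an arbitrary choice a real script would have to revisit.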

Possible solution 6

Change the attributes of the table "cur" so that the field cur_title is not BINARY. Then searches return results however the actual page is capitalized.
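The effect of a binary versus a non-binary title column can be demonstrated without a MySQL server. The sketch below swaps in SQLite (through Python's standard `sqlite3` module) and its `NOCASE` collation as a stand-in, so it shows the general behaviour rather than the MySQL change itself; the table names `cur_binary` and `cur_nocase` are made up for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite compares TEXT with the BINARY collation by default (case-sensitive).
conn.execute("CREATE TABLE cur_binary (cur_title TEXT)")
# Same column, but declared with a case-insensitive collation.
conn.execute("CREATE TABLE cur_nocase (cur_title TEXT COLLATE NOCASE)")

for table in ("cur_binary", "cur_nocase"):
    conn.execute(f"INSERT INTO {table} VALUES ('Main Page')")


def find(table, title):
    """Return the stored titles that compare equal to the requested one."""
    rows = conn.execute(
        f"SELECT cur_title FROM {table} WHERE cur_title = ?", (title,)
    ).fetchall()
    return [row[0] for row in rows]
```

`find("cur_binary", "main page")` returns nothing, while `find("cur_nocase", "main page")` returns the page with its stored capitalization. Note that SQLite's `NOCASE` only folds ASCII letters, so this stand-in says nothing about how non-ASCII titles would behave.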

Possible solution 7

You can alter the 'page' table so that the page title column is case-insensitive. This solves the problem and is easy to implement, but it introduces several international caveats. The process is described in detail here: http://obstinate.org/computing/making-mediawiki-urls-non-case-sensitive/

Option for any solution: Per-site Preferences

Would it alleviate some of the pain of "global/forced" application if the case-sensitivity was enabled by default and disabled (on a per-site-install basis) per admin direction? Further, when presented with a "disable case sensitivity" interface (particularly if it were displayed from a "Preferences" page), the software could convey to the admin which languages the case-insensitivity option currently supports; and/or associated documentation could also speak to which languages were supported and which not.

(Sorry if I missed this already presented somewhere; also I'm new to MediaWiki, haven't found a site-wide "Preferences" place yet other than LocalSettings.php - MattEngland 22:04, 16 Apr 2005 (UTC))

The above post is really quite old, but I'm going to respond to it anyway. I think most people who set up MediaWiki don't care at all about case-sensitivity and [[page]] going to a different page than [[Page]]. Hence the default should be not case sensitive. I think that most people go for case sensitivity simply to be able to display the first letter in lowercase. --Romanski 21:29, 8 January 2007 (UTC)

Major Issues

Some languages use different character sets, and older MySQL versions (before 4.1) don't support UTF-8, so the data that went into the database isn't plain ASCII. Conversions to lowercase therefore aren't easy, and a given language may or may not have lowercase at all.
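A few lines of Python show why lowercasing is not a safe comparison outside ASCII. The helper names are hypothetical; `str.lower`, `str.upper`, and `str.casefold` are standard Python string methods, and none of them applies locale-specific rules (for example the Turkish dotted and dotless i), so this only illustrates part of the problem.

```python
def same_title_lower(a, b):
    """Naive comparison: lowercase both sides."""
    return a.lower() == b.lower()


def same_title_casefold(a, b):
    """Comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


def has_case(text):
    """True if the text changes under lowercasing or uppercasing."""
    return text.lower() != text.upper()
```

For the German titles "Straße" and "STRASSE", the naive comparison reports a mismatch while case folding treats them as the same spelling. A title such as "日本語" has no case distinction at all, so a lowercase copy of it is identical to the original.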

Related code

LanguageUtf8.php and Utf8Case.php

There would need to be two tables for every one: a lowercase one and an uppercase one?

(Actually, the database can already search case-insensitively, though there could be a problem with duplicate searches.)

Problems with the possible solutions

  • It has to be coded :(
  • Non-trivial
  • Language-dependent


Lazy IRC paste

From a conversation on IRC.

<MrDarkUser> TimStarling: I rooting for case-insensitive page lookup after case-sensitive lookup
<MrDarkUser> boy oh boy my spelling is bad.. s/I/I'm
<MrDarkUser> I don't like making tons of redirects, and I don't like miss capitalization of titles..
<MrDarkUser> ... Also... if page names were all stored in lower case.. , with the exception being case preserving.. there could be a performance increase
<MrDarkUser> because there are 1/2 as many characters to choose from when searching, and it would lower the number of pages in the wiki that are created just to deal with poor capitalization that have to get searched through...
<MrDarkUser> of course.. this all has to get coded...
...
<TimStarling> I don't think the performance issues would be significant
<TimStarling> actually...
<TimStarling> it would take longer to parse pages with lots of links, especially on UTF-8 wikis where there's no mb_string
...
<MrDarkUser> TimStarling: oh.. you are right that it would take longer to parse links... mb_string?
<MrDarkUser> I just assumed that the to_lower function was very fast
<TimStarling> it's kind of non-trivial, unfortunately
<TimStarling> and language-dependent
<MrDarkUser> How do other wiki's deal with it? (a question that I shall have to try to look up)
<MrDarkUser> mediawiki is the only wiki that I know of that is case sensitive.. and that almost kept me from using it
<TimStarling> see LanguageUtf8.php and Utf8Case.php
<MrDarkUser> language-dependent... hurm.. there are seporate wiki's for each language though? or at least namespaces..
<Kate-> they inherit LanguageUtf8
