
Special language codes

From Meta, a Wikimedia project coordination wiki

The language of a Wikimedia wiki can be found in the lang="..." and xml:lang="..." attributes of the <html> element of each page (or other elements for specific subcontents in multilingual pages); they are also used for styling in CSS language selectors. These language codes should generally be canonical language tags as defined by BCP 47.
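As a rough illustration of reading these attributes, the sketch below uses Python's standard `html.parser` module to collect `lang` and `xml:lang` from the first `<html>` element of a page. The class name `HtmlLangParser`, the helper `page_language` and the sample markup are made up for this example; they are not part of MediaWiki.

```python
from html.parser import HTMLParser


class HtmlLangParser(HTMLParser):
    """Collects the lang and xml:lang attributes of the first <html> element."""

    def __init__(self):
        super().__init__()
        self.langs = {}

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase.
        if tag == "html" and not self.langs:
            for name, value in attrs:
                if name in ("lang", "xml:lang"):
                    self.langs[name] = value


def page_language(html_text):
    """Return a dict with the language attributes found on <html>, if any."""
    parser = HtmlLangParser()
    parser.feed(html_text)
    return parser.langs


# Hypothetical page source; a real page would be fetched from the wiki.
sample = (
    '<html lang="gsw" xml:lang="gsw" dir="ltr">'
    '<body><p lang="de">Hallo</p></body></html>'
)
print(page_language(sample))
```

Attributes on inner elements, such as the `<p lang="de">` above, are deliberately ignored here, since they only describe specific subcontents of a multilingual page.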

In most cases, the subdomain names that we use for projects correspond to language codes, but some exceptions remain. These usually exist for historical reasons: a valid ISO 639 code (or a registered, non-deprecated BCP 47 variant code) was not yet available when the project was created. Others arose because some former ISO 639 codes were deprecated or removed, as they encompassed a group of languages that are now considered distinct.

Deprecated or removed ISO 639 codes are still considered valid in BCP 47, where existing codes are not removed. They serve most often as possible fallbacks for missing translations or to allow upward compatibility, even though they are no longer recommended for modern use or newly created content. (Using these codes can create unsolvable disputes in Wikimedia unless they are distinguished by distinct translations using newer codes.) In some cases, early distinctions in ISO 639 have also been removed. This happened because they were introduced artificially for a limited time (sometimes for non-neutral political reasons) and were not well supported by users; because they unnecessarily complicated the task of translators; because they too frequently required language fallbacks or automatic transliterators (once a reliable standard and orthographic conventions had been adopted among most users of different script variants); or because education improved mutual understanding and acceptance of multiple variants in vernacular use.

Subdomains that do not match their lang attribute

Subdomain Language Project(s) Notes
als Local name: Alemannisch
English name: Alemannic
Language family: Germanic
Wikipedia, Wiktionary, Wikibooks, Wikiquote Uses gsw which matches the language's ISO 639-3 code.
bh Local name: भोजपुरी
English name: Bihari
Language family: Indo-Aryan
Wikipedia
Tracked in Phabricator:
Task T41968 stalled

Ambiguous legacy code. Uses bho which matches the language's ISO 639-3 code for one language of the family.

roa-rup Local name: armãneashti
English name: Aromanian
Language family: Italic
Wikipedia, Wiktionary Uses rup which matches the language's ISO 639-3 code.
simple Local name: Simple English
English name: Simple English
Language family: Germanic
Wikipedia, Wiktionary Uses en of ordinary English.
zh-classical Local name: 文言
English name: Classical Chinese
Language family: Sinitic
Wikipedia Classical Chinese has ISO 639-3 code lzh.
zh-min-nan Local name: 閩南語 / Bân-lâm-gú
English name: Minnan
Language family: Sinitic
Wikipedia, Wiktionary, Wikibooks, Wikiquote, Wikisource Min Nan has ISO 639-3 code nan.
zh-yue Local name: 粵語
English name: Cantonese
Language family: Sinitic
Wikipedia Cantonese has ISO 639-3 code yue.

Miscellaneous:

  • All subdomains of wikimedia.org
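The table above can be read as a small lookup from subdomain to `lang` attribute. The following is a minimal sketch of that reading, not MediaWiki's actual configuration. It assumes that the `zh-*` rows use the ISO 639-3 code named in their notes, and that any subdomain not listed equals its language code. The names `LANG_ATTRIBUTE_OVERRIDES` and `expected_lang` are hypothetical.

```python
# Subdomain -> lang attribute, copied from the table above.
# Assumption: for the zh-* rows the attribute is the ISO 639-3 code
# mentioned in the notes column.
LANG_ATTRIBUTE_OVERRIDES = {
    "als": "gsw",
    "bh": "bho",
    "roa-rup": "rup",
    "simple": "en",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
}


def expected_lang(hostname):
    """Return the lang attribute expected for a host such as 'als.wikipedia.org'."""
    subdomain = hostname.lower().split(".", 1)[0]
    # Hypothetical default: an unlisted subdomain is taken to be its own code.
    return LANG_ATTRIBUTE_OVERRIDES.get(subdomain, subdomain)


print(expected_lang("als.wikipedia.org"))
print(expected_lang("simple.wiktionary.org"))
print(expected_lang("fr.wikipedia.org"))
```

The helper does not cover the subdomains of wikimedia.org listed under "Miscellaneous", which do not name a language at all.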

Subdomains that do not conform to a valid ISO 639 language code

Subdomain Language Project(s) Notes
als Local name: Alemannisch
English name: Alemannic
Language family: Germanic
Wikipedia, Wiktionary, Wikibooks, Wikiquote
Tracked in Phabricator:
Task T6793 stalled

Alemannic has ISO 639-3 code gsw. ISO 639-3 code als is assigned to Tosk Albanian instead.

bat-smg Local name: žemaitėška
English name: Samogitian
Language family: Baltic
Wikipedia
Tracked in Phabricator:
Task T27522 stalled

Samogitian has the ISO 639 code sgs.

cbk-zam Local name: Chavacano de Zamboanga
English name: Chavacano de Zamboanga
Language family: Pidgin and Creole
Wikipedia
Tracked in Phabricator:
Task T124657 stalled

Chavacano de Zamboanga has no ISO 639 code as an individual language. ISO 639-3 code cbk is assigned to Chavacano, a superset of Chavacano de Zamboanga.

eml Local name: emiliàn e rumagnòl
English name: Emilian-Romagnol
Language family: Italic
Wikipedia
Tracked in Phabricator:
Task T36217 stalled

ISO 639-3 code eml for Emilian-Romagnol is now retired and split into egl (Emilian) and rgn (Romagnol).

fiu-vro Local name: võro
English name: Võro
Language family: Finno-Permic
Wikipedia
Tracked in Phabricator:
Task T31186 stalled

Võro has ISO 639-3 code vro.

iu Local name: ᐃᓄᒃᑎᑐᑦ / inuktitut
English name: Inuktitut
Language family: Eskimo-Aleut
Wikipedia ISO 639 treats iu/iku not as a single language but as a macrolanguage comprising ike and ikt. MediaWiki agrees (see Phabricator), but it falls back to ike (under the code ike-cans), adds ike-latn, and has no ikt support. CLDR considers Cans an aspirational script.
ksh Local name: Ripoarisch
English name: Ripuarian
Language family: Germanic
Wikipedia ISO 639-3 code ksh is assigned to Kölsch, a subset of Ripuarian.
map-bms Local name: Basa Banyumasan
English name: Banyumasan
Language family: Sunda-Sulawesi
Wikipedia Banyumasan has no ISO 639 code as an individual language. The ISO 639 code jv/jav is assigned to Javanese, a superset of Banyumasan.
nds-nl Local name: Nedersaksies
English name: Dutch Low Saxon
Language family: Germanic
Wikipedia Duplicated with Low German's nds.
nrm Local name: Nouormand
English name: Norman
Language family: Italic
Wikipedia
Tracked in Phabricator:
Task T25216 stalled

Norman has no ISO 639 code as an individual language (however, two dialects of Norman, Guernésiais and Jèrriais, share the ISO 639-3 code nrf). The ISO 639-3 code nrm is assigned to the Narom language instead. ISO 639-3 lumps Norman with French, as with most varieties of northern France.

roa-rup Local name: armãneashti
English name: Aromanian
Language family: Italic
Wikipedia, Wiktionary
Tracked in Phabricator:
Task T17988 stalled

Aromanian has ISO 639-3 code rup.

roa-tara Local name: tarandíne
English name: Tarantino
Language family: Italic
Wikipedia Tarantino has no ISO 639 code as an individual language. ISO 639-3 lumps it with Italian, as with most varieties of northern Italy.
sh Local name: srpskohrvatski / српскохрватски
English name: Serbo-Croatian
Language family: Slavic
Wikipedia, Wiktionary
Tracked in Phabricator:
Task T127680 stalled

sh was originally the ISO 639-1 code for Serbo-Croatian but is no longer active. However, it remains a valid BCP 47 language tag. Serbo-Croatian also has the ISO 639-3 code hbs. In CLDR aliases, sh maps to sr-Latn.

simple Local name: Simple English
English name: Simple English
Language family: Germanic
Wikipedia, Wiktionary
Tracked in Phabricator:
Task T110190 stalled

Simple English has no ISO 639 code but has a registered IETF variant subtag, simple.
However, although simple is a valid standard subtag in BCP 47, it is registered only as a generic variant subtag that applies to various base languages, as in en-simple or fr-simple. (Using the now-standard variant subtag is preferable to using multiple subtags that include an unregistered private-use extension, like "en-x-simple".) As a plain tag, "simple" means nothing in BCP 47 or ISO 639, because it is not a language on its own.
Note that under ISO 639 rules, Simple English is a variant, dialect or special orthography of English, defined as a subset for some limited usage (so it can be registered as a variant subtag of English, like "formal" or "informal" used in German or Dutch). The IANA database for IETF's BCP 47 already indicates this, and BCP 47-aware applications should have no problem identifying the language as part of normal English, as long as it is properly tagged as "en-simple" and not just "simple".
It is also hard to assess whether the content of the Simple English Wikipedia is really in "Simple" English or just normal English: there is no standard for such "simplification", only an editorial community decision that is not really enforceable, except for some presentation rules. For example, no "Simple English" dictionary exists, and all "Simple English" users refer to normal English dictionaries. Simple English is only a stylistic decision made by different authors with different preferences or perceptions of what is "simple" enough for them individually.
In fact, the "Simple English" Wikipedia could even be fully integrated into the normal English Wikipedia by better classifying its content or by using dedicated portals for readers with a limited understanding of English, and by making sure the English Wikipedia does not go into overly complex detail without separating it into subpages or detailed sections. It should be possible to describe any topic in simpler terms before introducing more complex terms, which would first be defined, using didactic rules for ordering the content.
There are also different views about how to simplify English, depending on the audience (for young people? for non-native speakers? for disabled people? for a specific country? are simplifications the same in the US, Canada, the UK, Australia, South Africa or India?).

zh-classical Local name: 文言
English name: Classical Chinese
Language family: Sinitic
Wikipedia
Tracked in Phabricator:
Task T10217 stalled
Tracked in Phabricator:
Task T30443 stalled

Classical Chinese has ISO 639-3 code lzh.

zh-min-nan Local name: 閩南語 / Bân-lâm-gú
English name: Minnan
Language family: Sinitic
Wikipedia, Wiktionary, Wikibooks, Wikiquote, Wikisource
Tracked in Phabricator:
Task T10217 stalled
Tracked in Phabricator:
Task T30442 stalled

Min Nan has ISO 639-3 code nan.

zh-yue Local name: 粵語
English name: Cantonese
Language family: Sinitic
Wikipedia
Tracked in Phabricator:
Task T10217 stalled
Tracked in Phabricator:
Task T30441 stalled

Cantonese has ISO 639-3 code yue.
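The Simple English row in the table above distinguishes the tags "simple", "en-simple" and "en-x-simple". The sketch below shows that distinction in code by splitting a tag into subtags. It only checks the shape of the tag and does not consult the IANA subtag registry, so it is not a BCP 47 validator. The function names `variant_candidates` and `describe` are hypothetical.

```python
def variant_candidates(subtags):
    """Return the subtags that precede any extension or private-use section."""
    found = []
    for subtag in subtags:
        # A one-character subtag (a singleton such as "x") starts an
        # extension or private-use section, so nothing after it is a variant.
        if len(subtag) == 1:
            break
        found.append(subtag)
    return found


def describe(tag):
    """Summarize a language tag by shape only (no registry lookup)."""
    primary, *rest = tag.lower().split("-")
    return {
        "primary": primary,
        # ISO 639-1 codes have two letters; ISO 639-2 and 639-3 codes have three.
        "primary_is_iso639_shaped": (
            primary.isascii() and primary.isalpha() and len(primary) in (2, 3)
        ),
        "has_simple_variant": "simple" in variant_candidates(rest),
    }


for tag in ("en-simple", "simple", "en-x-simple"):
    print(tag, describe(tag))
```

Under these assumptions, only "en-simple" has both an ISO 639-shaped primary subtag and "simple" in a variant position; in "en-x-simple" the word sits in the private-use section, and in plain "simple" it is the primary subtag itself.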

Miscellaneous:

  • tokipona – defunct Wikipedia subdomain
  • ru-sib – defunct Wikipedia subdomain, hoax in fictional “Siberian” language
  • be-x-old – fixed and redirected to be-tarask Wikipedia subdomain (see phab:T11823)

Other distinctions

Subdomain Language Project(s) Notes
ms Local name: Bahasa Melayu
English name: Malay
Language family: Sunda-Sulawesi
Wikipedia, Wikibooks, Wiktionary The Malay language used to have the code "ms", just as Indonesian has "id", but since the inception of the Malay Wikipedia, "ms" has become the code for a macrolanguage (not an individual language).

There are many individual languages under "ms"/"msa", including Indonesian ("id"/"ind"), Banjar ("bjn") and Minang ("min"), three living languages with their own Wikimedia projects, as well as Malay as an individual language ("mly", deprecated in 2008; "zlm", Malay; or "zsm", Standard Malay / Malaysian Malay / Malaysian language).

The Malay Wikipedia, Wikibooks, and Wiktionary were all created before the language code change of 18 February 2008; the latest of them, Malay Wikibooks, was created on 24 August 2004.

See also:

ak Local name: ak
English name: Akan
Language family: Niger-Congo
Wikipedia;
Closed: Wikibooks, Wiktionary

Are these two sets of wikis in the same language? See Wikipedia article.


Note that this situation is quite similar to the artificial distinction between Luxembourgish and Moselle Franconian, or between Serbian, Croatian and Bosnian: these are also clusters of dialects of the same mutually intelligible base language with only minor differences (in choice of terminology or preferred orthography, although multiple orthographies exist for all these dialects). It is hard (in fact impossible) to draw a real distinction at the linguistic level; the distinction is purely ethnopolitical. Native speakers from one region who go to a region where the other dialect cluster goes by a different name will be taken there to be using that other cluster, and will speak and write with no more difficulty than in their original ethnopolitical community. This adaptation also occurs within each cluster, depending on social interaction or level of formality (e.g. in religion, for prestige, or in vernacular speech and jargon used in the street or by younger or less educated people).
tw Local name: Twi
English name: Twi
Language family: Niger-Congo
Wikipedia;
Closed: Wiktionary
de-formal Local name: Deutsch
English name: German
Language family: Germanic
Not used as host names, but included as unregistered pseudo-variant subtags for some translations on translatewiki.net (used on Meta-Wiki for pages such as policies that address wiki users directly, according to their preferences). A private-use extension should have been used instead.
nl-informal Local name: Nederlands
English name: Dutch
Language family: Germanic

Technical language code

The special language code qqx can be used to display the IDs of all system messages used on a page.
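One way to try this is sketched below, assuming the code is applied through MediaWiki's `uselang` URL parameter, which this page does not spell out. The helper `with_qqx` is hypothetical and only rewrites the query string of a page URL; the example URL is illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_qqx(url):
    """Return url with its uselang query parameter set to qqx."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous uselang value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "uselang"
    ]
    query.append(("uselang", "qqx"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Illustrative URL; any page URL on a MediaWiki wiki could be used instead.
print(with_qqx("https://meta.wikimedia.org/wiki/Special_language_codes"))
```

Opening the resulting URL in a browser should show message IDs in place of the interface text, if the assumption about `uselang` holds for the wiki in question.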

See also