
Wikimedia language code

From Meta, a Wikimedia project coordination wiki


be-tarask: Факты, аргумэнтацыя, рэалізацыя і апытаньне, зьвязаныя з моўнымі кодамі у Вікіпэдыі.

en: Facts, argumentation, implementation, and poll related to language codes on Wikipedia.

gem-CH (Schwyzerdütsch, Schwiitzertüütsch): Tatsachä, Begründige, Umsetzigä und Abstimmige zu de Schprooch-Chürzel vo de Wikipedia.

de: Fakten, Argumentation, Implementierung und Umfrage zu Sprachkenncodes bei Wikipedia.

es: Los hechos, discuten, puesta en práctica, y encuesta relacionada con los códigos de la lengua en Wikipedia.

fr: Les faits, l'argumentation, l'exécution, et le scrutin liés aux codes de langue sur Wikipedia.

gr: Τα γεγονότα, η επιχειρηματολογία, η εφαρμογή, και η ψηφοφορία πληροφοριών αφορούσαν τους γλωσσικούς κώδικες σε Wikipedia.

it: I fatti, l'argomentazione, l'esecuzione e lo scrutinio si sono riferiti ai codici di lingua su Wikipedia.

id: Kebenaran, pendapat, pelaksanaan, dan permintaan pendapat berhubungan dengan pengkodean bahasa yang ada di Wikipedia

jp: ウィキペディアの言語コードに関連する事実、議論、実施、および投票について。

ko: 사실, 논의, 실시, 및 정보 투표는Wikipedia에 언어 부호에 관련시켰다.

nl: Feiten, argumentatie, implementatie, en opiniepeiling hadden op taalcodes betrekking inzake Wikipedia.

pt: Os fatos, a argumentação, a execução, e a votação relacionaram-se aos códigos da língua em Wikipedia.

ru: Факты, аргументация, реализация/внедрение и опрос, связанные с языковыми кодами на(?) Википедии.

uk: Факти, аргументація, впровадження і опитування щодо мовних кодів у Вікіпедії.

vi: Những sự thật, luận chứng, thực hiện và trưng cầu liên quan đến mã ngôn ngữ tại Wikipedia.

yg: ꬸꭏ'ꬸꭏꬰ"、ꬾꭔꬸ'ꭃꭘꬸ"、ꬸꭏꬰ'ꬸꭏ",ꭄꭊ"ꭁꭒꬸ'ꭁꭏꬲ"ꭂꭏꬸ'ꬱꭏ"ꬹꭊꬲ'ꬱꭔꬷ"ꬱꭒ"ꬱꭒ'ꬱꭔꬸ"ꬵꭊꬺ'ꭂꭐ"ꬳꬽꭐꬸ'ꬶꭏꬺ"ꭃꭔꬽ'ꬿꭊꬺ"Wikipedia。

zh-cn: 事实、辩论、实施, 和信息民意测验与语言代码关系了在Wikipedia 。

zh: 關於在 Wikipedia 所使用的語言代號的事實、討論、實施,和民意投票。

zh-yue: 關於喺 Wikipedia 所使用嘅語言代號嘅事實、 討論、 實踐, 同埋民意投票。



From Wikipedia:Multilingual coordination

The Wikipedia community is committed to including any and all languages for which there are Wikipedians willing to do the work. We are aware that many of the world's 6,500 languages are not well-represented on computers or the web, and we are committed to working with language speakers and computing organizations to support as many languages as possible.

One standard for marking the languages used in 'net documents is RFC 3066. For the most part, this specifies using ISO 639-1's two-letter codes where available, ISO 639-2's three-letter codes where two-letter codes are not available, and another set of codes (or regional/dialect specifiers tacked onto the above) where possible.

These codes are used in the HTTP Accept-Language and Content-Language headers, in the HTML 'lang' attribute, and in the XML 'xml:lang' attribute. They're also used as the first element of the hostname for each of Wikipedia's language editions: fr.wikipedia.org, nah.wikipedia.org etc.
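As a rough illustration of the two uses described above, the following Python sketch orders the tags from an Accept-Language value by preference and builds the hostname of a language edition. The function names are made up for this example, and the parsing is deliberately simplified (it only understands an optional `q=` weight); it is not how the Wikipedia servers actually handle requests.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language value, most preferred first.

    Simplified: each item is "tag" or "tag;q=weight"; a missing weight means 1.0.
    """
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable weight as "not wanted"
        # Sort by descending quality, then by original position.
        weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def edition_hostname(tag: str) -> str:
    """Build the hostname of a language edition, e.g. 'fr' -> 'fr.wikipedia.org'."""
    return f"{tag.lower()}.wikipedia.org"


if __name__ == "__main__":
    for tag in parse_accept_language("fr;q=0.8, nah, en;q=0.5"):
        print(tag, edition_hostname(tag))
```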


Existing language codes and coverage

At least 7,602 languages exist in the world, going by the number of SIL codes. It is assumed that 90% of the world's languages are likely to disappear by 2050 [1].

  • ISO 639-3, used by SIL/Ethnologue (3-letter)
    • maximum: 17,576 codes (26^3)
    • current: 7,602 languages (as a reference for approximately how many languages exist)
  • ISO 639-1 (2-letter)
    • maximum: 676 codes (26^2)
    • current: 180
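The maxima in the list above follow from counting all combinations of the 26 letters a to z. A small Python sketch of that arithmetic, using the "current" figures quoted on this page (the helper name `used_share` is only for illustration):

```python
import string

LETTERS = len(string.ascii_lowercase)  # 26 letters, a-z

MAX_TWO_LETTER = LETTERS ** 2    # every possible 2-letter code
MAX_THREE_LETTER = LETTERS ** 3  # every possible 3-letter code


def used_share(assigned: int, maximum: int) -> float:
    """Fraction of the code space that is already assigned."""
    return assigned / maximum


if __name__ == "__main__":
    print("2-letter codes possible:", MAX_TWO_LETTER)
    print("3-letter codes possible:", MAX_THREE_LETTER)
    # 180 and 7,602 are the "current" counts given in the list above.
    print("2-letter space used: {:.1%}".format(used_share(180, MAX_TWO_LETTER)))
    print("3-letter space used: {:.1%}".format(used_share(7602, MAX_THREE_LETTER)))
```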

Language codes that look like country codes

There are approximately 50 "conflicting" language and country codes, listed in Language codes/Conflicts. A "conflict" occurs when a country uses the same code as a language that is not widely used in that country. Theoretically, country and language codes are orthogonal so the conflict does not exist.
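The definition of a "conflict" given above can be sketched in Python as follows. The three tables are tiny hand-picked samples for illustration only, not the contents of Language codes/Conflicts, and the judgement of where a language is "widely used" is an assumption made for this example.

```python
from typing import Dict, List, Set

# Illustrative excerpts only; the real ISO 639-1 and ISO 3166-1 lists are much longer.
LANGUAGES: Dict[str, str] = {
    "be": "Belarusian",
    "ms": "Malay",
    "nl": "Dutch",
    "fr": "French",
    "eo": "Esperanto",
}
COUNTRIES: Dict[str, str] = {
    "be": "Belgium",
    "ms": "Montserrat",
    "nl": "Netherlands",
    "fr": "France",
}
# Codes whose language is widely used in the country with the same code (assumed).
WIDELY_USED: Set[str] = {"nl", "fr"}


def find_conflicts(
    languages: Dict[str, str],
    countries: Dict[str, str],
    widely_used: Set[str],
) -> List[str]:
    """Return codes shared by a language and a country where that language is not widely used."""
    shared = set(languages) & set(countries)
    return sorted(code for code in shared if code not in widely_used)


if __name__ == "__main__":
    for code in find_conflicts(LANGUAGES, COUNTRIES, WIDELY_USED):
        print(f"{code}: {LANGUAGES[code]} (language) vs. {COUNTRIES[code]} (country)")
```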


2-letter language codes are too similar to ISO 3166-1 country codes

This could have a mnemonic advantage; however, it could provoke confusion when the country and language codes conflict. For example, be.wikipedia.org could mean Belorussian Wikipedia or Wikipedia Belgium. 2-letter subdomains are often used by companies to run country-specific websites. See Language codes/Conflicts for a full list.

  • PRO 3-letter-language-code:
    • needless confusion and FAQ-writing can be avoided
    • users will not think that they get country specific content
    • in the long run, the ability to provide country-specific content via a 2-letter system, e.g. nl.wikipedia.org as the entry point for the Netherlands.

Small languages

Small languages without a 2-letter code will get a 3-letter code. This is not nice. It's like saying: You are small, you get the longer URL, and everybody will see that you are not in the group of the big languages. Using 3-letter codes for everyone is better here.

Most of the world's languages don't have a three-letter code, either! However, even the 2-letter codes cover the vast majority of speakers. I'm in no hurry to mess with things overmuch just yet, but I'd be perfectly willing to make 3-letter codes available in the short term as aliases, and any 3-letter-only languages that people want Wikipedia in can be set up using the 3-letter codes. --Brion VIBBER 05:33 7 May 2003 (UTC)
This seems to me to be a case of excessive political correctness. I am American but speak fluent Swedish, less than fluent Norwegian and read but do not speak Danish. My Swedish and Danish friends often speak of their languages as being small languages, which they are relative to English (Swedish is the largest of the Nordic languages with c. 11 million speakers.) They nonetheless have 2-letter codes, as do hundreds of smaller languages. The practical question is whether the speakers of any language which has been assigned a 3-letter code have proposed to start a Wikipedia. Robertgreer 19:09, 30 December 2007 (UTC)

RFC 3066

Tags for the Identification of Languages, RFC-3066 language code assignments

Basic gist:

  • Use 2-letter codes from ISO 639-1 where they exist (en, fr, eo)
  • Fall back to 3-letter codes from ISO 639-2 where there isn't one (ger, art, cel)
  • Fall back to IANA-defined tags elsewise (i-tsu)
  • Use country or region/dialect/subgroup subtags where necessary to distinguish some of the more general codes (sgn-US, cel-gaulish, art-loglan)
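The fallback order in the list above can be sketched in Python. The lookup tables are small illustrative excerpts keyed by informal language names chosen for this example; the function `choose_tag` is hypothetical and is not part of RFC 3066 or of any Wikipedia software.

```python
from typing import Dict, Optional

# Illustrative excerpts only, keyed by informal names chosen for this sketch.
ISO_639_1: Dict[str, str] = {"English": "en", "German": "de", "Esperanto": "eo"}
ISO_639_2: Dict[str, str] = {
    "English": "eng",
    "German": "ger",
    "Artificial (other)": "art",
    "Celtic (other)": "cel",
    "Sign languages": "sgn",
}
IANA_TAGS: Dict[str, str] = {"Tsou": "i-tsu"}


def choose_tag(language: str, subtag: Optional[str] = None) -> Optional[str]:
    """Pick a tag following the order above: ISO 639-1, then ISO 639-2, then IANA.

    An optional country or region/dialect subtag is appended with a hyphen.
    Returns None when none of the tables knows the language.
    """
    for table in (ISO_639_1, ISO_639_2, IANA_TAGS):
        primary = table.get(language)
        if primary is not None:
            return f"{primary}-{subtag}" if subtag else primary
    return None


if __name__ == "__main__":
    print(choose_tag("German"))                     # 2-letter code wins over "ger"
    print(choose_tag("Celtic (other)"))             # no 2-letter code, so 3-letter
    print(choose_tag("Celtic (other)", "gaulish"))  # with a dialect subtag
    print(choose_tag("Sign languages", "US"))       # with a country subtag
    print(choose_tag("Tsou"))                       # IANA-registered tag
```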

Wikipedia is young and there is no need to repeat the mistake of using ISO 639-1. The fallback rule is nice, but it is easier if one does not need the rule at all, because one only uses 'one' code system and not 'two' like the RFC. That does not mean aliases should not be allowed. Of course the old 2-letter codes can still be used, but should maybe get the status of deprecated, as we know it from HTML tags. We should allow every outsider to enter Wikipedia by 3-letter code. 'Aliases would be fine.' Tobias Conradi 19:35 8 May 2003 (UTC)

Agreement with Tobias. My modification would be to fall back to 3-letter codes if the 2 letter code is also a country code for a nation that does not (largely) use the language. For example, "be" is Belgium, but also Belorussian, so we should use "bel" instead. See Language codes/Conflicts -- Kowey 19:03, 21 Dec 2003 (UTC)

I agree with Brion. Please do not force existing Wikipedias to use a different language code. Giskart 10:54 7 May 2003 (UTC)


Details about aliases/redirects

Will the content be available in two ways, or will there be a server redirect? If so, in which direction? Tobias Conradi 19:35 8 May 2003 (UTC)

I was intending to use redirects, so a visit to eg http://epo.wikipedia.org/wiki/Interreto would send you to http://eo.wikipedia.org/wiki/Interreto . This doesn't have to be permanent in that direction, but it would maintain the status quo; note that using redirects rather than just having aliases should cut down on weird things like login cookies not being available when accessed from the alternate URL, or search engines crawling and indexing the site multiple times. --Brion VIBBER 21:02 8 May 2003 (UTC)
I would vote for http://eo.wikipedia.org/wiki/Interreto as redirect to http://epo.wikipedia.org/wiki/Interreto . Then we have clear interfaces (forever?) and people get used to 3-letter. Will there be problems with this way?
Well, it's ugly as heck and inconsistent with usage of language codes elsewhere on the net. I'd rather not do it that way, and others have expressed the same opinion (see Giskart's comment above). And again, 3-letter codes won't cover all possibilities. Some languages will require dialect/region specifiers on top of it, or don't have any 3-letter code to work with at all, so consistency isn't going to be achieved. --Brion VIBBER 00:36 9 May 2003 (UTC)
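For illustration, the redirect direction Brion describes in this thread (3-letter alias to the existing 2-letter host) could look roughly like the Python sketch below. The alias table and the function `redirect_target` are hypothetical; this is not the actual server configuration. Swapping the key and value in the table would give the opposite direction proposed in the reply.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: 3-letter alias -> existing 2-letter hostname label.
ALIASES: Dict[str, str] = {"epo": "eo"}

BASE_DOMAIN = "wikipedia.org"


def redirect_target(url: str) -> Optional[str]:
    """Return the URL an alias host should redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    label, _, rest = host.partition(".")
    canonical = ALIASES.get(label)
    if canonical is None or rest != BASE_DOMAIN:
        return None
    # Keep scheme, path, query and fragment; only the hostname changes.
    return urlunsplit(
        (parts.scheme, f"{canonical}.{rest}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(redirect_target("http://epo.wikipedia.org/wiki/Interreto"))
    print(redirect_target("http://eo.wikipedia.org/wiki/Interreto"))
```

Because the alias host answers only with a redirect, cookies and search-engine indexing stay tied to a single hostname, which is the advantage mentioned above.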

Where to use the codes

Thoughts on language integration proposes that language codes be put as part of the URL path and not the domain name:

Personally, I feel this would create a lot less confusion over country codes. -- Kowey 10:14, 6 Jan 2004 (UTC)

This also would make the URL shorter because the "wiki" in the path is abandoned. Tobias Conradi

The HTML code would be smaller for links to other wikis, because the domain is always the same. Tobias Conradi 14:19, 20 Sep 2004 (UTC)
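A rough Python sketch of what moving the code from the hostname into the path might look like. The exact layout, wikipedia.org/<code>/<title> with the "wiki" path segment dropped, is an assumption based on the comments above and is not taken from the proposal itself; the function name is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

BASE_DOMAIN = "wikipedia.org"


def to_path_based(url: str) -> str:
    """Rewrite http://<code>.wikipedia.org/wiki/<title> as http://wikipedia.org/<code>/<title>.

    Assumed layout for illustration only. URLs that do not match are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    code, _, rest = host.partition(".")
    prefix = "/wiki/"
    if rest != BASE_DOMAIN or not parts.path.startswith(prefix):
        return url
    title = parts.path[len(prefix):]
    return urlunsplit(
        (parts.scheme, BASE_DOMAIN, f"/{code}/{title}", parts.query, parts.fragment)
    )


if __name__ == "__main__":
    old = "http://eo.wikipedia.org/wiki/Interreto"
    new = to_path_based(old)
    print(new, f"({len(old) - len(new)} characters shorter)")
```

With a single domain, a link to another language edition can be written relative to the site root (for example /eo/Interreto in this assumed layout), which is the saving in HTML size mentioned above.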

Informal Poll

Please make it known if you belong to one of the Wikipedias on Language codes/Conflicts because this likely affects you more than people on en, fr, etc.


Things that everybody agrees on

  • Wikipedias are for languages, not countries
  • We should use redirects to keep things compatible
  • HTTP headers and XML/HTML attributes should definitely follow the RFC

Keep 2 letter codes (RFC)

Switch to 3 letter codes

  1. Kowey - Malay (ms); no confusion, more user-friendly (or don't stick it in the domain name)
  2. Tobias Conradi; ("or don't stick it in the domain name" is NOT a solution)
  3. Nightstallion (?) 16:28, 8 August 2006 (UTC)

Don't care

Seeing how many votes this poll has, it doesn't look like anyone else cares either -- 21:33, 31 Oct 2004 (UTC)

See also