Wikipedias in multiple writing systems

From Meta, a Wikimedia project coordination wiki

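This article describes every Wikipedia that uses multiple writing systems.[1] If you are a native speaker of one of the languages below and need an automatic text conversion system between writing systems, you are welcome to help us compile a comparison table containing the letters and transliteration rules. We can help you create the corresponding converter.

Some third-party sites help coordinate transliteration efforts and welcome contributions. (Original message written by Kprwiki.)

Languages with an automatic conversion system

Wikis in these languages have language conversion systems implemented, either within the MediaWiki software (see the MediaWiki.org documentation for more technical information) or via local scripts or gadgets.

Fully supported

Old English

Anglo-Saxon has two writing systems: the Latin alphabet and runes. Every page of the Anglo-Saxon Wikipedia has the automatic transliteration system enabled.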

The Balinese language has two writing systems: Latin and Balinese scripts.

An automatic transliteration system has been developed on Balinese projects to convert from Latin to Balinese script; it is unclear whether the reverse conversion is supported.

There are three variants for Latin scripts:[2]

  1. DHARMA transliteration (ban-x-dharma)
    Transliteration rules following DHARMA project "strict transliteration".
    Mostly follows ISO 15919, with modifications for precision and broader coverage.
  2. Palmleaf.org transliteration (ban-x-palmleaf)
    Transliteration rules developed for Palmleaf.org.
  3. Puri Kauhan Ubud transliteration (ban-x-pku)
    Transliteration rules developed at Puri Kauhan Ubud and widely used in Bali.
    Also the default Balinese to Latin transliteration variant.


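Chinese

Written vernacular Chinese (that is, Standard Chinese (cmn), which uses the Chinese macrolanguage code zh on Wikimedia sites) has two main writing systems: Simplified Chinese (zh-Hans) and Traditional Chinese (zh-Hant), with different localized vocabulary and grammar in different Chinese-speaking regions.

The Chinese Wikipedia (zhwiki) and other zh.wiki* projects[note 1] support six variants:

  1. Mainland China Simplified (zh-Hans-CN)
  2. Hong Kong Traditional (zh-Hant-HK)
  3. Macau Traditional (zh-Hant-MO)
  4. Malaysia Simplified (zh-Hans-MY)
  5. Singapore Simplified (zh-Hans-SG)
  6. Taiwan Traditional (zh-Hant-TW)

In URLs, Wikidata labels and possibly database schemas, the variant tags are shortened to zh-cn, zh-hk, zh-mo, zh-my, zh-sg and zh-tw. For Wikidata, however, it is currently unclear which variants (together with zh-hans, zh-hant and the original "zh") should or should not be used.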
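The relationship between the six full variant tags and their shortened forms can be sketched in Python. This is only an illustration: the function name `short_variant_code` is hypothetical, and the rule "drop the script subtag when a region is present, then lowercase" is an assumption inferred from the tag pairs listed above, not a description of how MediaWiki derives its codes.

```python
# Full variant tags as listed for the Chinese Wikipedia.
FULL_TAGS = [
    "zh-Hans-CN", "zh-Hant-HK", "zh-Hant-MO",
    "zh-Hans-MY", "zh-Hans-SG", "zh-Hant-TW",
]

SCRIPT_SUBTAGS = {"hans", "hant"}


def short_variant_code(tag: str) -> str:
    """Return the shortened, lowercase form of a variant tag.

    Assumed rule: a three-part tag (language-script-region) loses its
    script subtag; any other tag is only lowercased.
    """
    parts = tag.split("-")
    if len(parts) == 3 and parts[1].lower() in SCRIPT_SUBTAGS:
        parts = [parts[0], parts[2]]
    return "-".join(parts).lower()


SHORT_CODES = [short_variant_code(tag) for tag in FULL_TAGS]
```

Under this assumed rule, script-only tags such as zh-Hans keep their script subtag and become zh-hans, which matches the codes mentioned for Wikidata.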

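The /zh translation pages driven by Special:Translate also support these variants.

Bug reports and feature requests can be submitted at the Chinese Wikipedia (users facing a language barrier may use the talk page).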

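Gothic

Gothic has two writing systems: the Latin alphabet and the Gothic alphabet. Every page of the Gothic Wikipedia has the automatic transliteration system enabled.

2024 update: this automatic transliteration system does not work in Vector 2022; consider switching to another skin to use the conversion.

Inuktitut

Canadian Inuktitut has two writing systems: Inuktitut syllabics are used in parts of Nunavut, and the Latin alphabet is used elsewhere. An automatic conversion system between the two has been established. Because syllabics have no capital letters, conversion from syllabics to the Latin alphabet produces only lowercase Latin letters.

The Inuktitut Wikipedia has automatic conversion enabled. Note that the variant codes are not based on Wikipedia's "iu" (the ISO 639-1 macrolanguage code); instead, ike-Cans denotes syllabics and ike-Latn denotes the Latin alphabet.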
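A minimal Python sketch shows why syllabics-to-Latin output is lowercase only: the source script carries no case information, so a table lookup has nothing to capitalize. The table below is a tiny illustrative subset covering one word, and the names `SYLLABICS_TO_LATIN` and `syllabics_to_latin` are hypothetical; the converter deployed on the wiki is not reproduced here.

```python
# Illustrative subset only; a real table would cover the whole script.
# Unicode character names spell the vowel "u" as "O" (e.g. "NO" for nu).
SYLLABICS_TO_LATIN = {
    "\u1403": "i",   # CANADIAN SYLLABICS I
    "\u14c4": "nu",  # CANADIAN SYLLABICS NO
    "\u1483": "k",   # CANADIAN SYLLABICS K
    "\u144e": "ti",  # CANADIAN SYLLABICS TI
    "\u1450": "tu",  # CANADIAN SYLLABICS TO
    "\u1466": "t",   # CANADIAN SYLLABICS T
}


def syllabics_to_latin(text: str) -> str:
    """Replace each known syllabic with its Latin value; keep other characters."""
    return "".join(SYLLABICS_TO_LATIN.get(ch, ch) for ch in text)


# The language name written in syllabics.
word_cans = "\u1403\u14c4\u1483\u144e\u1450\u1466"
word_latn = syllabics_to_latin(word_cans)
```

Here `word_latn` is "inuktitut" in all lowercase; restoring a sentence-initial or proper-noun capital would need extra rules that the script itself cannot supply.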

Kurdish

Tracked in Phabricator:
Task T199895

Kurdish uses three writing systems depending on region:

  • the Latin alphabet in Turkey and Syria,
  • the Arabic alphabet in Iraq and Iran, and
  • the Cyrillic alphabet in the former Soviet Union, which is not implemented here because it is no longer in use.

The Kurdish Wikipedia supports an automatic conversion system between the Latin alphabet and the Arabic script.

Kurdish Latin-Arabic converter

The Tachelhit language has two writing systems: Tifinagh and Latin. Some sources also mention that the Arabic script was used, but they are too old to be relevant here.

Automatic transliteration from Tifinagh to Latin has been supported on the test wiki; the reverse conversion was deployed to the MediaWiki software more recently.

Wu

Tracked in Phabricator:
Task T59138

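Wu has two main writing systems: Simplified and Traditional Chinese characters.

Automatic conversion between the two writing systems, as on zhwiki, is desirable. It has been supported since MediaWiki 1.41.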

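Partially supported

Cantonese

Cantonese: Traditional to Simplified characters

Cantonese can be written in Traditional or Simplified characters.

The Cantonese Wikipedia has a gadget, written in JavaScript, that performs one-way Traditional-to-Simplified conversion. (This Phabricator task details the process of changing it to a system-provided converter.) All articles are written and edited in Traditional characters because conversion from Traditional to Simplified is more reliable than the reverse: some Traditional characters were merged into a single Simplified character.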
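The asymmetry between the two directions can be illustrated with a short Python sketch. The four-character table is a hand-picked example, not the gadget's data, and the names `T2S`, `to_simplified` and `invert` are hypothetical. The forward mapping is a plain function, while inverting it shows that one Simplified character can correspond to several Traditional ones.

```python
from collections import defaultdict

# Hand-picked sample: two Traditional characters merge into one Simplified form.
T2S = {
    "發": "发",  # "emit, prosper"
    "髮": "发",  # "hair"
    "頭": "头",
    "財": "财",
}


def to_simplified(text: str) -> str:
    """Character-by-character Traditional-to-Simplified lookup."""
    return "".join(T2S.get(ch, ch) for ch in text)


def invert(mapping: dict) -> dict:
    """Group Traditional characters by the Simplified character they map to."""
    inverse = defaultdict(list)
    for trad, simp in mapping.items():
        inverse[simp].append(trad)
    return dict(inverse)


S2T = invert(T2S)
# Simplified characters whose Traditional form cannot be chosen without context.
AMBIGUOUS = {simp: trads for simp, trads in S2T.items() if len(trads) > 1}
```

With this sample, `to_simplified("頭髮")` and `to_simplified("發財")` are both unambiguous, whereas `AMBIGUOUS` contains 发, which a Simplified-to-Traditional converter could only resolve by looking at the surrounding word.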

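Cantonese: Traditional characters to romanization

Cantonese can also be written in the Latin alphabet. There are currently three main Cantonese romanization variants: the Penkyamp scheme, Jyutping, and the Yale romanization of Cantonese. The ultimate goal of the Cantonese Wikipedia is to integrate all three romanizations into the transliteration feature, so that both native and non-native Cantonese readers can read Cantonese articles in all three spellings.

The Penkyamp transliteration tool can be found here. The complete list of Han characters to Penkyamp can be found here.

Crimean Tatar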

Tracked in Phabricator:
Task T23582 resolved
Tracked in Phabricator:
Task T326864 invalid

Crimean Tatar has three main writing systems: the Latin alphabet, the Cyrillic alphabet and the Arabic script.

The Crimean Tatar Wikipedia mainly uses the Latin alphabet, but since the annexation of Crimea by Russia, Cyrillic has been used as the de facto official script in the region.

After major work on the MediaWiki core code base, conversion between Latin and Cyrillic was developed for crhwiki and the crh test Wiktionary project.

We are still waiting for volunteers' opinions on the Crimean Tatar Arabic script: should there also be a crh-arab conversion option? Let us know your opinion on the talk page.

Since January 2023, there have also been discussions about adding support for Dobrujan Tatar (crh-RO), a dialect of Crimean Tatar spoken in Romania.

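Gan

Gan has three main writing systems: Simplified characters, Traditional characters, and romanization.

The Gan Wikipedia currently has an automatic Simplified-Traditional conversion system but no romanization conversion system. Automatic conversion into Gan romanization is desirable so that non-Gan speakers can learn and understand Gan more easily.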

The Serbian language has two writing systems, Cyrillic (sr-Cyrl) and Latin (sr-Latn), with two major dialects. So there are in theory four variants in the language:

  1. Cyrillic alphabet Ekavian (sr-Cyrl-ekavsk)
  2. Latin alphabet Ekavian (sr-Latn-ekavsk)
  3. Cyrillic alphabet Ijekavian (sr-Cyrl-ijekavsk)
  4. Latin alphabet Ijekavian (sr-Latn-ijekavsk)

The Serbian Wikipedia supports an automatic conversion system for the two writing systems, but not for the dialects, since there are few differences between them.
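Script conversion of this kind is, at its simplest, a per-letter table lookup. The Python sketch below maps Serbian Cyrillic to Latin using the standard 30-letter correspondence; the function name `cyr_to_lat` is hypothetical, and the code is an illustration rather than the converter MediaWiki uses, which also handles exceptions and markup.

```python
# Lowercase Serbian Cyrillic letters and their Latin counterparts.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyr_to_lat(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin, preserving capitalization."""
    out = []
    for ch in text:
        latin = CYR_TO_LAT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not a Serbian Cyrillic letter
        elif ch.isupper():
            out.append(latin.capitalize())  # Љ becomes "Lj", not "LJ"
        else:
            out.append(latin)
    return "".join(out)
```

For example, `cyr_to_lat("Љубав и њива")` gives "Ljubav i njiva". The reverse direction is harder: Latin "nj" usually stands for њ but can also be н followed by ј (as in инјекција), so a Latin-to-Cyrillic converter needs more than a plain table.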

The variant codes currently in use, "sr-ec" and "sr-el", are incorrect and are awaiting patches to fix them.

Tracked in Phabricator:
Task T268033 resolved

Serbo-Croatian is a pluricentric language with four standardized varieties (Bosnian, Croatian, Montenegrin and Serbian), two major pronunciations (Ijekavian and Ekavian), and two scripts: Latin (sh-Latn) and Cyrillic (sh-Cyrl). Based on consensus among its editors, a one-way Latin-to-Cyrillic transliterator was implemented on Serbo-Croatian projects on 1 December 2022 (except on the Serbo-Croatian Wikivoyage test project, which uses the code hbs, for which conversion is currently not supported).

A proposal to implement the converter for both scripts was made here.

The Tajik language uses three writing systems, depending on region:

  • Cyrillic alphabet in Tajikistan.
  • Arabic alphabet in Afghanistan.
  • Latin alphabet.

Tajik Wikipedia currently has an auto-converting system for two of the writing systems (Cyrillic - Latin) but not into Perso-Arabic.

See tajpers.narod.ru for references on the development of a Cyrillic - Perso-Arabic conversion system.

Tracked in Phabricator:
Task T258975 resolved

The Talysh language has three writing systems: Latin, Cyrillic and Perso-Arabic.

A one-way automatic transliteration system from Latin to Cyrillic has been developed; there is no support for the reverse direction yet, nor for the Talysh Arabic script.

The Uzbek language has three writing systems:

  • Latin,
  • Cyrillic and
  • Arabic alphabet.

Uzbek Wikipedia currently has an auto-converting system for two of the writing systems (Latin - Cyrillic) but not into Perso-Arabic.

An automatic conversion between the three writing systems is desirable since the Perso-Arabic script is used in Afghanistan. A converter into Arabic script could be developed; if it is one day deployed, the Southern Uzbek Wikipedia test would be unnecessary.

Languages with automatic conversion systems pending implementation

Wikis in these languages do not support automatic language conversion, but useful external tools exist that help readers read the wikis in different scripts. Hopefully, in the near future, those tools can be introduced to the wikis, or even to the MediaWiki software.

Regarding the scripts used on those wikis, they either:

  1. use only the most widely used script; or
  2. have pages in at least two scripts, which may or may not have templates for navigation.
Tracked in Phabricator:
Task T31218 declined

The Azerbaijani language has three writing systems: the Latin, Cyrillic and Perso-Arabic alphabets.

The Azerbaijani Wikipedia is written in the Latin script.

However, due to the incompatibility of the Latin and Perso-Arabic scripts, a South Azerbaijani Wikipedia was created in July 2015.

An automatic conversion between the Latin and Cyrillic scripts is desirable to make the wiki readable for Azerbaijanis living in Dagestan.

The Batak languages can be written using the Latin script and the Batak script (Surat Batak). A Latin - Surat Batak converter already exists [2].

Belarusian (Classical and Official orthographies)

The Belarusian language has two writing systems, Cyrillic and Latin.

In addition, this language is written in two spelling varieties: Classical Belarusian (used until 1933) and the Russifying Official Belarusian introduced in 1933. This situation necessitated the creation of two separate Belarusian Wikipedias. Both are written in Cyrillic.

Hence, the introduction of a Latin converter is a pressing need for both, especially for the Belarusian diaspora and the Belarusian democratic opposition.

There is also a versatile converter that converts between Cyrillic and Latin, and between Classical and Official Belarusian:

Furthermore, this converter also offers conversion into Archaic (that is, Old) Belarusian, which is none other than the Ruthenian language, written in either Cyrillic or Latin letters.

NB1: The following converter should be avoided:

because it does not convert from the Belarusian Cyrillic to the Belarusian Latin alphabet, but transliterates Belarusian Cyrillic on the model of Russian romanization, in line with the official document Instruction on transliteration of Belarusian geographical names with letters of Latin script, which denies any official role to the Belarusian Latin alphabet.

Last but not least, until the mid-20th century Belarusian was written by Muslims in a third national alphabet, namely in Arabic letters, known as the Belarusian Arabic alphabet. No Cyrillic/Latin - Arabic converter has been developed yet, but some scholars are working to this end. See also Revised Proposal to encode Arabic characters used for Bashkir, Belarusian, Crimean Tatar, and Tatar languages.

NB2: In late 2021 a project of the Latin alphabet-based Belarusian Wikipedia, that is, the Biełaruskaja Wikipedyja łacinkaj, commenced.

The Buginese language can be written using the Latin script and the Lontara script. A Latin - Lontara converter already exists and needs only small edits to be ideal. There is also an online Latin - Aksara Lontara converter [3].

The Chechen language has 2 writing systems: Cyrillic and Latin alphabet.

An automatic conversion from Cyrillic into Latin writing systems is desirable since many Chechens living outside of the Russian Federation cannot read Cyrillic.

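Goan Konkani

Konkani has five writing systems: Devanagari, Latin, Kannada, Arabic and Malayalam. The Goan Konkani Wikipedia has articles written in Devanagari, Latin and Kannada. Although a script converter project exists, it has not yet been developed.

In the absence of an on-wiki system, the external tool Konkanverter is used to transliterate text manually.

It needs to be investigated whether MediaWiki's LanguageConverter system can be used to implement the script conversion.

Girgit, a tool for transliteration between the three scripts, has been released under the GPL. It is worth investigating whether it can be integrated into the Konkani Wikipedia.[3][4]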

The Karakalpak language has two writing systems, Latin and Cyrillic.

Currently kaawiki uses the Latin script and does not have a conversion system.

There is a Karakalpak converter on Transliteration.kpr.eu; it supports conversion from Cyrillic to Latin, but the reverse conversion does not work for now.

The Kyrgyz language has three major writing systems. These are Cyrillic Kyrgyz, Latinized Kyrgyz, and Perso-Arabic Kyrgyz (used in Xinjiang, China).

An automatic conversion between the three writing systems is desirable since the Kyrgyz in China do not use Cyrillic.

An Arabic-to-Cyrillic converter is under development (tentative source code) so that Kyrgyz in China can also contribute to Wikipedia without knowing Cyrillic.

Laz

The Laz language has two writing systems: Georgian script and Latin script. An automatic conversion into Georgian would be desirable to enable more Laz users from Georgia.

The alphabet is on Wikipedia, in Georgian and Latin.

The Polish language is typically written in Latin letters. Yet, in western Belarus Catholics mostly identify as Poles and speak the local Slavic vernacular, defined as Polish. However, they have no knowledge of the Latin alphabet. Hence, (mostly devotional) Polish-language books are published for them in Cyrillic.[5]

Supplying the Polish Wikipedia with a converter to such Polish Cyrillic would enable this Polish minority population of 300,000 to enjoy access to the Polish Wikipedia, which is one of the world's largest wikipedias.

There are some readily available converters of this kind, namely

The Sindhi language can be written using a modified Persian alphabet and the Devanagari script. Most young Sindhi people in India do not know the Persian alphabet and use Devanagari, leaving the current Wikipedia accessible solely to those in Pakistan.

A Sindhi Arabic-to-Devanagari conversion tool could be created (based on this table and this table), tested and then installed on the Sindhi Wikipedia so that Sindhi articles can be read in the Devanagari script at the click of a tab. That would also eliminate the need for a separate wiki written in Sindhi Devanagari.

The Sundanese language can be written using the Latin script and the Sundanese script (Aksara Sunda). A Latin - Aksara Sunda converter already exists [4].

The Tatar language has three major writing systems. These are Cyrillic Tatar, Latinized Tatar, and Perso-Arabic Tatar.

An automatic conversion between the three writing systems was very desirable in order to avoid Tatar script conflicts.

As of September 2021, a Tatar Cyrillic-to-Latin conversion tool is available at baltoslav.eu, but the reverse conversion is not yet supported.

The Turkmen language has three writing systems: Latin (used in Turkmenistan), Perso-Arabic alphabet (used in Iran and Afghanistan) and Cyrillic (historically used in Turkmenistan).

An automatic conversion between the three writing systems is desirable because although officially, Turkmen is rendered in the Latin alphabet, the old Cyrillic alphabet is still in wide use and many political parties in opposition to the authoritarian rule of President Niyazov continued to use the Cyrillic alphabet on websites and publications, most likely to distance themselves from the alphabet that Niyazov created.

The Uyghur language has three writing systems, Arabic, Latin and Cyrillic.

The Latin alphabet is used by Uyghurs in Turkey, Western countries and parts of Xinjiang, the Cyrillic alphabet is used in CIS countries whereas the Perso-Arabic script is used officially in Xinjiang.

An automatic conversion between the three writing systems is desirable to prevent conflicts between users with different preferences. Such a tool already exists: Yulghun.

Languages that once had an automatic conversion system, now removed

Kazakh

Tracked in Phabricator:
Task T268143 resolved
Tracked in Phabricator:
Task T350684 resolved

Kazakh has three writing systems: Cyrillic (kk-Cyrl), Latin (kk-Latn) and Perso-Arabic (kk-Arab).

In late 2023, the MediaWiki language converter for Kazakh was removed.

Languages without an automatic conversion system

Unfortunately, these languages have no language conversion support, either within the wikis or externally. The script-related issues with their content are the same as in the section above. They are sorted by the similarity of the conversion system required.

Hopefully, in the near future, the language conversion tools can be developed and deployed for them.

Arabic, Cyrillic and Latin scripts

The Shughni language has three writing systems: Latin, Cyrillic and Perso-Arabic alphabet.

The Shughni Wikipedia test is written in the Cyrillic, Latin and Arabic scripts.

An automatic conversion at Wikimedia Incubator between the Latin and Cyrillic scripts is desirable to make the wiki readable for the 40,000 Shughni people in Tajikistan and 20,000 Shughni in Afghanistan. Transliteration to the Shughni Arabic script can be added at a later date.

Cyrillic and Latin scripts

The Bosnian language uses two writing systems: the Latin and Cyrillic alphabets. The Bosnian Wikipedia currently uses the Latin script and has no Cyrillic support. Some sources mention that Bosnian was written in Arabic script before the 1900s, but this is not relevant to modern development.

A Cyrillic-Latin converter for Bosnian would be perfect.

Lojban can possibly be written in both Latin and Cyrillic; see the Wikipedia article on Lojban grammar.

Nogai

Nogai can be written in the Cyrillic and Latin alphabets. The Nogai test Wikipedia on Incubator is written mainly in Cyrillic, but the community has asked whether content could also be displayed in Latin.

Romanian

Tracked in Phabricator:
Task T169453 declined

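Romanian can be written in the Latin or Cyrillic alphabet. The Romanian Wikipedia currently uses only the Latin alphabet, because some users believe that Cyrillic Romanian should be labelled "Moldovan".

Automatic conversion between the two writing systems was also raised as a result of the proposal to delete the Moldovan Wikipedia. However, owing to large-scale conflicts of interest within the community, the proposal has stalled and is unlikely to be revisited.

A former Incubator administrator explained that there is a Cyrillic Romanian (or Moldovan, if you prefer) project on Fandom.

Vlax Romani has two major writing systems: Latinized Romani and Cyrillic Romani.

Arabic and Latin scripts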

The Brahui language has two main writing systems: Arabic script and the Latin script. This is because:

  1. The current online Arabic keyboard does not contain the required number of vowels for Brahui.
  2. Sometimes vowels are used as consonants depending upon their position in a word. This is quite confusing for people who are getting literacy instruction in the Brahui language.

A system that can convert between the two scripts would help keep script issues from hindering the growth of the language.

The Komering language has three major writing systems: Latin (officially used), Arabic (used by local Muslims), and Komering (which is currently not encoded separately in Unicode, where it is treated as Rejang script). An idea to develop a conversion system is discussed at incubator:Talk:Wp/kge/Halaman Utamo.

The Malay language is normally written in the Latin alphabet, called Rumi, although a modified Arabic script called Jawi also exists. Rumi and Jawi are co-official in Brunei. Efforts are currently being undertaken to preserve the Jawi script and to revive its use amongst Malays in Malaysia, and students taking Malay language examinations in Malaysia have the option of answering questions in the Jawi script. The Latin alphabet, however, is still the most commonly used script in Malaysia, both for official and informal purposes.

An automatic conversion from Latin to Jawi script should be set up.

References:

Arabic and Brahmic scripts

The Haryanvi language has two writing systems: Devanagari, used in India, and Shahmukhi (a modified Arabic script), used in Pakistan.

Currently the Haryanvi Wikipedia test on Incubator has many more articles written in Shahmukhi (added since late 2023) than in Devanagari, in which only a handful of articles were created at least five years ago.

Tracked in Phabricator:
Task T12034 declined

The Kashmiri language has three writing systems. These are Devanagari Kashmiri, Perso-Arabic Kashmiri and Romanized Kashmiri.

An automatic conversion between the three writing systems is very desirable in order to avoid Kashmiri script conflicts. However, an accurate conversion script is very difficult to develop (see also [5]).

Punjabi

There are several different scripts used for writing the Punjabi language. In the Punjab province of Pakistan, the script used is Shahmukhi, which is essentially the same as the Urdu script. In the Indian state of Punjab, Sikhs and others use the Gurmukhī script. Hindus, and those living in neighbouring Indian states such as Haryana and Himachal Pradesh, sometimes use the Devanāgarī script. Shahmukhi and Gurmukhī are the scripts most commonly used for writing Punjabi and are considered the official scripts of the language.

One suggestion is to set up automatic Gurmukhī - Shahmukhi transliteration based on this source [dead link], as was done on, for example, the Kazakh Wikipedia.

That way everyone could read both wikis in either the Gurmukhī or the Shahmukhi script.

The Tamil language can also be written in Arwi (Tamil Arabic script). A Tamil-to-Arwi conversion tool could be created, tested and then installed on the Tamil Wikipedia so that Tamil articles can be read in the Arabic script at the click of a tab. That would also eliminate the need for a separate wiki written in Arwi.

Brahmic and Latin scripts

The Meitei language can be written using Meitei (or Meetei Mayek), Bengali and Latin scripts, and has several dialects. An automatic conversion system was proposed on Incubator, see incubator:User talk:Artoria2e5#A query.

The Pali language can be written using Devanagari, Brahmi and Latin scripts. An automatic conversion system was proposed here.

The Sylheti language can be written using Sylheti Nagri and Bengali scripts. The Sylheti test projects on Incubator are exclusively using Sylheti Nagri, and only use Bengali scripts in some talk pages.

A proposal to create a conversion system was discussed on the langcom mailing list, but a survey on Incubator showed that some contributors opposed implementing such a system.

Han characters and Latin script

Automatic Han to Latin conversion may be difficult but perhaps possible with reasonable accuracy. Completely automatic Latin to Han conversion is either impossible or extremely difficult and will almost certainly be inaccurate without knowledgeable human intervention (indeed, this is a similar problem to an input method for Han characters). Without the latter, only contribution in Han is possible. This would then disadvantage contributors who only know the Latin orthography.

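Eastern Min (Mindong)

Eastern Min has two main writing systems: Traditional Chinese characters and Foochow Romanized (whose writing system is called "Bàng-uâ-cê").

The Mindong Wikipedia currently does not have an automatic conversion system for the two writing systems. An automatic conversion from Traditional Chinese characters into Foochow Romanized would be desirable to avoid conflicts between users with different preferences and to enable users to comprehend the meaning of every word more easily.

A list of 6,104 commonly used Han characters converted into Foochow Romanized can be found here.

An Eastern Min transliteration tool can be found here.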


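Hakka

Hakka has two main writing systems: Traditional Chinese characters and Hakka romanization (see existing Han characters --> Hakka dictionary).

The Hakka Wikipedia does not yet have an automatic conversion system between the two writing systems. Automatic conversion from Traditional characters into Hakka romanization is desirable to avoid conflicts between users with different preferences and to let users understand the meaning of each character more easily.

A list of 4,000 commonly used Han characters converted into Hakka romanization can be found here.

Southern Min

Southern Min has two main writing systems: Southern Min romanization and Southern Min Traditional characters.

Automatic conversion between Southern Min romanization and Southern Min Traditional characters is desirable to avoid the existing conflicts between users with different script preferences.

Vietnamese

It was once suggested that the former Chữ Nôm Wikipedia test project on Incubator be combined with the Vietnamese Wikipedia, which uses chữ Quốc ngữ; this would be extremely difficult.

Wp/vi-nom no longer exists on Incubator. There is a largely equivalent project here.

Different Latin scripts/orthographies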

Norwegian (Bokmål and Nynorsk)

The Norwegian language, although nowadays written only in the Latin script, has several major orthographies, too many to count precisely.

The well-known orthographies currently are:

  1. Bokmål, which the Norwegian Wikipedia currently uses; it is the supreme-court-defined official orthography and probably the one supported by Google Translate (which supports only one "Norwegian") and perhaps by other machine translation tools;
  2. Riksmål, probably also used on the Norwegian Wikipedia, though no evidence has yet been provided; it had no IETF language tag as of September 2021;
  3. Nynorsk, which the Nynorsk Norwegian Wikipedia currently uses;
  4. Høgnorsk, with the IETF language tag hognorsk, also used on nnwiki, but only on a handful of pages (see nn:Special:Prefixindex/Nn/).

Historical records on nowiki indicate that it was originally a single Norwegian Wikipedia; Nynorsk speakers later reached a consensus to split off their articles and found nnwiki, leaving nowiki as the de facto Bokmål Norwegian Wikipedia. Some users, however, disagree with this history and want to merge both back into one nowiki, using scripts to convert between them.

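Southern Min

Someone left a message on a user talk page raising the possibility of automatic conversion between the two main Latin orthographies: Pe̍h-ōe-jī (POJ) and Tâi-lô. The letters (including digraphs) of the two can be converted one to one. A conversion table is available, and the simple text conversion tool of ang: (Old English) could probably be used. Incidentally, if implemented properly, it could even serve as a basic spell checker. See also this paper and this blog post.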
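A one-to-one conversion over letters and digraphs can be sketched as a longest-match-first substitution. The Python example below is an illustration under stated assumptions: the six POJ-to-Tâi-lô pairs are a small subset chosen for this sketch rather than the conversion table mentioned above, the names `POJ_TO_TL` and `poj_to_tailo` are hypothetical, and the code handles only lowercase syllables without tone marks.

```python
import re

# Assumed subset of spelling differences (toneless, lowercase only).
POJ_TO_TL = {
    "chh": "tsh",
    "ch": "ts",
    "oa": "ua",
    "oe": "ue",
    "eng": "ing",
    "ek": "ik",
}

# Try longer keys first so that "chh" is not consumed as "ch" + "h".
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, POJ_TO_TL), key=len, reverse=True))
)


def poj_to_tailo(text: str) -> str:
    """Rewrite toneless POJ spellings into their Tâi-lô equivalents."""
    return _PATTERN.sub(lambda m: POJ_TO_TL[m.group(0)], text)
```

For instance, `poj_to_tailo("chheng")` yields "tshing". A text that still contains untouched POJ-only sequences after conversion would hint at a misspelling, which is how such a table could double as a basic spell checker.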

The Nigeria Yoruba and the Benin Yoruba orthographies are different. The Yoruba Wikipedia uses the Nigeria Yoruba spelling.

The Nigeria Yoruba orthography is based on Samuel Crowther’s 1852 orthography, which was influenced by the Church Missionary Society writing system. The Nigeria Yoruba orthography rules were standardized during the 1875 Yoruba Orthography Conference. In 1966, the Western Nigeria Ministry of Education set up a committee to review the orthographic rules; the Report of the Yoruba Orthography Committee was published in 1969 and, following reactions, a larger committee published the Report of the Enlarged Committee on Yoruba Orthography in 1972.

In 1971, the Joint Working Party was set up to achieve practical reforms in multiple Nigerian languages, and the Yoruba Working Party accepted most of the recommendations of the Orthography Committees. In 1974, the Joint Consultative Committee on Education, set up by the Federal Ministry of Education, approved that the recommendations of the Joint Working Party be used by all Ministries of Education in Nigeria and the West African Examinations Council.

The Benin Yoruba orthography is based on the Benin National Alphabet created by the National Linguistic Commission in 1975 and adopted in law the same year. The Benin National Alphabet defines several Benin language orthographies, including a Yoruba one. The national alphabet was updated a few times, including in 1990 and in 2006.

The main differences between the Nigeria Yoruba and the Benin Yoruba orthographies are as follows: ẹ ọ p ṣ in Nigeria are spelled ɛ ɔ kp sh in Benin.
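The four correspondences can be applied mechanically, as the Python sketch below illustrates. It is a minimal example, not an existing tool: the function name `nigeria_to_benin` is hypothetical, and the code assumes the input follows Nigerian spelling throughout (so a literal "kp" does not already occur). Text is decomposed with Unicode NFD first so that the dot below can be matched separately from tone marks.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

# Nigerian spelling (decomposed) -> Benin spelling.
NIGERIA_TO_BENIN = [
    ("e" + DOT_BELOW, "\u025b"),  # ẹ -> ɛ
    ("E" + DOT_BELOW, "\u0190"),  # Ẹ -> Ɛ
    ("o" + DOT_BELOW, "\u0254"),  # ọ -> ɔ
    ("O" + DOT_BELOW, "\u0186"),  # Ọ -> Ɔ
    ("s" + DOT_BELOW, "sh"),      # ṣ -> sh
    ("S" + DOT_BELOW, "Sh"),
    ("p", "kp"),
    ("P", "Kp"),
]


def nigeria_to_benin(text: str) -> str:
    """Convert Nigerian Yoruba spelling to Benin spelling, keeping tone marks."""
    decomposed = unicodedata.normalize("NFD", text)
    for source, target in NIGERIA_TO_BENIN:
        decomposed = decomposed.replace(source, target)
    return unicodedata.normalize("NFC", decomposed)
```

Tone marks survive because, after decomposition, the dot below sorts before accents such as the grave or acute, so only the base letter and the dot are replaced. The opposite direction would need care with "kp" and "sh" at syllable boundaries and is not attempted here.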

Cyrillic, Latin and Mongolian scripts

The Kalmyk language can be written using the Cyrillic script and the Todo script.

An automatic conversion between the two writing systems is necessary because the 'Kalmyks' (known as Oirats in China) use the Todo script only.

The Manchu language has three writing systems: Manchu script, Jurchen script, and the Latin script.

  1. The Manchu language is near extinction in terms of native speakers, but many enthusiasts and academics are learning it as a second language. Learners in China apparently use mainly the Manchu script, while learners in the West use both the Latin and Manchu scripts.
  2. One possible snag is that the Manchu script is normally written vertically, from top to bottom. If need be, that rule can be bent: text can be laid out horizontally, and readers can rotate their screens manually if they wish to read it in Manchu script.
    The vertical script is now supported.
  3. The Jurchen script is used for writing an earlier stage of Manchu, the Jurchen language. If it is ever properly encoded in Unicode, a separate Jurchen Wikipedia might be created, just as there are separate modern and Old English Wikipedias.
  4. All in all, langcom has repeatedly required that a conversion system for Manchu be deployed as soon as possible.

The Mongolian language can be written using the Cyrillic script, the Classical Mongolian script and the ’Phagspa script (mainly for art; see Unicode).[6]

An automatic conversion between the three writing systems is desirable to prevent the creation of separate Mongolian Wikipedias written in the Classical Mongolian script and the Latinized Mongolian script.

The Xibe language can be written using either the Latin script or the Xibe script. The Xibe test Wikipedia currently has much content in Xibe script; many of these pages previously used Latin and were manually converted to Xibe in late 2023.

An automatic conversion between both writing systems is desirable for readers.

Other conversion systems

Peul/Fulfulde has two major writing systems: the Latin script and the Adlam script. Arabic Ajamiya is also used in Cameroon and neighbouring countries.

There are already some pages that have been converted manually, for example: Gine/adlam

Javanese is the language primarily spoken on the island of Java, and also by the Javanese diaspora in Indonesia and Suriname.

(1) There are two writing systems: traditional Hanacaraka (also called Carakan, an abugida) and Latin. Latin is so prevalent that almost all publications in Javanese (albeit few in number) are in Latin. A one-to-one conversion is possible from Latin to Hanacaraka. Hanacaraka was encoded in Unicode only recently (2009), and there exist a Hanacaraka Unicode font and several non-Unicode fonts. Since the Unicode encoding is not yet supported by TrueType, the font uses SIL's Graphite.

The Javanese Wikipedia has already requested that WebFont be implemented. In the future, automatic conversion like that of the Chinese or Cyrillic projects would be desirable.

(2) Another thing to be considered: Javanese language has (at least) two registers (sets of vocabulary) based on social standing: polite/palace Javanese (krama) and brash/market Javanese (ngoko). Both are used in Central Java, the former is more commonly used in publication, while the latter are more commonly used in conversation. In some places the usage of the latter is also found in publication, mainly in Suriname (for example the ngoko language is used in Suriname-Javanese Bible, which to the eyes and ears of the Javanese people would be vulgar), where the former is no longer in use, due to historical and geographical reasons.

The same is also true for East Javanese people, who vehemently oppose the use of the former because of its association with the aristocracy, and for people of other ethnicities throughout Indonesia. Therefore there are four combinations/variants of the Javanese language:

  • Hanacaraka krama
  • Latin krama
  • Hanacaraka ngoko
  • Latin ngoko

Converting from krama to ngoko sometimes only requires one-to-one mapping of vocabulary, but in other instances requires one-to-many or many-to-one, or even a change in the grammar.

(3) Historically, there's also third (and even fourth) script that was used to write Javanese, that is Arabic script (called Pegon alphabet and Arab gundul alphabet), and long before that, Sanskrit/Pallava (Old Javanese/Kawi script). http://www.omniglot.com/writing/javanese.htm

The use of these old scripts in Wikimedia projects is still non-existent, but it would probably be beneficial in the future for Wikisource and the Javanese Wiktionary.

(4) Javanese Hanacaraka is also related to the Sundanese and Balinese languages, and Wikimedia projects currently include the Sundanese Wikipedia and its sister projects, and the Balinese Wikipedia.

Korean

On the Korean Wikipedia's Dajimo there is a discussion about introducing hanja (한자, 漢字) for use with automatic conversion.

There has also been some discussion of grammatical differences in Korean between South Korea and North Korea, but the need for text conversion is still being analysed.

The Ladino language has two major writing systems: Latinized Ladino and Rashi script (a variant of the Hebrew script).

An automatic conversion between the two writing systems is desirable to prevent the duplication of articles. However, this may run into a technical challenge that is very hard to resolve; see the talk page for details.

Tagalog can be written in the Latin or Baybayin script. However, as the Baybayin script has been shelved by local governments, there appears to be little support for a potential conversion system.

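Notes

  1. As of September 2021, the Chinese Wiktionary and Wikisource enable only the Simplified-Traditional conversion system, while on the Chinese Wikibooks, Wikinews and Wikiquote, zh-Hant-MO is merged into zh-Hant-HK and zh-Hans-MY is merged into zh-Hans-SG.

See also

More lists of Wikipedias by various criteria: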