User:Yes0song/Automatic conversion in Korean language

From Meta, a Wikimedia project coordination wiki
  • Caution:
    • This is not my "current" opinion. This is a "historical" suggestion.

※ Caution: Because my English ability is not good, English in this page can be odd. Sorry -_-;

This is my proposal for automatic conversion in Korean-language Wikimedia projects (such as the Korean Wikipedia). The planned system would convert Hanja to Hangul, and convert between North and South Korean.

The Korean language has many dialects and more than one writing system, so this proposal is limited to three variants: South Korean in Hangul only (ko-kr), South Korean with Hanja (ko-hanja), and North Korean in Hangul only (ko-kp).

If you have any questions, leave a message at en:User talk:Yes0song or ko:사용자토론:Yes0song.

ko-kr -> ko-hanja

This direction is impossible. Computers cannot understand human language, so they cannot tell which Hanja is meant when several words are written identically in Hangul (homonyms).

ko-hanja -> ko-kr

This direction needs some changes to MediaWiki. MediaWiki currently applies the Unicode normalization algorithm, which replaces Hanja in the CJK Compatibility Ideographs block with the corresponding Hanja in the CJK Unified Ideographs block. In the KS code (unlike the Chinese or Japanese codes), a Hanja character that has several readings is encoded once per reading, even though the character itself is the same. For example, 李 has two readings in Korean, 리 and 이, so it is mapped to two different codes. In Unicode, one representative character is placed in CJK Unified Ideographs and the others are placed in CJK Compatibility Ideographs.

Because Unicode normalization of Hanja erases the reading information carried by these characters, it is an obstacle to an automatic Hanja-to-Hangul conversion system. So I think normalization must be turned off for this system.
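The loss of reading information can be illustrated with a minimal Python sketch that uses the standard `unicodedata` module. The table `READINGS` and the function `to_hangul` are hypothetical names made up for this illustration, not part of MediaWiki; the sketch assumes that U+674E carries the reading 리 and that its compatibility counterpart U+F9E1 carries the reading 이.

```python
import unicodedata

# Hypothetical per-code-point reading table (illustration only).
READINGS = {
    "\u674e": "리",  # 李 in CJK Unified Ideographs
    "\uf9e1": "이",  # 李 in CJK Compatibility Ideographs
}

def to_hangul(text: str) -> str:
    """Replace each Hanja found in READINGS with its Hangul reading."""
    return "".join(READINGS.get(ch, ch) for ch in text)

raw = "\uf9e1"                                  # compatibility 李 as typed
normalized = unicodedata.normalize("NFC", raw)  # becomes U+674E

print(to_hangul(raw))         # 이 (the reading is still known)
print(to_hangul(normalized))  # 리 (the distinction has been erased)
```

Once the text has been normalized, both code points look the same to the converter, so the intended reading can no longer be recovered from the character alone.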

Examples are below (with Unicode normalization off). 樂 has four readings, each with its own code point:

  • 낙 (U+F914: 樂)
  • 락 (U+F95C: 樂)
  • 요 (U+F9BF: 樂)
  • 악 (U+6A02: 樂)

Hanja characters that the KS code does not include need extra information (the reading of each character, the initial sound law (두음법칙, 頭音法則), 사이시옷 (ㅅ), etc.).
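As a rough illustration of the initial sound law, the following Python sketch rewrites only the first syllable of a word. It is a simplified sketch under stated assumptions: the function name `apply_initial_law` is hypothetical, the input is assumed to be a Sino-Korean reading already written in Hangul, and the exceptions of the real orthography (loanwords, dependent nouns, non-initial positions, and so on) are not handled.

```python
HANGUL_BASE = 0xAC00             # code point of 가
N_MEDIALS, N_FINALS = 21, 28     # vowels and final consonants per initial
NIEUN, RIEUL, IEUNG = 2, 5, 11   # initial-consonant indices of ㄴ, ㄹ, ㅇ

# Vowel indices before which the initial consonant becomes ㅇ.
RIEUL_TO_IEUNG = {2, 6, 7, 12, 17, 20}  # ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ
NIEUN_TO_IEUNG = {6, 12, 17, 20}        # ㅕ ㅛ ㅠ ㅣ

def apply_initial_law(word: str) -> str:
    """Apply the South Korean initial sound law to the first syllable only."""
    if not word or not ("가" <= word[0] <= "힣"):
        return word
    # Split the precomposed syllable into initial, medial and final indices.
    index = ord(word[0]) - HANGUL_BASE
    initial, rest = divmod(index, N_MEDIALS * N_FINALS)
    medial, final = divmod(rest, N_FINALS)
    if initial == RIEUL:
        # ㄹ becomes ㅇ before ㅣ and the y-vowels, otherwise ㄴ.
        initial = IEUNG if medial in RIEUL_TO_IEUNG else NIEUN
    elif initial == NIEUN and medial in NIEUN_TO_IEUNG:
        initial = IEUNG
    first = chr(HANGUL_BASE + (initial * N_MEDIALS + medial) * N_FINALS + final)
    return first + word[1:]

print(apply_initial_law("력사"))  # 역사
print(apply_initial_law("로동"))  # 노동
```

A converter for ko-kr would apply this step after looking up the reading of each Hanja; a converter for ko-kp would skip it, since North Korean spelling keeps the original initial consonant.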

Special conversion formats

  1. ko-kr: '한글(漢字)'; ko-hanja: '漢字'
  2. ko-kr: '한글(漢字)'; ko-hanja: '漢字(한글)'
  3. ko-kr: '한글(漢字)'; ko-hanja: '漢字(한글)'

I think these need special tags or templates.
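What such a tag or template would have to compute can be sketched in Python. The function `render_term` and its `keep_hangul` flag are hypothetical names for this illustration only; they do not correspond to any existing MediaWiki tag or template. The sketch covers the two distinct output patterns in the list above.

```python
def render_term(hangul: str, hanja: str, variant: str,
                keep_hangul: bool = False) -> str:
    """Render one Sino-Korean term for the requested variant.

    keep_hangul=False gives the first pattern (ko-hanja shows Hanja only);
    keep_hangul=True gives the second (ko-hanja shows Hanja with Hangul).
    """
    if variant == "ko-kr":
        return f"{hangul}({hanja})"
    if variant == "ko-hanja":
        return f"{hanja}({hangul})" if keep_hangul else hanja
    raise ValueError(f"unsupported variant: {variant}")

print(render_term("한자", "漢字", "ko-kr"))                      # 한자(漢字)
print(render_term("한자", "漢字", "ko-hanja"))                   # 漢字
print(render_term("한자", "漢字", "ko-hanja", keep_hangul=True)) # 漢字(한자)
```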


This section is my proposal for a new MediaWiki feature.

Original wikitext.png

If an article is written in ko-hanja, it is difficult for some editors to edit.

Hanja byeonhwan wikitext.png

This is the same text with the Hanja converted to Hangul. This screen is not the original wikitext, but it is more comfortable for some editors. I propose that MediaWiki support such a function.

ko-kr/ko-hanja ↔ ko-kp

A vocabulary database that maps between North and South Korean needs to be built.
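A very small Python sketch of such a vocabulary table is shown here. The names `SOUTH_TO_NORTH` and `convert_vocabulary` are hypothetical, and the three word pairs are only examples of spelling differences caused by the initial sound law. The sketch uses plain substring replacement, so it assumes that a listed word never appears inside an unrelated longer word; a real system would need word segmentation.

```python
import re

# Example entries only: South Korean spelling -> North Korean spelling.
SOUTH_TO_NORTH = {
    "노동": "로동",
    "여자": "녀자",
    "역사": "력사",
}
NORTH_TO_SOUTH = {north: south for south, north in SOUTH_TO_NORTH.items()}

def convert_vocabulary(text: str, table: dict[str, str]) -> str:
    """Replace every listed word in text, trying longer entries first."""
    if not table:
        return text
    words = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in words))
    return pattern.sub(lambda match: table[match.group(0)], text)

print(convert_vocabulary("역사와 노동", SOUTH_TO_NORTH))  # 력사와 로동
print(convert_vocabulary("력사와 로동", NORTH_TO_SOUTH))  # 역사와 노동
```

Keeping one table and deriving the reverse direction from it works only while the mapping is one-to-one; words with several counterparts would need separate tables for each direction.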