User talk:Yes0song/Automatic conversion in Korean language
I'm afraid calling the versions ko-kr and ko-hanja is a bit misleading considering the use of such codes in the new HTML and XML language tags – people might think that ko-hanja is Korean language written exclusively in hanja. How about the following:
- ko-Hang-KR – for ‘South Korean orthography, hangeul only’,
- ko-Hang-KP – for ‘North Korean orthography, hangeul only’ and
- ko-KR (or just ko) – for ‘Korean mixed script (hanja and hangeul)’?
Tagging the current style as "ko-Hang" would make sense, since it's hangeul only, but tagging something as "ko-hanja" (or "ko-Hant") would make less sense, since it's actually not just Hant, but mixed with hangeul. Wikipediatrician 15:20, 2 December 2006 (UTC)
- In ISO 15924, there is the code Kore for Han (Hanja) and Hangul mixed script. --Artoria2e5 (talk) 17:59, 26 May 2016 (UTC)
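To make the tag proposals above easier to compare, here is a minimal sketch of how the subtags of such a tag could be picked apart. It assumes only the basic shape of language tags (a language subtag first, a four-letter script subtag, a two-letter region subtag); `LangTag` and `parse_tag` are hypothetical names for illustration, and this is not a full validator.

```python
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LangTag:
    """Split a tag such as 'ko-Hang-KR' into language, script and region.

    Simplified on purpose: only the first four-letter subtag is taken as
    the script and the first two-letter subtag as the region; anything
    else (variants, extensions, numeric regions) is ignored.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    region = None
    for sub in parts[1:]:
        is_ascii_alpha = sub.isascii() and sub.isalpha()
        if script is None and len(sub) == 4 and is_ascii_alpha:
            script = sub.title()   # conventional casing, e.g. 'Hang'
        elif region is None and len(sub) == 2 and is_ascii_alpha:
            region = sub.upper()   # conventional casing, e.g. 'KR'
    return LangTag(language, script, region)
```

With this sketch, "ko-Hang-KR" yields a script of "Hang" and a region of "KR", and "ko-Kore" yields a script of "Kore" with no region. "ko-hanja" yields neither, because "hanja" is five letters long and so is not shaped like a script or region subtag.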
As Wikipedia:Manual of Style (Korea-related articles)#Spaces between words says, “[w]hile Hangul and mixed script (Hangul and Hanja together) use spaces between words, text written only in Hanja is usually written without spaces. Thus, gosok doro (‘freeway’ or ‘motorway’) is written as 고속 도로 (with a space) in Hangul, but as 高速道路 (without a space) in Hanja.”
Assuming that the default method will be to write articles in mixed script and then allow conversion to hangeul-only, what's the best way to handle spaces? Which of the following options are feasible, and which is the best?
- When editing articles, insert spaces as you would if you were writing in hangeul-only. This gives two possibilities:
  - Readers enjoying their articles in mixed script will have to live with spaces within hanja terms (example: 高速 道路).
  - The system will automatically strip spaces that appear between hanja (example: 高速道路). The problem with this approach is that you would have to find a way to keep spaces between terms and remove only those within single terms. This will mostly affect “shortened” language without all the suffixes that usually make clear where one word ends and the next one begins, but will not be an issue with normal, grammatical article texts.
- When editing articles, spaces within a sequence of hanja may be omitted, and…
  - …the system will insert them automatically when converting to hangeul-only (example: 고속 도로). I think this is nearly impossible.
  - …as the system will be unable to insert spaces correctly on its own, readers choosing hangeul-only display will have to live with the spaces' absence (고속도로, or even monsters such as 국회의원선거권자과반수 “more than one half of voters eligible to vote in elections for members of the National Assembly”).
Wikipediatrician 03:27, 30 December 2006 (UTC)
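The "strip spaces between hanja" option could be sketched roughly as follows. This is only an illustration under assumptions: `strip_hanja_spaces` is a hypothetical name, and "hanja" is approximated by three Unicode blocks (CJK Unified Ideographs, Extension A, and CJK Compatibility Ideographs). It removes every space that has a hanja on both sides, so it shows the drawback mentioned above as well: it cannot tell a space inside a term from a space between two terms.

```python
import re

# Assumption: these three blocks are a good-enough approximation of "hanja".
_HANJA = "\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff"

# One or more spaces with a hanja immediately before and immediately after.
_SPACE_BETWEEN_HANJA = re.compile(rf"(?<=[{_HANJA}]) +(?=[{_HANJA}])")


def strip_hanja_spaces(text: str) -> str:
    """Remove spaces that sit directly between two hanja characters."""
    return _SPACE_BETWEEN_HANJA.sub("", text)
```

Applied to 高速 道路 this gives 高速道路, and hangeul-only text such as 고속 도로 is left alone. In 高速 道路를 利用 the space after 를 survives because a hangeul suffix precedes it, which matches the point that grammatical text is mostly unaffected. A suffix-less sequence such as 國會 議員 選擧 collapses completely into 國會議員選擧, whether or not that was wanted.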
Legal database for the converter
Wouldn't there have to be a free (that is, non-proprietary) converter, along with a free database containing hangeul values for all hanja? Note that the legality of the hanja database that is Wiktionary may be challenged sooner or later; see wikt:Wiktionary:Beer_parlour_archive/October_06#Han_characters_2. Wikipediatrician 23:25, 9 January 2007 (UTC)
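For a sense of what such a database would need to supply, here is a minimal sketch of a per-character lookup. The four-entry table `HANJA_READINGS` and the function `to_hangul` are hypothetical and only for illustration; a real converter would need a complete, freely licensed table, and a plain per-character lookup like this one ignores hanja with more than one reading and word-initial sound changes.

```python
# Hypothetical toy table; a real one would come from a free database.
HANJA_READINGS = {
    "高": "고",
    "速": "속",
    "道": "도",
    "路": "로",
}


def to_hangul(text: str, readings: dict = HANJA_READINGS) -> str:
    """Replace each hanja found in the table with its hangeul reading.

    Characters not in the table (hangeul, spaces, punctuation, unknown
    hanja) are passed through unchanged.
    """
    return "".join(readings.get(ch, ch) for ch in text)
```

With this table, 高速道路 becomes 고속도로, and any spaces typed by the editor are carried over as they are, so 高速 道路 becomes 고속 도로.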