Community Wishlist Survey 2017/Wikisource/Create new Han Characters with IDS extension for WikiSource

From Meta, a Wikimedia project coordination wiki

Create new Han Characters with IDS extension for WikiSource

  • Problem: Han characters (en:logogram, including en:Chinese Characters, en:Hanja, and en:Kanji) are widely used in East Asia (China, Taiwan, Singapore, the Mandarin-speaking areas of Malaysia, Hong Kong, Japan, Korea, and Vietnam). An enduring, unsolved problem in digital archiving is "missing characters", that is, characters not yet encoded in Unicode. The problem is not limited to ancient books: modern publications also contain unencoded characters (e.g. some authors have created 300-400 unique new characters in certain books). This makes such works difficult to archive on Wikisource. Unicode gradually adds new characters to its charts, but each new Unihan extension takes time to go live. In the past, Wikisource, and even Wikipedia, worked around the problem by using image files to display these characters. But images cannot be indexed or searched, and cannot be exchanged between computer systems.
  • Who would benefit: Mostly the contributors and readers of Chinese Wikisource. However, if this approach becomes available, all Wikimedia projects in languages that use Han characters would benefit (such as Japanese, Vietnamese, Korean, and Chinese dialect editions such as Classical Chinese, Hakka, Wu, and Gan).
    1. Furthermore, Wikipedia (Chinese Wikipedia already uses many missing characters) and Wiktionary would also benefit.
    2. Other writing systems with two-dimensional composite characters, for instance Ancient Egyptian and Maya.
  • Proposed solution: The Unicode Ideographic Description Sequence (IDS) defines how to compose a Han character from components. We would implement an extension for Wikisource that dynamically renders a Han character from an IDS, like: <ids>⿺辶⿴宀⿱珤⿰隹⿰貝招</ids> It generates an image file of the Han character (currently rendered on a temporary server on wmflabs) with the IDS in its metadata. This would solve the missing Han character problem for all C/J/K/V books. The underlying idea is that Han characters are not at the same level as the letters of European alphabets, but at the level of words. Han characters are an open set. They are composed in two dimensions from more basic components, each of which is itself a basic element, like an "affix" in English (English words are composed in one dimension). In academia, component-based Han character composition technology has been developed and adopted for handling ancient Han books. The best-known examples are Academia Sinica's work and the CBETA sutra project. In recent years, open-source IDS renderers have become stable, so we can use the same technology to benefit Wikisource in handling ancient Han books, just as those academic institutions do.
  • More comments:
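
To illustrate the structure that an IDS such as the one in the proposed solution encodes, here is a minimal Python sketch that parses a sequence into a tree of components. It is not the proposed extension's code. The names Node, parse_ids, leaves and find_ids are hypothetical, and the sketch assumes only the twelve original Ideographic Description Characters (U+2FF0 to U+2FFB), where ⿲ and ⿳ take three operands and the others take two. Rendering the tree as a glyph is out of scope.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# Operand counts for the Ideographic Description Characters U+2FF0..U+2FFB.
# Assumption: later additions to the IDC set are ignored in this sketch.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # ⿲ left / middle / right
IDC_ARITY["\u2FF3"] = 3  # ⿳ top / middle / bottom


@dataclass
class Node:
    """One layout operator and the parts it arranges (hypothetical type)."""
    operator: str
    children: List[Union["Node", str]]


Tree = Union[Node, str]


def _parse(ids: str, pos: int) -> Tuple[Tree, int]:
    """Parse one component starting at pos; return it and the next position."""
    if pos >= len(ids):
        raise ValueError("sequence ended before all operands were supplied")
    ch = ids[pos]
    pos += 1
    arity = IDC_ARITY.get(ch)
    if arity is None:
        return ch, pos  # an ordinary character is a leaf component
    children = []
    for _ in range(arity):
        child, pos = _parse(ids, pos)
        children.append(child)
    return Node(ch, children), pos


def parse_ids(ids: str) -> Tree:
    """Parse a whole IDS (prefix notation) into a tree, rejecting leftovers."""
    tree, pos = _parse(ids, 0)
    if pos != len(ids):
        raise ValueError("trailing characters after a complete sequence")
    return tree


def leaves(tree: Tree) -> List[str]:
    """Return the leaf components of a parsed IDS, in reading order."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree.children:
        out.extend(leaves(child))
    return out


# Assumption: the extension's tag is written exactly as <ids>...</ids>.
IDS_TAG = re.compile(r"<ids>(.*?)</ids>", re.S)


def find_ids(wikitext: str) -> List[Tree]:
    """Parse every <ids>...</ids> span found in a piece of wikitext."""
    return [parse_ids(body.strip()) for body in IDS_TAG.findall(wikitext)]
```

For the example above, parse_ids("⿺辶⿴宀⿱珤⿰隹⿰貝招") returns a tree whose root operator is ⿺, and leaves() on that tree returns ['辶', '宀', '珤', '隹', '貝', '招']. Because the sequence itself stays in the page as text, it can still be indexed and searched even if the glyph is displayed as an image.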


  • IMO there's no reason to limit this to Wikisource, as Wiktionary could also benefit a lot from this. NMaia (talk) 00:35, 28 November 2017 (UTC)
  • Question: I support the general need to display unencoded characters. However, personally I think the quality of the generated characters is regretfully a bit substandard. Simply compressing each component together into a block is not aesthetic. Using images instead of web-fonts in this day and age is also suboptimal (even if it is SVG).
    The creator of this extension has probably poured their heart and soul into creating it, but may I suggest some sort of partnership with GlyphWiki instead? It is a website designed for hosting hanzi. Glyphs can be manually created and stored under IDS names, and the glyphs can be used in fonts. GlyphWiki supports generation of webfonts. Suzukaze-c (talk) 03:01, 3 December 2017 (UTC)