Wikimedia Blog/Drafts/Converting from non Unicode (Nudi, Baraha, ...) font encoding to Unicode Kannada


Not published. Ed Erhart (WMF) (talk) 17:27, 24 June 2015 (UTC)

Title ideas

  • Converting from non-Unicode (Nudi, Baraha, ...) font encoding to Unicode Kannada
  • Non-Unicode to Unicode Kannada

Summary

A brief, one-paragraph summary of the post's content, about 20-80 words. On the blog, this will be shown in the chronological list of posts or in the featured post carousel on top, next to a "Read more" link.

  • A tutorial on converting Kannada text from legacy font encodings such as Nudi and Baraha to Unicode Kannada.

Body

Kannada non-Unicode text when opened in Notepad

People have been using computers for typing and printing Kannada text for more than 25 years. Kannada typesetting on computers has been most popular in the world of desktop publishing (DTP); people make use of specialised software packages to lay out the pages.

Even now, many people still use these packages for Kannada DTP work. The text entered into these packages is stored as font glyph codes rather than as character encodings. Non-Unicode TrueType fonts such as Nudi, Baraha, ShreeLipi and Akruti are among the most popular. The system does not recognise these codes as Kannada characters, so text-based operations such as search, replace, sorting, spell-check and text-to-speech are not possible with this kind of text. Using Unicode for all digitisation of Kannada text solves this problem. Unicode has come into wide use for Kannada only recently. Websites such as Facebook, Twitter, Wikipedia and Wikisource require text in Unicode. A large amount of text is still stored in the old non-Unicode font-based encodings, mostly in the form of PageMaker files. This blog post explains how to convert the text in a PageMaker file into Kannada Unicode text.
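
To see why the system does not treat such text as Kannada, it helps to look at the code points involved. The following is a minimal Python sketch. The sample strings and the helper names `char_names` and `is_kannada` are illustrative assumptions; the legacy sample is only an example of the Latin-range characters that show up as gibberish and is not claimed to be an exact Nudi or Baraha sequence.

```python
import unicodedata

def char_names(text):
    """Return the Unicode name of each character in text."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in text]

def is_kannada(ch):
    """True if ch lies in the Unicode Kannada block (U+0C80 to U+0CFF)."""
    return 0x0C80 <= ord(ch) <= 0x0CFF

# Illustrative legacy-style text: Latin code points that a legacy font
# would merely *draw* as Kannada glyphs (assumed sample, not a real mapping).
legacy_sample = "P\u00C0"

# The word "Kannada" in Unicode: KA, NA, VIRAMA, NA, DDA.
unicode_sample = "\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"

print(char_names(legacy_sample))
print(char_names(unicode_sample))
print(all(is_kannada(ch) for ch in unicode_sample))
```

The legacy sample reports Latin letter names, which is all that search, sorting or spell-check software can see; only the Unicode sample reports Kannada letter names.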

Kannada non-Unicode text in PageMaker

The Kannada and Culture Department of the Government of Karnataka has released Unicode-compliant OpenType fonts and Unicode-based software for Kannada under the GPL. These are available for free download from their website (https://www.karnataka.gov.in/kcit/pages/kannadasoftware.aspx). Download and install “Ascii to Unicode Kannada Converter” from this page. This software works on Windows only. Now you are ready to convert the text from a PageMaker file into Unicode.

Open the PageMaker file. Select the Text tool, shown as a big “T”-shaped icon. Click anywhere in the text area. Select the entire text (Ctrl-A) and copy it (Ctrl-C). Now open Notepad and paste the text into it (Ctrl-V). The text will look like gibberish in Notepad. Don’t worry about it. Save the file as a plain text file (.TXT), and remember where you saved it.

Kannada non-Unicode to Unicode converter released by Govt of Karnataka

Now run the “Kannada ASCII Unicode Converter” software. In the first textbox, enter the name of the ASCII file to be converted (the file you just saved from Notepad). In the bottom textbox, enter a filename for the Unicode text file that the software will create. As the encoding to convert from, select the default “GOK (Kuvempu Nudi Baraha)”, or another encoding if your text uses one. Click the button labelled “ಪರಿವರ್ತಿಸಿ” (“Convert”). The software will show the progress of the conversion.
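
The converter's internals are not described here, but the general idea of turning glyph codes into Unicode can be sketched as a table lookup that always tries the longest matching code first. The Python below is only a sketch under that assumption. `TOY_TABLE` is a hypothetical mapping made up for illustration and is NOT the real Nudi, Baraha or GOK table; `convert` is likewise a hypothetical helper name.

```python
# HYPOTHETICAL glyph-code table, for illustration only.
# It is NOT the real Nudi/Baraha/GOK mapping.
TOY_TABLE = {
    "A": "\u0C85",         # KANNADA LETTER A
    "Ka": "\u0C95",        # KANNADA LETTER KA
    "K": "\u0C95\u0CCD",   # KANNADA LETTER KA + KANNADA SIGN VIRAMA
}

def convert(text, table):
    """Replace glyph codes with Unicode strings, longest code first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matches: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

print(convert("KaA K", TOY_TABLE))
```

Trying longer codes first matters because one glyph code can be a prefix of another (here "K" and "Ka"). A real converter probably also has to deal with glyphs whose order differs from Unicode's logical order; this sketch does not attempt that.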

Message from the converter when the conversion is completed

Once the conversion is complete, the software displays a message to say so. If you open the text file it created, you will find the text converted into Unicode. This text can be used on Wikisource, Wikipedia, etc.
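
As an optional sanity check on the output, one could measure how much of the converted text falls inside the Unicode Kannada block (U+0C80 to U+0CFF). This is a rough sketch, not part of the converter: the helper names `kannada_ratio` and `check_file` are hypothetical, and the `utf-8` default is an assumption, since the encoding the converter writes is not stated here.

```python
from pathlib import Path

def kannada_ratio(text):
    """Fraction of non-whitespace characters in the Kannada block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    kannada = sum(1 for ch in chars if 0x0C80 <= ord(ch) <= 0x0CFF)
    return kannada / len(chars)

def check_file(path, encoding="utf-8"):
    """Read a text file (encoding is an assumption) and return its ratio."""
    return kannada_ratio(Path(path).read_text(encoding=encoding))

# Converted text should score close to 1.0; unconverted legacy text near 0.0.
print(kannada_ratio("\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"))
print(kannada_ratio("PA"))
```

Punctuation, digits and any English words lower the ratio, so a value somewhat below 1.0 does not by itself indicate a failed conversion.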

Kannada Unicode text after conversion when opened in Notepad

Pavanaja U B, Program Manager, CIS-A2K

Notes

Ideas for social media messages promoting the published post:

Twitter (@wikimedia/@wikipedia):

Converting from non-Unicode Kannada text into Unicode Kannada

Facebook/Google+

  • ...