Wikimedia Blog/Drafts/Odia language gets a new Unicode font converter


This was a draft for a blog post that has since been published at https://blog.wikimedia.org/2014/06/20/odia-language-gets-a-new-unicode-font-converter/

Title

Odia language gets a new Unicode font converter

Body

Screenshot mock-up of Akruti Sarala - Unicode Odia converter

It has been over a decade since the Unicode standard became available for the Odia script. Odia is a language spoken by roughly 33 million people in eastern India and is one of India's many official languages. Since Unicode's release, it has been challenging to get more Odia content published in it, because many people who are used to non-Unicode standards are unwilling to make the move. This created a need for a simple converter that could convert text typed in various non-Unicode fonts to Unicode. Such a tool could enrich Wikipedia and other Wikimedia projects by converting previously typed content and making it more widely available. The Odia language recently got such a converter, which makes it possible to convert text typed in two of the fonts most popular among media professionals (AkrutiOriSarala99 and AkrutiOriSarala) into Unicode.

All of the non-Latin scripts came under one umbrella after the rollout of Unicode. Since then, many Unicode-compliant fonts have been designed, and the open source community has put effort into producing good quality fonts. Although contributions to Unicode-compliant portals like Wikipedia increased, the publication and printing industries in India remained stuck with the pre-existing ASCII and ISCII standards (ISCII being an Indian script encoding standard based on ASCII). Modified ASCII fonts that were used to typeset newspapers, books, magazines and other printed documents are still in use in these industries. This created a massive amount of content that is neither searchable nor reproducible because it is not Unicode compliant. The difference is that a Unicode font carries separate glyphs for the Indic script characters alongside the Latin glyphs, whereas in a modified ASCII font the Indic glyphs take the place of the Latin ones. So, when someone does not have a particular ASCII-standard font installed, the typed text appears as gibberish (see Mojibake), whereas text typed using one Unicode font can be read using another Unicode font on a different operating system. Most of the ASCII fonts used for typing Indic languages are proprietary, and many individuals and organizations even use pirated software and fonts. Having massive amounts of content in multiple standards and little content in Unicode created a large gap for many languages, including Odia. Until all of this content is converted to Unicode, making it accessible on different platforms, searchable on the Internet, sharable and reusable, the knowledge base it represents will remain inaccessible. Fortunately, some of the Indic languages have more and more contributors creating Unicode content. Technological work is still needed to convert such non-Unicode content to Unicode and open it up for people to use.
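To make the idea concrete, here is a minimal Python sketch of how a table-driven converter for a modified ASCII font could work. It is not the code of the converter described in this post. The mapping table is invented purely for illustration and does not reflect the real Akruti encoding, and the names `LEGACY_TO_UNICODE` and `convert` are hypothetical. The sketch tries longer legacy sequences first, so that a multi-character sequence standing for a conjunct is not split into its parts.

```python
# Hypothetical legacy-to-Unicode table. These pairs are invented for
# illustration only; they are NOT the real Akruti font encoding.
LEGACY_TO_UNICODE = {
    "K": "\u0B15",               # ORIYA LETTER KA
    "M": "\u0B2E",               # ORIYA LETTER MA
    "a": "\u0B3E",               # ORIYA VOWEL SIGN AA
    "Kx": "\u0B15\u0B4D\u0B37",  # conjunct: KA + VIRAMA + SSA
}


def convert(text, table=LEGACY_TO_UNICODE):
    """Replace legacy sequences with Unicode text, longest match first."""
    # Sort keys by length so "Kx" is tried before "K".
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matched: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert("Ka Kx"))
```

A real converter would need a complete table for each supported font, and would probably also have to reorder some characters, since legacy fonts often store text in visual order rather than in the logical order Unicode expects.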

Akruti Sarala - Unicode Odia converter user manual

Media and publication houses use a few different kinds of fonts, the most popular of which is Akruti. Two other popular standards are LeapOffice and Shreelipi. Akruti software comes bundled with a variety of typefaces and an encoding engine that works well in Adobe Acrobat Creator, the most popular DTP software package. Industry professionals are comfortable using it because of its dependable reputation and seamless printing. The problem of migrating content from other standards to Unicode arose when the Odia Wikimedia community started reaching out to these industry professionals. It turned out that authors, government employees and other professionals were comfortable in at least one of the standards mentioned above. All of these people type using either Modular, a generic and popular keyboard standard, or Inscript, a universal standard. Fortunately, the former is incorporated into MediaWiki's Universal Language Selector (ULS) and the latter is in the process of being added to ULS. Once this is done, many more people could start contributing to Wikipedia.

Content that has been typed in various modified ASCII fonts includes useful encyclopedic material that could help grow Wikisource and Wikiquote. All of it needs to be converted to Unicode. The non-profit group Srujanika first initiated a project to build a converter for two different Akruti fonts, AkrutiOriSarala99 and OR-TT Sarala, the former being outdated and the latter less popular. The Rebati 1 converter, which was built by the Srujanika team, was not being maintained and had become more of an orphan project. Fellow Wikimedian Manoj Sahukar and I used part of the Rebati 1 converter's code and worked on building another converter. The new "Akruti Sarala - Unicode Odia converter" can convert the more popular AkrutiOriSarala font and its predecessor AkrutiOriSarala99, which is still used by some. Odia Wikimedian Mrutyunjaya Kar and journalist Subhransu Panda have helped by reporting broken conjuncts, which makes it possible to fix problems before publishing. Odia authors and journalists have already started using the font, and many of them post regularly in Odia. We look forward to having more authors contribute to Wikipedia by converting their work and wikifying it with help from the community. A growing community can always use more hands.
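As an illustration of how broken conversions might be spotted before publishing, the following Python sketch scans converter output for characters that fall outside the Odia Unicode block (U+0B00 to U+0B7F). This is an assumption about one possible checking approach, not the process the reviewers actually followed. The function name `find_unconverted` and the default set of allowed punctuation are hypothetical.

```python
# Range of the Odia (Oriya) block in Unicode.
ODIA_START, ODIA_END = 0x0B00, 0x0B7F

# Characters accepted outside the Odia block. This default set is an
# assumption for illustration. The danda and double danda (U+0964, U+0965)
# sit in the Devanagari block but are shared by several Indic scripts.
DEFAULT_ALLOWED = set(".,;:!?-()'\"0123456789\u0964\u0965")


def find_unconverted(text, allowed=DEFAULT_ALLOWED):
    """Return (index, character) pairs that look like leftover legacy text."""
    suspects = []
    for i, ch in enumerate(text):
        if ch.isspace() or ch in allowed:
            continue
        if not (ODIA_START <= ord(ch) <= ODIA_END):
            suspects.append((i, ch))
    return suspects


if __name__ == "__main__":
    sample = "\u0B15\u0B3E x\u0964"
    for index, char in find_unconverted(sample):
        print(f"position {index}: unexpected character {char!r}")
```

A check like this only flags characters left unconverted. A conjunct that was converted to the wrong Odia sequence would still need a human reader to catch it.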

Recently, a beta version of a Shreelipi fonts - Unicode converter, based on Odia Wikipedian Shitikantha Dash's initial code, was released. It works with at least 85% accuracy.

Even after getting classical status, the Odia language is not being used vividly on the internet, unlike some other Indian languages. The main reason behind this is that our writing system had not been web-friendly. Most of those in Odisha who have typing skills use the Modular keyboard and Akruti fonts. Akruti is not web-compatible, as we know. There are thousands of articles, literary works and news stories typed in Akruti fonts lying unused (for the internet). But thanks to Subhashish Panigrahi and his associates, who have developed this new font converter, you can convert your Akruti text into Unicode. I have checked it. It's error-free. Now it's easy for us to write articles online (for Wikipedia and other sites).

Yes, we are a late entrant as far as the use of vernacular languages on the internet is concerned. But this converter will help us to go godspeed. Let's make Odia our language of communication and expression.

-- Subhransu Panda, Journalist, author and publisher

Subhashish Panigrahi, Odia Wikipedian and Programme Officer, Centre for Internet and Society
