Promoting the South Asian languages projects

From Meta, a Wikimedia project coordination wiki

The Indian languages projects have very few articles relative to the number of people who speak these languages. There are several reasons for this. This page and the mailing list WikimediaIndia-l are for coordinating solutions to these problems.

Note: Please sign your comments. (To sign, add --~~~~ at the end of your comment, and please make sure you are logged in.)

The problems

  • The majority of computers in India do not have support for Unicode and/or Indian languages, including fonts.
    • Many still have old versions of MS Windows (9x), which have at best very limited support for Unicode.
    • Even recent versions of Windows are not localized in Indian languages.
  • Linux has come a long way in supporting Unicode, so Linux live CDs are another option to consider. But Linux has another problem: hardware detection and winmodems. Since getting online is essential for Wikipedia, and I think most of our target audience is still on dial-up, we need reliable modem support.
    • Many Linux deployments are either old or do not include Hindi fonts. Those that do include Hindi fonts often lack good ones that display the text correctly.
Things have been improving on the internet side. Broadband has become cheaper and people are getting to surf the net through Linux as well. Some LiveCDs have been released too. --H P Nadig 7 July 2005 07:46 (UTC)
I use the Ubuntu Linux distribution Jaunty Jackalope (9.04, I think), and from what I understand dial-up support and hardware detection are no longer a problem. Linux really does seem to be a good alternative. -- Akaash Mukherjee oct 21 2009
  • People whose primary language of communication is an Indian language often do not have access to computers and the Internet.
    • People who have easy computer and Internet access mainly use English as a working language.
  • There is an almost total absence of keyboards for indian languages .. using the universally accepted QWERTY keyboard by using overlays of indian characters clashes strongly with the simple logic of the indian alphabets .. all indian alphabets share a great deal of similarity of sound .. but are very different in appearance .. so a h/w which follows the logic of the language is needed .. it will meet the needs of many indian languages. - pk sharma.
    • There is a strong need to change the approach as it presently exists .. and totally stop forcing use of the qwerty keyboard.
    • There is a strong need to develop configurable virtual keyboards which can be used on touch screen pc's so that each person/community can configure it to his/their convenience. (GOK does solve this problem.)
    • There is a strong need for touch type of input devices which can work with the myriad pc's already in use
    • Then there are the problems of approaching the problem inherent in 'indianizing' an english language based system .. be it h/w or s/w - pk sharma
    • The many ways of spelling the same word create chaos when looking for a consistent basis for a dictionary .. which is so important for making spelling checkers
      • (FYI: als:, ksh:, and to some extent nds: share the problem of spelling variants. ksh: Wikipedians currently create dozens of redirects per article. ksh:User:Purodha suggests an extended "search" that automatically scans for possible variants. He would be willing to code it, but would also need input as to which requirements to meet.)
        • That would be great. For Hindi, the only one I know enough about, a large portion of the problem could be solved by catching a few of the most common discrepancies. 1. Using the anusvara or the half nasal consonant: लम्बा vs लंबा. 2. Chandrabindu vs anusvara: हाँ vs हां. 3. In polite imperatives, whether the ending is spelled with the य or not: दीजिए vs दीजिये. 4. Using the nukta or not: ज़ vs ज. There are probably others, but people can work them out together. Those four cover some of the most common things that multiply the number of possible variants. For form's sake both your comment and mine should be moved to the talk page, as they delve a bit too much into implementation. - Taxman 00:21, 21 June 2006 (UTC)
    • The logical translation of webpages, dialogue boxes, panels and menus makes things weird for newer Indian users
  • No good speech to text software is yet available. A lot of Indian languages are phonetic, and good speech to text software could make a world of difference.
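
The four Hindi spelling discrepancies listed above (half nasal consonant vs anusvara, chandrabindu vs anusvara, the य in polite imperatives, and the nukta) could be folded into a single search key, so that variant spellings find the same article without manual redirects. The following is only a minimal sketch under stated assumptions: the function name `search_key` is hypothetical, the rules cover just those four cases, and the half-nasal rule is restricted to a nasal followed by a stop of its own class. It is not an existing MediaWiki feature.

```python
import re
import unicodedata

NUKTA = "\u093c"
CHANDRABINDU = "\u0901"
ANUSVARA = "\u0902"
VIRAMA = "\u094d"

# Assumption: a half nasal is folded only before a stop of its own class.
_HOMORGANIC = {
    "ङ": "कखगघ",
    "ञ": "चछजझ",
    "ण": "टठडढ",
    "न": "तथदध",
    "म": "पफबभ",
}
_HALF_NASAL = re.compile(
    "|".join(
        f"{nasal}{VIRAMA}(?=[{stops}])" for nasal, stops in _HOMORGANIC.items()
    )
)


def search_key(word: str) -> str:
    """Fold common Hindi spelling variants into one comparable key."""
    # NFD splits precomposed nukta letters into base letter + nukta.
    text = unicodedata.normalize("NFD", word)
    # 4. nukta or not
    text = text.replace(NUKTA, "")
    # 2. chandrabindu vs anusvara
    text = text.replace(CHANDRABINDU, ANUSVARA)
    # 1. half nasal consonant vs anusvara
    text = _HALF_NASAL.sub(ANUSVARA, text)
    # 3. polite imperative ending: i-matra + ya + e-matra -> i-matra + e
    text = text.replace("\u093f\u092f\u0947", "\u093f\u090f")
    return text
```

With these rules, `search_key("लम्बा") == search_key("लंबा")` and `search_key("दीजिये") == search_key("दीजिए")` both hold. The key is meant for matching only; it should never replace the spelling an editor actually typed.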

Measures needed to be taken

  1. Brainstorm ways of promoting native language participation, such as coordinating with existing native language blogs, newsgroups, mailing lists, etc. Submit articles or links to newspaper columnists (online or offline). Work through various post secondary educational institutions and professional organizations to promote contributions. i.e. solving the technical impediments is a start, but more participants will solve the problem too.
  2. Writing tutorials for Unicode and Indian languages support (fonts, keyboard, etc.)
  3. Promoting localized versions of Linux when the hardware would support it
  4. Resolving existing bugs on free software (Firefox, Mozilla, etc.)
  5. Lobbying for compliance with Unicode, localized software, etc.
  6. Develop good speech to text for Indian languages. A lot of Indian languages are phonetic.


Requesting help in designing a barnstar

To all Wikipedians working in Indian languages,
Further promotion of Indic-language Wikipedias needs mass-media publicity (print, radio, television) and outreach in academic institutions.
A degree of field support from network engineers in India, for presentations and training in the use of Indian languages and Wikipedia, would also be an added boon.
May I call on all Indic-language Wikipedia users to join hands to design special barnstars, or to suggest an existing one, that can be awarded to all those who support our noble cause with actual field support among Indian educational institutions and Indian mass media, or with actual field 'network' help to users.
I believe we should also award barnstars to users who support the cause through online publicity such as blogs, groups, etc.
Please share any good developments in this regard with all Indic-language Wikipedias at विकिपिडिया: देवनागरी टेम्प्लेट परियोजना (Wikipedia: Devanagari Template Project)

(for 1.) I reckon there're already tutorials on language wikipedias, such as this one.
(for 2.) I guess a few others here and I are full-time Linux users, but that is not the typical case. Most users are on Windows, so language packs for Linux are not that big an issue in the present context.
(for 3.) The latest ubuntu pkg files for firefox (the same works for debian) and fedora are pango enabled builds. They support indic by default (You might just have to add MOZ_PANGO_ENABLE=1 to /usr/bin/firefox). On windows, though, there aren't any problems at all with indic rendering.

It's not a problem on WinXP, but it *is* a problem on Win9x. I tried for 40 minutes to get Gujarati working on Win 98, but the font rendering is far from perfect in Firefox, and I couldn't get IE to work at all. --Spundun
I didn't take Win9x into account when mentioning Windows, since M$ itself seems to have stopped supporting those old versions. I was under the impression that Unicode support is non-existent on Win9x. Is there Unicode support on Win9x? -H P Nadig 7 July 2005 06:38 (UTC)

(for 4.) that is something we can do, (and already most people have been doing... see Indlinux) . The tamil localised CDs are being distributed for free.
--H P Nadig 16:31, 2 Jun 2005 (UTC)

I think even mundane speech to text software for Indian languages would make a lot of difference. I don't know of any major work being done in this area.

--Anshul 00:28, 27 Jun 2005 (UTC)

Here are a few things that I would like to discuss

  • Regarding Indic scripts, People, take a look at this [1].
  1. Is it possible to create a button in the editing menu ( the one with B I Ab for Internal link etc) which can be activated by clicking on the button so that there is a conversion of whatever is written in Roman text to Devanagari or other Indian scripts text automatically or into another box as in the webpage under consideration?
  2. Is it possible to enable such auto-conversion in all the places of a certain wikipedia where a user can type such as the search box, create article box etc so that the user can easily search, navigate and create article in respective wikipedia in the font being used without having to install or understand a thing. (I think most of the users can type in Roman to generate Indian text as I have seen a lot of people type in Roman script while chatting in Indian languages.)
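
To make the Roman-to-Devanagari conversion asked about above more concrete, here is a minimal sketch of how such a converter could work. The romanization scheme (a small ITRANS-like subset), the tables and the function name `roman_to_devanagari` are all assumptions made for illustration; a real input method would need a fuller scheme and a way to resolve ambiguities such as "t" + "h" versus "th". A consonant takes its inherent "a" silently, any other following vowel becomes a vowel sign, and a consonant with no vowel after it gets a virama.

```python
VIRAMA = "\u094d"

# Hypothetical romanization subset; two-letter keys are tried first.
CONSONANTS = {
    "kh": "ख", "gh": "घ", "ch": "च", "th": "थ", "dh": "ध",
    "ph": "फ", "bh": "भ", "sh": "श",
    "k": "क", "g": "ग", "j": "ज", "t": "त", "d": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "s": "स", "h": "ह",
}

# vowel -> (independent letter, dependent vowel sign after a consonant)
VOWELS = {
    "aa": ("आ", "\u093e"), "ii": ("ई", "\u0940"), "uu": ("ऊ", "\u0942"),
    "ai": ("ऐ", "\u0948"), "au": ("औ", "\u094c"),
    "a": ("अ", ""), "i": ("इ", "\u093f"), "u": ("उ", "\u0941"),
    "e": ("ए", "\u0947"), "o": ("ओ", "\u094b"),
}


def _match(text, pos, table):
    """Return the longest key of `table` found at `pos`, or None."""
    for length in (2, 1):
        chunk = text[pos:pos + length]
        if chunk in table:
            return chunk
    return None


def roman_to_devanagari(text: str) -> str:
    text = text.lower()
    out = []
    pos = 0
    while pos < len(text):
        cons = _match(text, pos, CONSONANTS)
        if cons:
            out.append(CONSONANTS[cons])
            pos += len(cons)
            vowel = _match(text, pos, VOWELS)
            if vowel:
                # Inherent "a" adds nothing; other vowels add a sign.
                out.append(VOWELS[vowel][1])
                pos += len(vowel)
            else:
                # No vowel follows: suppress the inherent vowel.
                out.append(VIRAMA)
            continue
        vowel = _match(text, pos, VOWELS)
        if vowel:
            out.append(VOWELS[vowel][0])
            pos += len(vowel)
            continue
        # Spaces, digits and anything unknown pass through unchanged.
        out.append(text[pos])
        pos += 1
    return "".join(out)
```

Under this scheme `roman_to_devanagari("namaste")` gives नमस्ते and `roman_to_devanagari("hindii")` gives हिन्दी. Note that a word-final consonant also receives a virama here, so users would have to type the final "a" explicitly; whether that suits casual chat-style typing is a design choice for the community.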


  1. Is it possible to create and enable a system of interconversion of scripts in all the Indian wikipedia so that those who can not read one script but can understand the language contribute to it. Eg- I understand a bit of Urdu and very little Bengali but I can not read the respective scripts. Had they been available in other South Asian script like Devnagari, Ranjana or Prachalit, I would have been able to read and contribute to those wikipediae. I think such interconversion of scripts would help a lot not just in editing but also in understanding and celebrating the harmonious commonalities of our languages.
    • This automatic conversion is very necessary for the development of the Romani Wikipedia, as it currently uses both Devanagari and Latin. Some Wikipedias, such as Serbian and Chinese, use automatic conversion in only one direction, from the script considered the main one to the other, generating a temporary page. This is only a partial solution for rmy:wp, which would need a technology that allows a single article to be edited in either script and updates the version in the other script accordingly (possibly recording in the history page which script each edit was made in). I tried to implement this partial solution with help from users of the Serbian wiki, but everything got stuck because we could not find a monospaced Devanagari font. I presume there must be a solution for this; personally, I have not yet found the time to work it out. Desiphral 15:19, 5 November 2006 (UTC)
      • Yes, we should consider this too. Now the scope of this kind of wiki improvement is extended to other South Asian languages (and probably others if interested), so we should explore as many solutions as possible and then see which is the most convenient and feasible. Desiphral 19:42, 6 November 2006 (UTC)
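
For conversion between Indic scripts, one possible starting point is that Unicode lays out the Devanagari, Bengali, Gurmukhi and Gujarati blocks in parallel, so corresponding letters usually sit at the same offset within each 128-code-point block. The sketch below relies only on that layout; the function name `convert_script` and the choice of four scripts are assumptions for illustration. It cannot handle Perso-Arabic Urdu, Latin Romani, or spelling conventions that differ between languages, all of which would need real per-language rules.

```python
import unicodedata

# Start of each Unicode block; each block is 0x80 code points long.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
}
BLOCK_SIZE = 0x80


def convert_script(text: str, source: str, target: str) -> str:
    """Shift characters from the source block to the target block."""
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # Keep the original character when the target script has
            # no assigned code point at the same position.
            if unicodedata.name(candidate, ""):
                ch = candidate
        out.append(ch)
    return "".join(out)
```

For example, `convert_script("कमल", "devanagari", "bengali")` returns কমল, and converting that result back with the arguments swapped restores the Devanagari. Letters with no counterpart at the same position, such as Devanagari व in the Bengali block, are left unchanged rather than guessed.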

  • Is it possible to create a bot or some code that allows media and templates to be shared among the South Asian wikipediae? I think this would let closely related languages make minor modifications to a template from a similar wikipedia and use it instead of creating a template from scratch. Sharing media, especially media about South Asia, helps to create a better base of users, and using the same media in two versions increases uniformity.
  • One thing I would like to bring to notice is that most of South Asia uses dial-up or low-speed connections, and most users start using Wikipedia from the Main Page. So I think it would be prudent not to place animations, large and unnecessary images, or large numbers of links and featured boxes on the front page. There is a fair chance that people will not be able to see the page at all because of slow data transfer, and even if they can, they might not enjoy waiting a long time for it to load. So let's not put unnecessary and huge "objects" on the vital pages (pages with a large number of views), and especially not on the main page.
  • I think instead of listing all the main headings of the encyclopedia in the main page(in a box), a browserbar and an alphabetic index and a link to page with such a box can shorten the front page with minimal damage in navigation.

--Eukesh 17:32, 27 October 2006 (UTC)

Promotion via Sponsorship


  • Here are some propositions:
    • The articles should be written in Indian languages from scratch or translated (probably from the English Wikipedia).
    • The articles should be encoded in Unicode.
    • The articles should be released under the GFDL licence (free to edit and copy).
    • The articles should be more than 500 words.
this could perhaps be relaxed to "articles should be at least a monograph" .. writing 500 words by non-expert touch typists for the indian languages could be intimidating -- pk sharma
true. Typing Indian Languages isn't like typing English :) --H P Nadig 7 July 2005 07:15 (UTC)

This list should be completed with specific Indian topics (Indian states and cities, personalities, events, etc.).

Check Promoting_the_Indian_languages_projects/List_of_articles_that_Indian_languages_should_have. Add your suggestions to the page. --H P Nadig 7 July 2005 07:15 (UTC)
  • A proposition of a list of Indian subjects is welcome.
addition to the subjects is nice, but some few could be incharge of creating order out of chaos .. maybe following the categorisation of the english wikipedia would be right .. conformity could be ensured and others could also read the popular 'english' language articles in the same 'groupings' -- pk sharma
  • How many articles in each languages do we sponsor ? 1,000 ? 5,000 ?
why limit ? keep it open --pk sharma
  • Which languages would you be working on ?
  • For each language, we will need at least two persons to review the quality of the articles (teachers, writers, etc.). Could you provide the name and quality of these persons, and the name and grade of the students ?


Add your suggestions here (Order: Name, Languages interested to participate in, role - C : Contributor, R - Reviewer):

  • add pkATsharmas.com as a free volunteer for hindi .. born 1952, b.com(hons) from St.Xaviers, tired businessman, looking for better things to do ;-D .. no firing please, constructive suggestions welcome -- pk sharma
  • Sarai,(Hindi - C/R, Urdu - C/R )

[orgn added by viyyer]

  • Kapil Bhatia ,Hindi-C
  • Eukesh, (Nepal Bhasa -C/R, Nepali -C/R, Hindi-C/R, Sanskrit-C/R, Marathi-C, Bengali-C, Bhojpuri-C, Pali-C)


The number of speakers is for native speakers. Statistics from List of Wikipedias.
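
The gap between article counts and speaker populations can be made concrete as articles per million native speakers. The sketch below uses only figures dated 24-01-2007 from the per-language lists on this page (as, gu, ml, mr, si); the helper names are hypothetical, and entries with later dates are left out because they are not comparable.

```python
# language code -> (native speakers in millions, articles on 24-01-2007),
# copied from the per-language lists on this page.
FIGURES = {
    "as": (20.0, 31),
    "gu": (46.0, 270),
    "ml": (35.0, 2020),
    "mr": (70.0, 7338),
    "si": (14.0, 159),
}


def articles_per_million(speakers_millions: float, articles: int) -> float:
    """Articles available per million native speakers."""
    return articles / speakers_millions


def ranked(figures):
    """Language codes sorted from best to worst covered."""
    return sorted(
        figures,
        key=lambda code: articles_per_million(*figures[code]),
        reverse=True,
    )


if __name__ == "__main__":
    for code in ranked(FIGURES):
        ratio = articles_per_million(*FIGURES[code])
        print(f"{code}: {ratio:.2f} articles per million speakers")
```

On these figures even the best-covered language in the sample, mr, has only about 105 articles per million speakers, and as has fewer than 2.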


  • as:
  • Number of speakers: 20 M
  • 31 articles (24-01-2007)
  • People willing to review articles:
  • People willing to write articles:


Bishnupriya Manipuri

  • bpy:
  • Number of speakers: 450,000
  • 11,813 articles (24-01-2007)
  • People willing to review articles:1
  • People willing to write articles:1
  • The en:Bishnupriya Manipuri language is spoken in eastern India and the eastern part of Bangladesh. Most of its speakers live in rural areas with no internet connection, so this wiki lacks editors. The language is written mostly in the Bengali/Assamese script, and to a lesser extent in Devanagari. Please visit my talk page to advise me. Usingha 01:14, 10 November 2006 (UTC)
  • Usingha aka: Uttam Singha/উত্তম সিংহ/उत्तम सिंह My BPY wiki Talk or make a BPY wiki comment


  • gu:
  • Number of speakers: 46 M
  • One contact with a Gujarati writer
  • 270 articles (24-01-2007)
  • People willing to review articles:
  • People willing to write articles:
  • For details, see /Gujarati.


  • hi:
  • Number of speakers: 501 M
  • 6759 articles (04-02-2007)
  • Two contacts from the JN University, New Delhi
  • One contact from IGNOU,New Delhi
  • People willing to review articles: Rajeevmass (rajeevmass@gmail.com), Rajeev Tiwari (rajeevkumartiwari@yahoo.com), Arun kumar (arunsias@gmail.com)
  • People willing to write articles: Rajeevmass (rajeevmass@gmail.com), Arun kumar (arunsias@gmail.com)
  • Anybody who wants to enrich Hindi, the language of 500 million people, and would like to discuss articles for that purpose can contact Arun kumar (arunsias@gmail.com)
  • See /Hindi for details.


  • kn:
  • Number of speakers: 55 M
  • 16,333 articles (24-08-2015)
  • People willing to review articles:
  • People willing to write articles:


  • ks:
  • Number of speakers: 6.5 M
  • 350 articles (24-01-2007)
  • People willing to review articles: 0
  • People willing to write articles: 0


  • I would like to help in some way, by convincing people to contribute.
  • My contributions are mainly to the English wikipedia, as I'm more familiar there.
  • Am based in Goa.
  • Number of speakers: about 1.7 to 5.3 million (depending on whom you ask).
  • Written in five scripts: Devanagari, Roman script, Kannada, Malayalam and Perso-Arabic.
  • It could be easy to get started with Roman for sure, and possibly Devanagari/Kannada scripts.
  • I can talk to my friends who are doing a lot of writing in this language.
  • But Konkani doesn't seem to be listed currently; is that because no articles exist? If so, how can it get listed? 16:07, 26 August 2006 (UTC) fredericknoronha at gmail.com -- 16:38, 26 August 2006 (UTC)


  • ml:
  • Number of speakers: 35 M
  • 2,020 articles (24-01-2007)
  • People willing to review articles:
  • People willing to write articles:

മലയാളം (Malayalam; "Malay" means mountain, "Aalam" means people) is the language spoken by the people who live in Kerala, the land between the Sahya mountains and the Arabian Sea. Initially it was strongly influenced by Tamil and Sanskrit. Later, Thunchathu Ramanujan tried to purify the language, and he is hence considered the Father of Malayalam. People who speak Malayalam are called Malayalees.

Recent Developments

HelpWiki Group - A group of Malayalam enthusiasts has raised some funds and is planning to give out bounties to Wikipedia users. The contest is aimed at school and college students of Kerala, but is not limited to them. HelpWiki currently has a budget of Rs. 50,000, which will be awarded as bounties to participants (the best three will get Rs. 5,000 each). For more details and discussions see http://groups.google.com/group/helpwiki, http://mlwiki.blogspot.com/ or http://helpwiki.blogspot.com/


  • mr:
  • Number of speakers: 70 M
  • 7,338 articles (24-01-2007)
  • People willing to review articles:1
  • People willing to write articles:1

Nepal Bhasa

  • new:
  • Number of speakers: 1 M
  • 1,937 articles (24-01-2007) (most of the articles are primitive, blank or stubs)
  • People willing to review articles:1
  • People willing to write articles:1
  • Nepal Bhasa is chiefly used in Nepal and to some extent in parts of India such as Sikkim and Darjeeling. The language is similar to Indian languages and is the most Indianized Sino-Tibetan language. Also, the language is written in Devanagari and has problems similar to those of other South Asian languages. So I think it's better to list it here as well. --Eukesh 16:08, 27 October 2006 (UTC)


  • ne:
  • Number of speakers: >22 M
  • 409 articles (24-01-2007)
  • People willing to review articles:1
  • People willing to write articles:1


  • or:
  • Number of speakers: 33 M (2007)
  • 12580 articles (03-05-2017)
  • People willing to review articles:
  • People willing to write articles:


  • pa:
  • Number of speakers: 88 M (27 + 61)
  • 129 articles (10-07-2007)
  • People willing to review articles:
  • People willing to write articles:


  • rmy:
  • Number of speakers: 5-6 M (out of 12-16 M Romani people, as some castes or individuals have lost its use)
  • 194 articles (24-01-2007)
  • People willing to review articles:1
  • People willing to write articles:1

Romani is a Central Indo-Aryan language spoken in Europe (mostly Southeastern and Central), America, Australia, Singapore and probably other areas. Until recently it was not a written language. There are contemporary efforts for standardisation in Devanagari and Latin writing systems (see Romani writing systems). Desiphral 15:19, 5 November 2006 (UTC)


  • si:
  • Number of speakers: 14M
  • 159 articles (24-01-2007)
  • People willing to review articles:
  • People willing to write articles:


  • ta:
  • Number of speakers: 74 M
  • 66,021 articles (06-01-2015)
  • People willing to review articles:
  • People willing to write articles:


  • te:
  • Number of speakers: 80 M
  • 60,207 articles (06-01-2015)
  • People willing to review articles:
  • People willing to write articles:


  • ur:
  • Number of speakers: 70 M
  • 64,083 articles (06-01-2015)
  • People willing to review articles:
  • People willing to write articles:
      I would very much like to participate in the project.

My name is Mohd Tanveer. I hold a doctorate in Urdu language; my PhD topic at JNU, New Delhi, India was "An Analysis of the Literary Contributions of Eminent Personalities During 1857". My mother tongue is Urdu. I have developed a character set for Urdu that is based on ASCII code. Please get in touch and tell me how I should start participating. My email is mtanveer.w@gmail.com. Thanks.

     Dr. Mohd Tanveer
     JNU New Delhi

Sanskrit Wikipedia administrator nomination

Respected members, please express your opinion on the administrator nomination. I thank you. Mahitgar 16:15, 19 November 2008 (UTC)

Other related languages

In alphabetic order:

  • Bhojpuri
  • Pali
  • Sanskrit: 200,000 (second language speakers)


Project India centric Localisation of IPA