
CIS-A2K/Indic Languages/Statistics/2011 Annual Update

From Meta, a Wikimedia project coordination wiki

Update as of 2012 February 29: To take seasonal variations into account, I have included data for December 2011. The sections have been revised to match the updated statistics. The community statistics show little variation, but there are variations in the number of readers.

I have compiled the statistical update of the Indic language Wikipedias for the year 2011. In this report, my aim is to provide an analysis of, and my perspectives on, the health of the various Indic language communities and the state of the various Indic language Wikipedias for 2011. (The period of analysis is editor contributions between 2011 January 1 and 2011 December 31.) (Read 2010 report here). As always, much of the data for this report and analysis is based on the statistical data published at http://stats.wikimedia.org. Thanks to Erik Zachte for compiling all this information. I must also point out that this annual update contains a number of insights that are derived not from this data but directly from community members who have shared a very real-world picture.

2011 has been a very interesting year for Indic Wikipedias with the results of community building bearing fruit and some communities emerging as shining stars.

Here is my executive summary after analyzing the data for 2011:

  1. Every Indic Wikipedia community that has focused on community building has done well. Progress is slow but steady and sustainable.
  2. Doing outreach is not enough. Communities that have provided adequate support systems for newbies are beginning to show early results.
  3. Projects where the emphasis is on article count are in trouble. Further, the overuse of bots for article creation has weakened the communities of a few languages.
  4. Readership of Indic projects continues to increase and it makes our effort on community building not just central but now urgent as well.

There is so much potential – but that also means so much work!

Starting with this report, I would like to take a slightly different approach from how we have looked at these figures in the past. In my view, community is central. Community will give us content, which will drive readership. Therefore, I would like to report in the following sequence.

  • Community
  • Content
  • Readership

This is not merely a structuring nuance. It reflects a very profound conviction that we should all focus only on community building. Content and readership will inevitably follow.



Community is the backbone of the Wikimedia movement. It is important that each language wiki community gives adequate importance to community building in order to achieve the goals of the Wikimedia movement. The following table gives information on three important parameters about the community.

  • Number of newly registered users who converted to wikipedians (new Wikipedians)
  • Number of users who had at least 5 edits a month (active Wikipedians)
  • Number of users who had 100 or more edits a month (highly active Wikipedians)
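
The two activity thresholds above can be expressed as a small counting routine. The following Python sketch is only an illustration: the function name `community_counts`, the constant names, and the sample edit counts are hypothetical and are not taken from stats.wikimedia.org. It assumes one edit total per user for a single month and, following the definitions above, counts a user with 100 or more edits as an active Wikipedian as well.

```python
# Thresholds taken from the definitions in this report.
ACTIVE_MIN_EDITS = 5          # at least 5 edits in the month
HIGHLY_ACTIVE_MIN_EDITS = 100  # 100 or more edits in the month


def community_counts(monthly_edits: dict[str, int]) -> dict[str, int]:
    """Count active and highly active users for one month.

    `monthly_edits` maps a user name to that user's edit count for the month.
    A highly active user also satisfies the "active" threshold, so the
    "active" count includes the "highly_active" count.
    """
    active = sum(1 for n in monthly_edits.values() if n >= ACTIVE_MIN_EDITS)
    highly_active = sum(
        1 for n in monthly_edits.values() if n >= HIGHLY_ACTIVE_MIN_EDITS
    )
    return {"active": active, "highly_active": highly_active}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = {"user_a": 3, "user_b": 5, "user_c": 100, "user_d": 250}
    print(community_counts(sample))
```
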
Number of community members. The superscript numbers in each column mark the top three Wikipedia communities in that category.

Some of the important observations from this table are:

  • Most of the major Indic languages have speakers numbering in crores. However, it is unfortunate to note that even the most active Indic wiki communities, Tamil and Malayalam, have only about 100 active users each. (Tamil and Malayalam are the 5th and 11th most spoken languages of India respectively.) This points to a huge disconnect between the number of speakers and the number of active users for all Indic language wikis.
  • The shining stars of 2011 are Assamese, Sanskrit, and Odia among the small Indic wiki communities, Marathi among the medium-sized Indic wiki communities, and Tamil among the larger Indic wiki communities.
  • My personal view is that Assamese is the biggest success story of 2011 for a new community. Their systematic efforts have resulted in a high conversion rate in attracting newbies. (I have noted that in the recent outreach sessions in Assam, ~10% of attendees ended up becoming wiki editors. This is not only because of strong outreach but also because of staying in touch and providing them the "blanket" of warmth of a community.)
  • Marathi, Tamil, and Malayalam are doing well in converting new users to wikipedians.
  • Marathi’s growth in converting newbies to Wikipedians is laudable especially when considering the fact that one year ago, the story was so different. Congrats to Marathi community for taking care of newbies and thereby building the community.
  • Tamil leads the way among all Indic communities on community building. The Tamil community has held organic growth (that is, no bots or translation tools for article creation) as a core belief, and the results are fantastic. There are also many projects inside the wiki (for example, the photo contest). I would encourage the Tamil community to continue down this route and to encourage more community-wide collaboration, so that a wider cross-section of editors feel they own the projects and more Tamil speakers join the project.
  • Assamese and Odia have driven growth through outreach and WikiProjects, and this is wonderful as these are the most sustainable and effective channels. As you might know, one year ago there was no wiki community for either wiki.
  • Odia needs to focus on getting greater heterogeneity among its editors (in age, sex, and profession). Right now, the profile is largely young people who knew each other before joining the wiki. Odia wiki needs more diversity. This applies to many other Indic wiki communities where the current active users belong to a particular age group.
  • Sanskrit is a great story of how a partnership can result in new community members. But the challenge is how to embed this partnership seamlessly into the Sanskrit Wikipedia community. For instance, it is extremely important that Sanskrit Wikipedians start documenting on Sanskrit Wikipedia all the outreach programs as well as all work on Sanskrit wiki, and start having more on-wiki discussions. All of these are essential to building an integrated wiki community.
  • Malayalam has the highest number of active editors, but over the last year the growth in the number of active editors has not been encouraging. Even though the community has run many outreach programs, the conversion rate is very low. Malayalam needs more impactful outreach sessions: the number of outreach sessions happening now is decent, but the results are not.
  • Telugu and Kannada have shown declines in active community members. It is important that we start communicating more frequently and regularly as a community, start collaborating on WikiProjects with other editors (even if there are only 2 editors in a WikiProject), and get newbies to join (by conducting outreach and providing hand-holding after these sessions). I would add that Telugu needs to get more young editors. I am happy to note that both language communities have identified the importance of increasing the number of active users and have already started working towards it. I am hopeful that the situation can change in the next few months with concerted community efforts.
  • Urdu and Nepali must work across political borders to redress the issue of declining editors, which is alarming given the small community sizes.


Number of articles. The superscript numbers in each column mark the top three Wikipedias in that category.

The number of articles is an important parameter, but it has misguided some wiki communities. Hindi is the biggest Indic language Wikipedia in terms of number of articles. Hindi Wikipedia crossed the 1 lakh article milestone on 2011 August 30. Odia and Assamese showed a higher percentage growth in articles because, until 2011, both these wikis were inactive and had fewer than 500 articles. Among active communities, the growth in the number of articles on Tamil and Sanskrit Wikipedia is impressive, since both communities insist on organic growth of articles.

  • Considering the number of articles and the number of active editors, by far the greatest successes on content are Tamil and Malayalam. The Tamil community has consistently driven new articles. I am sure the community is also giving importance to enhancing quality. Malayalam has focused on article quality, and its edits per article show a consistently high level.
  • Edits per article for Bengali are high. Bengali needs more active editors, and more articles as a natural outcome of that.
  • Newari and Bishnupriya Manipuri are in trouble in terms of number of articles (and number of active users also) because of the use of bots (in the past) to create articles and the lack of emphasis on community building and collaboration to improve articles. Similarly, Hindi, though it crossed the 1 lakh article count (which is great), has a big issue of not having an adequate community size to manage the volume and nature of tasks created by bots.
  • Assamese and Odia showed dramatic growth in the number of articles. Both communities need to strengthen their community, which will increase the number and quality of articles.
  • Inspiringly, Sanskrit with just 50,000 speakers now has >7,000 articles, which is probably one of the best article-to-population ratios in the world. It needs much more collaborative editing though.



Readership is increasing for all Indic language Wikipedias, and the total figure has moved from 2 crore page views per month to 3 crores, which is a jump of more than 40%!

Number of readers. The superscript numbers in each column mark the top three Wikipedias.
  • Most of the major Indic language wikis show 40% or more growth in readership, which is quite impressive. Remember that for most of the communities, this has happened without any outreach efforts. This suggests that we have readers and that their number will continue to increase.
  • The increase in readership for Odia, Assamese, and Nepali points to strong potential for community building and outreach in these communities. This is a common conclusion across languages.
  • Incredibly, Sanskrit had 4.5 lakh page views in December 2011 for a total of 50,000 language speakers, which averages out to 9 page views per speaker (if one considers all speakers!)
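
The per-speaker average above, and the growth in total page views, are simple arithmetic on the rounded figures quoted in this section. A minimal Python sketch, with hypothetical helper names, using only numbers stated in this report:

```python
LAKH = 100_000
CRORE = 10_000_000


def percent_growth(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


def per_speaker(page_views: float, speakers: float) -> float:
    """Average page views per speaker."""
    return page_views / speakers


if __name__ == "__main__":
    # Sanskrit, December 2011: 4.5 lakh page views, 50,000 speakers.
    print(per_speaker(450_000, 50_000))          # 9.0
    # All Indic Wikipedias: 2 crore -> 3 crore page views per month (rounded).
    print(percent_growth(2 * CRORE, 3 * CRORE))  # 50.0
```

With the rounded totals the growth works out to 50%, which is consistent with the "more than 40%" stated above; the exact monthly figures are not reproduced here.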



I know you might think that I keep repeating myself but the lessons I am taking from the above (for ALL languages) are as follows:

  • Focus on community building through community interaction (through meetups, talk pages, village pumps, and mailing lists.)
  • Focus on community building through community collaboration (WikiProjects or planning outreach efforts or advocacy)
  • Focus on community building through doing more outreach, better outreach, and being supportive of newbies.
  • Do not get obsessed by article counts or readership. These are natural outcomes of community building.
  • Stay away from bots and translation tools for article creation, as they do more harm than good. Where bots are used, use them in a way that does not affect the growth of the community.

I welcome your comments on this annual update. Please discuss it on the talk page. You can also reach me at shiju(_AT_)wikimedia.org