Wikimedia Blog/Drafts/Vachana Sanchaya: 11th century Kannada literature to enrich WikiSource

From Meta, a Wikimedia project coordination wiki

Vachana Sanchaya: 11th century Kannada literature to enrich Wikisource

Palm leaf manuscript with "vachana" written in it

In Kannada (an Indic language) poetry, Vachana sahitya is a form of rhythmic writing that evolved in the 11th century CE and flourished in the 12th century as part of the "Lingayatha" movement. More than 259 Vachanakaras (Vachana writers) composed over 21,000 vachanas. The 21,000 verses published by the Government of Karnataka in the 15-volume "Samagra Vachana Samputa" have been digitized. Two Wikimedians, along with Kannada linguist and author O. L. Nagabhushana Swamy, are working on the Unicode conversion, the corrections and the prefaces for these verses. The entire work is now available as a standalone project called "Vachana Sanchaya" and is ready to enrich Kannada Wikisource.

This project began a year ago, when Kannada Wikimedian Omshivaprakash was trying to help Professor O. L. Nagabhushana Swamy and Kannada author and publisher Vasudhendra access vachanas (verses). Swamy had trouble using the publicly available vachana content because the data was in an ASCII-based encoding rather than Unicode, which made searching the text very difficult. I (Pavithra Hanchagaiah) began helping to gather information about vachanas and convert it to Unicode by writing scripts with open source software. Further discussions led to the idea of putting thousands of vachanas into a database with an index, so that they could be searched easily. This required a platform supporting all these activities, one that would help linguistic researchers and students as well as members of the general public interested in reading and studying Vachana literature.

With this idea, Omshivaprakash started designing the model, and his colleague Devaraju started building it. In the meantime, I ran various scripts to fix errors in the conversion of ASCII text to Unicode and to confirm that the data was ready to be consumed by the modules developed for the concordance. We spent weekends and holidays executing this project from home. With constant feedback and guidance from Mr. Swamy and Vasudhendra, we learned how researchers use a concordance of a text and what would make it easier for them to research Vachana Sahitya. Omshivaprakash designed the platform's architecture, decided its infrastructure requirements and managed the entire project, using free and open source software technologies to keep the platform running. I contributed critical hacks for the digitization and gave feedback and suggestions.
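The post does not show the conversion scripts themselves. The Python below is a hedged sketch of only the final check: confirming that converted text is clean Unicode before it reaches the concordance modules. It reports every character that lies outside the Kannada Unicode block (U+0C80 to U+0CFF) and is not on a small whitelist. The function names and the whitelist are assumptions made for illustration. They are not the project's actual scripts.

```python
# Kannada Unicode block: U+0C80 to U+0CFF.
KANNADA_FIRST = 0x0C80
KANNADA_LAST = 0x0CFF

# Assumed whitelist of non-Kannada characters that may legitimately appear:
# whitespace, common punctuation, ASCII digits, and the zero-width
# non-joiner / joiner (U+200C / U+200D) sometimes used in Kannada text.
ALLOWED_OTHER = set(" \t\n.,;:!?'\"()[]-0123456789\u200c\u200d")


def leftover_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs that are neither Kannada nor whitelisted.

    Stray Latin letters or symbols in converted text usually mean that a
    glyph from the legacy ASCII-based font was not mapped to Unicode.
    """
    problems = []
    for index, ch in enumerate(text):
        if KANNADA_FIRST <= ord(ch) <= KANNADA_LAST:
            continue  # a Kannada letter, vowel sign, virama or digit
        if ch in ALLOWED_OTHER:
            continue  # whitespace, punctuation or joiner we expect to see
        problems.append((index, ch))
    return problems


def is_ready(text: str) -> bool:
    """True when the text contains no unexpected characters."""
    return not leftover_characters(text)
```

For example, a word with a stray Latin letter left by the converter, such as "ಕರ್ಮa", would be reported as `[(4, 'a')]`, because "ಕರ್ಮ" is four Unicode code points (ಕ, ರ, virama, ಮ).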

Vachana Sanchaya Website Screenshot
Working System

Currently, the system has around 200,000 unique words in its repository. Vachana Sanchaya is meant to be a research tool rather than just a repository of text on the web. When you search for a word, you can see which Vachanakaras have used it across all the Vachanas. To make the results easier to read, the searched word is highlighted in each Vachana displayed. To repeat the search for a specific Vachanakara (poet), just click the poet's name in the graph on the results page. We used MediaWiki's jquery-ime input tool, which lets users type Kannada text directly in Unicode for searches. So just type, and get results!
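The post does not describe how Vachana Sanchaya's concordance is implemented. The Python below is a minimal in-memory inverted index that illustrates the features listed above:

- searching for a word across all vachanas,
- tallying results per Vachanakara (the kind of data a results graph could be drawn from),
- filtering to a single poet,
- highlighting the searched word.

The `Vachana` record, the `Concordance` class, the `highlight` function, the separator set and the toy corpus are all hypothetical. They are not the project's code.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional

# Assumed word separators: whitespace and common punctuation. Kannada words
# are space-delimited, so this is a reasonable first approximation.
SEPARATORS = r"[\s.,;:!?'\"()\[\]-]+"


@dataclass(frozen=True)
class Vachana:
    vachana_id: int
    author: str  # the Vachanakara
    text: str


def tokenize(text: str) -> list[str]:
    """Split a vachana into words, dropping empty strings."""
    return [token for token in re.split(SEPARATORS, text) if token]


class Concordance:
    """Hypothetical inverted index: word -> ids of the vachanas that use it."""

    def __init__(self, vachanas: list[Vachana]) -> None:
        self.vachanas = {v.vachana_id: v for v in vachanas}
        self.index = defaultdict(set)
        for v in vachanas:
            for word in tokenize(v.text):
                self.index[word].add(v.vachana_id)

    def unique_word_count(self) -> int:
        return len(self.index)

    def search(self, word: str, author: Optional[str] = None) -> list[Vachana]:
        """Vachanas containing `word`, optionally limited to one Vachanakara."""
        # dict.get does not insert missing keys, unlike defaultdict indexing.
        hits = [self.vachanas[i] for i in sorted(self.index.get(word, set()))]
        if author is not None:
            hits = [v for v in hits if v.author == author]
        return hits

    def counts_by_author(self, word: str) -> Counter:
        """Number of vachanas by each Vachanakara that use `word`."""
        return Counter(v.author for v in self.search(word))


def highlight(text: str, word: str, left: str = "[", right: str = "]") -> str:
    """Wrap each whole-word occurrence of `word`, keeping the separators."""
    # A capturing group makes re.split return the separators as well.
    parts = re.split(f"({SEPARATORS})", text)
    return "".join(f"{left}{p}{right}" if p == word else p for p in parts)


if __name__ == "__main__":
    # Toy data built from words mentioned in this post; these are not real vachanas.
    corpus = [
        Vachana(1, "Vachanakara A", "ಸತ್ಯ ಕರ್ಮ"),
        Vachana(2, "Vachanakara B", "ನದಿ ಸತ್ಯ, ನದಿ"),
        Vachana(3, "Vachanakara A", "ಕರ್ಮ ನದಿ"),
    ]
    concordance = Concordance(corpus)
    print(concordance.unique_word_count())                    # 3
    print([v.vachana_id for v in concordance.search("ನದಿ")])  # [2, 3]
    print(highlight(corpus[1].text, "ನದಿ"))                   # [ನದಿ] ಸತ್ಯ, [ನದಿ]
```

Highlighting works on the split tokens rather than on a regex `\b` boundary. Python's `\w` does not treat Kannada combining marks such as the virama as word characters, so `\b` could match inside a word. A real deployment would also keep the index in a database rather than in memory.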

Public Response

We are glad to see people accessing vachanas through our Facebook, Twitter and Google+ channels. Our site received 500,000 pageviews in the first few months after its public launch. Some of the most commonly searched Kannada words, such as “ಕರ್ಮ” (Karma, "work/deed"), “ಸತ್ಯ” (Sathya, "truthfulness") and “ನದಿ” (Nadi, "river"), return results quickly and easily.

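ಆಂಗೀರಸ, ಪುಲಸ್ತ್ಯ, ಪುಲಹ, ಶಾಂತ, ದಕ್ಷ, ವಸಿಷ್ಠ, ವಾಮದೇವ, ನವಬ್ರಹ್ಮ, ಕೌಶಿಕ, ಶೌನಕ, ಸ್ವಯಂಭು, ಸ್ವಾರೋಚಿಷ, ಉತ್ತಮ, ತಾಮಸ, ರೈವತ, ಚಾಕ್ಷಷ, ವೈವಸ್ವತ, ಸೂರ್ಯಸಾವರ್ಣಿ, ಚಂದ್ರಸಾವರ್ಣಿ, ಬ್ರಹ್ಮಸಾವರ್ಣಿ, ಇಂದ್ರ ಸಾವರ್ಣಿ ಇವರು ಇಪ್ಪತ್ತು ಮಂದಿ ಪ್ರಪಂಚ ನಿರ್ಮಾಣ ಸಹಾಯ[ದ]ವರು. ಹತ್ತೊಂಬತ್ತು ಎಂದರೆ ಪುಣ್ಯನದಿಗಳು. ಅದು ಎಂತೆಂದಡೆ: ಗ್ರಂಥ

(English: "Angirasa, Pulastya, Pulaha, Shanta, Daksha, Vasishtha, Vamadeva, Navabrahma, Kaushika, Shaunaka, Svayambhu, Svarochisha, Uttama, Tamasa, Raivata, Chakshusha, Vaivasvata, Suryasavarni, Chandrasavarni, Brahmasavarni and Indrasavarni: these twenty are the ones who helped create the world. Nineteen refers to the holy rivers. It is thus: grantha")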

Plans for the future

Our system is designed to be extended with new features. For example, we have a review desk where researchers can help us review the content. Later, we will also add references to Vachanas from the various research works on this literature. The content is available to the public through an OpenData API and, once the review is complete, will be distributed on Wikisource as public-domain content. This will open up the system to students, developers, researchers and anyone interested in building linguistic tools for Kannada and other Indic languages. The system is meant to evolve to support other works, so that we do not have to reinvent the wheel for each new project of this kind.

The Vachana Sahitya corpus can also help us start Natural Language Processing (NLP) projects if more researchers come together to tag words, build a glossary and so on. We could also meet the need for language tools such as spelling and grammar checkers by crowdsourcing their development. The next projects under “Kannada Sanchaya” are Sarvagnana Vachanagalu and Dāsa Sanchaya, which are in the pipeline with initial work under way. Our idea is to extend this platform to cover literature from Vyasa to Muddanna, and possibly contemporary literary works available in the public domain.

In the news

Pavithra Hanchagaiah and Omshivaprakash HI, Wikimedians from India.

Edited by Subhashish Panigrahi, CIS-A2K.