Grants:PEG/Anderson/Script encoding proposal for Nepal
Grant request details
- Are you an organization, a group, or an individual?
- Please provide your name, or the name of the group or organization requesting this grant.
- Unicode Consortium
- Please provide the name (or username) of the main contact for this grant request. You do not need to disclose your legal name publicly.
- Deborah Anderson
- For groups and organizations only: Please provide the name (or username) of a second contact for this grant request.
- Lisa Moore
- Please link to any relevant documents, including your website if you have one.
- Organization’s website: http://www.unicode.org/
Project website: http://www.linguistics.berkeley.edu/sei/
Script proposal and background documents:
- Scripts of Nepal: http://www.unicode.org/L2/L2009/09325-n3692-nepal-scripts.pdf
- For "Prachalit Nepal":
- Comparison between content of two proposals  and : http://www.unicode.org/L2/L2014/14220-newar-nepaalalipi-compare.pdf
- For Ranjana:
- Official project name
- Script Encoding Proposals for Nepal
- Project start date
- This is the date you begin work on your project. Please include a month, day, and year. You must include a project start date.
- 1 August 2014
- Project completion date
- This is the date you finish work on your project. Please include a month, day, and year. You must include a project completion date.
- 31 December 2014
- Background to the project
- In order for a script to be used on a Wikipedia page today, its characters must be part of the international standard Unicode. While most modern (and some historic) scripts are in Unicode, a number of lesser-used scripts are not, including two scripts from Nepal, "Prachalit Nepal" and Ranjana. "Prachalit Nepal" is used for the Nepal Bhasa language, a language with 847,000 speakers. The language is most commonly written today in the Devanagari script, but this is not the native script. The "Prachalit Nepal" script is taught today from the primary to post-graduate levels, according to Suwarn Vajracharya, but online materials and book publications are severely hampered by the script not being in Unicode. Ranjana is still used today in signage, but is found primarily in religious and literary documents. It is used to write the Nepal Bhasa language, as well as Sanskrit and Tibetan.
- Users of these scripts in Nepal are very passionate about getting their scripts into the Unicode Standard and into Wikimedia projects (link), where they can write and read articles in their preferred (and native) script (Prachalit Nepal), and make their literary and historical works accessible to the world (in the Ranjana script). Their keen interest in Wikimedia is attested by the large number of submissions to the Nepal Bhasa Wikipedia (over 70,000 articles), although these are written in Devanagari. Chris Fynn, who has worked on a Ranjana font, will make several extensive texts in Ranjana available on Wikisource once the script is part of Unicode.
- The process of getting these two scripts into Unicode, which began at least in 1998, has been stalled due to communication issues between the user community and the standards committees. Two competing proposals for each script have been submitted, but the standards committees want to see a single submission for each script, with user community support. This project will bring three people with Unicode expertise directly to the user community to discuss and work on the proposals. The goal is to work towards a script proposal for each script that has user support and is technically mature.
- If successful, this project could be replicated for other scripts not yet in Unicode, and could generate substantial growth for Wikimedia in areas currently under-represented in WM. It should be noted that another script that was added to Unicode (via Project Leader Anderson's project), Buginese, has generated over 14,000 articles on Wikipedia, despite being considered "to be under increasing threat as a living script" by http://scriptsource.org.
- Brief introduction to Unicode and how to get a script onto Wikimedia projects
- The Unicode Standard is the international character encoding standard that is used on all modern computers and mobile devices for sending text electronically today. It is supported by national body standards organizations, computer companies and font foundries the world over.
- In Unicode, each character is assigned a unique number, which remains the same no matter what platform or software is used. However, if a script is not in Unicode, its users must rely on non-standard solutions, which means that text in that script cannot be sent or received reliably, making text interchange very problematic.
- The process of getting a script into Unicode, and eventually onto Wikimedia projects, involves the following steps:
- write a script proposal that includes all the technical details required by the standards committees and has user community support
- get approval by the two standards committees (ISO/IEC JTC1/SC2/WG2 and Unicode Technical Committee), a process that typically takes 2 years
- once the Unicode Standard is published, create standardized fonts and ensure computers can support the fonts so the glyphs appear and behave as expected
- use standardized fonts in Wikimedia pages and elsewhere
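To illustrate the point above that each Unicode character has a unique number that stays the same across platforms, here is a minimal Python sketch. It uses two letters from scripts that are already encoded (one Devanagari, one Buginese); the choice of sample characters and the helper name `describe` are assumptions made for illustration only and are not part of the proposal.

```python
import unicodedata

# Two characters from scripts already in Unicode (illustrative choices):
# a Devanagari letter and a Buginese letter.
samples = ["\u0928", "\u1A00"]

def describe(ch):
    """Return the code point and the standard character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in samples:
    print(describe(ch))

# Because the number is standardized, the text survives a round trip
# through a standard encoding such as UTF-8 on any conforming system.
encoded = "\u0928".encode("utf-8")
decoded = encoded.decode("utf-8")
print(encoded.hex(), decoded == "\u0928")
```

A script that is not yet encoded has no such agreed numbers, so each font maker picks private assignments and the same bytes may display differently on another machine.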
- Please describe the project in a few sentences
- This project will support a meeting in Kathmandu, Nepal in October 2014, with script authors Pandey and Manandhar, other script stakeholders and Wikimedians in Nepal, Unicode member Constable, and project leader Anderson. The goal of the Nepal meeting is to discuss the proposals for the two scripts, to create a framework for reaching consensus (including documenting areas of agreement, and topics that require additional work), and to agree upon a concrete plan to progress the two script proposals for eventual review and approval by the two standards committees, so they can eventually be part of the standard and be used in Wikimedia projects. Wikimedians' presence at the meeting will help to set the stage for the adoption of the scripts into WM, once the scripts are part of the Unicode Standard.
Please request your grant in your local currency. WMF is able to grant funds in many currencies.
- Amount and currency requested
- Please provide an equivalent amount in US dollars using the exchange rate provided by Oanda on the date you open this request
For organizations only
This section only needs to be completed by organizations requesting grants. Individuals and groups that are not incorporated do not need to complete this section.
- Are you an incorporated organization able to provide local proof of nonprofit status within your country? For-profit organizations are not eligible to receive grants through the PEG program.
- Answer YES or NO.
- Does your organization currently employ or engage any fulltime or part-time staff or contractors? If yes, please tell us how many staff or contractors you employ or engage, along with their functions or a link to your staff page. For example, "2 full time program managers and 1 part-time contractor accountant at 50% FTE each."
- 1 Senior Software Engineer at 75% FTE, 1 Office Manager at 75% FTE, 1 Senior Editor/Project Manager at 50% FTE
Goals and measures of success
Please briefly describe what will be accomplished if the project is successful.
- The main goal of this project is to meet with the user communities in Nepal to discuss two Unicode script proposals and come to agreement on the concrete steps in order for the proposals to be submitted to the standards committees for review and approval, so the scripts will become part of the international character encoding standard Unicode (/ISO 10646) and be used in WM projects.
Measures of success
Please provide a list of measurable criteria that will be used to determine how successful the project is. You will need to report on the success of the project according to these measures after the project is completed.
- Hold meetings in early October with users of the “Prachalit Nepal” and Ranjana scripts in Kathmandu, Nepal, to discuss two script proposals and create a way forward on how to come to consensus and progress the proposals to the two standards bodies. Such meetings are necessary because email communication which has taken place between the various parties since 1998--and especially since 2012--has not been successful.
- Submit a report to the two standards committees that summarizes the outcome of the meeting and identifies “next steps”. (Such next steps could be an agreement, for example, to submit to the two standards committees an interim script proposal by a given date, or the next steps could spell out who will do certain tasks - and when - in order to arrive at a joint final proposal for submission to the standards committees.)
- Submit a report to WMF outlining the results, lessons learned, and how the process here could be applied to other scripts, resulting in more languages being represented in WM.
Project scope and activities
This section describes what will happen if this project is funded. Who will do what, and when?
- List of activities
August – September 2014: Anderson will work with Allen Tuladhar to set up two meetings in Kathmandu for the two scripts of Nepal, one for “Prachalit Nepal” and one for Ranjana. Anderson and Tuladhar will ensure the various interested parties are invited, including the general user community in Nepal and other countries, Wikimedia contributors and potential contributors, academics, font implementers, Unicode Technical Committee member Peter Constable, Unicode proposal author Anshuman Pandey, and project leader Deborah Anderson. Tuladhar will act as Nepal coordinator, arranging for meeting rooms and general meeting logistics.
1-7 September: Anderson and Tuladhar will prepare and circulate an invitation to all the parties, with a tentative agenda, meeting logistics, and links to relevant documents (proposals for the two scripts and other documents, such as a comparison between the proposals; links to current documents are provided above at the top of this proposal)
2-15 October: Tentative schedule
Thursday, 2 October: Anshuman Pandey arrives in Kathmandu
Friday, 3 October: (Pandey recovers from jetlag)
Saturday, 4 October: Peter Constable and Deborah Anderson arrive in Kathmandu; in the afternoon, interested parties meet with Peter Constable, Deborah Anderson, and Anshuman Pandey to discuss “Prachalit Nepal”, using already prepared documents as the basis for discussion
Sunday, 5 October: continue discussion (discuss name, encoding model, characters)
Monday, 6 October: finalize plans to progress proposal for Prachalit Nepal
Tuesday, 7 October: BREAK (some discussion could spill over onto this day)
Wednesday, 8 October: BREAK (some discussion could spill over onto this day). (Anderson departs)
Thursday, 9 October: meet to discuss Ranjana (if Peter is available, he could attend this first day on “Ranjana”)
Friday, 10 October: discussions
Saturday, 11 October: finalize plans to progress proposal for Ranjana
Sunday, 12 October: BREAK
Monday, 13 October: font work by interested parties
Tuesday, 14 October: font work by interested parties
Wednesday, 15 October: Anshuman Pandey departs
20 October: Report on meetings will be written by Anshuman Pandey and Allen Tuladhar and sent to Wikimedia Foundation and to the Unicode Consortium for posting. Other relevant documents will be forwarded to Unicode Technical Committee by Anderson (Project Lead)
27-30 October: Kathmandu meeting report will be presented at the Unicode Technical Committee meetings in Cupertino, CA, by Anderson, who will answer any questions.
7 November: Anderson will relay comments and questions to Pandey and stakeholders via email, based on the discussion.
31 December: Anderson will submit a final report on the project to Wikimedia Foundation, outlining lessons learned from the project and how to make the process applicable to other under-represented languages in WM (with unencoded scripts), along with a fiscal report from the Unicode Consortium.
- On the Wikipedia pages for these scripts, Wikimedians in Nepal and other countries will be kept apprised of the progress of the proposal through the standards process.
- Anderson will continue to oversee the work on proposals and shepherd them through the standards process (i.e., through the two standards committees)
- Towards the end of the script encoding process (ca. 2 years), font specifications with the code points in the proposals can be finalized. At that point, a standardized font (i.e., with the Unicode-approved code points) can be created and Wikipedians will be encouraged to create content with the standardized font.
Please provide a detailed breakdown of project expenses according to the instructions here.
Grantees are subject to line-item scrutiny of expenses. Changes to the approved budget beyond 10% in any category must be approved in advance.
- Project budget table
|Number||Category||Item description||Unit||Number of units||Cost per unit||Total cost||Currency||Notes|
|1||airfare||Seattle-Kathmandu||roundtrip||1||$1248||$1248||USD||For Pandey to attend scripts meeting in Kathmandu: depart SEA 1 Oct.; depart Kathmandu 15 Oct.|
|2||lodging||(in Kathmandu)||night||13 nights||$40||$520||USD||$40 * 13 nights|
|3||meals and incidentals||(in Kathmandu)||M&I rate per day||13 days||$20||$260||USD||$20 * 13|
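As a cross-check of the budget table above, the following minimal Python sketch recomputes each row's total from its units and unit cost and then sums the three rows. The variable and function names are illustrative assumptions. The resulting sum is only the arithmetic total of the rows listed here; it is not a figure stated in the request itself.

```python
# Line items copied from the budget table: (item, number of units, cost per unit in USD).
line_items = [
    ("airfare Seattle-Kathmandu (roundtrip)", 1, 1248),
    ("lodging in Kathmandu (nights)", 13, 40),
    ("meals and incidentals in Kathmandu (days)", 13, 20),
]

def line_total(units, unit_cost):
    """Total cost of one budget row."""
    return units * unit_cost

totals = {item: line_total(units, cost) for item, units, cost in line_items}
listed_total = sum(totals.values())

for item, amount in totals.items():
    print(f"{item}: ${amount}")
print(f"Sum of listed line items: ${listed_total}")
```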
- Total cost of project
- Total amount requested from the Project and Event Grants program
- Additional sources of revenue that may fund part of this project, and amounts funded
- Peter Constable will fund his own trip to Nepal personally: travel from Sri Lanka to Kathmandu $415, one night in New Delhi on 3 Oct. $80, and lodging for 1 night in Kathmandu $40 = $535
- Deborah Anderson will fund her own trip to Nepal or seek alternate funding: travel from Sri Lanka to Kathmandu $415, costs to change ticket $129, one night in New Delhi on 3 Oct. $80, and lodging for 4 nights in Kathmandu, Oct 4-7, @$40/night = $784
- Anderson will not request any funding for her time devoted to this project (in-kind contribution from Script Encoding Initiative, UC Berkeley) 40 hrs.@ $66/hr = $2640
- Anderson will fund Anshuman Pandey $1000 towards work on the revised proposals = $1000
- Anderson will fund travel and visa from Bhutan for Chris Fynn, specialist in Ranjana who has been working on a font, to attend meeting = $614
- Meeting room and meeting logistics to be provided via Allen Tuladhar, Microsoft Innovation Center Nepal. (in-kind contribution)
- Unicode Consortium is waiving all overhead costs.
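The arithmetic behind the additional sources listed above can be checked with a short Python sketch. The component figures are taken from the list; the dictionary keys and the combined sum are illustrative assumptions. The request does not state an overall figure for these contributions, and the in-kind meeting room and waived overhead have no stated value, so they are left out.

```python
# Components copied from the list of additional sources (all in USD).
other_sources = {
    # travel + one night in New Delhi + 1 night in Kathmandu
    "Constable, self-funded trip": 415 + 80 + 40,
    # travel + ticket change + one night in New Delhi + 4 nights at $40
    "Anderson, self-funded trip": 415 + 129 + 80 + 4 * 40,
    # 40 hours at $66 per hour (in-kind)
    "Anderson, time (in-kind)": 40 * 66,
    "Pandey, work on revised proposals": 1000,
    "Fynn, travel and visa from Bhutan": 614,
}

for source, amount in other_sources.items():
    print(f"{source}: ${amount}")

combined = sum(other_sources.values())
print(f"Combined value of listed contributions: ${combined}")
```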
See a description of non financial assistance available. Please inform Wikimedia Foundation (WMF) of any requirements for non-financial assistance now.
- Requests for non-financial assistance
- Expert from WMF is needed to answer questions regarding this new grant submission and to help connect with WMF members in Nepal.
Resources and Risks
Highlights of the resources for this project:
- The organizers of this project have an excellent reputation in helping get eligible scripts into the Unicode Standard.
- We have secured an agreement with a local representative in Nepal to coordinate the meetings and provide logistics (Allen Tuladhar)
- An onwiki page has been set up to show community support for the project (link)
- The Admin at Nepal Bhasa Wikipedia, Eukesh Ranjit, is enthusiastically behind this effort to get the scripts into Unicode.
The core team members of this project are:
- Deborah Anderson (project leader)
- Anshuman Pandey (author of script proposals for these 2 scripts)
- Peter Constable (Unicode member)
Working with Anderson will be:
- Allen Tuladhar, Microsoft Innovation Center Nepal, author of an early proposal for the “Prachalit Nepal” script and a collaborator with the Lipi Guthi group in Nepal, a group that oversees all aspects of the language and has been coordinating the script encoding activities. Tuladhar will act as organizer on the Nepal side. (Tuladhar had earlier organized a meeting at Microsoft Nepal offices in 2013 between various Nepalese users and Pandey, but Pandey was not able to attend the meeting.)
Wikimedia personnel and their projected involvement:
- Eukesh Ranjit (admin at Nepalbhasa Wikipedia): will use the Prachalit Nepal and Ranjana scripts in Wikisource and Wikipedia
- Saroj Dhakal (active Wikipedian/Wikimedian in Nepal): is connected with those at Google who work on fonts and implementing new scripts; will help put meeting organizers in touch with the Nepal Bhasa Academy and Nepal Academy, Local Languages Department
- Chris Fynn (administrator on Dzongkha Wikipedia; contributor of images to Wikimedia Commons since June 2007): has worked on a Ranjana font and been involved in other Unicode proposals and font projects; has expertise on the process of getting scripts into fonts; will submit Ranjana materials into Wikisource
Other Wikimedia personnel consulted on this project:
- Ganesh Paudel (Wikimedia Nepal): moral supporter of the use of Prachalit Lipi in Wikimedia Projects.
- Prof. Bhimdhoj Shrestha, Tribhuwan University (Advisor, Wikimedia Nepal): supports holding a meeting in Nepal to finalize and initiate the use of the Ranjana script in Wikipedia
Others who are involved:
- Suwarn Vajracharya (International coordinator for encoding Nepallipi, and Chair, Nepal Study Center, Japan [NSCJ]); has worked on fonts for Nepal Prachalit and will submit content to Wikipedia in Prachalit Nepal (once the script is in Unicode and fonts are available)
- Anil Sthapit (representative to Nepal Lipi Guthi, organization overseeing script encoding activities)
- Devdass Manandhar (author of script proposals for “Prachalit Nepal” and Ranjana)
- Samir Karmacharya and Bishnu Chitrakar (co-authors of proposals for “Prachalit Nepal” and Ranjana)
- Patrick Hall (supporter of getting the scripts in Nepal supported in technology)
Special skills or qualifications the project lead and other team members bring to the project
Since 2002, the Project Leader Deborah Anderson has been working with user communities and experts to get scripts into the Unicode Standard through her project, the Script Encoding Initiative at UC Berkeley. To date, the project has successfully shepherded over 70 scripts through the approval process. As project lead, she explains the script encoding process to users, oversees the authoring of script proposals by veteran script proposal authors (such as Pandey) and others, ensures user communities and experts review the proposals, and presents the proposals to the standards committees (Unicode Technical Committee and the ISO Working Group 2, two committees on which she sits). Because she has been involved in script encoding for over 12 years, she brings considerable experience in identifying potential problems and providing guidance to keep the proposals from getting stalled.
Anderson and her project have been involved in work to encode the scripts of Nepal since 2009, supporting work on proposals for “Prachalit Nepal” and Ranjana (http://www.unicode.org/L2/L2012/12003r-newar.pdf, http://www.unicode.org/L2/L2009/09192-n3649-ranjana.pdf), which were written by seasoned Unicode proposal authors who are very familiar with the technical details that are required for a successful proposal. However, proposals need to have user community buy-in. In the case of Nepal, proposals for the two scripts were submitted by members of the user community in Nepal (for “Prachalit Nepal”: http://www.unicode.org/L2/L2014/14086-nepaalalipi.pdf and for Ranjana: http://www.unicode.org/L2/L2013/13243-ranjana.pdf). Because the authors in Nepal have not been able to attend meetings and interact directly with the Unicode Technical Committee, the proposals from Nepal presented technical problems. Email communication has not proven effective in trying to resolve the technical issues to make the proposals acceptable to both the Unicode Technical Committee and those in Nepal. As was the case with the Tangut script (see below, “Similar projects”), this project will bring together the proposal authors with two Unicode Technical Committee representatives (Anderson and Constable). Anderson is planning on attending the meeting in Kathmandu.
Anshuman Pandey will attend the meeting in Kathmandu. Pandey has penned numerous script proposals since 2005 (see http://www-personal.umich.edu/~pandey/), many of which are now part of the Unicode Standard. His documents on “Prachalit Nepal” and Ranjana will be two core documents for the meeting, alongside those of another participant from Nepal (Devdass Manandhar).
Peter Constable, who will attend the start of the “Prachalit Nepal” meeting, is a Unicode Technical Committee member and representative to ISO/IEC JTC1/SC2 for the Unicode Consortium. He has substantial experience in getting consensus from various parties, having brokered an agreement on the name of the “Tai Tham” script between various parties. As a Unicode Technical Committee board member, he can also explain the technical comments from the UTC regarding the encoding model.
This project is very similar to one in 2013 for the historic Tangut script proposal. Tangut had been stalled for several years, due to problems in communication with the experts in China, who were keen to encode the script, but had some reservations about the script proposal. Under Anderson’s leadership (and with financial support from the Luce Foundation), a face-to-face meeting was held in Beijing in December 2013. It was only after a face-to-face meeting with the experts that an agreement was reached. The meeting involved the academic users of this historic script, who came from China, Japan, and Taiwan, the script proposal authors (Andrew West and Michael Everson), and Project Leader Anderson. The script was approved by the two standards committees. (Fonts were available for the script but did not have standardized code points; as the standards process progresses, the fonts’ code points can be adjusted to match those approved by Unicode. A similar scenario could apply to “Prachalit Nepal” and Ranjana.)
This section is used to identify key risks or threats that would prevent you from achieving your project goals and how you would mitigate those risks and threats.
The main goal of this project is to bring together various parties interested in these scripts and to discuss the two script proposals, eventually arriving at a concrete plan on how to progress the proposals. Though differences of opinion will inevitably arise, it is important to not get side-tracked by areas of disagreement. Hence on the first day of meetings in Kathmandu it will be stressed that any topic on which no agreement can be reached will be put aside. However, in order for a proposal to progress in the standards approval process, it will be made clear that the basic set of characters and the encoding model need to be agreed upon by community consensus.
While most of the character repertoire is agreed upon (as evidenced by the proposals by Pandey and Manandhar), the following are areas of disagreement:
- (for “Prachalit Nepal”): name of script, representation of breathy consonants and vowels, encoding model, a few additional characters, and the names of some characters
- (for Ranjana): encoding model and questions on several characters
In the sections below, please describe how the project is related to the Wikimedia mission and Wikimedia's strategic priorities.
Fit to strategy
- How will this project support the key organizational objectives of
- increasing reach (more people will access or contribute to Wikipedia or our other projects),
- participation (more people actually contributing),
- quality (more content, more useful content, or higher-quality content),
- credibility (more trust in our projects),
- organizational maturity and effectiveness (how it will move you or the Wikimedia community forward),
- or financial sustainability (how it will help you achieve more in the long run)?
- This statement should address at least one of the strategic priorities listed here specifically. See Project and Event Grants program criteria for decision making.
At present, entries in the Wikipedia for the Nepal Bhasa language, a Tibeto-Burman language with 847,000 speakers in Nepal and India, are written in Devanagari (see “new.wikipedia.org”: http://new.wikipedia.org/wiki/%E0%A4%AE%E0%A5%82_%E0%A4%AA%E0%A5%8C). Contributions to “new.wikipedia.org” represent the second largest number of “Indic” entries to Wikipedia (over 70K articles), although the entries are not in either the native “Prachalit Nepal” script or the Ranjana script. While there are currently fonts for “Prachalit Nepal” and Ranjana available, the fonts are not standardized and hence the electronic text in these scripts cannot be reliably exchanged.
Judging from the strong identity of the speakers with these scripts and the number of speakers, it is likely that the availability of these scripts for entering text in Nepal Bhasa would encourage submissions to Wikis, and could even act as a driver for more widespread use of the scripts across the Internet.
The encoding of two scripts for the Nepal Bhasa language will meet the script requirements for this group of under-represented speakers in Nepal, thereby increasing the reach of Wiki projects and encouraging their participation.
- If the project will benefit a specific online community, please tell us.
Since the language Nepal Bhasa is currently represented online either using a non-native script, Devanagari, or with non-standardized fonts for the “Prachalit Nepal” or Ranjana scripts, users are faced with:
- using a script not originally intended for it (and one that carries for some users negative connotation, since it is the official script for the Indo-European Nepali language)
- using non-standardized fonts for these scripts, which means that text cannot be interchanged reliably over the Internet
- using images in place of the script (which would not be searchable).
If the scripts are adopted by the user community, they would be able to produce content with standardized encoding and fonts in their preferred script (which could be “Prachalit Nepal” or Ranjana).
- Please provide a brief statement about how the project is related to other work in the Wikimedia movement. For example, does the project fit into a work area such as GLAM, education, organizational development, editor retention, or outreach?
The project presents a relatively new direction for Wikimedia: it will provide the underlying structural support so under-represented populations can produce Wiki content in their preferred script, by getting the script into the international standard Unicode. As such, this effort represents a means of reaching out to linguistically under-represented populations, who can engage with Wiki projects in their preferred script (which often acts as a symbol of identity and pride).
Additionally, the project has the potential to make available extensive historical materials via Wikisource, since the “Prachalit Nepal” script was used in manuscripts and inscriptions since the tenth century, and Ranjana is used extensively for Buddhist scriptures in India, Nepal, Tibet, Mongolia, China, and Korea.
- If successful, will the project have the potential to be replicated successfully by other individuals, groups, or organizations? Please explain how in 1–2 sentences.
Yes. If this project is successful, other groups (or organizations) could be greatly encouraged to follow a similar path, namely: (a) develop content for a Wiki project (such as that done for http://new.wikipedia.org), (b) work with the standards committees to get the script approved, (c) develop fonts for the script using the approved code points, (d) use fonts to create content for Wikipedia or Wikisource and other projects (such as educational materials).
Anderson is encouraging the Gondi script users, for example, to begin developing a Gondi language Wikipedia/Wikisource (albeit not in the Gondi script, since it is not yet approved in Unicode), as a means of demonstrating user community support in Wikis to Wikimedia and to showcase to users the value of a Gondi Wiki.
- Please list other benefits to the movement here.
If this project is successful, it could act as a replicable model for other scripts in India, such as Tulu, Nandinagari, and Dogra (and Gondi, as noted above).