Grants:Project/MSIG/Persian-Tajik converter

status: draft
Persian-Tajik converter
A human-made converter for Persian and Tajiki
target: fa.wikipedia and tg.wikipedia
start date: 1 September 2023
start year: 2023
end date: 1 September 2024
end year: 2024
budget (local currency): 14,000 EUR
budget (USD): 15,280 USD
grant type: Individual

Applications are not required to be in English. Please complete the application in your preferred language.

Project Goal

What will be the outputs of your project and how will those outputs contribute to advancing a specific Movement Strategy Initiative


What specific Movement Strategy Initiative does your project focus on and why? Please select one of the initiatives described here

In this proposal, I use "Persian" to refer to the language written in the Arabic alphabet and "Tajiki" for the one written in the Cyrillic alphabet. Persian is an Indo-Iranian language spoken by about 200 million people worldwide. It is an official language in Iran, Afghanistan, Tajikistan, and the Republic of Dagestan in Russia, and because of the large-scale migration of Iranians and Afghans it is spoken in other countries as well. Persian is the 8th most used language on the Internet, with a share of 2.2%, and Persian Wikipedia has an average of 7.5 billion visitors per day.

Persian is usually written in the Arabic script, including in Iran and Afghanistan. The Persian Wikipedia (fa.wikipedia.org) is written in this script; it has around 1 million articles and is the 19th largest Wikipedia by number of articles. Tajiki, the official form of the language in Tajikistan, is written in Cyrillic, and Tajik Wikipedia (tg.wikipedia.org) is an example of Persian written in the Cyrillic alphabet.

Because of the difference in alphabet, written communication between Tajik and Persian speakers is practically impossible: speakers of a common language can understand each other only when they talk. Tools that convert Persian script to Tajik script and vice versa do exist. However, because of the fundamental differences between the Arabic and Cyrillic scripts, their output is unsatisfactory and often not understandable to readers. For example, Persian has four letters for the sound "z" (ز ذ ض ظ), whose only Tajik equivalent is З, so when converting from Tajik to Persian it is not possible to determine which Persian letter a given З corresponds to. Also, some vowels (a, e, o) are not written in Persian, while in Tajiki they are written; without them, the word does not carry the correct meaning.
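The "z" example can be sketched in a few lines of Python. This is only an illustration of why a letter-for-letter converter cannot be reversed; the table holds just the four letters mentioned above, and the function names are hypothetical, not part of any existing tool.

```python
# Illustrative only: the four Persian letters pronounced "z" and the single
# Tajik letter they all map to. A real table would cover the whole alphabet.
PERSIAN_TO_TAJIK = {"ز": "з", "ذ": "з", "ض": "з", "ظ": "з"}


def invert(mapping):
    """Group source letters by target letter to expose many-to-one collisions."""
    inverse = {}
    for source, target in mapping.items():
        inverse.setdefault(target, []).append(source)
    return inverse


TAJIK_TO_PERSIAN = invert(PERSIAN_TO_TAJIK)


def candidates(tajik_letter):
    """Return every Persian letter that a Tajik letter could stand for."""
    return TAJIK_TO_PERSIAN.get(tajik_letter.lower(), [])


def is_ambiguous(tajik_letter):
    """A letter is ambiguous when more than one Persian letter maps to it."""
    return len(candidates(tajik_letter)) > 1


print(len(candidates("З")), "possible Persian letters for З")
```

Persian to Tajik is a plain lookup, but the reverse direction returns four candidates for one letter, so the choice has to come from knowledge of the word, not of the letter.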

The difference between Persian and Tajik is not limited to the alphabet. Some words that are obsolete in Persian are still in use in Tajiki, and the current Persian word is meaningless or strange to Tajiks; for example, Persian "پنجره" (panjere, "window") is "тиреза" in Tajiki. There are also many false friends, words that look or sound similar but have different meanings in Persian and Tajiki, such as "دریا" (darya), which means "sea" in Persian, while "дарё" (daryo) means "river" in Tajiki. In addition, because of Tajik's extensive contact with Russian, many Russian loanwords have entered Tajiki that have no meaning in Persian, such as "Грамматика" (grammar).
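One way to keep false friends from slipping through is to record a meaning next to each form and compare the meanings, not the spellings. The sketch below assumes a hand-made list of cognate pairs; the "sea"/"river" pair is the one given above, while the "book" pair (کتاب / китоб) is added here only as a contrasting example of a true cognate. The data layout and function name are hypothetical.

```python
# Each tuple: (Persian form, Persian meaning, Tajiki form, Tajiki meaning).
# Hand-made sample data for illustration, not an extract from any database.
COGNATES = [
    ("دریا", "sea", "дарё", "river"),   # false friend from the text above
    ("کتاب", "book", "китоб", "book"),  # true cognate, added for contrast
]


def false_friends(pairs):
    """Return the (Persian, Tajiki) pairs whose recorded meanings differ."""
    return [
        (persian, tajiki)
        for persian, persian_sense, tajiki, tajiki_sense in pairs
        if persian_sense != tajiki_sense
    ]


for persian, tajiki in false_friends(COGNATES):
    print(persian, "and", tajiki, "must not be converted into each other")
```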

Tajik Wikipedia has about 100,000 articles and 40,000 users, but because the content produced in Tajiki is poor, few users are eager to work on it. As a result, Tajik Wikipedia receives an average of 34,000 visits per day, and most speakers prefer to use Russian Wikipedia rather than Tajiki or Persian Wikipedia. However, given the common language and culture, Tajiks could use Persian Wikipedia more easily if they had a Persian-Tajik converter.

For the reasons stated above, a converter that only maps letters from one script to the other is not enough; we need a lexical converter that transforms words and their different forms into Persian or Tajiki. For this purpose, we can use the lexicographical data project in Wikidata to define the senses and forms of each lexeme in Persian and Tajiki. Extracting this data from Wikidata will then yield a database of lexemes in their different forms, which can serve as the basis of the converter tool. The converter can be used in all Wikimedia projects, and afterwards it can be made available to others as a free, open-source tool.
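A minimal sketch of the word-level lookup described above, assuming the lexeme data has already been extracted into a local list. The `Entry` class, the sample lexicon, and the function name are hypothetical placeholders; the single entry is the "window" example from this proposal, and nothing here reflects an actual Wikidata export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    persian: str  # form in Arabic script
    tajiki: str   # form in Cyrillic script
    gloss: str    # shared sense, used by humans to check the pairing


# Hypothetical sample; a real lexicon would be built from Wikidata lexemes
# and would list every inflected form as its own entry.
LEXICON = [
    Entry("پنجره", "тиреза", "window"),
]

# Index by Persian form so each token is a single dictionary lookup.
PERSIAN_INDEX = {entry.persian: entry for entry in LEXICON}


def convert_persian_to_tajiki(text):
    """Convert known words; leave unknown words as-is and report them."""
    converted, unknown = [], []
    for token in text.split():
        entry = PERSIAN_INDEX.get(token)
        if entry is None:
            unknown.append(token)
            converted.append(token)
        else:
            converted.append(entry.tajiki)
    return " ".join(converted), unknown


result, missing = convert_persian_to_tajiki("پنجره")
print(result, missing)
```

Returning the list of unknown words alongside the converted text is a design choice for this sketch: it shows which lexemes still need to be entered by a human, which matches the "human-made" approach of the project.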

This converter will allow Tajiks to use Persian Wikipedia, so the free knowledge published there will be available to more people. In addition, the Tajik Wikipedia community will be better able to make its voice heard in the 2030 Movement Strategy, because machine and human translation from Persian into English and other languages is much easier than translation from Tajik.

Project Background

When do you intend to begin this project and when will it be completed?
1 September 2023 to 1 September 2024
Where will your project activities be happening?
Online, on Wikidata
Are you collaborating with other communities or affiliates on this project? Please provide details of how partners intend to work together to achieve the project goal.
No
What specific challenge will your project be aiming to solve? And what opportunities do you plan to take advantage of to solve the problem?
The main challenge is the set of differences between Tajiki and Persian described above. With the converter, the Tajik community can engage with the Persian community and use the Persian Wikimedia projects.
Does this project aim to apply one of the examples shared in the call for grants and if so which one?

Project Activities

What specific activities will be carried out during this project? Please describe the specific activities that will be carried out during this project.
How do you intend to keep communities updated on the progress and outcomes of the project? Please add the names or usernames of these individuals responsible for updating the community
Who will be responsible for delivering on this project and what are their roles and responsibilities?

Additional information

If your activities include community discussions, what is your plan for ensuring that the conversations are productive? Provide a link to a Friendly Space Policy or UCoC that will be implemented to support these discussions.
If your activities include the use of paid online tools, please describe what tools these are and how you intend to use them.
Do your activities include the translation of materials, and if so, in what languages will the translation be done? Please include details of those responsible for making the translations.
Are there any other details you would like to share? Consider providing rationale, research or community discussion outputs, and any other similar information, that will give more context on your proposed project.


Outcomes

After your activities are complete, we would like to understand the draft implementation plan for your community. You will be required to prepare a document detailing this plan around a movement strategy initiative. This report can be prepared through Meta-wiki using the Share your results button on this page. The report can be prepared in your language, and is not required to be written in English.

In this report, you will be asked to:

  • Provide a link to the draft implementation plan document or Wikimedia page
  • Describe what activities supported the development of the plan
  • Describe how and where you have communicated your plan to relevant communities.
  • Report on how your funding was spent

Your draft implementation plan document should address the following questions clearly:

  • What movement strategy initiative or goal are you addressing?
  • What activities will you be doing to address that initiative?
  • What do you expect will happen as a result of your activities? How do those outcomes address the movement strategy initiative?
  • How will you measure or evaluate your activities? What tools or methods will you use to evaluate your activities?

To create a draft implementation plan, we recommend the use of a logic model, which will help you and your team think about goals, activities, outcomes, and other factors in an organized way. Please refer to the following resources to develop a logic model:

Please confirm below that you will be able to prepare a draft implementation plan document by the end of your grant:

  • ...

Optionally, you are welcome to include other information you'd like to share around participation and representation in your activities. Please include any additional outcomes you would like to report on below:

Budget

How will you use the funds you are requesting? List bullet points for each expense. Don't forget to include a total amount, and update this amount in the Probox at the top of your page too!


  • Research (time needed to review, perform analysis, or investigate any information needed to support implementation ideas or planning):
  • Facilitation (facilitation time including facilitator preparation, meeting facilitation time, and debriefing):
  • Documentation (document preparation time, time spent documenting discussions, post-meeting work):
  • Translation (translation costs for briefs and global materials):
  • Coordination (coordinator work to manage or support multiple workflows to prepare for meeting):
  • Online tools or services (subscription services for online meeting platforms, social media promotion):
  • Data (internet or mobile costs for organizers or participants to access or participate in activities):
  • Venue or space for meeting (costs of renting a physical meeting space):
  • Transportation costs (costs of supporting organizers or participants to attend the meeting):
  • Meals (costs related to refreshments, lunches, or other meals during in-person activities):
  • Other:

TOTAL AMOUNT REQUESTED USD:

Completing your application

Once you have completed the application, please do the following:

  • Change the application status from status=draft to status=proposed in the {{Probox}} template.
  • Contact strategy2030(_AT_)wikimedia.org to confirm your submission, as well as to request any support around your application.

Endorsements

An endorsement from community members (especially from outside your community) will be part of the considerations when reviewing your application. Community members are encouraged to endorse your project request here!