Grants:Project/MSIG/Wiki Term Base

From Meta, a Wikimedia project coordination wiki
status: draft
Wiki Term Base
A tool to standardize terms used on Wikipedia and speed up vocabulary translation.
target: Arabic Wikipedia
start date: 1 June
start year: 2024
end date: 30 May
end year: 2025
budget (local currency): 8900 USD
grant type: Organization
contact(s): wm-levant(_AT_)wikimedia.org, Farah
organization (if applicable): Wikimedians of the Levant

Applications are not required to be in English. Please complete the application in your preferred language.

Project Goal

What will be the outputs of your project and how will those outputs contribute to advancing a specific Movement Strategy Initiative?

Create a tool to standardize the terms used on Arabic Wikipedia articles, and minimize the time and effort translators need to make informed choices of vocabulary.

What specific Movement Strategy Initiative does your project focus on and why? Please select one of the initiatives described here

Project Background

When do you intend to begin this project and when will it be completed?
June 2024 - May 2025
Where will your project activities be happening?
Online: on Arabic Wikipedia
Are you collaborating with other communities or affiliates on this project? Please provide details of how partners intend to work together to achieve the project goal.
Formal or informal collaborations are anticipated with (these are not official partnerships at the moment, but we are already in touch with all of them on a preliminary basis):
  • The Wikimedia Foundation’s Language Team: to explore potential integration options with Content Translation
  • Columbia University’s Natural Language Processing group: for its extensive experience with state-of-the-art technology for Arabic NLP
  • Arabic Language Academy in Damascus: a landmark institute whose language policies the project may need to explore
What specific challenge will your project be aiming to solve? And what opportunities do you plan to take advantage of to solve the problem?
  • Lack of consistency in Wikipedia’s vocabulary: many words have 2 - 5 alternatives that are inconsistently used in different articles, almost completely up to the discretion of the individual Wikipedians.
  • Time-consuming vocabulary translation: 5,000 words of linguistic discussions are written each month on average in Arabic Wikipedia, to define only a handful of words.
  • Difficulty of making vocabulary decisions: there is no standard way to conclude the lengthy discussions on vocabulary, nor to enforce whatever term is agreed upon. Months of discussion are spent reaching an agreement that cannot then be implemented.


Does this project aim to apply one of the examples shared in the call for grants and if so which one?
N/A

Project Activities

What specific activities will be carried out during this project? Please describe the specific activities that will be carried out during this project.
  1. Exploratory research: to define how a tool can solve the problems above. Following a design methodology, we have already started preliminary user research by interviewing Arabic Wikipedia editors to understand how a tool could solve the problems they face with vocabulary choice and translation. In addition, we are reviewing existing tools and literature on the topic.
  2. Defining technical features: for a vocabulary standardization tool. Based on the research results, we will define how data, analytics and natural language processing tools can best inform decisions on vocabulary. For example, our initial research already indicates that there is a strong demand for a digitized database of accredited dictionaries to automate and speed up the rigorous manual research that volunteers are undertaking.
  3. Prototype building: to test our research assumptions. The prototype will be a minimum viable product, closer to a “mock-up” of what we envision, with the fewest features needed to make it functional. Based on preliminary research, we anticipate a prototype along the lines of an initial database of 10 dictionaries (a minimum viable product compared to a potential target of 1,000).
  4. Agile evaluation: following agile methods, we will immediately test out the prototype with a sample of target users. We will then evaluate the performance, and decide whether we need to continue developing the prototype or to pivot / change features.
  5. Repeat: we anticipate going through this cycle at least 2-3 times, each taking a couple of months. The product features can be finalized once performance meets our goal (for now: ⅔ of our users giving a highly positive evaluation).
  6. Final deliverable: Based on our initial exploratory research (step 1), we expect the final deliverable to be a translation help tool with the following features:
  • Core feature: Searching a digitized database of multilingual-Arabic dictionaries.
  • Core feature: Ranking of the frequency of Arabic equivalents for source language terms according to their occurrences in dictionaries.
  • Core feature: Integration into MediaWiki as a gadget or as part of the Content Translation tool.
  • Extra feature: Ranking of the frequency of Arabic equivalents for source language terms according to their occurrences in existing Arabic Wikipedia articles.
  • Extra feature: Recommended / approved Arabic equivalents according to the Arabic Wikipedia community discussion process.
  • Extra feature: Automatically calculate a score and recommend Arabic equivalents for source language terms based on context, Arabic Wikipedia occurrences and dictionary occurrences.
  • Extra feature: Grouping terms according to their morphology (i.e. by stems).
  • Extra feature: Differentiating Arabic and source language terms at the syntactic level (i.e. by their part of speech).
  • Extra feature: Differentiating Arabic and source language terms at the semantic level (i.e. by their specific context and use case).
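The first two core features above (dictionary search and frequency ranking) could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the project's implementation: the dictionary names, the in-memory data layout, and the `rank_equivalents` helper are all hypothetical, and the three Arabic equivalents of "computer" are toy sample entries rather than digitized data.

```python
from collections import Counter

# Hypothetical toy data: each dictionary maps a source-language term to the
# Arabic equivalents it lists. Names and entries are illustrative only.
DICTIONARIES = {
    "dictionary_a": {"computer": ["حاسوب", "كمبيوتر"]},
    "dictionary_b": {"computer": ["حاسوب", "حاسب"]},
    "dictionary_c": {"computer": ["حاسوب", "كمبيوتر"]},
}


def rank_equivalents(term, dictionaries):
    """Return (equivalent, count) pairs, most widely listed first."""
    counts = Counter()
    for entries in dictionaries.values():
        # Use a set so a dictionary that repeats an equivalent counts once.
        counts.update(set(entries.get(term.lower(), [])))
    # Sort by descending count, then by the term itself to break ties stably.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


ranking = rank_equivalents("computer", DICTIONARIES)
for equivalent, count in ranking:
    print(f"{equivalent}: listed in {count} of {len(DICTIONARIES)} dictionaries")
```

Counting each dictionary at most once per equivalent keeps a single verbose dictionary from dominating the ranking. The extra features (Wikipedia occurrence counts, community-approved terms, context-based scoring) could later be added as further inputs to the same ranking step.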
How do you intend to keep communities updated on the progress and outcomes of the project? Please add the names or usernames of these individuals responsible for updating the community
  • Pre-grant engagement: we are already engaging the community in preliminary user research to understand their needs. We are also going to notify the community of this grant request.
  • During the grant: users will be continually engaged through an agile project framework, with repeated cycles of prototyping and user evaluation lasting roughly 2 months each. We want this tool to be developed collaboratively with users, not as a black box.
  • After the grant: we will share the tool itself and the features we are able to provide. The eventual goal is that this tool will be adopted by Wikipedians and that it will make their daily lives easier.
Who will be responsible for delivering on this project and what are their roles and responsibilities?
  • Project lead (paid): the owner responsible for the final delivery of the project. Responsibilities will include: setting the project goals; overseeing the timeline, day-to-day operations, budget and funds; coordinating between the different collaborators (see below); and leading the three phases of the project: research, prototyping, and evaluation.
  • Sounding board (volunteer): 2 - 3 Wikipedians and/or language experts who will help the Project Lead oversee the project. This board is already in place as there is a team leading the preliminary parts of the project, although some members may transition into roles below.
  • Language experts (volunteer or stipend): 2 consultants with language and translation experience (e.g. Arabic language PhDs). Their advice will be sought on the project methodology, scalability, implementation methods, and they will work closely with the programmers (below). It’s recommended that they should receive stipends, although pro bono consulting is probably possible.
  • NLP experts (volunteer or stipend): 2 consultants with experience in Arabic language natural language processing. Their advice will help correctly build and evaluate the technical aspects of the project. It’s recommended that they should receive stipends, although pro bono consulting is probably possible.
  • Python/JavaScript programmer (paid): the primary coder on the project. It’s preferable that this person is an NLP expert themselves, although their role will be more hands-on and can be complemented by experts. The programmer will write code to clean up linguistic data, build databases, extract data, create machine learning models, and put it all together into a Python and/or JavaScript tool that will be either a Wikipedia plugin or an addition to Content Translation.

Additional information

If your activities include community discussions, what is your plan for ensuring that the conversations are productive? Provide a link to a Friendly Space Policy or UCoC that will be implemented to support these discussions.
  • The community will be engaged at all stages of the project to ensure their involvement and buy-in. Additionally, the project has a sounding board with community members to avoid any disconnect.
  • Arabic Wikipedia: Rules of Discussion
If your activities include the use of paid online tools, please describe what tools these are and how you intend to use them.
  • Google Cloud: likely tool for OCR services, data storage, and running Python code (NLP clean up, machine learning models) on Google Colab.
  • Tableau: may be used to visualize data outputs.
Do your activities include the translation of materials, and if so, in what languages will the translation be done? Please include details of those responsible for making the translations.
  • Potentially translating the results from Arabic (primary language of the project) to English, for reporting and knowledge sharing purposes. Wikimedians of the Levant user group will oversee translation tasks.
Are there any other details you would like to share? Consider providing rationale, research or community discussion outputs, and any other similar information, that will give more context on your proposed project.

Outcomes

After your activities are complete, we would like to understand the draft implementation plan for your community. You will be required to prepare a document detailing this plan around a movement strategy initiative. This report can be prepared through Meta-wiki using the Share your results button on this page. The report can be prepared in your language, and is not required to be written in English.

In this report, you will be asked to:

  • Provide a link to the draft implementation plan document or Wikimedia page
  • Describe what activities supported the development of the plan
  • Describe how and where you have communicated your plan to relevant communities.
  • Report on how your funding was spent

Your draft implementation plan document should address the following questions clearly:

  • What movement strategy initiative or goal are you addressing?
  • What activities will you be doing to address that initiative?
  • What do you expect will happen as a result of your activities? How do those outcomes address the movement strategy initiative?
  • How will you measure or evaluate your activities? What tools or methods will you use to evaluate your activities?

To create a draft implementation plan, we recommend the use of a logic model, which will help you and your team think about goals, activities, outcomes, and other factors in an organized way. Please refer to the following resources to develop a logic model:

Please confirm below that you will be able to prepare a draft implementation plan document by the end of your grant:

  • Confirmed --Abbad (talk) 01:35, 28 April 2024 (UTC).

Optionally, you are welcome to include other information you'd like to share around participation and representation in your activities. Please include any additional outcomes you would like to report on below:

Budget

How will you use the funds you are requesting? List bullet points for each expense. Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

The requested budget and budget breakdown should be in your or recipient’s local currency. We send grant payments preferably in your local currency. In some exceptional cases (e.g. hyperinflation), we allow grant payments to be made in US dollars. If you are requesting a grant in a currency other than your local currency, please reach out to your Program Officer to discuss.

  • Research (time needed to review, perform analysis, or investigate any information needed to support implementation ideas or planning): 1500$ (estimated 60 hours, at 25$ / hour, divided between the Project Lead and/or contractor researchers)
  • Facilitation (facilitation time including facilitator preparation, meeting facilitation time, and debriefing): 1500$ (estimated 60 hours, at 25$ / hour, Project Lead team overseeing the project’s strategy, implementation, budgeting, reporting & communications)
  • Documentation (document preparation time, time spent documenting of discussion, post-meeting work): 1250$ (estimated 50 hours, at 25$ / hour, divided between Project Lead and contractor data analysts to compile evaluation reports and the final report)
  • Translation (translation costs for briefs and global materials): 250$ (to translate the final report from Arabic to English, or vice versa)
  • Coordination (coordinator work to manage or support multiple workflows to prepare for meeting):
  • Online tools or services (subscription services for online meeting platforms, social media promotion): 250$ (Google Cloud fees, estimated) and 150$ (two months of Tableau Creator subscription)
  • Data (internet or mobile costs for organizers or participants to access or participate in activities):
  • Venue or space for meeting (costs of renting a physical meeting space):
  • Transportation costs (costs of supporting organizers or participants to attend the meeting):
  • Meals (costs related to refreshments, lunches, or other meals during in-person activities):
  • Other:
  • 1000$ (250$ stipends each for four consultants in linguistics and NLP)
  • 4000$ (estimated 100 hours of coding at 40$ / hour)
  • 500$ (estimated 10 dictionaries, at 50$ / dictionary, to digitize as part of the project)
  • 3000$ (estimated 1,000,000 OCR-generated words to check & correct, at 0.003$ / word)

TOTAL AMOUNT REQUESTED IN LOCAL CURRENCY: 8900$
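As a quick arithmetic check, the line items above can be tallied with a short Python sketch. The amounts and unit rates are copied from the list; the category labels are shortened for readability. With the items exactly as listed, the sum comes to 13,400 USD, which differs from the stated total of 8,900 USD by 4,500 USD, so either the line items or the total may need updating. The sketch does not assume which figure is authoritative.

```python
# Line items as listed in the budget above (amounts in USD).
LINE_ITEMS = [
    ("Research", 60 * 25),                   # 60 hours at 25 USD/hour
    ("Facilitation", 60 * 25),               # 60 hours at 25 USD/hour
    ("Documentation", 50 * 25),              # 50 hours at 25 USD/hour
    ("Translation", 250),
    ("Google Cloud fees", 250),
    ("Tableau Creator, two months", 150),
    ("Consultant stipends", 4 * 250),        # four stipends of 250 USD
    ("Coding", 100 * 40),                    # 100 hours at 40 USD/hour
    ("Dictionaries to digitize", 10 * 50),   # 10 dictionaries at 50 USD
    ("OCR correction", 1_000_000 * 3 // 1000),  # 1,000,000 words at 0.003 USD
]

STATED_TOTAL = 8900  # total requested, as stated on the page

items_sum = sum(amount for _, amount in LINE_ITEMS)
difference = items_sum - STATED_TOTAL

for label, amount in LINE_ITEMS:
    print(f"{label:<30}{amount:>8}")
print(f"{'Sum of line items':<30}{items_sum:>8}")
print(f"{'Stated total':<30}{STATED_TOTAL:>8}")
print(f"{'Difference':<30}{difference:>8}")
```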

Completing your application

Once you have completed the application, please do the following:

  • Change the application status from status=draft to status=proposed in the {{Probox}} template.
  • Contact strategy2030(_AT_)wikimedia.org to confirm your submission, as well as to request any support around your application.

Endorsements

An endorsement from community members (especially from outside your community) will be part of the considerations when reviewing your application. Community members are encouraged to endorse your project request here!