
Grants:Programs/Wikimedia Community Fund/Rapid Fund/QA infrastructure and tools to fix problems on Wiktionary (ID: 22678255)/Final Report

From Meta, a Wikimedia project coordination wiki
QA infrastructure and tools to fix problems on Wiktionary
Rapid Fund Final Report

Report Status: Accepted

Due date: 21 November 2024

Funding program: Rapid Fund

Report type: Final

Application

This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds where the user has submitted their midpoint report. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.

General information

  • Applicant username: Tbm
  • Organization name: N/A
  • Amount awarded: 5000
  • Amount spent: 5000 USD (5000 in local currency)

Part 1: Project and impact


1. Describe the implemented activities and results achieved. Additionally, share which approaches were most effective in supporting you to achieve the results. (required)

I implemented a prototype of an abstraction layer for Wiktionary that makes various data accessible from Python. I have done this for English Wiktionary and Swahili Wiktionary.
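To illustrate what such an abstraction layer might offer, here is a minimal sketch that splits an entry's wikitext into one section per language. It is not taken from the project's code: the names `LanguageSection` and `split_languages` are hypothetical, and the sketch assumes that each language is introduced by a level-2 header such as `==English==`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumption: a language section starts with a level-2 header ("==Name==").
# Deeper headers such as "===Noun===" are deliberately not matched.
LEVEL2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


@dataclass
class LanguageSection:
    """Hypothetical container for one language section of an entry."""
    language: str
    text: str


def split_languages(wikitext: str) -> list[LanguageSection]:
    """Return one LanguageSection per level-2 header found in wikitext."""
    matches = list(LEVEL2_HEADER.finditer(wikitext))
    sections = []
    for i, match in enumerate(matches):
        # A section runs until the next level-2 header or the end of the page.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = wikitext[match.end():end].strip()
        sections.append(LanguageSection(match.group(1).strip(), body))
    return sections


sample = "==English==\n===Noun===\n# a domestic cat\n\n==Swahili==\n===Noun===\n# paka\n"
for section in split_languages(sample):
    print(section.language, "->", len(section.text), "characters")
```

QA checks can then be written against `LanguageSection` objects instead of raw page text, which is the kind of convenience an abstraction layer is meant to provide.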

I have also implemented a way to store Wiktionary changes to disk for review in bulk; these changes can be applied with a script later. This is for changes that can't be fully automated in a bot and need manual review.
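The review workflow could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the JSON file format, the `ProposedChange` fields and all function names are hypothetical. Saving the result back to the wiki would need a MediaWiki client and is not shown.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ProposedChange:
    """Hypothetical record of one proposed edit awaiting manual review."""
    title: str
    old_text: str
    new_text: str
    summary: str
    approved: bool = False  # a reviewer sets this to true in the file


def save_changes(changes: List[ProposedChange], path: str) -> None:
    """Write proposed changes to disk so they can be reviewed in bulk."""
    data = [asdict(change) for change in changes]
    Path(path).write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def load_approved(path: str) -> List[ProposedChange]:
    """Read the reviewed file back and keep only the approved changes."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return [ProposedChange(**item) for item in raw if item["approved"]]


def apply_if_unchanged(current_text: str, change: ProposedChange) -> Optional[str]:
    """Return the new page text, or None if the page was edited since review."""
    if current_text != change.old_text:
        return None
    return change.new_text
```

Comparing the current page text with `old_text` before applying a change avoids overwriting edits that other contributors made between the review and the later script run.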

Finally I implemented a number of fixes for various issues found on English Wiktionary and Swahili Wiktionary.

Specifically, I implemented a number of fixes to the translation boxes on English Wiktionary. This includes fixing syntax errors, correcting wrong language names, and making some cosmetic changes. This has resulted in several hundred fixes, which all make the automatic parsing of translation entries easier.
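One of these checks, finding translation boxes with a missing `trans-top` or `trans-bottom`, might look like the sketch below. The function name and the problem messages are hypothetical, and the sketch assumes each template starts on its own line; it only reports problems so that a person can review them.

```python
from typing import List, Tuple


def find_unbalanced_trans_boxes(wikitext: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for unmatched translation box templates."""
    problems = []
    open_line = None  # line number of the trans-top that is still open
    for number, line in enumerate(wikitext.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("{{trans-top"):
            if open_line is not None:
                # A new box starts before the previous one was closed.
                problems.append((open_line, "trans-top without trans-bottom"))
            open_line = number
        elif stripped.startswith("{{trans-bottom"):
            if open_line is None:
                problems.append((number, "trans-bottom without trans-top"))
            open_line = None
    if open_line is not None:
        problems.append((open_line, "trans-top without trans-bottom"))
    return problems


# Made-up sample: the first box is never closed.
sample = (
    "{{trans-top|feline}}\n"
    "* French: {{t|fr|chat|m}}\n"
    "{{trans-top|person}}\n"
    "* French: {{t|fr|type|m}}\n"
    "{{trans-bottom}}"
)
print(find_unbalanced_trans_boxes(sample))
```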

The Swahili Wiktionary has a lot of issues. I started by cleaning up incorrect language headers and other syntax errors. I then cleaned up translation information in a number of ways, such as converting plain text to wiki markup, fixing various syntax errors and correcting some cosmetic issues.
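As an example of a header check, the sketch below flags headers whose opening and closing runs of `=` differ in length. The function name and the sample text are made up for illustration; this is not the project's code.

```python
import re
from typing import List, Tuple

# A header-like line: one or more "=", a title, then one or more "=".
HEADER_LIKE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")


def unbalanced_headers(wikitext: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for headers with unequal '=' on both sides."""
    flagged = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        match = HEADER_LIKE.match(line)
        if match and len(match.group(1)) != len(match.group(3)):
            flagged.append((number, line))
    return flagged


# Illustrative sample with two broken headers (lines 2 and 4).
sample = "==Kiswahili==\n===Nomino==\n# paka\n==Kiingereza="
print(unbalanced_headers(sample))
```

Because the intended header level cannot always be inferred from a broken header, a check like this is better suited to producing a review list than to fully automatic repair.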

Finally, I implemented several other QA tools to fix common issues. One is to fix language codes in entries (i.e. where the language code of a template does not match the language of the entry). I also wrote some tools to correct common mistakes, such as a number of common typos I identified as well as duplicated words.
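Two of these checks can be sketched in a few lines. Everything here is illustrative: the function names are hypothetical, and the set of templates inspected (`l`, `m`, `lb`) is an assumption, chosen because their first parameter is a language code.

```python
import re
from typing import List, Tuple

# A word immediately repeated, e.g. "the the" (case-insensitive).
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Assumption: only these templates are checked; group 2 is the language code.
TEMPLATE_WITH_CODE = re.compile(r"\{\{\s*(l|m|lb)\s*\|\s*([^|}\s]+)")


def duplicated_words(text: str) -> List[str]:
    """Return each word that is immediately followed by itself."""
    return [match.group(1) for match in DUPLICATE_WORD.finditer(text)]


def mismatched_codes(section_text: str, expected_code: str) -> List[Tuple[str, str]]:
    """Return (template, code) pairs whose code differs from the section's language."""
    return [
        (match.group(1), match.group(2))
        for match in TEMPLATE_WITH_CODE.finditer(section_text)
        if match.group(2) != expected_code
    ]


print(duplicated_words("a small small cat that that sat"))
print(mismatched_codes("# {{lb|fr|informal}} a {{l|en|cat}} or {{m|fr|chat}}", "fr"))
```

Both checks produce false positives (some repeated words are legitimate, and some templates intentionally link to another language), so their output is a candidate list for review rather than a set of edits to apply blindly.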

In summary, I have implemented QA infrastructure for Wiktionary and implemented a number of QA fixes. This code has already resulted in several hundred fixes and can be used as the basis for further QA fixes.

2. Documentation of your impact. Please use space below to share links that help tell your story, impact, and evaluation. (required)

Share links to:

  • Project page on Meta-Wiki or any other Wikimedia project
  • Dashboards and tools that you used to track contributions
  • Some photos or videos from your event. Remember to share access.

You can also share links to:

  • Important social media posts
  • Surveys and their results
  • Infographics and sound files
  • Examples of content edited on Wikimedia projects

  1. English Wiktionary
    1. Translations: 945
      • Fix language name: 80
      • Fix separation of translations: 40
      • Add missing trans-top or trans-bottom: 90
      • Remove empty lines in translations: 25
      • Fix syntax errors: 115
      • Fix spacing between definitions: 250
      • Fix spacing between language name and definitions: 200
      • Fix cosmetic issues: 145
    2. Other
      • Language code: 8
      • Remove duplicate words: 5
      • Fix typo: 16
  2. Swahili Wiktionary: 800
    • Related to language headers: 150
    • Unbalanced headers: 30
    • Miscellaneous syntax errors: 12
    • Rest: clean-up of translations (fix syntax errors, convert language names to codes, cosmetic clean-ups)

A list of all changes is available here: https://en.wiktionary.org/wiki/User:Tbm/Reports/QA_infrastructure_and_tools_to_fix_problems_on_Wiktionary#Metrics

Additionally, share the materials and resources that you used in the implementation of your project. (required)

For example:

  • Training materials and guides
  • Presentations and slides
  • Work processes and plans
  • Any other materials your team has created or adapted and can be shared with others

The Python source code is available on GitHub: https://github.com/tbm/wiktionary-tools

3. To what extent do you agree with the following statements regarding the work carried out with this Rapid Fund? You can choose “not applicable” if your work does not relate to these goals. Required. Select one option per question. (required)

Our efforts during the Fund period have helped to...
A. Bring in participants from underrepresented groups: Agree
B. Create a more inclusive and connected culture in our community: Agree
C. Develop content about underrepresented topics/groups: Agree
D. Develop content from underrepresented perspectives: Agree
E. Encourage the retention of editors: Agree
F. Encourage the retention of organizers: Not applicable
G. Increased participants' feelings of belonging and connection to the movement: Agree
H. Other (optional):

Part 2: Learning


4. In your application, you outlined some learning questions. What did you learn from these learning questions when you implemented your project? How do you hope to use these learnings in the future? You can recall these learning questions below. (required)

Learning questions from the application: While there are a number of QA tools for Wiktionary, a lot of work is needed in this area. I'm curious if the creation of these tools will prompt the community to build more tooling. This aligns well with a similar effort: https://en.wiktionary.org/wiki/Wiktionary:Todo/Lists

Furthermore, I'd like to see if these tools will lead to more cooperation among the different Wiktionary communities.

Finally, we will see if this will prompt a discussion about moving some Wiktionary data to Wikidata in order to remove duplication among the different Wiktionary communities.

I believe it's too early to answer all three of these questions, although they should be revisited in six months or a year. Personally, working on this project has once again confirmed my belief that there needs to be more collaboration, that common data should be moved to Wikidata and that this would in fact allow more collaboration between the different Wiktionary communities.

5. Did anything unexpected or surprising happen when implementing your activities? This can include both positive and negative situations. What did you learn from those experiences? (required)

I think the positive insight is that tooling can make a huge difference to the quality of Wiktionary.

There were two negative observations, both in terms of underestimating the effort.

I proposed to work on an abstraction layer and on QA fixes (with the emphasis definitely on the latter, as per the title). However, I quickly realized that an abstraction layer is a very elaborate effort that is best handled as a separate project (in fact, several separate projects given the size of Rapid Grants). While I have created a prototype, it needs more work.

Similarly, there are many QA fixes that could be created for Wiktionary. While this project has created many fixes and had an important impact, there's a lot left to do. I think I was slightly too optimistic about how much can be achieved within one project. In any case, I hope to apply for another grant to continue this work.

6. What is your plan to share your project learnings and results with other community members? If you have already done it, describe how. (required)

I have documented the impact (e.g. metrics) of this work and published the source code. I intend to work with other community members to further refine this work.

Part 3: Metrics


7. Wikimedia Metrics results. (required)

In your application, you set some Wikimedia targets in numbers (Wikimedia metrics). In this section, you will describe the achieved results and provide links to the tools used.

Metric | Target | Results | Comments and tools used
Number of participants | 10 | 2 |
Number of editors | 10 | 2 |
Number of organizers | 1 | 1 |

Wikimedia project | Target | Result - Number of created pages | Result - Number of improved pages
Wikipedia | | |
Wikimedia Commons | | |
Wikidata | | |
Wiktionary | 2000 | 0 | 1775
Wikisource | | |
Wikimedia Incubator | | |
Translatewiki | | |
MediaWiki | | |
Wikiquote | | |
Wikivoyage | | |
Wikibooks | | |
Wikiversity | | |
Wikinews | | |
Wikispecies | | |
Wikifunctions or Abstract Wikipedia | | |

8. Other Metrics results.

In your proposal, you could also set Other Metrics targets. Please describe the achieved results and provide links to the tools used if you set Other Metrics in your application.

Other Metrics name Metrics Description Target Result Tools and comments

9. Did you have any difficulties collecting data to measure your results? (required)

No

9.1. Please state what difficulties you had. How do you hope to overcome these challenges in the future? Do you have any recommendations for the Foundation to support you in addressing these challenges? (required)


Part 4: Financial reporting


10. Please state the total amount spent in your local currency. (required)

5000

11. Please state the total amount spent in US dollars. (required)

5000

12. Report the funds spent in the currency of your fund. (required)

Provide the link to the financial report https://docs.google.com/spreadsheets/d/1Nl0_yLlUIMcbD3CM83YIpnw9Aq-qaIeEKaiRWp79VRc/edit?gid=0#gid=0


12.2. If you have not already done so in your financial spending report, please provide information on changes in the budget in relation to your original proposal. (optional)


13. Do you have any unspent funds from the Fund?

No

13.1. Please list the amount and currency you did not use and explain why.

N/A

13.2. What are you planning to do with the underspent funds?

N/A

13.3. Please provide details of how you hope to spend these funds.

N/A

14.1. Are you in compliance with the terms outlined in the fund agreement?

Yes

14.2. Are you in compliance with all applicable laws and regulations as outlined in the grant agreement?

Yes

14.3. Are you in compliance with provisions of the United States Internal Revenue Code (“Code”), and with relevant tax laws and regulations restricting the use of the Funds as outlined in the grant agreement? In summary, this is to confirm that the funds were used in alignment with the WMF mission and for charitable/nonprofit/educational purposes.

Yes

15. If you have additional recommendations or reflections that don’t fit into the above sections, please write them here. (optional)


Review notes


Review notes from Program Officer:

N/A

Applicant's response to the review feedback.

N/A