Grants:Programs/Wikimedia Community Fund/Rapid Fund/Comprehensive anti-spam external link service (the Citron project) (ID: 22754010)/Final Report
Report Status: Accepted
Due date: 30 January 2025
Funding program: Rapid Fund
Report type: Final
This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds where the user has submitted their midpoint report. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.
General information
- Applicant username: Plantaest
- Organization name: N/A
- Amount awarded: 4950
- Amount spent: 4950 USD, 4950
Part 1: Project and impact
1. Describe the implemented activities and results achieved. Additionally, share which approaches were most effective in supporting you to achieve the results. (required)
This project, named Citron/Spam, took approximately three months to develop.
During that time I worked on every aspect of it: brainstorming ideas, defining functionality, designing the interface, programming, testing, deployment, documentation, and community outreach.
Before starting this project, I gathered some preliminary ideas. Once the project officially began, I evaluated and selected the best ideas within my capabilities.
I chose the technologies for the project myself. Using familiar technologies such as Java and Quarkus saved time and helped ensure code quality. I also worked with Vue and Codex, which I had not used before. Fortunately, both were relatively easy to learn and did not significantly affect overall progress.
I spent a considerable amount of time building the machine learning model: collecting and cleaning raw data, experimenting with various algorithms, and selecting the best-performing model. The effort was worthwhile, as the chosen model achieved a high level of accuracy.
To visually present reports, I designed a basic interface within the wiki, allowing users to see which links were reported today and in previous days. Additionally, I created a simple interactive interface that enables users to provide additional evaluations alongside the machine learning model's scores.
I promoted this tool to the Vietnamese Wikipedia community, and the overall reception has been quite positive, with many members actively using and contributing to the reports daily.
Overall, Citron/Spam has successfully implemented its core functionalities as initially planned, including monitoring RecentChanges, filtering edits, extracting links, automatically evaluating them using the machine learning model, posting reports to the wiki, and allowing users to interact for further evaluations.
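The link-extraction and evaluation steps described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the tool itself is built with Java and Quarkus), and every name in it is hypothetical. In particular, `placeholder_score` is a toy keyword rule standing in for the machine learning model and does not reflect how the real model works.

```python
import re
from urllib.parse import urlsplit

# Matches bare http(s) URLs in wikitext; stops at whitespace and common wikitext delimiters.
URL_PATTERN = re.compile(r"https?://[^\s\[\]<>|{}\"]+")


def extract_hostnames(added_text: str) -> list[str]:
    """Return the distinct hostnames of external links, in order of first appearance."""
    hostnames: list[str] = []
    for url in URL_PATTERN.findall(added_text):
        host = urlsplit(url).hostname
        if host and host not in hostnames:
            hostnames.append(host)
    return hostnames


def placeholder_score(hostname: str) -> float:
    """Hypothetical stand-in for the model: a toy keyword rule, NOT the real scoring logic."""
    suspicious_tokens = ("casino", "bet", "loan")
    return 0.9 if any(token in hostname for token in suspicious_tokens) else 0.1


def build_report(added_text: str, threshold: float = 0.5) -> list[tuple[str, float]]:
    """Score each extracted hostname and keep those at or above the threshold."""
    scored = [(host, placeholder_score(host)) for host in extract_hostnames(added_text)]
    return [(host, score) for host, score in scored if score >= threshold]


# Sample wikitext added by an edit (made-up content).
sample = "Xem [https://example-casino.test/promo trang này] và https://vi.wikipedia.org/wiki/Chanh"
print(build_report(sample))
```

In this sketch only hostnames whose score reaches the threshold would be listed in a report; the threshold value of 0.5 is an arbitrary assumption.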
The project's source code is publicly available on GitHub under the AGPL-3.0 license.
I have written a detailed user guide with numerous illustrations to help users understand how the tool works and how to use it effectively.
2. Documentation of your impact. Please use space below to share links that help tell your story, impact, and evaluation. (required)
Share links to:
- Project page on Meta-Wiki or any other Wikimedia project
- Dashboards and tools that you used to track contributions
- Some photos or videos from your event. Remember to share access.
You can also share links to:
- Important social media posts
- Surveys and their results
- Infographics and sound files
- Examples of content edited on Wikimedia projects
Since the initial focus of this project was on the Vietnamese Wikipedia community, for now, I have only written the most important documentation on this wiki.
The project introduction and user guide can be found at vi:User:Plantaest/Citron, where I have provided a detailed explanation of how the tool works, how to read reports, and how to use the interactive interface, with many illustrations that I personally designed. Additionally, the page includes technical specifications for the server, a description of the machine learning model, and other relevant information.
The recent reports page is available at vi:Wikipedia:Citron/Spam, which is a crucial resource for the project, helping administrators identify potentially harmful links and decide whether to add them to the blocklist.
I introduced this tool to the community in a public discussion at vi:Wikipedia:Thảo luận#Phát hành bản đầu tiên của dự án Citron/Spam.
Related images are stored in a Commons category at c:Category:Citron (software).
Additionally, share the materials and resources that you used in the implementation of your project. (required)
For example:
- Training materials and guides
- Presentations and slides
- Work processes and plans
- Any other materials your team has created or adapted and can be shared with others
I have documented key milestones in the development log at vi:User:Plantaest/Citron/Nhật ký.
The project's source code is hosted on GitHub at this link: https://github.com/plantaest/citron.
3. To what extent do you agree with the following statements regarding the work carried out with this Rapid Fund? You can choose “not applicable” if your work does not relate to these goals. Required. Select one option per question. (required)
A. Bring in participants from underrepresented groups | Not applicable |
B. Create a more inclusive and connected culture in our community | Not applicable |
C. Develop content about underrepresented topics/groups | Not applicable |
D. Develop content from underrepresented perspectives | Not applicable |
E. Encourage the retention of editors | Agree |
F. Encourage the retention of organizers | Agree |
G. Increased participants' feelings of belonging and connection to the movement | Agree |
H. Other (optional) |
Part 2: Learning
4. In your application, you outlined some learning questions. What did you learn from these learning questions when you implemented your project? How do you hope to use these learnings in the future? You can recall these learning questions below. (required)
With this project, I hope to learn the following:
- Understanding how to reasonably apply machine learning to support evaluation processes
- Gaining insights into collaborating with the community to develop software
- Gaining a better understanding of Wikimedia's software systems
With the goal of "Understanding how to reasonably apply machine learning to support evaluation processes", I believe I have achieved it. In this project, I developed a machine learning model to assess the reliability of links. Through this model, users can quickly identify potentially harmful links. While they still need to manually verify before making a final decision, the machine learning model significantly speeds up and simplifies the evaluation process.
With the goal of "Gaining insights into collaborating with the community to develop software", I believe I have achieved it. Although community members were not particularly enthusiastic about discussions, they actively participated in testing my software and have continued using it daily since its release. Their contributions have been invaluable in helping me determine whether the tool functions effectively, allowing me to make improvements for the long term. According to the tool's database (as of January 31, 2025), 11 members of the Vietnamese Wikipedia community have participated in evaluating reports through the interactive interface and have contributed 578 evaluations.
With the goal of "Gaining a better understanding of Wikimedia's software systems", I believe I have achieved it. I learned about Wikimedia's EventStreams technology; despite some difficulties during implementation, it is now running relatively smoothly. I also gained a better understanding of Toolforge's infrastructure, which made the deployment process smoother. Overall, these have been valuable experiences that will help me continue developing useful software for the Wikimedia Movement.
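As a rough illustration of consuming EventStreams, the sketch below parses Server-Sent Events lines of the kind delivered by the public `recentchange` stream and keeps only the events a link checker might care about. It is a Python sketch under stated assumptions, not the tool's actual Java code: the function names and the filter rule (non-bot edits and page creations in the main namespace of `viwiki`) are hypothetical, and it assumes each event arrives as a single `data:` line.

```python
import json
from typing import Iterable, Iterator


def parse_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in a Server-Sent Events stream."""
    for line in lines:
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                yield json.loads(payload)


def is_candidate_edit(event: dict, wiki: str = "viwiki") -> bool:
    """Illustrative filter: non-bot edits or page creations in the main namespace."""
    return (
        event.get("wiki") == wiki
        and event.get("type") in ("edit", "new")
        and event.get("namespace") == 0
        and not event.get("bot", False)
    )


# Made-up sample lines shaped like recentchange events (only a few fields shown).
sample_stream = [
    "event: message",
    'data: {"wiki": "viwiki", "type": "edit", "namespace": 0, "bot": false, "title": "Chanh"}',
    "",
    'data: {"wiki": "enwiki", "type": "edit", "namespace": 0, "bot": false, "title": "Lemon"}',
    'data: {"wiki": "viwiki", "type": "log", "namespace": 0, "bot": false, "title": "Cam"}',
]
candidates = [event["title"] for event in parse_sse_data(sample_stream) if is_candidate_edit(event)]
print(candidates)
```

A real consumer would read these lines from a long-lived HTTP connection and handle reconnection, which this sketch leaves out.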
5. Did anything unexpected or surprising happen when implementing your activities? This can include both positive and negative situations. What did you learn from those experiences? (required)
Overall, things in this project went relatively smoothly, although I spent more time than expected building the machine learning model, as I wanted the model to have the lowest possible error rate. From this experience, I think I will need to allocate more time if I want to build another machine learning model for future projects.
6. What is your plan to share your project learnings and results with other community members? If you have already done it, describe how. (required)
I have written the documentation for this project on the Vietnamese Wikipedia, as mentioned in my answer to question number 2. I believe that the documentation is detailed enough for those interested to understand the tool and how to use it. I hope that after the tool has been used for a longer period, around one year, it can be expanded to other wiki communities.
Part 3: Metrics
7. Wikimedia Metrics results. (required)
In your application, you set some Wikimedia targets in numbers (Wikimedia metrics). In this section, you will describe the achieved results and provide links to the tools used.
Metric | Target | Result | Comments and tools used |
---|---|---|---|
Number of participants | 10 | 11 | To obtain this data, I collected it from the citron_spam__feedback table in the tool's database using the following SQL query (January 31, 2025): SELECT COUNT(DISTINCT created_by) AS total_users FROM citron_spam__feedback; |
Number of editors | 1 | 1 | |
Number of organizers | 1 | 1 | |
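The participant count query quoted in the table can be tried out against made-up data with the sketch below. Only the table name `citron_spam__feedback` and the column `created_by` come from the report; the rest of the schema, the sample rows, and the use of an in-memory SQLite database are assumptions for illustration. The second query assumes that each row represents one evaluation, which the report does not state.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE citron_spam__feedback (id INTEGER PRIMARY KEY, created_by TEXT NOT NULL)"
)

# Made-up sample rows: three distinct users, five evaluations in total.
conn.executemany(
    "INSERT INTO citron_spam__feedback (created_by) VALUES (?)",
    [("UserA",), ("UserB",), ("UserA",), ("UserC",), ("UserB",)],
)

# The query from the report: number of distinct users who submitted feedback.
total_users = conn.execute(
    "SELECT COUNT(DISTINCT created_by) AS total_users FROM citron_spam__feedback"
).fetchone()[0]

# Assumed companion query: total number of feedback rows.
total_evaluations = conn.execute(
    "SELECT COUNT(*) FROM citron_spam__feedback"
).fetchone()[0]

print(total_users, total_evaluations)
conn.close()
```

`COUNT(DISTINCT created_by)` counts each user once regardless of how many evaluations they submitted, which is why it suits a participant metric.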
Wikimedia project | Target | Result - Number of created pages | Result - Number of improved pages |
---|---|---|---|
Wikipedia | 50 | 0 | 414 |
Wikimedia Commons | |||
Wikidata | |||
Wiktionary | |||
Wikisource | |||
Wikimedia Incubator | |||
Translatewiki | |||
MediaWiki | |||
Wikiquote | |||
Wikivoyage | |||
Wikibooks | |||
Wikiversity | |||
Wikinews | |||
Wikispecies | |||
Wikifunctions or Abstract Wikipedia |
8. Other Metrics results.
In your proposal, you could also set Other Metrics targets. Please describe the achieved results and provide links to the tools used if you set Other Metrics in your application.
Other Metrics name | Metrics Description | Target | Result | Tools and comments |
---|---|---|---|---|
9. Did you have any difficulties collecting data to measure your results? (required)
No
9.1. Please state what difficulties you had. How do you hope to overcome these challenges in the future? Do you have any recommendations for the Foundation to support you in addressing these challenges? (required)
Part 4: Financial reporting
10. Please state the total amount spent in your local currency. (required)
4950
11. Please state the total amount spent in US dollars. (required)
4950
12. Report the funds spent in the currency of your fund. (required)
Upload the financial report
12.2. If you have not already done so in your financial spending report, please provide information on changes in the budget in relation to your original proposal. (optional)
13. Do you have any unspent funds from the Fund?
No
13.1. Please list the amount and currency you did not use and explain why.
N/A
13.2. What are you planning to do with the underspent funds?
N/A
13.3. Please provide details of how you hope to spend these funds.
N/A
14.1. Are you in compliance with the terms outlined in the fund agreement?
Yes
14.2. Are you in compliance with all applicable laws and regulations as outlined in the grant agreement?
Yes
14.3. Are you in compliance with provisions of the United States Internal Revenue Code (“Code”), and with relevant tax laws and regulations restricting the use of the Funds as outlined in the grant agreement? In summary, this is to confirm that the funds were used in alignment with the WMF mission and for charitable/nonprofit/educational purposes.
Yes
15. If you have additional recommendations or reflections that don’t fit into the above sections, please write them here. (optional)
When I submitted this report, I realized that I was one day past the deadline. This was my oversight.
Since this is a software project released at the end of December last year, I couldn’t submit the report earlier because I had to wait about a month to gather data from the software's operations.
I was under the impression that I would submit the report by the end of January, specifically on the 31st. However, I didn't notice that the report deadline was actually the 30th.
Therefore, I apologize for this mistake and will take it as a lesson for future projects. I hope this will not cause any major issues.
Review notes
Review notes from Program Officer:
N/A
Applicant's response to the review feedback.
N/A