Grants:IEG/Proofreading semiautomatically the Catalan Wikipedia with LanguageTool/Final
Welcome to this project's final report! This report shares the outcomes, impact and learnings from the Individual Engagement Grantee's 6-month project.
Part 1: The Project
- The process of proofreading large amounts of text has been improved, resulting in more than 270,000 edits (spelling and grammar corrections) in the Catalan Wikipedia.
- A test has been made in the Spanish Wikipedia to show that the same approach can be used in other languages.
Methods and activities
- The activities described in the midpoint report have continued.
- The scripts have been adapted to be multilingual and a full test has been run on the Spanish Wikipedia. A bot request has been approved in the Spanish Wikipedia and I have started using the bot there.
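As a rough illustration of the correction step, the sketch below applies the first suggestion of each grammar-checker match to a piece of text. It is not the project's actual script. It assumes matches shaped like the `matches` list in a LanguageTool `/v2/check` JSON response (`offset`, `length`, `replacements`), treats offsets as Python character indices, and assumes matches do not overlap. The function name `apply_first_suggestions` and the sample data are hypothetical.

```python
from typing import Any


def apply_first_suggestions(text: str, matches: list[dict[str, Any]]) -> str:
    """Apply the first suggested replacement of each match to `text`.

    Matches are applied from the end of the text backwards, so the offsets
    of earlier matches stay valid after each substitution.
    Assumption: matches do not overlap.
    """
    corrected = text
    for match in sorted(matches, key=lambda m: m["offset"], reverse=True):
        replacements = match.get("replacements", [])
        if not replacements:
            continue  # no suggestion available; leave the text for human review
        start = match["offset"]
        end = start + match["length"]
        corrected = corrected[:start] + replacements[0]["value"] + corrected[end:]
    return corrected


# Hand-written sample in the shape of a LanguageTool response (hypothetical data).
sample_text = "Esto es un un ejemplo."
sample_matches = [
    {"offset": 8, "length": 5, "replacements": [{"value": "un"}]},
]

if __name__ == "__main__":
    print(apply_first_suggestions(sample_text, sample_matches))
```

In a supervised workflow like the one described here, a human would still review each proposed replacement before the bot saves the edit; the sketch only covers the mechanical substitution.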
Outcomes and impact
- The process of proofreading large amounts of text has been improved.
- A large number of edits have been made on the Catalan Wikipedia (270,762 edits in 127,753 articles). This work will continue with current and future content.
- A test has been made in the Spanish Wikipedia to show that this approach can be used in other languages.
Progress towards stated goals
|Planned measure of success (include numeric target, if applicable)||Actual result||Explanation|
|Number of edits made in Catalan Wikipedia articles, of the order of hundreds of thousands (as a rough estimate, 400,000 edits).||270,762 edits in the Catalan Wikipedia between January 2016 and July 2016.||The final number of edits is lower than estimated. The number of edits is not proportional to the time needed to make them: some corrections are very easy, while others need close supervision and are more time-consuming.|
|A test (without edits) is made in another language.||A test has been run on the Spanish Wikipedia, and 14,084 edits have been made with an approved bot.||This result is better than planned, because actual edits were not part of the original project proposal.|
|Code and documentation are available.||Code is available on GitHub with documentation.|
|Announcements are made to reach other Wikipedias.||So far contributions have started in the Spanish Wikipedia. There are talks with Wikipedia users interested in improving grammar and spelling in the Spanish Wikipedia.|
Think back to your overall project goals. Do you feel you achieved your goals? Why or why not?
|1. Number of active editors involved||1|
|2. Number of new editors||0|
|3. Number of individuals involved||1|
|4. Number of new images/media added to Wikimedia articles/pages||0|
|5. Number of articles added or improved on Wikimedia projects||127,753 articles improved on Catalan Wikipedia. 13,150 articles improved on Spanish Wikipedia.|
|6. Absolute value of bytes added to or deleted from Wikimedia projects||0||Not relevant. Corrections don't involve substantial changes to content.|
- Learning question
- Did your work increase the motivation of contributors, and how do you know?
Indicators of impact
Option B: How did you improve quality on one or more Wikimedia projects?
- The goal of this project is to improve the linguistic quality of Wikipedia content. This is especially needed in some Wikipedias, such as the Catalan one. Success is easily quantified by the number of edits made and articles improved.
- We have also shown that this approach can be used on other Wikipedias like the Spanish one.
- Collection of scripts used in this project (on GitHub), with documentation.
- Fork of LanguageTool used in this project (on GitHub)
- Example of file generated for making corrections in the Spanish Wikipedia.
- User page of the bot in the Spanish Wikipedia.
What worked well
- Learning pattern: Proofreading large amounts of text
What didn’t work
- Sentences in other languages (e.g. titles or quotations in Spanish or French inside a Catalan text) can slow down the supervision process considerably. More sophisticated methods of language detection could be used, but it is uncertain how much they would speed up supervision.
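To make the language-detection idea concrete, here is a minimal sketch that flags sentences that are probably not in the expected language, so they could be skipped or queued separately during supervision. It was not used in this project. The stopword lists are tiny hypothetical samples, the function names are invented for illustration, and a real setup would likely need a proper language-identification library.

```python
import re
from typing import Optional

# Tiny illustrative stopword lists (hypothetical and far from complete).
STOPWORDS = {
    "ca": {"el", "la", "els", "les", "i", "amb", "és", "que", "de", "un", "una", "per", "als"},
    "es": {"el", "la", "los", "las", "y", "con", "es", "que", "de", "un", "una", "por", "del"},
    "fr": {"le", "la", "les", "et", "avec", "est", "que", "de", "un", "une", "pour", "des"},
}


def guess_language(sentence: str) -> Optional[str]:
    """Return the language whose stopwords occur most often, or None if unclear."""
    words = re.findall(r"\w+", sentence.lower())
    scores = {lang: sum(word in stop for word in words) for lang, stop in STOPWORDS.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_lang, best_score = ranked[0]
    # No stopword hits, or a tie between the top two languages: do not guess.
    if best_score == 0 or best_score == ranked[1][1]:
        return None
    return best_lang


def is_probably_foreign(sentence: str, expected: str = "ca") -> bool:
    """True only when the guess is confident and differs from the expected language."""
    guess = guess_language(sentence)
    return guess is not None and guess != expected


if __name__ == "__main__":
    for s in ["Els gats i els gossos juguen amb la pilota.",
              "Los gatos y los perros juegan con la pelota."]:
        print(s, "->", guess_language(s))
```

The sketch deliberately returns `None` for short or ambiguous fragments such as titles, which are exactly the cases where the benefit of automatic detection is least certain.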
Next steps and opportunities
- The work done in this project was much needed for the Catalan Wikipedia. This work will continue with current and future content.
- Proofreading in other languages will require contributors with a deep knowledge of the language and good intuition. Improvements in the grammar checker (LanguageTool) will also be necessary for some languages, as the support is uneven.
Part 2: The Grant
|Expense||Approved amount||Actual funds spent||Difference|
|Project development (for six 40-hour work weeks)||3,000 EUR||0||0|
|Server infrastructure costs (Amazon Web Services)||0||65.33 $||- 65.33 $|
Note: The small server infrastructure expenses could probably have been avoided by using Wikimedia Labs, but I got a bit lost with the confusing terminology.
Do you have any unspent funds from the grant?
Please answer yes or no. If yes, list the amount you did not use and explain why.
If you have unspent funds, they must be returned to WMF. Please see the instructions for returning unspent funds and indicate here if this is still in progress, or if this is already completed:
Please answer yes or no. If no, include an explanation.
Confirmation of project status
Did you comply with the requirements specified by WMF in the grant agreement?
Please answer yes or no.
Is your project completed?
Please answer yes or no.
We’d love to hear any thoughts you have on what this project has meant to you, or how the experience of being an IEGrantee has gone overall. Is there something that surprised you, or that you particularly enjoyed, or that you’ll do differently going forward as a result of the IEG experience? Please share it here!