User:Quiddity (WMF)/CX outreach

Next steps for the translation tool

Content translation has been successful in supporting the translation process in many Wikipedia communities. To better support the needs of translators, we plan to enable machine translation support for Italian in the coming weeks.

Content translation provides machine translation as initial content for editors to review and improve. The machine translation is only a starting point, and translators are strongly encouraged (the tool displays warnings) to rewrite the content in order to eliminate errors and make the translation sound more natural. Using machine translation is optional, but in the languages where it is available, translators regularly use it as a convenient starting point.

Based on the results from other large Wikipedias where machine translation is available, translations are less likely to be deleted than articles started from scratch. For example, on French Wikipedia during 2018 the deletion ratio was 6% for articles created with Content translation, and 23.6% for new articles created otherwise. On Spanish Wikipedia during 2018, the deletion ratio was 12.5% for translations, and 33.2% for new articles created otherwise.

Machine translation is not yet enabled for Italian, and volunteers have frequently requested it. We plan to expand machine translation support to Italian, given the following recent improvements:

  • New quality control mechanisms. The new version of Content translation (enabled by default in January 2019) provides more quality control mechanisms for machine translation. The tool now encourages translators to review the initial automatic translations paragraph by paragraph, adds translations published with unmodified content to a tracking category for editors to review, and prevents publishing translations that exceed the defined limits. The limits that prevent publishing become stricter for users with previously deleted translations, for users who ignore the warnings, and in cases where several paragraphs contain unmodified content. In this way, the limits adapt to reduce potential recurrent misuse of the tool.
  • New translation services available. We have also extended the existing translation services by integrating Google Translate, which is considered to provide good translation quality for many language combinations. All translation services are integrated in a safe way: only Wikipedia content, and no user information, is shared with external services, which respects user privacy.

We believe that an initial machine translation, with adequate quality control mechanisms, makes it easier for translators to create higher-quality translations. Current observations show that during the last month on Italian Wikipedia, most translations (80.19%) were published with the expected level of user modifications, while the rest (19.81%) were added to the tracking categories for more careful review by the community. Note that for many of the tracked translations, a successful machine translation may simply have required fewer modifications than anticipated. In the more extreme cases, where 99% or more of the whole document is unmodified, or where ten or more paragraphs each exceed 80% unmodified content, the tool directly prevents publishing the translation.
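The blocking rule for extreme cases can be illustrated with a short sketch. This is not the tool's actual code: the function and constant names are hypothetical, the unmodified-content ratios are assumed to be already computed as values between 0 and 1, and the stricter adaptive limits for repeat cases are not modelled. Only the three thresholds come from the description above.

```python
from typing import Sequence

# Thresholds taken from the description above; the names are hypothetical.
DOCUMENT_LIMIT = 0.99          # 99% or more of the whole document unmodified
PARAGRAPH_LIMIT = 0.80         # a paragraph counts when it exceeds 80% unmodified
MAX_FLAGGED_PARAGRAPHS = 10    # ten or more such paragraphs block publishing


def blocks_publishing(
    document_unmodified: float,
    paragraph_unmodified: Sequence[float],
) -> bool:
    """Return True if a translation falls into one of the extreme cases.

    Both arguments are ratios of unmodified machine-translated content
    (0.0 to 1.0). How the ratios are measured is outside this sketch.
    """
    # Case 1: almost the entire document is unmodified machine translation.
    if document_unmodified >= DOCUMENT_LIMIT:
        return True

    # Case 2: many individual paragraphs are mostly unmodified.
    flagged = sum(1 for ratio in paragraph_unmodified if ratio > PARAGRAPH_LIMIT)
    return flagged >= MAX_FLAGGED_PARAGRAPHS
```

For example, under these assumptions a translation with nine paragraphs at 90% unmodified content would still be publishable (and could be added to a tracking category), while a tenth such paragraph would block publishing.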

We want the editor community to evaluate the new capability and share feedback about how it helps to create good-quality translations. To better understand the overall impact, we are providing more details and mechanisms to measure the effects on the content creation process. Once machine translation is enabled for Italian, we will pay special attention to its impact in terms of content created and user feedback. Feel free to reply with your feedback in this thread.

Thanks!