Language Engineering


Time: 17:00-18:00 UTC
Channel: #wikimedia-office
Timestamps are in UTC.

17:00:12 <arrbee> #startmeeting Language Engineering monthly office hour - November 2014
17:00:59 <arrbee> Hello everyone, Welcome to the monthly office hour of the Language Engineering team of the Wikimedia Foundation
17:01:05 <Niharika> o/ Hi arrbee!
17:01:13 <Niharika> And everyone else.
17:01:17 * Romaine greets everyone
17:01:23 <arrbee> I am Runa, the Outreach co-ordinator for our team
17:01:34 <arrbee> Hello Niharika!! Nice to see you
17:01:42 <Niharika> :)
17:01:44 <arrbee> Hello Romaine
17:02:00 <arrbee> Before everything else, we start with the standard message
17:02:18 <arrbee> IMPORTANT: The chat today will be logged and publicly posted. (See big bold note on the channel topic area)
17:02:48 <arrbee> Our last office hour was held on October 15th, 2014
17:02:58 <arrbee> logs at:
17:03:01 <arrbee> #link
17:03:22 <arrbee> A quick introduction :)
17:03:52 <arrbee> Our team builds and maintains language features and tools for the wikis in more than 300 languages
17:05:06 <arrbee> Present along with me today are aharoni kart_ jsahleen and Nikerabbit
17:05:40 <arrbee> We are a truly global team and work from several places in different countries around the world
17:05:58 <kart_> (sometime from island too :))
17:06:29 <arrbee> Our team page on has details about our projects and how you can participate in them
17:06:35 <arrbee> #link
17:06:49 <arrbee> kart_: :P
17:07:02 <Nikerabbit> morning ;)
17:07:23 <jsahleen> Hello
17:07:24 <aharoni> Olá!
17:07:31 <Niharika> :)
17:07:39 <arrbee> In our last office hour, we had mentioned the second version of Content Translation that we completed on September 30
17:07:45 <Niharika> Namaste.
17:08:09 <arrbee> However, we had to delay the deployment and availability until some technical issues could be fixed in the beta servers
17:08:29 <arrbee> That was sorted earlier this month
17:09:17 <arrbee> and we now have Catalan, Spanish and Portuguese translation support on the shiny new version
17:09:27 <arrbee> supported through Apertium
17:10:04 <arrbee> you can try playing with the tool at:
17:10:08 <arrbee> #link
17:10:47 <arrbee> The page looks a bit empty currently due to a bug that we recently caught (related to Parsoid)
17:11:39 <arrbee> We have more details about this update in one of our recent blog posts:
17:11:41 <arrbee> #link
17:12:48 <arrbee> If you are interested to know about the technical issues that delayed us, there is another post that will be published later today or tomorrow on the Wikimedia blog
17:13:05 <arrbee> Do please check back there for the details
17:13:39 <arrbee> For the next round we are looking at some other language pairs which are supported on Apertium
17:14:07 <arrbee> And may have a reasonable number of bilingual users who can make use of the tool
17:14:48 <arrbee> We are planning to do a survey sometime this month to better understand which language pairs can be prioritised
17:15:07 <arrbee> So do please let us know of any suitable candidate language pairs
17:15:39 <arrbee> Romaine has already suggested Dutch to Afrikaans, so that's one of the language pairs in evaluation right now
17:16:07 <Romaine> Dutch <-> German I would like even better
17:16:41 <arrbee> Romaine: okay
17:17:06 <arrbee> What we will do next is to check how well Apertium supports these language pairs
17:17:15 <kclau> Is Chinese <-> English possible on Apertium?
17:17:26 <arrbee> in this case Dutch to German and the other way round
17:17:26 <Romaine> both languages are very similar but many users do not speak the other language
17:17:39 <arrbee> kclau: most probably not. Let me check though
17:17:53 <jsahleen> kclau: Unfortunately, no.
17:18:27 <arrbee> #link
17:19:13 <arrbee> Romaine: Dutch <-> German is in the incubator stage right now
17:19:44 <arrbee> We are concentrating on the released pairs for now
17:20:01 <arrbee> Mainly for reliability
17:21:05 <arrbee> But do look out for the survey sometime in the next few weeks
17:21:08 <Romaine> Yes, I saw Dutch <-> German in the list
17:21:27 <kclau> too bad. Will other translation machines be supported in the future? can do online translation for Chinese <-> English.
17:22:06 <jsahleen> kclau: we are looking into other machine translation engines, but there are technical and legal issues to work out.
17:22:46 <Romaine> and reliability
17:23:58 <arrbee> Yes
17:23:59 <aharoni> Yes - they are not Free Software, so we need approval from the legal department and from the community.
17:25:12 <aharoni> Technically it's not too complicated, but there are issues such as - how to comply with the terms of use of these engines, how to credit them, how to ensure our editors' privacy, whether we want to use non-free software at all, etc.
17:26:12 <arrbee> Our major concern while looking at new language pairs is that the translation engine has to really give us meaningful workable output of high quality
17:27:16 <arrbee> So far we have seen wonderful response for the Spanish-Catalan pair
17:27:38 <arrbee> We are now looking to see how the other pairs are working
17:28:17 * arrbee wonders if there are any Portuguese translators/editors around here today
17:28:43 <Oscar_> Hi all, how good is the accuracy of the translation?
17:29:01 <arrbee> Oscar_: did you have any particular language pair in mind?
17:29:21 <aharoni> Oscar_: the accuracy of machine translation must be judged by the Wikipedia editors who know that language.
17:29:40 <aharoni> We enable a machine translation engine if people who know that language tell us that they think that it's good enough.
17:29:48 <Oscar_> Spanish, I'm from Venezuela (greetings all :))
17:29:53 <Romaine> is there a way to stimulate a certain language pair, to get out of nursery and becoming a trunk one?
17:30:00 <arrbee> Oscar_: Welcome :)
17:30:16 <aharoni> And even after that we very strongly encourage to check all the translations that are produced by the machine translation engine,
17:30:19 <arrbee> Oscar_: The Catalan editors were very happy with the translation quality they got from Spanish
17:30:32 <aharoni> and we discourage people from publishing 100% machine translation.
17:30:49 <arrbee> Oscar_: They found the MT and editor working well for their editing workflow
17:31:03 <aharoni> So even if the translation from Spanish to Catalan makes the Catalan speakers happy, as arrbee says, we still strongly suggest to double-check every paragraph.
17:31:39 <arrbee> Oscar_: Someone even mentioned that they observed they could cut down approximately 30% of the time they earlier needed
17:32:39 <arrbee> Oscar_: But we are waiting for feedback from users on how well the Spanish<->Portuguese and Catalan <-> Portuguese translation works
17:33:12 <arrbee> Oscar_: so you are most welcome to try out if you do speak any of the other 2 languages :)
17:33:58 <arrbee> Romaine: Apertium surely has a process for that
17:34:04 <arrbee> kart_: would you know?
17:35:04 <aharoni> Romaine: Yes, you need to contact Apertium developers.
17:35:39 <aharoni> They need people who know the two languages well to go over the dictionaries and the lists of grammar rule transformations and to test the quality of the translations.
17:36:05 <aharoni> There's also an #apertium channel.
17:36:11 <aharoni> Tell them that we sent you :)
17:36:17 <arrbee> lol
17:36:19 <kart_> +1 aharoni. The most important parts are grammar rules, availability of dictionaries and testing.
17:37:27 <arrbee> Thanks aharoni kart_
17:37:50 <arrbee> It would be a big help if more language pairs are stabilized on Apertium
17:38:31 <arrbee> Meanwhile, there are also some updates from the new features front
17:39:31 <arrbee> We plan to provide a dashboard for translators to see all their translations and status
17:40:05 <arrbee> Also a way to save the translations as draft and resume later
17:41:07 <arrbee> Since Content translation (by definition) is a cross wiki process, the dashboard will be common for all wikis
17:41:39 <arrbee> i.e. users can access the same dashboard from any wiki
17:42:05 <arrbee> We have a few illustrations for the dashboard
17:42:26 <arrbee> 1. Regular state:
17:42:49 <arrbee> 2. Initial state:
17:43:29 <arrbee> The dashboard will also provide suggestions for starting new translations
17:44:04 <arrbee> The third illustration is for what happens after the user clicks on a suggestion
17:44:11 <arrbee> 3.
17:44:34 <arrbee> This is still under development
17:45:31 <arrbee> Going forward, we may also allow translators to resume draft translations made by another translator
17:46:00 <arrbee> jsahleen: aharoni: Do you have anything to add to this?
17:46:58 <aharoni> another feature that we hope the editors' communities will like is this:
17:49:31 <aharoni> if we suspect that an article has a lot of machine-translated text, but the translator insists on publishing it, then a category will be added to it,
17:50:12 <aharoni> so that the community will be able to check it and improve it or delete it if necessary.
17:51:40 <arrbee> oh and btw, we also added categories in the new version
17:52:18 <arrbee> That had been one of the requested features
17:52:49 <aharoni> yes - categories are adapted automatically.
17:52:56 <aharoni> this is one of my favorite features ;)
17:53:04 <aharoni> developed mostly by Joel [ jsahleen ]
17:53:16 <jsahleen> Yes, categories are now automatically adapted and you can manage them (add delete) from the interface.
17:53:24 <jsahleen> My pleasure. ;)
17:53:56 <aharoni> we just noticed that that's one of the first things that translators do when they translate an article, so we are trying to do this automatically and save the translators a few minutes.
17:54:46 <Romaine> aharoni: that can be difficult as often categories do not match between Wikipedias
17:55:14 <aharoni> We do this if there is a directly corresponding category according to the interlanguage links.
17:55:34 <Romaine> I often have to pick a category higher in the category tree when translating
17:55:53 <aharoni> Yes, I know that this happens often,
17:56:03 <aharoni> and it's possible that we'll implement something like that in the future,
17:56:07 <jsahleen> Romaine: We adapt categories in much the same way we adapt links. Only those categories for which there is an equivalent category in the target are carried over.
17:56:12 <aharoni> but for now we only do directly corresponding categories.
17:56:40 * Oscar_ is still frightened to translate something to Catalan :-)
17:57:02 <Romaine> are links only adapted and inserted if the article is available in the language it is translated to?
17:57:16 <arrbee> Oscar_: what about Portuguese? :)
17:57:51 <arrbee> Oscar_: oh btw.. you can even translate from Catalan to Spanish. We enabled that direction of translation too.
17:57:54 <jsahleen> Romaine: Yes
17:58:01 <Romaine> ok
17:58:24 <Oscar_> That would be more easy arrbee
17:58:33 * arrbee notices we have 2 more minutes left to end the hour
17:58:37 <arrbee> Oscar_: :)
17:59:20 <Niharika> Are there gonna be any OPW interns working with them team in the upcoming round?
17:59:26 <Niharika> the*
18:00:05 <aharoni> Niharika: not announced yet :)
18:00:14 <arrbee> Niharika: We hope so, but do wait for the announcements from Quim
18:00:28 <Niharika> Ah, okay. :)
18:00:29 <arrbee> okay ... we are out of time today
18:00:43 * arrbee thinks today the hour ended faster
18:00:56 <arrbee> Thanks everyone for coming.
18:00:57 <jsahleen> Thanks, everyone.
18:01:12 <arrbee> If nothing changes, our next office hour will be on December 10, 2014, but do look out for the announcements for the exact date
18:01:26 <arrbee> Our mailing list is and IRC channel is #mediawiki-i18n
18:01:44 <arrbee> I will post the logs on the metawiki in some time
18:01:47 <arrbee> Thanks again
18:01:47 <Romaine> thanks all for the work!
18:01:55 <arrbee> #endmeeting
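
Note on category adaptation: in the log, aharoni and jsahleen describe categories being carried over only when a directly corresponding category exists in the target wiki according to the interlanguage links. The rule might be sketched roughly as below. This is an illustration only, not the Content Translation implementation; the function name, the dictionary-based lookup, and the sample category titles are all assumptions made for the example.

```python
def adapt_categories(source_categories, interlanguage_links):
    """Return target-wiki categories for a translated article.

    source_categories: category titles on the source article.
    interlanguage_links: hypothetical mapping from a source category
        title to its directly corresponding target category title.
    Categories without a direct equivalent are dropped, not guessed.
    """
    adapted = []
    for category in source_categories:
        target = interlanguage_links.get(category)
        if target is not None:
            adapted.append(target)
    return adapted


# Illustrative data only; these titles are made up for the sketch.
links = {
    "Category:Rivers of Europe": "Categorie:Rivier in Europa",
}
source = ["Category:Rivers of Europe", "Category:Rivers of the Ardennes"]

adapted = adapt_categories(source, links)
print(adapted)
```

In this sketch the second category is dropped because it has no entry in the mapping. That matches the behaviour described in the log: picking a category higher in the tree, as Romaine does by hand, is left to the translator.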

