Language Engineering


Time: 17:00-18:00 UTC
Channel: #wikimedia-office
Timestamps are in UTC.
17:00:08 <arrbee> #startmeeting Language Engineering monthly office hour - July 2014
17:00:32 <arrbee> Hello, Welcome to the monthly office hour of the Wikimedia Language Engineering team
17:00:41 <Pavanaja> Hi Alolita
17:00:42 <matanya> hi
17:00:58 <arrbee> I am Runa, the Outreach co-ordinator for our team
17:01:00 <BPositive> Hello!
17:01:06 * GunChleoc waves
17:01:30 <arrbee> Our office hours are held every 2nd Wednesday of the month, but we could not host one last month
17:01:45 <alolita> Hi all!
17:01:57 <Nikerabbit> morning
17:02:04 <arrbee> Our last office hour was held on May 21, 2014 (delayed due to travels). The logs are at:
17:02:18 <arrbee> #link
17:02:40 <arrbee> But first a very important message
17:02:50 <arrbee> IMPORTANT: The chat today will be logged and publicly posted
17:03:04 <arrbee> It has also been mentioned on the channel topic
17:03:40 <arrbee> The Wikimedia Foundation's Language Engineering team builds language features and tools to support our wiki communities across the world
17:04:20 <arrbee> We are a distributed team and operate from various locations around the world (timezone yayness!)
17:05:02 * yuvipanda waves, waits for QA
17:05:40 <arrbee> Along with me, present from my team are alolita aharoni kart_ divec pginer Nikerabbit santhosh
17:06:16 <arrbee> and together we will be hosting the session today
17:06:25 <alolita> arrbee - thanks for the intros :-)
17:06:57 <arrbee> We would like to give a quick update about our recent work and end the hour with an open session and Q&A
17:06:59 <GunChleoc> I had a peek at your plans for the CAT tool and they look promising. I'm a localizer rather than a content writer, but I think I might have 2 cents to add here or there ;)
17:07:20 <arrbee> GunChleoc: we are all ears :)
17:07:49 <GunChleoc> I'll let you do your presentation first and wait for the open session with any comments
17:08:19 <arrbee> GunChleoc: Thanks. Do pitch in at any point where you have inputs.
17:08:26 <GunChleoc> OK
17:08:33 <arrbee> In our last office hour we spoke about the Content Translation tool and our focus for the first release
17:09:02 <arrbee> We would like to take that discussion forward today and present more details on how the project has been progressing
17:09:18 <arrbee> For people who missed the earlier discussion:
17:09:38 <arrbee> #info The Content Translation tool is a way to create new Wikipedia articles from existing articles in another language
17:10:18 <arrbee> It consists of an editing interface and translation tools, which make the translators' work more efficient, such as a dictionary, link adaptation, limited machine translation, etc.
17:10:43 <arrbee> It will complement the Translate extension, with which many of you are already familiar
17:11:06 <arrbee> This tool is targeted at wiki pages and addresses complexities like links, references, templates, etc
17:11:37 <arrbee> Users can create an initial version of an article from another language, which can then be published and edited like any other article
17:12:16 <arrbee> Users will be able to 'Publish' an article in their namespace on the target Wiki in the format User:UserName/ArticleName
17:13:05 <arrbee> After an issue related to publishing articles on the main namespace is resolved, users will be able to load the unfinished articles from their namespace in the target wiki and publish them as regular articles
17:13:31 <arrbee> At present we are at the last stages of developing the version that we planned as the minimal viable product (MVP)
17:14:24 <arrbee> Initially, the beta instance of ContentTranslation will be able to help in translation between Spanish (es) and Catalan (ca) using Machine Translation and Dictionary tools
17:15:06 <arrbee> We are already testing the tool thoroughly with users who had signed up earlier and we hope to gather more feedback after the MVP version deployment is complete
17:15:20 <arrbee> The link to participate in the testing is:
17:15:27 <arrbee> #link
17:15:58 <arrbee> Future stable releases will of course support more language pairs and many more features
17:16:00 <GunChleoc> Concerning dictionary tools, do you know this site?
17:16:47 <arrbee> I believe we did look at it at some point
17:16:55 <arrbee> santhosh: amir: ^^
17:17:04 <arrbee> aharoni: ^^
17:17:31 <aharoni> I haven't looked at this particular site, thanks for the link
17:17:57 * arrbee checks if divec is around
17:18:16 <arrbee> Thanks for the link GunChleoc
17:18:38 <arrbee> If you are planning to be at Wikimania next month, you can hear more about the project at this talk:
17:18:45 <arrbee> #link
17:19:36 <arrbee> GunChleoc: you could perhaps share what you thought about the project so far
17:21:02 <GunChleoc> I think it's a great idea. I only went through it fairly quickly, but what I have seen so far looks sound. It will help casual editors of small language Wikipedias as well, because we won't have to deal with learning so much about wikitext and templates - especially when the visual editor reaches gold stage
17:22:29 <GunChleoc> In the long run, I think we might also want some form of changes tracker for the source article, so the translator can implement corrections/improvements on the original article
17:24:02 <Nikerabbit> that's quite far away still :)
17:24:10 <aharoni> GunChleoc: we are thinking about such a thing, but this is complicated
17:24:11 <arrbee> GunChleoc: That is a good point to keep in mind for the feature that's called 'Translation Center' (working title at the moment)
17:24:29 <GunChleoc> I am working for a small language where machine translation isn't available, so I was thinking something akin to wordlink might be useful
17:24:37 <arrbee> But as Nikerabbit said.. it's a long way
17:24:44 <GunChleoc> Yes, definitely for the future
17:25:09 <aharoni> we specifically decided that the first release of the tool will focus only on creating a first version of an article, otherwise the project will be too big
17:25:20 <GunChleoc> Makes sense
17:25:49 <GunChleoc> I'm just throwing ideas out there, you know best what is doable and when it fits into the project. it's a big task
17:26:06 <arrbee> GunChleoc: May I ask which languages do you generally work on?
17:26:18 <GunChleoc> Scottish Gaelic
17:26:36 <arrbee> ahh
17:27:05 <aharoni> GunChleoc: I'd love, actually, to hear proposed solutions for languages that don't have any machine translation or for languages where it is too bad to be useful.
17:27:35 <GunChleoc> Dictionary support and translation memories is all we can do for these languages
17:27:47 <aharoni> for quite a lot of languages it would make sense to simply copy the text in the source language with adapted links and templates and let the translator do the rest.
17:28:06 <aharoni> link adaptation alone saves the translator a lot of time
17:28:13 <GunChleoc> The wordlink project allows you to click on words on a website and it loads into a dictionary, using multidict. Something like that would speed up things I think
17:28:33 <aharoni> but for right-to-left languages it's barely helpful, because editing English in right-to-left is not comfortable.
17:29:08 <aharoni> for Hebrew, Arabic and Persian there is machine translation, but it's not Free Software and the quality is not great,
17:29:08 <GunChleoc> Yes, link adaptation sounds like an important feature. I hardly do any article editing, mind. I mostly do software localization
17:29:21 <aharoni> and for other rtl languages such as Pashto and Kashmiri there's nothing,
17:29:31 <aharoni> hi abartov :)
17:29:41 <abartov> howdy, aharoni
17:29:57 <aharoni> so what I thought is to simply translate it in a silly way word-by-word initially, and it will be very bad of course,
17:30:03 <aharoni> but at least it will be right-to-left
17:30:06 <GunChleoc> aharoni: are you talking about link adaptation or clickable dictionary here?
17:30:14 <aharoni> no, neither
17:30:22 <aharoni> link adaptation itself is the easiest part
17:30:28 <aharoni> I'm talking about the rest of the text
17:30:36 <Pavanaja> Is it something we can download, install and work offline also?
17:30:46 <aharoni> Pavanaja: what are you referring to?
17:31:02 <Pavanaja> The content translation tool
17:31:18 <arrbee> Pavanaja: it will need a setup. You can refer to this link:
17:31:22 <arrbee> #link
17:31:29 <GunChleoc> aharoni: You would have to mirror all elements I guess... swap any inserted boxes from left to right and vice versa etc
17:31:52 <aharoni> yeah, but putting an English string in an RTL box is not so helpful.
17:32:02 <aharoni> anyway... we are going to have to tackle it very soon.
17:32:12 <GunChleoc> If words are tokens and translation is word by word, just reversing the array will do the trick - but if word order in the language differs, it will be a horrible mess anyway
17:32:40 <GunChleoc> If you think of the English string in the box as a placeholder, at least the coded elements will be there
17:33:03 <aharoni> Pavanaja: ContentTranslation will be fully integrated with the Wikipedia sites
17:33:12 <GunChleoc> I'm thinking tables and stuff, which then can be translated cell by cell
17:33:16 <aharoni> to translate Wikipedia articles you won't have to install anything
17:33:38 <aharoni> of course, if you have your own wiki, you'll be able to install ContentTranslation there as an extension
17:34:49 <Pavanaja> @aharoni - ya, that's what I was thinking
17:36:59 <arrbee> GunChleoc: I am sure pginer.. our UX interaction designer would love to know more about these points
17:38:35 <GunChleoc> He's welcome to ping me, I usually have an IRC client running
17:38:43 <arrbee> We will make more announcements about the availability of the ContentTranslation instance as we close in on the dates
17:38:51 <arrbee> Thanks GunChleoc :)
17:39:02 <pginer> Thanks GunChleoc
17:39:10 <pginer> We’ll talk in more detail
17:39:47 <arrbee> If there are no more questions about ContentTranslation… we can move over to the next part
17:40:03 <arrbee> Some updates from the GSoC students
17:40:10 <GunChleoc> pginer: K. Just note that I don't speak any RTL languages, this was just me imagining being one ;)
17:40:25 <arrbee> Nikerabbit: BPositive: Please go ahead :)
17:40:38 <BPositive> Thanks arrbee
17:40:50 <BPositive> Hi everyone, I am BPositive (Pratik Lahoti) from Pune, India. I have been working on the "Tools for mass migration of legacy translated wiki content" project under Google Summer of Code (GSoC)
17:41:11 <BPositive> My mentors are Nikerabbit and Nemo
17:41:42 <Nikerabbit> ;)
17:41:50 <BPositive> For those who are unaware of it, the project is about automating the manual task of 1) Preparing the page for translation 2) Importing old translations (legacy wiki content) into the Translate extension at Special:Translate
17:42:10 <BPositive> You can find the project proposal over here:
17:42:28 <arrbee> #link
17:42:54 <BPositive> We used to have daily meetings on #mediawiki-i18n until recently when I got a bit inactive
17:43:08 <BPositive> anyway
17:43:57 * arrbee waves to sucheta
17:44:12 * sucheta waves back
17:44:21 <sucheta> Hello, world!
17:44:29 * yuvipanda asks something unrelated
17:44:38 <yuvipanda> does the i18n team have cycles to fix issues in the Translate extension?
17:44:55 <Nikerabbit> BPositive: some of this is already on, right?
17:44:56 <yuvipanda> Nikerabbit: and aharoni are aware, I think, but a lot of languages aren't available for Translation on translatewiki for the Android app
17:45:08 <liangent> yuvipanda: zh-cn you mean?
17:45:19 <yuvipanda> liangent: anything with variants
17:45:23 <yuvipanda> liangent: and zh-cn counts, yeah
17:46:17 <Nikerabbit> yuvipanda: is this just about mapping the language code or something more complicated?
17:46:18 <liangent> yuvipanda: and any two-letter and three-letter language code issue?
17:46:32 <arrbee> BPositive: do continue
17:46:36 <yuvipanda> liangent: the three letter one is in Android itself, not much we can do about that
17:46:41 <yuvipanda> Nikerabbit: mapping, I think
17:46:53 <yuvipanda> arrbee: ah, sorry if I interrupted. I can wait :)
17:47:17 <arrbee> yuvipanda: no worries. I believe we can track multiple conversations. :)
17:47:23 <yuvipanda> :D
17:47:24 <yuvipanda> ok
17:47:34 <Nikerabbit> yuvipanda: that should take like 10 minutes, we can do it after the meeting
17:47:41 <yuvipanda> Nikerabbit: alright!
17:47:58 <Nikerabbit> looks like BPositive's Internet disappeared at a critical moment...
17:48:07 <arrbee> oh ouch
17:48:15 <arrbee> yuvipanda: Thanks for bringing that up though
17:48:31 <Nikerabbit> but some of his work can be used on for example by translation admins:
17:48:50 <Nikerabbit> feedback on how the tool works is much wanted :)
17:49:17 <arrbee> Nikerabbit: feedback through bugzilla?
17:49:39 <Nikerabbit> arrbee: for bugs yes, or just find him/Nemo/me on #mediawiki-i18n
17:49:49 <arrbee> okay. Thanks
17:49:56 <arrbee> vikas: are you around? would you like to give a quick round up of your project?
17:50:02 <arrbee> aharoni is mentoring vikas
17:50:13 <vikas> hello arrbee
17:50:23 <arrbee> Hello vikas :)
17:50:44 <vikas> Hello ! I am Vikas S Yaligar from Bangalore. I am working on "Automatic cross-language screenshots" project (proposal:
17:51:00 <vikas> For people who don't know about my project:
17:51:07 <vikas> Currently images in User guides of extensions are created manually for different languages, for e.g.:
17:51:15 <arrbee> #link
17:51:37 <vikas> So my task is to automate the whole process, where all one (documentation maintainer) has to do is click a link, and all the images related to that user guide are updated for different languages.
17:52:33 <vikas> Currently we are concentrating on VisualEditor User guide (
17:53:24 <aharoni> (Actually, VisualEditor is probably the extension with the most comprehensive and thought-out *user* guide. There are extensions with more documentation, such as Semantic MediaWiki and Translate, but their documentation is less oriented to general-public end-users.)
17:54:13 <arrbee> vikas: That is very useful! Can I suggest creating a screencast of the entire process for wider circulation of the tool?
17:54:40 <vikas> The screenshots are uploaded to commons (you can find them in
17:54:55 <arrbee> #link
17:54:59 <arrbee> Thank you
17:55:00 <vikas> aharoni: yup  :)
17:55:13 <liangent> vikas: what about any integration with Extension:Translate?
17:55:28 <aharoni> the text of the VisualEditor user guide is translatable using the Translate extension, and it is indeed translated to many languages already,
17:55:28 <liangent> so pages like will not be needed anymore
17:55:39 <aharoni> what vikas helps with is the screenshots
17:55:41 <GunChleoc> 8-)
17:56:09 <aharoni> and yes, liangent, that is *precisely* the purpose of vikas's project: to make such pages unnecessary and fully automated
17:56:47 * arrbee hates to spoil the party.. but we have 5 more mins left on this channel
17:57:02 <aharoni> vikas, can you give a few links with examples of how some translated images already appear in the VE user guide in some languages?
17:57:30 <vikas> Yup ! thank you aharoni. We were able to test some screenshots in ( Lists & indentation part
17:58:14 <vikas> When you go to you can see that image is converted to Hebrew :)
17:58:29 <aharoni> vikas: what other languages do we already have?
17:58:35 <vikas> That is this image: ""
17:58:48 <vikas> is changed to this one:
17:59:55 <vikas> Currently for that image we have: 10 languages
18:00:05 <vikas> you can check all of them here =>
18:00:15 <arrbee> #link
18:00:33 <MissGayle> Hey folks!
18:00:33 * marktraceur kicks LE team
18:00:36 <marktraceur> :P
18:00:36 <arrbee> vikas: aharoni : Thank you for that update. This looks very nicely done.
18:00:40 <arrbee> And we have to leave
18:00:43 <arrbee> :)
18:01:02 <arrbee> Thanks everyone and let's move to #mediawiki-i18n
18:01:06 <arrbee> #endmeeting

