Wikimedia monthly activities meetings/Quarterly reviews/Language Engineering/September 2014
The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's Language Engineering team, September 3, 2014, 9AM - 10:30AM PDT.
Present:
Lila Tretikov, Erik Moeller (from 9:20), Tilman Bayer (taking minutes)
Participating remotely:
(The Language Team is remote to SF. Only Alolita is based in SF.)
Alolita Sharma, Amir Aharoni, Joel Sahleen, Kartik Mistry, Niklas Laxström, Pau Giner, Runa Bhattacharjee, Santhosh Thottingal
Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.
Agenda:
Introductions and Team Overview - Alolita
Q4 2013-14 Project Goals and Accomplishments - Team
Q1 2014-15 Project Goals, Accomplishments, Timelines - Team
Reusable i18n and L10n tools - Santhosh
Inter-team dependencies - Alolita
Looking to the Future - What’s on our roadmap? - Amir
Questions & Answers - All
Introductions and Team Overview
Alolita:
Welcome
Agenda: ...
(this meeting covers the previous quarter first, then the current quarter)
Team introductions
Amir: ...
David: reporting into features team, might switch there
Joel: ...
Kartik: ..., devops liaison, MediaWiki language extension bundle (MLEB)
Niklas: developed Translate extension
Pau: was first UX engineer in WMF
Runa: QA, outreach, responsible for all communications (no support from CE team yet), scrum master
Santhosh: responsible for much of the architecture
This team has been remote since day 1
Everyone is very aware of standards processes (Unicode, W3C: ....)
(One of?) the first to adopt Agile within WMF
Releasing our i18n/l10n tools for third parties
Collaborate internally (Features, Mobile, Wikidata, ...) and externally (RedHat, Google, Mozilla, ...)
Uniquely diverse team
mandate:
1. language support for WM sites
2. language support to other engineering teams
3. be exemplary open source citizens
Impact: 1. users, 2. developer community, 3. WMF product
Q4 2013-14 Project Goals and Accomplishments
Runa:
April-June engineering summary
1. projects, 2. processes
team underwent some changes during last 6 months, changed process
Requests come in via Bugzilla, Gerrit, team communications
i18n support for VE, Mobile
(slide) shades = importance, most important: Content translation (CX)
ULS: deployed in most wikis without webfonts (fallout of earlier performance issue, which needs attention from our team and Ops)
Translate extension: in maintenance mode, outside community helps, critical bugs always fixed immediately
MLEB (bundle): we benefit from Kartik's experience as a longtime Debian package maintainer
Process: (lower part of slide)
received Agile coaching, most important outcome: strict release planning
rollout: support from CE, CA, proceed cautiously
use village pumps/ MassMessage very widely
Erik: talking to Rachel about liaison on her team? yes
she is open to that idea
(Runa:)
this q (July-Sep):
Content Translation still main focus, but more attention to other areas
Team speaks 15-20 languages, but is still limited in capability to communicate with all lang communities [Intended message was: Like most internationalization teams, we are also physically limited in our knowledge of languages as native or secondary speakers. Community engagement is important to augment our understanding of the languages we support]
Amir:
Content Translation (CX)
(explains screenshot)
in Q4, implemented dictionaries, (wiki)link adaptation, machine translation integration, references support
for performance, caching (meeting in May), got lots of support from Gabriel about services
Pau:
CX design timeline (slide)
Lila: why did the priorities switch in oct/dec 2013, after definition of success metrics?
Pau: team was busy with other projects
Lila: where is that documented?
Pau: on mediawiki.org
prototypes are currently on GitHub
Design Doc: https://commons.wikimedia.org/wiki/File:Content-translation-designs.pdf
Q4 CX UX goals were:
1. translation workflow (from start to completion)
2. more control over (wiki)link and translation services
3. confirm assumptions (creating a new article is natural use case, more advanced scenarios - e.g. moving paragraphs - not in scope)
Project Goals, Accomplishments, Timelines
Q1 goals are:
1. more robust and fluent experience
2. measure success; feedback, resulting articles, more analytics (currently: # of created articles)
Santhosh:
on CX tech architecture
none of the existing tools (Apertium, or proprietary ones like Google translation, Bing) can support all our languages
Apertium good for e.g. Spanish, Portuguese, Catalan, but not for others
minimalistic approach, try to fill gaps left by machine translation with other things
currently (Q1): working with Ops to get deployed, currently on beta cluster
make Apertium stable and scalable
try to match templates between languages
worst case: deconstruct (expand) template
adding supported language pairs
for internal testing, in Labs: all which are supported by Apertium
in beta: Catalan -> Spanish, Portuguese <-> Spanish, ...
Amir: on Q1 CX user feedback
since deployment of Spanish->Catalan, have seen constant usage, positive feedback, bug reports but no really negative comments
ULS
our previous big project, now in maintenance mode (fixing bugs and accepting code from external contributors)
ULS Webfonts continue to be disabled because of performance issues
some projects asked for them to be enabled by default, e.g. English Wikisource
have a lot of data now on font usage, user prefs, not yet analyzed
still no language selection by anon users
Niklas:
on Translate extension
last 6 months, worked on migration of translation memory to Elasticsearch
apart from that, it's in maintenance mode
improvements from GSOC projects
was designed for UI messages, suitability for CX?
replication lag issues
Erik: Let's talk a bit about the overlap between Translate extension and new CX extension
CX: one-time translation, no segmentation, no tagging of original page needed - are these the main differences?
Niklas: yes
Erik: is the code completely independent?
Niklas, Alolita: yes
Erik: overlap areas identified?
Niklas: translation memory, translation surface (the three column view of CX)
Erik: relation with Parsoid...
potential for sharing code?
Niklas: Translate is in PHP, CX in JS
Erik: potential future where we consolidate both?
Niklas: they are different
Erik: what about Translation notifications or similar tools, plans to share there?
Niklas: not aware of such plans
Amir: plan "translation center" for CX to engage users in translations
no practical overlap with translation notifications
Erik: has there been a systematic evaluation of overlap?
Alolita: discussed that a bit in London
underlying goal to merge architecture
Erik: risk confusing users when deploying both
also, tech debt
encourage thinking about this
Reusable i18n and L10n tools
Santhosh:
on i18n and l10n tools; open source libraries we maintain
maintenance support for translatewiki.net (uses Translate extension for UI translation, also for non-MW open source software projects)
MLEB
Project Milkshake
plural rules parser for CLDR languages
input method library: jquery.ime (currently 143 methods for 86 languages)
reusers: Koha (library management system), Firefox OS, Android Indic keyboard (>50k downloads)
David working on general input methods mainly for VE, but also to be reused elsewhere
internationalization library: jquery.i18n
language selector: jquery.uls
jquery.webfonts
CLDRpluralRuleParser (in JS)
Alolita: also reuse by other MediaWiki sites
Lila:
about reuse of libraries: did we have to introduce new libraries; how do we make these decisions?
Santhosh: some integrated into MW core
Alolita: ...
Erik: very few other websites have a need to support as many languages as we do
RedHat, Mozilla tell us MW is the most advanced
so we have to develop ourselves
most websites just do string-by-string translation
MW also supports e.g. gender, plural rules
Lila: how widely reused now?
Alolita: examples mentioned by Santhosh; KDE, Mozilla, Koha, many languages
we are used as kind of a reference site in the number of languages we support, e.g. by Twitter too
Erik: apart from third-party reusers, IME library has received tons of community contributions
I wrote a crappy German input method myself ;)
Alolita: made the libraries available on GitHub, on purpose
Lila: is the scope well-defined? (any time we create an extensible framework, need to be conscious of that)
Erik: yes
Alolita: no onscreen keyboard support yet, becomes more important especially for mobile
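(Editorial illustration, not from the meeting: the plural rules mentioned in this section, both the CLDR plural rules parser and Erik's remark that MediaWiki goes beyond string-by-string translation, can be sketched roughly as follows. This is a minimal sketch under stated assumptions. It is not the API of CLDRpluralRuleParser or jquery.i18n. The function names `plural_category` and `format_count` and the `MESSAGES` table are hypothetical, and the rules are hand-simplified integer-only versions of the CLDR rules for English and Russian.)

```python
# Hypothetical sketch: choosing a plural form per language.
# Integer-only simplification of the CLDR plural rules for English and
# Russian; the real rules also handle fractions and many more languages.

def plural_category(lang: str, n: int) -> str:
    """Return the CLDR-style plural category name for a whole number."""
    n = abs(n)
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "ru":
        mod10, mod100 = n % 10, n % 100
        if mod10 == 1 and mod100 != 11:
            return "one"
        if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
            return "few"
        return "many"
    # Assumption: languages without a rule here get a single form.
    return "other"


# Hypothetical message table: one template per plural category.
MESSAGES = {
    "en": {"one": "{n} article", "other": "{n} articles"},
    "ru": {
        "one": "{n} статья",
        "few": "{n} статьи",
        "many": "{n} статей",
        "other": "{n} статьи",
    },
}


def format_count(lang: str, n: int) -> str:
    """Pick the template matching the plural category and fill in n."""
    forms = MESSAGES[lang]
    template = forms.get(plural_category(lang, n), forms["other"])
    return template.format(n=n)


if __name__ == "__main__":
    for number in (1, 3, 5, 11, 21):
        print(format_count("en", number), "|", format_count("ru", number))
```

(A plain string-by-string translation would store a single Russian string for "articles", which cannot be correct for 1, 3 and 5 at the same time. Selecting the form by a per-language rule is what makes a message table like the one above workable.)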
Inter-Team Dependencies
(Alolita:)
we depend on: RelEng, QA, Ops
wish we had a bit more resourcing from Ops
Erik: has that become a main blocker for deployment? yes
Erik: team in place?
Alolita: yes, working with Alex [Kos.]
beta labs
Roadmap
Amir:
CX translation center / dashboard: for user engagement and growth
integrate CX and ULS with Wikidata (instead of using the Babel extension in a hackish way)
e.g. translate labels in Wikidata
on-screen keyboards
language variants support in PHP for e.g. Chinese, Serbian - developed many years ago, showing its age
Erik: talking to CScott about this?
Amir: yes, he has a big vision about it
Erik: let's chat about this
I looked at it a fair bit
might be fundamentally the wrong approach for Chinese
look at our options
Alolita: Parsoid team has been interested in working on architecture
Gabriel and Subbu have ideas on how to make this work for Chinese and Serbian
need to align with priorities
Questions & Answers
Erik: apart from CX/Ops, any other blocker we can help with?
Alolita: mainly that