Grants talk:Project/Glrx/SVG i18n

Scheduling Project Grant Interview

Hello Glrx, I will be reaching out shortly to set up your project grant interview. Can you please send your email address to me at lmiranda @ wikimedia  · org at your earliest convenience? Thank you! LMiranda (WMF) (talk) 22:51, 4 April 2018 (UTC)

Hello Glrx, This is a reminder to please email me with your contact information so that I may set up your project grant interview. Thank you! LMiranda (WMF) (talk) 17:12, 9 April 2018 (UTC)

Comments of Ruslik0

This is an interesting project. However I have a number of concerns:

  1. Commons:Help:Translation possible: this link does not work.
  2. Please provide a detailed budget and justification for the requested $40,000 sum.
  3. Could you create a short project plan in tabular form that specifies what will be done at each stage, what the results will be, and how much time each stage will take?
  4. Please also specify the full duration of the project.
  5. In addition, what are your qualifications in JavaScript programming? Have you created any complex JavaScript programs before?
Ruslik (talk) 12:02, 7 February 2018 (UTC)
Re 1. Fixed link. Glrx (talk) 17:49, 7 February 2018 (UTC)
Re 5. I'm an old Fortran, assembly, C, and Lisp programmer. I've dabbled with XML in C++, Java, and JavaScript. I'm not a jQuery or nodeJS wonk. I'm not a PHP wonk either, but I have explained bugs in SVG Translate and Image.php. The more appropriate point is understanding DOM, SVG, and several W3C standards, and I tried to address that in the proposal. I've pushed enough on JavaScript for HTML and SVG to find several DOM bugs. I've tried several endpoints on the MediaWiki API and fought some CORS issues while I'm at it. SVG Translate actually builds the upload.wikimedia.org file string by computing the hash. WMF servers frustrate the redirect by disallowing ?origin=* until reaching the hashed filename. On another tack, WP has a very nice template for w:IPA strings, {{IPAc-en}}, so I tried something several months ago. There's a W3C WebSpeech standard, so one can pull the textContent phoneme string from a class="IPA" span in an en.WP page, stuff the string into an SSML document's phoneme element's ph attribute, and (theoretically) have the browser utter the phoneme string when the user clicks the speaker icon (no .ogg required). The trick works in MS Edge because Edge does a decent job with SSML, doesn't work in Chrome because Chrome ignores SSML markup and speaks the textContent rather than the ph attribute, and is abysmal in Firefox because Firefox ignores the WebSpeech spec and speaks the SSML markup. The result is minimal benefit for Wikipedia until the major browsers get their act together. I copied the short hack to my en.WP account (original code worried about webkit prefix):
Glrx (talk) 05:04, 23 February 2018 (UTC)
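For readers unfamiliar with the SSML step described in the comment above, here is a minimal sketch of building such a document. It is not the Phoneme.js code, which is not reproduced on this page. It is written in Python purely for illustration, and the helper name ipa_to_ssml is hypothetical. The speak and phoneme elements and the alphabet and ph attributes are taken from the W3C SSML 1.1 specification.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ipa_to_ssml(ipa: str, fallback_text: str, lang: str = "en") -> str:
    """Wrap an IPA string in a minimal SSML document (hypothetical helper)."""
    # On the page the IPA is shown between slashes or brackets;
    # the ph attribute should carry only the bare phoneme string.
    phonemes = ipa.strip().strip("/[]")
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f'<phoneme alphabet="ipa" ph={quoteattr(phonemes)}>'
        f"{escape(fallback_text)}"
        "</phoneme></speak>"
    )
```

The fallback text is what a synthesizer that ignores the ph attribute would speak, which matches the Chrome behavior described in the comment.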

Relation to existing attempts

This page talks about SVG Translate (without a link?), which I understand is a continuation of the TranslateSVG MediaWiki extension. How does this new tool relate to those attempts? Are either of those going to be improved, or are you building something new from scratch? I am also very cautious about the suggestion to build a new translation interface. Wikimedia currently has ~two translation interfaces, from Content Translation and Translate, the latter of which was used for TranslateSVG. It seems very risky to start building yet another interface, or at least it should have a very good justification and maintenance plan, as Content Translation had. Integrating with existing translation interfaces is likely about the same amount of work, and brings extra features and lower maintenance cost in the future. --Nikerabbit (talk) 14:01, 16 February 2018 (UTC)

I tend to agree. I would prefer that TranslateSVG extension be fixed up and deployed instead of doing something new. Or at least I would like to see a justification for doing something totally new. Bawolff (talk) 14:47, 17 February 2018 (UTC)
First, I'll thank User:Jarry1250 and the others involved with SVG Translate and Translate SVG Extension for a huge effort. SVG Translate has been used to translate hundreds of files to other languages. Without the tool, WMF would be in a worse position.
Links for running SVG Translate were provided. https://tools.wmflabs.org/svgtranslate/
I am not familiar with the distinction between Content Translation and Translate interfaces. I have used the Commons translation template, I have used a wiki translation UI, and I've looked at some of the mechanics (the i18n directory with numbered translation units; markup).
See, for example, Commons:Translations:File:The GLAM-Wiki Revolution.webm/srt/94/fr
A short answer is the WMF translation database is sparsely populated. There are better translation sources. The WMF translation memory is essentially a blank dataset for the translation units in most illustrations. Certainly, it could be used, but it does not seem to be expedient.
The translations needed for most illustrations are not complete sentences but often single words or short phrases. Sometimes those terms are technical terms. Consider a diagram showing the bones of the human ankle.
(Figure caption: Bones of the ankle. Wikidata item fibula (Q302896) has many translations. Wikidata item anterior talofibular ligament (Q3829591) has 5 translations.)
Several years ago I generated a multifunction semilog plot. The linear abscissa was the date, and the log ordinate was a count. There were two functions: infected persons and deaths. Using Windows Professional, I could make some system API calls to generate short date strings in dozens of languages (even Thai, which uses a different calendar). The only awkward part of that generation was that SVG uses IETF langtags and Windows uses numeric locale identifiers, so I needed an equivalence table, which was no big deal. That translated most of the material in the diagrams.
I still needed to translate the function names; that was a little harder, but Wiktionary has translations. I looked at Wiktionary as a source for translations. In some places it was OK, but there were many holes. Wiktionary didn't have the technical terms for glow discharge regions, so it didn't have the translations for those terms. I also looked for mechanical engineering terms used in some particular diagrams. Even if the term exists in Wiktionary, there is no guarantee that there will be many translations. Sometimes there are, but sometimes there are not. It also looked like the best way of getting the translations was scraping the HTML output. IIRC, there is a Wiktionary API that will get JSON, but that API did not include translations.
A Wiktionary entry also includes a gloss to distinguish different meanings of a word. See, for example, wikt:die, which has at least 3 major meanings: death, a cutting tool, and a gambling device. I would want to include the gloss to get the appropriate translations.
If I look at translation memory APIs from vendors, they will often have a subject specialization to better choose possible translations.
AFAIK, WMF translation memory has no such specialization parameter. A probe for die would get all meanings.
Google translate is no longer free, and I'm suspicious of its accuracy. Recently I did a round trip translation of a paragraph. The translation converted a DTD internal subset into a private email address.
When User:Delphi234 confronted this problem, the solution was Wikidata Q-items. If there is an item, then it usually has labels and aliases in many languages. The Q-item is precise; there is no need for a disambiguation gloss to filter false hits. The translations are not perfect. Plurals are an issue because Wikidata does not provide much in the way of grammar information. An SVG diagram may show a body with two lungs, but the Wikidata English label lung (Q7886) is singular despite the item has part (P527) right lung (Q5938041) and left lung (Q18088285). The label and aliases have grammar issues, but it seems to be the richest source for translations at the moment. The Wikidata hit rate is not perfect, but the corpus is huge.
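As a rough illustration of the Q-item lookup described above, the sketch below picks a label for an IETF langtag from an entity record shaped like the output of the Wikibase wbgetentities API module. It is written in Python for illustration only. The function name pick_label is hypothetical, and the sample record is hand-written rather than fetched from Wikidata.

```python
from typing import Optional

# Hand-written sample in the shape of one entity from a wbgetentities
# response. The values are illustrative and were not fetched from Wikidata.
SAMPLE_ENTITY = {
    "id": "Q7886",
    "labels": {
        "en": {"language": "en", "value": "lung"},
        "de": {"language": "de", "value": "Lunge"},
    },
}


def pick_label(entity: dict, langtag: str) -> Optional[str]:
    """Return the label for langtag, falling back to shorter tags.

    "de-AT" falls back to "de"; None means no label was found, in which
    case a human translator (or an alias) would have to fill the gap.
    """
    labels = entity.get("labels", {})
    tag = langtag.lower()
    while tag:
        if tag in labels:
            return labels[tag]["value"]
        # Drop the last subtag: "zh-hant-tw" -> "zh-hant" -> "zh" -> "".
        tag = tag.rpartition("-")[0]
    return None
```

Note that the lookup returns the singular label as stored, so the plural problem mentioned above (two lungs, one label) is untouched by this sketch.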
The proposal mentions some problems with SVG Translate. It was an extremely reasonable tool, but it was also broken for several years because it relied on documented interfaces that were changed out from under it.
The follow on to SVG Translate is the Translate SVG extension and its successor version 2.0. A simple observation is the project has been sitting in limbo for years. IIRC, the last major edits were in 2014. Translate SVG is an improvement over SVG Translate. It uses the DOM instead of pattern matching, but the PHP DOM apparently uses prefix matching rather than namespace matching (HTML does not use namespaces). Translate SVG looks for a particular element name by testing for "text" and "svg:text". Using a namespace-aware DOM is a better option.
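To make the prefix-versus-namespace point concrete, here is a small sketch using Python's standard xml.etree.ElementTree, which matches elements by namespace URI rather than by the prefix written in the file. It is an illustration only and not code from either tool; the helper name svg_text_elements and the three sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def svg_text_elements(svg_source: str) -> list:
    """Return every text element in the SVG namespace, whatever its prefix."""
    root = ET.fromstring(svg_source)
    # ElementTree expands names to "{namespace-uri}local-name".
    return list(root.iter(f"{{{SVG_NS}}}text"))


# The same element written with the default namespace and with an unusual prefix.
DEFAULT_NS = f'<svg xmlns="{SVG_NS}"><text>Fibula</text></svg>'
ODD_PREFIX = f'<g:svg xmlns:g="{SVG_NS}"><g:text>Fibula</g:text></g:svg>'
# An element called "text" that belongs to some other vocabulary.
OTHER_NS = (
    f'<svg xmlns="{SVG_NS}" xmlns:x="http://example.org/x">'
    "<x:text>not SVG</x:text></svg>"
)
```

A test for the literal names "text" and "svg:text" would miss the g: prefix in the second document, while the namespace test finds it and correctly ignores the foreign element in the third.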
Translate SVG Extension also does some brutal and possibly incorrect manipulations. If it finds a transform attribute on a text element, it moves the attribute to the surrounding switch element. That's a bad idea if the switch element already has a transform. It also looks like SVG Translate will be confused by planar translations. Inside a group, it will see a text element without a switch parent, so it will wrap it with a switch and start translating it. The result will be lots of switches with dead code.
Translate SVG gives up when it sees a nested tspan. I'd have to look again to see what it does with anchors.
<text>Concentrated <tspan fill="red">H<tspan class="sub">2</tspan>SO<tspan class="sub">4</tspan></tspan></text>
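A short sketch, in Python for illustration only, of what a tool sees in the markup above: the nesting depth of the tspan elements and the flattened string. The helper names are hypothetical. Flattening recovers the string a translator would want to see, but it discards the subscript and color markup, so a translation unit has to carry that inline markup along.

```python
import xml.etree.ElementTree as ET

# The example markup from the comment above (no namespace declared there).
MARKUP = (
    '<text>Concentrated <tspan fill="red">H<tspan class="sub">2</tspan>'
    'SO<tspan class="sub">4</tspan></tspan></text>'
)


def flatten(element: ET.Element) -> str:
    """Concatenate all character data in document order."""
    return "".join(element.itertext())


def max_tspan_depth(element: ET.Element, depth: int = 0) -> int:
    """Return the deepest level of tspan nesting below element."""
    depths = [
        max_tspan_depth(child, depth + 1)
        for child in element
        if child.tag == "tspan"
    ]
    return max(depths, default=depth)


text_element = ET.fromstring(MARKUP)
```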
Translate SVG does not look for ITS instructions.
Translate SVG does not have access to the SVGElement interface. It does not use http://www.w3.org/TR/SVG2/coords.html#BoundingBoxes to look for neighbors or recognize planar translations. In general, pushing that work to the client (which has an SVG DOM) seems the better option.
A WMF developer commented that API changes have probably broken the Translate SVG extension.
The Translate SVG Extension is also a bit odd. It embeds all the translations in the SVG file, but also has those translations repeated in the i18n subdirectories. If somebody edits the SVG file on Commons, the next run of the extension may remove the change. That's another dual copy problem. We cannot lock the SVG file because we want editors to improve the graphics. Compare offline translation of XLIFF and PO files / check out and check in. I'm not opposed to forwarding translations to WMF's translation server, but I haven't seen a published API (I have seen server libs for several formats).
On a different front, jQuery.i18n (or something similar) should be used for the UI strings.
Glrx (talk) 23:59, 22 February 2018 (UTC)
It's hard for me to understand your reply, and I think that is partly because of a confusion between the terms translation tool, translation memory and machine translation, so that you are perhaps replying to something other than what I asked about.
The Translate extension is a translation tool and it is documented in mw:Help:Extension:Translate. It provides the interface for translators and connects with translation memory and machine translation services. You mostly talked about how to automatically translate the labels using for example Wikidata (which is not a translation memory in that sense), while my question was about the translation process and interface.
I would like to see a better description of what the translation process and translation interface are in your proposal. Based on my understanding, you are currently proposing to build a new tool that does not connect with the existing translation tools I mentioned. There can be legitimate reasons to go that way, but those reasons should be clearly stated in the proposal in a way that shows that you have done sufficient exploration of existing tools and their advantages and disadvantages. –Nikerabbit (talk) 11:22, 5 March 2018 (UTC)

Eligibility confirmed, round 1 2018

This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 1 2018 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through March 12, 2018.

The committee's formal review for round 1 2018 will occur March 13-March 26, 2018. New grants will be announced April 27, 2018. See the schedule for more details.

Questions? Contact us.

--Marti (WMF) (talk) 01:51, 17 February 2018 (UTC)

More detailed comment

I was asked to comment on this proposal more substantively, so here goes.

This proposal is asking for $40,000. That's a lot of money. That said, this may be a reasonable amount of money for an experienced JavaScript developer to do about 6 months of full-time work. However, it is very unclear:

  • What is Glrx's experience level? Is he an experienced JavaScript developer? Can he successfully complete a project of this scope? For this level of funding I would expect a proposal to list previous successful projects so we know that Glrx is capable of completing a high budget project. If s/he doesn't have open source experience that can be directly linked to, I would at least expect some sort of resume detailing previous experience.
  • How much work is this proposal for? The proposal doesn't list a timeline. For such a large budget, I expect it to be broken down into milestones, with expected time frames.
  • What is the actual end product? For the large amount of text in this proposal, it is very vague on what the resulting end product will actually be. For a 40K grant, I expect concisely stated acceptance criteria for the product, plus possibly having "reach" goals.
  • User engagement plan. The ultimate goal is to improve number of translations. This is to be accomplished by users using the resulting tool. Yet this grant seems to not have user consultation mentioned anywhere. I would expect having users testing the tool, providing feedback, etc to be a core part of such a grant. I would also expect having the tool being used by X number of users to translate Y number of images to be part of the success criteria (or at least, something that the grantee intends to measure). After all, the point is to make something people will actually use.

In conclusion: 40K is a large sum of money. I expect proposals asking for such a sum to be more professionally written than your average grant request; they should include specific details on how the money will be spent and what the end result will be. I don't think this proposal is detailed enough to warrant granting such a large sum. Bawolff (talk) 22:21, 28 March 2018 (UTC)

The proposal identifies a couple problems with Commons SVG images: translations are usually lacking and where translations have been done, there is a copy problem. The problems have largely been ignored. The simple SVG Translate (which exacerbates the copy problem) has been broken for years. The slightly more involved Translate SVG hasn't been touched in 3 years (and has its own problems).
The proposal identifies an opportunity to leverage Wikidata labels and aliases for translations.
The proposal also identifies problems with such an opportunity. Lack of structure in SVG files means translation units are not isolated. Several existing SVG files use planar translations. See, for example, c:File:Copper electroplating principle (multilingual).svg, which also has some subtle problems with chemical formulas that are not intended to be translated. The sulfate ion, approximately SO₄²⁻ but with the superscript shifted left, is done poorly and may be beyond librsvg.
The proposal seeks a research and development grant. The first part of that grant is examining SVG files. The result of that examination would be text in the final report about findings. The examination would also be used for direction about where to go with tools. There's a thorny BIDI issue when the language is rtl. It gets worse when the line is rotated 90 degrees and should switch from rtl to tb. And librsvg doesn't do any of it correctly.
A second part is a simple SVG Translate-like form translation tool. It is not promising anything spectacular other than the same span-for-span translation that SVG Translate did.
A third part is extending the SVG Translate form to include at least a Wikidata item. That not only opens up the possibility of using Wikidata labels and aliases, but it would also allow anchors that hyperlink text in the illustration to Wikidata. The proposal doesn't say that, but it should be obvious to many editors. It is not a benefit that would arise from using TranslateWiki.
A fourth part is about better SVG practices and understanding how changes would interact with oft-used but troublesome Inkscape.
The proposal does not purport to be a graphics editor. There are files that are better served by making small layout changes with a graphics editor to gain single-span translation units. See, for example, c:File:Leitungsende Abisoliert.svg, which should have 3 single line spans and no usable Wikidata items. Such a file should be marked for a simpler layout. In contrast, c:File:Binocularp.svg needs a better layout but can then go to town with Wikidata items for eyepiece, objective, and Porro prism; it has further complications in that most wikis use a different file, c:File:Porro binocular.jpg.
If you want a senior JavaScript developer, I understand that Cambridge is the place to go. There's a claim such developers can be had for about $43K/year. See Grants talk:Project/ScienceSource.
In the Wiki community, there are a handful of editors who understand at least some of the issues with SVG images. User:Redrose64, User:Sameboat, User:Perhelion, User:JoKalliauer, User:Jarry1250, User:Menner, User:Rillke, de:User:Sarang, .... In many ways, it is a deal with the devil that is often too focused on getting librsvg to produce the right result now rather than also considering longer term issues. There are places where I expect to ask User:Shizhao, User:PhiLiP and User:Obsuser questions about script usage on their wikis. Some wikis have common source but different scripts. Script selection currently fails in librsvg; there is a workaround. Is it possible to choose the correct langtag on such a wiki? I've seen a Chinese illustration that surrendered the issue by including both -Hans and -Hant varieties.
For qualifications, I pointed out that I was one of the earliest posters of switch translated SVG to Commons, have been debugging SVG issues at help desks, and was debugging broken SVG tools. I've reported several javascript browser bugs. That said, this proposal is more about SVG files on Commons, DOM manipulations, and understanding the poor behavior of librsvg. BIDI is broken on librsvg.
I also see no comment about Phoneme.js. Yes, it is sad that the hack works on Edge but not Chrome or Firefox, but consider the implications if the tool worked on the significant browsers. The IPAc template gives pronunciation hints and hopes the user can figure it out. With Edge, I can visit w:Methane, see the two IPA strings (US /ˈmɛθeɪn/ and UK /ˈmiːθeɪn/), and just click them. No audio file required. An easy enhancement for Wiktionary.
As far as advertising new tools, SVG Translate has been advertised for years using {{translate}}. That advertisement was so successful that users were finding the tool, trying it, finding it was broken, and then finding forums to complain about it being broken.
Glrx (talk) 23:21, 9 April 2018 (UTC)
Re (I'm just going to respond paragraph by paragraph with bullet points):
  • I agree that the current situation with translatable SVGs is very sub-par.
  • [re: wikidata vs translate] Using Wikidata for translations is an interesting idea, although I suspect there will be gotchas involving grammatical forms and whatnot. I'm not sure if it's a better thing than the translatewiki system, but I think a reasonable argument can be made for exploring Wikidata as a potential solution.
  • [re: Lack of structure in SVG files means translation units are not isolated] Yep, I agree current translation mechanisms are subpar.
  • [re: The proposal seeks a research and development grant. research part] I think having a research part of the grant is fine, but it should clearly and concisely identify what the research questions are, and roughly how much of the grant is going to research (both budget and time wise). As it stands, I consider it unclear what exactly you are researching, other than that you are going to look at some files and think about them in the context of translation.
  • [re re-implement svg-translate tool as js] While this may make sense, the proposal should have a rationale as to why it is being reimplemented as opposed to fixing the existing code. The proposal should also break this down into milestones with a timeline, as well as have clear and concise acceptance criteria.
  • [re Wikidata] While I'm not entirely sold on this as being a good solution, I think this is worthwhile to try (and what are grants for, if not to try new things). I would like to see it better broken out in the budget (and timeline) how much time/money is going to be spent on this. I would also like to see clear acceptance criteria in terms of what features are planned to be implemented, and what the milestones are.
  • [re 4th part with inkscape] I guess this falls under research? This should clearly state what research question you're trying to answer, as well as break down in the budget roughly how much time/money is spent pursuing this goal.
  • [re not a graphics editor] Ok.
  • [re JS developer salary]: Salaries can vary significantly depending on your location, experience level, etc. In this proposal it's not even listed how much work you're planning to do over what period (Are you working full time on this for a year? Are you working part time on this for a month?). It's also unclear if you have ever worked professionally as a JavaScript developer. I think it's possible 40K may be justifiable for a software developer, but as it's a large amount, I think you need to justify that it's a reasonable amount given your abilities (which you have not).
  • [re Phoneme.js]: Not sure how that's relevant to this project. If you mean that it demonstrates you have experience writing JavaScript, I mean I guess it does imply that you've at least written JS once in your life. But a small (77 line) JS snippet doesn't really mean you have any experience working on a major JS product, which I think should be a requirement for awarding such a large grant.

Bawolff (talk) 22:49, 11 April 2018 (UTC)

Aggregated feedback from the committee for SVG i18n

Scoring rubric (criterion, guiding questions, score):
(A) Impact potential: 8.4
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement: 7.2
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute: 6.0
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success: 6.0
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • The project fits with Wikimedia's strategic priorities. However, its sustainability and scalability are more of a question, as it is not completely clear what the final result of the project will be.
  • The idea in itself is good. If done well, this tool would be very helpful to support multilingualism of Wikimedia projects which is one of our strategic priorities. It can also be scaled and adopted by multiple communities if implemented and promoted correctly.
  • I can say that the project is innovative - using 'switch' SVG elements for translations is a new idea for MediaWiki. The potential impacts are significant but there is a big risk that just another translation tool will be created, which will quickly fall into disuse. The success can be measured.
  • There are already available SVG tools, although the proposed one sounds like a significant improvement. My concern is that this is an attempt to build a new tool instead of expanding the existing one.
  • The project can be accomplished in 12 months but the actual duration is not specified. The budget is rudimentary. The necessary skills are probably present.
  • I have significant concerns about whether the applicant is the best person to do it and whether he will do it in the most efficient way. I am not sure some parts of the project plan are really reasonable:
      • paying for studying Inkscape, which is not something we usually do: on one hand we have enough Wikimedians quite experienced in Inkscape, on the other hand, work in Inkscape is not a major goal of this grant. Still I am fine with a person spending a few days on that, but not as one of the major activities.
      • studying 25 SVG files might be a good goal, but should not take that much time, probably a few days' worth of work.
      • then we come to JS development. It is not clear how much time this will take, nor where this tool will be hosted and how it will be used.
  • The community engagement is low but it is not required at this stage.
  • There are significant concerns (see talk page - no or unclear answers from the applicant) related to lack of clarity of the proposal and of its output, insufficient attention to the existing tools as well as to the lack of user engagement plan. While there was some effort regarding targeting and contacting a few users a lot of work has to be done to make this new tool widely used and meeting community needs.
  • The problem is relevant but honestly there is not an explanation about the costs. The implementation of a switch is easy, in my opinion the cost should be drastically reduced if there is not a demonstration of the calculation of the budget.
  • I am willing to give it a chance but three conditions should be met: (i) the budget should be filled with more details and justification of the expenditures should be provided, (ii) the project should be reviewed by WMF developers and at least there should not be strong objections from them (iii) the project duration should be clearly specified and a detailed project timeline developed.
  • I would award full funding, if the grantee provides a more detailed budget and project plan
  • In the current state I would not support funding it. The two major concerns are lack of clear plan (timeline, budget etc.) and lack of user engagement strategy (work on needs of end users, usage targets, tests with potential users etc.). I might switch to support if there are significant changes made, either by reduction of budget (the level of detail is not what I would expect for a 40,000 USD project, while I would be fine with investment something like 5 times less given risks of this project), or by significant improvement of the proposal with more details on planning and user engagement.
This proposal has been recommended for due diligence review.

The Project Grants Committee has conducted a preliminary assessment of your proposal and recommended it for due diligence review. This means that a majority of the committee reviewers favorably assessed this proposal and have requested further investigation by Wikimedia Foundation staff.


Next steps:

  1. Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback and post any responses, clarifications or questions on this talk page.
  2. Following due diligence review, a final funding decision will be announced on May 11, 2018. Please note that this date is two weeks later than previously published in the round schedule, due to changes in scoring and proposal review this round to allow for additional staff feedback on technical proposals.

Questions? Contact us.



Reply

I'll do some haphazard replies over the next few days. Glrx (talk) 21:20, 19 April 2018 (UTC)

Using switch

Using switch for translations is not a new idea in MediaWiki. SVG has had the switch element and the systemLanguage attribute for a long time. I put the switch-translated File:First Ionization Energy.svg on Commons over 5 years ago, but one had to load the SVG (not the rendered PNG) into a browser to get the benefit of the translations.

Several months after that, MW started having switch-translated support in the form of [[File:xyz.svg|lang=de]].

So MW has had the functionality for a long time.

The translation tool that came out, SVGTranslate, did not use switch but rather spun off copies. A subsequent tool, Translate SVG Extension, uses switch but apparently never saw much use.

Using switch has some advantages, but it also has disadvantages. Currently, it is the only method in MW to support multiple translations with a single SVG file. Glrx (talk) 21:20, 19 April 2018 (UTC)
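For readers who have not seen the mechanism, the following sketch builds a switch element with one text child per language plus an untagged fallback, and then picks a child roughly the way an SVG renderer would. It is written in Python for illustration only; the function names are hypothetical, and the selection logic is a simplified reading of the SVG systemLanguage rule (first child whose language list matches the user's language exactly or as a prefix followed by a hyphen), not librsvg's implementation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_switch(translations: dict, fallback: str) -> ET.Element:
    """Build <switch> with one <text systemLanguage=...> per translation."""
    switch = ET.Element(f"{{{SVG_NS}}}switch")
    for langtag, string in translations.items():
        text = ET.SubElement(
            switch, f"{{{SVG_NS}}}text", {"systemLanguage": langtag}
        )
        text.text = string
    # The last child has no systemLanguage, so it always matches.
    default = ET.SubElement(switch, f"{{{SVG_NS}}}text")
    default.text = fallback
    return switch


def select(switch: ET.Element, user_lang: str) -> str:
    """Return the text of the first child that matches user_lang."""
    user = user_lang.lower()
    for child in switch:
        tags = child.get("systemLanguage")
        if tags is None:
            return child.text or ""
        for tag in (t.strip().lower() for t in tags.split(",")):
            if tag and (user == tag or user.startswith(tag + "-")):
                return child.text or ""
    return ""


example = make_switch({"de": "Lunge", "fr": "poumon"}, "lung")
```

Because the first matching child wins, the untagged fallback has to come last; placed first, it would shadow every translation.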

Existing tools

The target tool can be viewed as an SVG string editor. Its goal is to focus on encouraging ordinary editors to supply translations without learning how to use a graphics editor.

I don't think it is reasonable to attempt to fix the existing tools SVG Translate or Translate SVG Extension.

The great feature of SVG Translate is a simple user interface. It presents the strings to translate to the user, and it lets the user fill in a form with the translated strings. In the proposal I gave examples of SVG Translate's weaknesses: it just grabs substrings; it has no idea of translation units. For example, "ATLANTIC" and "OCEAN" should not be treated as separate strings but rather a unit. It would be better if the tool allowed a user to say that "ATLANTIC" and "OCEAN" should be merged. Such strings are often neighboring text elements in the SVG. That's a feature I want to add; it can be done in some situations, but not in others. For example, graphic artists might make the graphic "H2SO4" by placing the string "H SO" on the page, change to a smaller font size, and then place a "2" and a "4" on the page. Consequently, there are three strings on the page, but reconstructing a single text element is not a simple task. (And, in the "H2SO4" case, something that does not need translation. There are expressions where European languages use Uout for an output voltage while American English uses Vout.)

SVG Translate is not a good candidate for such programming changes. SVG Translate does not use the Document Object Model (DOM) to manipulate the SVG image; instead it does string manipulation. An SVG file is an XML file, and it makes more sense to manipulate the structure rather than the characters.

Also, SVG Translate has the copy problem; its goal was to generate a new copy of the file rather than insert switch.

Translate SVG is an improvement. Instead of string manipulating the SVG file, it uses the DOM. IIRC, the code is server-side PHP. The DOM is limited, and there are some strange manipulations that look for SVG elements in either the SVG namespace or the null namespace. That's just odd. Translate SVG's architecture has its own copy problem. Not only does the SVG file have all the translations embedded in switch statements, but the translations are also duplicated in the Translate extension subdirectories. That's a strange design choice; keeping two copies of something is not a good idea. It is not the typical translation model used by companies such as Tektronix, nor is it the model that MW uses in its other translations. The typical model would have a skeleton SVG file; there would be a program that would take the translation units in a subdirectory (e.g., ./de/*) and merge those translations into the skeleton SVG file to produce a German version of the SVG file. That is the better design choice than dual copies, but it is not how MW currently functions.
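The skeleton-plus-translations model described above can be sketched as follows. This is an illustration in Python under stated assumptions, not how MediaWiki or the Translate extension works: the id-keyed unit layout, the sample strings, and the function name merge are all hypothetical, and the dictionary stands in for translation units that would live in a subdirectory such as ./de/.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical skeleton: each translation unit is a text element with an id.
SKELETON = ET.fromstring(
    f'<svg xmlns="{SVG_NS}">'
    '<text id="unit1">lung</text><text id="unit2">heart</text>'
    "</svg>"
)

# Stands in for the translation units stored under ./de/ (illustrative values).
GERMAN_UNITS = {"unit1": "Lunge", "unit2": "Herz"}


def merge(skeleton: ET.Element, units: dict) -> ET.Element:
    """Return a localized copy of the skeleton; the skeleton is not modified."""
    localized = copy.deepcopy(skeleton)
    for text in localized.iter(f"{{{SVG_NS}}}text"):
        unit_id = text.get("id")
        if unit_id in units:
            text.text = units[unit_id]
    return localized


german = merge(SKELETON, GERMAN_UNITS)
```

In this model the translations exist in exactly one place, and the localized file is a build product that can be regenerated whenever the graphic or a translation changes.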

There are also missing characteristics of the MW translate extension. Although I have found general methods for FFS (File Format Support), I have not found an API entry point for grabbing such files (such as XLIFF); see mw:Help:Extension:Translate/API. One can grab a PO file from a translate page, but one must have special privileges to submit a PO file.

In contrast, I'm proposing a client side tool. It can make use of a more sophisticated DOM. Code can find the text elements for "ATLANTIC" and "OCEAN", use the SVG DOM to calculate the bounding boxes (.getBBox), expand the boxes, and see if they intersect. That's a potential merge. Alternatively, if the unexpanded bounding boxes overlap, then we may have a poorly synthesized string such as H2SO4.
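
A minimal sketch of the bounding-box test described above. It works on plain rectangles in the shape that .getBBox returns ({x, y, width, height}), and it assumes both boxes are already in the same coordinate system, which is not guaranteed when the text elements carry different transforms. The function names, the margin value, and the sample boxes are illustrative assumptions, not part of any existing tool.

```javascript
// Grow a box by `margin` user units on every side.
function expandBox(box, margin) {
  return {
    x: box.x - margin,
    y: box.y - margin,
    width: box.width + 2 * margin,
    height: box.height + 2 * margin,
  };
}

// True when two axis-aligned boxes share some area.
function boxesIntersect(a, b) {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Classify a pair of text boxes:
//   "overlap"  - the raw boxes already intersect (possibly a pasted sub/superscript)
//   "merge"    - only the expanded boxes intersect (possibly one translation unit)
//   "separate" - neither
function classifyPair(a, b, margin) {
  if (boxesIntersect(a, b)) return "overlap";
  if (boxesIntersect(expandBox(a, margin), expandBox(b, margin))) return "merge";
  return "separate";
}

// Hypothetical boxes; in a browser they would come from element.getBBox().
const atlantic = { x: 100, y: 50, width: 80, height: 12 };
const ocean = { x: 115, y: 66, width: 50, height: 12 };
console.log(classifyPair(atlantic, ocean, 4));
```

The result is only a hint for the user interface: a "merge" pair would be offered to the translator as a candidate unit rather than merged automatically.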

None of the existing or proposed tools would deal with problematic SVG files. The proposal shows a glow discharge illustration that has crowded, multiline, labels. The illustration needs to be fixed in a graphics editor. Glrx (talk) 22:25, 19 April 2018 (UTC)

Not only does the SVG file have all the translations embedded in switch statements, but the translations are also duplicated in the Translate extension subdirectories

This still makes no sense to me. If you are talking about the Translations namespace in the wiki, that is explained in the documentation. If you are talking about the i18n directory, that is translations of the interface messages of the translation tool itself.

I have not found an API entry point for grabbing such files

That is not how Translate works. There is another layer called message groups which are explained in the documentation. – Nikerabbit (talk) 08:25, 23 April 2018 (UTC)

Timeline

I view the proposal as a research and development project. I see it as 4 to 8 months. I want a free charter. Glrx (talk) 22:27, 19 April 2018 (UTC)

Direction

Commenters seem to miss a major point. Although translations can be done string-to-string, Wikidata offers an alternative for some (restricted) strings: a string-to-Q-item.

In December, User:Perhelion took a single-language SVG file and turned it into the multilingual File:Copper electroplating principle (multilingual).svg, a file that is used on 28 wikis. The file only needed translations for "anode" and "cathode" (other text elements on the page are chemical formulas); Perhelion provided languages en, de, eu, es, ru, mn in a planar switch translation. Perhelion reduced the file size from 16 kB to 3 kB, a size that might be reasonable to serve directly.

A planar translation (which uses one switch for the entire file) does not isolate translation units, so I edited the file to use a switch for each translation unit ("anode" and "cathode"). Using Wikidata for anode and cathode, I raised the number of languages to 20 (including Hindi and Arabic, but the Vietnamese translation falls outside the image margin). While at it, I also added metadata for the CC0 license and attribution. The file also points back to Commons even if copied elsewhere. The file is 6 kB. (I also did an odd transform adjustment.)
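
As an illustration of one switch per translation unit, the sketch below builds the markup for a single unit from a language-to-label map. It is string-based only so that it runs outside a browser; the function names and the coordinates are assumptions for illustration, and the markup in the actual file may differ. The child without a systemLanguage attribute is placed last so that it serves as the default when no language matches.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one <switch> for one translation unit.
// `labels` maps language tags to translated strings; `fallback` is rendered
// when no systemLanguage matches. `x` and `y` are placeholder coordinates.
function buildSwitch(labels, fallback, x, y) {
  const lines = ["<switch>"];
  for (const [lang, label] of Object.entries(labels)) {
    lines.push(
      `  <text x="${x}" y="${y}" systemLanguage="${escapeXml(lang)}">${escapeXml(label)}</text>`
    );
  }
  // The child without systemLanguage goes last so it acts as the default.
  lines.push(`  <text x="${x}" y="${y}">${escapeXml(fallback)}</text>`);
  lines.push("</switch>");
  return lines.join("\n");
}

console.log(buildSwitch({ de: "Anode", es: "ánodo" }, "anode", 40, 120));
```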

I don't think translating two words (anode and cathode) is a big deal, but the file demonstrates the possibility of using Wikidata to do shotgun translations when there is a Wikidata item. Maps and medical diagrams should have a high incidence of Q-item labels.
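
The string-to-Q-item idea can be sketched with the Wikidata wbgetentities API, which returns an item's labels keyed by language. The helper names below are hypothetical, and the item id "Q0" in the sample is a placeholder rather than the real id for anode; the sample object is hand-written in the shape of the API response, not fetched data.

```javascript
// Build a wbgetentities request URL for the labels of the given items.
function labelRequestUrl(ids, languages) {
  const params = new URLSearchParams({
    action: "wbgetentities",
    ids: ids.join("|"),
    props: "labels",
    languages: languages.join("|"),
    format: "json",
    origin: "*", // anonymous cross-origin request from a browser
  });
  return "https://www.wikidata.org/w/api.php?" + params.toString();
}

// Reduce one entity in the response to a plain { language: label } map.
function labelsByLanguage(response, id) {
  const entity = response.entities && response.entities[id];
  const result = {};
  if (!entity || !entity.labels) return result;
  for (const [lang, entry] of Object.entries(entity.labels)) {
    result[lang] = entry.value;
  }
  return result;
}

// Hand-written sample in the response shape; "Q0" is a placeholder id.
const sample = {
  entities: {
    Q0: {
      labels: {
        en: { language: "en", value: "anode" },
        de: { language: "de", value: "Anode" },
      },
    },
  },
};
console.log(labelRequestUrl(["Q0"], ["en", "de"]));
console.log(labelsByLanguage(sample, "Q0"));
```

A label is not always a good translation for a diagram (labels can be long or differently capitalized), so the result would still need review by a translator.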

Inkscape

I don't use Inkscape, I'm not interested in using it, and there are plenty of Inkscape users on Commons.

That does not mean those Inkscape users can comment on or predict the interactions that Inkscape will have with a translation tool. Inkscape has an SVG optimization tool, and one of its optimizations is removing all unused id attributes. Does running that tool confuse Translate SVG Extension by removing those id attributes used to match translations?

In other words, Inkscape should be investigated for its potential interactions.

We can certainly mark all switch-translated files with something like {{Hand-edit only / Do not edit with a graphics editor}}, but that would be a poor result. One of the reasons for having just one copy of a file is so improvements to an image benefit all language versions of that file. Locking out graphic editors would make graphic improvements difficult. We want to get away from hand-editing SVG files.

Inkscape is used by many on Commons, but Inkscape SVG files strike those who look at the XML as bloated, overspecified, and misapplied. What will Inkscape do to files that have been optimized or files that use class attributes and CSS? That requires testing.

Graphic artists on Commons are often not very skilled. Many illustrations use superscripts and subscripts, but many files on Commons merely paste text strings in about the right place to achieve the visual result. That's what File:Copper electroplating principle (multilingual).svg did for the charged electrons and ions. I fixed them to use tspans, but what will Inkscape do with the new strings? Inkscape does have the ability to toggle subscripts and superscripts, but can it recognize subscripts and superscripts in other SVG files? If it does, will it rebloat the file when it is saved?

Round 1 2018 decision

This project has not been selected for a Project Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding, but we hope you'll continue to engage in the program. Please drop by the IdeaLab to share and refine future ideas!

Comments regarding this decision:
The committee was supportive of work on SVG files, but they ultimately decided against funding this project. The decision was informed by the history of conflict with WMF technical staff and the friendly space issues noted in Phabricator. Additionally, because of the close integration of WMF technical staff in the software proposals funded through WMF grant programs, the grants program is not able to meet the applicant’s request not to engage with certain staff at all during execution of the project.

Next steps:

  1. Visit the IdeaLab to continue developing this idea and share any new ideas you may have.
  2. To reapply with this project in the future, please make updates based on the feedback provided in this round before resubmitting it for review in a new round.
  3. Check the schedule for the next open call to submit proposals - we look forward to helping you apply for a grant in a future round.

Questions? Contact us.


The proposer seems to have strong technical opinions and possibly some issues communicating them (see above on terminological confusion and lack of clarity around plans), but I don't think that "friendly space issues noted in Phabricator" is a sensible (co)reason to deny this funding. Technical reasons were sufficient. --Nemo 19:41, 18 May 2018 (UTC)