Grants:Project/Glrx/SVG i18n

status: not selected
SVG i18n
summary: Develop a tool for multilingual, switch-translated SVG files.
target: SVG diagrams and labeled images on Commons
contact: Glrx
created on: 19:16, 19 September 2017 (UTC)

Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

The goal of this project is to create an SVG translation tool for multilingual SVG files using SVG's switch element. The tool would probably have 2 modes: Prepare and Translate. In the Prepare mode, the tool would allow a suitable SVG file to be marked for translation by an editor who knows how switch translations work. In the Translate mode, the tool would allow users to add or edit translations to the file. The Prepare mode is needed to make sure the file is a reasonable candidate for switch translation. For a simple description of how switch translations are done, see Commons:Help:Translation tutorial#Using the same file; Commons:Commons:Translation possible/Learn more#Multiple translations within one SVG file.

See also 2017 Community Wishlist Survey/Results item 9; 2017 Community Wishlist Survey/Multimedia and Commons/SVG-Translate; Community Tech/SVG translation, mw:Extension:TranslateSVG, mw:Extension:TranslateSVG/2.0.

Wikipedia projects have international reach, but the diagrams the projects use are primarily English. Imagine a Serbian or Chinese high school student reading an article in her own language, but the diagrams that illustrate that article are all in English. File:Digestive system showing bile duct.png is a bitmap image with English labels that is used on 43 wikis (there are Arabic and Italian translations).

Sadly, there are few illustrators translating images. There are many readers, but few editors. There are fewer illustrators, and even fewer non-English illustrators. Consequently, there is a need for simple tools to translate diagrams — tools that need translation skill but not graphics skills.

SVG files, because they may contain text rather than painted characters, are easier to translate than bitmap images. The tool SVG Translate was developed to translate such SVG files. The tool uses string matching rather than the w:Document Object Model (DOM) to find text and tspan elements, presents their contents as an intuitive HTML form for translation, and then substitutes the resulting translations into a copy of the original SVG file. The translator does not need to know how to use a graphics editor. The hope is the translated file would be the finished result, but if there are problems, the file can be edited in a conventional graphics editor. The tool has been successful despite having long periods of being broken.

SVG Translate has the disadvantage of creating new copies of the image for each translation, and that introduces a problem of maintaining the copies. An improvement in one of the copies does not propagate to the other copies, so the copies diverge from being simple translations of the original.

For example, the image File:Bicycle diagram-en.svg has dozens of translations (each image being 350kB), but many of those translations have diverged. Color schemes have changed, and there have been other small image changes. All of the images could be improved with a more accurate rendition of the bicycle wheel's spokes.[1]

For another example, File:Oceanic basin.svg is an English diagram that is used on 22 wikis; it has been copied and translated into 8 other languages. Unfortunately, the diagram does not properly distinguish the continental rise from the continental slope: both are shown with the same pitch. Compare the file to other sources that show the continental rise as a gentle slope built from the debris that has fallen down the steeper continental slope.[2] Nine images (the original and its eight translations) need to be fixed to correct that error.

SVG Translate solves a translation problem but creates a copy maintenance problem.

It would be better if there were one image. An improvement to that image (such as reducing its file size) would then propagate to all language versions. A single image does place some restrictions on the original; for example, text areas must be big enough for all translations. It is simpler and easier if every label in the image is on a single line so no line breaking is required. That is what companies such as Tektronix have done with their SVG images.[3] Tektronix uses XLIFF translation files to make localized images for each language. XLIFF is not a workflow that MediaWiki supports for SVG files.

Instead of localizing by creating independent copies, SVG files may be internationalized using the switch element. MediaWiki supports switch element translations, but with some significant problems. The software (librsvg) that MediaWiki uses to render SVG into the PNG that is served has problems. For example, it does not handle hyphenated IETF langtags correctly (Phab:T125710); it also has problems displaying right-to-left text and top-to-bottom text. A tool, Commons:Commons:Commons SVG Checker, detects some of the problems.

Despite those problems, MediaWiki offers reasonable support for switch-translated SVG files. Commons:Category:Translation possible - SVG (switch) has about 1100 files. Commons:Category:Translation possible - SVG has about 7800 files that can be translated (some of these files are translated siblings). There are more than 20,000 transclusions of {{Convert to SVG}}; once converted to SVG, many of those files would be candidates for translation to many languages. There are also lists that encourage SVG conversion: for example, Commons:Top 200 biology images that should use vector graphics.

Bitmap images do not need to be converted to vector graphics to make use of switch translation. There are translations that embed a bitmap image in an SVG file and then overlay that image with SVG text labels. Such an SVG-labeled image is easily translated into many languages. See Grants:IEG/Health images for all, which advocates such SVG labeling. See File:Defecating into a pit (raster).svg, a file with many translations. The file could be switch translated, but instead each translation is a separate JPEG file.

Switch-translated files are not the perfect solution. Such files work well when translating a few phrases into several languages, but they become bloated when translating many phrases into many languages. For example, File:Map of USA with state names.svg translates 50 state names into over 100 languages. The file may carry 400 kB of translation information (which would be 5,000 translations at 80 bytes each). With so much overhead, it is unlikely that WMF would want to serve such an SVG file. (The bloated size of many SVG files is not an issue today because WMF serves PNG renditions.)

There are also issues with subsequent editing of switch translated files. Existing tools may have problems with switch-translated SVG files.

Graphics editors such as Inkscape and Adobe Illustrator may have trouble with the files or present a poor interface for editing the switch clauses.

SVG Translate does not support switch-translated files.

The typical switch-translated file is modified in a text editor, but most editors do not have the skills to hand edit raw SVG files. Using a text editor has caused problems because users may not understand the nuances of the switch element. One user put his translations after the default clause, so they would never appear. Other editors have misunderstood other ordering issues or the advantage of a default clause. A tool can enforce simple rules to avoid such problems. In addition, tools such as SVG Translate are far less daunting than editing raw SVG.

There are some other issues with MediaWiki, too. If somebody inserts a German translation into an SVG file, the German translation does not appear even though the file is transcluded on de.WP. For the translation to appear, an editor must add lang=de to the transclusion. In the future, MediaWiki may default the lang to that of the local wiki, but that is not a sensible approach right now because MediaWiki would end up creating many localized PNG files that are identical to the English localization.

In the future, MediaWiki may want to serve small SVG files directly rather than converting them to PNG format first. Bloated SVG files (e.g., those over 100kB) may not serve efficiently. In the long run, serving SVG files allows tooltips, animation (instead of using GIFs), and avoids the need for image map linking.

Another reason for serving SVG directly is the set of bugs and missing features in librsvg, the program MediaWiki uses to convert SVG files to PNG. As time goes on, these problems will be more painful. Commons has many maps, and SVG map illustrators want to use the textPath element to label rivers and streets because rivers and streets do not always follow straight lines, but librsvg does not support textPath. In addition, librsvg mishandles right-to-left text and makes top-to-bottom Chinese unreadable. Such errors tempt illustrators to "fix" the problem by converting text to paths (i.e., drawing each character as a geometric shape rather than as text), but that has the side effect of bloating the SVG file size and preventing further translation. The bloated SVG files are also less likely to ever be served directly by WMF. Alternatively, the textPath problem may be "fixed" by saving the image as a bitmap, which also makes the translated text inaccessible.

What is your solution to this problem?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.

This (PNG) image is poorly laid out for translation. It has multiline labels with little space. It is a poor candidate for translation and should be rearranged. The image File:Glow discharge structure - English.svg is a little better, but some text placement has been done with repeated leading spaces.

The first step is to assume illustrators have made simple, clean, structured, SVG files. That is a standard industry practice: make the illustration easy to translate. The advantage to the illustrator is her work becomes more accessible. For example, assume each label is on a single line. (SVG Translate implicitly makes this assumption.)

It should be easy to extract translation units from such an SVG file. Each translation unit should be in its own text element; if there are multiple lines, then those lines should use tspan elements for each line. (Many files do not follow that practice.) If each translation unit is in a text element, then it is easy to wrap a switch element around the text and offer other translations.

The translation units in this illustration are embedded in text elements. That can be checked by loading the SVG file in a browser, selecting all the text, and pasting that text into a text editor. The flaw is the translation unit is "ATLANTICOCEAN" (the space is missing).

For example, File:100 Years War France 1435.svg keeps the translation units together even though it uses multiple lines. However, there are still problems. Obtaining the textContent of one text element produces "ATLANTICOCEAN" (written solid). It would be better if the text element included spaces so it provides "ATLANTIC OCEAN". Still better would be "Atlantic Ocean", the traditional capitalization of the proper noun. That result can be obtained while still providing the same visual display with CSS styles (text-transform: uppercase). That is something to ask illustrators to do.

SVG Translate does not try to pull translation units; it extracts the smallest pieces, so it breaks up translation units:

Some files that are already switch-translated do not keep translation units together. Some files have a "planar" organization: there's a single switch element, each language is contained in a single text element, and each translation unit is made of one or more tspan elements. Such an organization makes it difficult to identify translation units, and the translations are grouped by language rather than by common translations. An example of a planar organization is File:Epicenter Diagram.svg. Such an organization should be deprecated. The file should be reorganized before letting it be further translated. I intend to investigate some issues with identifying and remediating such files, but offer no promises.

It would also be good if the graphic artist (or one preparing the diagram) identified items that need not be translated. In industry, that is done with the w:Internationalization Tag Set (ITS).[4] The ITS translate attribute has made its way into HTML 5[5] and may be in SVG 2.0.[6] For many diagrams, numbers and particular symbols could be marked as do-not-translate. Applying SVG Translate to a switch-translated file with two thermometers and poorly set formulas produces many items to translate, and many of those items (e.g., numbers) do not need translation:

The graphic artist must also exercise some care in designating text anchors. Many graphic artists might draw a leader line, insert a descriptive label with left-anchored text, and then adjust the position of the text to match the leader line. If such a label is replaced with translated text that is substantially shorter or longer, then it may look odd. Text should be left-, middle-, or right-anchored to make most cases work correctly. The graphic artist should also try to provide lots of space to accommodate longer translations.

Now assume that the graphic artist has provided a clean, organized, SVG file. Call that a normal form SVG. Someone knowledgeable about switch translations could then Prepare the file. Text elements could be wrapped with switch elements, and then a marker could be inserted in the SVG file to signal it has been prepared. Further assume that the diagram labels are only single line, so no line-breaking is required. That would allow some code to read the file, use the DOM to pull out the translation units, allow a user to add or edit new translations, and then save the edited file.
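
The Prepare and Translate steps described above can be sketched in a few lines. This is only an illustration written in Python with the standard library's ElementTree rather than the proposed JavaScript tool; the function names (prepare, add_translation) are hypothetical, and the sketch assumes a normal form file in which every text element is one single-line translation unit.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix


def prepare(svg_source):
    """Wrap each text element that is not already inside a switch.

    Returns the parsed root and the list of extracted translation units.
    """
    root = ET.fromstring(svg_source)
    units = []
    # Snapshot the parents first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        if parent.tag == f"{{{SVG_NS}}}switch":
            continue  # children of an existing switch are left alone
        for index, child in enumerate(list(parent)):
            if child.tag != f"{{{SVG_NS}}}text":
                continue
            switch = ET.Element(f"{{{SVG_NS}}}switch")
            switch.tail, child.tail = child.tail, None
            parent.remove(child)
            switch.append(child)  # the original text becomes the default clause
            parent.insert(index, switch)
            units.append("".join(child.itertext()))  # like DOM textContent
    return root, units


def add_translation(switch, lang, translated):
    """Insert a translated clause before the default clause (the last child)."""
    default = switch[-1]
    attributes = dict(default.attrib)
    attributes.pop("id", None)  # avoid duplicating an id
    clause = ET.Element(default.tag, attributes)
    clause.set("systemLanguage", lang)
    clause.text = translated
    switch.insert(len(switch) - 1, clause)


source = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 50">'
    '<text x="10" y="20">Atlantic Ocean</text></svg>'
)
root, units = prepare(source)
add_translation(root[0], "de", "Atlantischer Ozean")
prepared_svg = ET.tostring(root, encoding="unicode")
```

Inserting new clauses before the last child keeps the default clause at the end, which is one of the simple rules a tool can enforce.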

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

Examine about 25 SVG files that are reasonable targets for switch-translation. For each file, identify the number of current translations, the number of inclusions, and the pageviews. For each translation unit, try to identify a Wikidata item. Comment on the issues of switch translating the file. Comment on coalescing translated sibling files (remerging diverging copies).

Write a JavaScript program that extracts translation units, asks for translations, and inserts them into the SVG file.

Extend the program to allow Wikidata item annotation within the SVG file. User Delphi234 identified Wikidata as a good source for diagram label translations. For some illustrations, many of the translation units may have Wikidata items. Let the user determine an appropriate Wikidata item. Let the user select translations from Wikidata item label and aliases.

Investigate recognizing non-normal form switch elements such as planar translations. Reasonable translation units will have a single text element with nearby/adjacent tspan elements. Unreasonable translation units occupy two or more text elements. Planar translations often have g elements with scattered text elements or text elements with scattered tspan elements.

Learn the capabilities of Inkscape. I don't know the program, but should learn what it does with switch translations, styles,[7] and minimization. From what I've seen in Inkscape SVG files, Inkscape misuses the style attribute. Inkscape does offer optimized output, but very few users select that option.[8][9]

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.

Examining files and developing working code would be signs of achieving goals. The output of examining files would be a report that lists the files examined and the issues that arise in them. The code development progresses from getting an equivalent to SVG Translate, to adding do-not-translate markers, to adding Wikidata items and offering Wikidata translations. More sophisticated issues are guessing whether a file is in normal form and being able to put some files into normal form.

Once the project is over, the impact would be editors using the tool to translate SVG images to their language. SVG Translate has had modest success along those lines. Although switch translations are inexpensive (adding some strings to a file rather than copying the entire file), they currently have a much steeper learning curve and are therefore less accessible. A tool would make switch translations more accessible. The {{Translation possible}} template can be changed to be more inviting; it could provide a link to start translating. It may also be feasible to include a translate link in the SVG file; a user clicks on a translate icon that links to the translation tool.

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.

The immediate project goals are research and building tools rather than increasing participation or improving content. There are no planned events or activities, so the three metrics are not appropriate. As a side effect, examining files and developing tools should improve some content by making some files easier to translate and adding some translations. Once the tools are available, there should be increased participation, but the project is not aimed at drumming up support now.

Ideally, there are a large number of files that can be effectively translated. However, even files that look promising often have not allowed for translations: their illustrators needed only to place text so that it looked right in their own language. Position or alignment may need adjustment. Leader lines may have to be moved. In many cases, those problems will be minor: users would probably want a diagram in their language with minor errors rather than one in a language that they don't understand.

Even with a working switch translation tool, there can be significant effort to ready a file for translation. Ideally, one can take a PNG file and convert it to vector format, but few users have that skill. I've tried many bitmap-to-vector programs, and the result has usually been poor. The problem has been compounded because many bitmap diagrams use gradient fills. Vector conversions can be difficult. Even if the vector conversion is avoided, the trick of keeping the bitmap and overlaying it with SVG text requires some graphics skills: the existing text must be erased, the background restored, and then the SVG text added.

Once a file has been prepped for translation, then the translations should be easy. In most cases, diagrams have terse labels that are not difficult to translate. Translating object references (such as "ocean") is an easier task than translating entire sentences.

Project plan


Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

Study files. Build a simple switch translation tool. Expand the capabilities of that tool.

Study some SVG files. Studying files can turn up odd twists: faked text-on-a-path,[10] embedded bitmap images, text converted to paths, incorrectly sorted langtags, and a failure to use symbols (see File:Electron affinity of the elements.svg, a switch-translated file).

File:Psychrometer2.svg is not in normal form. For example, the translation unit "wet bulb temperature" has been split into two switch elements instead of being a single switch that contains a single text element for each translation. Some languages need only one line (such as German: Feuchtkugeltemperatur), but some languages require two lines (French: température / de bulbe humide). It displays appropriately with librsvg because librsvg only allows one user agent language. Many browsers (user agents) allow the user to specify several language preferences; I specify en,de;0.9,fr;0.8. When I display the SVG in my browser, I get the German first line and the French second line: Feuchtkugeltemperatur / de bulbe humide.[11] Making translation units atomic (display none or display all) avoids the problem. Furthermore, it suggests that languages should be in the same order in all switch elements (but SVG 2.0 using SMIL allowReorder semantics[12] should make the clause order less of an issue).
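
The mixed-language result can be reproduced with a small model of how a user agent evaluates a switch: it takes the first child whose systemLanguage matches any of the user's languages, and a child without the attribute acts as the default. The sketch below is in Python, follows the SVG 1.1 matching rule (exact match, or a user language that is a prefix followed by "-"), and uses hypothetical clause lists; the actual clause order and the default strings in File:Psychrometer2.svg are assumptions.

```python
def language_matches(system_language, user_languages):
    """SVG 1.1 rule: a user language equals a listed tag, or is a prefix of it followed by '-'."""
    tags = [tag.strip().lower() for tag in system_language.split(",") if tag.strip()]
    for user in (language.lower() for language in user_languages):
        for tag in tags:
            if tag == user or tag.startswith(user + "-"):
                return True
    return False


def choose_clause(clauses, user_languages):
    """Return the text of the first clause that matches; None marks the default clause."""
    for system_language, text in clauses:
        if system_language is None or language_matches(system_language, user_languages):
            return text
    return None


# Hypothetical reconstruction: one translation unit split over two switches,
# with the languages listed in a different order in each switch.
first_line = [("de", "Feuchtkugeltemperatur"), ("fr", "température"), (None, "wet bulb")]
second_line = [("fr", "de bulbe humide"), ("de", ""), (None, "temperature")]

single = [choose_clause(line, ["de"]) for line in (first_line, second_line)]
mixed = [choose_clause(line, ["en", "de", "fr"]) for line in (first_line, second_line)]
```

With a single language the two switches agree; with several preferred languages each switch independently stops at its first matching clause, so the lines can come from different languages.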

File:Map tonkin autumn 1947.jpg isn't used in many places, but it shows some interesting issues. The map was drawn by hand, but it was overlaid with English text. The map would have been a good hybrid target: a bitmap overlaid with SVG text. The battle was between the French and the Vietnamese, but the diagram is in English. Most of the labels have easy translations, but the time period raises an issue. The map's "China" is not the People's Republic of China (Q148), which was formed in 1949; China (Q29520) (the region) is a little better, but the Vietnamese translation (Trung Quốc (khu vực)) includes a restriction. Republic of China (Q13426199) is the country during 1947; its translation is Trung Hoa Dân Quốc.

Studying files also shows problems. File:Average prokaryote cell- unlabled.svg is both interesting and sad. An early version of a prokaryote cell was drawn, and that drawing was translated to several other languages. An updated version of the drawing was done, and that drawing was independently translated to several other languages. Some translations of the early drawing (e.g., Turkish and Serbian) did not make it to the later drawing. Furthermore, some of the drawings, although being SVG files, have their text converted to curves. Some translations, such as Chinese and Tamil, have been saved as PNG files, surrendering all the benefits of SVG.

At least 6 translated copies have spawned from File:2008 South Ossetia war en.svg. The original file uses paths instead of text and is 2 MB; it is not a good target for translation. File:2008 South Ossetia war nl.svg is a competent translation of the map that uses text and is less than 1 MB. It is a good target for translation. Another would be File:2001 Macedonia insurgency.svg. Maps are rich with Wikidata items.

It would be better if these files were not separated and instead merged back into the common skeleton and the multiple translations kept as text. There are about 10 translation units in the Tonkin map, and they are single-line terms. For the files whose text has not been converted to curves or bitmaps, the text can be copied to a single switch-translated file. The transclusions can then be edited to point to the switch translation and include the appropriate langtag.

Part of this effort would build a tool that slightly extends the API's globalusage with pageview statistics. The globalusage shows that an image is transcluded on particular pages on various wikis. One can see that information on the image description page. It does not reflect how often the image is actually viewed. Summing the page views for each wiki page would indicate the value of translating an image. An image that is viewed thousands of times per month is riper for translation than one that is viewed only a dozen times. The pageview frequency is missing from lists such as Commons:Top 200 biology images that should use vector graphics; the list states an image is included in many pages, but it does not state the image is frequently viewed. The pageview calculation is expensive.
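
A minimal sketch of the summing and ranking step, in Python. It assumes the globalusage entries and the per-page view counts have already been fetched; no API call is made here, and the file names, page titles, and counts are made up for illustration.

```python
def total_views(usages, monthly_views):
    """Sum the monthly views of every wiki page that transcludes an image."""
    return sum(monthly_views.get(page, 0) for page in usages)


def rank_by_views(usage_by_file, monthly_views):
    """Return (file, views) pairs, most viewed first."""
    totals = {name: total_views(pages, monthly_views) for name, pages in usage_by_file.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: (wiki, page title) pairs and made-up monthly view counts.
usage_by_file = {
    "Example diagram B.svg": [("fr.wikipedia", "Vélo")],
    "Example diagram A.svg": [("en.wikipedia", "Ocean"), ("de.wikipedia", "Ozean")],
}
monthly_views = {
    ("en.wikipedia", "Ocean"): 9000,
    ("de.wikipedia", "Ozean"): 2500,
    ("fr.wikipedia", "Vélo"): 12,
}
ranking = rank_by_views(usage_by_file, monthly_views)
```

The expensive part is collecting monthly_views for every transcluding page, not the sum itself, so a real tool would probably cache those counts.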

Build a JavaScript translation editor with similar translation functionality to SVG Translate. It would extract phrases to translate and allow the user to enter translations. Instead of producing a separate file as SVG Translate does, it would use the switch element. Caveat: Chrome and Edge are not suitable for some tasks. Currently, neither handles the systemLanguage attribute correctly.

Experiment with detecting troublesome issues. Initially, the tool would just be appropriate for normal form SVG files with single-line labels. Imagine that the SVG has already been prepped for translation. It is desirable that a file use translation units, but many files do not. Consequently, before a file is translated, some split translation units would have to be coalesced. For example, if the tool extracted "ATLANTIC" and "OCEAN" separately, the user should be able to tell the tool to join the two words into a single translation unit. That takes us away from single-line translations, but it is important to do.

There are SVG files that already do switch translations but use planar translations. Each language is a separate layer, and the words in that layer form many translation units. I'd like to detect such SVG files. A normal form switch would have just text children. If there were g or other child elements, then a planar translation is indicated. In addition, if the text element has widely separated tspan elements, then a planar translation is indicated. There's a significant question about how common planar translations are. A category check is possible.
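
The first of those checks (a normal form switch has only text children) can be sketched as follows in Python with the standard library's ElementTree. The function names are hypothetical, and the second heuristic, widely separated tspan elements, is not attempted here because it needs a notion of distance between positions.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"


def looks_planar(switch):
    """Flag a switch that has any child other than a text element (for example, g)."""
    return any(child.tag != SVG + "text" for child in switch)


def planar_switches(svg_source):
    """Return the switch elements in a file that are not in normal form."""
    root = ET.fromstring(svg_source)
    return [switch for switch in root.iter(SVG + "switch") if looks_planar(switch)]


normal = (
    '<svg xmlns="http://www.w3.org/2000/svg"><switch>'
    '<text systemLanguage="de">Ozean</text><text>ocean</text>'
    '</switch></svg>'
)
planar = (
    '<svg xmlns="http://www.w3.org/2000/svg"><switch>'
    '<g systemLanguage="de"><text>Ozean</text><text>Meer</text></g>'
    '<g><text>ocean</text><text>sea</text></g>'
    '</switch></svg>'
)
normal_hits = planar_switches(normal)
planar_hits = planar_switches(planar)
```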

Another activity is extending the tool to do annotations. A simple annotation is to mark some text as do-not-translate. To first order, text such as numbers ("100") and chemical formulas ("NaCl") do not need to be translated. The tool should allow the user to mark the text so it is not translated.
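
A first-order heuristic for suggesting do-not-translate labels might look like the following Python sketch. The two regular expressions are assumptions, not a specification: the formula pattern accepts any run of capitalized one- or two-letter symbols with optional counts, so it also fires on short words such as "He" or "No", which is why the user must be able to override the suggestion.

```python
import re

# Plain numbers, optionally signed, with "." or "," separators and a trailing percent sign.
NUMBER = re.compile(r"^[+-]?\d+(?:[.,]\d+)*\s*%?$")
# Runs of element-like symbols with optional counts, e.g. NaCl or H2O (no validation of symbols).
FORMULA = re.compile(r"^(?:[A-Z][a-z]?\d*)+$")


def suggest_no_translate(label):
    """Return True when a label looks like a number or a chemical formula."""
    text = label.strip()
    return bool(NUMBER.match(text) or FORMULA.match(text))


suggestions = {label: suggest_no_translate(label)
               for label in ["100", "NaCl", "H2O", "Atlantic Ocean", "shoulder"]}
```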

A more exotic notation is to mark a label as a Wikidata item. Many diagram labels correspond to Wikidata items. In the map above, we have Atlantic Ocean (Q97) and Mediterranean Sea (Q4918). Here is a searchentities API query for the string "Atlantic Ocean". SVG 2.0 has signaled it will include data-* attributes;[13] we might use data-wd-q="Q97" within the SVG file.

Wikidata items give us translation and verification options. From Atlantic Ocean (Q97), each French label or alias is a potential French translation. A language translation can also be compared to labels and aliases to see if they match. A small edit distance would indicate a good match.
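
The comparison against labels and aliases can be sketched with a plain Levenshtein edit distance. The Python below is self-contained; the candidate strings stand in for labels and aliases that would come from a Wikidata item and are hypothetical, and the case-insensitive comparison is a design assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_label(translation, candidates):
    """Return (candidate, distance) for the label or alias nearest to the translation."""
    scored = [(edit_distance(translation.casefold(), c.casefold()), c) for c in candidates]
    distance, best = min(scored)
    return best, distance


# Hypothetical French labels and aliases for an ocean item.
best, distance = closest_label("Ocean Atlantique", ["océan Atlantique", "Atlantique"])
```

A distance of zero confirms the translation; a small nonzero distance (here a missing accent) is a hint that the user may want to adopt the Wikidata spelling.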

I'd also like to be a little bit radical and suggest a house style for Commons diagrams. Wikimedia should adopt lowercase labels so the spelling is a case-sensitive match to a Wikidata item or alias. For example, the content of a text element should be "shoulder" rather than the usual title case "Shoulder"; the text should be "Atlantic Ocean" (a proper name) rather than "ATLANTIC OCEAN". If the graphic wants the string displayed in uppercase, then CSS should be used to achieve that result. That approach is the role of CSS: selecting the presentation; SVG can use CSS just like HTML. If you copy this apparently uppercase italic string Atlantic Ocean, it will paste as a lowercase string; CSS has been used to uppercase, italicize, and color the phrase.

I'm not sure that all graphic editors can handle CSS (CorelDraw may; Inkscape probably not), but the goal should be this: Atlantic Ocean (Q97) is an instance of (P31) ocean (Q9430), so a map maker sets class="ocean", and CSS renders the text with an appropriate style:

text.ocean   {text-transform: uppercase; font-style: italic; fill: blue;}
text.sea     {                           font-style: italic; fill: blue;}

Inkscape is used by many WP graphic artists, but I have not used it and do not know its limitations. An activity would be understanding its impact on internationalization and directly serving SVG. Study how it handles switch elements, class attributes and CSS, RDF, and entities.

Generally, there is a notion that Inkscape and Illustrator should not be used on some SVG files. See, for example, {{NoInkscape}}, which has 136 transclusions. There is a fear that using a graphic editor on an SVG file may damage the file. I've looked at several SVG files, and some oddities pop up; they may or may not be due to a graphics editor. A file that used internal subset entities has had all the entity references substituted. A switch-translated file lost all its translations when a user added a new translation. Font size specifications on a single element have differed between the attribute and style versions.

After a file has been translated and annotated, it may still need graphic editing. Some translations may need more room, a leader line might be moved, and more details may be added. There should be some understanding of what Inkscape will do to such a file even if Inkscape is inappropriate.

Inkscape output is usually bloated with inkscape: and sodipodi: notes, no hierarchy, graphics context embedded in individual style attributes, and extraneous properties. Inkscape files usually show no evidence of a grid: objects are placed with absurd pixel precision. Human editors do not follow the advice at Commons:Help:Inkscape. (There is similar advice for reworking Adobe Illustrator at Commons:Help:Illustrator.) Consequently, many SVG files are unreasonably bloated.

File:Woman surface diagram ahead-behind dark skin.svg is an elegant but simple picture, but the SVG is over 100 kB. The file has many path elements where the style attribute requires more bytes. Annotations in inkscape: and sodipodi: namespaces can be discarded easily. SVG semantic nonsense is more difficult. For example, Inkscape often specifies that an element is not stroked, but then Inkscape goes on to specify characteristics of the stroke. There are existing tools that try to clean up and optimize SVG files. Investigate svgo[1] and scour.[14]

Measure effectiveness.

In addition, SVG housekeeping issues should be explored. An SVG file should declare its language on the svg element, but that practice is rarely followed in the images I've seen. It's appropriate because it essentially specifies the language of the switch element's default clause. Most files apparently assume xml:lang=en; Wikimedia even does that. There are, however, SVG files that were originally done in German; they should signal that German is the default language. There are also unclear side-effects of such an assignment. Recently, I noticed that RDF metadata inherits such a language declaration, and that modifies the resulting RDF triples. I don't know if that modification has an effect; it can be counteracted by inserting xml:lang="" on the rdf element.

Another housekeeping task is checking the order of systemLanguage attributes. librsvg currently mishandles systemLanguage, but the clauses should still be checked for proper order and re-sorted if needed. For example, the default clause should be the last clause. There are other langtag issues. My current understanding is that if sr-Cyrl and sr-Latn are specified, then there is no reason to specify sr because a user agent specifying sr will match either. It may be appropriate to put the sr-Cyrl clause first to catch the more common default. I need to study this more. I also need a better understanding of how to use zh-Hans, zh-cmn-Hans, and cmn-Hans (and likewise for the zh-Hant variations). It may be that the middle combination should not be used. The langtags zh-CN and zh-TW are also used.[15] To work around current problems with Edge and Chrome, it may be appropriate to restructure systemLanguage to single values.
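A minimal sketch of two of these checks follows: it moves default clauses (those with no systemLanguage attribute) to the end of each switch, and it splits a comma-separated systemLanguage into one clause per langtag. The function normalize_switches is hypothetical. It does not decide the sr or zh ordering questions raised above, and it does not repair id attributes that become duplicated when a clause is cloned.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def normalize_switches(svg_text: str) -> str:
    """Hypothetical helper: single-valued clauses, default clause last."""
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    # Reverse document order handles nested switches before their parents.
    for switch in reversed(list(root.iter("{%s}switch" % SVG_NS))):
        language_clauses, default_clauses = [], []
        for child in list(switch):
            raw = child.get("systemLanguage")
            if raw is None:
                default_clauses.append(child)
                continue
            # Keep an empty attribute as-is rather than dropping the clause.
            tags = [t.strip() for t in raw.split(",") if t.strip()] or [raw]
            for index, tag in enumerate(tags):
                # Reuse the original element once, clone it for extra tags.
                # NOTE: clones keep any id attributes (not handled here).
                clause = child if index == 0 else copy.deepcopy(child)
                clause.set("systemLanguage", tag)
                language_clauses.append(clause)
        # Language clauses keep their relative order; defaults go last.
        switch[:] = language_clauses + default_clauses
    return ET.tostring(root, encoding="unicode")
```

Cloning clauses increases file size, so a real tool would probably apply the split only while the Edge and Chrome problems persist.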

SVG is supposed to scale automatically, but that requires the file to specify the correct information. The file should set the viewBox attribute; that sets the extents of the window. The file should not set width and height; they should remain at their defaults of 100%. If an SVG file specifies its width to be 5000 pixels, then the user agent will display the image as 5000 pixels wide. That is rarely the intent; instead, the picture should be scaled to fit the width of its embedding. To see the large width effect, go to File:History of the Universe.svg and click on the version for 28 March 2014. Only a portion of the image will display because the image size is 5,500 × 4,250 pixels, which is bigger than most displays.
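The scaling rule above could be applied mechanically along these lines. The helper make_scalable is hypothetical. It assumes width and height are plain numbers or px values and that the drawing's origin is at 0,0; when no viewBox can be derived, it returns the file unchanged rather than guess.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Matches unitless or px lengths such as "5500", "42.5" or "4250px".
_PIXELS = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(px)?\s*$")


def make_scalable(svg_text: str) -> str:
    """Hypothetical helper: ensure viewBox is set, then drop width/height."""
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    if root.get("viewBox") is None:
        width = _PIXELS.match(root.get("width", ""))
        height = _PIXELS.match(root.get("height", ""))
        if not (width and height):
            # No viewBox and no usable size: leave the file untouched.
            return svg_text
        # Assumption: the drawing starts at the origin.
        root.set("viewBox", "0 0 %s %s" % (width.group(1), height.group(1)))
    # With a viewBox in place, width and height fall back to 100%.
    root.attrib.pop("width", None)
    root.attrib.pop("height", None)
    return ET.tostring(root, encoding="unicode")
```

Only the attributes on the root svg element are touched; width and height on inner elements such as rect are left alone.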

Validation is also an issue. Wikimedia may have recently outlawed some internal subsets (Phab:T151735). Outlawing parameter entities prevents an SVG 1.1 DTD from validating specific data-* attributes. That is, XML internal subsets using parameter entities can satisfy the W3C validator, but they may not upload to Commons:

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "" [
<!ENTITY % SVG.External.extra.attrib "
  data-wd-q CDATA #IMPLIED
  translate CDATA #IMPLIED
">
]>

It may be that the W3C validator will accept SVG 2.0 extensions sometime soon.


How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

Research and development: $40,000.

Community engagement

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.

Advertising would be by templates on the file description pages and categories.

The SVG Help page would describe how to make SVG files that are good candidates for translation.

Get involved


Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

Glrx has made more than 25,000 edits to the en.WP.

Developed switch-translated files for Commons.

  • File:First Ionization Energy.svg (29 kB; SVG has tooltips to identify element and ionization energy).
    Although there's no Bulgarian translation, look at the table that has Bulgarian Wikidata labels.

Contributed at

Identified SVG Translate bugs.

  • used library that changed public method to private Phab:T138780
  • https: not recognized Phab:T125743
  • error processing versions

Identified SVG bugs.

  • Chrome does not handle systemLanguage correctly (does not parse out commas).
  • Edge does not handle systemLanguage correctly (only single valued).
  • Edge does not do text-anchor, dir, and BIDI correctly.

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?

Link notifications:

Individuals (links failed; edits to talk pages; see User Glrx contribs 2018-02-15 16:44)



Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  • Creating localized images is a task as old as Wikipedia. While SVGs with translatable text have already improved a great deal upon raster-based formats, the most used copy/paste approach for creating new translations still does not solve a fundamental issue: translated files quickly become outdated, and keeping them in sync is a huge amount of work that cannot be expected of translators. It also adds a huge amount of redundant information, which creates an unjustified maintenance burden.
    Keeping all translation in a single file therefore is the ideal solution to add an arbitrary amount of translations to a single file that can easily be kept up-to-date by graphic experts and content specialists. Having a tool for automated addition of localized strings will open up this job for all translators without needing any technical knowledge about SVG or uploading files, greatly aiding the translation effort Wikipedia should strive for. Patrick87 (talk) 00:58, 16 February 2018 (UTC)
  • Pretty interesting project, and it takes into consideration languages with multiple variants of their writing systems, such as Chinese, which is my top concern. However, I think it is reasonable to provide more detailed information about the $40,000 budget. PhiLiP (talk) 01:22, 16 February 2018 (UTC)
  • I would support this project; it would be a great help, especially for Wikipedias with fewer graphics contributors. Every language project would benefit exponentially. It would also fit in with phab:T56214 and phab:T184310 (unfortunately, there is currently a bug, phab:T36947, which has had high priority for nearly a year). -- User: Perhelion 22:52, 16 February 2018 (UTC)


  2. "continental shelf"
  3. Schnabel, Bryan (2008). "Translating SVG with XLIFF and Open Standards: Efficiently Solving a Classic Challenge". 
  4. e.g.,
  6. Cannot find a ref for this statement. Certainly its:translate can be used.
  8. Bah, Tavmjong. "Inkscape: Guide to a Vector Drawing Program" (5th ed.). Retrieved 2018-02-11. To be quite honest, making Inkscape convenient for creating SVGs for the web has been more of an afterthought. 
  9. "Inkscape SVG vs. plain SVG". Retrieved 2018-02-11. however, if you then re-edit the SVG in Inkscape (after hand editing, for example) 'without' removing the references to inkscape in the object that has been edited (for an example if you edit a path created using the inkscape star tool), then inkscape will re-generate the SVG path d="" attributes using the information that 'it' has stored under its namespace, and therefore getting rid of any editing to the path of that you, the user have done. 
  10. . Adobe Illustrator may offer this as an alternative to text-on-a-path.
  11. The browser does not do allowReorder processing, so it latches onto the first language preference that matches.
  13. SVG 2.0 Draft § 5.12.6