Wikimedia monthly activities meetings/Quarterly reviews/VisualEditor-Parsoid/July 2013

From Meta, a Wikimedia project coordination wiki

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's VisualEditor and Parsoid team on July 17, 2013 (2pm-5pm)

Present: Roan Kattouw, Trevor Parscal, Gabriel Wicke, Howie Fung, Terry Chay, Sue Gardner, Erik Möller, Tilman Bayer (taking minutes), James Forrester, Rob Moen, Philippe Beaudette

Participating remotely: Ed Sanders, Timo Tijhof, C. Scott Ananian, Subbu Sastry

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

James: Welcome
will review past quarter (Q4), but also look at this Q1: July-September
both about VisualEditor and Parsoid

Gabriel on Parsoid
achievements in past Q4, tasks for Q1


Presentation slides from the meeting



Make it easy to view, reuse, and edit content

1. convert wikitext to semantic HTML+RDFa (faithfully)
a standard format we are moving to

2. Support HTML editing, converting back without dirty wikitext diffs
^^ we are here

3. once this is achieved, use HTML in MediaWiki core

4. Support wikitext editing with Parsoid

Long term, Parsoid would be used in reverse, to convert to wikitext on demand (instead of converting wikitext to HTML).

Progress in 2012/13 Q4

1-2 above: Solid parsing and roundtripping of basically all content without dirty diffs

  • achieved high degree of compatibility with existing content
  • except for some pretty rare edge cases
  • used a combination of several innovative techniques to achieve that (see our blog post a while ago)

then shifted focus on performance and scaling

  • goal met with pragmatic caching setup, on-edit parsing and expansion reuse
  • Paid off to start tracking full load in early June
  • tracked all edits in early June.
  • Resulting in success: Can *not* see VE or parsoid deployments on server load graphs :)

Other features:

  • templates: Done, exceeded
    • went for simple wikitext editing, because we don't yet have the necessary information to check input type for template fields
    • hope to move to visual editing of templates, too
    • Erik: when doing editing of template parameters, you still encounter wikitext
  • cite extension: Done
  • tag extensions: Done
    • not implemented in VE yet, but done
    • there is a GSoC project on <math> that uses code
  • categories, defaultsort parameter: Done
  • images and parameters: partially met, ongoing
    • rendering: fine, editing: not quite
    • Erik: like changing the size? Gabriel: ex. when doing "left" and "right" alignment at the same time, it doesn't exclude them. Straightforward to do, just needs a bit of time
  • Public HTML load / save / expand API for VE, Google, bots, offline readers like Kiwix, etc. (Deferred to Q3 2013 = Q1 2013/14)
    • ex. Google would like to load the Parsoid HTML and get rid of their custom parsers
    • Kiwix
  • Incremental re-parsing after wikitext edits
    • was planned for performance, but turned out not to be necessary because the load is not on the Parsoid cluster
    • possibly use for HTML<->wikitext switching in Q3 2013 (= Q1 2013/14)
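
The "left"/"right" alignment case mentioned under "images and parameters" can be illustrated with a minimal sketch. It assumes, purely for illustration, that image options arrive as a flat list of strings and that the last alignment option should win. Both the function name and the "last one wins" rule are assumptions, not Parsoid's actual behaviour:

```python
from typing import List

# Alignment keywords that are treated as mutually exclusive in this sketch.
ALIGNMENTS = {"left", "right", "center", "none"}


def dedupe_alignment(options: List[str]) -> List[str]:
    """Keep at most one alignment option; all other options pass through.

    Assumption: when several alignment options are given, the last one wins.
    """
    # Index of the last alignment option, or None if there is none.
    last_index = max(
        (i for i, opt in enumerate(options) if opt in ALIGNMENTS),
        default=None,
    )
    return [
        opt
        for i, opt in enumerate(options)
        if opt not in ALIGNMENTS or i == last_index
    ]
```

With this rule, an option list containing both "left" and "right" keeps only whichever alignment was given last, and lists without any alignment option are returned unchanged.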

High-level goals in 2013/14

  • continued support of VE with editing features, bug fixes etc.
  • start leveraging HTML in MediaWiki core (see 3)
    • HTML and page property (e.g. categories) storage (also, Flow might use the HTML format): idea, store the meta-data outside page content
    • HTML diffing (already doing this, but no frontend for it yet), basic authorship maps - we already diff on the way in, so this is not expensive
      • James: Scott would like us to call them praise maps instead of blame maps ;)
    • Stretch goal: Parsoid HTML for page views.
      • Erik (explains): so far, legacy MediaWiki still does the HTML rendering for readers. When you press edit, the browser contacts Parsoid to get the payload for VE to edit. The idea here would be that the reader would already have a Parsoid-parsed version of the content and could skip this step, i.e. switch instantaneously between reading and editing
      • Roan: it would also remove some of the small remaining rendering differences between editor and reader HTML.
      • Trevor: It's not trivial. The extra parsing of the page adds bloat to the page. Also, gadgets do stuff to modify content. So it's not trusted. It's not Parsoid, it's Parsoid + gadget modifications.
      • Erik: still fundamental reservations about doing this?
      • Trevor: agree with principle.
      • Gabriel: could always fall back to the DOM that was sent in case gadgets modified the displayed DOM
  • Forward-looking: Investigate HTML-based templating
      • Problem right now: templates still entirely based on wikitext
      • Gabriel: Could eliminate unbalanced templates
      • eliminate wikitext from templating
      • Erik: could also make MediaWiki for 3rd parties without wikitext
      • Trevor: ultimately, would like to use a DOM for templates too

Tasks for Q3 2013 (= Q1 2013/14)

  • Image editing refinements
    • [straightforward]
  • provide public HTML API
    • [straightforward]
  • research: Language variant support
    • [hard, Q3-Q4>]
    • need this for Chinese (multiple variants in same document)
    • difficult because of intricacies in the way variants are processed in MediaWiki
  • Research: Support switching between HTML and wikitext within one edit
    • [hard, Q3-Q4 >]
    • without saving in between
    • trouble: we'd like to preserve annotations in the DOM while switching between the two. difficult to do in wikitext
    • don't know how long it will take, but it will be important. It has been requested quite often. Sue: How essential is this? Gabriel: VE is not the most efficient workflow for everything yet, so people would like to switch back for some tasks. Sue: so it's important for power users? Erik: yes
  • HTML / Wikitext compound storage; support Flow
    • [medium, Q3-Q4 2013 = Q1-Q2 2013/14]
    • requires some forethought for figuring out the right interface. optimize, and interact with Flow team
    • Erik: substitute Varnish based caching with db storage? Gabriel: Yes
  • Enforce proper nesting of transclusions.
    • [hard, Q3-Q4 2013]
    • We do this currently through fully expanding them, and figuring out which part of page is affected. Unbalanced tags can affect the entire page (via template dialog)
    • do something that would nest tags (ex. table start, table row, table end would still be allowed to live in different templates, but proper nesting of whole table would be enforced)
    • James: proposal might be to have metadata that says if you start it this way, you must end it that way
    • Erik: Could we burn this all down. Is that legitimate? JamesF: Now that we have Lua, you can replace that with Lua-generated HTML. Gabriel: Issue is a choice to pass information into template as a parameter (e.g. Table content). might be doable but we don't have good data
    • Gabriel: thought about disabling some of those templates, but would disrupt too much existing content
    • Trevor: also, can't do proper looping of wikitext templates
    • C.Scott: other template systems invent new sugar. The basic problem: they don't want to shove content from article into template as arg. Make a sugar for template so that what looks like start and end of tables actually looks a certain way. Gabriel: still need to enforce nesting for that construct. Cscott: imagines VE has a nice way of editing such a thing, so that thing goes away long term.
  • Testing infrastructure improvements
    • [straightforward, Q3-Q4, 2013 = Q1-Q2 2013/14]
      • Testing 160k pages, thus ailing a bit (Marco is working on it)
      • might be reused by VE or Mobile for browser testing (it's a framework)
  • Performance: More efficient template updates (which part of page affected by changed template?), currently biggest issue for API
    • [straightforward]
    • Need more information from preprocessor
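
A minimal sketch of the nesting check discussed under "Enforce proper nesting of transclusions", assuming (hypothetically) that each template declares which tags it leaves open or closes, along the lines of James's "if you start it this way, you must end it that way" metadata. The token format and function name are illustrative only and are not part of Parsoid:

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical metadata: ("open", "table") means a template emits an
# unclosed <table>; ("close", "table") means it closes one.
Token = Tuple[str, str]


def first_nesting_error(tokens: Iterable[Token]) -> Optional[str]:
    """Return a description of the first nesting violation, or None if balanced."""
    stack: List[str] = []
    for kind, tag in tokens:
        if kind == "open":
            stack.append(tag)
        elif kind == "close":
            if not stack:
                return f"unexpected close of <{tag}>"
            expected = stack.pop()
            if expected != tag:
                return f"expected close of <{expected}>, got <{tag}>"
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    if stack:
        return f"unclosed <{stack[-1]}>"
    return None
```

Under this scheme, table start, table row and table end may still live in different templates, as long as the combined sequence of declared tokens is balanced.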

Other tasks on the horizon (beyond Q1)

  • Parse most transclusion parameters to DOM once type info is available
    • [medium, likely Q2]

  • HTML-only wiki support
    • [hard, Q2-Q4 2013/2014]
    • pretty difficult, not just storage, but also all those extensions that rely on hooks in PHP parser
    • James: outcome would be that you can use it without ever using wikitext anywhere

Erik: so HTML based templating not going to be in Parsoid, but core?
Gabriel: Let's see what the best solution is after figuring out what it should do.

  • Non-Wikipedia projects
    • [likely hard, Q2-Q3 2013/2014?]

e.g. Wikibooks: labeled section transclusion (LST)
problem there: currently extension tags wrapping whole page
Erik: labeled section transclusion is our one feature that would probably make Ted Nelson proud ;)
Gabriel: probably lots of special cases, for example treat section as HTML5 tag - need to investigate

  • Research DOM-based templating
    • [hard, Q2-Q3 2013/14]
  • Use Parsoid HTML for all page views
    • [hard, stretch goal for Q4 2013/14?]

Questions and Discussion

Erik: To restate one part: There is interest in Flow to go HTML-native. Store HTML from beginning and not surface wikitext (or restricted wikitext) through UI
That would be pretty big change of UX.

Erik: Any implications for Parsoid from real-time collaboration plans?
Roan/Trevor: Unlikely. It's about VE. At the end of the day Parsoid doesn't care about that.
Gabriel: When you need a snapshot, then Parsoid gets involved.
James: it would affect blame maps
Trevor: need to annotate which user created what
C. Scott: you mean "praise maps" ;)


Presentation slides from the meeting



  • Be the default (and best) editor for all Wikimedia wikis
    • adding currently-missing editing abilities (e.g. tables)
    • better l10n/i18n (e.g. language variants)

improve performance, stability, scalability (get easier, faster, better)

  • Be the default editor for MediaWiki core (shipped with tarball) available for 3rd parties
  • Be a great general editor for non-MediaWiki users too (e.g. WordPress), encouraging an ecosystem of reusers

keeps us honest in terms of architecture, too

Progress in 2012/13 Q4

(see also slides from last review)

  • 2012/13 Q4: a beta "general content" editor in production as default for "all" [Ongoing]

fully deployed on enwiki this Monday

    • templates: met/exceeded
      • as promised (see mockup)
      • met: dialog editor
      • exceeded: did multi-part templates
      • we did more for a smooth editing environment (made some changes since end of Q2)
      • (ex. hinting, delays dealt with)
    • references: last time, didn't have a mockup yet
    • met: dialog - add new references, duplicate references, and edit content, citation name. can remove/insert/replace reference lists
      • exceeded: do group references (not happy with design, though)
      • more work on better workflow for adding references particularly in integration with wiki-specific standard citation templates

Erik, Sue: ref groups from POV of editors?
James, Trevor: will be drop-down list, collapse fields that are not relevant for most
Trevor: initially, this was our most ugly feature ;), got a lot of feedback, improved workflow (commit is already in Gerrit, deploy soon)
Erik: VE right now driven by features that exist in wikitext, some of which were ill thought out in the first place (which now impact VE experience negatively, too), can improve these later
Sue: can we make it so that <references/> shows up automatically as soon as I add a ref? (I believe there may be a bug for that already.)
Trevor: strongly believe that this should not be in VE
currently shows red error message when not present
instead, should generate reflist automatically
Erik: but for consistency of UX, should ...
James: some complication: templates for reflist eg. on ptwiki
some discussion on how much to invest in the cite extension rather than replacing it
Roan: Trevor's change (plan on improving the UX with respect to handling existing references, ISBN and URL recognition, etc.)

    • categories
      • met: page-level dialog lists current categories, and category sort key and page's DEFAULTSORT setting
      • not met: show editors the (uneditable) categories transcluded from templates
      • Further work: transcluded and hidden categories; editable language links and page behavior settings (#REDIRECT, etc.)

in Amsterdam, flagged language links to Wikidata team, but no followup yet
Erik: language inspector?
James: that's a different thing, helps with rendering of RTL content etc.

    • images
      • Met: can add images, videos, change captions, resize thumbnails by dragging
      • NOT Met: Change non-caption settings on a media item, like thumb/frame, float status, exact size in pixels, etc.
      • Further work: Uploading new items on page (through drag-and-drop)
        • Erik: for multimedia team, might be nice to have a tab or option to list user's current uploads. James: this is something we discussed with Wikia

Trevor: showing content time-based is a problem, but if uploaded "by you" it is much safer
Scott: could offer people to use camera to capture people's photos while writing, for avatars in version history :-)
James: maybe for third party users ;)

    • deployed for all users on main and user namespace on *all* Wikipedias

Expected Deliverables in 2013/14

(legend for slides:)

  • * = parsoid work
  • ! = significant work in parsoid
  • [component] = depends on component

Core platform

[Q1] 3rd parties can easily add integrations
[Q2] Stability - Squashing every bug we can, improving glitches
[Q2] Performance
[* Q2] Scalability

James: magic thing at the end: real time collaboration. need to sit down and figure this out, mostly from a product perspective. default for all articles? problem: e.g. edit conflicts when one user starts a big edit and the other makes small edits

[Q1] Hidden content
[Q1] Rich text copy and paste
[Q1] no-wiki blocks
[Q1] better browser support (IE9)
[[MW] Q1] Content style hinting (redlinks showing as redlinks, stublinks as stublinks by CSS preferences and applied using the same HTML)
Gabriel: basically removed all this from Parsoid, so we could show all logged-in users same cached page
James: currently, our logged-in users put up with really slow (uncached) site anyway
[Q2] re-evaluate content munging
(e.g. template placing padlock in corner)
Roan: e.g. en:Arsenal F.C. has a template for shirt colors ;), breaks in VE (done by Ed Sanders in a previous life)
Juliusz's suggestion about separating CSS from content

[Q2] Bailing to source mode: one-way switch from VE to wikitext editor without saving
closely related:
[Q3] Outline mode: lets you see structure of document
current system: lots of hacks
Trevor: get rid of slugs - content that is not in wikitext, only added for rendering (e.g. separation of two tables so cursor can be placed there)
images and other floated content - very unintuitive to interact with

[Q3] Drag & drop
[* Q4] Micro VE: edit summary and log entries, currently these take (a subset of) wikitext only
needs some work with Parsoid
Trevor: these are the only bits on the site that are written one time only (i.e. not editable). This means as long as the users are happy with tool, it could be written one-way (no two way)
Erik: also, currently editors parse wikitext with their brain ;) (in places where rendered version not shown)
Trevor: in 20 years, wikitext will still be in the back of some people's minds ;)
[! Q4] editor switching: switching between wikitext & VE mid-edit
challenging to do
[Unscheduled] Commenting: GSoC on document side-by-side commenting (annotation system, using OKFN's model...): help integrating
Erik: long-term, might need to build some way of quoting/referring article content into design
Trevor: so far, little to no communication between us and Flow team
Gabriel: Needs DOM attribute (uid) preservation in Parsoid, just as authorship maps

New Editing Tools

[Q1] Language settings

[! Q2] Language variant tool
Gabriel: this is wikitext-based right now.
messy backend + UX problem, critical for Chinese and others

[* Q2] Table of Contents
drag and drop __TOC__

[Q3] Layout HTML items
<br /> <hr /> <div>
Erik: no br's might be a blocker for some wikis

[Q4] Arbitrary user CSS
setting "color:red" and similar

[Unscheduled] Definition lists
relatively unused in content space with complex editing needs
Erik: colon-based annotation is common

Extended Dialogs

[* Q1] Nested templates
Trevor, Roan: more likely Q2
Gabriel: needs type information for parsing (-> TemplateData)
[* Q1] Media settings: media item size (default), display type, alignment, alt text, link
will expose all kinds of preferences that users are currently not aware of (like image links), might create issues with inexperienced users if they start using them in unintended ways

[Q1] References: set local citation templates for auto-suggest
local communities could designate like 5 templates as presets

[[UW] Q2] Media upload: triggered by button, drag and drop or (maybe) copy-and-paste
use UploadWizard
Erik: this entire item could be outsourced to Multimedia team
James: mobile team did some work

[Q2] non-templates: parserfunctions, content magic words like {{CURRENTDAY}} etc.
make it easier to use those
[Q2] categories: hidden categories /inherited categories (from templates)
[[Wikidata] Q2] language links --> Wikidata

[* Q2] magic words: ex. #REDIRECT, NOTOC, NOCC
Erik: have we done some prioritization here? e.g. redirects frequently requested
James: not yet

Erik: do we have special character insertion? e.g. non-Latin, like toolbar in Vector
Trevor: not yet, but simple to do. might just port current Vector toolbar, it's well-curated

[[MW] Q4] edit notices: setting/editing page-specific edit notices in-page
enwiki has system where admins can add editnotice (appears on top when page is edited)

[Unscheduled] media searching: filtering by media type (still/video/audio), length, etc.
currently gives thumbnail icons

New Dialogs

[* Q1] Galleries
Wikia (Inez) already working on it, this is vital for Commons
[Q1] Equations
widely asked for: click on LaTeX block, edit it
currently only edits as LaTeX
discussion on letting anyone edit content of tag extensions
for math, currently a GSoC project
[Q1] Code: edit code in syntaxhighlight block

[Q2] Tables: add/remove row/column, merge/split cells; set headers, sortable, caption, float
more difficult in terms of UX

[Q3] Table styling: setting colors on table row, column, cell

New and extended integrations

[Q1] Consistency: ensure all parts of integration are as expected (edit links in diffs)
[Q2] Flow: VE will be an editing experience in Flow
Erik: VE as sole editor for this might not be possible yet (since not all browsers supported by VE)
[Q2] ProofreadPage: Necessary for deployment on Wikisource; unclear who owns it
[Q3] Mobile: Rich editing on mobile devices (tablets and phones)
[Q4] Visual Histories: Visual diffs, "playback" of history
might be nice for readers. Sue: agrees


Deployment and release[edit]

[Q2] Stable MW integration release
- easy install by 3rd parties
maybe without Parsoid first (i.e. just for HTML only wikis). Parsoid might be too demanding for many 3rd parties
[Q2] Cover all content: all non-talk pages except MediaWiki: and Template: namespace
this is mostly just a config change
[* Q2] Deploy to all WMF content wikis: Wiktionary, Wikivoyage, Commons
big dependency on Parsoid. would like to do this in 2013
would like to sit down with people from each of these projects to discuss issues
e.g. Wiktionary might be easy, Wikisource tricky
[[OAuth] * Q4] Deploy to all WMF wikis: requires Parsoid to auth for private wikis
plug into OAuth work that Platform team is doing now
[Unscheduled] A stand-alone version: for "keeping ourselves honest" and encouraging 3rd parties
Trevor: to be clear, we have been doing this all the time
[Unscheduled] a WordPress integration: for using VE in WordPress, initially as demo
Trevor: talked to ca. 4 people who work for WordPress outfits. Local installations often break parts of the VE with their own variants, most serious WordPress users don't trust WordPress' VE, ours could be very useful for them. We can't put much energy into it, but might kickstart it
Erik: WordPress uses their own component?
Terry: they adapted TinyMCE, adding features by shortcodes
James: also, this would bring more eyeballs


Questions and Discussion
(mainspace edits, excluding bots, but including bot-like tools like AWB)
about 9% (fluctuating) of edits by existing registered users
about 45% (trending upwards) among edits by new users
On Monday, switched on for anonymous users (but artefact: fading in gradually because of caching - a previously cached page does not yet include a VE edit link)
Trevor: could purge entire cache gradually (this has been done before)
Erik: need to start conversation about purging now, because it will occur for every language
Trevor: i.e. anons use it pretty exclusively
James: about 19% of views are with blacklisted browsers
Trevor: editors have different browser distribution than readers
James: disappointing that not used at higher rate among newly created accounts. should go up to 50-60% with all browsers supported, but still
caveat: "new" users could be experienced users from other wikis coming in via SUL
Sue: so currently ca. 87% of anon edits still use wikitext?
James, Erik: yes, but distorted by caching
Sue: and among existing registered users, massive majority use wikitext?
James: yes
Erik: this is % of edits, not users
Trevor: anons look similar to our browser support ratio?
Roan etc: not quite
14000 wikitext vs. 200-300 VE
Howie: just calculated percentages
Erik etc: so actually not a huge discrepancy of percentages between edits and users
I.e. existing editors: about 1 in 5, new editors: 40-45%
Sue: so we think that the proportion using it currently is lower than we wanted?
James: yes, but partly due to caching issues, and need to win over power users
Sue: if we were done already, this would be disappointing. But we are not done yet.
Erik: also, confidence needs to build that VE does not mess up things
and: copy+paste is a big deal, not supported yet
Gabriel: performance issue, I did some tests on e.g. en:San Francisco
Roan: browsers make a huge difference re performance, e.g. IE9 very slow
Trevor: we have a lot of performance work to do, for sure

reasons summary:

  • browser support
  • cache for anons
  • power user support
  • bot on user accounts
  • performance (to initialize stuff in browser, or browser dealing with large documents)

Sue: any more material to cover?
James: no
Sue: Jimmy might cover VE in his Wikimania keynote. my general sense: we are getting less pushback than I personally expected
time has passed for a mere "help us test this" message
Trevor: what convinces community members is us reacting to their concerns and needs, incorporating feedback
Philippe: yes, we fixed ~200 bugs. also: this whole project has been a reactive one from the very beginning, taking up concerns/needs that were voiced in the community for a very long time
Sue: imagine all the awful things (breakages etc.) that *could* have happened, but didn't
do we have some illustrative examples of people whom VE empowered? of "wanted" editors, e.g. subject matter experts
Erik: we have some, ID'ed by community liaisons. e.g. comments on feedback page
Erik: most contentious issue: made a conscious decision not to offer an "off" switch
James: other questions about further development?
Erik: I think you spoke to pretty much everything
Mobile and Flow integration will be important
Trevor, James: for that, need someone "embedded" in the mobile team
Erik: worked with Jon yet?
Trevor: yes, he's great
Erik: any general cross-team coordination yet?
Erik: we have Kenan now. development has been siloed, which was fine so far, but need to start coordination now. Howie, can you drive this? Howie: OK
Roan: VE on mobile *basically* already works, except issues like dialog boxes blocking screen (demoes it on his phone)
Erik: (I already discussed this with James a bit:) We need to think more about task efficiency.
important both in dialogue with editing community, and in product decisions
like "experienced user needs 7 seconds in VE for task x, 2 seconds in wikitext"
e.g. wikitext useful for large-scale changes (like search-replace in 200k of text, or lots of insertions of same template)
when talking to experienced users, point out that VE already tells them much that they need to look up when using wikitext (like template parameters)
lots of different user/tasks types (e.g. "I add redirects all the time")
Trevor: yes, VE shows a lot, but not all is discoverable
James: we do have a manual for VE
Erik: any blockers you want to flag?
James: could talk about hooks between...
Trevor: talked about this last quarter: contentEditable code is dependent on Wikia, creates a bit of a bus factor. I think we need a CE engineer in our team, with more of a computer science background, that works for our style of coding, especially the responsiveness of it
Roan: need a variety of skillsets for this
Erik: maybe some shared responsibility with other teams/tasks(?)
Trevor: if we are serious, might need 2 people for this. doesn't differ much between mobile and desktop
Erik: ok, let's think about the resourcing issue
Erik: volume of their input currently?
Trevor, James: low, they were pulled off right before our July launch
Erik: Anything else?
Sue: No. Thanks for VisualEditor ;)
Gabriel: And Parsoid ;)