Wikimedia monthly activities meetings/Quarterly reviews/Editing, April 2016

Notes from the Quarterly Review meeting with the Wikimedia Foundation's Editing team, 11 April, 09:30 - 10:30 PDT.

Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Slide 1

James: Welcome all. To start with, we don't have a strong short-term impact on the metrics. Slightly down year to year, although not apples-to-apples due to month alignment (full data including March won't be available for a bit).

Metrics

Slide 3 - Metrics: Active editors

Neil: Doesn't make much sense to talk about editor declines here; there is a very slight decline over a five-year period but ... still a significant challenge, but not sure decline is the best term. Whatever is happening, it hasn't really stopped.

Slide 4 - Metric: New Wikipedians

Neil: We do see a significant long-term decline in new editors. Deserves further study, but there are two major challenges to doing that kind of investigation:

First is, only one analyst in Editing (me), but we are hiring a second.

Second is infrastructure. My normal metric is "new active editors", not "new Wikipedians", but we were blocked by technical limits from calculating that (query ran 8 hours and failed).

(coming back from slide 6)

Katherine: Why is this decline in editors not of concern? [...]

Neil: Sorry, I don't mean to say it isn't of concern. What I'm trying to say is that if we want to address the overall number of editors, we should be focusing specifically on new editor decline because that seems to be the more significant problem. Existing editors stick around pretty consistently.

Ori: Particularly since total Internet user count has grown dramatically—it isn't flat.

Neil: Yes, just having active editors being flat is a challenge for us [i.e., in context of user growth, flat vs slight negative is not important - anything other than proportional growth is actually decline]

Slide 5 - Metric: Edits to Wikipedia articles

Neil: Least interesting metric up here, as it's inflated by bot edits and tools like Huggle.

Slide 6 - Metric: Mobile edits to Wikipedia articles

Neil: Continues to go up, which we would expect. Currently 550K/month out of 12 million Wikipedia article edits.

James: For context, roughly 50% of views are mobile, but only ~5% of edits.
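As a quick sanity check, the mobile share of edits can be worked out from the two figures Neil quoted. This is only a sketch: the variable names are illustrative, and the inputs are the rounded monthly numbers from the slide rather than exact counts.

```python
# Rounded monthly figures quoted in the meeting (assumed, not exact counts).
mobile_edits = 550_000
total_article_edits = 12_000_000

# Share of Wikipedia article edits that were made on mobile.
mobile_edit_share = mobile_edits / total_article_edits

print(f"Mobile share of article edits: {mobile_edit_share:.1%}")
```

The result is a little under 5%, which lines up with the "~5% of edits" figure James gives.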

Collaboration team

Joe: Main goal was to roll out cross-wiki notifications, which we did. Went great.

Slide 8 - Objective: Powerful notifications

Joe: People have been adopting with good numbers.

Slide 9 - Other successes and misses - Cross wiki beta enables

This slide is people enabling the beta feature. You can see for example that English shoots up to 4k adopters in March.

Slide 10 - Other successes and misses - ancillary things

Joe: Some ancillary things we've been up to. Mobile interface now matches web interface. [...]

Language Team

Slide 12 - Objective: Improve reliability

Runa: Goal was to focus on some of the technical debt, in areas that seem to be more of a problem based on feedback from users. Three areas block new article translation: one, errors on saving; two, restoration of partially complete translations; three, errors on publishing. Tried to isolate error subtypes.

An important achievement was to handle errors with more precision. Added more elegant abuse filter warning handling, to help users identify what was blocking them and which section of the translation needed fixing, so they could go to that section and, e.g., remove a YouTube link which was not allowed on that particular wiki. We think this is bringing down error margins, given that more translations are being recovered after initial errors and published: a higher percentage of articles which had errors are now being published (meaning the translator fixed the errors).

The total number of errors is still somewhat unchanged, but the percentage is stable given that the total number of articles is rising. Found that even if we focus on a few types of errors, there could be many more sub-types within that small focus group.

Slide 13 - Objective: User engagement

Runa: Secondary goal to continue ongoing work on translation suggestions, to increase user engagement. Lists have been welcomed by users. Was used more than 7K times this quarter to start an article from suggestions. [includes abandoned or in-progress articles] Suggestions came from Research Team. Even notified some new users who had used Content Translation only once about suggestions. Medical translation team conducted multiple sprints to translate lists of essential articles (starting with list of vaccines): benefits include a consistent slate of translated articles.

Right now, creating a new list requires the Language team's intervention. Have plans to change this, but actual work may be delayed by staff availability.

Distribution of user engagement has been identical to past quarters. Continue to want to work with the Wiki Education Program.

Slide 14 - Objective: Increase machine translation coverage

Part of end to end workflow, some have it, some don't. Quality of translation differs, which can affect per-language rates/quality. We have two machine translation sources, Apertium and Yandex. A third ("Matxin") has been in conversation with us. [...] The entire process to deploy new machine translation pairs took much longer than expected; we try to inform user communities well in advance, and rollout plans are made to go from lower-activity-level languages to higher. Very cautious, had a few blockers. Did enable 17 languages for machine translation, but the languages that would probably produce higher rates are delayed due to the original rollout schedules and deployment blockers. Working with Community Engagement and TechOps to get more robust rollout and easier deployments sorted out.

Slide 15 - Other successes and misses

Runa: Major uplifting news was from medical translation project, used content translation for very focused sprints. They coached the users, both old and new, on using the tool. Earlier, they used Microsoft Word; now they use Content Translation, which is a better user experience.

Work around parallel corpora. Improved the way data is stored. Will be available for download in JSON and TMX formats.

Concerning part: increase in technical debt in Content Translation. Attempted to go slow with feature improvements this quarter. Feature development will continue to be slow in the face of resource shortages.

Slide 16 - Appendix - articles published

As of today, over 70,000 articles created with Translation.

Katherine: The technical debt, you raised as a roadblock. Is this consistent with what was planned and expected, or was technical debt greater than expected? What is the path forward to sustain or resolve debt, and is it consistent with expectations?

Runa: One of the reasons for the increase in debt over expectations is the way the tool has been promoted to a larger circle of users than expected. Discovering some issues ourselves, getting some from user feedback. Trying to triage to set priority effectively. Also means we are using short-term solutions because of time shortages. Still trying to figure out how to balance technical debt and how to keep it to safer levels. Not sure that fully answers your questions.

Katherine: Not entirely, but thank you. Interesting issue; we should continue the discussion out of the meeting.

Slide 17 - Appendix - Error types

Multimedia Team

Slide 19 - Objective: Upload dashboard

James: Had a single goal plus secondary goals which were more like work areas. Main goal is data and intelligence around what's going on. Got anecdotal comments from users, no real numbers. Ran an A/B test of different upload tools aimed at improving quality of uploads while discouraging uploads contrary to policy (copyright violations or, rarely, out of scope). 60 to 65% of images uploaded were deleted. Good news: it's not the tools' fault; that's the number for all tools. We have tried explaining copyright to new community members, but haven't had any success moving the needle. Tools used by incredibly trusted users, like the GLAM bulk upload tool, show no copyright violations; tools used by new users have very high rates.

Looking at historical data, wiki uploads drop off after the 2007/8 peak; Commons keeps growing dramatically. Same pattern for first-time uploaders, with spikes for Wiki Loves Monuments. Since the cross-wiki upload tool, almost 50% above peak. I.e., the Commons community has much more work to do explaining copyright to new users. Goal now is to support the community better by triaging incoming things.

Geoff: Does the graph include uploads which were later deleted?

James: Last September (during Wiki Loves Monuments), 1.15 M uploads to Commons. Yes, I believe that's before deletions. We have deletions on the dashboard, which are fairly flat over the last couple of years.

Katherine: Sorry, who did you say you were working with?

James: Working with CLs and Legal on redesigning text and interface around upload interfaces, e.g., asking Yes and No questions on upload, checking boxes – but nothing worked to reduce delete rates and thus burden on the community. Need to fix this.

Katherine: What's next?

James: Try putting more "why" text into interface. Striking out in the dark on how to convince people. Looking at how YouTube, Facebook and Twitter do it, but they don't have much success either.

Most usual answer to copyright is, "I have a physical copy at home". Teaching the world about copyright is not our core mission but turns out to be a necessary part of our work.

Maggie: What was community response?

James: ... They initially thought it was the tech tool's fault, but that got much better quickly when we showed data that all tools have a similar pattern of behaviour (depending on user class). It's systemic, not specific to any one tool. [...]

Parsing Team

Slide 21 - Objective: Mobile reading via Parsoid

Subbu: One issue with Parsoid is that we add a lot of information that allows VisualEditor, Content Translation, etc. to do their editing. For things like Mobile Reading this is a lot of data that gets shipped unnecessarily, slowing things down. Solution was to move the data into a separate bucket so each client can get it or not depending on their needs. Problem is this is a big change to HTML, so we had to coordinate with all clients (VE, CX, Flow, ...). Final achievement is a process and protocol for breaking changes to HTML. Main thing we got done this quarter; project driven by Arlo and Gabriel. We expect that this quarter we might have this deployed on the production cluster. Hoping that now that we have a process for managing breaking changes to HTML, it will be much smoother in the future.

Next 3 slides are about work that's been going on over the last two or three quarters.

Slide 22 - Other work: Visual diff testing

Subbu: Our team's main thing is building infrastructure for doing big back-end changes to the Parser without disruption. We want to be confident about how changes to parser affect rendering. Visual diffing uses motion-detection to accurately determine how the visual output from two renders is different, so we can decide whether a particular change will have a big impact.

Want to be able to run tests on tens of thousands of pages and not have to figure out what the most important ones are by sifting through diffs -- numeric score of visual diff helps with this.
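To illustrate how a numeric score lets a large test run be triaged without reading every diff, here is a minimal sketch. It does not reproduce the team's motion-detection approach: it swaps in a plain pixel-difference ratio, and the names `diff_score`, `rank_pages` and the tiny sample "renders" are all hypothetical.

```python
from typing import Dict, List, Tuple

# A render is modelled as rows of grayscale pixel values (an assumption
# made for this sketch; real renders would be screenshots).
Image = List[List[int]]


def diff_score(before: Image, after: Image) -> float:
    """Return the fraction of pixels that differ between two same-sized renders."""
    same_shape = len(before) == len(after) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(before, after)
    )
    if not same_shape:
        raise ValueError("renders must have the same dimensions")
    total = sum(len(row) for row in before)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_a, row_b in zip(before, after)
        for px_a, px_b in zip(row_a, row_b)
        if px_a != px_b
    )
    return changed / total


def rank_pages(renders: Dict[str, Tuple[Image, Image]]) -> List[Tuple[str, float]]:
    """Sort pages from most to least visually changed."""
    scores = [(title, diff_score(a, b)) for title, (a, b) in renders.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical 2x2 renders produced by the old and new parser.
renders = {
    "Page A": ([[0, 0], [0, 0]], [[0, 0], [0, 0]]),        # identical
    "Page B": ([[0, 0], [0, 0]], [[0, 255], [255, 255]]),  # 3 of 4 pixels differ
    "Page C": ([[0, 0], [0, 0]], [[0, 0], [0, 255]]),      # 1 of 4 pixels differ
}
ranked = rank_pages(renders)
for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

With a score per page, a run over tens of thousands of pages can be sorted so that only the highest-scoring pages need a human look.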

Slide 23 - Other work: Prototyping “balanced templates”

Subbu: In wikitext, templates don't necessarily produce well-formed output and aren't necessarily self-contained. Big problem for clients like VisualEditor because they don't know the boundaries of a template, and Parsoid can't easily update the output when a template is edited. Scott has built a prototype that implements a solution for this in MediaWiki core. Trying to decide if this is a good approach we should move forward with. Visual diffing is important for this.

Katherine: Not sure what a balanced template is, but will ask later - we are running out of time.

Slide 24 - Ongoing work: Replace Tidy

Subbu: Replacing Tidy is a long-standing request. Expect that by the end of this quarter (Apr-Jun) we may be able to actually do it.

Slide 25 - Product Team

Slide 26 - Objective: Measure product acceptance

James: Goal of pioneering running an A/B test inside the department, without help from Research. Joint task with VE team; have not done the A/B test, not because of this but because of other factors.

VisualEditor Team

James: First blocker was single edit tab integration. Replace two tabs with one, switch within tab, default based on last editor. Will be one necessary component of A/B tests. Live on Polish and Hungarian wikis for ~1 month each, gone well, found some issues, made some improvements. Planning to deploy to English WP tomorrow.

Katherine: Are we rolling out to 100%?

James: Yes, but the proportion of logged-in users actually affected is less than 1%, because most users were automatically opted out, as per community agreement, or by their explicit choice. It won't affect logged-out users as we've yet to have that conversation with the English Wikipedia community.

Katherine: Have we socialized this?

James: Yes. There have been Village Pump notices etc.; even so, we're anticipating some surprised feedback.

Katherine: So the surprised feedback will come from?

People who don't read Village Pump or other messages, because people don't all read those fora, or do but forgot, or were away, or ... Will always be some of those. Undoing, waiting, and redoing is of course an option.

Geoff: Is the long-term strategy to support both the wikitext and visual editors? 10 years from now?

James: Yes. Some edits can never be done visually, e.g. template creation/editing can amount to programming. Also, VE is JavaScript-only, and we are going to want to enable editing for all; about 3% of users don't have JavaScript, so that's not good enough. Also, for the foreseeable future, wikitext is saved in the database, not HTML. It might be possible to change that years in the future, but it's not clear there will be significant benefit.

James: SET rollout was originally planned for [Q2/3 in FY2017]. Brought forward by several quarters at beginning of FY2016Q2 planning (Sep/Oct 2015). I think the product ambition was right, but re-sequencing that much work created major challenges.

Slide 28 - Objective: Increase use of the visual editor - single edit tab

Katherine: once Single Edit Tab is deployed, how do people move between editors?

James: Switch button in toolbar of each editor, big part of tech effort [already deployed]; thanks to Parsing team and Services department for support in doing that.

Slide 29 - Objective: Increase use of the visual editor - option for anonymous users; A/B test: visual editor, default editor for anonymous users

Slide 30 - Other successes and misses

James: Table editing is now much more aligned with the way the rest of VE works, and works much better on mobile.

German Wikipedia came and asked for VE to be turned on, had a community vote, 130-20 or so, to switch it on for both IPs and for new users. Deployed the day before de.wp's 15th anniversary.

Only big wiki that we're not deployed on, and where deployment could be done by just flipping a switch, is now the Dutch Wikipedia. We're OK with this.

Now support pretty much every language other than Chinese and other languages that use language converter. Still not on by default for some big languages, such as Japanese, but that is planned this quarter, pending CL support being available.

VE is now on by default on about 230 Wikimedia wikis, including some which are really dead.

Geoff: Just want to say: James, you have been a steady hand on VE through very large storms, and we're seeing the benefits of that steady hand. Want to thank you and the team for that steady progress.

Maggie: Does content translation mean language converter? Yes.

James: Using VE with Language Converter is likely to be a fundamental, unaddressable technical challenge. Trying to match one technology (Web/DOM/WYSIWYG/interactive) with another (string manipulation). Same issue with the Translate extension. May have to replace both with equivalents that are compatible.

Ori: You're in a tough predicament. You're being asked to own the key metrics (e.g. monthly active editors), but you're already engaged in complicated software engineering work. Do you think that's a gap in our organisational structure? We used to have a growth team; should we create a team like that that focuses on the KPIs?

James: I don't know. The new data analyst I'm hoping to hire should enable us to do some serious modelling, less nebulous answers about impact of VisualEditor and where people are dropping out. I feel that we should own the KPIs around editors because otherwise, no-one does. You're right, we are not obviously answering the question about - even if we can see the number is going up or down, and can say why, ...?

Katherine: Thank you, everything that Geoff just said, extending it to the entire team. Appreciate bringing the lessons learned to everything you present on, moving past success and failure to here's what we've accomplished, here's roadblocks, here's what we're doing to address them. Good to start with Neil's data and end with Ori's question.