Wikimedia Foundation Annual Plan/Quarterly check-ins/Editing Jan–Mar 2017

Notes from the Quarterly Review meeting with the Wikimedia Foundation's Editing department, 24 April 2017, 09:00 PDT.

Please keep in mind that these minutes are mostly a rough paraphrase of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Slides:

Present (in the office):

Heather Walls
Jaime Villagomez
James Forrester
Joady Lohr
Joe Matazzoni
Joel Aufrecht
Roan Kattouw
Michelle Paulson
Stephen LaPorte
Toby Negrin
Victoria Coleman

Participating remotely:

Amir Aharoni
David Lynch
Ed Sanders
Katherine Maher
Neil Quinn
Quim Gil
Subbu Sastry
Trevor Parscal

Slide 1

Trevor: Welcome.

Slide 2

Trevor: Editing Dept is focused on contributions, specifically looking at improving quantity, quality and diversity. Different initiatives aim at one or two of those; we think this is a balanced approach to improving the wikis overall.

Slide 3

Trevor: We're mostly software engineers, designers, researchers and QA engineers, so we do it by building tools, specifically tools that are designed so a lot of different people can contribute around free knowledge. There are publishing tools on the internet like Twitter and blogs, but we specifically work on tools that let multiple people work on free knowledge.

Slide 5

Trevor: We're toward the end of the current FY. We've mostly been working on making improvements to what we've got and solving some tech debt, so that we can focus on new things next FY. Also getting our feet wet with new tech before we make major commitments. We now have a strategy for next FY in our annual plan that's mostly focused on new editor success and device support on one side, which is a play for improving diversity and quantity of contributions, and on the other side looking to improve editor quantity and engagement.

Question from Katherine via Toby: What are the timelines for editor retention work?

Trevor: In two days, we're going to divide up the work between the resources we have and figure out who's going to work on what and when. Edit review improvements is something that's already in flight, so the timeline on getting that out is pretty aggressive, whereas some of the other things are either research projects that we might not start until mid-year, or prototypes. So the timing varies, but we're also going to figure out more on Wednesday.

Toby: We already track retention on a number of different timescales, is there a specific one you're looking at?

James: The "famous" one is 2-month retention: a new registered user makes 5 edits in the 1st month, then 5 edits again in the 2nd month. That rate is very low and has been trending down for some years; we want that to reverse (halt the decline and instead increase).
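As a rough illustration of the 2-month retention definition James describes, the sketch below computes the share of first-month-active users who are active again in their second month. The 30-day windows, the choice of denominator, and the function names are assumptions made for this example; the minutes do not specify how the official metric is calculated.

```python
from datetime import datetime, timedelta

WINDOW_DAYS = 30   # assumption: a "month" is a 30-day window after registration
MIN_EDITS = 5      # threshold stated in the minutes

def is_active(edits, start):
    """True if at least MIN_EDITS edits fall in [start, start + WINDOW_DAYS)."""
    end = start + timedelta(days=WINDOW_DAYS)
    return sum(1 for t in edits if start <= t < end) >= MIN_EDITS

def second_month_retention(users):
    """users: list of (registration datetime, list of edit datetimes).

    Returns the fraction of first-month-active users who are also active
    in the second window (hypothetical definition, see lead-in).
    """
    first_month_active = [(reg, edits) for reg, edits in users if is_active(edits, reg)]
    if not first_month_active:
        return 0.0
    retained = sum(
        1 for reg, edits in first_month_active
        if is_active(edits, reg + timedelta(days=WINDOW_DAYS))
    )
    return retained / len(first_month_active)

# Made-up sample data: three accounts registered on the same day.
reg = datetime(2017, 1, 1)
days = lambda offsets: [reg + timedelta(days=d) for d in offsets]
sample_users = [
    (reg, days([1, 2, 3, 4, 5, 31, 32, 33, 34, 35])),  # active in both windows
    (reg, days([1, 2, 3, 4, 5])),                      # active in the first only
    (reg, days([1, 2])),                               # never reaches 5 edits
]
rate = second_month_retention(sample_users)
```

With this made-up sample, one of the two first-month-active accounts is retained.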

Trevor: A lot of measures have a symbiotic relationship between new and advanced editors that we're trying to set up. Giving advanced editors tools to be more precise and kind to new people will improve the experience for entry-level people.

Toby: It'll be interesting to see any impacts of the community health work in CT/CE, would be interesting to have some follow-up discussions with those teams.

Slide 6

Trevor: Aside from the Multimedia team moving over, which represents the structured data portion, machine learning-wise we're doing some things with edit review, and with machine translation we integrate that into a product that helps translate between pages. These are investments that we're focusing on, either completing work in this fiscal year or setting us up for work in the next.

Toby: Katherine's excited about the move to single parser and UI libraries, this is the kind of thing that frees up resources for other projects. The moves to machine learning and structured data, will that be addressed later?

Trevor: For structured data, well, it moved when Multimedia moved, and it got the grant that was our play. Machine translation is wrapped up in CX which we'll discuss later, ditto for machine learning in ERI. For structured data, go to the Reading QCI.

Slide 8

Trevor: The way we're doing this is divided into product teams, with the Parsing team that works on infrastructure things, a Design team that provides design for the entire dept, and Product that's also a horizontal. So we have a combination of verticals and horizontals within our audience vertical.

Slide 9

Trevor: There are also a lot of other things we're maintaining that are not listed here.

Slide 10

James: Top-level metrics are pretty flat. Note that February has 28 days, March 31, so +8% is actually flat.

Toby: We might move to normalised 30-day month stats for that reason.
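A minimal sketch of the 30-day normalisation Toby mentions, assuming it simply rescales each monthly total by 30 divided by the number of days in that month. The function name and the sample totals are hypothetical.

```python
import calendar

def normalised_30_day(total, year, month):
    """Rescale a calendar-month total to an equivalent 30-day figure."""
    days_in_month = calendar.monthrange(year, month)[1]
    return total * 30 / days_in_month

# Hypothetical raw totals: March is 8% above February.
feb_raw, mar_raw = 100.0, 108.0
feb_norm = normalised_30_day(feb_raw, 2017, 2)   # February 2017 has 28 days
mar_norm = normalised_30_day(mar_raw, 2017, 3)   # March 2017 has 31 days
relative_change = mar_norm / feb_norm - 1
```

March has about 10.7% more days than February 2017 (31/28), so on a per-day basis a raw +8% comes out slightly negative, about 2.5% lower, which is why raw month-over-month figures for this pair of months are hard to read.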

James: Worth pulling out: 2nd month active editors have been in a slow decline for years now. As I said, that's something we want to address. Monthly non-bot edits have been going up, 4.2% up YoY. In particular, mobile edits have been going up very strongly for several years, and are now 6.1% of all human edits.

Victoria: Does this mean on phone? Tablet?

James: Yes, someone using a phone or tablet or an app to make edits.

Trevor: We saw this trend as a leading indicator on the Reading side, and in the world in general, so this isn't a surprise. It's becoming a big proportion of our users, so we're doing research to address it, which Neil will talk about in a second.

Stephen: Does this include the Wikidata Game?

James: No, the data excludes Wikidata. Almost all of the edits there are by bots, or are bot-like tool edits that are not flagged as bots, so it's not practical to distinguish the low percentage of edits that are human-made. When Wikidata was turned on, there was a huge drop in bot edits (>1m a month) because they didn't have to maintain cross-language links any more. We're back above that level now just with human edits.

Trevor: Just to be clear, the reason we're expecting 2nd month actives to slowly decline is because we haven't made an intervention to address it, and it's been going down for a long time. Next FY we're doing a variety of things to address this, and then we do expect a change.

Toby: It feels like the place we're losing people is getting them to continue to edit, goes back to the focus on retention.

Victoria: I'd look at what the profiles of these people are…

James: Next slide addresses that.

Slide 14–17

Neil: One of our goals is increasing success and retention of new editors, but we don't have a lot of knowledge about who new editors are, what motivates them and what they need. Some research, but it's limited, and it's difficult to do, it's hard to have deep conversations with new editors. To address this, we've drawn on the model from New Readers, and assembled a team of representatives from various teams, and engaged Reboot (the same firm that helped with New Readers and is helping with Movement Strategy). Abbey and I are leading this project. We selected South Korea and the Czech Republic as the destinations for our research trips. Both countries have compact geographic bases. Both mid-size communities, 900 active editors/month, not huge like English or small like 2-10 monthly active editors. So they are big enough to have specialisation, but small enough to be receptive to this research. They're also distinct on some axes we find interesting. Korean has unusually high new user retention, Czech unusually low. Czech also has an active affiliate community, Korean doesn't. We're going to travel to each country for a 2-week research sprint. We're also developing our internal capacity within Product and within WMF for this type of contextual inquiry research. For this project we're going further than New Readers did by hiring local coordinators ourselves. Planning for this research to be ready around July 15th. We're calling the project New Editor Experiences.

Katherine via Toby: Do we describe these communities as "emerging"?

Neil: That's a community resources term, means something different, not what we based these decisions on. We'd call them mid-sized wikis.

Toby: It's been really great watching this project learn from New Readers, great to watch us build these capabilities. Kudos to you and Abbey.

Slide 17

Trevor: Draft annual plan is up on meta, looking for feedback from the community. It focuses on these three areas, three programs we'll be working on:

The increase in device support for editing is going to be informed by the research we're doing as far as what things to actually try to port and make work on mobile; making everything work would take a lot of time, so we want a targeted approach.
Increasing new editor success is also going to be informed by the research findings.
For current editor retention and engagement, we have an existing body of research, so it will still be research-based, but it's going to hinge somewhat less on the research Neil talked about.

Slide 19

Volker: After focusing on style guide in the second to last quarter, last quarter was about solving technical and design debt. The design guidelines are now public and already in use. Reading devs have said they plan to use it. In other areas we've been able to bring design templates in OOjs UI fully on par, no design debt there anymore. Further increased ARIA (Accessible Rich Internet Applications) support, passing 3 more tests. And (actually part of this quarter) did the 100th release of OOUI with the 54th patch contributor, which is a pretty impressive number from my perspective. Next quarter we're going to define design requirements for support on mobile devices; we've already started this, there will be a clear definition of what browsers we'll support and we're gonna work on design issues that come up with that definition and with implementation.

Slide 20

James: Next goal, improve curation tools. We added visual diffs to the visual editor, a pretty novel piece of work. When people edit with VE, they then have the chance to see the diff of the edits they made; until now that diff has been in wikitext. This is pretty terrible, because a big part of the point of VE is that you don't need to learn wikitext, but also, we can do better than wikitext diffs, which have their roots in 1970s tech. It was pretty seriously hard. We looked at a lot of industry-standard DOM diffing libraries and settled on a multiple-library strategy. Wrote our own library for part of it and used a Google-based library for another part.

Slide 21

James: Example of a diff that moves a paragraph and changes a word. In wikitext diff this is impossible to see, but in a visual diff we're able to highlight this separately. The up/down arrows reflect the paragraph move, and the word "not" is highlighted to show it was inserted. This is less interesting for your own changes, but it will help spotting vandalism. Right now this tool is only in VE for your own edits, but the plan is to make it available more widely.
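To make the "moved paragraph plus changed word" case concrete, here is a small sketch using Python's standard `difflib`. It is not the VE implementation, which per the discussion above diffs DOM structures with several libraries; this example works on plain-text paragraphs, and the function names and the 0.6 similarity threshold are assumptions chosen for illustration.

```python
import difflib

def word_changes(old_par, new_par):
    """Return (inserted_words, removed_words) between two paragraphs."""
    old_words, new_words = old_par.split(), new_par.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    inserted, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            inserted.extend(new_words[j1:j2])
    return inserted, removed

def pair_paragraphs(old, new, threshold=0.6):
    """Greedily pair each new paragraph with its most similar unused old one."""
    pairs, used = {}, set()          # pairs maps new index -> old index
    for j, new_par in enumerate(new):
        best, best_ratio = None, threshold
        for i, old_par in enumerate(old):
            if i in used:
                continue
            ratio = difflib.SequenceMatcher(
                a=old_par.split(), b=new_par.split(), autojunk=False).ratio()
            if ratio >= best_ratio:
                best, best_ratio = i, ratio
        if best is not None:
            pairs[j] = best
            used.add(best)
    return pairs

def visual_diff(old, new):
    """Report, for each paired paragraph, whether it moved and which words changed.

    Paragraphs with no sufficiently similar partner are left out of the report.
    """
    pairs = pair_paragraphs(old, new)
    old_order = sorted(pairs.values())
    new_order = [pairs[j] for j in sorted(pairs)]
    # Paragraphs inside a common ordered block kept their relative position;
    # anything outside those blocks is treated as moved.
    matcher = difflib.SequenceMatcher(a=old_order, b=new_order, autojunk=False)
    in_place = set()
    for block in matcher.get_matching_blocks():
        in_place.update(old_order[block.a:block.a + block.size])
    report = []
    for j in sorted(pairs):
        i = pairs[j]
        inserted, removed = word_changes(old[i], new[j])
        report.append({"old_index": i, "new_index": j,
                       "moved": i not in in_place,
                       "inserted": inserted, "removed": removed})
    return report

old = ["The cat is a friendly animal that sleeps often.",
       "Dogs are loyal animals.",
       "Birds can fly south."]
new = ["Dogs are loyal animals.",
       "Birds can fly south.",
       "The cat is not a friendly animal that sleeps often."]
report = visual_diff(old, new)
```

In this example the first paragraph is moved to the end and gains the word "not"; the report flags only that paragraph as moved and lists "not" as the single inserted word, which a line-based wikitext diff would show as an unrelated removal and addition.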

Slide 22

James: Change where you've italicised one word and changed the target of a link. On the left you see apostrophes, on the right you see "link target changed from X to Y".

Slide 23

James: Long-term target is to move this out to all users. Right now it's only on desktop; save-publish workflow on mobile needed some tweaking. Also need more improvements so that e.g. references will show correctly. Then it can become the primary diff view for visual edits, and eventually for all edits.

Toby: User testing feedback?

James: We care about 2 categories of people. One is people who already know what a diff looks like and want to not be misled or confused. That group is our initial cohort for testing this with. Most are highlighting things that aren't yet available, like reference changes; we're fixing those.

Trevor: There's no doubt we can improve the UI on this, Pau's been really helpful in working with the VE team to come up with visual treatments. I would caution about this type of screenshot that — if you made this edit and clicked the button, you would understand what it showed because you just made this edit. It's part of a process, so it makes more sense. Not all UIs are going to make complete sense in isolation. That said, when you are reviewing someone else's edits, you don't have that context. But I just want to clarify that one look at a screenshot can't always tell the whole story.

James: Once we're confident we've fixed edge cases, we want to do testing with newbies who have never seen wikitext in their life and ask them if they understand what it's showing them, then go from there to explore possible changes.

Toby: Another useful population is people who already use VE as their primary editor.

James: People who edit visually but still see wikitext diffs are a good target, yes.

Toby: What makes this successful, what behaviour are you looking for?

James: Initially, "can they get a visual diff?". Longer term, once we deploy this more widely we're expecting this to make vandalism detection easier, but also a more pleasant experience that we can give people on both mobile and desktop. Right now, two-column diff on desktop and different experience on mobile.

Ed: Right now, when you hit save, the diff is in the way because it's wikitext and confusing. If we had a better experience, we could change the workflow, and help users discover mistakes they may have made. Reviewing your own changes before you hit save is part of the review process.

James: Maybe when you click save, you have the opportunity to split edits, those kinds of things. But yes, certainly in terms of direct metrics, we'd expect self-reverts/self-follow-ups to go down.

Slide 24

James: Number of logged-in users who have made 1 or more edits with VE in a given month since July 2013, when it was initially enabled on English Wikipedia and 10 others. We're now getting over 100k accounts/month with at least one VE edit. There are some big peaks and troughs here. The one in September 2013 was because VE was switched off on English Wikipedia. The big rise in August 2015 was because VE was enabled for new accounts on enwiki. It was then switched off again when the community changed its mind in Feb/March 2016. That's the principal driver of the quick changes in these numbers. Despite that, it's been growing on other wikis, but because enwiki is so large, config changes there dominate the overall numbers. E.g. the switch by the dewiki community to enable it by default produced a small increase, but not that large.

Toby: The rise since mid-2016 is encouraging. So that's organic growth?

James: Yes.

Toby: That's almost a 50% increase YoY, that's pretty cool.

Slide 25

James: So right now, coming up to a third of all active accounts use VE at least once each month. They don't all use it for a significant proportion of their edits, some people use VE for a few edits and wikitext for most others. This is the raw number of edits to content (non-talk) pages. Just coming up to a million edits a month made by VE. 2016 had 8M edits made with VE, we're going to surpass that halfway through 2017. 1M edits is only 8%-9% of all content edits.

Victoria: There are people that only make 5 edits per month, are they more or less likely to use VE?

James: More. Users of VE skew to people who make fewer edits.

Victoria: Is there a trajectory where people start with VE then move to wikitext?

James: Yes, but we don't have strong data, we think it changes over time.

Toby: Is this cannibalising wikitext editing?

James: I don't think we have the ability to determine that. The overall number of edits is rising, and we can't attribute that to any one cause or factor.

Trevor: You could claim that, but you couldn't prove it. So it's possible, but we're not taking a victory lap or throwing a party.

Toby: It's not an unrealistic assumption.

Neil: When we did A/B testing of VE back in 2013 and 2015, we didn't see an increase in edits in at least the first several months.

Slide 26

Skipped for time.

Slide 27

Reduce tech debt interstitial

Slide 28

Subbu: Goal for this quarter was to enable the Linter extension on all wikis. Wikitext is hard, you can accidentally introduce errors. Parsoid has the ability to identify markup errors and deprecated patterns. Linter surfaces them to editors. We had a GSoC student work on a prototype in 2014, but we didn't have time to polish it; last offsite in 2016 Kunal picked it back up, and he and Arlo finished it. Deployed to all wikis, but had to be pulled from the very big ones for performance reasons. Coming back soon.

Slide 29

Subbu: Errors vary between wikis depending on various factors. Here you can see that bewiki has many more mis-nested tags than dawiki, but many fewer missing end tags.
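For readers unfamiliar with the two error categories named here, the sketch below shows one way to tell a mis-nested tag from a missing end tag in HTML-like markup. It is a toy built on Python's standard `html.parser`, not how Parsoid or the Linter extension detect these errors; the class name, category labels, and the short list of void tags are assumptions for this example.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # tags that never take an end tag (partial list)

class TagLinter(HTMLParser):
    """Toy linter: reports mis-nested tags and missing end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, innermost last
        self.pending = []   # tags closed implicitly by an outer end tag
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in self.pending:
            # Its end tag arrived after an enclosing tag was already closed.
            self.pending.remove(tag)
            self.errors.append(("misnested-tag", tag))
        elif tag in self.stack:
            # Close everything opened inside this tag; remember it as pending.
            while self.stack:
                top = self.stack.pop()
                if top == tag:
                    break
                self.pending.append(top)
        else:
            self.errors.append(("stray-end-tag", tag))

    def finish(self):
        """Anything never explicitly closed is a missing end tag."""
        for tag in self.pending + self.stack[::-1]:
            self.errors.append(("missing-end-tag", tag))
        return self.errors

def lint(markup):
    linter = TagLinter()
    linter.feed(markup)
    linter.close()
    return linter.finish()

misnested = lint("<b><i>text</b></i>")
missing = lint("<div><span>text</div>")
clean = lint("<p>ok</p>")
```

Counting the categories returned by `lint` per page would give the kind of per-wiki breakdown shown on the slide, under the stated assumptions.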

Slide 30

Subbu: You can click through and get a list of specific pages with specific errors, then click an edit link to be taken to the exact place to fix them.

Slide 31

Subbu: Linter extension will be a collaboration between Parsing and MediaWiki Platform, now that Kunal has moved there. We also deployed the ParserMigration extension to all wikis, to work towards the goal of replacing Tidy. We also worked on supporting language variants in Parsoid, and enabling visual diff testing for the Tidy transition. All in support of the long-term goal of using Parsoid HTML for read HTML. This is only used by the Android app currently but we want it to be everywhere. Next quarter, in pursuit of that long-term goal for Parsoid HTML read views, we will deploy audio/video support in Parsoid.

Slide 32

Invest in New Tech Interstitial

Slide 33

Amir: In the last quarter we mainly worked on porting the current CX code to OOjs UI; the goal of this is to make the editing component the same as in VE and Flow. This is a completely internal project right now, pretty long-term, no user-facing changes yet. VE team is helping. Actual results will start to be visible next quarter; the main goal for next quarter is completing that, actually wiring the editing surface up and deploying it.

Slide 34

Amir: We started deploying CX in Jan 2015, after 6 months it was available in all languages. We had about 1000 articles translated per week then, now it's getting close to 3000. Also close to 200k all-time. 3 different machine translation engines supporting 92 languages. Gradually adding more.

Toby: Why are the numbers going up? Do we know?

Amir: The tool itself is improving, becoming more stable. Last year we completed a major feature which allowed people to edit templates. In a lot of languages, experienced community members are starting to teach new editors how to create articles using this tool, so it's becoming part of the usual way that people create articles. User groups and chapters around the world use this for edit-a-thons and translate-a-thons.

Slide 35

Amir: Another project we've been deploying since June 2016 is compact language links. Big project for redesigning the interlanguage links list in the sidebar. Shows the user a short list of 9 languages and a "more" button. We can now see the impact. We have not yet deployed this to German and English Wikipedias, but now it's everywhere else; in the last quarter we deployed to Dutch, French and Swedish. A few weeks after deployment we saw that the number of people that clicked the interlanguage links has grown. In mid-size languages in Europe like Danish and Czech it grew by 200-300%. Consistent growth among all languages.

Slide 36

Joe: Collaboration team aimed to complete two deliverables last quarter. One was new filters for edit review, the first release under the edit review improvements project. This release provides a more precise and powerful interface for the recent changes page. Provides ORES-powered filtering for predicting good faith and quality of edits, optimised for usability. A lot of other improvements, like a filter to let people identify new users, as well as user-defined highlighting. The purpose is to make edit reviewing more efficient overall, and empower patrollers to find new users who may be making mistakes but editing in good faith. Now have tools to find those people and hopefully support them. As of today, available on all non-ORES wikis and, since this morning, English Wikipedia.
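The kind of filter combination Joe describes (new users who are probably editing in good faith but may be making mistakes) can be sketched as a simple predicate over scored edits. The `Edit` fields, the threshold values, and the function name are hypothetical; this does not call ORES or reproduce the actual filter definitions used on the recent changes page.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    rev_id: int
    user_is_new: bool
    goodfaith: float   # assumed score: probability the edit was made in good faith
    damaging: float    # assumed score: probability the edit has problems

def newcomers_needing_help(edits, min_goodfaith=0.8, min_damaging=0.5):
    """Select edits by new users that look well-intentioned but problematic.

    The thresholds are illustrative placeholders, not tuned values.
    """
    return [
        e for e in edits
        if e.user_is_new
        and e.goodfaith >= min_goodfaith
        and e.damaging >= min_damaging
    ]

# Made-up sample of scored edits.
recent_changes = [
    Edit(rev_id=1, user_is_new=True, goodfaith=0.95, damaging=0.70),   # selected
    Edit(rev_id=2, user_is_new=True, goodfaith=0.10, damaging=0.90),   # likely bad faith
    Edit(rev_id=3, user_is_new=False, goodfaith=0.90, damaging=0.60),  # experienced user
    Edit(rev_id=4, user_is_new=True, goodfaith=0.99, damaging=0.05),   # likely fine
]
selected_ids = [e.rev_id for e in newcomers_needing_help(recent_changes)]
```

Keeping the thresholds as parameters mirrors the idea of letting patrollers trade precision for coverage, though how the real interface exposes that choice is not covered in these minutes.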

For our second goal, we planned to release a feed called ReviewStream, like RCStream but includes data for edit reviewers. Basically we decided to wait on that and see how the new filters are received before investing more in that project.

Toby: How are you going to do that?

Joe: I'll get to that in a second. The purpose of ReviewStream is to adapt tools widely used to combat vandalism like Huggle and RTRC.

Toby: Expecting to see retention go up?

Joe: In the long run. A lot of things affect retention. The first step to giving new users a better experience is allowing people to find them. We have not yet built a whole interface and workflow for this. One lesson learned from this project: we worked with Aaron and his team, and one of the challenges was that they were really not staffed to support a project like this. Moving a product from labs to a real product takes a lot of work, and they were really stretched.

Victoria: We're adding resources there in Technology.

Joe: Which we fully support and welcome, thank you!

Toby: Product also has a responsibility there.

Joe: Just wanted to emphasise there's a cost in terms of delays etc. to not making such investments.

Slide 37

Joe: Users seem to be really enjoying this product. These comments are from non-English users because English was just released this morning.

Slide 38

Joe: Going to be investing in spreading this technology and improving it this quarter. Adding features users are asking for, like saving filter settings. Organising user interviews with Daisy. And incorporating filters that didn't make the cut for the first round. Then next quarter, adding this to other places like Watchlist.

Toby: Can you work with CE to get more program support? You're clearly working to support new users, perhaps they know about volunteers out there who can help. We know from CX that the community's support to making features successful is critical.

Joe: Sure.

[Fin]