Community Wishlist Survey 2015/Status report 3
2015 Community Wishlist Survey, Status Report #3: October 2016
It's October already! – and as the Wikimedia Foundation's Community Tech team heads into the fall, we're wrapping up our work on this year's wishlist requests, and looking ahead to the new 2016 Community Wishlist Survey starting on November 7.
Conducting the Community Wishlist Survey was a big, scary experiment for our team last year. Our friends on Wikimedia Deutschland's Technical Wishes team have been running surveys in the German-speaking community since 2013, but for WMF Community Tech, this was our first. It was also the first Wishlist Survey to invite active wiki contributors from every Wikimedia project to come together and propose, discuss and vote on 107 different project ideas.
Our team is responsible for addressing the top 10 wishes on the wishlist survey, often in collaboration with other teams and volunteer developers, including the WMDE Technical Wishes team. Meanwhile, WMF's Technical Collaboration team has been supporting volunteers who are working on other wishes further down the wishlist. There's overlap and collaboration on various wishes, so this report includes progress made by both Community Tech and WMDE's Technical Wishes team – as well as the incredible work being done by volunteer developers, staff, and the Technical Collaboration team.
Here's a quick update on the top 10 wishes from this year's wishlist:
In the top 10...
We've completed our work on 5 wishes:
- Wish #1: Migrate dead external links to archives, with volunteer dev Cyberpower678 (currently on English Wikipedia, other languages coming)
- Wish #2: Improved diff comparisons, by WMF developer MaxSem (WMF) (live on all wikis)
- Wish #5: Numerical sorting in categories (currently on English, Swedish, and Macedonian Wikipedias, more languages coming)
- Wish #7: Pageview stats tool (live for all wikis)
- Wish #9: Improve the plagiarism detection bot, with volunteer dev Eran (currently on English Wikipedia, other languages coming)
We're currently working on 1 wish:
- Wish #4: Cross-wiki watchlist
Other teams are currently working on 2 wishes:
- Wish #3: Central repository for gadgets, templates and Lua modules; foundational work is underway by Legoktm
- Wish #6: Allow categories in Commons in all languages; related work is underway by the Wikidata team
We had to decline 2 wishes:
- Wish #8: Global cross-wiki talk page
- Wish #10: Add a user watchlist
There are many more details on these and other wishes below.
Shipped since last update
Improve the plagiarism detection bot
Wish #9 on Community Tech's Wishlist Survey is to improve the plagiarism detection bot on English Wikipedia, and extend it to more languages.
Over the summer, the Community Tech team built CopyPatrol, a new interface for the plagiarism detection bot that User:Eran built for English Wikipedia. The bot looks at every edit that adds a substantial amount of text to an article page and checks that text against the Turnitin search database to identify potential copy-and-paste violations.
When the plagiarism detection bot finds a potential copyvio, a report appears on CopyPatrol, where volunteers can compare the current article's text with the text of the suspected source. If the volunteer determines that there's been a copy-paste violation, they revert the edit, talk to the contributor and then close the case on CopyPatrol. If there's no problem, the volunteer clicks "No action needed" to close the case. Our new interface makes the process much easier, especially with the ability to compare the article and the potential source directly on the page.
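As a rough illustration of the screening described above, the Python sketch below decides whether an edit adds enough text to be worth checking, then scores how much of the added text also appears in a candidate source. It is a minimal sketch under stated assumptions: the 500-character threshold, the word 5-gram ("shingle") overlap used as a stand-in for the Turnitin comparison, and all function names are hypothetical and are not taken from the bot or from CopyPatrol.

```python
import re

# Hypothetical threshold for what counts as a "substantial" addition.
MIN_ADDED_CHARS = 500


def is_substantial(added_text: str) -> bool:
    """Only edits that add a lot of text are sent for comparison."""
    return len(added_text.strip()) >= MIN_ADDED_CHARS


def shingles(text: str, n: int = 5) -> set:
    """Set of word n-grams ("shingles") from lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(added_text: str, source_text: str, n: int = 5) -> float:
    """Fraction of the added text's shingles that also occur in the source.

    Stand-in for the real database comparison; 0.0 means no shared phrasing.
    """
    added = shingles(added_text, n)
    if not added:
        return 0.0
    return len(added & shingles(source_text, n)) / len(added)
```

Under these assumptions, a high overlap score would only flag a case for review; as on CopyPatrol, a volunteer still decides whether there is a real copyright problem.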
So far, more than 8,200 cases have been identified and successfully closed, and a team of volunteers visits every day to address new cases as they come up. The old interface had an ever-growing backlog of unchecked cases. With the new interface, there are typically around 10–15 open cases at any given time, so potential plagiarism cases are checked and fixed in less than a day.
We're currently working on adapting the bot to run on more languages; the only limitation is whether a particular language is represented well in the search database that we use for comparisons.
Improved diff comparisons
Wish #2 on Community Tech's Wishlist Survey is to improve the diff comparison when changes are made in long paragraphs.
The problem was that when a small change was made to a very long paragraph, the diff engine had a limit of 10,000 bytes, beyond which it would give up and mark the whole paragraph as changed. This limit was set for performance reasons, to keep the process of generating the diff from overloading the system. In the new version, the diff engine estimates the complexity of the diff based on the number of words changed, not on the overall size of the paragraph.
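The difference between the two limits can be illustrated with a toy Python sketch. It is not the actual diff engine code: the 100-word limit, the use of Python's `difflib`, and the function names are assumptions for illustration only. A real engine would need to estimate the number of changed words more cheaply than by computing the full word diff, as this sketch does.

```python
import difflib

OLD_MAX_PARAGRAPH_BYTES = 10_000  # the old size limit described above
MAX_CHANGED_WORDS = 100           # hypothetical limit for the new rule


def old_rule_allows_word_diff(before: str, after: str) -> bool:
    """Old behaviour: give up on any paragraph larger than 10,000 bytes."""
    size = max(len(before.encode("utf-8")), len(after.encode("utf-8")))
    return size <= OLD_MAX_PARAGRAPH_BYTES


def changed_word_count(before: str, after: str) -> int:
    """Number of words inserted, deleted or replaced between the two versions."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def new_rule_allows_word_diff(before: str, after: str) -> bool:
    """New behaviour: the decision depends on how much changed, not on paragraph size."""
    return changed_word_count(before, after) <= MAX_CHANGED_WORDS
```

With these assumptions, a one-word edit to a paragraph larger than 10,000 bytes fails the old size check but passes the new changed-words check, so it can still get a readable word-level diff.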
Here's another example of a nightmare diff from Russian Wikisource that's easy to read now, thanks to this new change.
We won't make any other improvements to the diff view this year. Now that the original issue is fixed, we're going to look at the other examples collected on Community Tech/Improved diffs to see if there are any other possibilities. But changing diffs is a high-stress venture – Max's fix took six months of discussion and testing before it could be released. We're expecting to see some more diff comparison problems as proposals in the next Wishlist Survey.
Numerical sorting in categories
Wish #5 on Community Tech's Wishlist Survey is to fix the numerical sorting on category pages.
This project fixes a longstanding problem on category listings: pages whose titles begin with numbers were sorted digit by digit, rather than by the value of the whole number. A list should be ordered like this: 7 Dwarfs, 12 Monkeys, 101 Dalmatians. The old category collation had it the other way around.
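The difference between digit-by-digit and whole-number ordering can be shown with a short Python sketch. This is not the collation Wikimedia deployed; it is a hand-rolled "natural sort" key, included only as an illustration, that compares runs of digits by their numeric value and everything else case-insensitively.

```python
import re
from typing import List, Tuple


def natural_key(title: str) -> List[Tuple[int, int, str]]:
    """Sort key: digit runs compare by numeric value, other text case-insensitively."""
    key = []
    # With a capturing group, re.split puts the digit runs at the odd indices.
    for i, part in enumerate(re.split(r"(\d+)", title)):
        if i % 2 == 1:
            key.append((0, int(part), ""))
        elif part:
            key.append((1, 0, part.casefold()))
    return key


titles = ["101 Dalmatians", "7 Dwarfs", "12 Monkeys"]
print(sorted(titles))                   # digit by digit: ['101 Dalmatians', '12 Monkeys', '7 Dwarfs']
print(sorted(titles, key=natural_key))  # whole numbers: ['7 Dwarfs', '12 Monkeys', '101 Dalmatians']
```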
In September, Community Tech deployed a new collation system on English and Swedish Wikipedias, which lists numbers in the correct order.
We're currently offering numerical sorting to every wiki that wants it. Re-sorting the categories is done using a script, which can take a day or so to complete, depending on the size of the wiki. During the time that the script is running, sorting in some categories will be unreliable. This issue goes away when the script is done.
If you’d like numerical sorting on your wiki:
- Please start a community discussion – RfC, vote, or however your wiki normally decides these things – to make sure there’s support for it.
- Once you’re sure it has support, post on User:DannyH (WMF)’s talk page on Meta with a link to the discussion.
And we'll be happy to set you up!
RevisionSlider
Wish #16 on the German Technical Wishes Survey is to view multiple edit summaries on the diff page.
The RevisionSlider helps editors move between revisions on diff pages without having to go back and forth between the diff and history pages, and it shows edit summaries directly in the diff view. It appears at the top of the diff screen and displays all revisions as bars along a line. Hovering over a bar shows details such as the edit summary and author, and clicking on bars selects the revisions to be compared.
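Conceptually, clicking two bars picks an older and a newer revision ID and loads the standard MediaWiki diff for that pair. A minimal Python sketch of that last step follows; the `diff_url` helper and the sample revision IDs are hypothetical, while `diff` and `oldid` are MediaWiki's usual diff URL parameters. The sketch assumes that a higher revision ID means a newer revision.

```python
from urllib.parse import urlencode


def diff_url(base: str, title: str, rev_a: int, rev_b: int) -> str:
    """Link to the diff between two selected revisions, in whichever order they were clicked."""
    older, newer = sorted((rev_a, rev_b))  # assumes higher revision IDs are newer
    query = urlencode({"title": title, "diff": newer, "oldid": older})
    return f"{base}/w/index.php?{query}"
```

Sorting the pair means the result does not depend on which bar was clicked first.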
The RevisionSlider is a MediaWiki extension. The WMDE team started developing the feature in April 2016. It is based on a first prototype by the Community Tech team and was inspired by DerHexer's revisionjumper gadget.
As of September 13, 2016, the RevisionSlider is available as a Beta feature on all wikis!
Notifications for successful and failed mentions
Wish #9 on the German Technical Wishes Survey is for notifications confirming whether a mention was successful or unsuccessful.
Notifications about successful and failed mentions are two new notification types built into the Echo/Notifications system. "Successful mentions" is an option to receive a notification whenever a mention you made has been sent. "Failed mentions" is an option to receive a notification when a mention you made has not been sent; this only applies in some cases. The aim is to help users better understand when and how mention notifications work.
The two new notification types were deployed on all wikis as of September 8. Both are opt-in and can be enabled in the user preferences.
Google OCR for Indic language Wikisources
Wish #25 on the Community Tech Wishlist Survey is for better OCR support for Indic language Wikisources.
The open-source OCR tool used by most Wikisource projects doesn’t handle Indic languages well. Partnerships helped us to get free credits for Google’s API, so that we could help Indic Wikisources access this much-needed service. Right now, the new Google OCR tool works on Bengali, Sanskrit and Tamil; more Indic languages will be supported as Google improves their OCR services over the next several months.
This is one of the projects that we've worked on this year in an effort to help smaller user groups – in this case, Wikisource contributors.
PageAssessments
PageAssessments is a simple extension for storing article assessments (e.g. for WikiProjects) in a database table. It works via a parser function that is embedded in the master assessment template for a wiki. The assessment data is then accessible via an API (or directly from the database). The extension has been deployed to English Wikivoyage and English Wikipedia and will be deployed to other wikis once roll-out and testing is complete on English Wikipedia. This project was based on a request from WikiProject volunteers and was started in 2015 before the first Wishlist Survey.
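For tools that consume this data, a request to the extension's API might look like the Python sketch below. The `prop=pageassessments` query module is provided by PageAssessments, but the helper names are hypothetical, and the response shape assumed in `extract_assessments` should be checked against the API's own help page before relying on it. The sketch does no network I/O; it only builds the URL and parses an already-decoded JSON response.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def assessments_query_url(api: str, titles: List[str]) -> str:
    """Build a query URL for the prop=pageassessments API module."""
    params = {
        "action": "query",
        "prop": "pageassessments",
        "titles": "|".join(titles),
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


def extract_assessments(response: dict) -> Dict[str, Dict[str, Tuple[str, str]]]:
    """Map page title -> {project: (class, importance)}.

    ASSUMPTION: the response is shaped like
    query.pages[<page id>].pageassessments[<project>] = {"class": ..., "importance": ...}.
    """
    result = {}
    for page in response.get("query", {}).get("pages", {}).values():
        projects = page.get("pageassessments", {})
        result[page["title"]] = {
            name: (info.get("class", ""), info.get("importance", ""))
            for name, info in projects.items()
        }
    return result
```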
Community Tech fixed some long-standing problems with the ccnorm() functions used in abuse filters. See the following bug tickets: T29987, T27619, and T29987. These changes should make these functions much more effective and intuitive to use.
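The idea behind `ccnorm()` is to reduce look-alike characters to a canonical form, so that a filter written against one spelling also catches disguised variants. The Python sketch below illustrates that idea only: `ccnorm_sketch` and its tiny `CONFUSABLES` table are hypothetical stand-ins for AbuseFilter's much larger equivalence set, and mapping everything to upper case is an assumption of this sketch rather than a description of the real function.

```python
# Hypothetical, heavily simplified confusables table (character -> canonical character).
CONFUSABLES = {
    "0": "O",
    "1": "I",
    "3": "E",
    "@": "A",
    "$": "S",
    "\u0430": "A",  # Cyrillic small letter a
    "\u0435": "E",  # Cyrillic small letter ie
    "\u043e": "O",  # Cyrillic small letter o
}


def ccnorm_sketch(text: str) -> str:
    """Map each character to a canonical upper-case form so look-alikes compare equal."""
    return "".join(CONFUSABLES.get(ch, ch.upper()) for ch in text)
```

Under this sketch, "spam", "$p@m" and "sp\u0430m" (with a Cyrillic "а") all normalise to the same string, so a single filter pattern matches all three.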
Cross-wiki watchlist
Wish #4 on Community Tech's Wishlist Survey is a cross-wiki watchlist.
The cross-wiki watchlist project aims to make things easier for editors who work on more than one wiki. It will let editors keep track of all the pages and wikis they are watching from a single on-wiki location. Community Tech will build this as an optional Beta feature, with plans to make it available on all wikis. Over the summer, we worked on wireframes for the watchlist, incorporating community input.
We also developed a technical plan for the watchlist, in collaboration with many other Wikimedia developers. The team is currently making database changes that are required before we can start building the cross-wiki watchlist. This project will spill over past the end of this year; we'll still be working on it in early 2017. You can follow our progress, ask questions and offer suggestions on the project page or on the project's Phabricator ticket.
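At its core, a cross-wiki watchlist has to merge recent changes from several wikis into one list ordered by time. A minimal Python sketch follows, assuming each wiki's changes arrive already sorted newest first; the `Change` record, the `merge_watchlists` helper and the sample data are hypothetical and say nothing about the database changes the team is making.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    wiki: str       # e.g. "enwiki"
    title: str
    timestamp: str  # ISO 8601 in UTC, so string order equals time order


def merge_watchlists(per_wiki: Dict[str, List[Change]]) -> List[Change]:
    """Merge per-wiki lists (each already sorted newest first) into one newest-first list."""
    return list(heapq.merge(*per_wiki.values(), key=lambda c: c.timestamp, reverse=True))
```

A k-way merge like this avoids re-sorting everything, since each wiki's list is already in order.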
Improve edit conflict handling
Wish #1 on the German-speaking community's Technical Wishes Survey is to improve the current way of resolving edit conflicts.
The current edit conflict screen is highly confusing, and people want a way to better understand and resolve the conflict within the edit conflict view. The WMDE team is therefore focusing on improving the UI.
The wish was picked up by the WMDE team in May 2016. The team started with a round of community feedback to get a better understanding of what exactly bothers people about the edit conflict view. The results of the discussions (in German) can be found here.
After several rounds of iteration with the German-speaking community, at Wikimania 2016, and with a testable prototype shared with the international and German-speaking communities, the WMDE team is now ready to start the implementation phase. Further information can be found here.
Anti-vandalism tools
Wish #29 on Community Tech's Wishlist Survey is to improve the anti-vandalism tools for admins and CheckUsers.
The Community Tech team is currently working on a couple of new blocking tools to keep vandals and spammers off Wikimedia wikis. This is one of the projects that we've worked on this year in an effort to help smaller user groups – in this case, admins and CheckUsers.
As of October, we're working on sending a cookie with each block (ticket T5233), and we're investigating adding the ability to search by user agent on the CheckUser interface (T146837). We'll be working on more anti-vandalism tools this fall, and into the beginning of 2017, based on consultation with current CheckUsers.
Tables in PDFs
Wish #9 of the German-speaking community's Technical Wishes Survey was adding tables to PDFs.
When you download an article as a PDF, all of the article's tables are omitted. The German-speaking community's wish is to change this.
Currently, PDFs are generated with LaTeX, which does not handle tables that are too large for the page size well. As a result, it cannot be guaranteed that a table won't break the PDF layout or, in the worst case, prevent the whole PDF from rendering. This is why tables have been omitted so far.
As a first step, the WMDE team added a notice about the omission of tables to the article PDF download page. Following discussions with WMF colleagues at the Wikimania 2016 Hackathon, the team then proposed allowing tables in PDFs through a browser-based rendering service (Electron PDF) provided by the WMF. This will be an additional link on the page that appears when you click on "Download as PDF" for a single article. The resulting PDF will look similar to the website: it will be complete, but not as nicely designed as the PDF version that you can currently print. Users will then be able to choose between a single-column layout that includes all tables (Electron PDF) and a two-column layout without tables (LaTeX). The WMDE team started the implementation phase in September 2016.
Show text changes when moving text chunks
Wish #2 of the German-speaking Technical Wishes Survey in 2015 is to show text changes within a text chunk that's been moved on the page.
Currently, it is very hard to find a text change within a moved text chunk when looking at the diff view. This wish is about making text changes within a moved text chunk visible.
As of September 2016, the WMDE team has started to work on a first prototype.
There are challenges, which we're currently investigating. Diff code is highly complicated and extremely challenging to work on, and performance might be a major issue, too. In addition, the change needs to be implemented in both C++ and PHP. Because of open questions about performance and other issues, it is not yet clear whether this wish can actually be implemented. As a first step, the WMDE team has started working on a prototype written in C++. Once the prototype is done, the next steps will be to investigate performance issues and whether the C++ prototype could be implemented in a similar way in PHP.
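One way to think about the problem: pair each deleted paragraph with the most similar added paragraph, treat pairs above a similarity cutoff as moves, and then diff the words inside each pair. The Python sketch below uses `difflib` and a hypothetical 0.6 cutoff; it is not the WMDE prototype (which is written in C++), the function names are illustrative, and it ignores the performance concerns mentioned above.

```python
import difflib
from typing import List, Tuple

MOVE_SIMILARITY = 0.6  # hypothetical cutoff for treating two paragraphs as the same chunk


def find_moves(old_paras: List[str], new_paras: List[str]) -> List[Tuple[int, int]]:
    """Return (old_index, new_index) pairs for paragraphs that moved, possibly with edits."""
    matcher = difflib.SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(range(i1, i2))
        if tag in ("insert", "replace"):
            added.extend(range(j1, j2))
    moves = []
    for i in deleted:
        best, best_score = None, MOVE_SIMILARITY
        for j in added:
            score = difflib.SequenceMatcher(a=old_paras[i].split(), b=new_paras[j].split()).ratio()
            if score >= best_score:
                best, best_score = j, score
        if best is not None:
            moves.append((i, best))
            added.remove(best)  # each added paragraph can be the target of only one move
    return moves


def word_changes(old_para: str, new_para: str) -> list:
    """Word-level edits inside a moved paragraph, as (tag, old_words, new_words)."""
    a, b = old_para.split(), new_para.split()
    ops = difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]
```

In this toy version, a paragraph that moved to the end of the page and gained one word is reported as a move plus a single word insertion, rather than as a full deletion and a full addition.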
Watchlist expiry
This was a wish on the German Technical Wishes Survey 2013, and it is #12 on Community Tech's Wishlist Survey.
The Watchlist expiry project offers a way for users to add a page to their watchlist that will automatically expire after a day, a week, a month or a custom time limit. From January to April, the WMDE team refactored the watched item code, a prerequisite for adding the watchlist expiry feature. More details about the refactoring can be found here. Because one change from the refactoring has to be applied to all existing wikis, the WMDE team is currently waiting for that change to be made on all wikis. You can see more details about the implementation of this wish in this ticket.
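The expiry behaviour described above can be sketched as a watched item that carries an optional expiry time, with expired items filtered out when the watchlist is read. This Python sketch is hypothetical: the `WatchedItem` class, the helper functions, the preset names, and treating "a month" as 30 days are assumptions, not details of the WMDE implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical presets; "1 month" is approximated as 30 days for this sketch.
PRESETS = {
    "1 day": timedelta(days=1),
    "1 week": timedelta(weeks=1),
    "1 month": timedelta(days=30),
}


@dataclass
class WatchedItem:
    title: str
    expiry: Optional[datetime] = None  # None means the page is watched permanently


def watch(title: str, now: datetime, duration: Optional[timedelta] = None) -> WatchedItem:
    """Watch a page permanently, or until now + duration."""
    expiry = now + duration if duration is not None else None
    return WatchedItem(title, expiry)


def active_items(items: List[WatchedItem], now: datetime) -> List[WatchedItem]:
    """Keep permanent items and items whose expiry time has not yet passed."""
    return [item for item in items if item.expiry is None or item.expiry > now]


now = datetime(2016, 10, 1, tzinfo=timezone.utc)
# A custom time limit is just another timedelta.
items = [watch("A", now, PRESETS["1 day"]), watch("B", now, timedelta(days=10)), watch("C", now)]
print([i.title for i in active_items(items, now + timedelta(days=2))])  # ['B', 'C']
```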
Programs & Events Dashboard
Community Tech is also currently making some changes to the Programs & Events Dashboard, which is used by campaign and program organizers to manage editathons, writing contests, workshops and education programs. The tool is adapted from the Wiki Education Foundation's dashboard, with the help of Sage Ross, who built the original tool. The dashboard helps program organizers sign up participants, assign articles and collect stats on the impact of the program's work.
Community Tech is adding the ability to create large-scale campaigns on the dashboard, and create programs associated with those campaigns. This is one of the projects that we've worked on this year in an effort to help smaller user groups – in this case, program organizers.
Central repository for gadgets, templates and Lua modules
This is #3 on Community Tech's Wishlist Survey.
Like cross-wiki watchlists, this is a fairly large project that requires some substantial back-end changes to MediaWiki. The main new feature that is required to support this request is implementation of shadow namespaces (phabricator:T91162). As a first step, Legoktm has been working on unifying the existing global user pages and foreignfilerepo features (both of which dynamically pull in content from other wikis). Once that is complete, it can be used as the basis for an abstracted shadow namespaces feature.
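The core idea shared by global user pages, foreign file repositories and shadow namespaces is a lookup that falls back to a central wiki when a page does not exist locally. A toy Python sketch follows; the dict-based "wikis" and the `resolve_page` name are hypothetical and do not reflect the MediaWiki implementation tracked in T91162.

```python
from typing import Dict, Optional, Tuple


def resolve_page(title: str, local: Dict[str, str], central: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (source, content): local pages win, otherwise fall back to the central repository."""
    if title in local:
        return ("local", local[title])
    if title in central:
        return ("central", central[title])
    return None
```

In this sketch a local page always shadows the central copy, which is what lets a wiki override a shared module or template with its own version.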
Most of the progress so far has been on reducing duplication and technical debt, as well as identifying and fixing performance issues. More details can be found in the shadow namespaces roadmap.
Allow categories in Commons in all languages
This is wish #6 on Community Tech's Wishlist Survey.
As discussed in the last status update, this project is dependent on having structured data on Commons, which is currently being worked on by the WMDE Wikidata team. Since the last update, the Wikidata team has released a prototype on Labs and has continued work on multi-content revisions (phabricator:T107595) as well as modifying Wikibase to be able to act as both a data client and a data repository simultaneously (phabricator:T76007).
Global cross-wiki talk page
Wish #8 on Community Tech's Wishlist Survey is for a global cross-wiki talk page, which would be accessible and editable on any wiki, but would sync up across all of the wikis where a user is active.
Community Tech had to decline this wish. There's a full explanation of our analysis and rationale on the project page; here's the short version.
The proposal was to build a single global user talk page that would replace all of a user's talk pages on every Wikimedia project, so that messages could be written and read on any wiki. It's easy to imagine a simple version of this idea: one central wiki page would be the "master page", and on other wikis, you would see a mirror of that central page. But the talk page would have to keep track of where each message was posted, in order to show where the message comes from and provide proper interwiki links. That would require a structured discussion system, one that can mark specific chunks of text as "a message" and "a conversation".
There have been two projects in the past to build a structured messaging system on Wikimedia wikis – LiquidThreads and Flow. Flow already has most of the structure that we’d need for this project. It doesn’t currently keep track of the wiki where a message originated, but that functionality could be added.
If we were to take on building a global cross-wiki user talk page, we would have to build it on top of the existing Flow structure. Modifying Flow to become a cross-wiki talk page would be much less work than trying to create a new structured messaging system from scratch. However, since the Wishlist Survey, the Collaboration team has released cross-wiki notifications, and Community Tech is currently working on a cross-wiki watchlist, so the need for a single cross-wiki talk page is less urgent than it was at the time of the survey. We're declining this wish because it would be too difficult and controversial for the value that the feature would provide.
See Community Tech/Global cross-wiki talk page for more details.
Add a user watchlist
Wish #10 on Community Tech's Wishlist Survey is for a user watchlist, which would allow users to track other people's edits on a special watchlist.
Community Tech had to decline this wish. There's a full explanation of our analysis and rationale on the project page; here's the short version.
Essentially, there are three main uses for a user watchlist:
- Keep track of editors that you know have vandalized pages or violated the rules, to make sure they're not causing more trouble.
- Help mentors and trainers to keep track of the participants in their program.
- (Bad faith:) Keep track of editors that you don't like, so that every time they make an edit, you can jump in and revert their edit or argue with them. This is hounding, which is against our policies.
During the Community Wishlist Survey, the people who proposed and supported this idea had uses #1 and #2 in mind. There's no doubt they would use it in good faith. But other editors in that discussion noted that it could be used to facilitate stalking and hounding. As we've investigated this topic, we've met more and more concern about potential abuse. The concern was serious enough that we consulted the Wikimedia Foundation Support and Safety team, who have a lot of experience with harassment on Wikimedia wikis. They recommended in no uncertain terms that we not proceed.
It's true that the information that would be collected in this tool is already publicly available in Special:Contributions, so it's possible to argue that there's no real harm in having a tool that just makes stalking more convenient. Still, that's the same reason why people want the good faith version – it's easier to manage this as a watchlist, rather than looking at people's contributions – so it doesn't tip the scale either way.
People have offered several suggestions to reduce the likelihood that the user watchlist would be used in bad faith; those are discussed in detail on the project page.
See Community Tech/Add a user watchlist for more details.
There were 107 proposals on the international Wishlist Survey, and the Community Tech team is responsible for addressing the top 10. For all the other wishes from 11 on down, the Technical Collaboration team has been coordinating with volunteer developers from around the world, encouraging skilled people to work on some of these top community wishes – together at Hackathons, or as individual projects.
The table below summarizes some of the work being done on items from the Community Wishlist Survey. Follow the links to Phabricator tickets for more information about each project.