Community Tech/Page Curation and New Pages Feed improvements

From Meta, a Wikimedia project coordination wiki

This page documents a project the Wikimedia Foundation's Community Tech team has worked on or declined in the past. Technical work on this project is complete.

We invite you to join the discussion on the talk page.


Background

This project was proposed in the Community Wishlist 2019 and was voted #1 with 157 votes. The Community Tech team has committed to addressing as many of the project goals as possible.
In 2018, the Community Tech and Growth teams worked on a project aimed at general improvement of the AfC process. For more information, see the project page and background research for the project.

Problem statement

The wishlist proposal presents a broad goal of improving the New Page Review process and lists the key Phabricator tickets for the project. These tickets were prioritized by the NPP community and are listed below (in no particular order).

Task title | Phabricator task | Notes
Redirects with RfD tags should still display in the New Pages Feed as 'Nominated for deletion' | T157046 | Done
'Potential Issues' flagged in Page Curation Toolbar Page Info flyout | T207847 | Done
Allow filtering by no citations in page curation | T169120 | Done
Send Message to creator without needing to 'unreview'/'re-review' the article | T207442 | Done
Page curation adds text to first deletion discussion page if it already exists | T169441 | Done
Implement addition of un-redirected pages to Special:NewPages and Special:NewPagesFeed | T92621 | Declined after analysis (details below)
Redirects converted into articles should appear in the New Pages Feed indexed by the date of creation and creator of the article, not of the redirect | T157048 | Declined after analysis (details below)
Adding a "Potential COI" alert to the feed | T207757 | Declined. Solution proposed in T233115. No consensus reached.
Add "previously deleted" as a possible issue (flagged in red) in the New Pages Feed/Page Curation Tool | T189929 | Done
Allow filtering by date range in Special:NewPagesFeed | T167475 | Done
Special:NewPageFeed - add option to filter by pageviews | T207238 | Declined. Original proposal was out of scope. Alternative proposed in T230567. No consensus reached.
Keyword Search for New Pages Feed | T207761 | Declined after analysis (details below)
Enable page curation tools to be loaded on any page (optionally) | T207485 | Done
Reviewer Notes system in Page Curation Tools: system for reviewers to flag talk page comments on new pages to other reviewers | T207452 | Done
Tagging Feedback in Page Curation Tools should also be sent to talk page | T207443 | Done
Page Curation Tools to add userspace CSD Log/PROD Log functionality | T207237 | Done
Dragable Corners on Page Curation toolbar windows (for resizing) | T207439 | Done
Page Curation toolbar: do not mark pages as 'reviewed' when adding CSD and PROD tags | T208685 | Done
Make PageTriage wiki agnostic | T50552 | Declined after analysis (details below)

Status updates

December 17, 2019

Hello, everyone! We have an important update for the Page Curation community: The Page Curation Improvements project is now complete, after 7+ months of work. As of today, we have addressed all 19 requests (with explanations below). This project was a huge endeavor! We released 17 changes that fulfilled 13 separate requests. Additionally, we dedicated substantial time and resources to this project, and we collaborated with many community members.

For this project, our goal was to address all 19 requests. This meant that, whenever possible, we tried our hardest to fulfill each wish, which required: early investigations and mockups, finalization of the requirements (i.e., technical, product, and design), implementation of the work by engineers, technical review and testing, and release of the changes. This was followed by community outreach in order to validate the changes.

Unfortunately, some wishes were out of scope. They were simply too large or complex, so they were inappropriate for our team. However, we still wanted to address these wishes. This meant that we shared technical analyses of each wish, which outlined the primary challenges that we faced. We also tried to propose an alternative approach that was technically feasible, whenever possible, to the Page Curation community.

In total, we dedicated considerable effort to each request. We have shared the details of the remaining work below, and we thank you for all of your feedback!

Recently Completed Work:

Work That Presented Challenges:

  • T207757: Adding a Potential COI Alert: As a team, we wanted to find a way to make this work, so we communicated a proposed solution, which we shared on Phabricator, Meta-Wiki, and Wikipedia. This proposal came out of careful discussion (between engineers, product, and UX/design) about how we could implement a manageable solution, after first broadly discussing the request on Phabricator. However, this proposal never reached a general consensus among the Page Curation community, with some users expressing ambivalence regarding its usefulness. We didn’t want to continue with the work unless we felt there was strong community support, so we did not proceed with this request.
  • T207238: Special:NewPageFeed - add option to filter by pageviews: The original request was too large in scope, and we outlined the reasons (from an engineering standpoint) in the August 20th update. At that time, we also presented an alternative proposal: Display the number of pageviews in the article record, without allowing for sorting or filtering. This alternative was shared on Phabricator, Meta-Wiki, and Wikipedia, and a ticket was written to potentially take on this work (T230567: Display Number of Pageviews in New Pages Feed). Like the COI alert alternative, this proposal came out of careful discussion (between engineers, product, and UX/design) about how we could implement a manageable solution. However, this proposal never reached a general consensus among the Page Curation community, with some users expressing ambivalence regarding its usefulness. We didn’t want to continue with the work unless we felt there was strong community support, so we were unable to proceed with this request.
  • T207761: Keyword Search for New Pages Feed: This work was ruled as out of scope, after analysis from the team.
    • Explanation from engineers: As with all NPP requests, we hoped that we could make this work, but it’s unfortunately beyond the scope of the team. Keyword search is an extremely heavy operation with significant performance implications. Searching through the entire content of a revision for each page stored in PageTriage is unmanageable in terms of performance, because it requires combining data from two or more MediaWiki tables with the PageTriage table and searching on the combined data. Moreover, efficient database searches depend on fields that are “indexable,” and free-text fields are not. For that reason, search within MediaWiki is not done through the text fields, but rather through other systems like CirrusSearch, which have internal mechanisms to allow users to search for text content. These cannot be easily used with PageTriage pages. Even if we only searched through the snippet of content that PageTriage stores, the database operation would need to sift through tens of thousands of rows of text-based content, which is a significant performance challenge and cannot be easily implemented.
  • T92621: Implement addition of un-redirected pages to Special:NewPages and Special:NewPagesFeed: This work was ruled out of scope after analysis from the team. An engineer also investigated whether the work was feasible (the findings can be found in the original ticket).
    • Here’s an explanation from our engineers: This work is complicated, given PageTriage's current architecture. It can be broken into two parts: 1) storing a hash of a reviewed revision at the time at which it's marked reviewed; and 2) when a page in the queue is edited, checking whether the new revision's hash matches the already-reviewed one and, if it does, marking it reviewed. The first part seems reasonably solid and would only add complexity to the review process, although there is likely to be an issue around wanting to have the same review details (reviewer, date, etc.) as the earlier review. It's the second part that is more complicated and may have performance implications. Because of the way things are processed and the structure of PageTriage, we don't have access to some of the required data in the required places, and so would have to query it even when it's not strictly required. For more details, please refer to the notes in the Phabricator ticket.
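
The two-part approach described by the engineers for T92621 can be illustrated with a small sketch. This is not PageTriage code (the extension is not written in Python); the class `ReviewQueue`, its methods, and the choice of SHA-1 are all hypothetical and serve only to show the idea of storing a hash at review time and comparing it on a later edit.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


def content_hash(text: str) -> str:
    """Hash the revision text. SHA-1 is an assumption made for this sketch."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class ReviewQueue:
    """Hypothetical in-memory stand-in for the review queue."""

    # page_id -> hash of the revision that was last marked reviewed
    reviewed_hashes: Dict[int, str] = field(default_factory=dict)
    # page_id -> current review status
    reviewed: Dict[int, bool] = field(default_factory=dict)

    def mark_reviewed(self, page_id: int, text: str) -> None:
        """Part 1: store the hash of the revision at the time it is reviewed."""
        self.reviewed_hashes[page_id] = content_hash(text)
        self.reviewed[page_id] = True

    def on_edit(self, page_id: int, new_text: str) -> bool:
        """Part 2: if the new revision matches the reviewed one, mark it reviewed.

        On a mismatch the status is left untouched; what should happen in
        that case is not specified in the explanation above.
        """
        matches = self.reviewed_hashes.get(page_id) == content_hash(new_text)
        if matches:
            self.reviewed[page_id] = True
        return matches
```

In this toy version the stored hash is available in memory at edit time. The explanation above notes that in PageTriage the required data is not available where it is needed and would have to be queried, which is where the performance concern arises.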

Overall, this work is now complete. We’re thrilled that we had the opportunity to improve Page Curation for the English Wikipedia community. We learned a great deal about the important work you all do, and we’re glad that we could address all 19 requests. It is our hope that Page Curation is now a more flexible and manageable process. Thank you all for your help during this process, and we look forward to seeing how Page Curation continues to evolve in the future.

August 20, 2019

It’s been a few months, and we’re excited to post some updates. We’ve been continuously working on Page Curation & New Pages Feed improvements, and the team has made solid progress. With that in mind, we’ll share some recent news, both highlights and challenges, at this stage in the process. We look forward to your feedback on the Talk page. Thank you!

Work We’ve Completed So Far

We’ve been updating the project page table with the requests marked as “Done.” Before this update, we had 5 requests completed (T189929, T169120, T207439, T208685, and T157046). We now have a few more completed items:

Work That is Almost Complete

Work that Presents Challenges

  • T50552: Make PageTriage wiki agnostic: We’ve discussed this request, and we unanimously feel that it’s beyond our scope. Here’s why (according to analysis from the engineering team): PageTriage, while a useful extension, is written in a way that’s completely based on English Wikipedia processes. In order to convert the extension to work on other wikis, the extension would need to be adjusted, not only for other processes, but also to have a configurable process definition that each wiki could define for itself, based on each community’s needs. Consequently, this request would require a slew of analyses and decisions, such as: what it means to tag an article for deletion (e.g. which pages messages go to, which templates are used, whether there are follow-ups the system should be aware of, etc.), the way we tag articles, which articles show up in the queue, and more. Moreover, we couldn’t easily trim down the scope by disabling some features, because the internal workings of the extension are deeply intertwined with English Wikipedia. We would still need to do a significant amount of development work to ensure that the behavior remained stable and useful to other wikis. For these reasons, this request is unfortunately too big, so we cannot take it on.
  • T207238: Special:NewPageFeed - add option to filter by pageviews and the associated spike: T225169: [4 hours] Investigate whether it's efficient to order by tag value (DBA input requested): This work presents significant challenges, but there may be an alternative solution.
    • First, the challenges (according to analysis from the engineering team): In order to filter/sort by inputted numbers, the numbers must be stored in the database in a specific manner. This first step alone would take several weeks, if not months, according to the estimates provided by Wikimedia database experts. Then, we would need to populate the sortable cells with pageview data, which comes from an external service. To do this, we would need to create a process that pulls the data from the external service and stores it in MediaWiki’s PageTriage table. Then, we would do this work repeatedly, so that the numbers would remain up-to-date, over the entire PageTriage database (which consists of tens of thousands of rows, if not more). This process is both uncommon (in MediaWiki servers) and complex; we would need to define this process and identify the correct way to implement it, in collaboration with Operations and Database experts. In total, we do not find the request, in its current form, within our scope. For more details on the technical analysis and discussion with the database administrators, you can check out the associated investigation ticket.
    • Second, the alternative solution (as described in the T225169 investigation): We could display the number of pageviews in the article record, without allowing for sorting or filtering. Would this be a satisfactory alternative to the community? And, if so, how would you like the number of pageviews displayed (e.g. average per day, median per day, total views in the last 30 days, etc.)? Note that the results displayed will be from 24 hours earlier than the display time, and we’ll want to query from a maximum of 30 days ago (for the sake of general efficiency and manageability of this feature). We do not yet know if we can do this work. If we could, would it be worth our time and effort, in your opinion?
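
To make the display options in the alternative proposal concrete, here is a minimal sketch of how the three candidate figures (total, average per day, median per day) could be computed from daily pageview counts. Everything in it is an assumption for illustration: the function `summarize_pageviews`, the shape of its input, and the constant names are hypothetical, and it does not call any real pageview service. The only details taken from the text above are the 24-hour lag and the 30-day maximum window.

```python
from datetime import date, timedelta
from statistics import median
from typing import Dict, Mapping

MAX_WINDOW_DAYS = 30  # query at most 30 days back, per the proposal
LAG_DAYS = 1          # displayed data is from 24 hours before display time


def summarize_pageviews(
    daily_views: Mapping[date, int],
    today: date,
    window_days: int = MAX_WINDOW_DAYS,
) -> Dict[str, float]:
    """Summarize daily view counts over a window ending one day before `today`.

    Days missing from `daily_views` are counted as zero views (an assumption).
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    # Never look further back than the 30-day maximum.
    window_days = min(window_days, MAX_WINDOW_DAYS)

    end = today - timedelta(days=LAG_DAYS)
    days = [end - timedelta(days=i) for i in range(window_days)]
    counts = [daily_views.get(day, 0) for day in days]

    return {
        "total": sum(counts),
        "average_per_day": sum(counts) / len(counts),
        "median_per_day": median(counts),
    }
```

For a page with a few busy days and many quiet ones, the average and the median can differ considerably, which is one reason the choice of figure was put to the community.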

Requesting Feedback

We want to know your thoughts. Please share them on the Talk page. Thank you!

April 30, 2019

The Community Tech team has kicked off development work on this project. You may follow progress on the project tickets by looking at the Phabricator board. I will also be updating the ticket status in the table above as things progress.

5 February, 2019

This project is in its early stages of research to investigate project goals, dependencies and potential roadblocks. Your feedback is welcome on the talk page.

12 March, 2019

We are beginning to assess technical feasibility of tickets prioritised in the wishlist proposal. The technical work on this project is slated to start in late April/early May.

Important links