
Community Wishlist Survey 2023/Miscellaneous/Improve PagePile or create new tool for article lists for communication metrics

From Meta, a Wikimedia project coordination wiki

Improve PagePile or create new tool for article lists for communication metrics

  • Problem: Wikipedia lacks communication industry analytics, which is a major barrier to organizational partnerships. Tools such as Magnus Manske's PagePile allow creating arbitrary article lists that can then be fed into analytics tools such as Massviews, but these tools have limitations.
  • Proposed solution: Start by making a better "article list" generator which feeds into analytics tools, or improve the existing PagePile tool. The article list tool should have multilingual and cross-wiki support, and the ability to edit the list.
  • Who would benefit: Museum partners, communication professionals, Wikimedians in Residence, university collaborators
  • More comments: Check out this open source software traffic report demo for English Wikipedia, which uses Massviews Analysis and the PagePile tool. Ideally we could do this with one tool instead of two, or at least improve PagePile so that we can change the list.
  • Phabricator tickets: phab:T231891
  • Proposer: Bluerasberry (talk) 19:19, 26 January 2023 (UTC)
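The proposed solution asks for an article list that spans languages and wikis and can be edited after it is created. One way to picture that requirement is a small data structure keyed by (wiki, title). This is a hypothetical sketch: the `ArticleList` and `Entry` names, wiki codes such as `enwiki`, and the simplified title normalization are illustrative assumptions, not PagePile's or Gulp's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """One article on one wiki, e.g. Entry("enwiki", "Linux")."""
    wiki: str
    title: str


@dataclass
class ArticleList:
    """Hypothetical editable, cross-wiki article list."""
    name: str
    # A dict used as an insertion-ordered set: adding a duplicate keeps the first position.
    entries: dict[Entry, None] = field(default_factory=dict)

    @staticmethod
    def normalize(title: str) -> str:
        # Simplification of MediaWiki rules: underscores equal spaces and the
        # first letter is capitalized (true on most, not all, wikis).
        title = title.strip().replace("_", " ")
        return title[:1].upper() + title[1:]

    def add(self, wiki: str, title: str) -> None:
        self.entries[Entry(wiki, self.normalize(title))] = None

    def remove(self, wiki: str, title: str) -> None:
        self.entries.pop(Entry(wiki, self.normalize(title)), None)

    def for_wiki(self, wiki: str) -> list[str]:
        """Titles on one wiki, in insertion order, ready to hand to a per-wiki tool."""
        return [e.title for e in self.entries if e.wiki == wiki]

    def wikis(self) -> set[str]:
        """Every wiki that has at least one article on the list."""
        return {e.wiki for e in self.entries}
```

For example, `add("enwiki", "linux")` after `add("enwiki", "Linux")` changes nothing, and `for_wiki("dewiki")` returns only the German Wikipedia titles, which could then be passed to a per-wiki analytics tool.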


Massviews Analysis of software pageviews is one application of an article list generator. If we had Gulp as a reliable tool, we could develop other reports for editing, image or citation use, multilingual support, or just better list management.

As stated in the nomination, the best way to understand this proposal is to check out this open source software traffic report demo for English Wikipedia. Here is what happens in this demo:

  1. A human user is interested in an arbitrary set of Wikipedia articles in any language. They want to know which ones are popular but underdeveloped, so that by improving those articles they can make the biggest impact with the least effort.
  2. The user creates a list of those articles using the PagePile tool.
  3. The user puts the PagePile into Massviews Analysis, documented at Pageviews Analysis.
  4. The output is a traffic report for those articles.
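Steps 2 to 4 can be approximated in a few lines against the public Wikimedia Pageviews REST API (`/metrics/pageviews/per-article/...`). This is a minimal sketch, assuming monthly granularity, human (`user`) traffic only, and a placeholder User-Agent contact; the function names are hypothetical, and the sketch is not meant to mirror how Massviews Analysis works internally.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
# Placeholder: Wikimedia asks API clients to identify themselves with a contact address.
USER_AGENT = "article-list-report-sketch/0.1 (contact: you@example.org)"


def pageviews_url(project: str, title: str, start: str, end: str) -> str:
    """URL for monthly views by human users, all access methods; dates are YYYYMMDD."""
    # Titles use underscores; everything else (including "/") is percent-encoded.
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"{API}/{project}/all-access/user/{article}/monthly/{start}/{end}"


def total_views(response: dict) -> int:
    """Sum the 'views' field over all items in one Pageviews API response."""
    return sum(item["views"] for item in response.get("items", []))


def fetch_views(project: str, titles: list[str], start: str, end: str) -> dict[str, int]:
    """Traffic report: total views per title over the date range (makes network calls)."""
    report = {}
    for title in titles:
        request = urllib.request.Request(
            pageviews_url(project, title, start, end),
            headers={"User-Agent": USER_AGENT},
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                report[title] = total_views(json.load(response))
        except urllib.error.HTTPError as err:
            if err.code != 404:  # 404 means the API has no data for this title/range
                raise
            report[title] = 0
    return report
```

Calling `fetch_views("en.wikipedia", titles, "20230101", "20231231")` returns a title-to-views mapping, which is the raw material of a traffic report like the one in step 4.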

The problem to address is that organizations around the world with communication departments are making major investments of money and resources in new media platforms, but Wikipedia is not the target of these investments. For marketing and promotion such investment is unwanted anyway, but it is a problem that Wikipedia is seen as an undesirable partner by universities, museums, research institutes, and government agencies that provide general information such as census, culture, demographics, and geography. When an organization has general reference information to share, it should recognize Wikipedia as a good communication option, but instead its investment too often goes into Facebook, Twitter, YouTube, TikTok, Instagram, or its own website. Wikipedia is worthy of being a communication partner! Every university in the world has staff who manage social media accounts; at least some of them could be persuaded to develop Wikipedia a little.

There are a few reasons why this does not happen, but one major difference between Wikipedia and all the other platforms is that only Wikipedia fails to provide an obvious entry point for communication professionals. If an institution hires a professional social media manager, that manager will immediately begin tracking dashboard metrics: how many people read their posts, which posts get user engagement, and where their efforts have the most impact. Wikipedia is not social media, and there are differences, but we at least need to give communication professionals some measurable feedback. For example, a health organization may want to consider all the Wikipedia articles related to a medical condition and identify the most popular ones as targets for editing; metrics could narrow a possible 5,000 articles on cancer down to the 100 most popular. We have thousands of climate change articles, but the top 10% of them probably get more traffic than the other 90% put together, and identifying those popular ones is difficult. Wikimedia categories do not work for this because they include unwanted topics and exclude wanted ones; organizations simply must make their own lists.
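Once each article on a list has a view count, shortlisting the most popular articles is a sort, and the guess that the top 10% outdraws the other 90% can be checked directly. A minimal sketch, assuming the counts are already in a `dict` mapping title to views; `top_articles` and `top_share` are hypothetical helper names.

```python
def top_articles(views: dict[str, int], n: int) -> list[tuple[str, int]]:
    """The n most-viewed articles, highest first; ties are broken alphabetically."""
    return sorted(views.items(), key=lambda item: (-item[1], item[0]))[:n]


def top_share(views: dict[str, int], fraction: float) -> float:
    """Fraction of all views received by the top `fraction` of articles (0.10 = top 10%)."""
    counts = sorted(views.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, round(len(counts) * fraction))  # always count at least one article
    return sum(counts[:k]) / total
```

With 5,000 cancer articles, `top_articles(views, 100)` would give the 100-article shortlist, and a `top_share(views, 0.10)` result above 0.5 would mean the top 10% really does get more traffic than the other 90% combined.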

Magnus Manske made PagePile (https://pagepile.toolforge.org/) years ago and proposed Gulp as an update in 2019. I now propose that Gulp be the plan for this wish. It is a tool that creates lists of Wikipedia articles, which a user can feed into other tools to produce metrics.

When such a list generation tool works reliably, I expect there will be demand for many other kinds of analytics tools: aspects of the XTools editing reports applied to arbitrary lists of Wikipedia articles, or future applications such as identifying which articles have translated versions, multimedia from Commons, references, and all the other metrics that Wikimedians check and encourage as part of content development and quality control. Another desired feature is new ways of generating lists. PagePile methods include manually entering Wikipedia article titles or Wikidata identifiers, and running Wikidata queries is possible too. I would also like to be able to enter a Wikimedia URL and get a list of all the articles or Q IDs linked on that page. If we went with this, I can imagine 100 feature requests for analytics across languages, Wikimedia projects, and data visualizations. This can start small and grow as needed. It would be a step toward major external investment in Wikimedia development from our partners. Bluerasberry (talk) 19:19, 26 January 2023 (UTC)
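The requested URL-to-list feature maps onto the MediaWiki Action API: `generator=links` enumerates the pages linked from a page, and `prop=pageprops` with `ppprop=wikibase_item` attaches each page's Wikidata Q ID. A minimal sketch, assuming standard `/wiki/Title` article URLs and the `/w/api.php` endpoint used by Wikimedia wikis; the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

USER_AGENT = "article-list-from-url-sketch/0.1 (contact: you@example.org)"  # placeholder


def parse_wiki_url(url: str) -> tuple[str, str]:
    """Split e.g. https://en.wikipedia.org/wiki/Climate_change into (host, title)."""
    parts = urllib.parse.urlsplit(url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    title = urllib.parse.unquote(parts.path[len(prefix):]).replace("_", " ")
    return parts.netloc, title


def links_query(title: str) -> dict[str, str]:
    """Action API parameters: main-namespace pages linked from `title`, plus their Q IDs."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "titles": title,
        "generator": "links",       # iterate over the pages `title` links to
        "gplnamespace": "0",        # articles only
        "gpllimit": "max",
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # the linked Wikidata item, if any
    }


def extract_items(response: dict) -> list[tuple[str, Optional[str]]]:
    """(title, Q ID or None) for each existing linked page in one API response."""
    pages = response.get("query", {}).get("pages", [])
    return [
        (page["title"], page.get("pageprops", {}).get("wikibase_item"))
        for page in pages
        if not page.get("missing")  # skip red links
    ]


def fetch_linked(url: str) -> dict[str, Optional[str]]:
    """Follow API continuation until every linked page is collected (network calls)."""
    host, title = parse_wiki_url(url)
    params = links_query(title)
    found: dict[str, Optional[str]] = {}
    while True:
        request = urllib.request.Request(
            f"https://{host}/w/api.php?{urllib.parse.urlencode(params)}",
            headers={"User-Agent": USER_AGENT},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            data = json.load(response)
        for page_title, qid in extract_items(data):
            # A page can reappear in a continued batch; keep any Q ID already seen.
            if qid or page_title not in found:
                found[page_title] = qid
        if "continue" not in data:
            return found
        # Merge the continuation values into the original request, as the API expects.
        params = {**links_query(title), **data["continue"]}
```

The result of `fetch_linked(...)` can seed either a title list or a Q ID list, covering both of the PagePile input styles described above.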

@Bluerasberry It sounds like the primary pain point is with list building, not with collection of metrics. Is that correct? I'm wondering if this proposal could be rewritten to basically be more about "make a better PagePile". "Gulp" is not a well-known name and the solution we come up with may differ considerably from Gulp, so for better voting chances I recommend renaming the proposal as well. Let me know if you need any help. Thanks, MusikAnimal (WMF) (talk) 20:37, 30 January 2023 (UTC)
@MusikAnimal (WMF):
  • Yes, this proposal is for list building.
  • A list builder is a means and not the end. I want the metrics and not a tool for organizing metrics, but since wishes have to be small, I am starting with the list builder.
  • If you or anyone you know cares to revise the text, then feel free. I would estimate that PagePile has fewer than 50 regular users, but Magnus has 1,000 fans who know his name and support his proposals on name recognition. To me, calling this a Magnus proposal seems like better promotion than the PagePile brand name. Even if you think that causes confusion, I think I would get the most votes by emphasizing Magnus' name.
  • Also, an aside: I think all the Pageviews Analysis variations (Massviews, Topviews, etc.) should be rebranded under one single name, because documenting the use of so many similar products with different names is a challenge. Ideally we could also drop the separate PagePile and Gulp branding and call everything something like "Wikimedia metrics dashboard", as Twitter does, with one branded product that includes 100 unbranded features. To me the important part of these tools is not their individual functions but their shared end use: they all inform communication and media professionals. I have thought about documenting the tool at https://www.protocols.io/ so that other academics could start using it while citing a published methodology. Bluerasberry (talk) 21:57, 30 January 2023 (UTC)
@MusikAnimal (WMF): I read the changes and love them all; this expresses what I want. 👍 Bluerasberry (talk) 17:37, 3 February 2023 (UTC)
Great! Approving now :) MusikAnimal (WMF) (talk) 18:17, 3 February 2023 (UTC)[reply]