Community Tech/Pageview stats tool

Tracked in Phabricator:
Task T120497

The Pageviews Analysis tool presents accessible and up-to-date pageview stats for all articles. This project is complete; for documentation and further discussion, see Pageviews Analysis.

The Community Tech team is supporting User:MusikAnimal's work developing the Pageviews Analysis tool.

Rationale

stats.grok.se

We've long depended on stats.grok.se, which Henrik has been kind enough to host for years. He hasn't had the time to keep developing the tool, which still states that it is very much a beta service that could disappear at any time, so it is less reliable than the communities want. The Analytics team has recently developed a pageview API that could be helpful. This was one of the top items on the 2015 Community Wishlist Survey, with 70 support votes.

Contributors would like to see pageview stats for several reasons: to decide which pages are most important to update, to measure the impact of their work as individuals or groups, and to see changes in a page's popularity over time. They want to look up the pageview stats for a page they're interested in, both recent and historical.

Technical discussions and background

Possible feature list

This list of features comes from our investigation and conversations. We haven't determined the exact definition of done yet; it's still evolving as we figure out what's possible.

Basic features

  • Uses dashiki to display graphs.
  • Custom date-range filter.
  • Ability to view stats by day, week, month, or year.
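
As a rough illustration of the day/week/month/year view, the sketch below rolls daily counts up into coarser buckets. It is not taken from the tool itself: the function names, the bucket labels (ISO weeks, `YYYY-MM` months) and the sample numbers are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def bucket_key(day, granularity):
    """Return the label of the bucket that `day` falls into (hypothetical labels)."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        # ISO weeks start on Monday; the ISO year can differ from the calendar year.
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    if granularity == "year":
        return str(day.year)
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate(daily_views, granularity):
    """Sum a {date: views} mapping into buckets of the requested granularity."""
    totals = defaultdict(int)
    for day, views in daily_views.items():
        totals[bucket_key(day, granularity)] += views
    return dict(totals)


# Made-up sample data, for illustration only.
sample = {
    date(2016, 1, 30): 100,
    date(2016, 1, 31): 150,
    date(2016, 2, 1): 200,
}
print(aggregate(sample, "month"))
```

Aggregating on the client like this keeps a single daily data series as the source for every view, at the cost of fetching more rows than a coarser query would need.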

Nice to have features

  • i18n for the tool
  • It would be GREAT if it can display historical data as well (before 2015).
  • Ability to switch between different kinds of graphs
  • Ability to get Top X viewed pages by namespace on a wiki
  • Ability to compare different pages (over different wikis or same)

From discussion with CE folks:

  • Ability to compare view vs edit stats
  • Ability to see cumulative stats for a page for all the languages it exists in
  • Ability to see cumulative stats for a page and its subpages
  • Ability to see page views by category (http://tools.wmflabs.org/glamtools/treeviews/)
  • Top 10/100 most-viewed/most edited articles and similar fun stats (like this and this)
  • Compatibility with PagePile (http://tools.wmflabs.org/pagepile/)
  • Ability to differentiate stats between WMF staffers and other users.
  • Ability to view redirect traffic stats separate from the article traffic stats.

Community request

  • Given a list of Wikipedia articles and a language, present pageviews for each of them over any given set of months
  • Stats should link to a Meta-Wiki page where anyone can discuss the stats tool and make documentation
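
The first community request above (a list of articles, a language, and a range of months) could be sketched as a set of request URLs against the pageview API. This is only a sketch: the endpoint path and the `all-access` / `all-agents` / `monthly` segments reflect our understanding of the public Wikimedia REST API and are not stated on this page, so check them against the current API documentation. The helper names are hypothetical.

```python
from urllib.parse import quote

# Assumption: per-article endpoint of the Wikimedia pageview REST API.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def monthly_pageviews_url(language, title, start, end):
    """Build a URL for monthly views of one Wikipedia article.

    `start` and `end` are YYYYMMDD strings (assumed date format).
    """
    project = f"{language}.wikipedia"
    # Titles use underscores for spaces; encode everything else, including "/".
    article = quote(title.replace(" ", "_"), safe="")
    return f"{API_ROOT}/{project}/all-access/all-agents/{article}/monthly/{start}/{end}"


def urls_for_articles(language, titles, start, end):
    """Map each title in the list to its request URL."""
    return {t: monthly_pageviews_url(language, t, start, end) for t in titles}


for title, url in urls_for_articles("en", ["Michael Jackson", "AC/DC"], "20160101", "20160331").items():
    print(title, "->", url)
```

The sketch only builds URLs and performs no network requests; fetching and error handling are left out on purpose.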

Status

This is a slightly less technical overview. For more details, please see our meeting notes and the Phabricator task.

2016

March 22

One issue that has come up is that ad blockers sometimes block the tool, partly because they treat "pageviews" and "viewcounts" in the URLs as signs of ad serving or privacy violations. We've pinged the maintainers of EasyPrivacy, a popular filter list used by ad blockers, to see if we can get unblocked. We're also investigating other workarounds, including moving the URLs or showing a notice to people who can't see the tool properly, encouraging them to update their ad blocker. (T128974)

March 7

The Pageviews Analysis link has been added to MediaWiki:Pageinfo-footer on English WP, adding a link at the bottom of every en.wp info page. (Example) Once we get translations up and running, the link will be added on other language WPs.

March 3

Some more things to improve on the pageview stats tool: Add i18n support (T128103), Register with Translatewiki (T128768), Add language selector (T128770), Add option to show stats for redirects (T128495).

February 23

Thanks to the Graph extension and Pageview API, we now have {{Graph:PageViews}} templates that can show pageviews for any wiki page for any of the wikis. See the examples below, as well as US politics pageviews.

Community Tech is going to work with Yurik on adding some more functionality.

February 17

MusikAnimal is working on a version of Marcel's pageview stats visualization: Pageviews Analysis. We've been talking with him for the last few days, and we're going to help with one of the items on his Issues list: improving the CSV output format (T127143).
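
To make the CSV item concrete, here is a minimal sketch of one possible output layout: one row per date and one column per page. The layout, the `views_to_csv` name and the sample numbers are assumptions for illustration; the format actually chosen is tracked in the Phabricator task, not here.

```python
import csv
import io


def views_to_csv(series):
    """Render {page: {date: views}} as CSV text (hypothetical layout).

    Dates with no data for a page are left as empty cells rather than zeros,
    so missing data is not mistaken for zero views.
    """
    pages = sorted(series)
    dates = sorted({d for per_page in series.values() for d in per_page})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Date", *pages])
    for d in dates:
        writer.writerow([d, *(series[p].get(d, "") for p in pages)])
    return buffer.getvalue()


# Made-up sample data, for illustration only.
example = {
    "Cat": {"2016-02-01": 10, "2016-02-02": 12},
    "Dog": {"2016-02-01": 7},
}
print(views_to_csv(example))
```

Using the `csv` module rather than joining strings by hand means page titles containing commas or quotes are escaped correctly.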

We're also working on making MusikAnimal's tool multilingual (T127356).

January 26

Over the next couple months, a group of 8 students in Sweden will be working on the new Pageview stats tool, with support from Jan Ainali from Wikimedia Sverige. WMF's Community Tech and Analytics teams are supporting the students' work.

Niharika did the preliminary investigation on the tool, gathering requirements from Analytics and Community Engagement, which is on phab:T121732 and summarized below.

The students working on the project will have access to Tool Labs and Gerrit, and they'll work in Phabricator as much as they can. They'll also have a Trello board.

The students will be working independently, and Community Tech is happy to help with code review, or questions along the way. As they get close to finishing the project, we can help get it onto Tool Labs, or whatever still needs to get done so that everyone can use the new tool.

Here are the results of Niharika's investigation:

A look at stats.grok.se

  • Pros: Simple to use, with a straightforward UI; provides JSON stats dumps (such as http://stats.grok.se/json/en/200910/Michael_Jackson) for people who want to fetch data programmatically
  • Cons: Does not provide mobile view stats; does not provide stats for sister projects (although these stats are included in the raw data dumps; see discussion); doesn't allow custom date-range filtering; cannot compare pageviews for multiple pages; does not take redirect hits into account

Possible solutions to this problem

  • Patching stats.grok.se and/or wikiviewstats:
    • Pros: Already established and widely liked tools
    • Cons: Overhead of working with legacy code; almost everything would need to be changed, down to the API being called for data. Dashiki is the new preferred way of doing these stats, as it makes it easy to embed the graphs in other tools. Analytics is planning to rebuild stats.wikimedia.org and would likely want to embed these pageview stats there.
  • Having an extension with a Special page:
    • Pros: On-wiki data as preferred by a lot of the community
    • Cons: It would limit us to displaying stats on a per-wiki basis, with the extra overhead of having the extension deployed on every wiki.
  • Creating a new tool on Labs:
    • Pros: Ability to experiment with UI and features, ability to use dashiki
    • Cons: Will need to work from scratch

After weighing all pros and cons and discussions with Analytics folks, creating a new tool feels like the best decision.

January 20

There are currently several teams looking at this: Community Tech and Analytics from the Wikimedia Foundation, the TCB team (Wikimedia Deutschland) and a team of students supervised by Jan Ainali from Wikimedia Sverige. We're currently trying to answer the following questions:

  • How are we to work together on this? Should some developers who had planned to look at pageview stats work on something else?
  • Where should we put it? We could add the stats to the information pages, but almost no one knows they exist. We could create a special page where you enter the article names you want to look up. Should it be an extension?
  • What information should be included and how should it look? We need to do mockups. Can we get the Design team to help out?

Timeline

It's too early to say anything yet, but when we have a good estimate, we'll put it here.

Graph extension

Examples of the {{Graph:PageViews}} templates that can show pageviews for any wiki page for any of the wikis. See also US politics pageviews.

{{Graph:PageViews}}
30 days for the current page
{{Graph:PageViews | 90 | Main Page | en.wikipedia.org }}
90 days for Main Page on English Wikipedia

Initial Community Tech team assessment

Support: High. Unanimous support votes.
Impact: High. This tool will help programs demonstrate impact, and will help researchers. Stats.grok.se goes down regularly, and is not reliable.
Feasibility: High (relatively easy, compared to other projects). Analytics has the pageview API, so it should mostly be front-end work. Could be implemented either on Labs or as a Wikimedia-specific extension. Labs implementation would be easier and faster. It needs a front-end spec and design, plus iterations with community input.
Risk: Low. Just need to make sure it’s reliable and scalable so it doesn’t flood the API.
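
One common way to keep a tool from flooding an upstream API is to cache responses and space out the requests that do go through. The sketch below shows that idea under stated assumptions; it is not how the tool is implemented. The `ThrottledCache` class, the interval and TTL defaults, and the injected `fetch` function are all hypothetical.

```python
import time


class ThrottledCache:
    """Cache results per key and enforce a minimum gap between upstream calls."""

    def __init__(self, fetch, min_interval=0.1, ttl=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # function that performs the real request
        self._min_interval = min_interval  # seconds between upstream calls
        self._ttl = ttl                    # seconds a cached value stays fresh
        self._clock = clock                # injectable for testing
        self._sleep = sleep                # injectable for testing
        self._cache = {}                   # key -> (expiry time, value)
        self._last_call = None             # time of the most recent upstream call

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no upstream call at all
        if self._last_call is not None:
            wait = self._min_interval - (now - self._last_call)
            if wait > 0:
                self._sleep(wait)  # too soon after the previous call: pause first
        self._last_call = self._clock()
        value = self._fetch(key)
        self._cache[key] = (self._last_call + self._ttl, value)
        return value


# Usage with a stand-in fetch function (no network access).
cache = ThrottledCache(lambda key: f"stats for {key}", min_interval=0.0)
print(cache.get("Main_Page"))
print(cache.get("Main_Page"))  # second lookup is served from the cache
```

This version is single-threaded and keeps everything in memory; a shared deployment would need locking and probably a bounded or external cache.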