Wikipedia Administrative Pages Analytics

From Meta, a Wikimedia project coordination wiki
A project to measure and improve maintenance and inclusion in Wikipedia administrative pages across Wikipedia language editions.

Shortcut:
WAPA
Research Resources
Browse the administrative pages from your Wikipedia language edition. Find out which page deserves your attention.

Wikipedia Administrative Pages Analytics (WAPA) is a space to analyze, understand and make recommendations and tools to improve the inclusion and maintenance of Administrative Pages across Wikipedia language editions.

WAPA is an experimental research project that aims to create a complete set of tools to understand admin pages, their characteristics, and how they are created, and to identify specific actions to improve them.

Mission and vision

This is the project philosophy. We believe we must respect the disorganized way admin pages grow, but use tools to ensure that the most important ones are taken care of.

The Wikipedia Administrative Pages Analytics mission is to raise awareness of the current state of development, maintenance, participation, and inclusion in Administrative Pages. These are all the pages that Wikipedians use to coordinate and govern Wikipedia, ranging from policies and guidelines to help pages and essays, among others.

Administrative pages grow organically like Wikipedia content, and we believe the more awareness of their state of development and inclusiveness in the processes of creation, the more we will be able to improve on them.

Administrative pages and, in particular, policies and help pages are essential to newcomers’ onboarding to Wikipedia. We aspire to a community that is able to continually revise these pages to make them usable and to facilitate newcomers’ learning. This space is a call to the research community to study them and provide tools for Wikimedians to be able to analyze them.

We envision a Wikimedia community that regularly consults analytical tools to make informed decisions on which pages to create, improve or delete and, also, to modify the course of current processes in order to be inclusive to all Wikimedians. We aspire to have more level-headed decisions based on discussions grounded in data so that we can continue improving rather than fall into the trap of personal and collective biases.


Admin pages need analytics

Admin pages on Wikipedia do not receive the necessary attention, either (a) from the academic research community or (b) from a product or operations analytics perspective.

Wikipedia admin pages grow in an organic and unplanned way. We believe in the usefulness of stats in order to give the communities the capacity to understand their current state from a more global perspective.

Admin pages are the forgotten “knowledge gap”. Unlike content, editor, or reader gaps, we know little about the admin pages and how each Wikipedia contextualizes them. We certainly know less about the gaps, the missing pages, or the outdated content. Yet, they are indicative of many aspects of community and content development.

This is the first research project to quantify and analyze administrative pages. While in the current iteration we are testing the ideas and only tackling Wikipedia, we believe it can be expanded to Commons, Wikidata, and all the sister projects. The project is making exploratory efforts and setting the questions that can help us in improving the administrative pages in future iterations of this project or others.

Activities and Types of Administrative Pages

Wikipedia (ns4) and Help (ns12) constitute the group of admin pages. In them, there are many different types of pages. These are the most important ones.

This project primarily analyzes the distribution of Administrative Pages among different categories or topics. We broadly define administrative pages as those in namespace 4 (Wikipedia). Thus, we want to analyze the coverage of the different topics within the administrative pages, i.e., we want to classify them into different types.

Then, we create a dataset to characterize each page, and ultimately, make visualizations and tools to be able to make some decisions on which pages or groups of pages to work on.
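As a minimal sketch of the first step, selecting candidate admin pages by namespace could look like the following. The record layout (`title`, `ns`) and the function name are assumptions made for illustration, not the project's actual pipeline; only the namespace numbers 4 (Wikipedia) and 12 (Help) come from the text above.

```python
# Namespace numbers are taken from the text; everything else is illustrative.
ADMIN_NAMESPACES = {4: "Wikipedia", 12: "Help"}


def select_admin_pages(pages):
    """Keep only pages whose namespace is 4 (Wikipedia) or 12 (Help)."""
    return [p for p in pages if p["ns"] in ADMIN_NAMESPACES]


# Hypothetical page records (field names are assumptions).
pages = [
    {"title": "Wikipedia:Neutral point of view", "ns": 4},
    {"title": "Help:Editing", "ns": 12},
    {"title": "Barcelona", "ns": 0},
]
admin_pages = select_admin_pages(pages)
```

Classifying the selected pages into types would still require further signals (the project mentions Wikidata properties and category graphs), which this sketch does not attempt.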

We selected 8 main categories we consider representative of the types of administrative pages. They exist in English Wikipedia and in many other language editions; they are general and comprehensive of the whole set of administrative pages; and they are intentional, as they represent key aspects of Wikipedia.

  1. Policies and Guidelines
  2. Help
  3. Essays
  4. Village pump
  5. Wikiprojects
  6. Tools
  7. Disclaimers
  8. Copyright


Research Culture is Strategic

Similar to other research projects carried out in the Movement (Wikipedia Diversity Observatory or Community Health Metrics), WAPA does the full circle: dataset generation, data analysis, and prototyping. This means that its focus is on understanding rather than creating an end-product. Instead of focusing on one single solution, we prioritize transparency, and we encourage collaboration in developing this research area we believe is very important.

The Wikipedia Administrative Pages Analytics (WAPA) is a project that completely aligns with recommendations nº7 “Manage Internal Knowledge” and nº10 "Evaluate, Iterate, and Adapt" of the Wikimedia Strategy 2030. In particular, it wants to “encourage the growth and maintenance of the knowledge-base” by means of analytical tools.

Project Principles

The principles of the project are efficiency, maintenance and inclusion. A good way to remember them is to think about these classic mottos we tweaked for the project.

The tools and visualizations are aimed at highlighting specific aspects of the Administrative Pages, following the principles of efficiency, maintenance, and inclusion.

We believe there is always work to do in improving the administrative pages, at the page level (e.g., paragraph readability) but also at the graph or Wikipedia level, given that some pages may not have the visibility they should have. In other words, they do not have incoming links from other pages. We want the analytical tools to help the community be efficient in this task.

At the same time, we must acknowledge that administrative pages act as an interface between power members and the rest of the community, between the veterans and newcomers. Therefore, it is a goal to stay inclusive and have these pages edited by everyone.

Measuring how each Wikipedia community includes editors in the creation and editing of admin pages is valuable from the perspective of governance.

We believe that some metrics can reflect the traits of healthy communities, and, likewise, they can also be a leading indicator for a potential case of dysfunction or breakdown.

Project Needs and Research Questions

Project Needs

Over the last months (April to May 2022), we interviewed several Wikimedians (editors and affiliate members) from multiple communities (mostly cawiki, eswiki, itwiki, enwiki and arwiki) to ask them about admin pages. In particular, we asked the following four questions:

  • How do you know a specific admin page needs work?
  • What are the traits of an important admin page?
  • Who do you think edits admin pages the most?
  • Do you know the maturity of Wikipedia in terms of admin pages?

These conversations allowed us to detect the needs this project must address. At the same time, they were useful to identify the potential measurements that were necessary to make and the gap this project needs to fill.

Project Research Questions

These are the most important and general research questions the tools aim at answering:

  1. What are the main types of admin pages?
  2. How have the admin pages been created and edited over time?
  3. How has the editing of the admin pages engaged and included different types of editors?
  4. How much do admin pages differ in terms of completeness, relevance, popularity, editing regularity, editing conflict, and recency?
  5. Which are the most valuable admin pages that exist in one Wikipedia but do not exist in another one, thus creating a gap?
  6. Which are the admin pages that exist in one Wikipedia but are more complete in another?
  7. Which are the admin pages that present one or more “red flags”, and thus require editors’ attention/maintenance?
  8. Which are the admin pages that are being edited (recent changes) in the last 24 hours and of which type?

For a more extensive list of research questions we can address with analytics, you can look at this other page.

Key Findings

This is a list of six key findings or take-aways of this project. We recommend watching the video for a better understanding of the findings.

  • Administrative pages are considered the backbone of Wikipedia. An admin page is a type of page that helps Wikipedia achieve its purpose and includes protocols, conventions, help pages, etc. Admin pages are typically in namespace 4 (Wikipedia) or namespace 12 (Help). We classified the admin pages into 6 main types: Policies & Guidelines, Help Pages, Essays, Village Pump, Tools, and Wikiprojects.
  • Admin Pages Analytics classification has been done using namespaces, Wikidata properties, and category graphs. The results showed that only around 5% of the admin pages could be assigned to the types using the three approaches (in a collection of 20 languages including the largest and others selected with diversity criteria). This does not imply that the classification approach is faulty or incomplete but, on the contrary, that most of the admin pages are not properly tagged to be found by other Wikipedians. Admin pages containing interwiki links (existing across languages) were more likely to be labeled and detected by our different approaches.
  • There is very little research on administrative pages and this project aims to fill the knowledge gap and provide a systematic understanding of these pages. To date, very few academic and scientific studies pay attention to admin pages, and those that do focus mostly on policies. There is an opportunity to study admin pages from the perspectives of topical analysis, content biases, quality, readership, participation and equality, and vandalism. These are common research topics in the Wikipedia scholarly literature.
  • A significant part of the admin pages has not been edited in the recent or even the distant past. In languages like Arabic, Catalan, Croatian, German, Polish or Russian, the percentage of admin pages edited in the past 6 months is between 4 and 10%. Only in English does this grow to 40% of the admin pages. Usually, 15-50% of the admin pages have not been edited in the past 10 years. This is content that is likely outdated or could be flagged as not current in order not to mislead editors. These results imply that there are potentially valuable under-edited pages that could benefit from more editing.
  • Admin pages have been created over the years, even though the most relevant ones (i.e., those which exist across languages) were created in the years 2005-2010. Most languages have their highest admin page activity in 2006 (Arabic, Croatian, English, Polish, and Russian). The fact that relevant admin pages were created afterwards and are still created in recent years implies that there is an opportunity for exchanging content across languages (in other words, filling the admin pages content gaps).
  • Even new admin pages are created by old editors in most Wikipedia language editions analyzed. A quick computation of the average year of the first edit of the editors who created these pages shows us that they are mostly created by editors from previous generations. Only in Arabic Wikipedia do we see that admin pages created in 2021 were, on average, from editors who started in 2019, implying that this wiki has an important activity from a renewed community. In other languages, admin pages created in 2021 were from editors who started almost a decade before: Catalan (2011), Croatian (2013), English (2010), German (2012), and Polish (2007). This implies there is a problem of inclusion - newbies or newer editors should feel called to create admin pages.
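The two kinds of measurement behind the last findings (the share of pages edited within a time window, and the average first-edit year of page creators) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the record fields (`created_year`, `creator_first_edit_year`) and the sample values are invented for the example and are not the project's dataset or code.

```python
from datetime import datetime, timedelta


def share_edited_since(last_edit_dates, now, days):
    """Fraction of pages whose last edit falls within the past `days` days."""
    if not last_edit_dates:
        return 0.0
    cutoff = now - timedelta(days=days)
    return sum(d >= cutoff for d in last_edit_dates) / len(last_edit_dates)


def mean_creator_start_year(pages, creation_year):
    """Average first-edit year of the editors who created pages in `creation_year`."""
    years = [p["creator_first_edit_year"] for p in pages
             if p["created_year"] == creation_year]
    return sum(years) / len(years) if years else None


# Made-up sample data, for illustration only.
now = datetime(2022, 6, 1)
last_edits = [datetime(2022, 5, 1), datetime(2015, 3, 9),
              datetime(2009, 1, 20), datetime(2011, 7, 4)]
recent_share = share_edited_since(last_edits, now, days=183)       # ~6 months
stale_share = 1 - share_edited_since(last_edits, now, days=3650)   # ~10 years

created = [
    {"created_year": 2021, "creator_first_edit_year": 2010},
    {"created_year": 2021, "creator_first_edit_year": 2012},
    {"created_year": 2006, "creator_first_edit_year": 2005},
]
avg_start = mean_creator_start_year(created, 2021)
```

A large gap between `creation_year` and the returned average would correspond to the pattern described above, where new admin pages are created mostly by long-tenured editors.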


Academic Research Motivation

Research culture and awareness

Wikipedia is the product of a collaborative effort made by thousands of Wikipedians on a daily basis. While we are able to make the project grow piece by piece, we believe that everyone sees the trees, but no one is able to see the forest. Wikipedia grows organically, and the community does benefit from research and tools that allow them to see the entire system. We want to build a “culture of collective awareness” built upon research.

Analytical tools counter the narrow view and allow one to step out and have a more panoramic understanding of Wikipedia. This is true for many research areas (e.g., content quality and gaps, community dynamics, etc.). However, little effort has been placed on admin pages, when in fact, they are essential for the development of the end content that readers consume.

Admin pages live in the admin namespaces, which allow editors to work, classify, or set rules on how to create content. There exist 5 of these namespaces, and the admin pages relate to two of them: Wikipedia (ns4) and Help (ns12).

You can read more about the research on admin pages.


The forgotten knowledge gap

This is the first research project to quantify and analyze administrative pages. We believe that admin pages are the forgotten “knowledge gap”. Unlike content, editor, or reader gaps, we know little about the admin pages and how each Wikipedia contextualizes them, and we certainly know less about the gaps, the missing pages, or the outdated content.

  • How do we know if we have to improve our processes if we do not know how updated they are?
  • How do we know we are building the future of Wikipedia if newcomers never improve on our processes?

Data, Visualizations and Tools

Data, methodology and technical documentation

Dataset/Databases: https://wapa.wmcloud.org/databases/

On this page, you can read the methodology we employed to collect the types of admin pages. We also provide technical documentation that allows replicating the current state of the project.


Visualizations

In this initial iteration of Wikipedia Administrative Pages Analytics, we have visualizations that provide a general understanding and tools that show specific pages with clear calls to action.

These are the types of analysis we do:

  • Admin Pages Categorization analysis
  • Temporal analysis
  • List of “valuable” pages selection
  • Group of pages analysis / according to features (Gaps, Maintenance, etc.)
  • Cross-language analysis

The visualizations are not available yet.

Tools

VIDEO PRESENTATION: Wikimedia CEE Meeting 2022 talk | Session: Wikipedia Administrative Pages Analytics? An Analytical Project To Understand Underedited Pages. (slides and notes PDF | slides | video recording). In this 30-minute presentation, we explain the project and introduce the tools.

We have created 4 different screening tools. Each of them addresses a different purpose. These are the metrics used by the screening tools. Warning: They are not ready yet!

Admin Page Gaps

On this page, you can search for content gaps related to administrative pages in Wikipedia language editions. Gaps are pages that exist in one language edition but are missing in another, or that exist in both but are more complete in one of them.
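The "missing page" kind of gap can be sketched as a set difference between two language editions. The sketch below assumes that pages are keyed by a shared cross-language identifier (for instance a Wikidata item ID) and carry an `edits` count; the function name, the field names, the identifiers and the sample titles are all hypothetical and are not taken from the tool itself.

```python
def find_page_gaps(source_pages, target_pages, top=10):
    """Pages present in the source edition but missing from the target one,
    ordered from most to least edited.

    Both arguments map a cross-language identifier (assumed here to be
    something like a Wikidata item ID) to a page record with an `edits` count.
    """
    missing = [page for key, page in source_pages.items()
               if key not in target_pages]
    return sorted(missing, key=lambda p: p["edits"], reverse=True)[:top]


# Hypothetical records for two editions (identifiers and titles are made up).
source = {
    "item-a": {"title": "Wikipedia:Example policy", "edits": 900},
    "item-b": {"title": "Help:Example guide", "edits": 150},
    "item-c": {"title": "Wikipedia:Example essay", "edits": 1200},
}
target = {
    "item-b": {"title": "Ajuda:Guia d'exemple", "edits": 40},
}
gaps = find_page_gaps(source, target)
```

Ranking by edits is only one possible choice; views or any other relevance metric could be substituted in the `key` function.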

This dashboard helps you find (for example):

  • Which are the most edited admin pages (policies) in English Wikipedia that do not exist in Catalan Wikipedia?
  • Which are the most popular help pages in French Wikipedia that are more complete in German Wikipedia?

Incomplete Pages

On this page, you can search for in-page content gaps related to administrative pages in Wikipedia language editions: existing pages in one language that are more developed in other languages (size, outlinks, etc.).

This dashboard helps you find (for example):

  • Which are the most viewed policies in French Wikipedia that are underdeveloped, and in which languages are they more complete?

Underedited Pages

On this page, you can screen and find specific administrative pages in Wikipedia language editions based on aspects related to their development, inclusion, and participation level.

Use different metrics and ratios in order to find “red flags” that may indicate that a page is valuable but needs attention. This tool encourages page maintenance and the inclusion of different types of editors in editing admin pages.
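To make the idea of a “red flag” concrete, here is a small sketch of how such a screening rule could be expressed. The metrics, the flag names and especially the thresholds are arbitrary assumptions chosen for the example; the actual tool may use different metrics and ratios.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page metrics (field names are assumptions)."""
    title: str
    monthly_views: int
    days_since_last_edit: int
    edits_total: int
    edits_by_newcomers: int


def red_flags(page, min_views=1000, stale_days=365, min_newcomer_ratio=0.05):
    """Return the list of red flags raised by a page.

    The default thresholds are illustrative, not values used by the project.
    """
    flags = []
    # Valuable (popular) page that nobody has maintained for a long time.
    if page.monthly_views >= min_views and page.days_since_last_edit > stale_days:
        flags.append("popular_but_stale")
    # Page whose edits come almost exclusively from non-newcomers.
    if page.edits_total > 0 and \
            page.edits_by_newcomers / page.edits_total < min_newcomer_ratio:
        flags.append("few_newcomer_edits")
    return flags


stale = PageStats("Help:Example", 5000, 800, 200, 2)
fresh = PageStats("Help:Another example", 5000, 10, 100, 20)
stale_flags = red_flags(stale)
fresh_flags = red_flags(fresh)
```

Keeping each flag as a separate rule makes it easy to filter a list of pages by one concern (maintenance) or the other (inclusion).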

This dashboard helps you find (for example):

  • Which are the most popular help pages that have not been edited for a long time?
  • Which are the most edited policy pages that are not edited by newcomers?
  • Are Wikipedia Portals mostly outdated?

Recent Changes Monitor

On this page, you can retrieve the list of Recent Changes in a Wikipedia language edition according to the type of administrative page they belong to and some characteristics.

This dashboard helps you answer questions such as:

  • Which types of admin pages have been edited the most in the past hours?
  • Do editors edit the admin pages which are in most need of editing according to some metrics?
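As a rough sketch of where such data could come from, the MediaWiki Action API exposes recent changes through `list=recentchanges`, which can be restricted to namespaces 4 and 12. The helper names below are hypothetical, the User-Agent string is a placeholder, and the grouping is done by namespace only; assigning each change to a type of admin page (policy, essay, etc.) would need the project's own classification, which is not reproduced here.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ADMIN_NAMESPACES = {4: "Wikipedia", 12: "Help"}


def recent_changes_url(lang, limit=100):
    """Build an Action API URL listing recent changes in namespaces 4 and 12."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": "4|12",
        "rcprop": "title|timestamp|user",
        "rclimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def fetch_recent_changes(lang, limit=100):
    """Perform the request (requires network access) and return the changes."""
    request = Request(recent_changes_url(lang, limit),
                      headers={"User-Agent": "wapa-sketch/0.1 (example)"})
    with urlopen(request) as response:
        return json.load(response)["query"]["recentchanges"]


def count_by_namespace(changes):
    """Count changes per admin namespace name."""
    return Counter(ADMIN_NAMESPACES.get(c["ns"], "other") for c in changes)


# Offline example with made-up change records shaped like the API output.
sample = [
    {"ns": 4, "title": "Wikipedia:Example policy"},
    {"ns": 12, "title": "Help:Example guide"},
    {"ns": 4, "title": "Wikipedia:Example essay"},
]
counts = count_by_namespace(sample)
```

Calling `fetch_recent_changes("ca")` would query Catalan Wikipedia; the offline sample is used here so that the counting step can be tried without a network connection.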

Related Projects

Although there are many analytical efforts in the Movement, we believe this is the first one aimed at gaining a general understanding of administrative pages and disseminating it across Wikimedia communities.

Yet, there are some efforts in improving the general readability and design of Help pages.

Help Project / Community fellowship:

https://en.wikipedia.org/wiki/Wikipedia:Help_Project/Community_fellowship

The fellowship (concluded) is a community effort to identify issues with the current help pages on the English Wikipedia, coordinate community discussion, and even generate new content and/or designs for a number of key help pages and test them.

Wikipedia: Help Project (English Wikipedia):

https://en.wikipedia.org/wiki/Wikipedia:Help_Project

The Wikipedia: Help Project is an (active) place where editors can meet up, share opinions, and support each other with the mammoth task of helping clean up. Many of the help pages have few, if any, regular maintainers, so having a place to record ideas can ensure that good ideas endure.

Growth Team guideline on how to create help contents:

https://www.mediawiki.org/wiki/Help:Growth/How_to_create_help_contents

This (draft) guide is intended to help experienced users on Wikimedia wikis build good help pages for new users.

WikiProject Policy and Guidelines:

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Policy_and_Guidelines

WikiProject Policy and Guidelines is a (defunct) project to actively improve our policy and guideline pages. There are multiple policies listed and guidelines on what policies/guidelines should have.


Each of these projects addresses one type of administrative page. We see that most of them address help pages, followed by policies and guidelines. Maintenance is key for these types of pages.

Although many of these projects have been discontinued, we think that the Wikipedia Administrative Pages Analytics would be suitable to support them with its quantitative-based approach. We hope to be helpful to any future project working in this direction.


Future Objectives for this project and the Wikimedia Movement

Taking into account the insights from this project and the Wikipedian experience over the years, we can list some objectives that we find reasonable for a better functioning of Wikipedia.

  • Objective 1: Display metrics that measure the impact of Admin Pages on the experience and retention of newcomers, on a public dashboard.

Newcomers come across the pillars and main policies/help pages either proactively (by checking them out) or reactively (by having them exposed to their talk page by another editor after an edit). Past research showed that policies/help pages are often a source of frustration. However, without actual figures of their impact on newcomer retention, it may be hard to iterate on them.

Once this dashboard is deployed, Wikimedia communities may be able to see on a monthly basis the impact of outdated admin pages. Wikimedia chapters can get organized to work on them more actively and set stretch goals to work towards.

These measurements or metrics could take into account the number of edits of a newcomer (to understand retention) and the warnings they have received, if visiting data is not available to researchers. These dashboards could be deployed by the WMF analytics / product teams with support from this particular project (Wikipedia Admin Pages Analytics).

Other valuable measurements over time would be usability tests measuring the time to complete a specific task that requires understanding a policy / help page or simply asking the newcomer to read the policy in order to see its level of readability. Most viewed admin pages should be prioritized. Therefore, this dashboard would include behavioural metrics (impact on retention) as well as attitudinal metrics (usability and readability measurements).

Given that newcomers may not be considered “ready to edit” admin pages, they could give rapid feedback on them through a purpose-built tool that shows some questions in a short survey. This way, editors could see, for each admin page, feedback related to its effectiveness, clarity, etc. and act upon it.

Desired time horizon: creating this dashboard could be done within a year (2024) for a few small languages as a pilot, and then be expanded to larger ones (2025).

  • Objective 2: Display metrics that assess the state of inclusion and maintenance for editing Admin Pages on a public dashboard.

Inclusion

Analysis computed for this project taking into account the year-of-first edit of editors of admin pages showed that newer pages from the past years were still created by editors who started editing ten years ago. This means that there is a problem of inclusion - admin pages have an immense power over the rest of editors (especially newcomers), and yet, they are not created or edited by them.

We envision having a dashboard showing the diversity of participation in admin pages (from editing to creation), taking into account features such as tenure (year of first edit), but also aspects related to gender or geography - this may be relevant in languages spread across several countries. Having participation and inclusion metrics on a dashboard (similar to the one proposed in Objective 1) is the next logical step that derives from this project.

The metrics for inclusion should be computed on a monthly basis. In this project, as can be seen in the results (presentation/video), the measurements were taken using a yearly window, and a shorter time frame could possibly better stimulate inclusion in communities.

Maintenance

Likewise, this project analyzed the extent of admin pages edited in different time frames, showing that in languages like Arabic, Catalan, Croatian, German, Polish or Russian, the percentage of admin pages edited in the past 6 months is between 4 and 10%. Only in English does this grow to 40% of the admin pages. Usually, 15-50% of the admin pages have not been edited in the past 10 years.

Using these metrics to create a public interactive dashboard could also help editors understand what part of the admin pages may be outdated and decide to do a “clean-up” by either updating or removing admin pages. This project has explored this avenue, and once this knowledge is consolidated, it would be the time to create a mature product to serve the communities.

Desired time horizon: creating this dashboard could be done within a year (2024), given that there are no blockers and the first round of necessary research has already been conducted.

  • Objective 3: International contests are organized to exchange learnings on the most popular admin pages from each Wikipedia language edition

Currently, there exist a variety of events dedicated to exchanging content across language editions, from those focused on spreading a single topic across languages (Asian Month) to those which encourage the exchange between cultures (CEE Spring for Eastern and Central European languages).

In our analyses we detected the existence of numerous admin pages (policies, help pages, and other categories) which were constantly edited and visited by readers in one language edition, but did not exist in others. This implies there are opportunities for exchanging admin pages content across language editions.

With tools like Admin Page Gaps, it is possible to start identifying which pages can be more valuable to translate. Creating lists of valuable admin pages could be done manually or thanks to a few modifications to the tool. However, we see the discussions surrounding these pages (whether they are offline or online, synchronous or asynchronous) as very beneficial to the communities, given that they can save a lot of trouble either by proactively tackling issues or limiting them.

Desired time horizon: organizing the first contest / event related to admin pages could be done in 2024 after some discussions in current events (e.g., Wikimania).

  • Objective 4: Creation of lists of “essential admin pages” to be included in the Small languages Toolkit and enforced as necessary for a Wikipedia to be functional

It is shocking that policies like Neutral Point of View, so central to the Wikipedia way of regulating content, only exist in 121 Wikipedia language editions out of the 320 that exist today. This means that the majority of Wikipedia languages grow in content, but this content does not reflect the different points of view. Even though they are possibly smaller language editions, this implies, at least, that they do not serve their readers with quality content.

Without going deep into the analysis of which pages are more or less essential than Neutral Point of View, we can accept that it may be possible to create a list of essential admin pages that every language edition should have. This list would be the summary of learnings on how Wikipedia works best and it would include policies, help pages, essays, village pump pages, wikiprojects, tools, disclaimers and copyright.

The importance of having a list lies in how easily it can be used. This is the case for the “List of articles every Wikipedia should have”, which has proliferated in many languages. And even though every project is sovereign, they all use the MediaWiki technology and the Wikipedia brand. Therefore, having minimal requirements in terms of admin pages seems logical. The benefits of having these pages would be numerous, as the new editors of these language editions would be able to understand much more clearly how to behave in Wikipedia and what is expected from them.

Desired time horizon: creation of this list could already be done by a group of editors from various Wikipedia language editions (2023) using the tools made available by this project.

  • Objective 5: Wikimedia Chapters organize activities and programs to edit the most relevant admin pages

Editors will keep on creating and editing admin pages according to their needs and perceptions of the community. Dashboards showing problems of maintenance and inclusion can have a huge impact on their behaviour. Yet, editors may act on an individual basis, moved by many other calls and motivations, as Wikipedians normally do. For this reason, it is essential that Wikimedia organizations like chapters or user groups take an active role and organize events around the editing of admin pages.

Prioritizing lists of valuable admin pages is possible thanks to the tools provided by this project. However, effective action can only come from organizations, since they have the capacity to set times and goals for sufficiently large amounts of work much more effectively than individual editors. For this reason, we would find it valuable for Wikimedia organizations to include milestones and goals (in the end, dedicated programs) related to the editing of admin pages in their annual plans.

WMF Grants committee members could encourage the creation of such programs during the preparation of the next annual plans.

Desired time horizon: these programs could start being organized in the next round of annual plans in autumn (2023).

  • Objective 6: Focused activities for newcomers to go through the admin pages

New editor onboarding can be tedious and take time. The Growth team at the Wikimedia Foundation has been designing a set of core experiences for newcomers to discover Wikipedia. They receive new suggestions of edits in their Help and Home panel, among many tasks. We would encourage the use of the results from this project to design itineraries that include admin pages of all kinds: policies, help pages, essays, village pump pages, wikiprojects, tools, disclaimers and copyright.

The tasks can be expanded in time so that the newcomer is not overwhelmed and not presented with all the types of admin pages at once. However, it would be essential that they have a good understanding of how they all play a role in a Wikipedia language edition. This would ensure a proactive and positive experience with admin pages, rather than a reactive one as a consequence of a page or edit removal by another editor.

Desired time horizon: these tasks could be designed and introduced in the new panels in the next years (2023 - 2024).

How to Get Involved

The Admin Pages Analytics project needs dissemination in order to reach all the Wikimedia events and activities where it could provide some value. If you want to collaborate, get involved. Leave your username and e-mail us at tools.wcdo@tools.wmflabs.org. If you have any questions, you can also message marcmiquel or other team members.

Getting involved can be useful in order to find a meeting point or a place to start working on diversity. In case you want to code some extra visualizations, you can find the project's code here: github page.

There are different ways to collaborate with the project:

  • Do research on admin pages (especially in the areas suggested).
  • Add needs and questions you would like to have answered.
  • Expand the code and add some new analyses.
  • Encourage your organization to do more work in this direction.