Wikipedia Administrative Pages Analytics/Methodology

From Meta, a Wikimedia project coordination wiki

This page explains the methodology used to select the admin pages and generate the database.

Wikipedia administration

https://en.wikipedia.org/wiki/Wikipedia:Administration

“Wikipedia requires a certain amount of administration and governance in order to further the project's goals. To achieve Wikipedia's purpose, a wide range of administrative pages are made available in various namespaces which enumerate the various protocols and conventions created and implemented by community consensus.”

Admin pages are essential to every language edition.

Namespaces “Wikipedia” and “Help”

Admin pages are in the namespaces “Wikipedia” and “Help”.

For example, the Wikipedia namespace (namespace 4) is defined in the following way:

“The project namespace or Wikipedia namespace is a namespace consisting of administration pages with information or discussion about Wikipedia. Pages in this namespace will always have the prefix Wikipedia.”

And, the Help namespace (namespace 12) is defined as:

“The Help namespace is a namespace consisting of "how-to" and information pages whose titles begin with the prefix Help. These pages contain information intended to help use Wikipedia or its software. Some of these pages are intended for readers of the encyclopedia; others are intended for editors, whether beginning or advanced. There is a large amount of overlap between the Help namespace and the Project namespace (pages with the Wikipedia: prefix).”

There are other namespaces with an administrative purpose (in this case, to manage and organize content), such as Category (namespace 14), Portal (namespace 100), and Template (namespace 10).

Other namespaces, such as User (namespace 2), File (namespace 6), MediaWiki (namespace 8), Draft (namespace 118), and all the namespaces related to page discussions, seem to have been created for purposes other than the collective management and administration of Wikipedia.

We therefore assume that all administrative pages are in the namespaces “Wikipedia” and “Help”.
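The namespace assumption can be written down as a simple filter. The following is a minimal sketch: the helper `is_admin_page` and the sample pages are hypothetical, and only the namespace numbers come from the text above.

```python
# Namespace numbers as quoted in this section.
ADMIN_NAMESPACES = {4: "Wikipedia", 12: "Help"}


def is_admin_page(namespace_id: int) -> bool:
    """Return True when the namespace is one we treat as administrative."""
    return namespace_id in ADMIN_NAMESPACES


# Hypothetical sample of (title, namespace id) pairs.
pages = [
    ("Wikipedia:Copyrights", 4),
    ("Help:Contents", 12),
    ("Category:Wikipedia administration", 14),
    ("User:Example", 2),
]
admin_pages = [title for title, ns in pages if is_admin_page(ns)]
print(admin_pages)
```

Under this assumption, Category, Portal, and Template pages are left out even though they serve an organizational purpose.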

Category “Wikipedia administration” and Types of Admin Pages

In English Wikipedia, administrative pages are usually located within the category “Wikipedia administration”, but not all language editions have one: only 150 Wikipedia language editions have an equivalent category. For example, in Luxembourgish Wikipedia, the administrative pages are contained in the “Wikipedia” category.

A quick look at the “Wikipedia administration” category shows its main subcategories, which represent general administration purposes. We want not only to collect all the admin pages, which is possible simply by selecting all the pages in namespaces 4 and 12 (Wikipedia and Help), but also to have a basic cartography of their types: we want to know which topics and purposes admin pages cover.

We want to design a process to find the types of admin pages and to qualify admin pages in every Wikipedia language edition.

We want the process to be:

  • Language-agnostic. It should work for every current and future Wikipedia language edition, and it should run automatically as an algorithm.
  • Comprehensive. It should capture all, or at least the most relevant, admin pages in every language edition.
  • Contextual. It should categorize the admin pages that are unique to each Wikipedia language edition.

At the same time, we expect the final selection of types of admin pages to be:

  • Manageable and memorable. There should be few enough types to remember them easily.
  • Universal. They should be as equivalent as possible between languages to compare the number of pages.
  • Intentional. They should point at valuable purposes or themes for Wikipedians.

Categories in “Wikipedia Administration” Category

The main categories contained in the English Wikipedia “Wikipedia administration” category allow us to understand the main topics and purposes, but not all of them. Within the category graph hanging from it, we can find numerous subtopics that in other languages may have more relevance. Nonetheless, we want the most important ones as they will represent the types of admin pages.

To obtain the admin types for all the Wikipedia language editions we need a starting point: the English Wikipedia category “Wikipedia administration”. Hwang and Shaw (2022) found similar rule-making activity across the five communities they studied, which replicates and extends prior work on English-language Wikipedia alone. Hence, we assume that the types other than policies and guidelines will mostly follow a similar pattern across languages.

Selected top admin categories in English Wikipedia:

  1. Policies and Guidelines
  2. Help
  3. Essays
  4. Village pump
  5. Wikiprojects
  6. Tools
  7. Disclaimers
  8. Copyright
  9. Deletion
  10. Maintenance

These categories were chosen for their relevance, and also for their cross-language availability, which we estimated from their number of interwiki links.

Note that these categories can overlap. What is in fact a policy is sometimes seen as a help page (though possibly not the other way around). Essays are reflections that can complement guidelines. The Village pump is a space for discussing any topic. Wikiprojects are spaces for coordinating content creation or organizing events. Tools are documentation. Disclaimers and copyright are policies, but at a different level.

Deletion and maintenance differ from the rest. They do not present a topic or activity to regulate or encourage; instead, they organize all the pages (content or not) that should be either updated or deleted.


Wikidata Properties for Admin Pages

Having selected these ten categories in English Wikipedia, we want to know how to find the pages that relate to them in each Wikipedia language edition.

Wikidata serves as a backbone for all Wikipedia language editions. Therefore, every Qitem that represents an admin page can easily be mapped to the language editions in which that page exists.
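As an illustration of this mapping, the sketch below reads the sitelinks of a single Qitem and keeps only those that point to a Wikipedia language edition. It assumes the entity is a dictionary shaped like one entry returned by the Wikidata `wbgetentities` API; the helper name, the exclusion list, and the sample data are hypothetical and not part of the original methodology.

```python
# Sitelink keys that end in "wiki" but are not Wikipedia language editions.
# This exclusion list is an assumption and is not exhaustive.
NON_WIKIPEDIA_SITES = {
    "commonswiki", "wikidatawiki", "specieswiki", "metawiki", "mediawikiwiki",
}


def wikipedia_sitelinks(entity: dict) -> dict:
    """Map each Wikipedia site key (e.g. "enwiki") to the linked page title."""
    return {
        site: link["title"]
        for site, link in entity.get("sitelinks", {}).items()
        if site.endswith("wiki") and site not in NON_WIKIPEDIA_SITES
    }


# Hand-written sample; the id and titles are placeholders, not real data.
sample_entity = {
    "id": "Q_EXAMPLE",
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Example title (en)"},
        "cawiki": {"site": "cawiki", "title": "Example title (ca)"},
        "commonswiki": {"site": "commonswiki", "title": "Example title (commons)"},
        "enwikibooks": {"site": "enwikibooks", "title": "Example title (books)"},
    },
}
print(wikipedia_sitelinks(sample_entity))
```

Each key that remains identifies a language edition in which the admin page exists, together with its local title.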

To understand how Wikidata could support the page mapping, we looked at the Qitems corresponding to the ten categories and found that they all use the property “instance of”. This property was associated with the following Qitems: “Wikimedia project page” (Q14204246), “Wikimedia internal item” (Q17442446), “Wikimedia project policies and guidelines” (Q4656150), “Wikimedia help page” (Q56005592), “Wikiproject” (Q16695773).

  • “WikiProject Stub sorting”: instance of WikiProject.
  • “Wikipedia:Stub”: instance of Wikimedia project page; Wikimedia project policies and guidelines.
  • “Wikipedia:Articles for deletion”: instance of Wikimedia project page; Wikimedia project policies and guidelines.
  • “Wikipedia:Proposed deletion”: instance of Wikimedia project page.
  • “Project:Village pump”: instance of Wikimedia project page.
  • “Wikipedia:Community portal”: instance of Wikimedia project page.
  • “Wikipedia:Copyrights”: instance of Wikimedia project page.
  • “Wikipedia:Maintenance”: instance of Wikimedia project page.
  • “Wikimedia category”: instance of Wikimedia project page; Wikimedia project policies and guidelines; Wikimedia help page.
  • “Project:Autobiography”: instance of Wikimedia project page; Wikimedia project policies and guidelines.
  • “Help:Contents”: instance of Wikimedia project page; Wikidata internal entity; Wikimedia help page.
  • “Wikipedia:Unified login”: instance of Wikimedia help page.
  • “Wikipedia:Merging”: instance of Wikimedia project page; Wikimedia project policies and guidelines.

However, the related categories are then:

  • “Wikipedia information pages”, which hangs from “Wikipedia help”.
  • “Wikipedia merging”, which hangs from “Maintenance”.
  • “WikiProject Merge”, which hangs from the previous one, “Wikipedia merging”.

  • “Wikipedia:Maintenance”: instance of Wikimedia project page; subclass of Wikimedia administration category.
  • “Wikimedia project policies and guidelines”: instance of Wikimedia project page; Wikimedia internal item.
  • “Wikipedia:List of policies and guidelines”: instance of Wikimedia help page; Wikimedia list article; Wikimedia project page.

To get a sense of Wikidata’s level of annotation, we ran a query to obtain the number of Qitems of each type. The following query retrieves all the “Wikimedia project policies and guidelines” Qitems.

SELECT ?item ?itemLabel
WHERE
{
  # P31 is "instance of"; Q4656150 is "Wikimedia project policies and guidelines"
  ?item wdt:P31 wd:Q4656150.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

We found the following results: 31,520 Wikimedia project page, 74 Wikimedia internal item, 6,529 Wikimedia project policies and guidelines, 663 Wikimedia help page, and 3,623 Wikiproject.
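The same query can be turned into a count for each of the five types. The sketch below only builds the query strings, which could then be submitted to the Wikidata Query Service; using `COUNT` is an assumption about how such totals might be obtained, the helper `build_count_query` is hypothetical, and current results will differ from the figures reported here.

```python
# The five "instance of" values named in this section.
ADMIN_QITEMS = {
    "Wikimedia project page": "Q14204246",
    "Wikimedia internal item": "Q17442446",
    "Wikimedia project policies and guidelines": "Q4656150",
    "Wikimedia help page": "Q56005592",
    "Wikiproject": "Q16695773",
}


def build_count_query(qid: str) -> str:
    """Build a SPARQL query counting the items that are an instance of `qid`."""
    return (
        "SELECT (COUNT(?item) AS ?count)\n"
        "WHERE\n"
        "{\n"
        f"  ?item wdt:P31 wd:{qid}.\n"
        "}"
    )


for label, qid in ADMIN_QITEMS.items():
    print(f"# {label}")
    print(build_count_query(qid))
```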

Category crawling in each Wikipedia language edition

While we use Wikidata to identify and map Policies and guidelines, Help pages, and Wikiprojects to each Wikipedia language edition, we identified 7 more types of admin pages: Essays, Village pump, Tools, Disclaimers, Copyright, Deletion, and Maintenance.

We use the category graph of each Wikipedia language edition, starting from the selected categories, to retrieve pages. First, we take the main category (e.g., for policies and guidelines, the equivalent of the English Wikipedia category “Wikipedia policies and guidelines”, which exists in 85 Wikipedia language editions). Then, from this category, we go down the category graph up to five levels, iteratively collecting all the categories and the pages within them. This yields increasingly specific pages that are still related to the main category. The further a category is from the top level, the more likely it is to be wrongly placed, and therefore the more likely its pages do not entirely relate to the main topic. For this reason, after some tests, we decided to stop at level 5; the cut-off is arbitrary. Similarly, to avoid loops, we keep track of the categories that have already been visited, given that one category can appear at several levels below the main category.
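The crawl described above can be sketched as a breadth-first traversal with a depth limit and a set of visited categories. This is a minimal illustration and not the project's actual code: the function name and the two input dictionaries (direct subcategories and direct pages per category) are hypothetical stand-ins for whatever data source provides the category graph.

```python
from collections import deque

MAX_DEPTH = 5  # the cut-off chosen in the text


def crawl_categories(top, subcategories, pages_in, max_depth=MAX_DEPTH):
    """Collect categories and pages reachable from `top` within `max_depth` levels.

    `subcategories` maps a category to its direct subcategories and
    `pages_in` maps a category to its direct pages (hypothetical inputs).
    """
    visited = {top}            # categories already queued, to avoid loops
    pages = set()
    queue = deque([(top, 0)])  # (category, level below the top category)
    while queue:
        category, level = queue.popleft()
        pages.update(pages_in.get(category, []))
        if level == max_depth:
            continue  # do not descend any further
        for child in subcategories.get(category, []):
            if child not in visited:
                visited.add(child)
                queue.append((child, level + 1))
    return visited, pages


# Toy graph with a loop: "Guidelines" links back to "Policies".
subcategories = {"Policies": ["Guidelines", "Deletion policy"], "Guidelines": ["Policies"]}
pages_in = {"Policies": ["WP:A"], "Guidelines": ["WP:B"], "Deletion policy": ["WP:C", "WP:A"]}
categories, pages = crawl_categories("Policies", subcategories, pages_in)
print(sorted(categories), sorted(pages))
```

Breadth-first order means a category that appears at several levels is processed at its shallowest one, so the depth limit is applied to the shortest path from the top category.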

Looking for missing categories and category crawling

Even though the ten categories were selected with a criterion of cross-language availability, some of them do not exist in some language editions. In fact, many language editions may have very few admin pages, because they do not have the need to structure them using categories.

Other language editions name their categories differently, and these do not fully correspond to the selected ones. For this reason, we devised an approach that first finds the equivalent (or most similar) category in that language edition and then runs the category crawling from it.

For example, the category “Wikipedia help” (as described in Wikidata: wiki category common for all wiki projects; the top-level category for help pages) exists only in 139 Wikipedia language editions. It does not exist in Swahili Wikipedia. What should we do in this case?

We take all the categories that were collected in English Wikipedia during the category crawling and check which of them also exist in Swahili Wikipedia. For each of these, we count the pages below it in the category graph, descending until exhaustion, and we take the largest category, i.e., the one encompassing the most pages in the levels below it.

Then, we check that it directly contains at least some pages that were also categorized as help pages in English Wikipedia. If so, we validate this category as the equivalent top category for “Wikipedia help” in Swahili Wikipedia, use it to start the category crawling, and run the crawl down until exhaustion.

It is possible that some English Wikipedia categories that are part of “Wikipedia help” and exist in Swahili Wikipedia were not collected by this category crawling. In this case, we run a separate category crawling from each of them. We never repeat paths between the different crawls.
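The fallback procedure can be sketched as follows. The function names, the in-memory dictionaries, and the toy category names are hypothetical. Returning `None` when the largest candidate fails validation is also an assumption, since the text does not say what happens in that case.

```python
def reachable(top, subcategories, skip=frozenset()):
    """Categories reachable from `top` (inclusive), never entering `skip`."""
    found, stack = set(), [top]
    while stack:
        category = stack.pop()
        if category in found or category in skip:
            continue
        found.add(category)
        stack.extend(subcategories.get(category, []))
    return found


def count_pages(top, subcategories, pages_in):
    """Number of distinct pages in `top` and in every category below it."""
    pages = set()
    for category in reachable(top, subcategories):
        pages.update(pages_in.get(category, []))
    return len(pages)


def pick_equivalent_top(candidates, subcategories, pages_in, english_equivalents):
    """Pick the candidate covering the most pages, then validate it.

    `candidates` (non-empty) are local categories linked to an English
    category collected during the crawl; `english_equivalents` are local
    pages whose English counterpart was classified under the same type.
    """
    best = max(candidates, key=lambda c: count_pages(c, subcategories, pages_in))
    direct = set(pages_in.get(best, []))
    return best if direct & set(english_equivalents) else None


def crawl_all(candidates, top, subcategories):
    """Crawl from `top`, then from each uncovered candidate, without repeats."""
    covered = reachable(top, subcategories)
    for candidate in candidates:
        covered |= reachable(candidate, subcategories, skip=covered)
    return covered


# Toy category graph for the target language edition (placeholder names).
subcategories = {"A": ["B", "C"], "B": [], "C": ["A"], "D": ["B"]}
pages_in = {"A": ["p1"], "B": ["p2", "p3"], "C": ["p4"], "D": ["p5"]}
candidates = ["B", "A", "D"]
top = pick_equivalent_top(candidates, subcategories, pages_in, {"p1", "p2"})
print(top, sorted(crawl_all(candidates, top, subcategories)))
```

Passing the already covered categories as `skip` is one way to make sure that the separate crawls never repeat a path.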

Summary of Steps

To obtain the types of admin pages and identify the administrative pages in every Wikipedia language edition, we perform the following steps.

  1. Identify the main namespaces associated with administrative pages: Wikipedia (4) and Help (12).
  2. Identify the main categories within the category “Wikipedia administration” in English Wikipedia and select the ones that represent specific purposes, contain more pages, and have more interwiki links, i.e., are shared across more languages.
    1. These are: Policies and Guidelines, Help, Essays, Village pump, Wikiprojects, Tools, Disclaimers, Copyright, Deletion, and Maintenance.
  3. Find the property (“instance of”) and the values (Qitems) associated with administrative pages: “Wikimedia project page” (Q14204246), “Wikimedia internal item” (Q17442446), “Wikimedia project policies and guidelines” (Q4656150), “Wikimedia help page” (Q56005592), “Wikiproject” (Q16695773).
  4. Crawl the category graph:
    1. In the languages where the categories from (2.1) exist, we map pages to them by crawling the category graph for five levels. We collect the pages and categories associated with each top category.
    2. In the languages where the categories from (2.1) do not exist, we take the categories collected under the corresponding category in English Wikipedia and use the interwiki links to see which of them exist in the target language. Then, we use these categories to crawl the category graph of that language, starting with the largest one.
  5. Compute statistics, visualize groups of pages, and retrieve specific pages for each type of admin page, based on (2) categories and (3) Wikidata property-value pairs.