Wikipedia Administrative Pages Analytics/Policies & Guidelines Methodology

From Meta, a Wikimedia project coordination wiki

To collect the policy and guideline pages of the selected Wikipedia language editions, we used two complementary sources: Wikidata properties and Wikipedia categories.

For the first approach, we found the Wikidata property “instance of” (P31) with the value “Wikimedia project policies and guidelines” (Q4656150) useful for identifying the Qitems of Wikipedia pages that are policies or guidelines. We downloaded the Wikidata dump (from dumps.wikimedia.org), parsed it, and stored every Qitem carrying this property and value. We then downloaded the page dumps of the selected Wikipedias and matched their pages against the retrieved Qitems, which gave us a selection of “Wikipedia policies and guidelines” for each Wikipedia.
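The Wikidata filtering and title matching described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes the standard JSON dump layout (one entity per line inside a single JSON array) and matches pages through each item's sitelinks. The function names and the `page_titles` input are hypothetical: `page_titles` maps a wiki database name such as `enwiki` to the set of titles read from that wiki's page dump, assumed to be already normalized to the sitelink form (namespace prefix included, spaces instead of underscores).

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set

POLICY_QID = "Q4656150"  # "Wikimedia project policies and guidelines"
INSTANCE_OF = "P31"      # Wikidata property "instance of"


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entities from a Wikidata JSON dump, which stores one entity
    per line inside a single JSON array ("[" first, "]" last)."""
    for line in lines:
        line = line.strip().rstrip(",")  # drop the array separator
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)


def is_policy_item(entity: dict) -> bool:
    """True if the entity has an "instance of" statement pointing to Q4656150."""
    for claim in entity.get("claims", {}).get(INSTANCE_OF, []):
        # "novalue"/"somevalue" snaks have no datavalue; the defaults skip them.
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == POLICY_QID:
            return True
    return False


def match_policies(
    lines: Iterable[str], page_titles: Dict[str, Set[str]]
) -> Dict[str, Dict[str, str]]:
    """Map each wiki (e.g. "enwiki") to {page title: Qitem} for policy items
    whose sitelink title also appears in that wiki's page dump.

    Assumption: titles in `page_titles` are already in sitelink form.
    """
    result: Dict[str, Dict[str, str]] = defaultdict(dict)
    for entity in iter_entities(lines):
        if not is_policy_item(entity):
            continue
        for site, link in entity.get("sitelinks", {}).items():
            title = link.get("title")
            if site in page_titles and title in page_titles[site]:
                result[site][title] = entity["id"]
    return dict(result)
```

On the full dump, the lines can be streamed straight from the compressed file, for example with `gzip.open("latest-all.json.gz", "rt", encoding="utf-8")` passed to `match_policies`; the file name is only an example.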

We must acknowledge that not all policy and guideline pages have a Qitem in Wikidata, since several of them respond to local interests and idiosyncrasies of a single language edition.

For the second approach, we used the category graph of each Wikipedia to collect pages. First, we identified the main category for policies and guidelines (the English Wikipedia category “Wikipedia policies and guidelines” has a counterpart in 85 Wikipedia language editions). Then, starting from this category, we iteratively descended the category graph up to five levels, collecting all the subcategories and the pages they contain. This allowed us to reach more specific pages that are still related to the main policies and guidelines category.
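A depth-limited breadth-first descent of the kind described above might look like the sketch below. It assumes the category graph has already been loaded into two plain dictionaries (hypothetical inputs: `subcats` maps a category to its subcategories and `pages` maps a category to its member pages, for instance built from a categorylinks dump), and it counts the five levels below the root category, which is one possible reading of "up to five levels". `collect_from_category_tree` is an illustrative name, not the authors' code.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def collect_from_category_tree(
    root: str,
    subcats: Dict[str, List[str]],
    pages: Dict[str, List[str]],
    max_depth: int = 5,
) -> Tuple[Set[str], Set[str]]:
    """Breadth-first descent from `root`, at most `max_depth` levels below it.

    Returns (categories visited, pages found).
    """
    seen_cats = {root}
    found_pages: Set[str] = set()
    queue = deque([(root, 0)])  # (category, depth below root)
    while queue:
        cat, depth = queue.popleft()
        found_pages.update(pages.get(cat, []))
        if depth == max_depth:
            continue  # collect this category's pages but do not expand further
        for child in subcats.get(cat, []):
            # The visited set stops loops, since category graphs can have cycles.
            if child not in seen_cats:
                seen_cats.add(child)
                queue.append((child, depth + 1))
    return seen_cats, found_pages
```

Breadth-first order guarantees that each category is first reached at its shallowest depth, so the depth limit is applied consistently even when a category is reachable through several paths.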

We must also acknowledge that the category graph is sometimes imprecise: some policies may be categorized outside the categories that hang from the main Wikipedia policies and guidelines category. Consequently, the selection obtained with the two approaches may not include every policy and guideline, although it may still encompass the most relevant ones.

Despite these limitations, the two approaches allowed us to collect a set of Wikipedia policy and guideline pages.
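Assuming the final selection is the union of the two approaches, recording which approach found each page makes the overlap, and the pages each method misses, easy to inspect. The helper below is a hypothetical sketch of that bookkeeping, not part of the original methodology.

```python
from typing import Dict, Set


def combine_selections(
    wikidata_pages: Set[str], category_pages: Set[str]
) -> Dict[str, Set[str]]:
    """Union of both selections, mapping each page to the approaches that found it."""
    combined: Dict[str, Set[str]] = {}
    for title in wikidata_pages:
        combined.setdefault(title, set()).add("wikidata")
    for title in category_pages:
        combined.setdefault(title, set()).add("categories")
    return combined
```

Pages tagged with only one source point to the gaps acknowledged above: items without a Qitem, or policies categorized outside the main category tree.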