Research:Top edited Portuguese Wikipedia articles in 2016
Based on similar research on Wikipedia in other languages, this project aims to present a list of the most edited pages on Portuguese Wikipedia.
Currently, all Wikipedia projects run on the MediaWiki software, maintained by the Wikimedia Foundation. Every time an edit is made, a new record is added to the revision table of the database, describing that edit (when it was made, on which page, by whom, what changed, etc.). Knowing this, we measure the number of edits by counting the records in the revision table for each page during 2016. Doing so requires access to the Portuguese Wikipedia database. A full backup of the database can be downloaded from Wikimedia Downloads, or queries can be run online through Quarry.
This approach is similar to Top edited English Wikipedia articles in 2016, but since the query is meant to be run only once, performance was not a concern here. The result of the query therefore directly delivers the main objective of this project.
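The counting idea described above can be tried out on a toy database before touching the real one. The following is a minimal sketch, not the procedure used in this project: it assumes an in-memory SQLite database with heavily simplified `page` and `revision` tables (only the columns needed here), invented sample rows, and hypothetical helper names `build_toy_db` and `count_edits_2016`. It relies on MediaWiki storing timestamps as 14-character strings of the form YYYYMMDDHHMMSS, so a prefix match on the year selects the edits made in 2016.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with simplified page/revision tables.

    The schema and rows are illustrative assumptions, not the real
    MediaWiki schema (which has many more columns).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_title TEXT,
            page_namespace INTEGER
        );
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER,
            rev_timestamp TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, "Brasil", 0),    # article (main namespace)
            (2, "Portugal", 0),  # article (main namespace)
            (3, "Exemplo", 2),   # a page outside the main namespace
        ],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [
            (10, 1, "20160105120000"),
            (11, 1, "20160720083000"),
            (12, 1, "20170101000000"),  # outside 2016, must not be counted
            (13, 2, "20161231235959"),
            (14, 3, "20160301000000"),  # not an article, must not be counted
        ],
    )
    return conn


def count_edits_2016(conn):
    """Return (page_title, total_edits) for main-namespace pages edited in 2016."""
    sql = """
        SELECT page.page_title, COUNT(*) AS total_edits
        FROM revision
        INNER JOIN page ON revision.rev_page = page.page_id
        WHERE revision.rev_timestamp LIKE '2016%'
          AND page.page_namespace = 0
        GROUP BY page.page_title
        ORDER BY total_edits DESC, page.page_title ASC
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for title, edits in count_edits_2016(build_toy_db()):
        print(title, edits)
```

Each row in `revision` stands for one edit, so grouping by page title and counting rows gives the number of edits per article.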
Query executed in the database (SQL language)
use ptwiki_p;
set @total_edit_rank = 0;
set @prev_edit_count = 0;
SELECT t.rank
     , REPLACE(t.page_title, '_', ' ') page_title
     , t.total_edits
FROM (
    SELECT t.*
         , @total_edit_rank := @total_edit_rank + IF(@prev_edit_count=t.total_edits, 0, 1) AS rank
         , @prev_edit_count := t.total_edits
    FROM (
        SELECT page_title
             , COUNT(*) AS total_edits
        FROM revision
        INNER JOIN page ON (revision.rev_page = page.page_id)
        WHERE revision.rev_timestamp LIKE '2016%'
          and page.page_namespace = 0
        GROUP BY page.page_title
    ) t
    ORDER BY t.total_edits DESC
           , t.page_title ASC
) t
WHERE (t.rank <= 100)
;
In this approach, we list the 100 most edited pages in such a way that multiple pages share the same rank position if they have the same number of edits (the number of listed pages may therefore vary, but it is guaranteed to be equal to or greater than 100). We also restrict the results to the main namespace, that is, to article pages only.[note 1]
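The tie-handling rule described above is what is usually called a dense ranking: pages with equal edit counts share a position, and the next distinct count takes the next position. The sketch below illustrates that rule in Python under stated assumptions; the function name `dense_rank` and the sample rows are hypothetical and are not part of the query that was actually run.

```python
def dense_rank(rows, limit=100):
    """Rank (page_title, total_edits) rows so that ties share a position.

    rows  -- iterable of (page_title, total_edits) tuples
    limit -- highest rank position to keep (the query uses 100)

    Returns a list of (rank, title, total_edits), with underscores in
    titles replaced by spaces, mirroring REPLACE() in the SQL query.
    """
    # Same ordering as the query: most edits first, then title ascending.
    ordered = sorted(rows, key=lambda row: (-row[1], row[0]))

    ranked = []
    rank = 0
    previous_edits = None
    for title, edits in ordered:
        # The rank advances only when the edit count changes.
        if edits != previous_edits:
            rank += 1
            previous_edits = edits
        if rank > limit:
            break
        ranked.append((rank, title.replace("_", " "), edits))
    return ranked


if __name__ == "__main__":
    sample = [("A", 5), ("B_C", 5), ("D", 3), ("E", 1)]  # invented data
    for row in dense_rank(sample, limit=2):
        print(row)
```

With `limit=2`, the sample yields three rows rather than two, because the two pages with five edits share position 1. This is why the number of listed pages can exceed the cutoff.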
Policy, Ethics and Human Subjects Research
Although editors' information is publicly available, their usernames and IP addresses are not the subject of this research project, so editors cannot be identified from the results published here. Usernames and IP addresses are stored in the queried table of the database, but they are not disclosed for any purpose at any stage of the process (not even to people involved in this project or to people aiming to reproduce the results presented here).
The table below presents the results extracted from the database. Nothing was changed apart from formatting it as a wikitable with wikilinks (big thanks to TablesGenerator.com).
Technical info and other considerations
Run times and server info
The query presented here was executed on Wikimedia Tool Labs servers on Feb 17, 2017, at 10:00 PM (UTC). Its run times were:
real 1m26.288s
user 0m0.003s
sys 0m0.015s
Other relevant information about client and server instances:
$ uname -a
Linux tools-bastion-03 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ mysql --version
mysql Ver 15.1 Distrib 5.5.54-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
$ ./sql enwiki 'SHOW VARIABLES LIKE "%version";'
+------------------+-----------------+
| Variable_name    | Value           |
+------------------+-----------------+
| innodb_version   | 5.6.21-70.0     |
| protocol_version | 10              |
| tokudb_version   | tokudb-7.5.3    |
| version          | 10.0.15-MariaDB |
+------------------+-----------------+
- Portuguese Wikipedia has several namespaces for various purposes (user pages, user talk pages, discussions, help pages, etc.). Articles exist only in the main namespace, and the main namespace contains nothing but articles.