WSoR datasets/policy counts

From Meta, a Wikimedia project coordination wiki
Jump to navigation Jump to search

This dataset contains edit counts of users who contributed to a selected set of relevant policy pages.

Location[edit]

db42:giovanni.policy_counts

Fields[edit]

giovanni@internproxy:~$ mysql -h db42 -e "EXPLAIN policy_counts; SELECT * FROM policy_counts LIMIT 3" giovanni
+-----------+-----------------+------+-----+---------+-------+
| Field     | Type            | Null | Key | Default | Extra |
+-----------+-----------------+------+-----+---------+-------+
| title     | varbinary(255)  | NO   |     |         |       |
| namespace | int(11)         | NO   |     | 0       |       |
| user_id   | int(5) unsigned | NO   | MUL | 0       |       |
| user_name | varbinary(255)  | NO   |     |         |       |
| year      | int(4)          | YES  |     | NULL    |       |
| editcount | bigint(21)      | NO   |     | 0       |       |
+-----------+-----------------+------+-----+---------+-------+
+---------------------------+-----------+---------+-----------+------+-----------+
| title                     | namespace | user_id | user_name | year | editcount |
+---------------------------+-----------+---------+-----------+------+-----------+
| Arbitration/Requests      |         4 |  126234 | !!        | 2007 |         2 |
| Arbitration/Requests      |         5 |  126234 | !!        | 2007 |         1 |
| Reference_desk/Humanities |         4 |  126234 | !!        | 2007 |        77 |
+---------------------------+-----------+---------+-----------+------+-----------+

Each row represents the edit count (field: editcount) of a user (fields: user_id, user_name) in a specific year (field... year!) for a specific page (fields title, namespace). The table is populated only with edit counts to pages from the following list of important policy pages (see here for more information; if you don't have access, you should ask Steven Walling to share it with you):

+----------------------------------------------------------+
| title                                                    |
+----------------------------------------------------------+
| Administrator_intervention_against_vandalism             |
| Administrators                                           |
| Arbitration/Policy                                       |
| Arbitration/Requests                                     |
| Arbitration/Requests/Enforcement                         |
| Arbitration_Committee/Noticeboard                        |
| Article_titles                                           |
| Autobiography                                            |
| Banning_policy                                           |
| Be_bold                                                  |
| Biographies_of_living_persons                            |
| Blocking_policy                                          |
| Bot_policy                                               |
| Categorization                                           |
| Categorization/Ethnicity,_gender,_religion_and_sexuality |
| Categorization_of_people                                 |
| Citing_sources                                           |
| Civility                                                 |
| Conflict_of_interest                                     |
| Consensus                                                |
| Content_forking                                          |
| Contributor_copyright_investigations                     |
| Copyright_problems                                       |
| Copyright_violations                                     |
| Criteria_for_speedy_deletion                             |
| Deletion_policy                                          |
| Disambiguation                                           |
| Dispute_resolution_noticeboard                           |
| Do_not_disrupt_Wikipedia_to_illustrate_a_point           |
| Edit_warring                                             |
| Editing_policy                                           |
| Editor_review                                            |
| Etiquette                                                |
| External_links                                           |
| External_links/Noticeboard                               |
| Fiction/Noticeboard                                      |
| Fringe_theories                                          |
| Fringe_theories/Noticeboard                              |
| Gaming_the_system                                        |
| Help_desk                                                |
| Ignore_all_rules                                         |
| Image_use_policy                                         |
| Manual_of_Style                                          |
| Manual_of_Style_(abbreviations)                          |
| Manual_of_Style_(accessibility)                          |
| Manual_of_Style_(article_message_boxes)                  |
| Manual_of_Style_(biographies)                            |
| Manual_of_Style_(capital_letters)                        |
| Manual_of_Style_(captions)                               |
| Manual_of_Style_(dates_and_numbers)                      |
| Manual_of_Style_(disambiguation_pages)                   |
| Manual_of_Style_(embedded_lists)                         |
| Manual_of_Style_(footnotes)                              |
| Manual_of_Style_(icons)                                  |
| Manual_of_Style_(infoboxes)                              |
| Manual_of_Style_(layout)                                 |
| Manual_of_Style_(lead_section)                           |
| Manual_of_Style_(linking)                                |
| Manual_of_Style_(lists)                                  |
| Manual_of_Style_(lists_of_works)                         |
| Manual_of_Style_(pronunciation)                          |
| Manual_of_Style_(proper_names)                           |
| Manual_of_Style_(self-references_to_avoid)               |
| Manual_of_Style_(spelling)                               |
| Manual_of_Style_(summary_style)                          |
| Manual_of_Style_(tables)                                 |
| Manual_of_Style_(text_formatting)                        |
| Manual_of_Style_(titles)                                 |
| Manual_of_Style_(trademarks)                             |
| Manual_of_Style_(trivia_sections)                        |
| Manual_of_Style_(words_to_watch)                         |
| Neutral_point_of_view                                    |
| Neutral_point_of_view/Noticeboard                        |
| No_disclaimers_in_articles                               |
| No_legal_threats                                         |
| No_original_research                                     |
| No_original_research/Noticeboard                         |
| No_personal_attacks                                      |
| Non-free_content_criteria                                |
| Non-free_content_review                                  |
| Notability                                               |
| Notability_(academics)                                   |
| Notability_(books)                                       |
| Notability_(events)                                      |
| Notability_(films)                                       |
| Notability_(music)                                       |
| Notability_(numbers)                                     |
| Notability_(organizations_and_companies)                 |
| Notability_(people)                                      |
| Notability_(sports)                                      |
| Notability_(web)                                         |
| Offensive_material                                       |
| Ownership_of_articles                                    |
| Patent_nonsense                                          |
| Plagiarism                                               |
| Please_do_not_bite_the_newcomers                         |
| Proposed_deletion                                        |
| Reference_desk/Computing                                 |
| Reference_desk/Entertainment                             |
| Reference_desk/Humanities                                |
| Reference_desk/Language                                  |
| Reference_desk/Mathematics                               |
| Reference_desk/Miscellaneous                             |
| Reference_desk/Science                                   |
| Reliable_sources/Noticeboard                             |
| Requests_for_mediation                                   |
| Sexual_content                                           |
| Sock_puppetry                                            |
| Spam                                                     |
| Vandalism                                                |
| Verifiability                                            |
| What_Wikipedia_is_not                                    |
| Wikipedia_is_not_a_dictionary                            |
| Wikipedia_is_not_for_things_made_up_one_day              |
| Wikiquette_assistance                                    |
+----------------------------------------------------------+

Reproduction[edit]

Run this SELECT inside a CREATE TABLE:

select 
  page_title as title,
  page_namespace as namespace,
  rev_user as user_id,
  rev_user_text as user_name,
sum(rev_user) as editcount
from page p join revision r
on p.page_id = r.rev_page
where page_title in ("Administrator_intervention_against_vandalism", "Administrators") --  page titles here
group by rev_user, page_id, year(rev_timestamp)

The query will create a temporary table and sort data on disk, so it will take some time, depending on how many pages you want to populate the table with.