Wikilytics Plugins

From Meta, a Wikimedia project coordination wiki

Wikilytics consists of two parts:

  • The data chain that downloads, extracts, and stores a Wikipedia dump file into a database.
  • The dataset functionality that runs a query against the database built by the data chain.

This page documents the inner workings of the dataset functionality, and in particular how to use plugins and how to write your own.

Running a plugin

The generic way to start a plugin is as follows:

python dataset -c name_of_plugin -k keyword1=value1,keyword2=value2
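To illustrate the shape of the -k argument, here is a minimal sketch of how a string such as keyword1=value1,keyword2=value2 could be turned into a dictionary. The function name parse_keywords is hypothetical and this is not taken from the Wikilytics source; it only assumes that commas separate pairs and that the first = separates a keyword from its value.

```python
def parse_keywords(raw):
    """Turn 'keyword1=value1,keyword2=value2' into a dict of strings.

    Hypothetical helper for illustration; not the Wikilytics parser.
    """
    keywords = {}
    if not raw:
        return keywords
    for pair in raw.split(","):
        # Split on the first '=' only, so a value may itself contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError("expected keyword=value, got %r" % pair)
        keywords[key.strip()] = value.strip()
    return keywords


print(parse_keywords("time_unit=month,cutoff=5"))
# {'time_unit': 'month', 'cutoff': '5'}
```

Under this assumption a value cannot itself contain a comma, and all values arrive as strings, so a plugin would have to convert numeric options such as a cutoff itself.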

You can specify the granularity of the observations. The default is to aggregate observations by year, for example the number of new editors in a given year. But you can also do this:

python dataset -c new_editor_count -k time_unit=month

or even

python dataset -c new_editor_count -k time_unit=day


In the first case, the number of new editors is broken down by month; in the second case, it is broken down by day. This option applies to all Wikilytics plugins.
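The effect of time_unit can be pictured as choosing the key under which each observation is counted. The sketch below is an assumption about how such bucketing could work, not the Wikilytics implementation; count_by_time_unit, FORMATS, and the sample dates are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# strftime patterns for the three granularities named in the text.
FORMATS = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d"}


def count_by_time_unit(timestamps, time_unit="year"):
    """Count observations per year, month, or day (hypothetical helper)."""
    try:
        fmt = FORMATS[time_unit]
    except KeyError:
        raise ValueError("time_unit must be one of %s" % sorted(FORMATS))
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Made-up dates on which three editors became new editors.
observations = [
    datetime(2009, 1, 5),
    datetime(2009, 1, 20),
    datetime(2009, 3, 2),
]

print(count_by_time_unit(observations))           # Counter({'2009': 3})
print(count_by_time_unit(observations, "month"))  # Counter({'2009-01': 2, '2009-03': 1})
```

The same three observations give one count at the yearly level and two counts at the monthly level, which is the difference between the default invocation and time_unit=month.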

Most plugins do not need the -k option, but it will give you an additional level of control over the output of the plugin.

Generic Plugins

Generic plugins answer questions about high-level trends in a Wikipedia project.

Plugin name Plugin description
new_editor_count This is the default plugin that will run if you do not explicitly call for another plugin. This plugin counts the number of New Wikipedians for every year / month combination since the start of the project being analyzed.
time_to_new_wikipedian This plugin calculates, for each New Wikipedian, how many days it took to become one. (A New Wikipedian is generally defined as someone who has made at least 10 edits.)
active_editor_count You can invoke this plugin as follows: python dataset -c active_editor_count -k time_unit=month,cutoff=5. This counts, for every year/month combination, the number of editors who made at least 5 edits in that month.
histogram_edits This plugin creates a dataset that can be visualized as a histogram. You can invoke this plugin by entering: python dataset -c histogram_edits -k time_unit=month,namespace=1,2. This creates a CSV file containing, for each namespace/year/month combination, the frequency distribution of the number of edits.
total_cumulative_edits You can invoke this plugin as follows: python dataset -c total_cumulative_edits -k namespace=1;2;3;4,time_unit=month. This counts the number of edits for a given namespace/year/month combination. This data can then be used to create a line chart. The namespace keyword is optional; if you do not specify it, the main namespace is assumed.
total_number_of_articles Does not work yet.
total_number_new_wikipedians This plugin counts the number of New Wikipedians in a given time unit (choices are year, month and day). A New Wikipedian is a person who has made at least 10 edits. This plugin takes no other optional arguments.
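Two of the definitions in the table above can be made concrete with a short sketch: the 10-edit threshold behind time_to_new_wikipedian, and the cutoff behind active_editor_count. The code is illustrative only. The function names, the (editor, datetime) input format, and the choice to measure days from an editor's first edit are assumptions, not details of the Wikilytics plugins.

```python
from collections import Counter
from datetime import datetime

NEW_WIKIPEDIAN_EDITS = 10  # threshold stated in the table above


def days_to_new_wikipedian(edit_times, threshold=NEW_WIKIPEDIAN_EDITS):
    """Days between an editor's first edit and their threshold-th edit.

    Assumes the clock starts at the first edit. Returns None if the
    editor never reached the threshold.
    """
    ordered = sorted(edit_times)
    if len(ordered) < threshold:
        return None
    return (ordered[threshold - 1] - ordered[0]).days


def active_editor_count(edits, cutoff=5):
    """Count editors with at least `cutoff` edits per (year, month).

    `edits` is an iterable of (editor, datetime) pairs.
    """
    per_editor_month = Counter(
        (editor, ts.year, ts.month) for editor, ts in edits
    )
    active = Counter()
    for (editor, year, month), n in per_editor_month.items():
        if n >= cutoff:
            active[(year, month)] += 1
    return active


# Made-up data: alice edits once a day on 1-10 January, bob edits twice.
alice = [datetime(2010, 1, d) for d in range(1, 11)]
bob = [datetime(2010, 1, 3), datetime(2010, 2, 7)]
edits = [("alice", ts) for ts in alice] + [("bob", ts) for ts in bob]

print(days_to_new_wikipedian(alice))         # 9
print(days_to_new_wikipedian(bob))           # None
print(active_editor_count(edits, cutoff=5))  # Counter({(2010, 1): 1})
```

With cutoff=5, only alice counts as active in January 2010, and bob never becomes a New Wikipedian because he has fewer than 10 edits.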

Editor Trends Study Plugins

Plugin name Plugin description
ets_cohort_backward_bar To be added
ets_cohort_backward_histogram To be added
ets_cohort_forward_bar To be added
ets_cohort_forward_histogram To be added

Taxonomy Plugins

More at Contribution Taxonomy Project
Plugin name Plugin description
taxonomy_burnout To be added
taxonomy_list_makers To be added

Plugins in Development

Plugin name Plugin description
edit_patterns The purpose of this plugin is to identify the most common editing patterns of Wikimedians. An editing pattern shows the monthly sequence of activity and inactivity. The output consists of one row per editor per year, with True and False values for each month. True indicates that the editor made more than cutoff edits in that month, and False means the editor did not reach the cutoff value.
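The row format described for edit_patterns can be sketched as follows. This is a hypothetical illustration, not the plugin's code: the function name, the (editor, datetime) input format, and the choice to emit a row only for years in which an editor made at least one edit are all assumptions. The comparison follows the description above, where True means more than cutoff edits.

```python
from collections import Counter
from datetime import datetime


def edit_patterns(edits, cutoff=1):
    """Map (editor, year) to a list of 12 booleans, one per month.

    `edits` is an iterable of (editor, datetime) pairs. A month is True
    when the editor made more than `cutoff` edits in it.
    """
    monthly = Counter((editor, ts.year, ts.month) for editor, ts in edits)
    patterns = {}
    for (editor, year, month), n in monthly.items():
        # Start every row as twelve inactive months, then fill in.
        row = patterns.setdefault((editor, year), [False] * 12)
        row[month - 1] = n > cutoff
    return patterns


# Made-up data: two edits in January, one in March.
edits = [
    ("alice", datetime(2010, 1, 1)),
    ("alice", datetime(2010, 1, 2)),
    ("alice", datetime(2010, 3, 1)),
]

patterns = edit_patterns(edits, cutoff=1)
print(patterns[("alice", 2010)][:3])  # [True, False, False]

# The most common patterns can then be tallied across all rows.
most_common = Counter(tuple(row) for row in patterns.values()).most_common(1)
```

March is False here because a single edit does not exceed a cutoff of 1.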