WMDE New Editors and Banner Campaigns Dashboard

From Meta, a Wikimedia project coordination wiki

This is the dashboard's user documentation. For technical documentation go to Wikitech.

Introduction

The WMDE New Editors/Banner Campaigns Dashboard is an RStudio Shiny dashboard developed by WMDE to track, visualize, and compare two sources of data: (a) the activity (user registrations and edits) of spontaneously registered editors of the German Wikipedia, and (b) the activity of those German Wikipedia users who registered through one of WMDE's Banner Campaigns. It provides an overview of user edits on the German Wikipedia both from the encyclopedia’s birth to the present and for a reference period beginning in 2017, when WMDE started to run its Banner Campaigns.

Data Sources

The dashboard relies mainly on time series obtained from the WMF Data Lake and the MySQL tables for dewiki. Some data sets are obtained and processed separately for each Banner Campaign run by WMDE and are integrated into the dashboard from different data sources. Finally, data on Training Modules, if and when they are used in a particular Banner Campaign, are integrated manually (i.e. by a separate set of R procedures) and must be handled by a person who has signed an NDA with the WMF. The reason for the latter constraint is that the data sets obtained from Training Modules identify users by their usernames; this information is used once, to match the data sets that identify users by user IDs with those that identify users by usernames. The Dashboard itself relies on a set of public data sets in which the user IDs are anonymized.

The organization of the dashboard

The dashboard is organized in five sections (tabs) that are available from its sidebar navigation: New Editors Overview, Campaigns, Campaign Archive, Campaign Codes, and Documentation. The following sections describe the features of each tab.

Note. All time series charts are produced with the R {dygraphs} package. Hovering over a chart displays the exact data for the point in time under the cursor (the horizontal axis represents time). The time resolution is monthly in all charts. The selectors below each time series chart can be used to focus on a specific period of time. Since the data on spontaneously registered and campaign registered users differ in scale, each chart offers an option to use a logarithmic scale on the vertical axis (which represents the data of interest, i.e. user registrations or user edits). We recommend always using the log scale.
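To illustrate why the log scale is recommended, here is a minimal Python sketch (not the dashboard's R code). The two monthly counts are hypothetical values chosen only to show the difference in scale.

```python
import math

# Hypothetical monthly counts, chosen only to illustrate the scale difference.
spontaneous_registrations = 10000
campaign_registrations = 100

# On a linear axis the campaign value sits at 1% of the spontaneous value
# and is barely visible next to it.
linear_ratio = campaign_registrations / spontaneous_registrations

# On a log10 axis the two values are two units apart, so both remain readable.
log_spontaneous = math.log10(spontaneous_registrations)
log_campaign = math.log10(campaign_registrations)
log_gap = log_spontaneous - log_campaign
```

With these values the campaign series occupies one hundredth of the linear axis height, while on the log axis the two series are separated by only two axis units.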

Note: the download data buttons. Data (csv) buttons appear throughout the dashboard. Each button is placed below a visualization and downloads the data set used to produce that visualization.

New Editors Overview Tab

  • Reaching the 10th edit. The time series chart represents the number of users who have reached 10 or more edits since the birth of the German Wikipedia. The data for campaign registered and spontaneously registered users are represented separately. The source of the data is the WMF mediawiki_history table; please note that this table is updated monthly and that the chart is updated as soon as a new snapshot is available.
  • Reaching the 50th edit. This is the same chart as the previous one, except that it counts the number of users who have reached their 50th edit.
  • Time to reach the 10th edit. Distribution of the number of editors by the time it took them to reach their 10th edit, provided separately for spontaneous and campaign registrations. The horizontal axes represent time, binned in intervals of five days. The vertical axes represent the number of editors for whom it took the respective number of days (e.g. 0 - 4 days, 5 - 9 days, etc.) to reach their 10th edit. All available historical data are considered. N.B. Outliers are removed. The source of the data is the WMF mediawiki_history table, so this dataset depends on the table's current snapshot, which is itself updated once a month. The Dashboard updates this chart as soon as a new snapshot of the table is made available.
  • Time to reach the 50th edit. The same distribution as the previous one, except that the criterion is the 50th edit.
  • Edits since January 2017. January 2017 is chosen as the reference point in time because WMDE started to run its Banner Campaigns in 2017. The edits of spontaneously registered and campaign registered users are provided separately. This chart is updated daily and relies on the dewiki.revision table.
  • Edit Classes. We cross-tabulate the number of spontaneously registered and campaign registered users against several categories reflecting the number of edits they have made. The left table in this section provides the exact counts, while the right table provides the percentages.
  • Registrations. The count of spontaneous vs. campaign registrations is charted against time (since the birth of the German Wikipedia). The dataset is fetched from the WMF mediawiki_history table and thus depends upon its current monthly snapshot.
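The five-day binning used in the "Time to reach the 10th edit" and "Time to reach the 50th edit" charts can be sketched as follows. This is an illustrative Python sketch, not the dashboard's actual R code: the function name and the sample values are hypothetical, and the outlier removal mentioned above is not reproduced because the documentation does not specify how it is done.

```python
from collections import Counter

def bin_days_to_nth_edit(days, bin_width=5):
    """Count editors per interval of `bin_width` days (0 - 4, 5 - 9, ...)."""
    # Map each non-negative number of days to the first day of its interval.
    counts = Counter((d // bin_width) * bin_width for d in days if d >= 0)
    # Label each interval by its first and last day, in ascending order.
    return {
        f"{start} - {start + bin_width - 1}": counts[start]
        for start in sorted(counts)
    }

# Hypothetical input: days from registration to the 10th edit, one value per editor.
days_to_10th_edit = [0, 3, 4, 5, 9, 12, 27]
histogram = bin_days_to_nth_edit(days_to_10th_edit)
# histogram == {"0 - 4": 3, "5 - 9": 2, "10 - 14": 1, "25 - 29": 1}
```

Intervals with no editors are simply absent from the result; a chart would typically show them as empty bars.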

Campaigns Tab

Except for the following, the description of the dashboard functionality for this tab remains the same as it is for the previous tab:

  • there is a selection box at the top of the page, “Please select campaign(s)”;
  • the box offers a list of all WMDE Banner Campaigns that are currently archived and thus available from the dashboard;
  • if a single campaign is selected, all tables and visualizations reflect the data for the selected campaign only;
  • if multiple campaigns are selected, the dashboard will aggregate the data across all selected campaigns and adjust the visualizations and tables accordingly.
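The aggregation across selected campaigns can be pictured with the following minimal Python sketch. It assumes that aggregation means summing the monthly counts of the selected campaigns, which the documentation does not state explicitly. The function name, the campaign code "2017_ExBC", and all counts are hypothetical; only "2018_SuBC" is a real code mentioned in this documentation.

```python
from collections import defaultdict

def aggregate_campaigns(series_by_campaign, selected):
    """Sum monthly counts over the selected campaign codes."""
    totals = defaultdict(int)
    for code in selected:
        for month, count in series_by_campaign[code].items():
            totals[month] += count
    # Return a plain dict ordered by month ("YYYY-MM" strings sort chronologically).
    return dict(sorted(totals.items()))

# Hypothetical monthly registrations per campaign code.
registrations = {
    "2018_SuBC": {"2018-07": 40, "2018-08": 25},
    "2017_ExBC": {"2017-10": 30, "2018-07": 5},  # hypothetical campaign code
}

single = aggregate_campaigns(registrations, ["2018_SuBC"])
combined = aggregate_campaigns(registrations, ["2018_SuBC", "2017_ExBC"])
# single   == {"2018-07": 40, "2018-08": 25}
# combined == {"2017-10": 30, "2018-07": 45, "2018-08": 25}
```

Selecting a single campaign returns that campaign's series unchanged, which matches the single-selection behaviour described in the list above.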

Note: all time series on this tab are referenced from January 2017, since WMDE started running its Banner Campaigns in that year.

Campaign Archive

When a campaign is selected from the drop-down menu at the top of the page, the Dashboard generates three charts with accompanying tables:

  • Banner Impressions: this is an overview of the number of banner impressions across the time span of the campaign, provided separately for each banner used in the respective campaign. The exact data and the totals are provided in the table below the chart.
  • Pageviews: an overview of the pageviews for the selected campaign across its time span, provided for each campaign page separately. Again, the table below the chart provides exact pageviews and the total.
  • Registrations: the number of campaign registered users from the selected campaign across its time span; the table below the chart provides exact numbers.

Campaign Codes

To avoid overcrowding the screen, the dashboard shows campaign codes instead of full campaign names in its drop-down menus. For example, the WMDE 2018 Summer Banner Campaign is listed as “2018_SuBC” in the dashboard. To ease navigation, the Campaign Codes tab provides a table listing the campaign code for each WMDE Banner Campaign. The banner codes (i.e. the values of the “event_campaign” field) are also provided for each campaign.