Grants:Project/MSIG/Welcoming Newcomers Research Initiatives: Analysis, Tools, and Implementation/Project Update

Outcomes

Project description

“The “Welcoming Newcomers” project aims at developing research and tools to design more effective initiatives to welcome newcomers. The ultimate objective is to be able to validate these initiatives and advance on their implementation.

In this sense, this project wants to research areas that have a potential for high impact, supported by previous research, but are currently overlooked in the Movement. In specific, the project “Welcoming Newcomers” proposes studying these two research areas:

  1. Analyzing the policy & guidelines pages according to different criteria to generate valuable information to help editors keep them up-to-date or delete them.
  2. Understanding the impact of homophily in peer interactions on editor retention to improve the effectiveness of current mentorship features, programs, or events.”

In the past months, as planned, the project has addressed Research Area 1: “Policy & Guidelines Pages”.

Specific Outcomes

We want to present the different outcomes:

Update: Instead of addressing Research Area 2 (“Homophily in interactions”), we propose expanding the analyses from “Policy & Guidelines Pages” to “Administrative Pages” and creating a project called Wikipedia Administrative Pages Analytics.

More than “Policy & Guidelines”: a deeper look at the “Administrative Pages”

During our analysis of the “Policy & Guidelines” pages, we realized that they belong to a more general group of pages which can be called “Administrative Pages”.

Admin pages are defined as those pages that support administration and governance in order to further Wikipedia's goals. Leaving user pages and specific content discussion pages aside, all those pages dedicated to creating documentation, encouraging debate, and coordination are essential to Wikipedia administration. Wikipedia Administrative pages are usually contained within namespace 4 (Wikipedia) but also include Help pages (namespace 12).
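As a minimal sketch of how the two namespaces mentioned above could be enumerated, the snippet below builds MediaWiki Action API request URLs with `list=allpages`. The function name `allpages_url`, the example language code `ca`, and the limit of 50 are assumptions made for illustration; this is not necessarily how the project collects its data, and namespace membership only approximates the definition of admin pages given above.

```python
from urllib.parse import urlencode

# Namespace numbers named in the text: 4 (Wikipedia/project) and 12 (Help).
ADMIN_NAMESPACES = {4: "Wikipedia (project)", 12: "Help"}


def allpages_url(lang: str, namespace: int, limit: int = 50) -> str:
    """Build an Action API URL that lists pages in a single namespace.

    `allpages_url` is a hypothetical helper; the API parameters themselves
    (action, list, apnamespace, aplimit, format) are standard MediaWiki ones.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,  # allpages takes one namespace per request
        "aplimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


# One request URL per administrative namespace for an example language edition.
urls = [allpages_url("ca", ns) for ns in ADMIN_NAMESPACES]
```

Each URL can then be fetched with any HTTP client; large namespaces require following the API's continuation parameters, which this sketch leaves out.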

We want to create a new set of analytical tools that lets the community draw conclusions about the current state of the administrative pages: a website to measure and improve the maintenance of administrative pages, and the inclusion of editors in them, across Wikipedia language editions.

This complete set of tools will help to understand admin pages and their characteristics, show how they are created, and identify specific actions to improve them.

Administrative pages grow organically, like Wikipedia content. The more aware we are of their state of development and of how inclusive the processes that create them are, the better we will be able to improve them.

Administrative pages and, in particular, Policies and help pages are essential to newcomers’ onboarding to Wikipedia, and also to regulate everyday behavior and collaboration in Wikipedia.

We aspire to a community that is able to continually revise these pages so that they remain usable and easy to learn from. This space is a call to the research community to study them and to provide tools that Wikimedians can use to analyze them.

Rationale for extending the first research area

Why is it worth the effort to go beyond “policy & guidelines” and expand to “administrative pages”? We briefly give seven reasons.

a) The need to conceptualize an area beyond “policy pages”

Policies and guidelines are indispensable. Help pages support the Wikipedia projects and expand on guidelines, but there are even more types of administrative pages, such as Wikiprojects and Wikipedia discussions (Village pump), among others.

There is no typology of administrative pages, and thus we do not know how they vary across languages. Without a systematic view of administrative pages, it is hard to analyze their state of development, maintenance, or inclusion, among other key aspects.

b) Benefits for the research community

Policy pages and Wikiprojects are the only types of administrative pages that have been studied by the academic community. Nonetheless, there is no research quantifying their extent, activity, or cultural idiosyncrasies across language editions.

We believe that by creating a framework to look at them and then generating a database/dataset that includes the 300 language editions, we can stimulate work in this direction.

c) New tools: more practical value for the community

Policies and guidelines have a profound impact on newcomers, as they are the first pages they read on how Wikipedia works. As found by previous research, newcomers’ contributions to new policies also tend to be rejected, which decreases editor retention.

It is undeniable that policies and guidelines (as well as help pages) have an impact on newcomers. However, in previous discussions with the MS strategy team, we realized that “policies and guidelines” are also essential to the rest of the editors, and therefore any study should look at how to create value for them too.

By approaching administrative pages as a whole and across languages, it may be possible to generate both visualizations that enable a better understanding of them and recommendations based on the findings. These recommendations can guide future work on these pages and point to specific pages that need more attention.

Thus, we believe that by studying administrative pages and creating analytical tools, we can provide more direct and tangible outcomes for editors than Research Area 2 (“Homophily in peer interactions”) would.

d) Goals of inclusion and completeness

Policy pages are possibly the administrative pages with the most impact on Wikipedia functioning. However, we believe that the goals of inclusion (all types of editors) and completeness/maintenance should be extended to all administrative pages.

e) Valuable to identify community dysfunction

Administrative pages are essential to Wikipedia’s proper functioning. Editing them means taking care of Wikipedia, but it is also a sign of power, since that is where decision-making takes place. We believe that a lack of inclusivity there can be detrimental to general community health.

In other words, we hypothesize that some metrics that describe editing behaviour in administrative pages can also be leading indicators of deteriorating community health and of a possible general decrease in editor engagement.

f) Consensus in the team and requests

As soon as we discussed the possibility of expanding the analysis of policy and guidelines pages to all administrative pages, we received only positive answers from team members and other project stakeholders.

Many Wikipedians see the set of admin pages as a messy and unknown group of pages, and they could benefit from the more distant view that a quantitative approach provides.

g) Alignment with “Knowledge Gaps” from Wikimedia Research

Selecting and quantifying the different types of admin pages is a novel activity in Wikimedia and in the Wikipedia research community. Over the years, there have been several efforts to quantify content gaps (e.g., gender, culture, geography), giving rise to tools and visualizations.

The Wikimedia Foundation research team proposed a taxonomy of knowledge gaps, including those related to content, readers, and editors. While this taxonomy appears to be very extensive and takes into account all the past research, it does not address the “Wikipedia infrastructure”, i.e., the admin pages.

Admin pages are susceptible to having content gaps like Wikipedia articles on any topic. Hence, the study of admin pages like any other gap can contribute to the general understanding of the development of Wikipedia.

The Knowledge Gaps taxonomy is a useful framework to study gaps and their causes across dimensions (e.g., the relationship between the gender gap in content and the gender gap in editors). The study of admin pages can contribute to expanding this framework to a new dimension.

Main new outcome: “Wikipedia Administrative Pages Analytics”

The Wikipedia Administrative Pages Analytics is an extended set of tools to understand and find actions to improve the admin pages in any Wikipedia language edition.

Wikipedia Admin Pages Analytics’ mission is to raise awareness of the current state of development, maintenance, participation, and inclusion in Administrative Pages. These are all the pages that Wikipedians use to coordinate and govern Wikipedia, including policies and guidelines, help pages, and deletion discussions, among others.

We envision a Wikimedia community that regularly consults analytical tools to make informed decisions on which pages to create, improve, or delete and, also, to modify the course of current processes in order to be inclusive to all Wikimedians. We aspire to more level-headed decisions based on discussions grounded in data, so that we can continue improving rather than fall into the trap of personal and collective biases.

Research Questions (Summary)

  1. What are the main types of admin pages?
  2. How have the admin pages been created and edited over time?
  3. How has the editing of the admin pages engaged and included different types of editors?
  4. How much do admin pages differ in terms of completeness, relevance, popularity, editing regularity, editing conflict, and recency?
  5. Which are the most valuable admin pages that exist in one Wikipedia but do not exist in another one, thus creating a gap?
  6. Which are the admin pages that exist in one Wikipedia but are more complete in another?
  7. Which are the admin pages that present one or more “red flags”, and thus require editors’ attention/maintenance?
  8. Which admin pages have been edited (recent changes) in the last 24 hours, and of which type are they?

Eight Dashboards (Tools and Visualizations)

In order to answer these questions, we are developing the following dashboards. We believe this will make it easier for anyone to analyze the data and find easy actions in order to improve administrative pages.

Visualizations

Types of Admin Page

This page shows statistics and graphs that explain the different types of administrative pages in Wikipedia language editions.

This dashboard answers questions such as:

  • Which are the main types of admin pages?
  • What is the extent of admin pages?

Pages Over Time

This page shows statistics and graphs that explain the creation and editing of different types of administrative pages over time.

They depict both the accumulated pages and the new pages created on a monthly basis.

This dashboard answers questions such as:

  • Which types of admin pages have grown the most over the years?
  • What is the difference between a healthy Wikipedia and a stagnant one?
  • Is there any concentration of edits by a particular type of editor?
  • How have the help pages been taken care of?

Page Characteristics

This page shows statistics and graphs that describe the characteristics of administrative pages in Wikipedia language editions. It allows comparing the different types of pages in one Wikipedia language edition or one type across two language editions.

This dashboard answers questions such as:

  • How are pages distributed by length, number of edits, etc.?
  • How different are the pages in these two language editions?

Tools

Under-edited pages

On this page, you can screen and find specific administrative pages in Wikipedia language editions based on aspects related to their development, inclusion, and participation level.

Use different metrics and ratios to find “red flags” that may indicate that a page is valuable but needs attention. This tool encourages page maintenance and the inclusion of different types of editors in editing admin pages.

This dashboard helps you find (for example):

  • Which are the most popular help pages that have not been edited for a long time?
  • Which are the most edited policy pages that are not edited by newcomers?
  • Are Wikipedia Portals mostly outdated?
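
The kind of “red flag” screening described above could be sketched as follows. This is only an illustration under stated assumptions: the `PageStats` fields, the flag names, the thresholds, and the sample pages are all hypothetical and are not the metrics or data used by the dashboards.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PageStats:
    """Hypothetical per-page metrics; field names are illustrative only."""
    title: str
    page_type: str       # e.g. "help" or "policy" (assumed labels)
    monthly_views: int
    last_edit: date
    newcomer_edits: int
    total_edits: int


def red_flags(page, today, min_views=1000, stale_days=365, min_edits=100):
    """Return the list of assumed red flags raised by one page."""
    flags = []
    # Popular page that has not been edited for a long time.
    if page.monthly_views >= min_views and (today - page.last_edit).days > stale_days:
        flags.append("popular_but_stale")
    # Heavily edited page with no edits by newcomers.
    if page.total_edits >= min_edits and page.newcomer_edits == 0:
        flags.append("no_newcomer_edits")
    return flags


# Made-up sample data, for illustration only.
pages = [
    PageStats("Help:Editing", "help", 5000, date(2020, 1, 15), 3, 240),
    PageStats("Wikipedia:Example policy", "policy", 300, date(2023, 5, 1), 0, 180),
]
today = date(2023, 6, 1)
flagged = {p.title: red_flags(p, today) for p in pages}
```

Keeping the thresholds as parameters reflects the idea that each language edition may need different cut-offs.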
Admin Page Gaps

On this page, you can search for content gaps related to administrative pages in Wikipedia language editions. Gaps are pages that exist in one language edition but are missing in another, or that exist in both but are more complete in one of them.
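
The two kinds of gap defined above can be illustrated with a short sketch. It assumes that each page is keyed by a language-independent identifier (for example, an interlanguage link item) and that page size in bytes stands in for completeness; the identifiers, sizes, and the `find_gaps` helper are hypothetical and do not describe the dashboard's actual method.

```python
def find_gaps(source: dict, target: dict):
    """Compare two language editions.

    Both arguments map a language-independent page identifier to a page
    size in bytes (an assumed proxy for completeness).
    Returns (missing, less_complete), each as a sorted list of identifiers.
    """
    # Pages that exist in the source edition but not in the target edition.
    missing = sorted(set(source) - set(target))
    # Pages present in both editions but smaller in the target edition.
    less_complete = sorted(
        key for key in source.keys() & target.keys() if target[key] < source[key]
    )
    return missing, less_complete


# Made-up identifiers and sizes, for illustration only.
source_edition = {"item_A": 12000, "item_B": 8000, "item_C": 3000}
target_edition = {"item_A": 4000, "item_C": 3500}

missing, less_complete = find_gaps(source_edition, target_edition)
```

Here `missing` holds the pages absent from the target edition and `less_complete` holds the pages that exist in both editions but are shorter in the target.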

Incomplete Gaps

On this page, you can search for in-page content gaps related to administrative pages in Wikipedia language editions: pages that exist in one language edition but are more developed in others (by size, outlinks, etc.).

Page Across Languages

On this page, you can compare an admin page’s characteristics across Wikipedia language editions. Enter the page title and the metrics you want to compare. Browse the results and sort the columns.

Recent Changes Monitor

On this page, you can retrieve the list of Recent Changes in a Wikipedia language edition according to the type of administrative page they belong to and some characteristics.

This dashboard helps you answer questions such as:

  • Which types of admin pages have been edited the most in the past hours?
  • Do editors edit the admin pages that are most in need of editing according to some metrics?

What changes and what does not

The budget and timeline remain the same. The hours allocated to the second research area (400) will be used for this new part of the project.

Project activities remain the same:

• Research ideas that can have a high impact on newcomer retention (after the update: on other types of users).

• Transform the research insights into concrete solutions.

• Disseminate the state of the art around these topics and the research results.

• Discuss the results and prototypes to collect feedback on them.

• Invite affiliates and other stakeholders to deploy these initiatives.

Current stage of the development

The development of these dashboards is already well advanced. They are available at: http://wapa.wmcloud.org