Community Health Metrics
Community Health Metrics (CHM) is a project to measure and understand the health of Wikipedia communities, and to provide recommendations and tools to improve it.
Mission and Vision
We believe that research is an essential activity and service to the community. Our mission is to explain the current state of affairs in the community and to provide valuable, actionable advice. We believe in the need for an open space to share insights and raise awareness of community health among Wikimedia contributors.
We envision a community that is aware of its current state of health, accompanies newcomers in the process of becoming Wikipedians, sustains and takes care of itself, and ultimately recognizes editors and bids them farewell when they retire.
Community Health Areas
Some categories are especially relevant when discussing community health: retention, activity, and drop-off.
Regarding retention, we want to understand how to encourage newcomers to continue editing through mentorship, onboarding processes, and similar initiatives.
Regarding drop-off, we want to understand how to mitigate the causes that lead editors to stop editing and, in any case, to guarantee a peaceful farewell.
Editor Drop-off Framework
This iteration of the project is focused on drop-off (see the project proposal, which includes a short summary, the problem, and the proposed solution). For this reason, we developed a framework to measure it. We are currently retrieving and analysing data prior to generating public dashboards, which we will make available on this page.
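As a rough illustration of what measuring drop-off can involve, the sketch below summarizes each editor's first edit, last edit, and lifespan, and flags the editor as dropped off after a period of inactivity. It is only a sketch under stated assumptions: the 365-day threshold, the function name `summarize_editor`, and the sample data are hypothetical and are not taken from the project's framework.

```python
from datetime import date, timedelta

# Hypothetical threshold; the project's framework may define inactivity differently.
INACTIVITY_THRESHOLD = timedelta(days=365)


def summarize_editor(edit_dates, today, threshold=INACTIVITY_THRESHOLD):
    """Return lifespan and drop-off status for one editor's edit dates."""
    first, last = min(edit_dates), max(edit_dates)
    return {
        "first_edit": first,
        "last_edit": last,
        # Days between the first and the last recorded edit.
        "lifespan_days": (last - first).days,
        # Flagged when the last edit is older than the threshold.
        "dropped_off": (today - last) > threshold,
    }


# Made-up sample data, for illustration only.
edits_by_editor = {
    "editor_a": [date(2015, 3, 1), date(2016, 7, 9), date(2018, 1, 20)],
    "editor_b": [date(2020, 5, 5), date(2021, 11, 30)],
}
today = date(2022, 1, 1)

summaries = {
    name: summarize_editor(dates, today)
    for name, dates in edits_by_editor.items()
}
for name, summary in summaries.items():
    print(name, summary["lifespan_days"], summary["dropped_off"])
```

With a per-editor summary like this, questions such as "at which point of their Wikipedian life do people leave?" become a matter of aggregating `lifespan_days` over the editors flagged as dropped off.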
Why Editor Drop-Off
When an editor leaves the project, we lose their participation and contribution to the community. Multiple factors, including ones external to the project, can influence the decision to leave Wikipedia, but a departure may also signal an issue related to internal dynamics and the health of the community. However, there is a limited understanding of the dynamics that affect the editors' lifecycle. We lack basic knowledge about the process: at which point in their Wikipedian life do people abandon the project? How many veteran editors are leaving the project today compared to the past? Basic statistics currently available, such as the number of active editors, do not capture the various facets of the editors' lifecycle. We also lack solid knowledge about the factors associated with editors leaving the project.
This issue is particularly relevant for underrepresented groups of editors: we know there is a gender gap in the editor community, so it would be especially helpful to understand when and why women editors leave the project; the same applies to other groups that lack representation on Wikipedia, according to aspects such as geographic provenance.
Conflict may affect editor participation, especially for women editors. Having one's edits reverted reduces the probability that an editor stays active; this is especially important for newcomers, and significant effort has been devoted to understanding and mitigating the barriers for new editors, regulating the behaviour of automated bots, and promoting initiatives to welcome newcomers and guide them within the project. While much effort has been dedicated to retaining new editors, we lack knowledge and initiatives focused on experienced editors. We argue that, to maintain community health, knowledge of when and why editors leave the project should be available to as many language communities as possible, so that we can understand what is happening and decide how to act to prevent detrimental dynamics and create safer and more constructive spaces in the project.
Community Vital Signs
We propose the creation of six indicators that we call Vital Signs. In medicine, vital signs indicate the status of the body’s vital (life-sustaining) functions. These measurements are taken to help assess the general physical health of a person, give clues to possible diseases, and show progress toward recovery.
In the case of Wikipedia, Vital Signs are related to the community's capacity to grow or renew itself. Three of them are focused on the entire group of “active editors” creating content: retention, stability, and balance; the other three are related to more specific community functions: admins, specialists, and global community participation. We believe that measuring the current “active community capacities” in these areas can provide a valuable reference point to plan for and guarantee “openness” in these areas and, at the same time, to observe growth and renewal, and to foresee and prevent future risks (e.g., bus factor).
We are currently working on a live dashboard to consult the Vital Signs. For now, you can consult the Report on Polish Wikipedia Vital Signs. You can also check the slides and videos of the presentations at the WikiArabia, Wikiindaba and Wikimedia CEE conferences.
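To make the idea of a retention vital sign more concrete, here is a minimal sketch of how such an indicator could be computed for one cohort of new editors. It assumes, purely for illustration, that an editor counts as retained if they edit again at least 60 days after their first edit; that window, the function name `retention_rate`, and the sample data are assumptions and should not be read as the project's definition.

```python
from datetime import date, timedelta


def retention_rate(edits_by_editor, cohort_start, cohort_end,
                   min_gap=timedelta(days=60)):
    """Share of editors whose first edit falls in [cohort_start, cohort_end)
    and who edited again at least `min_gap` after that first edit.

    Returns None when the cohort is empty.
    """
    cohort = 0
    retained = 0
    for dates in edits_by_editor.values():
        first = min(dates)
        if not (cohort_start <= first < cohort_end):
            continue  # editor did not start in this cohort
        cohort += 1
        if any(d - first >= min_gap for d in dates):
            retained += 1
    return retained / cohort if cohort else None


# Made-up sample data, for illustration only.
edits = {
    "a": [date(2021, 1, 5), date(2021, 4, 1)],    # edits again after 60+ days
    "b": [date(2021, 1, 10), date(2021, 1, 12)],  # only edits shortly after
    "c": [date(2021, 1, 20)],                     # a single edit
    "d": [date(2020, 6, 1), date(2021, 1, 15)],   # started before the cohort
}
rate = retention_rate(edits, date(2021, 1, 1), date(2021, 2, 1))
print(rate)
```

Computing the same ratio for each monthly or yearly cohort would give a time series that can be compared across language communities.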
Code and Data
vital_signs.py (GitHub) retrieves the data from the Wikimedia MediaWiki History Dump and stores it in the database vital_signs_editors.db. It also creates and populates the database vital_signs_web.db, which is used by the website.
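The general pattern of loading tab-separated dump rows into a SQLite database and then aggregating them can be sketched as follows. This is not the code of vital_signs.py: the three-column row layout is a simplified stand-in for the much richer dump format, and the table name `edits`, its columns, and both function names are hypothetical.

```python
import csv
import io
import sqlite3

# Simplified, hypothetical stand-in for dump rows: user_id, user_name, timestamp.
SAMPLE_TSV = (
    "1\tAlice\t2021-01-05 10:00:00\n"
    "2\tBob\t2021-01-07 12:30:00\n"
    "1\tAlice\t2021-02-11 09:15:00\n"
)


def load_edits(conn, tsv_file):
    """Create a hypothetical `edits` table and fill it from TSV rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edits "
        "(user_id INTEGER, user_name TEXT, ts TEXT)"
    )
    rows = csv.reader(tsv_file, delimiter="\t")
    conn.executemany("INSERT INTO edits VALUES (?, ?, ?)", rows)
    conn.commit()


def monthly_active_editors(conn):
    """Count distinct editors per calendar month (YYYY-MM)."""
    query = (
        "SELECT substr(ts, 1, 7) AS month, COUNT(DISTINCT user_id) "
        "FROM edits GROUP BY month ORDER BY month"
    )
    return conn.execute(query).fetchall()


# In-memory database for the example; a file path could be passed instead.
conn = sqlite3.connect(":memory:")
load_edits(conn, io.StringIO(SAMPLE_TSV))
result = monthly_active_editors(conn)
print(result)
```

Keeping the raw per-edit data in one database and the aggregated figures in another, as the two database names above suggest, means the website only needs to read precomputed, community-level numbers.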
Tools and Data
These are some of the outcomes of this project. You can also read the technical documentation on how we built our infrastructure.
not available yet
You may find the current datasets we are using in this folder.
You can check the code that the project is producing in the GitHub repository.
- Privacy considerations
We are aware that aggregating data and making them available could expose sensitive or personal information about editors. For this reason, we will make sure that the metrics and tools we make available are anonymized and aggregated at a community level or centered on pages and not on editors.
We are very interested in understanding all the potential factors that affect editor retention and editor drop-off. If you think you can provide valuable insight, you can give feedback on drop-off through this questionnaire/interview. We have a results report (PDF) based on the feedback received (May 2nd, 2022).
Also, if you have an idea for a specific metric that you think would be helpful to understand community health, please contact us.
For metrics to be effective, they must be co-created by researchers and Wikimedians. It is essential that metrics respond to community needs and priorities.
The project proposes the following five goals for this iteration, 2020-2021:
- Assess editor drop-off across multiple languages
Get an overview of drop-off statistics in multiple language editions, by analyzing their edit history.
- Increase our understanding of the factors associated with editor drop-off
Find evidence of interaction patterns that tend to be associated with editor drop-off. We will collect data about different kinds of interaction over time and study their relationship with editor drop-off.
- Understand drop-off dynamics for underrepresented groups
Put a special focus on the factors that may affect participation for groups of editors that are under-represented in the project (e.g. by gender, territory, native language).
- Identify spaces potentially affected by detrimental interactions
Help the community to detect pages or groups of pages that are in a critical situation that might negatively affect editors. We will develop metrics and indicators to detect situations that may need attention and, eventually, the intervention of the community.
- Increase awareness about community health
Help the community to make sense of data by presenting the page-level metrics and indicators we develop in interactive dashboards. Disseminate the results through community channels and in the scientific community.
Previous Community Health Initiatives
The term Community Health was probably coined in 2009, during the development of a strategy for the Wikimedia movement. Since then, different initiatives have sought to measure and influence community health. These are some of them:
- Research:Community health
- Grants:Evaluation/Community Health learning campaign
- Community Health on the Strategy/2010-2015 wiki
- Technical Collaboration/Community health
- Community health initiative/Metrics kit
For a more complete view on the initiatives on the topic, you can consult the category Community Health.
These are our latest actions to raise awareness of Wikipedia community health. They disseminate research results, concepts, and tools.
- 25/04/2022 | Wiki Workshop @ The Web Conference 2022 | Academic Paper/Presentation: Wikipedia, Elder or Teen? A Look at Growth, Stagnation and Decline Patterns Across 50 Language Communities. Published at the Wiki Workshop 2022 (PDF).
- 14/04/2022 | Academic Paper: Miquel-Ribé M, Consonni C, Laniado D. Community Vital Signs: Measuring Wikipedia Communities’ Sustainable Growth and Renewal. Sustainability. April 14, 2022; 14(8):4705. https://doi.org/10.3390/su14084705 (PDF).
- 24/11/2021 | Online presentation for the Italian Wikipedia community hosted by Wikimedia Italia | “Community Health Metric” Misuriamo lo stato di salute delle comunità di Wikipedia (slides PDF|notes PDF)
- 20/11/2021 | Viquitrobada (Catalan Wikipedia Annual Gathering) 2021 talk (program page) | Session: Measuring Catalan Wikipedia Community Health: Are We “Open” to Community Growth and Renewal?. (slides and notes PDF (català)).
- 7/11/2021 | Wikimedia CEE Meeting 2021 talk (program page) | Session: Measuring Central and Eastern Europe Wikipedias Growth and Renewal. (slides and notes PDF | video recording).
- 5/11/2021 | Wikiindaba 2021 talk (program page) | Session: African language Wikipedias - indicators for development, growth and renewal. (slides and notes PDF | video recording).
- 15/10/2021 | WikiArabia 2021 talk (program page) | Session: Measuring Arabic Wikipedia Community Health: Are We “Open” to Community Growth and Renewal?. (slides and notes PDF | video recording).
- 15/08/2021 | Wikimania | Session: Presentation and Open Discussion. (slides PDF | video recording).
- 14/04/2021 | Wiki Workshop @ The Web Conference 2021 | Academic Paper/Presentation: Miquel-Ribé, M., Consonni, C., & Laniado, D. Wikipedia Editor Drop-Off: A Framework to Characterize Editors' Inactivity (PDF).
- 24/10/2020 | ItWikiCon | Presentation: Cristian Cantoro, Community Health Metrics: presentazione del progetto.
Community Health Metrics is an open space to research, understand and propose solutions to improve community health in the different Wikimedia projects. If you like the initiative, please join the team. Any community member, affiliate (chapter or User Group) member or director interested in learning about community health is welcome.
- Lam, S. T. K., Uduwage, A., Dong, Z., Sen, S., Musicant, D. R., Terveen, L., & Riedl, J. (2011). WP: clubhouse? An exploration of Wikipedia's gender imbalance. In Proceedings of the 7th international symposium on Wikis and open collaboration.
- Hill, B. M., & Shaw, A. (2013). The Wikipedia gender gap revisited: Characterizing survey response bias with propensity score estimation. PloS one, 8(6).
- Laniado, D., Kaltenbrunner, A., Castillo, C., & Morell, M. F. (2012). Emotions and dialogue in a peer-production community: the case of Wikipedia. In Proceedings of the eighth annual international symposium on Wikis and open collaboration (pp. 1-10).
- Bear, J. B., & Collier, B. (2016). Where are the women in Wikipedia? Understanding the different psychological experiences of men and women in Wikipedia. Sex Roles, 74(5-6), 254-265.
- Halfaker, A., Kittur, A., & Riedl, J. (2011). Don't bite the newbies: how reverts affect the quantity and quality of Wikipedia work. In Proceedings of the 7th international symposium on wikis and open collaboration (pp. 163-172).