Research:Developing Metrics for Content Gaps (Knowledge Gaps Taxonomy)/Choosing a Set of Metrics
This page presents the process we followed to select a set of metrics that characterize the content gaps (Phase 2 of the research process).
To choose the metrics, we first identified a set of initial criteria based on the project constraints. We then reviewed the scientific literature, especially the models that describe the different aspects along which gaps can be measured. Finally, we identified criteria derived from the goals and needs of the communities working to bridge the gaps. The chosen metrics therefore sit at the intersection of the scientific literature and community needs.
Setting the criteria
a) Project constraints
The selection should consist of N metrics (where N is small by design) that are easy to learn and to compare. At the same time, the initial set of metrics should be consistent across content gaps, whether it is the gender gap or the geography gap. This means that we cannot select metrics so specific to one topic that they do not apply to another.
Initial Criteria
- Metrics should be few (1-4) so that they are easy to reference and remember.
- Metrics should be consistent across gaps (topic-agnostic).
b) Scientific literature review
The selection of metrics should cover a variety of aspects. First, the metrics should represent the content gap, which means that they need to be able to explain it. A single metric may not encompass all the different aspects of a gap, and relying on one alone increases the risk that it stops being a good measure (following Goodhart’s Law: "When a measure becomes a target, it ceases to be a good measure").
Second, we need to be specific about which aspects of a content gap need to be measured, and then choose no more than one metric for each. Reviewing the different content gap models is essential here, since they explain the gaps along a number of dimensions. The final selection of metrics should be orthogonal with respect to these dimensions.
Orthogonality is the guiding criterion for the group of metrics: different metrics should capture different dimensions, and no two metrics should cover the same one. If important aspects are left out, we may need to revise the “content gap models” and add new dimensions in the future.
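As an illustration only, one possible empirical proxy for orthogonality is to flag candidate metrics that are strongly correlated across articles. This page does not describe such a check; the Pearson correlation, the 0.9 threshold, the metric names and the sample values below are all assumptions made for the sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(metrics, threshold=0.9):
    """Return pairs of metric names whose absolute correlation reaches the threshold.

    The threshold is an arbitrary, hypothetical cut-off.
    """
    names = sorted(metrics)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(metrics[first], metrics[second])) >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical per-article values, invented for illustration only.
candidate_metrics = {
    "length_bytes": [1000, 2000, 3000, 4000, 5000],
    "num_references": [2, 4, 6, 8, 10],  # grows in lockstep with length
    "main_page_links": [1, 0, 2, 0, 1],
}

print(redundant_pairs(candidate_metrics))
```

In this invented sample, length and number of references move together, so they would be flagged as covering the same dimension, while the Main Page links would not.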
We compared these models along the following dimensions: Selection, Extent, Frame, Visibility-Positioning, Topic sources (External Selection), Wikimedia global projects (External Selection), and Editor Engagement. The dimensions on which the models coincided most were selection, extent, and visibility. Selection relates to the number of articles in a gap, extent to the completion of the content, and visibility to the degree of prominence in the project.
Third, we reviewed the content gap metrics used in the existing scientific literature for each of the gaps. We found a different level of maturity for each gap (22 papers on the content gender gap, 4 on the LGBTQ+ gap, 16 on the geography gap, 18 on the cultural background gap, and 2 on the time gap). Many research papers create metrics to understand a particular aspect of Wikipedia rather than to explain the state of the gap and support the communities.
Scientific Criteria
- Metric selection should represent the most defining characteristic of a content gap.
- Metric selection should be orthogonal, with each metric covering a different aspect of a gap.
- Metrics should be scientifically mature and have been employed in past research.
c) Community goals and needs
The selection of metrics should support the communities' goals and needs as they bridge the gaps. For this reason, we first reviewed documentation from some gender-gap affiliates and then engaged in exploratory conversations to understand their priorities and mindset. Choosing gender-gap affiliates was an arbitrary choice, made under the assumption that they are the best organized in terms of bridging the gaps.
These conversations were essential to uncover which metrics are currently in use and, at the same time, which aspects of the gap still need indicators. For example, the number of women appearing on the Main Page (i.e., outlinks to women's biographies) turned out to be a valued indicator of visibility.
We realized that metrics should be actionable, meaning that they can be translated into specific actions to improve them. They should also support goals over different time spans, because affiliates and individual contributors set goals and establish milestones in order to encourage the community.
Community Criteria
- Metrics should be already in use or have some clarity and consensus of their value among stakeholders.
- Metrics should have some actionability and encourage immediate change.
- Metrics should be designed to support both short-term and long-term goal setting.
Final choice of metrics
The final choice of metrics fulfils the different criteria. The initial set consists of three metrics:
- Selection (number of articles for each category of the gap), e.g., the number of articles for each country for the geography gap.
- Extent-Score (an indicator similar to WikiRank that captures the degree of completion/quality of articles based on length, number of sections, and number of images). The extent-score explains “how good” the articles in each category are.
- Visibility (Percentage of articles for each category in spaces like the “Main Page” or the group of “Featured articles”).
The first metric is the number of items (Selection). It results from mapping articles to the categories of a content gap, and it stands out as the most obvious and necessary metric. Editors need to know the number of articles created for a specific category or the imbalance between two (e.g., men's biographies vs. women's biographies).
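A minimal sketch of how Selection could be computed, assuming that the mapping from articles to gap categories already exists. The article titles, the category labels and the helper names are hypothetical and are not taken from the project's implementation.

```python
from collections import Counter


def selection_counts(article_categories):
    """Count the number of articles in each category of a gap."""
    return Counter(article_categories.values())


def share(counts, category):
    """Fraction of all mapped articles that fall into the given category."""
    total = sum(counts.values())
    return counts[category] / total if total else 0.0


# Hypothetical article-to-category mapping for the gender gap.
articles = {
    "Biography A": "women",
    "Biography B": "men",
    "Biography C": "men",
    "Biography D": "women",
    "Biography E": "men",
}

counts = selection_counts(articles)
print(counts["men"], counts["women"])
print(share(counts, "women"))
```

The raw counts answer "how many articles exist for this category", and the share makes the imbalance between two categories comparable across projects of different sizes.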
The second metric is a compound quality score, which indicates the degree of completion of an article. It encompasses length, number of sections, and number of images, among other aspects. Compound metrics can be a good solution for certain dimensions. In Extent, for example, a single indicator like length (in bytes) might be valuable but is too similar to the number of references or the number of images, so using several of them as separate metrics would risk losing orthogonality.
Here we find some tension between what the community would like (“actionability”) and scientific maturity, which tends to favor scores. We lean towards scientific maturity, since a single actionable metric (e.g., number of sources, which fits very well with certain campaigns) may not be enough, because community members work on various aspects.
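To make the idea of a compound score concrete, the sketch below combines length, number of sections and number of images into one value between 0 and 100. It is not the WikiRank formula or the project's actual Extent-Score: the saturation caps, the equal weights and all names are assumptions chosen only for illustration.

```python
# Hypothetical saturation points: values at or above the cap count as "complete".
DEFAULT_CAPS = {"length_bytes": 50_000, "num_sections": 10, "num_images": 5}
# Hypothetical equal weighting of the three features.
DEFAULT_WEIGHTS = {"length_bytes": 1.0, "num_sections": 1.0, "num_images": 1.0}


def extent_score(length_bytes, num_sections, num_images,
                 caps=DEFAULT_CAPS, weights=DEFAULT_WEIGHTS):
    """Weighted average of capped, normalized features, scaled to 0-100."""
    features = {
        "length_bytes": length_bytes,
        "num_sections": num_sections,
        "num_images": num_images,
    }
    total_weight = sum(weights[name] for name in features)
    weighted_sum = sum(
        weights[name] * min(features[name] / caps[name], 1.0)
        for name in features
    )
    return 100 * weighted_sum / total_weight


# Hypothetical articles: one half-complete with many images, one very complete.
print(extent_score(25_000, 5, 5))
print(extent_score(100_000, 20, 10))
```

Capping each feature keeps a single very long article from dominating the score, which is one reason a compound indicator can be preferable to raw length alone.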
The third metric is visibility, which is the percentage of articles for each category of a gap that are visible on the Main Page. This metric has been requested for the Main Page, although it would also make sense to explore its implementation for Featured Articles, since these are reviewed articles with a quality distinction that obtain some extra visibility in the project.
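A minimal sketch of the Visibility computation under one possible reading of the definition: among the articles linked from a space such as the Main Page that belong to the gap, the percentage that falls into each category. That reading, the titles, the categories and the function name are assumptions for illustration.

```python
from collections import Counter


def visibility(linked_articles, article_categories):
    """Percentage per category among linked articles that are mapped to the gap.

    Linked articles without a category (e.g. non-biographies for the gender
    gap) are ignored.
    """
    counts = Counter(
        article_categories[title]
        for title in linked_articles
        if title in article_categories
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100 * n / total for category, n in counts.items()}


# Hypothetical outlinks from the Main Page and a hypothetical category mapping.
main_page_links = [
    "Biography A", "Biography B", "Biography C", "Biography F", "Some city",
]
categories = {
    "Biography A": "women",
    "Biography B": "men",
    "Biography C": "men",
    "Biography F": "men",
}

print(visibility(main_page_links, categories))
```

The same function could be applied to the list of Featured Articles instead of the Main Page outlinks.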
These metrics are implemented and can be seen in the form of visualizations.
Unaddressed aspects
Some very valuable aspects of the content gaps were mentioned in the interviews but have not been addressed.
They would require new original research beyond the scope of this project and will be addressed in the future.
- Deletion process (deletionability)
- Engagement (power dynamics behind the article edit history)
- Different types of references (authority sources)
References
- Wagner, Claudia; Garcia, David; Jadidi, Mohsen; Strohmaier, Markus (2015-04-21). "It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia". Ninth International AAAI Conference on Web and Social Media.
- Beytía, P.; Wagner, C. (2021). "Visibility Layers: A Framework for Facing the Complexity of the Gender Gap in Wikipedia Content". Available at SSRN 3774293.