Research:Knowledge Gaps Index/Taxonomy/Full paper

This paper, titled "A Taxonomy of Knowledge Gaps for Wikimedia Projects", was previously made available at https://arxiv.org/pdf/2008.12314.pdf . Licensed under CC-BY.

A Taxonomy of Knowledge Gaps for Wikimedia Projects (First Draft)
MIRIAM REDI, Wikimedia Foundation
MARTIN GERLACH, Wikimedia Foundation
ISAAC JOHNSON, Wikimedia Foundation
JONATHAN MORGAN, Wikimedia Foundation
LEILA ZIA, Wikimedia Foundation
EXECUTIVE SUMMARY

In January 2019, prompted by the Wikimedia Movement’s 2030 strategic direction[1], the Research team at the Wikimedia Foundation[note 1] identified the need to develop a knowledge gaps index—a composite index to support the decision makers across the Wikimedia movement by providing: a framework to encourage structured and targeted brainstorming discussions; data on the state of the knowledge gaps across the Wikimedia projects that can inform decision making and assist with measuring the long term impact of large scale initiatives in the Movement.

Since July 2019, as the first step toward building the knowledge gaps index, the Research team has developed the first complete draft of a taxonomy of knowledge gaps for the Wikimedia projects. We studied more than 200 references by scholars, researchers, practitioners, community members and affiliates that expose evidence of knowledge gaps in readership, contributorship, and content of Wikimedia projects. We consolidated the findings and compiled the taxonomy of knowledge gaps in this paper, where we describe, group and classify knowledge gaps into a structured framework. The taxonomy presented in the rest of this work will serve as a basis to operationalize and quantify knowledge equity, one of the two 2030 strategic directions, through the knowledge gaps index.

We hope you enjoy the read and join the conversation about how we can improve the taxonomy of knowledge gaps for the Wikimedia projects. If you have any suggestions or feedback, please reach out to us via the Knowledge Gaps Index Meta page.[note 2]

INTRODUCTION

With almost 54 million articles written by roughly 500,000 monthly editors across more than 160 actively edited languages, Wikipedia is the most important source of encyclopedic knowledge and one of the most important knowledge resources available on the internet. Every month, the project attracts users on more than 1.5 billion unique devices from across the globe, for a total of more than 15 billion monthly pageviews[2].

Although they are global and massively popular resources, Wikipedia and its sister projects, such as Wikimedia Commons and Wikidata, suffer from a wide range of knowledge gaps, which we define as disparities in participation or coverage of a specific group of readers, contributors, or content.

A typical example of a knowledge gap is the gender gap, one of the most well-studied gaps in the Wikiverse. Researchers and practitioners have investigated the gender gap by measuring the representation of different gender groups in content, readers, and contributors, and found, for example, that less than 20% of the biographies in Wikipedia are about women[3], roughly 75% of readers in Wikipedia are men[4], and that this disparity becomes even more extreme when analyzing editors’ gender distribution[5].

Beyond the gender gap, only a handful of works have studied ways to quantify or address other kinds of knowledge gaps—for example by analyzing readers’[6] and editors’[7] motivations for accessing the site, studying the role of technical skills and awareness in Wikimedia contributorship[8][9] and readership[10], designing algorithms to identify and prioritize missing content in different languages[11], or proposing question-and-answer facilities to satisfy readers’ information needs[12].

While these works made substantial progress towards measuring and identifying directions for bridging individual gaps, they did not provide a holistic and systematic framework to understand the scope of knowledge gaps in Wikimedia and the relationships between them.

To see why having such a comprehensive map of the space of knowledge gaps could be key for Wikimedia projects, consider the main directions identified as part of the Movement Strategy[1]: by 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge, and achieve knowledge equity by focusing efforts on including all “knowledge and communities that have been left out by structures of power and privilege”. To support this goal, and measure the progress towards knowledge equity, the Research team at the Wikimedia Foundation has identified the need to operationalize knowledge equity into a knowledge gaps index – a composite index tracking the collective evolution of knowledge gaps in Wikimedia.

While extremely important, operationalizing knowledge equity by studying and measuring its underlying factors is not a trivial task. The first step towards this goal is to generate a systematic picture of those Wikimedia audiences, groups, and cultures that could be underrepresented in terms of participation, representation, and coverage.

Therefore, in this work we propose the first taxonomy of knowledge gaps[note 1] in the context of Wikimedia projects. We identify three macro-dimensions of the Wikimedia ecosystem, namely Readers, Contributors, and Content as the root of the taxonomy. We then review studies and discussions from scholars, researchers, practitioners and community members, and compile for each of these dimensions a list of knowledge gaps as areas of the Wikiverse where we found evidence of potential inequality. Finally, we group these gaps into facets containing highly related knowledge gaps. The final 3-layer taxonomy is illustrated in Figure 1.

Fig. 1. The Knowledge Gaps Taxonomy
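
As an illustration only, the three layers of the taxonomy (dimension, facet, gap) can be sketched as a nested mapping. The Python below is not part of the original paper: the names `TAXONOMY`, `find_gap`, and `count_gaps` are hypothetical, the gap names are transcribed from the Readers and Contributors tables in Sections 2 and 3, and the Content dimension is left empty because its facets are presented in Section 4.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the taxonomy: dimension -> facet -> list of gaps.
# Gap names are copied from the Readers and Contributors tables; the Content
# dimension is intentionally left empty in this sketch.
TAXONOMY: Dict[str, Dict[str, List[str]]] = {
    "Readers": {
        "Sociodemographics": [
            "Gender", "Age", "Locale", "Language",
            "Income", "Education", "Beliefs", "Background",
        ],
        "Information Need": ["Motivation", "Information Depth", "Familiarity"],
        "Accessibility": [
            "Internet connectivity", "Device", "Tech Skills", "Disabilities",
        ],
    },
    "Contributors": {
        "Sociodemographics": [
            "Gender", "Age", "Locale", "Language", "Income", "Education",
        ],
        "Contextual": ["Motivation", "Role"],
        "Accessibility": [
            "Internet connectivity", "Device", "Tech Skills", "Disabilities",
        ],
    },
    "Content": {},
}


def find_gap(name: str) -> List[Tuple[str, str]]:
    """Return every (dimension, facet) pair under which a gap name appears."""
    return [
        (dimension, facet)
        for dimension, facets in TAXONOMY.items()
        for facet, gaps in facets.items()
        if name in gaps
    ]


def count_gaps(dimension: str) -> int:
    """Count the gaps listed under all facets of one dimension."""
    return sum(len(gaps) for gaps in TAXONOMY[dimension].values())


if __name__ == "__main__":
    print(find_gap("Gender"))
    print(count_gaps("Readers"), count_gaps("Contributors"))
```

Looking up a gap such as Gender returns one entry per dimension in which it appears, which makes explicit that the same gap name can sit in several branches of the taxonomy.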

This taxonomy provides a theoretical framework on which we can base different approaches to addressing knowledge gaps. First, it constitutes a necessary first step for the development of robust metrics and indicators to quantify gaps, which make it possible to track progress towards knowledge equity[17]. Second, it facilitates the conceptualization of the interdependence between different gaps in order to understand potential causes and barriers, and to design more effective interventions. Third, with this taxonomy, we aim to foster conversations across the Wikimedia and academic communities around the nature and composition of the “content and communities that have been left out by structures of power and privilege”[1].

In the remainder of this paper, we will first present each macro branch of the taxonomy: Readers in Section 2, Contributors in Section 3 and Content in Section 4. We will then provide some suggestions on how different audiences can use and benefit from this taxonomy (see Section 5), expose interesting directions for future work in Section 6, and finally look at the methodology used to compile the taxonomy (Section 7).

  1. Following strict definitions[13], the final product should be defined as a typology rather than a taxonomy, as “Taxonomies differ from typologies in that they classify items on the basis of empirically observable and measurable characteristics”[13], while typologies group concepts according to characteristics which are less tangible and might not exist in physical reality. However, given the widespread usage of the term “taxonomy” to classify objects according to abstract properties (e.g. for emotions, clustering algorithms, or educational objectives)[14][15][16], we will use this term to define the final product of our work throughout the manuscript.

READERS

The readership dimension of knowledge gaps encompasses all those gaps related to readers’ access to Wikimedia sites. We identify different areas where readers are under-represented according to literature, surveys, and community strategic directions and organize them around three main facets: sociodemographics, information needs, and accessibility. We also highlight different initiatives aiming at closing each gap. We define readers as all users who connect to the site to consume Wikimedia content. While there exists a body of research[18][19][20][21][22][23] studying how content consumption happens outside of Wikimedia, e.g., through voice assistants, search engines, or third-party apps, we scope down our definition of readership to readers who come directly to the projects to access content. The objectives for this dimension are mainly motivated by the 2020 Movement Strategy, which recommends that Wikimedia platforms should be designed to “enable everyone—irrespective of gender, culture, age, technological background or skills, or physical abilities—to enjoy a positive experience during both consumption and contribution to knowledge throughout the Wikimedia ecosystem”[24].

Facet: Sociodemographics
Objective: readers with different social status, demographics, and cultural background can easily and safely access free knowledge
Gap | Description | Source
Gender | Difference between readers of different gender identities in how and how much they access the sites. | literature[25][26][10], surveys[27][28][29][30][31][32][33][34][35], strategy[1][24]
Age | Difference between readers of different ages in how and how much they access the sites. | literature[25][26][10], surveys[27][28][29][30][31][32][33][34][35], strategy[1][36][24], community[37][38][39]
Locale | Differences in readership between rural areas, towns, and cities | literature[10], surveys[30][31][32], community[40][41]
Language | Differences in readership depending on readers’ ability to read one or more languages | surveys[42][28][32][35], strategy[1][24], community[43][44]
Income | Difference in how readers with different income, wealth, or employment status access Wikimedia sites | literature[25][10], surveys[42][45][27][28][33][46][34]
Education | Differences in readership depending on readers’ educational background | literature[25][10], surveys[42][45][27][28][32][33][34][35][47], community[48][49]
Beliefs | Difference in how and how much people having different beliefs access content on Wikimedia sites | community[50]
Background | Differences in readership among people with different cultural, political and sexual preferences | community[51][52]

Facet: Information Need
Objective: readers with different information needs can find and consume free knowledge
Gap | Description | Source
Motivation | Differences in readership depending on the reason behind readers’ visit to the site | literature[53][6], surveys[32]
Information Depth | Differences in readership depending on the depth of information for which a reader is looking | literature[6][54], surveys[32][55]
Familiarity | Differences in readership depending on one’s prior familiarity with a topic | literature[53][6], surveys[32], community[56]

Facet: Accessibility
Objective: readers with different technical setup and skills can easily access Wikimedia projects
Gap | Description | Source
Internet connectivity | Contrasts among the ability of readers with different internet connections to access Wikimedia sites | surveys[42][28][30][34], strategy[24], community[40][57][58]
Device | Difference in accessibility to the site depending on readers’ devices | surveys[28][34], strategy[24], community[59]
Tech Skills | Differences in readership depending on readers’ general internet skills | literature[10], strategy[24], other[60]
Disabilities | Disparities in ability to access the knowledge within Wikipedia depending on individual disabilities | literature[61], community[37][62][63][64][65][66][67][68]

Sociodemographics Gaps

Sociodemographic readership gaps have been widely discussed across the Wikimedia movement, exposed by surveys organized by the Wikimedia Foundation and the chapters, and analyzed by research studying the Wikimedia ecosystem. This facet includes gaps in readership related to demographics such as gender, age, language and location, and social status such as income and education. The objective for this facet states that different sociodemographic groups should not face greater barriers to their ability to access and consume content through Wikimedia projects.

Gender

The gender gap is the difference between readers of different genders in how and how much they access the sites. The gender gap has been measured through many readership surveys organized by the Wikimedia Foundation[42][45][28][29][30][31][32], different chapters[27][33][35], independent organizations[46][34][47] and research communities[25][26][10] by asking respondents for their gender identity. This makes the gender gap one of the better understood constructs with respect to readership. However, almost all surveys have only included men and women as gender identities, making it difficult to draw any conclusions about readership among people with non-binary gender identities. Furthermore, conclusions about gender gaps might differ depending on the definition of readers. For example, the most recent survey in this space[32] defined readers as people who read Wikipedia daily, and found that, in almost every country/language surveyed (with the exception of Romanian), there is a substantial gender gap, i.e. men tend to generate many more page views than women even in regions where previous surveys have shown no strong gender gaps in occasional usage. A number of factors cause some genders to be underrepresented in Wikimedia’s readership population, and across the years, different community initiatives have focused on bridging this gap[36].

Age

The age gap reflects how and how much readers of different ages access Wikimedia sites. There exists a large volume of data on readers’ age, collected by surveys and academic literature[42][45][27][28][29][30][31][32][25][33][26][46][34][10][35][47]. In cases where the age distribution at country level is available (so the age distributions can be compared against the full population), the data indicates that readers tend to be much younger than the general populace. Among other things, the quality and presentation of content might affect this gap, and various initiatives across the movement focus on making content more available and readable for specific age ranges, such as Simple English Wikipedia[39], the Wikijuniors project from Wikibooks[38], or WikiProject Accessibility[37], which, among other things, aims at making content more accessible to elderly readers.

Locale

The locale gap refers to the different levels of readership between rural areas, towns, and cities. Proximity to urban areas tends to be a strong proxy for availability of services. While there are no global standardized definitions of urban and rural, which makes it difficult to compare self-reported locale to official statistics, results from WMF-led and academic surveys[30][31][32][10] suggest strong over-representation of urban areas among the reader population. To help bridge this gap, projects such as WikiConnect[41] and the WMF’s “New Readers” initiative[40] work on connecting Wikipedia with rural areas and those people who don’t have direct access to Wikipedia.

Language

The language gap reflects the different levels of readership depending on readers’ ability to read one or more languages. What languages an individual can read greatly impacts what content is available to them and can introduce greater barriers if they are forced to read content in a language that is less familiar to them. Surveys have been conducted to estimate readers’ literacy[42][28][32][35], suggesting that certain languages have highly literate readers. For example, languages that are specific to one country show high levels of literacy amongst readers. In contrast, other languages such as English or French, which are more strongly associated with colonialism, have many readers for whom English or French is not a native language. To address this issue for English, Simple English Wikipedia was introduced, using simpler grammar and a limited vocabulary. While it improves readability in comparison to English Wikipedia, research has shown that its reading level is still not ideal for readers with limited language literacy[69]. Other community initiatives attempting to bridge this gap aim at growing under-represented languages, for example Scribe[44] or the GapFinder tool[43].

Income

The income gap is the difference in how readers with different income, wealth, or employment status access Wikimedia sites. Income, wealth, and employment status are three separate constructs, but they all relate to an individual’s means and class status. Of note, it can be difficult to disentangle the impact of these concepts from other, highly correlated measures such as education level. Surveys[42][45][27][28][33][46][34] and recent research[25][10] consistently indicate that people with more income are also more likely to be readers of Wikipedia, though the relationship between employment status and readership is less clear. The other strong, and sometimes countervailing, trend is that students also tend to be much heavier readers of Wikipedia.

Education

The education gap reflects differences in readership depending on readers’ educational background. While education systems vary across different countries, one commonly used country-agnostic proxy for education level is the number of years of education. Surveys from researchers[25][10] and from WMF and affiliates[42][45][27][28][32][33][34][35][47] consistently demonstrate that individuals with higher levels of education are more likely to be readers of Wikipedia. Some Wikimedia projects are trying to close this gap by specifically targeting readers with different levels of education, for example Wikiversity[49]; relatedly, initiatives such as “Wikimedia+Education” specifically focus on creating community around the topic of making Wikipedia more integrated with education systems at different levels[48].

Background

The background gap includes all differences in readership among people with different sexual preferences or cultural, political and religious beliefs. Studies on readers from emerging markets found some evidence that local socio-political constructs might impact the way in which Wikipedia’s credibility is perceived[40]. While we could not find studies of how sexuality and culture relate to readership, there have been discussions of how sexuality is represented in content that make it clear that a reader’s sexual identity could relate to how much they trust and feel safe within Wikipedia, and initiatives such as the "Wikimedia LGBT+ Portal"[51] actively work on creating a “safer environment for LGBT+ readers of Wikimedia projects, with a special interest in improving the experiences of LGBT+ youth on Wikimedia projects”. Finally, Wikipedia by nature aims to structure its content to be welcoming to readers with different individual identities, perspectives and opinions. For example, the “Neutral Point of View” policy[52] encourages editors to represent “fairly, proportionately, and, as far as possible, without editorial bias, all the significant views that have been published by reliable sources on a topic”. Most friendly space policies in Wikimedia also encourage inclusiveness at many levels, including the respect of people’s religious views. Moreover, surveys around community health report religion as one prominent factor creating barriers to involvement in Wikimedia communities[50].

Ethnicity and race bring additional context to an individual’s cultural background and often relate strongly with access to resources. Studies in the United States have shown gaps in usage along racial lines[46][34][10] and presumably these same trends are found in other countries in that people who are disenfranchised due to their race or ethnicity also would be less likely to access Wikipedia.

Information Need Gaps

This facet deals with the motivation, information needs and prior knowledge held by a given reader, and how those affect their likelihood to read Wikipedia. Hypotheses for why readership might vary under these different information need contexts include readers’ perceptions of the utility of Wikipedia, their ability to locate information effectively, and external re-use of Wikipedia content (e.g., rich search results) that satisfies readers’ needs before they reach the Wikipedia platform. It is nearly impossible to predict a given reader’s motivation, familiarity, or information depth based upon the article they are reading, and anecdotally readers experience the full range of these contexts over time or even within a given day or reading session.

The objective in this facet is to make the site and its content suitable for people with different information needs.

Motivation

The motivation gaps reflect different levels of readership depending on the reason behind readers’ visit to the site. There is quite a bit of variability between languages in readers’ predominant motivations[53][6]. The 2019 Reader Demographics Surveys[32] confirmed that these differences cannot be fully explained by individual demographics.

Information Depth

The information depth gaps reflect different levels of readership depending on what level of content a reader is looking for on the site—i.e. checking a quick fact, looking for an overview of a topic, or an in-depth read. Analogous to reader motivations, there is quite a bit of variability between languages in what readers’ predominant information depths are. The 2019 Reader Demographics Surveys[32] confirmed that these differences cannot be explained by individual demographics. Still not fully understood is how these information needs might relate to the utility of Wikipedia for a given reader and whether readers with information needs like “fact” or even “overview” are largely consuming Wikipedia via external re-use and are therefore underrepresented in the existing data. Additional information about reader information depth was gathered in past reader surveys[55][6][54].

Familiarity

The familiarity gaps reflect different levels of readership depending on one’s prior familiarity with a topic. As with motivation, someone’s prior knowledge about a topic might affect their perception of Wikipedia as an appropriate data source, and average readers’ prior knowledge of the topic they read varies across Wikipedia language editions[53][6]. Ongoing community discussions about content “Depth vs Breadth” indirectly reason about solutions to address these gaps[56].

Accessibility Gaps

This facet describes the gaps in readership related to the ability to consume Wikimedia content. There are a number of factors that affect the accessibility of Wikimedia content, including the physical resources available to someone and their individual abilities. While there is overlap with the gaps described in other facets, the accessibility facet specifically focuses on barriers that prevent individuals who would like to read Wikimedia content from accessing the knowledge, such as internet availability, tech skills, or disabilities. Evidence from Wikipedia Zero[70] indicates how extreme and contextual these gaps can be: for example, pageviews from Angola dropped by 80% following the deactivation of Wikipedia Zero whereas no change was seen in Kuwait. The Wikimedia Foundation has done much formative research in this domain, which led to supporting offline access and awareness campaigns[40]. The objective in this facet, inspired by the “Better User Experience” recommendation of the movement strategy[24], is to break down the technical barriers preventing people from accessing free knowledge.

Internet Connectivity

The internet connectivity gaps reflect contrasts among the ability of readers with different internet connections to access Wikimedia sites. Even though a given Wikipedia article is relatively small in size, slow internet connections can still make it very difficult to reach and browse Wikipedia. From surveys linking internet connectivity to readership[42][30][34], we see much lower readership among low-speed internet users. Evidence from early experiments after the introduction of a new data center in Singapore showed that lower latency correlates with higher longer-term reader engagement[58]. Internet cost can also be a huge barrier to internet access even when a country has good connectivity[28]. Usage data from after the Wikipedia Zero program ended suggest that, in places where data is indeed expensive, cost is a significant barrier to readership. Initiatives such as Kiwix[57] and other projects from the Inuka team at the Wikimedia Foundation[40] aim at bridging this gap by making Wikipedia more available offline.

Device

The device gap is the difference in accessibility to the site depending on readers’ devices. With predominantly mobile usage in countries that are only now coming online[28][30], making Wikipedia more accessible via mobile phones is a high priority. Efforts are already underway to support KaiOS[note 1] (lightweight phone OS) and continued development of the mobile interface as well as official Android and iOS apps is important for welcoming this increasingly mobile population[59].

Tech Skills

The tech skills gap reflects different levels of readership depending on readers’ general internet skills. The latter capture one’s experience with the internet and the ability not only to find the content one is looking for but also to verify it. Academic research[10] has found that high internet skills are associated with an increase in participation and readership on Wikipedia. External initiatives such as Mozilla’s Digital Skills Observatory[60], a research project studying the impact of digital skills training on confidence and agency of low-income first-time smartphone users, provide useful insight into how to bridge this gap.

Disabilities

The disabilities gap reflects how individual disabilities might affect one’s ability to access the knowledge within Wikipedia. While individuals who are blind might be the most salient example, disabilities fall into many categories: cognitive, developmental, intellectual, mental, physical, or sensory disabilities. There is scant literature on the degree to which individuals who read Wikipedia have various disabilities beyond anecdotal evidence, e.g., an interview with Graham Pierce, an individual who is blind and a prolific editor[65], tips for improving accessibility[64], and a survey from 2008 that indicated that Wikipedia was a popular site for individuals who use screen readers[61]. A number of projects nevertheless focus on improving the accessibility of Wikimedia sites, such as WikiProject Accessibility[37], WikiProject Usability[66], WikiBlind User Group[63], and Para-Wikimedians User Group[62].

While there is little data on readers with disabilities, the barriers facing these individuals are not new, so initial metrics might focus on measuring the accessibility of a given project (e.g., proportion of images with captions) as opposed to levels of readership in communities of people with disabilities. In general, providing a variety of ways to access content (e.g., text, images, video, audio) while reducing barriers to understanding (e.g., good color contrast, high readability) is a good approach to improving the accessibility for readers of all abilities[71]. For example, Wikispeech[68] provides text-to-speech for Wikipedia articles and VideoWiki[67] provides a tool for collaboratively editing videos from images and wikitext.
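
To make the example metric mentioned above concrete, a project-level accessibility indicator such as the proportion of images with captions could be computed along the following lines. This is a minimal sketch under stated assumptions, not a method proposed in this paper: the function name `caption_coverage`, the record layout with a `caption` key, and the sample records are all hypothetical, and how image records would be obtained from a real project is left open.

```python
from typing import Iterable, Mapping, Optional


def caption_coverage(images: Iterable[Mapping[str, Optional[str]]]) -> Optional[float]:
    """Return the fraction of images with a non-empty caption.

    Each record is assumed (hypothetically) to carry a "caption" key whose
    value is a string or None. Returns None when there are no images, so an
    empty project is not reported as 0% or 100% covered.
    """
    total = 0
    captioned = 0
    for image in images:
        total += 1
        caption = image.get("caption") or ""
        # Whitespace-only captions are treated as missing.
        if caption.strip():
            captioned += 1
    return captioned / total if total else None


# Made-up records for illustration only.
sample = [
    {"file": "A.jpg", "caption": "A river at dawn"},
    {"file": "B.png", "caption": ""},
    {"file": "C.svg", "caption": None},
    {"file": "D.jpg", "caption": "Map of the region"},
]

if __name__ == "__main__":
    print(caption_coverage(sample))
```

Returning `None` for an empty collection is a design choice in this sketch; a real indicator would also need to decide which images count (for instance, whether decorative images are excluded).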

CONTRIBUTORS

The contributor dimension of knowledge gaps covers all gaps related to categories of people contributing to Wikimedia sites. We define contributors as all individuals who edit or otherwise maintain Wikimedia content. For the purpose of this taxonomy, this definition does not include technical contributors (i.e. the individuals who build the MediaWiki software on which Wikimedia sites run), though the software and choices made in its design certainly have a strong impact on what types of contributors feel supported and what content is created. Contributor gaps are organized into three main facets: sociodemographics, motivations, and accessibility. Similar to the Readership dimension, the objectives in the Contributors facets are informed by the 2020 Movement Strategy, which invites Wikimedia communities to be safe and inclusive[72] and projects to be designed so that everyone is welcome to contribute[24].

Facet: Sociodemographics
Objective: contributors with different social status, demographics, and cultural background can easily and safely access and contribute to free knowledge
Gap | Description | Source
Gender | Differences between contributors of different gender identities in how and how much they contribute to the sites. | literature[25][26][10], surveys[42][45][73][74][75][28][76][31][77][78][5][79][47], strategy[1][24], community[27][80][81][33][82][83]
Age | Differences between contributors of different ages in how and how much they contribute to the sites. | literature[25][26][10], surveys[42][45][73][74][75][28][76][31][77][78][47], strategy[1][36][24], community[27][80][81][33][82][84][83]
Locale | Differences between contributors of different locales (urban, rural) in how and how much they contribute to the sites. | literature[85][10], surveys[31][78], community[80][86][87]
Language | Differences between contributors of different reading abilities in a language in how and how much they contribute to the sites. | surveys[42][28][78], strategy[1][24], community[80][81][82][88]
Income | Differences between contributors with different income, wealth, or employment status in how and how much they contribute to the sites. | literature[25][10], surveys[42][45][73][74][75][28], community[27][80][33][82]
Education | Differences between contributors of different educational backgrounds in how and how much they contribute to the sites. | literature[25][10], surveys[42][45][73][74][75][28][77][78][47], community[27][80][81][33][82][89]

Facet: Contextual
Objective: contributors with different motivations and roles can access and contribute to free knowledge
Gap | Description | Source
Motivation | Differences in contribution depending on one’s reason for contributing to the site. | literature[90][91][92][93][94], surveys[73][75][28][81][47]
Role | Differences in contribution depending on the type of editing that one chooses to do. | literature[95][96], surveys[73][74][75][28][76], community[97][98][99]

Facet: Accessibility
Objective: contributors with different technical resources and abilities can easily access and contribute to Wikimedia projects
Gap | Description | Source
Internet connectivity | Disparities in ability to contribute to the knowledge within Wikipedia depending on one’s access to high-speed internet | surveys[28], strategy[24], community[100]
Device | Disparities in ability to contribute to the knowledge within Wikipedia depending on one’s device. | surveys[42][45][73][74][75][28], strategy[24], community[101][102]
Tech Skills | Disparities in ability to contribute to the knowledge within Wikipedia depending on one’s general internet skills | literature[10], strategy[103][24]
Disabilities | Disparities in ability to contribute to the knowledge within Wikipedia depending on individual disabilities | literature[104], community[37][62][63][65]

Sociodemographics Gaps

Similar to their Readers counterpart, contributors’ Sociodemographic gaps, i.e. demographics gaps such as gender, age, language and location, and social status gaps such as income and education, have been widely discussed by different sources. The objective for this facet states that groups with different socio-demographic characteristics should not face greater barriers to their ability to access and contribute content through Wikimedia projects.

Gender

The gender gap is the difference between individuals of different gender identities in how likely they are to contribute to Wikimedia sites. The gender gap has been measured through many surveys organized by the Wikimedia Foundation[42][45][73][74][75][28][76][31][77][78][5][79], different chapters[27][80][81][33][82][83], independent organizations[47] and research communities[25][26][10] by asking respondents for their gender identity, though generally limiting the possible answers to the binary man/woman choice. Conclusions about gender gaps might differ depending on the definition of contributors. For example, recent surveys in this space[5] stratified editors by how many edits were associated with their account. For all three languages surveyed—Arabic, English, and Norwegian—the gender gap is less extreme for editors with fewer edits. While researchers are still working on investigating the causes behind some gender identities being under-represented in Wikimedia’s contributor population, different community initiatives are focusing on bridging this gap[36].

Age

The age gap is the difference between individuals of different age in how likely they are to contribute to Wikimedia sites. Contributors’ age data has been collected by surveys and academic literature[42][45][73][74][27][75][28][76][31][77][78][25][80][81][33][26][10][82][47][83]. These studies found that the median age of contributors varies substantially, but it tracks slightly lower than the country-by-country median age.[note 1] This is best captured by the 2018 Community Insights survey[77]. Median age ranges from around 40 in Western Europe[76][77][80][81][83] to the 20s in Eastern Europe[26][82] and “Global South”[28]. Globally, the median age has increased over time from the low 30s[74][75] to the high 30s[76][77]. Throughout the years, several initiatives have been developed to bridge this gap and involve senior citizens as contributors for Wikimedia projects[84].

Locale

The locale gap is the difference in how likely individuals are to contribute to Wikimedia sites depending on where they live (e.g., rural areas, towns, cities). Results from WMF-led and academic research[31][78][80][85][10] suggest a strong over-representation of urban areas among the contributor population. Some projects have started to help grow the pool of contributors from rural areas, for example activities for Wikimedia Spain[87] or the “Wiki Loves Villages” initiatives[86].

Language

The language gap is the difference in how likely individuals are to contribute to Wikimedia sites depending on their fluency in a language. Surveys have been conducted to estimate contributors’ literacy or language skills[42][28][78][80][81][82], and the Babel system[88], which is widespread on user talk pages, offers an alternative way of understanding the fluency of contributors. Though it may feel intuitive that fluency would be required to contribute, lowering the barrier to contribution for lower-fluency individuals can be important for effective patrolling in small wikis[105], can increase the diversity of contributors, and can allow for the cross-pollination of content that might otherwise remain locked up in other languages[106]. Many editors are multilingual and contribute to Wikipedia in a variety of languages, with small wikis depending heavily on multilingual editors and English being the most common second language[28][106][81].

Income

The income gap is the difference in how likely individuals are to contribute to Wikimedia sites depending on their income, wealth, or employment. Research on contributor income[42][10] and employment[42][45][73][74][27][75][28][25][80][33][82] does not show consistent patterns in contribution by income[10], but it does show higher contribution rates among individuals who are employed[28][10]. This is presumably confounded by the high number of student contributors to Wikipedia[75].

Education

The education gap is how an individual’s level of education affects the likelihood that they contribute to Wikimedia sites. Surveys[42][45][73][74][27][75][28][77][78][25][80][81][33][10][82][47] consistently demonstrate that individuals with higher levels of education are more likely to be contributors to Wikipedia. Despite this skew, many academics, students, librarians, and other scholars do not contribute to Wikipedia, and organizations like Wiki Education[89] and the Wikipedia Library work to change this.

Additional Characteristics

It is not possible to document all the ways in which gaps might be identified in the sociodemographics of the contributor population. We describe here several characteristics, however, that either are largely unstudied or too contextual to be easily studied in a global manner.

While studies of how sexuality relates to contributors are limited, there have been surveys and discussions of how sexuality is represented in content that make it clear that a contributor’s sexual identity could relate to how much they trust and feel safe within Wikipedia[73]. For more information, see the LGBT+ Portal[51].

Ethnicity and race are highly contextual in what they imply about an individual’s status and access to resources. Studies in the United States have shown gaps in contribution along racial lines[10] and presumably these same trends are found in other countries where groups of people who are disenfranchised due to their race/ethnicity are also less likely to contribute to Wikipedia. See also[73][27][81] and initiatives such as AfroCrowd[107], Black Lunch Table[108], and Whose Knowledge?[109].

No comprehensive surveys were found that included measures of individual beliefs related to politics, religion, or culture such as those asked by the European Values Survey[110]. Research[111][112] has attempted to assign political leanings to editors, however, based upon their contribution history, showing interest in how individual beliefs of contributors affect the process of collaboration, the degree to which they contribute, and what articles they edit. One of the main recommendations within the 2030 movement strategy encourages communities to bridge these gaps by making the community more inclusive and safe[72]. One of the main mechanisms the community uses to ensure respect for minorities and for anyone, regardless of religious belief or sexual orientation, is the Friendly Space Policy, which is in force at every community event[113].

Contextual Gaps

Individuals contribute to Wikipedia for many different reasons and in many different ways. Our tools need to support this diversity of motivations and types of work or we risk exacerbating gaps by only supporting certain types of contributors. Of note, while research that seeks to reduce barriers to contributing (e.g., socialization via Wikipedia Teahouse[98], personalized edit recommendations from SuggestBot[114] or GapFinder[11]) has been demonstrated to increase contributions, research that has focused on playing directly to editors’ motivations offers a more cautionary tale: such interventions can supersede existing motivations (see Gamified Onboarding[115]) or be perceived as manipulative[116].

Motivation

Motivation gaps reflect different levels of contributions depending on the reason behind an individual’s desire to contribute to the site. Contributor motivations include both the intrinsic (e.g., support of free knowledge, for fun, curiosity, personal satisfaction) and extrinsic (e.g., fixing errors, promoting a topic, professional or school-related reasons, learning a new skill, money). This wide range of motivations affects contributors’ levels of activity[90][92] and the roles that they take on[91]. Contributors’ initial motivations[note 2] can also differ greatly from the reasons they stay.[note 3] See “why do people edit?”[94], Balestra et al.[93], various surveys[73][28][81][47], or the detailed results[note 4] from the 2012 Editor Survey[75] for additional details.

Role

Roles reflect different levels of contributions depending on what type of work an individual does to support the wikis. There are explicit roles (i.e. user access levels[99]) on Wikipedia through which users ascend[95]. However, within these access levels, individuals can do very different types of work. Yang et al.[96] identified eight editor roles that users of all access levels undertake. These can be expanded to more specific types of work but offer a high-level view and can be modeled from edit activity[note 5]:

  • Social Networker: communicate on user pages and communication namespaces
  • Fact Checker: removal / verification of content
  • Substantive Expert: content producers – adding substantive content to articles
  • Copy Editor: grammar, paraphrase, relocation
  • Wiki Gnomes: clean up content and wikitext markup issues
  • Vandal Fighter: reverting vandalism, warning editors, and other patrolling work
  • Fact Updater: updating templated content or Wikidata
  • Wikipedian: working behind the scenes (non-article mainspace) to keep things organized etc.

Surveys generally ask contributors what types of work they do on-wiki[73][74][75][28][76], but these tasks can be matched to the above roles. Initiatives such as the Wikipedia Teahouse[98] or the Newcomer Page[97] help newer Wikipedians learn how to familiarize themselves with the platform and progressively become more expert editors.

Accessibility Gaps

This facet describes the gaps in contribution related to the ability to access and edit Wikimedia sites. These gaps specifically focus on barriers that prevent individuals who would like to contribute to Wikimedia from sharing their knowledge, such as internet availability, tech skills, or disabilities. These gaps can be extreme and far-reaching. Zhang and Zhu[117] found that blocks to Chinese Wikipedia in Mainland China in October 2005, beyond preventing contributions from individuals located in Mainland China, also led to a 42.8% drop in contributions from non-blocked individuals. The objectives in this facet, inspired by the “Improve User Experience” recommendation of the movement strategy[24], encourage the breaking down of the technical barriers preventing people with accessibility limitations from accessing free knowledge.

Internet Connectivity

The internet connectivity gap relates to how internet access affects one’s ability to contribute. This covers both internet speed and internet cost. While Wikipedia content is relatively lightweight, cost and speed have been identified as major barriers to contributors[28]. Graham et al.[118] found that removing broadband access barriers is a necessary but insufficient condition for content generation. Initiatives like GLOW[100] work on addressing this gap by subsidizing internet costs for contributors.

Device

The device gap refers to different levels of contributions depending on what device someone has available. Editing classically requires a good laptop or desktop computer in order to easily use the available editing interfaces while potentially also gathering research in other windows. There has been a shift to mobile that has been accompanied by huge improvements in mobile editing interfaces (e.g., Suggested Edits[102]). Additionally, efforts have been made to provide laptops (e.g., GLOW[101]) to new editors in order to reduce this technological burden. Various surveys[42][45][73][74][75][28] have asked contributors about their devices.

Tech Skills

The tech skills gap reflects different levels of contributions depending on an individual’s general internet skills. While extensive effort has been put into simplifying the process of contributing to Wikipedia, editing articles, uploading images, and identifying sources can still pose substantial technical challenges for individuals. Academic research[10] has found that high internet skills are associated with an increase in awareness that Wikipedia can be edited and having edited Wikipedia. Edit-a-thons[note 6] in particular can help to bridge this gap and the 2030 strategy calls for further investment in skills and training[103].

Disabilities

The disabilities gap reflects how individual disabilities might affect one’s ability to access and contribute to the knowledge within Wikipedia. While individuals who are blind might be the most salient example, disabilities fall into many categories: cognitive, developmental, intellectual, mental, physical, or sensory disabilities. There is scant evidence beyond the anecdotal of the degree to which individuals with various disabilities contribute to Wikipedia (see Userboxes in WikiProject Accessibility[37], an interview with Graham Pierce[65], and an evaluation from 2008 of editing with a screenreader[104]). Various groups have also been established for individuals with disabilities, such as the WikiBlind User Group[63] and Para-Wikimedians User Group[62].

CONTENT

Wikipedia is incomplete by design. The opportunity to share new information with the world is a major motivating factor among both new and established Wikipedia contributors. However, when important information about a topic is absent, incomplete, biased, or otherwise inaccessible to readers, these content gaps can undermine Wikipedia’s ability to serve the needs of its global audience.

The goal of this Section is to characterize gaps in content coverage. In the widest sense, “content” is information about a topic, i.e. a piece of knowledge that could be the focus of one or more Wikipedia articles. “Coverage” refers to how well Wikimedia project content addresses a particular topic. In turn, a content gap refers to differences in coverage of one or more topics.

Facet Gap Description Source

Policy

Objective: content is consistent with core content policies

Verifiability Differences in the use of reliable sources in order to verify content. literature[119][120][121][122], community[123][124][125][126][127][128]
Neutrality Biases in the content across Wikipedia articles. literature[129][130][131][132][133][134][135], community[127][52][136][137]
Accessibility

Objective: content is accessible to different audiences

Multimedia Differences in coverage with respect to the type of media used to share the content literature[138][139][140], strategy[1][141], community[142][143][144][145][146][147][148][149][150]
Structured Data Differences in the use of information which is indexed and machine-readable literature[151][152], strategy[153][1], community[154][155][156][157]
Readability Barriers for accessing or consuming information originating from content literature[158][69][11][159], strategy[1], community[160][161][162]
Diversity

Objective: content covers knowledge that is underrepresented, marginalized, and locally relevant

Gender Differences in content coverage depending on the gender identity of subjects literature[163][164][165][166][167][25][168][169][170][171][172][173][174][175], strategy[153][176], community[177][109][178][179][180][3][150][181]
Geography Differences in coverage of topics related to geographic regions or population distribution literature[182][183][184][185], strategy[153][176], community[109][179][186][187]
Impactful topics Differences in coverage of topics that are of common interest literature[188][189][190][191], strategy[153][176], community[192][193][109][194][179]
Cultural context topics Differences in coverage of topics related to the history, heritage, and characteristics of a current or former cultural group literature[195][196], strategy[153][176], community[197][109][179][198]

Policy Gaps

Wikipedia content is governed by three principal core content policies – Neutral point of view, Verifiability, and No original research – which define the scope and the material that should exist in the online encyclopedia. These policies shape the way in which content is added to Wikipedia, and, to some extent, to its sister projects. Gaps in this facet relate to the two main policies of neutrality and verifiability reflecting biases and lack of reliability in Wikipedia articles, respectively. Thus, the objectives in this facet are mainly inspired by the guidelines within the core content policies, which mandate that content in Wikipedia should be verifiable and neutral.

Verifiability Gap

The verifiability gap reflects differences in the use of reliable sources in order to verify content in Wikipedia. In fact, Wikipedia’s Verifiability core content policy[128] requires every piece of information which has been challenged – or is likely to be challenged – to be backed by a reliable source. Thus, adding high-quality citations is one of the key mechanisms that the communities have to bridge the Verifiability gaps. Previous research has deepened our understanding of citation usage by highlighting gaps in terms of the types of sources used[120] or their accessibility[121] as well as by developing methods to automatically detect the citation span of individual sources[119] or classify what content is missing citations[122]. There exist different initiatives to help editors monitor citation quality at scale such as the Citation Detective tool[123], which automatically produces dumps of sentences requiring citations in English Wikipedia, or Wikicite[126], whose goals include the “improvement of citations in Wikimedia projects and an open, collaborative repository of bibliographic data for innovative applications”. Furthermore, the Wikipedia Library portal[124] constitutes a fundamental tool to help bridge the verifiability gap, as it provides reliable sources that editors can use to improve the articles and promotes community initiatives such as the “1Lib1Ref” campaign, which encourages librarians to add references missing in Wikipedia. While citations and references are a key tool to monitor the quality of knowledge coming from written sources, how to incorporate oral knowledge within the verifiability framework remains an open point for discussion[125].

Neutrality Gap

The neutrality gap reflects biases in the content across Wikipedia articles. In fact, the Neutral Point of View (NPOV) policy[52] demands that all encyclopedic content should represent all significant views fairly, proportionately, and without bias. To help understand and bridge this gap, researchers in various fields have tried to characterize the dynamics of NPOV disputes[131][134], as well as quantify the NPOV gap from explicit (statements supporting a certain POV) or implicit (omission of certain aspects) bias[132]. Examples include the use of NPOV-templates to automatically detect biased language in the content of Wikipedia[133][135], or documenting biases in specific topics such as politics[130] or culture[129] among others. The Neutral point of view Noticeboard[136] is one of the main sub-communities dedicated to the discussion of neutrality of content in English Wikipedia.

Accessibility Gaps

Content in Wikimedia projects is multimodal by nature, and can take very different forms, such as images, text, structured data, etc., and the form of presentation clearly affects how accessible content is to different audiences[1]. The gaps in this facet cover the different nature of the content types available in Wikimedia spaces through the lenses of their ability to break down barriers preventing people from accessing free knowledge. The objectives in this facet mainly focus on bridging accessibility gaps from a content perspective, aiming at having more images, structured data, and readable text across Wikimedia projects.

Multimedia Gap

The multimedia gap reflects differences in coverage with respect to the type of media used to share the content. Acknowledging the potential of different forms of media beyond text (image, audio, video, geospatial, etc.) to convey content to different audiences, the Movement strategy[141] recommends building the necessary technology to make free knowledge content accessible in various formats and support more diverse modes of consumption and contribution to Wikimedia projects. For example, with the bulk of a typical article made up of text, the use of images is encouraged in order to increase readers’ understanding of the subject matter (see, for example, English Wikipedia’s policy on image use[147]). However, on average, half of Wikipedia articles are missing images, and around 95% of Wikidata items do not have a value for the image-property (P18)[note 1][143]. This is exacerbated by the fact that, with few exceptions (e.g.[138][139][140]), the role of visual and multimedia aspects in Wikipedia and other projects has largely been ignored by researchers. To help bridge this gap, several recent research initiatives[142][143][144][145] aim at designing smart tools for image analysis and adding structure to the Wikimedia Commons repository. Tools built by community members also support editors in discovering and adding images in Wikimedia projects, including the WDFIST[146] and the Wikishootme[148] tools. Finally, several community initiatives are organized around adding pictorial representations of under-represented people and topics, such as Wiki Loves Monuments[149], Visible Wiki Women[150], and Wikipedia Pages Wanting Photos[199].

Structured Data

The structured data gap reflects the differences in the use of structured data across Wikimedia projects, namely the amount of information which is organized and indexed in a machine-readable fashion. Structured data offers the potential for managing data for Wikimedia projects on a global scale, allowing easier indexing and offering tools for multilingual data creation and description. One of the most prominent examples of structured data is Wikidata, a structured data knowledge base which was designed with the goal of increasing knowledge diversity[152]. Information from Wikidata is being re-used in Wikipedia articles in different ways, most notably in infoboxes[157] but perhaps most commonly in metadata templates[200]. The Wikidata Concepts Monitor[155] provides quantitative insight into the degree of Wikidata usage, although its definition of usage in terms of aspects makes it non-trivial to interpret the statistics with respect to individual templates or infoboxes. In addition, it must be noted that, beyond some established cases, the degree to which Wikidata should be used in Wikimedia projects is part of ongoing discussions within the respective communities (see for example the case in English Wikipedia[156]). Given the importance and potential of structured data, the WMF’s Medium Term Plan 2019-2020 (Platform Evolution) aims at increasing its usage across Wikimedia projects[153]. Product initiatives such as the Structured Data on Commons program[154], as well as new proposals for an Abstract Wikipedia[151], aim at closing this gap and enabling communities to consume and contribute structured data across languages.

Readability

The readability gap refers to how difficult it is to read the content of an article or a piece of information in Wikimedia projects in comparison to the reading abilities of their readers. Using statistical tools such as the Gunning-Fog index,[note 2] researchers have tried to quantify this gap by computing language complexity of texts in Wikipedia articles[159]. Different studies[158][69] have shown that the difficulty is considerably above the reading ability of average adults even for projects such as Simple English Wikipedia, which are explicitly aimed at closing this gap.
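As an illustration of the kind of measurement mentioned above, the following is a minimal sketch of the standard Gunning fog formula, 0.4 × (average sentence length + percentage of complex words). It is not the method of the cited studies: the syllable counter is a rough vowel-group heuristic assumed here for illustration, and the usual exclusions for proper nouns, compound words, and common suffixes are omitted. The function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Approximate Gunning fog index of an English text.

    Words with three or more estimated syllables count as "complex".
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    average_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (average_sentence_length + percent_complex)
```

A higher score indicates text that requires more years of formal education to read comfortably, so the score of an article could be compared against an assumed reading level of its audience.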

An indirect approach to bridge the gap is to make content available in different languages given the high proportion of non-native readers in some languages[32]. For this, researchers developed machine-learning models to discover and prioritize articles missing in a given Wikipedia language edition which, in turn, are recommended for creation to editors[11]. Platforms such as the Content Translation Tool[160] help in translating articles to any language available in Wikipedia.

Additionally, content might be structured in such a way that it is inaccessible to readers with various cognitive, developmental, intellectual, mental, physical, or sensory disabilities—e.g., blindness, dyslexia. For English Wikipedia, the Accessibility component of the Manual of Style[162] was written to address these issues and the Accessibility Dos and Don’ts[161] provides a quick overview of some of the more prominent accessibility gaps in content. In particular, the following barriers are identified (to which we add how they might be better tracked):

  • Do use high contrast and color-blind friendly color schemes: {{Template:Overcolored}}, {{Template:Cleanup colors}}, and {{Template:Overcoloured}}, all of which add articles to [[Category:Wikipedia articles with colour accessibility problems]].
  • Do provide alt text and a caption for most images: {{Template:Alternative text missing}} adds articles to [[Category:Unclassified articles missing image alternative text]], but this should also be detectable from wikitext dumps.
  • Do provide a text description of any charts or diagrams: for tables, this should be detectable via wikitext dumps.
  • Do nest section headings sequentially: pseudoheadings would be harder to detect but finding articles where section headers are out of order should just require going through parsed wikitext (or parsing the wikitext dumps with mwparserfromhell[note 3] or similar libraries).
  • Do create correctly structured tables: it should be possible to detect lack of headers / scope for tables but other issues such as tables for layout or incorrectly structured tables would be much harder to detect.
  • Do encase non-English words or phrases in {{lang}}: {{Template:Cleanup lang}} tracks instances where the {{lang}} template should be used and adds them to [[Category:Pages with nonEnglish text lacking appropriate markup]]. There are also machine-learning models that do language identification but they are unlikely to work well for identifying the short stretches of other-language text for which these templates tend to be used.
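Two of the checks listed above (out-of-order section headings and images without alt text) can be sketched directly on raw wikitext. The example below is a simplified illustration using only regular expressions, not an existing Wikimedia tool; the function names are hypothetical. It assumes inline `[[File:...]]` or `[[Image:...]]` links whose captions contain no nested links, and it ignores galleries, infobox images, and pseudoheadings. A parser such as mwparserfromhell would be more robust in practice.

```python
import re

# Matches wikitext headings such as "== History ==" (levels 1 to 6).
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

# Matches simple inline file links; captions with nested [[links]] are not handled.
FILE_LINK_RE = re.compile(r"\[\[(?:File|Image):([^\[\]]*)\]\]", re.IGNORECASE)


def find_heading_skips(wikitext):
    """Return (previous_level, level, title) for headings that skip a level."""
    skips = []
    previous_level = None
    for match in HEADING_RE.finditer(wikitext):
        level = len(match.group(1))
        title = match.group(2)
        # Going deeper by more than one level (e.g. "==" to "====") is a skip.
        if previous_level is not None and level > previous_level + 1:
            skips.append((previous_level, level, title))
        previous_level = level
    return skips


def images_missing_alt(wikitext):
    """Return file names of inline images that have no "alt=" parameter."""
    missing = []
    for match in FILE_LINK_RE.finditer(wikitext):
        parts = [part.strip() for part in match.group(1).split("|")]
        filename, options = parts[0], parts[1:]
        if not any(option.lower().startswith("alt=") for option in options):
            missing.append(filename)
    return missing
```

Run over a wikitext dump, counts from these two functions could serve as rough per-article indicators for the corresponding accessibility barriers.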

Content diversity

The content diversity facet includes all gaps related to topic coverage in Wikimedia projects. Developing a comprehensive, hierarchically-structured, and canonical representation of Wikipedia content is a non-trivial if not impossible task. Wikipedia’s own category structure for content[note 4] is not hierarchical, and the language and culture-specific nature of topic relevance makes the creation of an exhaustive list of topic gaps infeasible. Therefore, our aim here is to provide a list of example topic gaps that have received widespread attention in public media and research studies, including topics highly curated by community projects and discussed across different parts of the movement, such as gender, geography, and culture gaps. Objectives in this facet are inspired by both the Wikimedia Foundation’s Medium-term plan[153], which supports diverse content creation “by actively foster[ing] the inclusion of underrepresented and marginalized knowledge, and ensur[ing] content is locally relevant to communities”, and the movement strategy directions[176], which encourage “improving coverage of collectively-identified priority topics that impact our world and improve people’s lives”. Community initiatives such as the Wikiproject Countering Systemic Bias[179], the Wikipedia Diversity Observatory[201], or Whose Knowledge?[109] aim to address imbalances in the coverage of subjects and topics.

Gender Gaps

Topic gender gaps refer to the differences in content coverage depending on the gender identity of subjects. The gender gap – the fact that the majority of Wikimedia content about people focuses on male subjects – is one of the most-studied examples of content gaps and is well documented in the research literature[163][164][165][166][167][25][168][169][170][171][172][173][174][175] and in interactive tools such as the Wikidata Human Gender Indicators[3], Denelezh[177], or the Wikidata Concepts Monitor[180]. A number of community initiatives such as the Gender Gap Portal[178] and the Wikiproject Women[181], as well as organizations such as Whose Knowledge?[150], are working to address the gender content gap across Wikimedia projects.

Geographical Gaps

The geographical gap captures coverage differences in topics related to geographic regions or population distribution. Looking at geo-tagged information on Wikipedia content, research has shown that the geographic coverage differs substantially across language editions[182][185] and that geographic coverage is extremely uneven and clustered with a strong bias towards content related to the United States and Western Europe[183][184]. To address this gap, several initiatives across Wikimedia projects aim at increasing content coverage of underrepresented areas, for example the Africa Portal in Wikipedia[186] or the Wiki Loves Africa Contest in Wikimedia Commons[187].

Impactful Topics Gaps

The topic gap captures coverage differences among common interest and impactful topics implicitly considered equally within the scope of an encyclopedia. The abundance of topics varies substantially across different Wikipedia language editions[188][189], and it is of crucial interest to assess differences in coverage of specific topics; for example the coverage of medical knowledge on Wikipedia[190] since “[its] health content is the most frequently visited resource for health information on the internet”[191] (see also the WikiProject Medicine in English Wikipedia[194]). While the Movement Strategy indicates that the community is still missing tools to identify which topics are most impactful in the world, initiatives such as WikiProject Vital Articles[192] or All Human Knowledge[193] help in addressing this gap by gathering lists of important and impactful topics which should be present in all Wikipedia editions.

Cultural Context Topics Gaps

The cultural context gap captures coverage differences related to the history, heritage, and characteristics of a current or former cultural group. Cultural identity has been shown to be a crucial part of editors’ motivation to contribute[195], and the project Wikipedia Diversity Observatory[198] aims to define and quantify the extent of articles which can be considered cultural context content[196]. Initiatives specific to individual projects, such as Wikiproject French Caribbean Culture in Wikidata[197], gather efforts to address culture-specific content gaps.

HOW TO USE THIS TAXONOMY

The taxonomy of knowledge gaps developed in this research and the corresponding literature review can be utilized in a variety of ways by different actors in the free knowledge ecosystem and the Wikimedia projects. This includes Wikimedia community organizers, affiliates, Wikimedia Foundation staff, contributors, researchers and partners. Below we describe the different aspects of the taxonomy and how they can be utilized.

The taxonomy as a framework for conversations. As evidenced through the work in developing the taxonomy of knowledge gaps, Wikimedia projects face many different knowledge gaps. However, prior to the development of this taxonomy, the bulk of the attention of large actors within the Movement such as the Wikimedia Foundation has been heavily focused on specific gap types. We hope that the more comprehensive taxonomy of gaps can provide a framework for the decision makers to learn about the different gap types, brainstorm about their possible relationships, and devise ways to address them.

The references that support the gaps. As part of the process of developing the taxonomy of knowledge gaps, we conducted a major literature review. This review of past work, which is presented as sources in the tables throughout this paper as well as in the reference section, can be a valuable resource for those who are eager to expand their knowledge of the particular gaps and learn about their possible causes.

A first step towards measuring knowledge gaps. The definitions of the different gap types provided through the taxonomy are a necessary step for developing metrics to measure the different gap types. Measuring the gap types and understanding the relationships between them can further help to develop the knowledge gap index, a composite index that can operationalize knowledge equity in the Wikimedia projects and help support more data-informed decision-making processes.

FUTURE WORK

This taxonomy represents a first step towards understanding the underlying mechanisms that prevent us from reaching knowledge equity, and designing solutions to remove those barriers preventing people from accessing free knowledge. In this Section, we identify the next steps to reach this goal by building on top of the taxonomy presented in this manuscript. We ordered such areas for future work by increasing complexity.

Metrics

In order to track progress towards knowledge equity, it is necessary to quantify each of the knowledge gaps described in this taxonomy. For this we need to i) carefully define metrics reflecting the extent of each gap across different categories and ii) identify relevant reference data sources to which the metrics can be applied to measure the gap amplitude. The main challenges in this endeavour are:

  • Operationalizing a gap: The definition of each gap must be translated into a set of measurable quantities. For example, for the content gender gap in Wikipedia one often uses the property P21 (sex or gender)[note 1] of the corresponding item in Wikidata. Operationalizing other gaps might be less trivial. The case of topical content gaps serves as an illustrative example. Identifying content related to, e.g., medicine, is possible through manual annotation in WikiProjects[194], though it is time-intensive and requires constant updating, which might be infeasible for the many small and medium-sized projects. Automatic methods such as the ORES topic classification[202] alleviate some of these issues; however, they are currently limited in their general applicability as they only support a small set of languages and a fixed set of 64 topics.
  • Availability of data: Some data for specific gaps is simply not available, or is hard to obtain for privacy or other reasons. For example, obtaining information on readers’ age in order to measure the age socio-demographic gap requires careful legal frameworks, and succinctly capturing complex and contextual social phenomena like race is largely not possible at a global scale. Furthermore, some data, e.g. from surveys, might arrive at irregular intervals or be inconsistent across time, as is the case for metrics related to the contributorship gap[203].
  • One gap, many metrics: For every individual gap, there exist multiple, equally relevant ways in which to measure it. Thus, it is important to note that there are many metrics which capture different aspects of the same gap. The content gender gap provides an illustrative example, as it constitutes one of the most well-studied gaps. Public tools documenting the gap (such as Wikidata Human Gender Indicators[3] or Denelezh[177]) report the number of articles on men and women, respectively, thus capturing only the selection of content. Research has shown that it is equally important to consider other aspects such as the extent (e.g. comparing the quality of articles on men and women[167]) or the framing (comparing the language in the articles on men and women[173]). Building on initial insights[204], future research needs to understand the role of different metrics of the same gap and how the well-explored content gender gap can inform metrics for less-explored gaps.
  • Goal: The definition of a metric for a gap allows one to make a statement about whether the gap is small or large. Such a definition thus leads to conclusions about whether a gap is closed or not. What this goal should be might not be as obvious as it seems. For example, in the context of the content gender gap, recent discussions in the community revolve around what the suitable baselines are for comparing the number of biographies on men and women[205]. Future work involves the curation of external datasets, such as sociodemographic data at the country level, as well as consultation and discussions with communities on how to define the metrics.
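
As a minimal sketch of the "selection" aspect of the content gender gap mentioned above, the following Python snippet computes the share of biographies per gender from P21 values. The sample records are made up for illustration and are not real measurements; the function name gender_shares and the label mapping are assumptions introduced here. The three QIDs are the Wikidata items for male, female and non-binary.

```python
from collections import Counter

# Wikidata items commonly used as values of P21 (sex or gender).
P21_LABELS = {
    "Q6581097": "male",
    "Q6581072": "female",
    "Q48270": "non-binary",
}


def gender_shares(p21_values):
    """Return the share of biographies per gender label.

    `p21_values` holds one P21 value per biography: a QID string,
    or None when the item has no P21 statement.
    """
    counts = Counter(
        P21_LABELS.get(v, "other") if v is not None else "unknown"
        for v in p21_values
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical sample: made-up values for illustration only.
sample = ["Q6581097", "Q6581097", "Q6581072", None, "Q48270", "Q6581097"]
shares = gender_shares(sample)
```

Such a count captures only selection. Measuring extent or framing would require different inputs, such as article quality scores or the text of the articles, which is why a single gap can call for several metrics.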

Knowledge gap index

One of the end goals of this taxonomy is to generate a knowledge gaps index, namely a single indicator that combines metrics from different gaps to measure the overall knowledge equity of Wikimedia projects. Socio-economic indices such as the Global Innovation Index,[note 2] the Human Development Index from the United Nations,[note 3] or the Gender Equality Index from the European Union[note 4] have been adopted by many organizations, advocacy groups and policy makers. Such indices are a useful tool for analyzing policies and communicating with the public because they can give a coarse-grained view of complex multi-dimensional realities. An index requires the projection and aggregation of different metrics into a composite indicator, and is developed over the course of 10 steps, the first being the construction of a theoretical framework[17]. The main challenges to reach this goal are:

  • Availability of metrics: An index requires the availability of validated metrics (see the discussion in the previous subsection).
  • Robustness: Even if all metrics are available, the robust construction of a composite index is non-trivial. According to recommended procedures, this taxonomy indeed represents only the first, theoretical step towards a comprehensive, robust and high-quality index[17]. Additional steps would include, for example, uncertainty and sensitivity analysis.
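
To illustrate what aggregation and a sensitivity check can look like in practice, here is a minimal Python sketch. It assumes min-max normalization and a weighted arithmetic mean, which are only two of several possible choices and are not prescribed by this taxonomy. The project names, metric names and values are hypothetical, and higher values are assumed to mean a smaller gap.

```python
def min_max_normalize(values):
    """Rescale a dict of raw metric values to the [0, 1] range."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def composite_index(metrics, weights):
    """Weighted arithmetic mean of normalized metrics for each project.

    `metrics` maps metric name -> {project -> raw value};
    `weights` maps metric name -> non-negative weight.
    """
    normalized = {name: min_max_normalize(vals) for name, vals in metrics.items()}
    total_weight = sum(weights.values())
    projects = next(iter(metrics.values())).keys()
    return {
        p: sum(weights[m] * normalized[m][p] for m in metrics) / total_weight
        for p in projects
    }


def rank(scores):
    """Projects ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical metrics in percent (higher = smaller gap), made up for illustration.
metrics = {
    "content_gender": {"wiki_a": 10, "wiki_b": 30, "wiki_c": 25},
    "reader_gender": {"wiki_a": 42, "wiki_b": 35, "wiki_c": 45},
}

baseline = composite_index(metrics, {"content_gender": 1.0, "reader_gender": 1.0})

# Crude sensitivity check: does the ranking change under other weightings?
alternatives = [
    {"content_gender": 2.0, "reader_gender": 1.0},
    {"content_gender": 1.0, "reader_gender": 2.0},
]
rank_changes = [
    rank(composite_index(metrics, w)) != rank(baseline) for w in alternatives
]
```

In this toy example the ranking of projects changes under one of the two alternative weightings, which is the kind of fragility that a fuller uncertainty and sensitivity analysis is meant to expose.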

Interactions between gaps

The taxonomy presented here reflects a simplified version of the complex landscape of knowledge gaps by depicting readership, contributorship, and content as three independent dimensions.

However, these dimensions are interrelated and feed into each other, as pointed out by the community[137], research[10], and the Wikimedia Foundation in its medium-term plan[153]. Apart from a few examples (e.g.[206] on the misalignment between content and readership), our understanding of how different gaps interact with each other is still very limited, and part of our future research work includes shedding light on these interactions.

Causes and Interventions

In order to bridge knowledge gaps, we not only need metrics to monitor the efficacy of individual interventions, but also require an understanding of the underlying causes to design effective interventions. However, causal evidence is scarce. Recent research has identified several candidate hypotheses for individual gaps[207]. An example comes from previous work analyzing knowledge gaps in readership and contributorship[10]: the authors found that the gender gap is larger among contributors than among readers, and proposed interventions including continued efforts to recruit female editors, as well as campaigns to raise awareness among women readers that Wikipedia is editable.

METHODOLOGY

The Taxonomy of Knowledge Gaps emerges as the result of hours of literature review, survey analysis, in-depth movement strategy reading, research brainstorming and Wikimedia Foundation-wide discussions. To help understand the methodology we used to compile this taxonomy, in this Section we explain its main guiding principles and structural characteristics.

Guiding Principles

Our task is to order elements of Wikimedia spaces lacking knowledge equity in a taxonomy of knowledge gaps. There are endless ways in which we could impose an order on this large pool of unstructured pieces of information. To help narrow down the scope while delivering a consistent final product, we let different themes, principles and values guide the selection of the taxonomy classes and levels.

Driven by communities.
The taxonomy is largely inspired by the values and principles of the Wikimedia communities and the latest recommendations of the Movement Strategy[1]. At the same time, the taxonomy is grounded in the literature produced by the broader academic and scientific communities who study Wikimedia spaces from a computational and sociological perspective (for an overview, see reviews[208][209][210] or the Wikimedia Research Newsletter[211]).
Neutrality.
Similar to Bloom’s taxonomy of educational objectives[212], one of the foundational principles of this taxonomy is its neutrality. Our mission is to define a classification of knowledge gaps, thus helping to define some of the barriers preventing people from accessing free knowledge. While doing so, we aim at being as impartial and inclusive as possible, without expressing value judgements on the importance of one gap over another.
Flexibility.
Like any project in Wikimedia spaces, we want this taxonomy to be “editable”. This product is a starting point for an open, structured discussion on knowledge gaps, and we will make the taxonomy available on a Wiki to foster in-depth conversations and research studies. Also, its inherent structure allows the taxonomy to evolve and be expanded and completed by experts and community members who want to contribute to the improvement of its content.
Measurability.
One of the end goals of this taxonomy is to help Wikimedia communities measure the impact of their initiatives and content creation. With this in mind, when possible, we explicitly formulated gaps and objectives to incorporate elements that are quantifiable via surveys, large-scale data analysis or other computational methods.

Sources

To understand relevant components of the taxonomy of knowledge gaps, and make decisions about which aspects of Wikimedia ecosystems to include in the taxonomy, we gathered information from different sources describing or discussing inequalities among readers, contributors, and content in Wikimedia spaces. We explain here our main data sources, and the rationale behind choosing each of them.

Academic Literature.
We tap into ideas from the bulk of academic literature which studies knowledge gaps in Wikimedia. The majority of related research works belongs to the broad field of “computational social science” and tries to characterise and quantify different aspects of Wikimedia communities using a computational approach. For example, researchers have studied gender gaps across different dimensions[25][168], quantified the usage of visual content across different languages[138] or estimated the geographical bias of Wikipedia content[182]. Research works aiming at bridging knowledge gaps using tools from recommender systems and natural language processing also largely helped the creation of this taxonomy: these include methods to grow Wikipedia languages via recommendations[11], as well as machine learning models to score readability of Wikipedia articles, or discover sentences needing citations[158][122].
Community Surveys.
Throughout the years, the Wikimedia Foundation, Wikimedia Affiliates, and independent organizations have run surveys to characterize Wikimedia readers and editors. We tap into the questions asked in these surveys to define some of the gaps and facets in our taxonomy, for example sociodemographics gaps or accessibility gaps. Most of these surveys are referenced throughout this manuscript, and a superset of them can be found by checking the appropriate categories on meta.wikimedia.org[note 1].
Movement Strategy.
The third source of content for this taxonomy is a set of guidelines from the Wikimedia Movement Strategy. Such guidelines include broad strategic directions, namely Knowledge Equity and Knowledge as a Service[1], as well as more specific recommendations to implement such strategic directions[141][24]. We also borrow from Wikimedia Foundation’s medium term plan priorities[153]. Collectively, these guidelines helped us identify gaps, barriers, and common objectives related to free knowledge.
Community Initiatives.
Throughout the years, the Wikimedia community has worked on many initiatives and discussions aiming to address knowledge gaps. These not only include research (e.g. trying to conceptualize or measure a gap) but also cover a wide variety of initiatives such as WikiProjects (e.g. WikiProject Women[181]).

Taxonomy Structure

Beyond the identification of potential gaps within the Wikimedia ecosystem, one of the main challenges in building a taxonomy of knowledge gaps is to provide a structure for analyzing and classifying the different gaps. In this section we give an overview of the structure, and define and motivate the different levels in the taxonomy.

Gap.
A gap corresponds to an individual aspect of the Wikimedia ecosystem (for example readers’ gender, or content topics) for which we found signals of unbalanced coverage across its inner categories (for example, proportion of people who identify as men, women or non-binary across readers, in the case of the reader gender gap). For each gap, we include the following characterizing fields:
  • Description: This field is the definition of the gap, and describes which type of disparity is covered by the gap. For example, the gender gap is defined as “the difference between readers of different genders in how and how much they access the sites”.
  • Sources: The decision to include a gap in the taxonomy is heavily driven by existing sources. The source property of a gap includes all the references we examined when characterising and describing a given gap, grouped by source type (literature, survey, strategic guidelines and/or community).
Facet.
We grouped semantically-related gaps into facets, namely macro-categories describing the general semantics of a gap. For example, gaps such as “gender” or “age” both belong to the “sociodemographics” facet. Gaps in the same facet are also characterized by the same Objective. The objective represents the ideal goal to be reached in order to close or address the knowledge gaps in a facet. To make sure we follow the neutrality principle, the objectives we propose are largely inspired by movement strategy recommendations and priorities, thus echoing the results of months of community discussions around the future of Wikimedia projects.
Dimension.
We identify three foundational dimensions as root nodes in our taxonomy grouping different facets: Wikimedia readers, Wikimedia contributors, and Wikimedia content. We will expand the rationale behind the choice of the dimensions in the next Subsection.
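
The three levels described above (dimension, facet, gap) can be sketched as a small nested data structure. The Python class and field names below are assumptions chosen for illustration and are not part of the taxonomy itself; placeholder strings stand in for text that is given in the taxonomy tables, and the source lists are left empty.

```python
from dataclasses import dataclass, field


@dataclass
class Gap:
    name: str
    description: str
    # References grouped by source type, e.g. "literature" or "survey".
    sources: dict = field(default_factory=dict)


@dataclass
class Facet:
    name: str
    objective: str  # shared by all gaps in the facet
    gaps: list = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    facets: list = field(default_factory=list)

    def gap_names(self):
        """All gap names under this dimension, in facet order."""
        return [gap.name for facet in self.facets for gap in facet.gaps]


readers = Dimension(
    name="Readers",
    facets=[
        Facet(
            name="Sociodemographics",
            objective="(objective text as stated in the taxonomy tables)",
            gaps=[
                Gap(
                    name="Gender",
                    description=(
                        "the difference between readers of different genders "
                        "in how and how much they access the sites"
                    ),
                    sources={"literature": [], "survey": []},
                ),
                Gap(name="Age", description="(see the taxonomy tables)"),
            ],
        )
    ],
)
```

Keeping the objective on the facet rather than on each gap mirrors the statement that gaps in the same facet share the same objective.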

Root Dimensions: Readers, Contributors, and Content

This taxonomy is developed along three dimensions representing macro-elements of the Wikimedia ecosystem: Readers, Contributors, and Content. The choice of these dimensions is broadly inspired by two models summarizing the inner mechanisms of Wikimedia communities.

Figure 2a

A model proposed by Wikimedia Foundation as part of their medium-term plan[153] sketches a diagram in the form of a spinning wheel describing the relationship between awareness, consumption, contributors, content, and advocacy, as shown in Figure 2a.

The model shows how people initially engage with the site (i.e. they become Wikimedia readers) through spontaneous search or awareness campaigns. Some readers then might become more involved in the values or the mission of the Wikimedia movement, thus potentially converting into contributors or donors. When more people engage with the site, the content production becomes more diverse, thus making the site more inclusive for new readers, and the cycle begins again.

Figure 2b

A model by Shaw and Hargittai[10] (“The Pipeline of Online Participation Inequalities: The Case of Wikipedia Editing”) proposes a framework in the form of a pipeline to describe how gaps form at different levels of engagement with Wikipedia (see Figure 2b). Specifically, they propose a series of steps through which a user must go to become a contributor: being an internet user, having heard of the site (awareness of Wikipedia), having visited the site (Wikipedia reader), knowing it’s possible to contribute to the site (awareness they can edit), and then finally having contributed (is a contributor). At each of these steps, different groups of people might be more or less likely to drop off in participation.
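
One way to read the pipeline is as a sequence of stage-to-stage retention rates that can be compared across groups. The Python sketch below is an illustration under stated assumptions and not the method used in [10]: the stage labels paraphrase the steps above, the counts for the two groups are made up, and all counts are assumed to be positive.

```python
STAGES = [
    "internet user",
    "heard of Wikipedia",
    "visited Wikipedia",
    "knows it can be edited",
    "has contributed",
]


def stage_retention(counts):
    """Share of people at each stage who also reach the next stage.

    `counts` lists the number of people reaching each stage, in STAGES order.
    """
    return [
        (f"{STAGES[i]} -> {STAGES[i + 1]}", counts[i + 1] / counts[i])
        for i in range(len(counts) - 1)
    ]


# Hypothetical counts for two groups; the numbers are made up.
group_a = [1000, 900, 720, 360, 36]
group_b = [1000, 900, 720, 288, 18]

retention_a = dict(stage_retention(group_a))
retention_b = dict(stage_retention(group_b))

# Transition where the two groups' retention differs the most.
widest_gap = max(retention_a, key=lambda s: abs(retention_a[s] - retention_b[s]))
```

Comparing retention per transition, rather than only the final share of contributors, shows at which step a group is more likely to drop off.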

For each dimension of this Knowledge Gaps Taxonomy – Readership (Section 2), Contributorship (Section 3), and Content (Section 4) – we presented a short overview of its meaning and the corresponding taxonomy in tabular form. For each facet of a dimension, we provided a detailed explanation of the rationale behind it.

References

  1. a b c d e f g h i j k l m n o Wikimedia Movement. 2017. Wikimedia Movement Strategy 2017. https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017
  2. 2020. Wikimedia Statistics - Monthly Overview. https://stats.wikimedia.org/#/all-wikipedia-projects.
  3. a b c d WHGI. [n.d.]. Wikidata Human Gender Indicators. http://whgi.wmflabs.org/
  4. Isaac Johnson, Florian Lemmerich, Diego Sáez-Trumper, Robert West, Markus Strohmaier, and Leila Zia. 2020. Global gender differences in Wikipedia readership. arXiv preprint arXiv:2007.10403 (2020).
  5. a b c d Wikimedia Foundation. 2019. 2019 Editor Gender Surveys. https://meta.wikimedia.org/wiki/Research:Surveys_on_the_gender_of_editors
  6. a b c d e f g Florian Lemmerich, Diego Sáez-Trumper, Robert West, and Leila Zia. 2019. Why the World Reads Wikipedia: Beyond English Speakers. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 618–626.
  7. Hichang Cho, MeiHui Chen, and Siyoung Chung. 2010. Testing an integrative theoretical model of knowledge-sharing behavior in the context of Wikipedia. Journal of the American Society for Information Science and Technology 61, 6 (2010), 1198–1212.
  8. Eszter Hargittai and Aaron Shaw. 2015. Mind the skills gap: the role of Internet know-how and gender in differentiated contributions to Wikipedia. Information, communication & society 18, 4 (2015), 424–442.
  9. Amanda Menking and Ingrid Erickson. 2015. The heart work of Wikipedia: Gendered, emotional labor in the world’s largest online encyclopedia. In Proceedings of the 33rd annual ACM conference on human factors in computing systems. 207–210.
  10. a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae af Aaron Shaw and Eszter Hargittai. 2018. The pipeline of online participation inequalities: The case of Wikipedia editing. Journal of communication 68, 1 (2018), 143–168.
  11. a b c d e Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
  12. Anamika Chhabra and SRS Iyengar. 2016. Should wikipedia and quora collaborate?. In 2016 8th International Conference on Communication Systems and Networks (COMSNETS). IEEE, 1–2.
  13. a b Kevin B Smith. 2002. Typologies, taxonomies, and the benefits of policy classification. Policy Studies Journal 30, 3 (2002), 379–395.
  14. Benjamin S Bloom et al. 1956. Taxonomy of educational objectives. Vol. 1: Cognitive domain. New York: McKay (1956), 20–24.
  15. Adil Fahad, Najlaa Alshatri, Zahir Tari, Abdullah Alamri, Ibrahim Khalil, Albert Y Zomaya, Sebti Foufou, and Abdelaziz Bouras. 2014. A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE transactions on emerging topics in computing 2, 3 (2014), 267–279.
  16. Maya Tamir. 2016. Why do people regulate their emotions? A taxonomy of motives in emotion regulation. Personality and Social Psychology Review 20, 3 (2016), 199–222.
  17. a b c Joint Research Centre-European Commission and Others. 2008. Handbook on constructing composite indicators: methodology and user guide. OECD publishing.
  18. Milad Alshomary, Michael Völske, Tristan Licht, Henning Wachsmuth, Benno Stein, Matthias Hagen, and Martin Potthast. 2019. Wikipedia Text Reuse: Within and Without. In European Conference on Information Retrieval. Springer, 747–754.
  19. Kristofer Erickson, Felix Rodriguez Perez, and Jesus Rodriguez Perez. 2018. What is the Commons Worth? Estimating the Value of Wikimedia Imagery by Observing Downstream Use. In Proceedings of the 14th International Symposium on Open Collaboration. 1–6.
  20. Connor McMahon, Isaac Johnson, and Brent Hecht. 2017. The substantial interdependence of Wikipedia and Google: A case study on the relationship between peer production communities and information technologies. In Eleventh International AAAI Conference on Web and Social Media.
  21. Annabel Rothshild, Emma Lurie, and Eni Mustafaraj. 2019. How the Interplay of Google and Wikipedia Affects Perceptions of Online News Sources. In Computation+ Journalism Symposium.
  22. Nicholas Vincent, Isaac Johnson, and Brent Hecht. 2018. Examining Wikipedia with a broader lens: Quantifying the value of Wikipedia’s relationships with other large-scale online communities. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–13.
  23. Nicholas Vincent, Isaac Johnson, Patrick Sheehan, and Brent Hecht. 2019. Measuring the importance of user-generated content to search engines. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 505–516.
  24. a b c d e f g h i j k l m n o p q Wikimedia Movement. 2020. Movement Strategy Recommendations - User Experience. https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Recommendations/Improve_User_Experience
  25. a b c d e f g h i j k l m n o p q r s Marit Hinnosaar. 2019. Gender Inequality in New Media: Evidence from Wikipedia. Journal of Economic Behavior & Organization 163 (2019), 262–276. https://doi.org/10.2139/ssrn.2617021
  26. a b c d e f g h i Ioannis Protonotarios, Vasiliki Sarimpei, and Jahna Otterbacher. 2016. Similar gaps, different origins? Women readers and editors at Greek Wikipedia. In Tenth International AAAI Conference on Web and Social Media.
  27. a b c d e f g h i j k l m n o p q Wikimedia Foundation. 2012. 2012 Bangla Wikipedia Survey. https://meta.wikimedia.org/wiki/Small_Wiki_Editor_Engagement_Project/Report/Monthly_(June-July_2012)
  28. a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae af ag ah ai Wikimedia Foundation. 2014. 2014 Global South User Survey (11 countries). https://meta.wikimedia.org/wiki/Research:Global_South_User_Survey_2014
  29. a b c d Wikimedia Foundation. 2015. 2015 WMF Donation Survey (US). https://upload.wikimedia.org/wikipedia/commons/2/25/Wikimedia_Reader_Survey_November_2015.pdf
  30. a b c d e f g h i Wikimedia Foundation. 2016. 2016 New Readers Phone Surveys. https://meta.wikimedia.org/wiki/Global_Reach/Insights
  31. a b c d e f g h i j k l Wikimedia Foundation. 2017. 2017 Strategy Surveys (6 countries). https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Sources/Brand_awareness,_attitudes,_and_usage_research_(July_2017)
  32. a b c d e f g h i j k l m n o p q Wikimedia Foundation. 2019. 2019 Reader Demographics Surveys. https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Demographics_and_Wikipedia_use_cases#Reader_Demographics
  33. a b c d e f g h i j k l m n o p Wikimedia Pakistan. 2014. 2014 Pakistan Reader Survey. https://blog.wikimedia.org/2015/02/20/pakistani-readerssurvey/
  34. a b c d e f g h i j k l m n Pew Research. 2010. 2010 Pew Research Study (US). https://www.pewresearch.org/internet/2011/01/13/wikipediapast-and-present/
  35. a b c d e f g h Wikimedia UK. 2016. 2016 Welsh Wikipedia Reader Survey. https://meta.wikimedia.org/wiki/Research:Readership_of_Welsh_Wicipedia
  36. a b c d Wikimedia Movement. 2020. Addressing Wikipedia’s Gender Gap. https://wikimediafoundation.org/our-work/addressing-wikipedias-gender-gap/
  37. a b c d e f WikiProject Accessibility. [n.d.]. Wikipedia:WikiProject Accessibility. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Accessibility
  38. a b Wikibooks. [n.d.]. Wikijunior. https://en.wikibooks.org/wiki/Wikijunior
  39. a b Simple English Wikipedia. [n.d.]. Wikipedia:About. https://simple.wikipedia.org/wiki/Wikipedia:About
  40. a b c d e f Wikimedia Foundation. 2019. New Readers Research. https://meta.wikimedia.org/wiki/New_Readers
  41. a b WikiProject WikiConnect. [n.d.]. Wikipedia:WikiConnect. https://en.wikipedia.org/wiki/Wikipedia:WikiConnect
  42. a b c d e f g h i j k l m n o p q r s t u v w x y Wikimedia Foundation. 2011. 2011 Mobile Readers Survey (11 countries). https://meta.wikimedia.org/wiki/Research:Wikipedia_Mobile_Readers_Survey_2011
  43. a b GapFinder. [n.d.]. Wikipedia Gap Finder tool. https://recommend.wmflabs.org/
  44. a b Scribe. [n.d.]. Helping editors of under-resourced languages to create new high-quality Wikipedia articles. https://meta.wikimedia.org/wiki/Scribe
  45. a b c d e f g h i j k l m n o p Wikimedia Foundation. 2011. 2011 Readership Survey (16 countries). https://meta.wikimedia.org/wiki/Research:Wikipedia_Readership_Survey_2011/Results
  46. a b c d e Pew Research. 2007. 2007 Pew Research Study (US). https://www.pewresearch.org/internet/2007/04/24/wikipediausers/
  47. a b c d e f g h i j k l UNU-Merit. 2008. 2008 UNU-MERIT Survey (22 languages). https://meta.wikimedia.org/wiki/Research:UNU-MERIT_Wikipedia_survey
  48. a b Wikimedia+Education. [n.d.]. Wikimedia+Education Conference 2019. https://meta.wikimedia.org/wiki/Wikimedia%2BEducation_Conference_2019
  49. a b Wikiversity. [n.d.]. Wikiversity:Main Page. https://en.wikiversity.org/wiki/Wikiversity:Main_Page
  50. a b Wikimedia Movement. 2018-2020. Community Health Survey Results. https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Working_Groups/Community_Health/Survey_results
  51. a b c Meta-Wiki. [n.d.]. Wikimedia LGBT+/Portal. https://meta.wikimedia.org/wiki/Wikimedia_LGBT+/Portal
  52. a b c d English Wikipedia. [n.d.]. Wikipedia:Neutral point of view. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
  53. a b c d Wikimedia Foundation. 2017. 2017 Reader Motivation Surveys (14 languages). https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Prevalence_of_Wikipedia_use_cases
  54. a b Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
  55. a b Wikimedia Foundation. 2019. The role of citations in how readers evaluate Wikipedia articles. https://meta.wikimedia.org/wiki/Research:The_role_of_citations_in_how_readers_evaluate_Wikipedia_articles
  56. a b Meta-Wiki. [n.d.]. Breadth and depth. https://meta.wikimedia.org/wiki/Breadth_and_depth
  57. a b Kiwix. [n.d.]. Kiwix lets you access free knowledge. https://www.kiwix.org/en/
  58. a b Phabricator. 2019. T222078: Analyze readers’ engagement in countries affected by Singapore Data Center’s switch. https://phabricator.wikimedia.org/T222078
  59. a b Wikimedia Foundation. 2017. Wikipedia for KaiOS. https://www.mediawiki.org/wiki/Wikipedia_for_KaiOS
  60. a b Mozilla Foundation. 2016. Digital Skills Observatory. http://mozillafoundation.github.io/digital-skills-observatory/
  61. a b WebAIM. 2009. Survey of Preferences of Screen Readers Users. https://webaim.org/projects/screenreadersurvey/
  62. a b c d Wikimedia Movement Affiliates. [n.d.]. Para-Wikimedians Community User Group. https://meta.wikimedia.org/wiki/Para-Wikimedians_Community_User_Group
  63. a b c d Wikimedia Movement Affiliates. [n.d.]. WikiBlind User Group. https://meta.wikimedia.org/wiki/WikiBlind_User_Group
  64. a b The Signpost. 2013. Making Wikipedia more accessible. https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2013-09-04/Technology_report#Making_Wikipedia_more_accessible
  65. a b c d Tony Souter. 2017. What is it like to edit Wikipedia when you’re blind? Meet Graham Pearce. https://blog.wikimedia.org/2017/03/06/graham-pearce/
  66. a b WikiProject Usability. [n.d.]. Wikipedia:WikiProject Usability. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Usability
  67. a b VideoWiki. [n.d.]. The Free Multi-Media Encyclopedia that anyone can edit. https://videowiki.wmflabs.org/
  68. a b Wikispeech. [n.d.]. Wikispeech projects. https://meta.wikimedia.org/wiki/Wikispeech
  69. a b c Teun Lucassen, Roald Dijkstra, and Jan Maarten Schraagen. 2012. Readability of Wikipedia. First Monday (2012).
  70. Wikimedia Foundation. 2012. Wikipedia Zero. https://en.wikipedia.org/wiki/Wikipedia_Zero
  71. Wikipedia contributors. 2020. Universal Design for Learning — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Universal_Design_for_Learning&oldid=950795194
  72. a b Wikimedia Movement. 2020. Movement Strategy Recommendations - Impact. https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Recommendations/Provide_for_Safety_and_Inclusion
  73. a b c d e f g h i j k l m n o p Wikimedia Foundation. 2011. April 2011 Editor Surveys. https://meta.wikimedia.org/wiki/Research:Wikipedia_Editors_Survey_2011_April
  74. a b c d e f g h i j k l m Wikimedia Foundation. 2011. November 2011 Editor Surveys. https://meta.wikimedia.org/wiki/Research:Wikipedia_Editors_Survey_2011_November
  75. a b c d e f g h i j k l m n o p Wikimedia Foundation. 2012. 2012 Editor Surveys. https://meta.wikimedia.org/wiki/Research:Wikipedia_Editors_Survey_2012
  76. a b c d e f g h Wikimedia Foundation. 2017. 2017 Community Insights Survey. https://meta.wikimedia.org/wiki/Community_Insights/2016-17_Report
  77. a b c d e f g h i Wikimedia Foundation. 2018. 2018 Community Insights Survey. https://meta.wikimedia.org/wiki/Community_Insights/2018_Report
  78. a b c d e f g h i j Wikimedia Foundation. 2018. 2020 Community Insights Survey. https://office.wikimedia.org/wiki/Community_Insights_Survey_Report_(2020)
  79. a b Wikimedia Foundation. 2019. 2019 YouGov Survey of Women and Wikipedia. https://meta.wikimedia.org/wiki/Communications/YouGov_survey_on_women_and_Wikipedia
  80. a b c d e f g h i j k l m Wikimedia Community Ireland. 2016. Wikimedia Community Ireland 2016 Editor Survey. https://meta.wikimedia.org/wiki/Research:Survey_of_user_in_Ireland
  81. a b c d e f g h i j k l m Wikimedia NL. 2015. 2015 Wikimedia Nederland Editor Survey. https://upload.wikimedia.org/wikipedia/commons/5/5c/Report_on_survey_among_editors_of_NLWP_2015.pdf
  82. a b c d e f g h i j k Wikimedia Ukraine. 2018. Wikimedia Ukraine 2018 Community Survey. https://meta.wikimedia.org/wiki/Wikimedia_Ukraine/Community_Survey_2018
  83. a b c d e WMDE. 2016. 2016 WMDE Editor Survey. https://meta.wikimedia.org/wiki/Wikimedia_Deutschland/Editor_Survey_2016
  84. a b Wikiversity. [n.d.]. TAO/Wikimedia Seniors Outreach. https://en.wikiversity.org/wiki/TAO/Wikimedia_Seniors_Outreach
  85. a b Isaac L Johnson, Yilun Lin, Toby Jia-Jun Li, Andrew Hall, Aaron Halfaker, Johannes Schöning, and Brent Hecht. 2016. Not at home on the range: Peer production and the urban/rural divide. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 13–25.
  86. a b Meta-Wiki. [n.d.]. Maithili Wikimedians User Group/Outreach/Wiki Loves Villages. https://meta.wikimedia.org/wiki/Maithili_Wikimedians_User_Group/Outreach/Wiki_Loves_Villages
  87. a b Meta-Wiki. [n.d.]. Wikimedia España/Plan Anual/2019. https://meta.wikimedia.org/wiki/Wikimedia_Espa%C3%B1a/Plan_Anual/2019
  88. a b English Wikipedia. [n.d.]. Wikipedia:Babel. https://en.wikipedia.org/wiki/Wikipedia:Babel
  89. a b WikiEdu. [n.d.]. Wiki Education. https://wikiedu.org/
  90. a b Yann Algan, Yochai Benkler, Mayo Fuster Morell, and Jérôme Hergueux. 2013. Cooperation in a Peer Production Economy Experimental Evidence from Wikipedia. Available at SSRN 2843518 (2013).
  91. a b Ofer Arazy, Hila Liifshitz-Assaf, Oded Nov, Johannes Daxenberger, Martina Balestra, and Coye Cheshire. 2017. On the "how" and "why" of emergent role behaviors in Wikipedia. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 2039–2051.
  92. a b Martina Balestra, Ofer Arazy, Coye Cheshire, and Oded Nov. 2016. Motivational determinants of participation trajectories in Wikipedia. In Tenth International AAAI Conference on Web and Social Media.
  93. a b Martina Balestra, Lior Zalmanson, Coye Cheshire, Ofer Arazy, and Oded Nov. 2017. It was Fun, but Did it Last? The Dynamic Interplay between Fun Motives and Contributors’ Activity in Peer Production. Proceedings of the ACM on Human-Computer Interaction 1, CSCW (2017), 1–13.
  94. a b Anna C Rader. 2020. Why do people edit? https://commons.wikimedia.org/wiki/File:WDPE_Literature_Review_Anna_Rader.pdf.
  95. a b Ofer Arazy, Felipe Ortega, Oded Nov, Lisa Yeo, and Adam Balila. 2015. Functional roles and career paths in Wikipedia. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. 1092–1105.
  96. a b Diyi Yang, Aaron Halfaker, Robert Kraut, and Eduard Hovy. 2016. Who did what: Editor role identification in Wikipedia. In Tenth International AAAI Conference on Web and Social Media.
  97. a b Wikimedia Foundation. [n.d.]. Growth/Personalized first day/Newcomer homepage. https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Newcomer_homepage
  98. a b c Jonathan T Morgan and Aaron Halfaker. 2018. Evaluating the impact of the Wikipedia Teahouse on newcomer socialization and retention. In Proceedings of the 14th International Symposium on Open Collaboration. 1–7.
  99. a b English Wikipedia. [n.d.]. Wikipedia:User access levels. https://en.wikipedia.org/wiki/Wikipedia:User_access_levels
  100. a b GLOW. [n.d.]. Supporting Indian Language Wikipedias Program/Support. https://meta.wikimedia.org/wiki/Supporting_Indian_Language_Wikipedias_Program/Support
  101. a b GLOW. [n.d.]. Global Reach/Announcements/Project Glow FAQ. https://meta.wikimedia.org/wiki/Global_Reach/Announcements/Project_Glow_FAQ
  102. a b MediaWiki. [n.d.]. Wikimedia Apps/Suggested edits. https://www.mediawiki.org/wiki/Wikimedia_Apps/Suggested_edits
  103. a b Wikimedia Movement. 2020. Movement Strategy Recommendations - Invest in Skills and Leadership Development. https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Recommendations/Invest_in_Skills_and_Leadership_Development
  104. a b M Claudia Buzzi, Marina Buzzi, Barbara Leporini, and Caterina Senette. 2008. Making Wikipedia editing easier for the blind. In Proceedings of the 5th Nordic conference on Human-computer interaction: building bridges. 423–426.
  105. Jonathan Morgan. 2019. Research:Patrolling on Wikipedia. https://meta.wikimedia.org/wiki/Research:Patrolling_on_Wikipedia/Report
  106. a b Scott A Hale. 2014. Multilinguals and Wikipedia editing. In Proceedings of the 2014 ACM conference on Web science. 99–108.
  107. AfroCrowd. [n.d.]. Afro Free Culture Crowdsourcing Wikimedia. https://afrocrowd.org/
  108. Black Lunch Table. [n.d.]. Black Lunch Table Wikipedia. http://blacklunchtable.com/wikipedia/
  109. a b c d e f Whose Knowledge? [n.d.]. Whose Knowledge? https://whoseknowledge.org/
  110. GESIS Data Archive Cologne. 2020. European Values Study 2017: Integrated Dataset (EVS 2017). https://doi.org/10.4232/1.13511.
  111. Brian C Keegan. 2019. The Dynamics of Peer-Produced Political Information During the 2016 US Presidential Campaign. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–20.
  112. Feng Shi, Misha Teplitskiy, Eamon Duede, and James A Evans. 2019. The wisdom of polarized crowds. Nature human behaviour 3, 4 (2019), 329–336.
  113. Meta-Wiki. [n.d.]. Friendly space policies. https://meta.wikimedia.org/wiki/Friendly_space_policies
  114. Dan Cosley, Dan Frankowski, Loren Terveen, and John Riedl. 2007. SuggestBot: using intelligent task routing to help people find work in wikipedia. In Proceedings of the 12th international conference on Intelligent user interfaces. 32–41.
  115. Sneha Narayan, Jake Orlowitz, Jonathan Morgan, Benjamin Mako Hill, and Aaron Shaw. 2017. The Wikipedia Adventure: field evaluation of an interactive tutorial for new users. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 1785–1799.
  116. Di Yang and Robert Kraut. 2018. Research:How role-specific rewards influence Wikipedia editors’ contribution. https://meta.wikimedia.org/wiki/Research:How_role-specific_rewards_influence_Wikipedia_editors%E2%80%99_contribution
  117. Xiaoquan Michael Zhang and Feng Zhu. 2011. Group size and incentives to contribute: A natural experiment at Chinese Wikipedia. American Economic Review 101, 4 (2011), 1601–15.
  118. Mark Graham, Bernie Hogan, Ralph K Straumann, and Ahmed Medhat. 2014. Uneven geographies of user-generated information: Patterns of increasing informational poverty. Annals of the Association of American Geographers 104, 4 (2014), 746–764.
  119. a b Besnik Fetahu, Katja Markert, and Avishek Anand. 2017. Fine Grained Citation Span for References in Wikipedia. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1990–1999. https://doi.org/10.18653/v1/D17-1212
  120. a b Heather Ford, Shilad Sen, David R Musicant, and Nathaniel Miller. 2013. Getting to the source: where does Wikipedia get its information from?. In Proceedings of the 9th International Symposium on Open Collaboration (Hong Kong, China) (WikiSym ’13, Article 9). Association for Computing Machinery, New York, NY, USA, 1–10. https://doi.org/10.1145/2491055.2491064
  121. a b Reed H Harder, Alfredo J Velasco, Michael S Evans, and Daniel N Rockmore. 2015. Measuring Verifiability in Online Information. (Sept. 2015). arXiv:1509.05631 [cs.SI]
  122. a b c Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia’s Verifiability. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). ACM, New York, NY, USA, 1567–1578. https://doi.org/10.1145/3308558.3313618
  123. a b Ai-Jou Chou, Guilherme Gonçalves, Sam Walton, and Miriam Redi. [n.d.]. Citation Detective: a Public Dataset to Improve and Quantify Wikipedia Citation Quality at Scale. ([n. d.]).
  124. a b The Wikipedia Library. [n.d.]. Wikipedia:The Wikipedia Library. https://en.wikipedia.org/wiki/Wikipedia:The_Wikipedia_Library
  125. a b Wikimedia Movement. 2020. Oral Knowledge. https://meta.wikimedia.org/wiki/Oral_knowledge
  126. a b WikiCite. [n.d.]. wikicite. http://wikicite.org/
  127. a b English Wikipedia. [n.d.]. Wikipedia:Core content policies. https://en.wikipedia.org/wiki/Wikipedia:Core_content_policies
  128. a b English Wikipedia. [n.d.]. Wikipedia:Verifiability. https://en.wikipedia.org/wiki/Wikipedia:Verifiability
  129. a b Ewa S Callahan and Susan C Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American Society for Information Science. American Society for Information Science 62, 10 (Oct. 2011), 1899–1915. https://doi.org/10.1002/asi.21577
  130. a b Shane Greenstein and Feng Zhu. 2012. Is Wikipedia Biased? The American economic review 102, 3 (May 2012), 343–348. https://doi.org/10.1257/aer.102.3.343
  131. a b Tsila Hassine. 2005. The dynamics of NPOV disputes. In Proceedings of Wikimania.
  132. a b Christoph Hube. 2017. Bias in Wikipedia. In Proceedings of the 26th International Conference on World Wide Web Companion (Perth, Australia) (WWW ’17 Companion). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 717–721. https://doi.org/10.1145/3041021.3053375
  133. a b Christoph Hube and Besnik Fetahu. 2018. Detecting Biased Statements in Wikipedia. In Companion Proceedings of the The Web Conference 2018 (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1779–1786. https://doi.org/10.1145/3184558.3191640
  134. a b Umashanthi Pavalanathan, Xiaochuang Han, and Jacob Eisenstein. 2018. Mind Your POV: Convergence of Articles and Editors Towards Wikipedia’s Neutrality Norm. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–23.
  135. a b Marta Recasens, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2013. Linguistic models for analyzing and detecting biased language. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1650–1659.
  136. a b English Wikipedia. [n.d.]. Wikipedia:Neutral point of view/Noticeboard. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view/Noticeboard
  137. a b English Wikipedia. [n.d.]. Wikipedia:Systemic bias. https://en.wikipedia.org/wiki/Wikipedia:Systemic_bias
  138. a b c Shiqing He, Allen Yilun Lin, Eytan Adar, and Brent Hecht. 2018. The tower of babel.jpg: Diversity of visual encyclopedic knowledge across wikipedia language editions. In 12th International AAAI Conference on Web and Social Media, ICWSM 2018. AAAI Press, 102–111.
  139. a b Emily Porter, P M Krafft, and Brian Keegan. 2020. Visual Narratives and Collective Memory across Peer-Produced Accounts of Contested Sociopolitical Events. ACM Transactions on Social Computing Article 4 (Feb. 2020).
  140. a b F B Viegas. 2007. The Visual Side of Wikipedia. In 2007 40th Annual Hawaii International Conference on System Sciences (HICSS’07). 85–85. https://doi.org/10.1109/HICSS.2007.559
  141. a b c Wikimedia Movement. 2020. Movement Strategy Recommendations - Innovate. https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Recommendations/Innovate_in_Free_Knowledge
  142. a b Wikimedia Foundation. 2018. Recommending Images to Wikidata Items. https://meta.wikimedia.org/wiki/Research:Recommending_Images_to_Wikidata_Items
  143. a b c Wikimedia Foundation. 2019. Recommending Images to Wikipedia Articles. https://meta.wikimedia.org/wiki/Research:Recommending_Images_to_Wikipedia_Articles
  144. a b Wikimedia Foundation. 2020. The Role of Images for Knowledge Understanding. https://meta.wikimedia.org/wiki/Research:The_Role_of_Images_for_Knowledge_Understanding
  145. a b Wikimedia Foundation. 2020. Prototypes of Image Classifiers Trained on Commons Categories. https://meta.wikimedia.org/wiki/Research:Prototypes_of_Image_Classifiers_Trained_on_Commons_Categories
  146. a b WDFIST. [n.d.]. Wikidata Free Image Search Tool. https://fist.toolforge.org/wdfist/index.html
  147. a b English Wikipedia. [n.d.]. Wikipedia:Image use policy. https://en.wikipedia.org/wiki/Wikipedia:Image_use_policy
  148. a b WikiShootMe. [n.d.]. Wiki ShootMe! https://wikishootme.toolforge.org/
  149. a b WLM. [n.d.]. Wiki loves monuments. https://www.wikilovesmonuments.org/
  150. a b c d Visible Wiki Women. [n.d.]. #VisibleWikiWomen. https://whoseknowledge.org/initiatives/visiblewikiwomen/
  151. a b Denny Vrandečić. 2020. Architecture for a multilingual Wikipedia. (2020).
  152. a b Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Commun. ACM 57, 10 (Sept. 2014), 78–85. https://doi.org/10.1145/2629489
  153. a b c d e f g h i j Wikimedia Foundation. 2019. Medium Term Plan. https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Medium-term_plan_2019/Platform_evolution
  154. a b SDC. [n.d.]. Structured Data on Commons. https://meta.wikimedia.org/wiki/Structured_Data_on_Commons
  155. a b WDCM. [n.d.]. Wikidata Concepts Monitor: Wikidata Usage. http://wmdeanalytics.wmflabs.org/WDCM_UsageDashboard/
  156. a b English Wikipedia. [n.d.]. Wikipedia:Use of Wikidata in Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:Use_of_Wikidata_in_Wikipedia
  157. a b English Wikipedia. [n.d.]. Wikipedia:Wikidata. https://en.wikipedia.org/wiki/Wikipedia:Wikidata
  158. a b c Aleksandar Brezar and James Heilman. 2019. Readability of English Wikipedia’s health information over time. (2019).
  159. a b Taha Yasseri, András Kornai, and János Kertész. 2012. A Practical Approach to Language Complexity: A Wikipedia Case Study. PloS one 7, 11 (Nov. 2012), e48386. https://doi.org/10.1371/journal.pone.0048386
  160. a b Wikimedia Foundation. [n.d.]. Content translation tool. https://en.wikipedia.org/wiki/Special:ContentTranslation
  161. a b English Wikipedia. [n.d.]. Wikipedia:Accessibility dos and don’ts. https://en.wikipedia.org/wiki/Wikipedia:Accessibility_dos_and_don%27ts
  162. a b English Wikipedia. [n.d.]. Wikipedia:Manual of Style/Accessibility. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Accessibility
  163. a b Julia Adams, Hannah Brückner, and Cambria Naslund. 2019. Who Counts as a Notable Sociologist on Wikipedia? Gender, Race, and the “Professor Test”. Socius 5 (Jan. 2019), 2378023118823946. https://doi.org/10.1177/2378023118823946
  164. a b David Bamman and Noah A Smith. 2014. Unsupervised Discovery of Biographical Structure from Text. Transactions of the Association for Computational Linguistics 2 (Dec. 2014), 363–376. https://doi.org/10.1162/tacl_a_00189
  165. a b Young-Ho Eom, Pablo Aragón, David Laniado, Andreas Kaltenbrunner, Sebastiano Vigna, and Dima L Shepelyansky. 2015. Interactions of cultures and top people of Wikipedia from ranking of 24 language editions. PloS one 10, 3 (March 2015), e0114825. https://doi.org/10.1371/journal.pone.0114825
  166. a b Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First women, second sex: Gender bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media. 165–174.
  167. a b c Aaron Halfaker. 2017. Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect. In Proceedings of the 13th International Symposium on Open Collaboration. ACM, 19. https://doi.org/10.1145/3125433.3125475
  168. a b c Piotr Konieczny and Maximilian Klein. 2018. Gender gap through time and space: A journey through Wikipedia biographies via the Wikidata Human Gender Indicator. New Media & Society 20, 12 (2018), 4608–4633.
  169. a b Amanda Menking, David W McDonald, and Mark Zachry. 2017. Who Wants to Read This?: A Method for Measuring Topical Representativeness in User Generated Content Systems. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’17 (Portland, Oregon, USA). ACM Press, New York, New York, USA, 2068–2081. https://doi.org/10.1145/2998181.2998254
  170. a b Joseph Reagle and Lauren Rhue. 2011. Gender Bias in Wikipedia and Britannica. International Journal of Communication Systems 5, 0 (Aug. 2011), 21.
  171. a b Menno H Schellekens, Floris Holstege, and Taha Yasseri. 2019. Female scholars need to achieve more for equal public recognition. (April 2019). arXiv:1904.06310 [cs.DL]
  172. a b Claudia Wagner, David Garcia, Mohsen Jadidi, and Markus Strohmaier. 2015. It’s a man’s Wikipedia? Assessing gender inequality in an online encyclopedia. In Ninth international AAAI conference on web and social media.
  173. a b c Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science 5, 1 (March 2016), 1–24. https://doi.org/10.1140/epjds/s13688-016-0066-4
  174. a b Amber Young, Ari D Wigdor, and Gerald Kane. 2016. It’s Not What You Think: Gender Bias in Information about Fortune 1000 CEOs on Wikipedia. In ICIS 2016 Proceedings.
  175. a b Olga Zagovora, Fabian Flöck, and Claudia Wagner. 2017. “(Weitergeleitet von Journalistin)”: The Gendered Presentation of Professions on Wikipedia. In Proceedings of the 2017 ACM on Web Science Conference - WebSci ’17 (Troy, New York, USA). ACM Press, New York, New York, USA, 83–92. https://doi.org/10.1145/3091478.3091488
  176. a b c d e Wikimedia Movement. 2020. Movement Strategy Recommendations - Impact. https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Recommendations/Identify_Topics_for_Impact
  177. a b c Denelezh. [n.d.]. Gender Gap in Wikimedia projects. https://www.denelezh.org/
  178. a b Meta-Wiki. [n.d.]. Gender Gap Portal. https://meta.wikimedia.org/wiki/Gender_gap
  179. a b c d e WikiProject Countering systemic bias. [n.d.]. Wikipedia:WikiProject Countering systemic bias. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Countering_systemic_bias
  180. a b WDCM. [n.d.]. Wikidata Concepts Monitor: Biases. http://wmdeanalytics.wmflabs.org/WDCM_BiasesDashboard/
  181. a b c WikiProject Women. [n.d.]. Wikipedia:WikiProject Women. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women
  182. a b c Pablo Beytía. 2020. The Positioning Matters: Estimating Geographical Bias in the Multilingual Record of Biographies on Wikipedia. In Companion Proceedings of the Web Conference 2020 (Taipei, Taiwan) (WWW ’20). Association for Computing Machinery, New York, NY, USA, 806–810. https://doi.org/10.1145/3366424.3383569
  183. a b Mark Graham, Scott Hale, and Monica Stephens. 2011. Geographies of the World’s Knowledge. Technical Report. Convoco, London.
  184. a b Mark Graham, Bernie Hogan, Ralph K Straumann, and Ahmed Medhat. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers. Association of American Geographers 104, 4 (July 2014), 746–764. https://doi.org/10.1080/00045608.2014.910087
  185. a b Brent Hecht and Darren Gergle. 2009. Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the fourth international conference on Communities and technologies - C&T ’09 (University Park, PA, USA). ACM Press, New York, New York, USA, 11. https://doi.org/10.1145/1556460.1556463
  186. a b English Wikipedia. [n.d.]. Portal:Africa. https://en.wikipedia.org/wiki/Portal:Africa
  187. a b WLA. [n.d.]. Wiki loves Africa. https://www.wikilovesafrica.net/
  188. a b Aniket Kittur, Ed H Chi, and Bongwon Suh. 2009. What’s in Wikipedia?. In Proceedings of the 27th international conference on Human factors in computing systems - CHI 09. ACM Press. https://doi.org/10.1145/1518701.1518930
  189. a b Włodzimierz Lewoniewski, Krzysztof Węcel, and Witold Abramowicz. 2019. Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics. Computers 8, 3 (Aug. 2019), 60. https://doi.org/10.3390/computers8030060
  190. a b Thomas Shafee, Gwinyai Masukume, Lisa Kipersztok, Diptanshu Das, Mikael Häggström, and James Heilman. 2017. Evolution of Wikipedia’s medical content: past, present and future. Journal of epidemiology and community health 71, 11 (Nov. 2017), 1122–1129. https://doi.org/10.1136/jech-2016-208601
  191. a b Denise A Smith. 2020. Situating Wikipedia as a health information resource in various contexts: A scoping review. PloS one 15, 2 (Feb. 2020), e0228786. https://doi.org/10.1371/journal.pone.0228786
  192. a b WikiProject Vital Articles. [n.d.]. Wikipedia:WikiProject Vital Articles. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Vital_Articles
  193. a b emijrp. 2010. All Human Knowledge. https://en.wikipedia.org/wiki/User:Emijrp/All_Human_Knowledge
  194. a b c WikiProject Medicine. [n.d.]. Wikipedia:WikiProject Medicine. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine
  195. a b Marc Miquel-Ribé and David Laniado. 2016. Cultural Identities in Wikipedias. In Proceedings of the 7th 2016 International Conference on Social Media & Society (London, United Kingdom) (SMSociety ’16, Article 24). Association for Computing Machinery, New York, NY, USA, 1–10. https://doi.org/10.1145/2930971.2930996
  196. a b Marc Miquel-Ribé and David Laniado. 2018. Wikipedia Culture Gap: Quantifying Content Imbalances Across 40 Language Editions. Frontiers of physics 6 (June 2018), 234. https://doi.org/10.3389/fphy.2018.00054
  197. a b WikiProject French Caribbean Culture. [n.d.]. Wikidata:WikiProject French Caribbean Culture. https://www.wikidata.org/wiki/Wikidata:WikiProject_French_Caribbean_Culture
  198. a b WCDO. [n.d.]. Wikipedia Diversity Observatory. https://meta.wikimedia.org/wiki/Wikipedia_Diversity_Observatory
  199. WPWP. [n.d.]. Wikipedia Pages Wanting Photos. https://meta.wikimedia.org/wiki/Wikipedia_Pages_Wanting_Photos
  200. Isaac Johnson. 2019. Wikidata Transclusion on English Wikipedia. https://meta.wikimedia.org/wiki/Research:External_Reuse_of_Wikimedia_Content/Wikidata_Transclusion
  201. Marc Miquel-Ribé and David Laniado. 2019. Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 620–629.
  202. Wikimedia Foundation. [n.d.]. ORES. https://www.mediawiki.org/wiki/ORES
  203. Tilman Bayer. 2015. How many women edit Wikipedia? https://blog.wikimedia.org/2015/04/30/how-many-women-edit-wikipedia/
  204. Jonathan Morgan. 2019. Research:Content gaps on Wikipedia. https://meta.wikimedia.org/wiki/Research:Content_gaps_on_Wikipedia
  205. Wikimedia Space. 2020. What is the size of the gender gap? Like, actually? https://discuss-space.wmflabs.org/t/what-is-the-size-of-the-gender-gap-like-actually/3066
  206. Morten Warncke-Wang, Vivek Ranjan, Loren Terveen, and Brent Hecht. 2015. Misalignment between supply and demand of quality content in peer production communities. In Ninth International AAAI Conference on Web and Social Media.
  207. Jonathan Morgan and Isaac Johnson. 2019. Research:Explaining the Wikipedia reader gender gap. https://meta.wikimedia.org/wiki/Research:Explaining_the_Wikipedia_reader_gender_gap
  208. Mostafa Mesgari, Chitu Okoli, Mohamad Mehdi, Finn Årup Nielsen, and Arto Lanamäki. 2015. “The sum of all human knowledge”: A systematic review of scholarly research on the content of Wikipedia. Journal of the Association for Information Science and Technology 66, 2 (2015), 219–245. https://doi.org/10.1002/asi.23172
  209. Finn Årup Nielsen. 2012. Wikipedia Research and Tools: Review and Comments. (Feb. 2012). https://doi.org/10.2139/ssrn.2129874
  210. Chitu Okoli, Mohamad Mehdi, Mostafa Mesgari, Finn Årup Nielsen, and Arto Lanamäki. 2012. The people’s encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia. Available at SSRN 2021326 (2012).
  211. WRN. [n.d.]. Wikimedia Research Newsletter. https://meta.wikimedia.org/wiki/Research:Newsletter
  212. David R Krathwohl. [n.d.]. A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives.