
Research:Cover Women

From Meta, a Wikimedia project coordination wiki
Created
1 June
Contact
Associate Professor at FIMA, Universitat de Barcelona
Collaborators
Miquel Centelles, Laura Fernández
Duration:  2024-06 – 2025-06
gender, front-page, gate keeping, intersectionalities, diversity
Grant ID: G-RS-2402-15223
This page documents a completed research project.


This proposal presents a research project that will examine the most popular Wikipedia page. This page, known as the main page, or front page from a communication perspective, will be analyzed across the seven longest-standing Wikipedia editions: English, German, Catalan, French, Portuguese, Italian, and Spanish. Grounded in a gender and intersectional perspective, this study will delve into the daily content, newsroom guidelines (principles and standards that guide the dissemination of information), and volunteer community insights. The examination will employ communication theories like gatekeeping and agenda-setting. Beyond academic research, our goal is to actively contribute to editing communities by addressing the daily challenges and needs in curating front-page content.

Introduction


Wikipedia, as a key player in the public sphere, transforms information dissemination.

Still, Wikipedia grapples with persistent gender bias in both editing and content (Antin et al., 2011; Bear and Collier, 2016; Wagner et al., 2016; Hinnosaar, 2019; Minguillón et al., 2021; Ferran-Ferrer, Boté-Vericad, et al., 2023), alongside additional prejudices (Redi et al., 2021; Beytía et al., 2022). Bias in contributions perpetuates imbalances in content coverage and discourages diversity, which further exacerbates the issue (Worku et al., 2020).

Scholars highlight the need for a comprehensive understanding of Wikipedia's knowledge production culture to address these biases and make Wikipedia more robust, reliable, and transparent (Menking and Erickson, 2015).

Reducing gender and other intersectional biases requires more than acknowledging Wikipedia as a mirror of societal biases; it involves addressing the platform's deeper logic embedded in its techno-scientific project (Ford and Wajcman, 2017). We have selected the most popular Wikipedia page for analysis.

This page, commonly referred to as the main page, or front page from a communication perspective, exists in every language edition of the global encyclopedia, and it is the object of our study.

We will investigate possible gender and intersectional bias in its daily content, in its newsroom guidelines (principles and standards that govern the dissemination of information), and in the insights from the volunteer community who decide which information gets disseminated to the public on the main page. This research will use communication theories such as gatekeeping, which examines the process by which information is filtered, selected, and ultimately presented to the public (Barzilai-Nahon, 2009), and agenda-setting (McCombs and Shaw, 1972), which studies the effect of that selection on audiences. Therefore, the research questions that we address are:

RQ1: How prevalent is implicit bias based on language, occupation, ethnic group, religion, country, and gender identity in the biographical content featured on Wikipedia's daily front pages?

RQ2: How do the editorial guidelines for volunteer gatekeepers influence the selection of front-page stories on Wikipedia?

RQ3: How do Wikipedia editors organize and curate the Main Page?


To test the feasibility of this proposal across seven language editions of Wikipedia, we have already conducted a micro-project with a sample of the English and Spanish Wikipedia to assess the viability of the global project. Specifically, we examined:

  • a) Whether there are open and formalized recommendations and guidelines that determine which contents are published on the main page, and whether the publication criteria can be analyzed.
  • b) Whether data wrangling techniques would let us work with the biographies published on all Wikipedia main pages and analyze them from a gender and intersectional perspective using the properties of Wikidata.
  • c) How easily the community that performs gatekeeping tasks can be contacted. We also began to prepare the relevant questions to understand the decision-making process and editorial practices, and to identify the issues that may be relevant to understanding the phenomenon.

The results of this previous trial work, with two language editions, will be published soon (Ferran-Ferrer et al., 2024). The trend is not encouraging if we take into account that bias in contributions perpetuates imbalances in content coverage and discourages diversity, which further exacerbates the issue (Worku et al., 2020).

To address this, scholars stress the importance of understanding Wikipedia's knowledge production culture to tackle its gender gap (Menking and Erickson, 2015). Addressing this issue requires delving into the foundational principles driving the platform's techno-scientific project (Ford and Wajcman, 2017; Geiger, 2017), necessitating the recognition and dismantling of exclusionary practices (Menking and Rosenberg, 2021).

Communication theories like gatekeeping and agenda-setting provide valuable frameworks for understanding Wikipedia's potential biases. Gatekeeping theory, focusing on information filtering processes, is applied to scrutinize stories selected for the Front Page, which attracts millions of readers monthly (Barzilai-Nahon, 2009; Wikimedia, 2023). Gatekeeping theory has previously been applied to Wikipedia by researchers to further understand biases in content selection and presentation (Li and Farzan, 2020) and to advocate for a reorganization of online spaces to democratize content and encourage dialectical gatekeeping that could reduce racial and other disparities (Ezell, 2021). Additionally, drawing from agenda-setting theory, we examine how Wikipedia's main page influences viewers and shapes news hierarchy, including its agenda-building power (McCombs and Shaw, 1972; Ren and Xu, 2023). Agenda setting can impact the choices of frames and sentiment adopted by Wikipedia pages regarding a particular issue or event (Lee, 2018) and it can play a role in shaping the focus and intensity of user edit activity in Wikipedia (Mahabir et al., 2018). This study goes beyond affirming Wikipedia's reflection of reality to delve into its systemic challenges (Ford and Wajcman, 2017). It analyses not only main page content selection but also newsroom guidelines, including interviews with gatekeepers, to enhance understanding and address systemic issues.

Methods


This research proposal outlines a study on gender representation and biases on Wikipedia's main page (or front page from a communication perspective), the most visited Wikipedia page, which got 46.8 billion visits last November on the English edition (Wikimedia, 2023). We will conduct a comparative analysis across the seven longest-standing Wikipedia editions (English, German, Catalan, French, Portuguese, Italian, and Spanish), all of them born in 2001, employing a mixed-methods approach. Grounded in gender and intersectionality, the study will analyze daily content, editorial/newsroom guidelines, and insights from volunteer communities using communication theories like gatekeeping (Barzilai-Nahon, 2009) and agenda-setting (McCombs and Shaw, 1972). Our aim is not only academic research, but also active contribution to editing communities by addressing daily challenges in curating front-page content. Therefore, in the project's work team, we have already included seven working groups of Wikipedia users involved in gender, one for each language edition, and the chapters of all the Wikipedias analyzed in this project (see Table 2).

The first stage of the project will be:

  • a) To conduct a scoping review, a systematic literature review using the SALSA Framework (Grant and Booth, 2009), to analyze the academic publications from 2001 to 2024. This review will concentrate on examining Wikipedia within the framework of a communication ecosystem. Then, we will employ a triangulation methodology.
  • b) In-depth interviews with volunteer editors of the front page from all seven Wikipedia editions to ascertain decision-making processes, biases, and strategies that influence content visibility related to gender and other intersectionalities. The interviews will be conducted in person or online and in the native languages of the volunteer participants. We plan to conduct around five interviews per language edition. Contacts with the volunteers will be obtained through discussion pages related to editing the main page, as well as from user groups participating in the project, such as calls from the same chapters to their networks. The interview transcriptions will be coded and analyzed using qualitative data analysis software, and a specific codebook will be generated to facilitate the coding. This methodological approach will address RQ1 and RQ3.
  • c) Newsroom guidelines: We will apply content analysis to the main-page, or front-page, editorial guidelines for each language edition, and we will explore what guides the decision-making of the gatekeepers who determine story prominence. The content of these guidelines will be coded and analyzed using qualitative data analysis software, and a specific codebook will be generated to facilitate the coding. This research strategy will tackle RQ2. The qualitative analysis of agenda setting and gatekeeping practices (RQ1-3) will be conducted independently with two codebooks, one for the interviews and one for the editorial policies. However, each codebook will encode elements specific to gatekeeping and agenda setting to obtain evidence that corresponds to the theoretical framework.
  • d) Main-page content quantitative analysis: We will scrutinize the content (biographies) on the front page in each of the seven language editions over ten years, using data wrangling. To do so, we will first identify the sections of the main page that are consistently present across all Wikipedias and are easily comparable. Wikipedia's front pages regularly feature changing content, offering a snapshot of current events, featured articles, and useful links. It is important to note that volunteers maintain these main pages, which may evolve in format and content over time. For each language edition, a unique method will be employed to retrieve the content and data of its main page from the past ten years, as the URLs of previous main pages cannot be obtained from the dumps. Quantitative analysis will begin by scraping the content and using the open-source tool OpenRefine to reconcile the URIs found in the sections of the Wikipedia front pages in each language edition. This process will enrich them with specific properties from Wikidata to obtain values of the properties selected for study, such as P21 (sex or gender), P106 (occupation), P172 (ethnic group), P103 (native language), and others. OpenRefine, used in various contexts and applications, is essential for this research as it enables the preparation and analysis of vast amounts of data. This method will respond to RQ4.
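The enrichment step in d) can be pictured with a small sketch. The proposal relies on OpenRefine reconciliation; the snippet below swaps in a different, simpler route purely for illustration: it builds a SPARQL query that could be submitted to the Wikidata Query Service to fetch the same four properties. The function name `build_query`, the variable names, and the two example item IDs are assumptions made for this sketch and are not part of the project's pipeline.

```python
from typing import Iterable

# Wikidata properties named in the text above.
PROPERTIES = {
    "gender": "P21",            # sex or gender
    "occupation": "P106",       # occupation
    "ethnic_group": "P172",     # ethnic group
    "native_language": "P103",  # native language
}


def build_query(qids: Iterable[str]) -> str:
    """Return a SPARQL query that fetches the selected properties for the given items.

    Hypothetical helper: it only builds the query text and sends nothing.
    The wd: and wdt: prefixes are predefined on the Wikidata Query Service.
    """
    values = " ".join(f"wd:{qid}" for qid in qids)
    select_vars = " ".join(f"?{name}" for name in PROPERTIES)
    # OPTIONAL keeps items whose metadata is incomplete instead of dropping them.
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item wdt:{pid} ?{name} . }}"
        for name, pid in PROPERTIES.items()
    )
    return (
        f"SELECT ?item {select_vars} WHERE {{\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"{optionals}\n"
        f"}}"
    )


# Example items chosen for illustration: Q7186 (Marie Curie), Q937 (Albert Einstein).
query = build_query(["Q7186", "Q937"])
print(query)
```

Every property is wrapped in `OPTIONAL` because biographies with missing values should still appear in the result, so that missing metadata can itself be counted.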

Table 1 offers a comprehensive overview of the research proposal.

Expected output


The specific research outputs that we envision for our proposed project include, but are not limited to:

  • Scientific publications: We will draft scientific publications for each research question and assess whether the approach is comparative across all editions or if it is better to separate them by smaller communities, editorial process typologies, etc. This will be determined once the study is completed to decide on the best dissemination approach.
  • The data set, emerging from RQ4, will be made available as downloadable dumps and will be accessible via public APIs and a SPARQL endpoint.
  • Participation in at least these conferences:
    • WikiWorkshop
    • Wikimania
    • WikiWomenCamp
    • Each user group and chapter will participate in national or regional events with Cover Women results.
  • Tool proposals to support the editorial tasks of gatekeeping, namely:
    • Guidelines for content selection on front pages that are attuned to intersectionality and gender diversity;
    • Bots and AI assistants that facilitate the content selection process for front pages, with a focus on acknowledging intersectionality and gender differences.
    • Both tools will be developed with a focus on considering the collaborative environment and consensus-driven approach characteristic of Wikipedia.
  • Resources aimed at enhancing the archiving and curation of main-page content across all Wikipedia editions outlined in this proposal.

Community impact plan


The project aligns with the Wikimedia Movement's 2030 strategy by focusing on delivering knowledge as a service and addressing equity in knowledge and communities overlooked by structures of power and privilege. Furthermore, the Cover Women project will involve a team of 5 researchers (professors from the University of Barcelona, one from the UOC, and a PhD student) with a multidisciplinary perspective, drawn from the fields of communication, semantic web, digital humanities, and computer science. Additionally, this project proposal has been designed according to the needs of various activist groups regarding gender equality on Wikipedia, as well as with the boards of the chapters involved in each language edition. See Table 2 for the communities we expect to reach. These user groups are made up of Wikipedia users who work to achieve a better Wikipedia by introducing a gender perspective. Since we are working with 7 different editions of Wikipedia, we considered that having a user group of female editors for each edition and a representative from each chapter's board would help achieve the project's objectives and meet the real needs of the communities. This project will provide:

  • a) Decade-long insights into gender and intersectionality content representation on Wikipediasʼ front pages. (RQ4)
  • b) Beyond descriptive stats, we'll reveal bias trends. (RQ4)
  • c) Editorial strategies for gatekeeping and agenda setting. (RQ1-3)
  • d) Guidelines for ethical content selection using AI and bots. (RQ3)
  • e) Technical guidance to enhance data archives on main pages. (RQ4)
  • f) Collaborative work with volunteers ensures inclusivity, integrating advocate perspectives for a consensus-driven approach. (RQ1)

Built on in-depth interviews and stable and lasting collaboration with Wikipedia chapters and user groups, this work addresses gender identity under-representation. We will utilize Wikipedia's consensus-based decision-making approach to address our research questions. This method prioritizes addressing the legitimate concerns of its editors and finding a middle ground, all while adhering to Wikipedia's established policies and guidelines. In this context, it is crucial to consider that consensus naturally evolves among editors as they make changes, the importance of quality arguments in determining consensus, the allowance for consensus to evolve based on new evidence, and the acknowledgment of decisions beyond the scope of editor consensus. This methodology underscores Wikipedia's emphasis on collaboration, incremental progress, and communal harmony in managing a large crowd-sourced encyclopedia.

Results RQ1: English Edition


Gender: Men remain dominant among featured biographies, making up 70.4% (20,780) compared to 29% women (8,576). Non-binary and other gender identities are nearly absent (98 cases; 0.3%), with 75 unspecified. Although women’s representation has improved, the gap remains large and persistent.
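As a quick check of how these figures relate, the sketch below recomputes the shares and the men-per-woman ratio from the four counts reported above. It assumes that the denominator is simply the sum of those four categories, which is an assumption of this example rather than something the page states; the variable names are illustrative.

```python
# Counts reported above for featured biographies in the English edition.
counts = {
    "men": 20_780,
    "women": 8_576,
    "non_binary_or_other": 98,
    "unspecified": 75,
}

# Assumption: the percentages are taken over the sum of these four categories.
total = sum(counts.values())

# Share of each category, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Men per woman, the ratio used throughout the results sections.
men_per_woman = round(counts["men"] / counts["women"], 2)

print(total)
print(shares)
print(men_per_woman)
```

Under that assumption the recomputed shares for men and women agree with the reported 70.4% and 29%, and the overall ratio comes out at about 2.4 men per woman.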

Temporal intersections: Most biographies belong to the Contemporary Age (83.7%, 24,713), followed by the Modern (6.6%), Medieval (3.6%), and Prehistoric/Ancient periods (0.9%). The gender gap widens in earlier periods: 2.27 men per woman in the Contemporary Age, 4.70 in the Modern, 6.31 in the Middle Ages, and 5.60 in Prehistory/Antiquity. “Other” and unspecified genders represent only 1.6%, confirming both recency bias and androcentrism in representation.

Geographies of visibility: Featured biographies are concentrated in North America and Western Europe. The United States dominates (8,220; ~35%), followed by the United Kingdom (2,687; 11.5%), with Germany (1,106), France (928), Australia (785), and Canada (762) reinforcing the Anglophone bias. Countries from the Global South (e.g., India 753, China 451, Japan 442, Indonesia 390) are underrepresented. Across countries, men outnumber women by roughly two or three to one, peaking in Indonesia (4.2:1) and China (3.1:1); only Australia (1.7:1) approaches balance. “Other” genders are extremely rare (102 cases), mostly in Anglophone contexts.

Occupational hierarchies: Visibility is concentrated in public-facing and media-related professions (politicians, writers, and performers), followed by academics, journalists, and sports or entertainment figures. Broad occupational fields (UDC) confirm the dominance of politics, performance, music, and literature over technical or scientific areas. Overall, men make up 71.8%, women 26.3%, and others 1.9%. Gaps vary by field:

  • Politics: 4.4:1 (4,074 men / 925 women)
  • Law: 4.5:1; Business: 4.9:1; Military: 24:1
  • Arts: actors 1.15:1; singers 0.96:1 (slightly more women); writers 1.6:1; poets 2.4:1
  • Academia & media: professors 3.25:1; journalists 2.0:1; screenwriters 3.3:1
  • Composers 5.4:1; painters 2.4:1

Sociocultural identity: Metadata remain highly incomplete: 89% lack native language information. Among available data, English speakers dominate (1,388), followed by French (188), German (172), Arabic (116), Russian (96), and Spanish (94). Gender imbalance averages ~2.4 men per woman, highest among Arabic (5.8:1) and Western European languages (French/Dutch 3.4:1, German 2.6:1). English shows 2.2:1, Spanish 1.9:1, while Turkish (0.84:1) and American English (0.93:1) approach parity but represent small samples.

More information: Ferran-Ferrer, Núria; Centelles, Miquel; Fernández, Laura (2025). Wikipedia’s Front page ten years evolution: analysis of the representation of gender and intersectionalities on biographic content and its editorial policies. Online Journal of Communication and Media Technologies [In press]

Centelles, Miquel; Salse, Marina; Pérez-Montoro, Mario; Ferran-Ferrer, Núria (2026). Gender and intersectional bias in featured biographies on the English Wikipedia Main Page (2011-2024), JASIST, [Under Review]

Results RQ1: Catalan Edition


Content will be available soon.

More information: Salse, Marina; Centelles, Miquel; Ferran-Ferrer, Núria (2025). Dones de portada: estudi descriptiu sobre els biaixos de gènere i la interseccionalitat a la Viquipèdia, II Congrés Càtedra UB de Perspectiva de Gènere i Feminismes Ciutat de Cornellà, 10-12 de desembre de 2025, Citilab de Cornellà.

Results RQ1: French Edition


Content will be available soon.

Results RQ1: Italian Edition


Gender: Men continue to dominate featured biographies on the Italian Wikipedia, representing 72.1% (14,832 entries), while women account for 27.4% (5,639). Non-binary and other gender identities are almost absent (0.5%), with only a handful of cases recorded. Despite gradual progress, the gender imbalance remains structural and persistent, mirroring global patterns of digital visibility.

Temporal intersections: The temporal distribution of featured biographies shows a clear recency bias: the Contemporary Age gathers the vast majority of entries (81.5%; 16,772), followed by the Modern Age (7.3%), the Middle Ages (4.2%), and Antiquity (1.1%). The gender gap expands in earlier periods: from 2.5 men per woman in the Contemporary Age to 5.2:1 in the Modern and 6.0:1 in the Medieval era. Only 1.4% of entries fall under “other” or “unspecified” genders, suggesting limited inclusivity and incomplete historical metadata.

Geographical patterns: The Italian edition exhibits a strong national and Eurocentric focus. Over 60% of featured individuals are from Italy, followed by other European countries such as France, the United Kingdom, and Germany. Representation from the Global South is minimal, with Latin America, Africa, and Asia collectively contributing less than 10% of entries. Across regions, men outnumber women consistently, though the gap narrows slightly in Anglophone and Nordic countries. Italy itself shows a ratio of 2.8 men for every woman, while the disparity widens in Eastern Europe and non-European regions. Gender-diverse biographies are practically non-existent.

Occupational hierarchies: Occupational visibility on the Italian Wikipedia centers around politics, arts, and sports, echoing public visibility and media attention. The most represented categories include politicians, writers, actors, football players, and musicians. Aggregated by broader professional domains, social sciences, arts, and humanities dominate over STEM and technical professions. Gendered differences are marked:

  • Politics (4.6 men per woman) and business (5.0:1) remain heavily male.
  • Arts and entertainment show more balance: actors 1.2:1, singers 1.0:1, writers 1.8:1.
  • Sports, particularly football, are overwhelmingly masculine (9.3:1).
  • Academia and journalism remain moderately unequal (2.5–3:1).

Overall, occupational patterns reveal that symbolic visibility (through culture and media) offers more gender balance than institutional or technical recognition.

Sociocultural identity: Demographic metadata are sparse, with language, ethnicity, and religion missing in most records. Among available entries, Italian speakers dominate (over 70%), followed by English, French, and Spanish. Gender disparities persist across all language groups (average 2.3 men per woman), with slightly smaller gaps among English and Spanish speakers. The near-absence of intersectional identifiers (e.g., ethnicity, religion, minority status) highlights the limited inclusiveness of Wikidata metadata for the Italian edition and its strong alignment with national and Western cultural hierarchies.


Results RQ1: Portuguese Edition


Content will be available soon.

Results RQ1: Spanish Edition


Content will be available soon.

More information

Ferran-Ferrer, Núria; Kuggler, Francisco; Fernández, Laura; Centelles, Miquel (2025). “The effects of outlier data (about gender and intersectionalities) in Wikidata on Wikipedia’s main page: Results of the ‘Cover Women’ project”, Wiki Library Convention, January COLMEX, Mexico.

Ferran-Ferrer, Núria; Fernández, Laura; Centelles, Miquel (2024). “Behind the front page: A comparative gender gap study on Wikipedia’s main page through gatekeeping and agenda-setting theories”, 10th European Communication Conference, “Communication and social (dis)order”, ECREA ECC24-2025 Section Gender, Sexuality and Communication, Faculty of Social Sciences, University of Ljubljana, Slovenia, September 24-26.

Results RQ2 Editorial guidelines: English & Spanish Edition


The comparative analysis of editorial guidelines across the English and Spanish editions of Wikipedia encompassed 31 policy and guidance webpages (17 English, 7 Spanish). Within these documents, we identified 210 hyperlinks leading to additional sections or resources that expand on editing practices—175 in the English edition and 35 in the Spanish one. These interlinked structures reflect a high degree of procedural complexity and internal referencing that shapes editorial learning and access to authority within the platform.

Editorial responsibility and governance are formally open to all users but operationally concentrated among experienced volunteers. Routine maintenance of front-page sections is performed primarily by human editors, with minimal automation: only two bots are explicitly referenced—one managing the Featured Articles section in English and another updating the Efemérides template in Spanish. Administrative privileges determine who can implement real-time changes, particularly in the English edition, where only administrators can move content between templates or approve final updates.

In the Spanish edition, experience thresholds are explicitly codified. Users require a minimum of 500 edits and six months of activity to participate in voting on featured resources, 100 edits for CAD or VECAD reviews, and 50 edits to update Wikidata article statuses. These quantitative criteria formalize hierarchies of editorial authority, reflecting how expertise and tenure influence consensus formation. Similar though less formalized mechanisms appear in the English edition through the nomination and review processes for featured and good articles.
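To make the stratified access described above concrete, the following sketch encodes the reported Spanish-edition thresholds as a small lookup and checks which tasks a given editor could perform. The task keys, the `Editor` class, and the `allowed_tasks` function are hypothetical names chosen for this illustration; only the numeric thresholds come from the guidelines as summarized here, and the six-month tenure requirement is applied to voting alone because that is the only task for which the text mentions it.

```python
from dataclasses import dataclass

# Minimum edit counts reported above for the Spanish edition.
# The task keys are hypothetical labels used only in this sketch.
MIN_EDITS = {
    "vote_featured": 500,           # voting on featured resources
    "cad_vecad_review": 100,        # CAD or VECAD reviews
    "update_wikidata_status": 50,   # updating Wikidata article statuses
}

# Assumption: only voting carries a tenure requirement (six months of activity).
MIN_MONTHS_ACTIVE = {"vote_featured": 6}


@dataclass
class Editor:
    edits: int
    months_active: int


def allowed_tasks(editor: Editor) -> list[str]:
    """Return the tasks whose edit and tenure thresholds the editor meets."""
    return [
        task
        for task, min_edits in MIN_EDITS.items()
        if editor.edits >= min_edits
        and editor.months_active >= MIN_MONTHS_ACTIVE.get(task, 0)
    ]


# An editor with few edits but long tenure, and one with many edits but short tenure.
print(allowed_tasks(Editor(edits=120, months_active=12)))
print(allowed_tasks(Editor(edits=800, months_active=3)))
```

In both example cases the editor can review and update statuses but cannot vote, which illustrates how edit counts and tenure act as separate gates.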

Both editions include references to bias mitigation, though these are limited. Templates in sections such as Did You Know and Featured List Candidates explicitly mention efforts to reduce bias, yet their effectiveness depends on user participation and topic diversity. We found 17 prompts inviting reader or editor feedback (12 in English, 5 in Spanish), typically linked to codes of conduct and encouraging constructive engagement, though they remain relatively scarce.

The formal and linguistic dimensions of the guidelines differ markedly. The English guidelines make extensive use of specialized vocabulary and abbreviations: over 90 distinct abbreviations were recorded, compared to 14 in the Spanish version. English abbreviations often refer to sections (DYK, ITN, POTD) or procedural acronyms (FPC, AfD, QPQ), while Spanish ones mainly denote article status (AB, AD, CAD). Similarly, the documents integrate both intelligible and technical code references. We identified 53 intelligible codes (31 EN; 22 ES) such as [citation needed], and 88 unintelligible or advanced codes (62 EN; 26 ES) requiring technical familiarity, such as &feedformat=atom.

Both editions adhere to Wikipedia’s core editorial principles—neutrality, notability, verifiability, and stability—but differ in implementation and scope.

In the English edition, detailed criteria govern Featured and Good Articles, Pictures, Did You Know, and In the News sections. Featured Articles must be “well-written, comprehensive, neutral, accurate, and stable,” while Good Articles and Featured Pictures emphasize verifiability, quality, and free licensing. The Did You Know section prioritizes factual novelty, and In the News requires recency and editorial consensus, explicitly discouraging commercial or politically driven content.

In the Spanish edition, the main policies concern Artículo Destacado (Featured Article) and Efemérides (On This Day). Featured Articles must meet criteria of verifiability, neutrality, clarity, and multimedia integration. Efemérides guidelines prioritize representativeness within the Spanish-speaking world, recommending preference for women, lesser-known countries, unusual professions, and temporal balance when multiple options are equally valid.

Other editorial recommendations concern visual and practical aspects, such as minimum image resolution or the orientation of portraits to optimize reading flow. However, no explicit mechanisms address systemic content imbalance or diversity targets linked to the Wikimedia 2030 Strategy.

Overall, the findings indicate that while both editions share the same normative pillars, the English edition displays greater procedural complexity and technical specialization, while the Spanish edition formalizes experience-based participation thresholds and diversity-oriented criteria. Together, these patterns reveal how editorial governance combines openness with stratified access, and how technical, linguistic, and procedural factors jointly influence whose contributions reach the Wikipedia Main Page.

Results RQ3 Wikipedia editors insights


The interviews reveal that, while the Wikipedia Main Page is formally open to community participation, its actual curation is concentrated among a small group of experienced administrators. Their personal criteria, familiarity with content, and interpretive discretion largely determine what becomes visible, turning editorial authority into a subtle gatekeeping mechanism.

Most participants described the Main Page not as a corrective instrument for gender or intersectional imbalances, but rather as a mirror of existing inequalities within the encyclopedia and in society at large. Efforts to promote more diverse representation are viewed as desirable but secondary to maintaining perceived neutrality and consensus.

Although the Wikimedia 2030 Strategy articulates goals for equity and diversity, no systematic measures have been adopted to operationalize these principles in front-page selection. Attempts to introduce them are often seen as “activist” interventions that risk polarizing the community.

Initiatives addressing gender or intersectional gaps generally emerge from grassroots groups and thematic WikiProjects, which work indirectly to improve the pool of eligible articles rather than influencing selection procedures themselves.

Overall, the interviews portray the Main Page as a symbolic and contested space—a site where tensions between neutrality, community autonomy, and representational equity become most visible, highlighting the need for clearer governance and diversity-aware editorial frameworks.

More information: Ferran-Ferrer, Núria; Fernández, Laura; Centelles, Miquel (2026). “Chapter 8 Produsage: The Audience as Producer. Cover Women: Uncovering Gender Bias on the Wikipedia Main Page”, In Rebecca Ann Lind (Ed.), Race, Gender, Class and Media: Considering Diversity Across Audiences, Content, and Producers (6th Edition), New York: Routledge. [April/May]

Fernández, Laura; Ferran-Ferrer, Núria (2024). Navigating bias: gender and intersectional insights into Wikipedia’s front page through gatekeeping and agenda-setting frameworks, 11th WikiWorkshop, June 20.

Discussion and Conclusions


This study offers the first decade-long, intersectional view of visibility on Wikipedia’s English Main Page. Across ~30,000 featured biographies (2011–2024), we find a structural gender imbalance that persists across time, geography, and occupation, indicating that front-page curation operates as distributed gatekeeping that converts attention into epistemic authority. Metadata completeness and semantic choices are not neutral; they determine who can be seen.

Temporality intensifies inequity: a strong recency bias privileges contemporary figures, while the gender gap widens in earlier periods. Geographically, visibility concentrates in North America and Western Europe, amplifying Anglophone centrality and reproducing global asymmetries. Occupationally, public-facing fields (politics, entertainment, culture) dominate, with near parity in performance arts but pronounced gaps in politics, business, the military, and several STEM-adjacent domains. Sociocultural attributes reveal a second mechanism—missingness—as sparse and asymmetric metadata (e.g., native language, ethnicity, religion) constrain intersectional analysis and risk rendering “unmarked” majorities invisible while hyper-marking minorities.

These patterns position the Main Page as a communicative interface, not a neutral surface: prominence signals credibility, and systematic skew reproduces epistemic inequality through agenda-setting and framing effects. Bridging information science and communication studies, our findings show how interface-level curation and ontology design jointly allocate symbolic capital and shape knowledge equity.

Limitations and How to Address Them

a) Descriptive rather than causal design

Limitation: The study identifies patterns but cannot determine causality, for instance whether inequalities stem from editorial selection or from the existing pool of available articles.

How to address: Future research could use quasi-experimental designs (e.g., before/after analysis of policy changes on the Main Page), propensity score matching to compare similar biographies that were or were not selected, and interrupted time-series models to detect temporal shifts. Pre-registration of hypotheses and analytical plans would strengthen causal inference.

b) Dependence on incomplete and inconsistent Wikidata metadata

Limitation: High levels of missing data and semantic ambiguity (e.g., the conflation of sex and gender in property P21) restrict intersectional precision.

How to address:

  • Apply entity reconciliation and controlled vocabularies to standardize occupations and identities.
  • Use multiple imputation with sensitivity analysis to handle missingness.
  • Advocate for clearer property distinctions (e.g., separate sex and gender) and standardized schemas for ethnicity and religion.
  • Launch community campaigns to improve metadata completeness and peer-reviewed quality checks.
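As a rough illustration of the sensitivity-analysis idea above, the sketch below computes worst-case bounds on the share of women when some gender values are missing. It is a simpler stand-in for multiple imputation, not the project's method; the function name `share_bounds`, the label `"female"`, and the toy data are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def share_bounds(
    genders: Sequence[Optional[str]], target: str = "female"
) -> Tuple[float, float, float]:
    """Return (complete_case_share, lower_bound, upper_bound) for `target`.

    The complete-case share ignores missing values (None).
    The lower bound assumes no missing record belongs to `target`;
    the upper bound assumes every missing record does.
    """
    total = len(genders)
    if total == 0:
        raise ValueError("no records supplied")
    known = [g for g in genders if g is not None]
    hits = sum(1 for g in known if g == target)
    missing = total - len(known)
    complete_case = hits / len(known) if known else float("nan")
    return complete_case, hits / total, (hits + missing) / total


# Toy data for illustration only, not project results.
sample = ["female", "male", "male", None, "male", "female", None, "male"]
observed, low, high = share_bounds(sample)
print(f"complete-case share: {observed:.3f}; bounds: [{low:.3f}, {high:.3f}]")
```

If the interval between the two bounds is wide, conclusions drawn from the complete-case share depend heavily on what the missing records contain, which is the situation multiple imputation is meant to handle more carefully.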

c) Opacity of the selection process (gatekeeping)

Limitation: The study does not reconstruct deliberations, queues, or acceptance criteria behind Main Page choices.

How to address:

  • Collect anonymized decision logs and selection queues for audit.
  • Conduct ethnographic observation or semi-structured interviews with editors.
  • Publish regular representativeness summaries and minimal traceability reports to improve accountability without compromising community autonomy.

d) Recency bias and article survival

Limitation: The dominance of contemporary figures may inflate recent differences while obscuring long-term trends.

How to address: Apply temporal weighting or age-standardization to balance representation across historical periods, stratify samples by era, and include an “exposure opportunity” variable (e.g., years since potential eligibility).
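One way to read the standardization suggestion is as direct standardization by era: compute the share of women within each period, then average those shares with fixed reference weights so that the large number of contemporary biographies does not dominate. The sketch below assumes hypothetical era labels, counts, and equal weights; none of these figures come from the study.

```python
from typing import Dict, Tuple


def standardized_share(
    counts: Dict[str, Tuple[int, int]], weights: Dict[str, float]
) -> float:
    """Direct standardization of the share of women across eras.

    counts maps era -> (women, total) featured biographies.
    weights maps era -> reference weight; weights are normalized to sum to 1.
    """
    weight_sum = sum(weights[era] for era in counts)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    result = 0.0
    for era, (women, total) in counts.items():
        if total == 0:
            raise ValueError(f"era {era!r} has no biographies")
        result += (weights[era] / weight_sum) * (women / total)
    return result


# Hypothetical counts for illustration only.
counts = {
    "pre-1900": (10, 100),
    "1900-1949": (30, 200),
    "1950-present": (210, 700),
}
equal_weights = {era: 1.0 for era in counts}

crude = sum(w for w, _ in counts.values()) / sum(t for _, t in counts.values())
standardized = standardized_share(counts, equal_weights)
print(f"crude share: {crude:.3f}; era-standardized share: {standardized:.3f}")
```

In this toy example the crude share is pulled upward by the most recent era, while the standardized share gives each era equal say. The choice of reference weights is itself an analytical decision and should be reported.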

e) Validity of occupational classification and aggregation

Limitation: Mapping professions to broader domains (e.g., UDC) may introduce bias, especially for multi-role individuals or noisy occupational labels.

How to address:

  • Use multi-label coding with weighted roles.
  • Cross-validate with external taxonomies (e.g., ISCO, ESCO).
  • Conduct robustness checks using alternative aggregation schemes.
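Multi-label coding with weighted roles can be sketched as fractional counting: each person contributes a total weight of 1, split across the domains of their occupations. The function name, the domain labels, and the weights below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_domain_counts(people: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum per-person role weights by domain.

    Each person is a mapping of domain -> raw weight. Weights are
    normalized per person so that every person counts exactly once.
    """
    totals: Dict[str, float] = defaultdict(float)
    for roles in people:
        weight_sum = sum(roles.values())
        if weight_sum <= 0:
            continue  # skip people with no usable occupation labels
        for domain, weight in roles.items():
            totals[domain] += weight / weight_sum
    return dict(totals)


# Hypothetical people and weights for illustration only.
people = [
    {"politics": 2.0, "literature": 1.0},  # mainly a politician, also a writer
    {"science": 1.0},
    {"politics": 1.0, "science": 1.0},
]
domain_counts = weighted_domain_counts(people)
print(domain_counts)
```

Because the fractional counts add up to the number of people, domain totals remain comparable with single-label tallies, which makes robustness checks against alternative aggregation schemes easier to interpret.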

f) Limited generalizability across language editions

Limitation: Findings based on the English edition may not generalize to other linguistic or cultural contexts.

How to address:

  • Extend the analysis to a multilingual panel (e.g., EN, IT, ES, FR, DE, plus Global South editions).
  • Use hierarchical or multi-level models to compare structures across languages.
  • Apply difference-in-differences analysis where policy or curation changes differ between editions.
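The difference-in-differences suggestion can be illustrated with its simplest two-group, two-period form: the change in the share of women in an edition that altered its curation policy, minus the change in an edition that did not. The shares below are invented for the example, and the estimate is only meaningful under the usual parallel-trends assumption.

```python
def diff_in_diff(
    treated_before: float,
    treated_after: float,
    control_before: float,
    control_after: float,
) -> float:
    """Two-by-two difference-in-differences estimate.

    Returns the change in the treated group minus the change in the
    control group. Assumes both groups would have followed parallel
    trends in the absence of the policy change.
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Invented shares of women among featured biographies, for illustration only.
effect = diff_in_diff(
    treated_before=0.18,
    treated_after=0.26,
    control_before=0.20,
    control_after=0.23,
)
print(f"estimated policy effect: {effect:+.3f}")
```

A full analysis would estimate this within a regression framework with standard errors and edition-level controls; the arithmetic above only shows what quantity is being targeted.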

g) Visibility measured only as selection, not attention intensity

Limitation: Being featured on the Main Page does not capture duration, placement, or click-through engagement.

How to address: Integrate available telemetry (e.g., time displayed, section, day/hour of exposure, pageviews), and combine with experiments on layout rotation to approximate effective attention.

h) Unobserved confounders (e.g., article quality, media salience)

Limitation: Variation in article quality or public notability may confound gender or geographic effects.

How to address: Include detailed control variables (article length, references, FA/GA status, inbound links, media coverage proxies) and perform sensitivity analysis (e.g., Rosenbaum bounds) to assess robustness.

i) Measurement error in gender and group identities

Limitation: Gender and identity terms evolve, and recorded labels may not reflect self-identification.

How to address: Support explicit self-identification fields, versioned metadata with as-of dates, and harmonized controlled vocabularies with transparent data provenance.

j) Lack of qualitative triangulation

Limitation: Quantitative data reveal “what” happens but not “how” or “why” editorial decisions are made.

How to address: Combine quantitative trace data with qualitative methods (interviews, content analysis of discussions, and case studies of accepted and rejected nominations) to link decision processes with representational outcomes.

Outputs: Papers

Centelles, Miquel; Salse, Marina; Pérez-Montoro, Mario; Ferran-Ferrer, Núria (2026). Gender and intersectional bias in featured biographies on the English Wikipedia Main Page (2011-2024), JASIST, [Under Review]

Ferran-Ferrer, Núria; Fernández, Laura (2026). “Chapter 8 Produsage: The Audience as Producer. Cover Women: Uncovering Gender Bias on the Wikipedia Main Page”, in Rebecca Ann Lind (Ed.), Race, Gender, Class and Media: Considering Diversity Across Audiences, Content, and Producers (6th Edition), New York: Routledge. [In Press]

Ferran-Ferrer, Núria; Centelles, Miquel; Fernández, Laura (2025). Wikipedia’s Front page ten years evolution: analysis of the representation of gender and intersectionalities on biographic content and its editorial policies. Online Journal of Communication and Media Technologies [In Press]

Macià, Yessica; Fernández, Laura; Ferran-Ferrer, Núria (2025). Editorial decision-making on Wikipedia: an analysis of gender bias and its impact on discoverability and information retrieval. Data Technologies and Applications [In Press]

Centelles, Miquel; Ferran-Ferrer, Núria (2024). Taxonomies and Ontologies in Wikipedia and Wikidata: An In-Depth Examination of Knowledge Organization Systems, Hipertext.net, 28, 33-48 https://doi.org/10.31009/hipertext.net.2024.i28.04

Centelles, Miquel; Ferran-Ferrer, Núria (2024). Assessing Knowledge Organization Systems from a gender perspective: Wikipedia Taxonomy and Wikidata Ontologies, Journal of Documentation, 80 (7), 124-147 https://doi.org/10.1108/JD-11-2023-0230

Outputs: Conferences

Ferran-Ferrer, Núria (2025). "Reptes i resultats dels projectes W&W, HerStory i Cover Women", Wiki GLAM Gender, 28 May, Universitat de Barcelona https://prezi.com/view/98NT3kwEj1eIRtDtacfv/

Bridges, Laurie; Ferran-Ferrer, Núria (2025). “Bottom-Up Organizing in Wikimedia Projects”, WikiKult network meeting, Wikimedia Germany, 25-26 June.

Ferran-Ferrer, Núria; Fernández, Laura; Centelles, Miquel (2024). “Behind the front page: A comparative gender gap study on Wikipedia’s main page through gatekeeping and agenda-setting theories”, 10th European Communication Conference, “Communication and social (dis)order”, ECREA ECC24-2025 Section Gender, Sexuality and Communication, Faculty of Social Sciences, University of Ljubljana, Slovenia, September 24-26.

Fernández, Laura; Ferran-Ferrer, Núria (2024). Navigating bias: gender and intersectional insights into Wikipedia’s front page through gatekeeping and agenda-setting frameworks, 11th WikiWorkshop, June 20.

Ferran-Ferrer, Núria; Fernández, Laura; Centelles, Miquel (2024). Cover women: a multilingual Wikipedia main page comparative proposal of gender and intersectionalities on content, newsroom guidelines, and insights from the community of volunteers, 11th WikiWorkshop, June 20.

Fernández, Laura; Ferran-Ferrer, Núria; Bejarano, Andrés; Centelles, Miquel (2024). "La bretxa de gènere a les portades de Wikipedia: desafiaments i oportunitats per a l’ètica periodística", III Jornada d’Ètica Periodística: (UB), Col·legi de Periodistes de Catalunya. https://www.youtube.com/watch?v=jpjdL7kfFJw (00:12:00) and https://www.youtube.com/watch?v=N6I3-vA-nzU (01:23:00)

Salse, Marina; Centelles, Miquel; Ferran-Ferrer, Núria (2025). Dones de portada: estudi descriptiu sobre els biaixos de gènere i la interseccionalitat a la Viquipèdia, II Congrés Càtedra UB de Perspectiva de Gènere i Feminismes Ciutat de Cornellà, 10-12 December 2025, Citilab de Cornellà, Spain.

Ferran-Ferrer, Núria; Kuggler, Francisco; Fernández, Laura; Centelles, Miquel (2025). “The effects of outlier data (about gender and intersectionalities) in Wikidata on Wikipedia’s main page: Results of the ‘Cover Women’ project”, Wiki Library Convention, January, COLMEX, Mexico.

Outputs with students: Revision of editorial/curatorial practices of the Main Page

Students of the Master of Research in Communication and Diversity, Universitat de Barcelona: https://meta.wikimedia.org/wiki/Research:Cover_Women/Outputs

Outputs with students: Master's theses

Students from the Master in Digital Humanities, Universitat de Barcelona:

Montiel, Katia (2025). Análisis de la representación de las identidades de género minorizadas en los contenidos de Wikipedia y de la percepción de personas LGTBIQ+ sobre la inclusión.

Serra-Gil, Aisa (2024). “Diversitat i representació de biografies a les portades de Wikipedia. Anàlisis interseccional dels articles de persones de l’edició en anglès”. Núria Ferran and Miquel Centelles (direcció), Treball Final de Master d’Humanidades Digitals, Universitat de Barcelona, http://hdl.handle.net/2445/215691

Outputs: Hands-on

Senabre, Enric; Ferran-Ferrer, Núria; Fernández, Laura (2024). Guia d'edició de la portada de la Viquipèdia amb un bot, Viquitrobada, Cornellà de Llobregat. https://ca.wikipedia.org/wiki/Viquiprojecte:Viquitrobada_2024/Programa/Resums#12:00_Guia_d'edici%C3%B3_de_portada_amb_bot:_Primera_versi%C3%B3_d'un_exercici_de_millora_especulatiu

Resources

More information:

Taiwanese translation: https://kids.twreporter.org/article/wikipedia-gender-equality-ch