
Research:Cover Women

From Meta, a Wikimedia project coordination wiki
Created
1 June
Contact
Associate Professor at FIMA, Universitat de Barcelona
Collaborators
Miquel Centelles, Laura Fernández
Duration:  2024-06 – 2025-06
gender, front-page, gate keeping, intersectionalities, diversity

This page documents a planned research project.
Information may be incomplete and change before the project starts.


This proposal presents a research project that will examine the most popular Wikipedia page. This page, known as the main page, or front page from a communication perspective, will be analysed across the seven longest-standing Wikipedia editions: English, German, Catalan, French, Portuguese, Italian, and Spanish. Grounded in a gender and intersectional perspective, this study will delve into the daily content, newsroom guidelines (principles and standards that guide the dissemination of information), and volunteer community insights. The examination will employ communication theories like gatekeeping and agenda-setting. Beyond academic research, our goal is to actively contribute to editing communities by addressing the daily challenges and needs in crafting front-page content.

Introduction

Despite being a key player in the public sphere and having a transformative impact on information dissemination, Wikipedia grapples with persistent gender bias in both editing and content (Antin et al., 2011; Bear and Collier, 2016; Wagner et al., 2016; Hinnosaar, 2019; Minguillón et al., 2021; Ferran-Ferrer, Boté-Vericad, et al., 2023) alongside additional prejudices (Redi et al., 2021; Beytía et al., 2022). Bias in contributions perpetuates imbalances in content coverage and discourages diversity, which further exacerbates the issue (Worku et al., 2020). Scholars highlight the need for a comprehensive understanding of Wikipedia's knowledge production culture to address these biases and make Wikipedia more robust, reliable, and transparent (Menking and Erickson, 2015). Reducing gender and other intersectional biases requires more than acknowledging Wikipedia as a mirror of societal biases; it involves addressing the platform's deeper logic embedded in its techno-scientific project (Ford and Wajcman, 2017). We have selected the most popular Wikipedia page for analysis. This page, commonly referred to as the main page, or front page from a communication perspective, is accessible in all language editions of the global encyclopedia. We will investigate possible gender and intersectional bias in its daily content, in its newsroom guidelines (principles and standards that govern the dissemination of information), and in the insights from the volunteer community who decide which information gets disseminated to the public on the main page. This research will draw on communication theories such as gatekeeping, which examines the process by which information is filtered, selected, and ultimately presented to the public (Barzilai-Nahon, 2009), and agenda-setting (McCombs and Shaw, 1972), which studies the effect of that selection.
Therefore, the research questions that we address are:
● Research Question 1 (RQ1): What insights do interviews with volunteer gatekeepers (editors of the main page of Wikipedia) provide on decision-making, biases, and strategies affecting the visibility of gender and intersectionality-related content on Wikipedia's front page, particularly regarding how their preferences and interests shape the topics featured?
● RQ2: How does gatekeeping impact gender gaps in content representation on digital platforms, specifically in the peer production of knowledge (the decision-making system on what content is suitable and what is not) within newsrooms or editorial policies, and why is understanding this phenomenon crucial for addressing gender disparities?
● RQ3: How does agenda setting influence the selection of frames and sentiment adopted by Wikipedia pages concerning specific issues or events, and how does it shape the focus and intensity of user edit activity within Wikipedia?
● RQ4: How prevalent is gender and intersectional bias in the content featured on Wikipedia's front pages?
This research is necessary to draw further attention to the need for systemic change within the platform's newsroom/editorial practices to address disparities in gender and diversity representation in online knowledge and foster a more inclusive and diverse digital information landscape.

Related work

To test the feasibility of this proposal across seven language editions of Wikipedia, we have already conducted a micro-project with a sample of the English and Spanish Wikipedia to assess the viability of the global project. Specifically, we looked at three points:
a) Whether there are open and formalized recommendations and guidelines that determine which contents are published on the main page, and whether the publication criteria can be analysed.
b) At the same time, we were interested in seeing whether, with data wrangling techniques, we could work with the biographies published on all Wikipedia main pages and analyse them from a gender and intersectional perspective using the properties of Wikidata.
c) Finally, we highlight the ease of contacting the community that performs gatekeeping tasks, and we began to prepare the relevant questions to understand the decision-making process and editorial practices, and to identify the issues that may be relevant to understanding the phenomenon.

The results of this previous trial work, with two language editions, will be published soon (Ferran-Ferrer et al., 2024). The trend is not encouraging if we take into account that bias in contributions perpetuates imbalances in content coverage and discourages diversity, which further exacerbates the issue (Worku et al., 2020). To address this, scholars stress the importance of understanding Wikipedia's knowledge production culture to tackle its gender gap (Menking and Erickson, 2015). Addressing this issue requires delving into the foundational principles driving the platform's techno-scientific project (Ford and Wajcman, 2017; Geiger, 2017), necessitating the recognition and dismantling of exclusionary practices (Menking and Rosenberg, 2021). Communication theories like gatekeeping and agenda-setting provide valuable frameworks for understanding Wikipedia's potential biases. Gatekeeping theory, focusing on information filtering processes, is applied to scrutinize stories selected for the Front Page, which attracts millions of readers monthly (Barzilai-Nahon, 2009; Wikimedia, 2023). Gatekeeping theory has previously been applied to Wikipedia by researchers to further understand biases in content selection and presentation (Li and Farzan, 2020) and to advocate for a reorganization of online spaces to democratize content and encourage dialectical gatekeeping that could reduce racial and other disparities (Ezell, 2021). Additionally, drawing from agenda-setting theory, we examine how Wikipedia's main page influences viewers and shapes news hierarchy, including its agenda-building power (McCombs and Shaw, 1972; Ren and Xu, 2023). Agenda setting can impact the choices of frames and sentiment adopted by Wikipedia pages regarding a particular issue or event (Lee, 2018) and it can play a role in shaping the focus and intensity of user edit activity in Wikipedia (Mahabir et al., 2018).
This study goes beyond affirming Wikipedia's reflection of reality to delve into its systemic challenges (Ford and Wajcman, 2017). It analyses not only main page content selection but also newsroom guidelines, including interviews with gatekeepers, to enhance understanding and address systemic issues.

Methods

This research proposal outlines a study on gender representation and biases on Wikipedia's main page (or front page from a communication perspective), the most visited Wikipedia page, which got 46.8 billion visits last November on the English edition (Wikimedia, 2023). We will carry out a comparative analysis across the seven longest-standing Wikipedia editions (English, German, Catalan, French, Portuguese, Italian, and Spanish, all of them launched in 2001), employing a mixed-methods approach. Grounded in gender and intersectionality, the study will analyse daily content, editorial/newsroom guidelines, and insights from volunteer communities using communication theories like gatekeeping (Barzilai-Nahon, 2009) and agenda-setting (McCombs and Shaw, 1972). Our aim is not only academic research, but also active contribution to editing communities by addressing daily challenges in crafting front-page content. Therefore, in the project's work team, we have already included seven working groups of Wikipedia users involved in gender for each language edition and the chapters of all the Wikipedias analysed in this project (See Table 2). The first stage of the project will be:
a) To conduct a scoping review, a systematic literature review using the SALSA Framework (Grant and Booth, 2009) to analyse the academic publications from 2001 to 2024. This review will concentrate on examining Wikipedia within the framework of a communication ecosystem.
Then, we will employ a triangulation methodology.
b) In-depth interviews with voluntary editors of the front page from all seven Wikipedia editions to ascertain decision-making processes, biases, and strategies that influence content visibility related to gender and other intersectionalities. The interviews will be conducted in person or online and in the native languages of the volunteer participants. We plan to conduct around five interviews per language edition.
Contacts with the volunteers will be obtained through discussion pages related to editing the main page, as well as from user groups participating in the project, such as calls from the same chapters to their networks. The interview transcriptions will be coded and analysed using qualitative data analysis software, and a specific codebook will be generated to facilitate the coding. This methodological approach will address RQ1 and RQ3.
c) Newsroom guidelines: We will apply content analysis to the main-page (or front-page) editorial guidelines for each language edition, and we will explore what leads the decision-making of the gatekeepers who determine story prominence. The content of these guidelines will be coded and analysed using qualitative data analysis software, and a specific codebook will be generated to facilitate the coding. This research strategy will tackle RQ2. The qualitative analysis of agenda setting and gatekeeping practices (RQ1-3) will be conducted independently with two codebooks, one for the interviews and one for the editorial policies. However, each codebook will encode elements specific to gatekeeping and agenda setting to obtain evidence that corresponds to the theoretical framework.
d) Main-page content quantitative analysis: We will scrutinize the content (biographies) on the front page in each of the seven language editions for ten years, with data wrangling. To do so, first, we will identify the sections of the main page that are consistently present across all Wikipedias and are easily comparable. Wikipedia's front pages regularly feature changing content, offering a snapshot of current events, featured articles, and useful links. It is important to note that volunteers maintain these main pages, which may evolve in format and content over time. For each language edition, a unique method will be employed to retrieve the content and data of its main page from the past ten years, as the URLs of previous main pages cannot be obtained from the dumps.
Quantitative analysis will begin by scraping through the open-source tool OpenRefine to reconcile the URIs found in the sections of Wikipedia covers in both language editions. This process will enrich them with specific properties from Wikidata to obtain values of the properties selected for study, such as P21 (sex or gender), P106 (occupation), P172 (ethnic group), P103 (native language) and others. OpenRefine, utilized in various contexts and applications, is essential for this research as it enables the preparation and analysis of vast amounts of data. This method will respond to RQ4. Table 1 offers a comprehensive overview of the research proposal.
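
As an illustration of the enrichment step described above, the following is a minimal Python sketch. It is not the project's OpenRefine workflow: it assumes the front-page biographies have already been reconciled to Wikidata QIDs, and shows how the same four properties could be requested with a SPARQL query (relying on the `wd:` and `wdt:` prefixes predefined by the Wikidata Query Service) and how the returned rows could be tallied by P21. The function names `build_query` and `gender_counts`, the variable names, and the choice of SPARQL itself are assumptions made for this example.

```python
from collections import Counter

# Wikidata properties named in the proposal, mapped to
# hypothetical SPARQL variable names chosen for this sketch.
PROPERTIES = {
    "P21": "sex_or_gender",
    "P106": "occupation",
    "P172": "ethnic_group",
    "P103": "native_language",
}


def build_query(qids, properties=PROPERTIES):
    """Build a SPARQL query fetching the given properties for each QID.

    Every property is OPTIONAL, so items lacking a statement still
    appear in the results with that variable unbound.
    """
    values = " ".join(f"wd:{qid}" for qid in qids)
    variables = " ".join(f"?{name}" for name in properties.values())
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item wdt:{pid} ?{name} . }}"
        for pid, name in properties.items()
    )
    return (
        f"SELECT ?item {variables} WHERE {{\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"{optionals}\n"
        "}"
    )


def gender_counts(bindings):
    """Tally distinct (item, P21 value) pairs from SPARQL JSON bindings.

    Multi-valued properties yield several rows per item, so values are
    collected in a set per item before counting. Items without any
    P21 statement are counted under "unknown".
    """
    genders_by_item = {}
    for row in bindings:
        item = row["item"]["value"]
        genders = genders_by_item.setdefault(item, set())
        if "sex_or_gender" in row:
            # Keep only the trailing QID of the entity URI.
            genders.add(row["sex_or_gender"]["value"].rsplit("/", 1)[-1])
    counts = Counter()
    for genders in genders_by_item.values():
        if genders:
            counts.update(genders)
        else:
            counts["unknown"] += 1
    return counts
```

The query string would be sent to a SPARQL endpoint, and `gender_counts` would be applied to the `results.bindings` list of the JSON response. The network call is left out here so that the sketch stays self-contained.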

Expected output

The specific research outputs that we envision for our proposed project include, but are not limited to:
● Scientific publications: We will draft scientific publications for each research question and assess whether the approach should be comparative across all editions or whether it is better to separate them by smaller communities, editorial process typologies, etc. The best dissemination approach will be decided once the study is completed.
● The dataset emerging from RQ4 will be made available as downloadable dumps and will be accessible via public APIs and a SPARQL endpoint.
● Participation in at least these conferences:
○ Wiki Workshop
○ Wikimania
○ WikiWomenCamp
○ Each user group and chapter will participate in national or regional events with Cover Women results.
● Tools to support the editorial tasks of gatekeeping, namely:
○ Guidelines for content selection on front pages that are attuned to intersectionality and gender diversity;
○ Bots and AI assistants that facilitate the content selection process for front pages, with a focus on acknowledging intersectionality and gender differences.
Both tools will be developed with the collaborative environment and consensus-driven approach characteristic of Wikipedia in mind.
● Resources aimed at enhancing the archiving and curation of main page content across all Wikipedia editions outlined in this proposal.

Community impact plan

The project aligns with the Wikimedia Movement's 2030 strategy by focusing on delivering knowledge as a service and addressing equity in knowledge and communities overlooked by structures of power and privilege. Furthermore, the Cover Women project will involve a team of 5 researchers (professors from the University of Barcelona, one from the UOC, and a PhD student) with a multidisciplinary perspective, as we have individuals from the fields of communication, semantic web, digital humanities, and computer science. Additionally, this project proposal has been designed according to the needs of various activist groups regarding gender equality on Wikipedia, as well as with the boards of the chapters involved in each language edition. See Table 2 to anticipate the impact on the communities we will reach. These user groups are made up of Wikipedia users who work to achieve a better Wikipedia by introducing a gender perspective. Since we are working with 7 different editions of Wikipedia, we have considered that having a user group of female editors for each edition and a representation from each chapter's board would help achieve the project's objectives and meet the real needs of the communities. This project will provide:
a) Decade-long insights into gender and intersectionality content representation on the Wikipedias' front pages. (RQ4)
b) Beyond descriptive stats, we will reveal bias trends. (RQ4)
c) Editorial strategies for gatekeeping and agenda setting. (RQ1-3)
d) Guidelines for ethical content selection using AI and bots. (RQ3)
e) Technical guidance to enhance data archives on main pages. (RQ4)
f) Collaborative work with volunteers that ensures inclusivity, integrating advocate perspectives for a consensus-driven approach. (RQ1)
Built on in-depth interviews and stable and lasting collaboration with Wikipedia chapters and user groups, this work addresses gender identity under-representation.
We will utilize Wikipedia's consensus-based decision-making approach to address our research questions. This method prioritizes addressing the legitimate concerns of its editors and finding a middle ground, all while adhering to Wikipedia's established policies and guidelines. In this context, it is crucial to consider that consensus naturally evolves among editors as they make changes, the importance of quality arguments in determining consensus, the allowance for consensus to evolve based on new evidence, and the acknowledgment of decisions beyond the scope of editor consensus. This methodology underscores Wikipedia's emphasis on collaboration, incremental progress, and communal harmony in managing a large crowd-sourced encyclopedia.


Resources

More information:
https://openreview.net/attachment?id=dPPPDvAdRE&name=stage_two_submission
https://www.ub.edu/wikiwomen/