Grants:Project/WCDO/Culture Gap Monthly Monitoring/Timeline
Timeline for WCDO
|Publish the Midpoint Report||15 August 2019|
|Publish the Final Report||30 January 2020|
- We debugged the process of collecting Cultural Context Content (CCC).
- We participated in the DHASA (Digital Humanities Association of Southern Africa) edit-a-thon, organized by DNdubane_(WMF) at the University of Pretoria, with a short online presentation of the Wikipedia Cultural Diversity Observatory project (March 27th).
- We started creating the lists of Top CCC articles on several topics (folk, monuments, earth, music creations and organizations, sports and teams, food, paintings, glam, books, clothing and fashion, and industry).
- We adapted the project meta site (https://meta.wikimedia.org/wiki/Wikipedia_Cultural_Diversity_Observatory) for the new phase.
- We located several databases (e.g. Ethnologue, WALS) covering all the world's languages and studied how languages overlap in the territories where they are spoken, in order to detect marginalized languages.
- We prepared the organizational documents, spreadsheets, and code needed to tackle the new research and development phase of the project.
- We finished creating the lists of Top CCC articles on several topics (folk, monuments, earth, music creations and organizations, sports and teams, food, paintings, glam, books, clothing and fashion, and industry).
- We created a language territories database (languages_territories.db) extending the file Wikipedia_language_territories_mapping_quality.csv and other files. It covers the more than 6,000 languages spoken in the world and records how they overlap within the same territories.
- We started writing a paper about editor participation in Cultural Context Content, to explain how important representing the local context is for a Wikipedia to function well.
- We studied different ways to promote cultural diversity within the Wikimedia movement and wrote a document about a “Cultural Diversity Maturity Model” for communities.
- We presented the WCDO project at the Seminario DigiDoc abril 2019 (Universitat Pompeu Fabra, Barcelona, Catalonia) as “Wikipedia Cultural Diversity Observatory: un caso de aplicación práctica del análisis de datos para mejorar la diversidad cultural en la Wikipedia” (in English: “Wikipedia Cultural Diversity Observatory: a practical case of applying data analysis to improve cultural diversity in Wikipedia”; slides in Commons).
- We made a first version of the code to retrieve, store and process the data related to a) editors, b) images and c) missing CCC.
- We wrote and submitted a chapter named “The Sum of Human Knowledge? Not in One Wikipedia Language Edition” for the book “Wikipedia@20”.
- Marc joined the Diversity Working Group of the 2030 Strategy and became a representative of the group at Wikimania.
- We released the CCC dataset publicly for the research community and presented it at the ICWSM conference in Munich, June 11-13th (Program). Reference: Miquel-Ribé, M., & Laniado, D. (2019). Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. Proceedings of the 13th International AAAI Conference on Web and Social Media (pdf). ICWSM. ACM.
- We analyzed the world's languages by geographical extent, social status and number of speakers in order to determine both their coexistence in a territory and situations of language marginalization.
- We hit a bottleneck with the MySQL database replicas and had to rewrite the functions in several different ways to make them work.
- Marc has been contributing to the Diversity Working Group with recommendations aimed at expanding the horizons of the observatory and addressing other issues in the diversity area.
- We received feedback and made extensive changes and edits to the chapter for “Wikipedia@20” and participated in the reviewing process of other chapters.
- We attended the Wikimedia conference Celtic Knot and presented “Languages Matter to Cultural Diversity: Finding Missing Languages and Bridging the Gaps in Minority Languages” (slides).
- We designed the database and main code for the monthly analysis (stats_generation.py).
- We contacted the Wikitech and Analytics teams to consult them about the bottleneck and started rewriting the code of the whole WCDO framework to use the SQL dumps (those corresponding to the replica tables).
- We finished computing the “Missing CCC” dataset: for every language, it contains the articles that should exist because they belong to its local context but exist instead in a language of higher status (e.g. articles on Uganda that do not exist in Luganda Wikipedia but exist in English Wikipedia).
- Marc has been contributing to the Diversity Working Group by writing recommendations and joining the weekly calls, and met some members of the WG in Rome.
- We finished the coding of the stats (stats_generation.py) and partially debugged it (these stats are explained in the file sets_intersections.xls).
- We measured the language gap in geolocated articles to evaluate the impact of Wikimania 2018 on the creation of geolocated articles in Africa.
- We created the interface to retrieve Missing CCC articles (those about the local content of a language that do not exist in that language edition but do exist in larger ones).
- We attended the Wikimania conference and disseminated the work of the past months with 4 talks and 2 posters about the Cultural Diversity Observatory.
- Poster: Wikipedia Cultural Diversity Dataset: helping editors to enrich cross-language coverage. This poster explained the dataset.
- Poster: Maturity Levels for Cultural Diversity in Wikipedia Language Communities. This poster explained the different levels.
- Diversity Talk: Wikipedia Cultural Diversity Observatory (WCDO): Empowering Communities to Bridge the Culture Content Gaps. This presentation explained the current state of the project with its new Missing CCC lists and also warned about the lack of impact of Wikimania 2018 in bridging the African content gap (pdf slides and video).
- Language Talk: Minoritized Languages and Missing Languages in Wikipedia: An Opportunity to Increase Cultural Diversity in Wikipedia. This presentation explained that making Wikipedia more culturally diverse requires adding more languages (it proposed a method to select them) and helping minoritized languages create their content (it suggested a method to propose new articles) (pdf slides).
- Readership Talk: Increasing Wikipedia Readership By Creating Local Content In Language Editions. This presentation explained that local content is vital in order to increase a language edition readership and gave some numerical reasons (pdf slides).
- Research Talk: Cultural Diversity Funnels: A Metaphor To Study Wikipedia Communities and Knowledge Gaps. This presentation explained that there exist different barriers that stop cultural diversity representation and proposed the metaphor of a funnel in order to depict it.
- We attended meetings to offer help to projects such as GLOW and Intercultur, among others.
- We generated a new version of the CCC dataset, although it took three times longer than before and some features are still unavailable (those based on the revision table: number of editors, number of edits, etc.).
- We have explored the possibility of uploading the Top CCC lists to Wikidata by creating new properties with Alex Stinton and Satdeep.
- We made some specific analyses for the Arabic and Egyptian Arabic Wikipedias in order to prepare the presentation for the WikiArabia conference.
- We started coding some new data visualizations (topical analyses) that are not available yet.
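The idea behind the “Missing CCC” dataset mentioned above can be sketched as a set operation: take the articles that belong to a language's cultural context, keep those that exist in a higher-status language edition, and drop those that already exist in the language itself. The snippet below is only a minimal illustration under assumptions: the function name `missing_ccc`, its parameters and the item identifiers are hypothetical, and it is not the actual WCDO code, which works on the database and dump data described in the timeline.

```python
def missing_ccc(ccc_items, in_target_edition, in_reference_edition):
    """Return CCC items that exist in the reference edition but not in the target one.

    All three arguments are sets of article identifiers (hypothetical here).
    """
    # Keep only the CCC items that the reference (higher-status) edition covers,
    # then remove everything the target edition already has.
    return sorted((ccc_items & in_reference_edition) - in_target_edition)


# Made-up identifiers, for illustration only.
luganda_ccc = {"Q1", "Q2", "Q3", "Q4"}      # items in the Luganda cultural context
in_luganda_wiki = {"Q1"}                    # already written in Luganda Wikipedia
in_english_wiki = {"Q1", "Q2", "Q3", "Q9"}  # available in English Wikipedia

print(missing_ccc(luganda_ccc, in_luganda_wiki, in_english_wiki))
```

In this toy example, Q4 is not reported because no reference edition covers it, and Q9 is ignored because it is not part of the Luganda context.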