Grants:Project/WCDO/Culture Gap Monthly Monitoring/Timeline


This project is funded by a Project Grant.



Timeline for WCDO

Timeline                      Date
Publish the Midpoint Report   15 August 2019
Publish the Final Report      30 January 2020


Monthly updates

March 2019

  • We debugged the process of collecting Cultural Context Content (CCC).
  • We participated in the DHASA (Digital Humanities Association of Southern Africa) edit-a-thon organized by DNdubane_(WMF) at the University of Pretoria on March 27th, giving a short online presentation on the Wikipedia Cultural Diversity Observatory project.
  • We started creating the lists of Top CCC articles on several topics (folk, monuments, earth, music creations and organizations, sports and teams, food, paintings, glam, books, clothing and fashion, and industry).
  • We adapted the project meta site (https://meta.wikimedia.org/wiki/Wikipedia_Cultural_Diversity_Observatory) for the new phase.
  • We located several databases (e.g. Ethnologue, WALS) covering all the world's languages and studied how the languages overlap in the territories where they are spoken, in order to detect marginalized languages.
  • We prepared the organizational documents, spreadsheets, and code needed to tackle the project's new research and development phase.

April 2019

  • We finished creating the lists of Top CCC articles on several topics (folk, monuments, earth, music creations and organizations, sports and teams, food, paintings, glam, books, clothing and fashion, and industry).
  • We created a language territories database (languages_territories.db) extending the file Wikipedia_language_territories_mapping_quality.csv and other files. It covers the more than 6,000 languages spoken in the world and records how they overlap within the same territories.
  • We started writing a paper about editor participation in Cultural Context Content, explaining how important representing the local context is for a Wikipedia to function well.
  • We studied different ways to promote cultural diversity within the Wikimedia movement and wrote a document about a “Cultural Diversity Maturity Model” for communities.
  • We presented the WCDO project at the Seminario DigiDoc abril 2019 (Universitat Pompeu Fabra, Barcelona, Catalonia) as “Wikipedia Cultural Diversity Observatory: un caso de aplicación práctica del análisis de datos para mejorar la diversidad cultural en la Wikipedia” (slides in Commons).
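The territory overlap mentioned for languages_territories.db above can be illustrated with a minimal sketch. It is not the project's code: the function name, the input shape (a mapping from language to the set of territories where it is spoken), and the toy identifiers are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def territory_overlaps(language_territories):
    """Map each pair of languages to the territories they share.

    `language_territories` is a hypothetical dict: language -> set of territories.
    """
    # Invert the mapping: territory -> languages spoken there.
    by_territory = defaultdict(set)
    for language, territories in language_territories.items():
        for territory in territories:
            by_territory[territory].add(language)

    # Every pair of languages sharing a territory overlaps in that territory.
    overlaps = defaultdict(set)
    for territory, languages in by_territory.items():
        for first, second in combinations(sorted(languages), 2):
            overlaps[(first, second)].add(territory)
    return dict(overlaps)


# Toy data with placeholder names (not taken from the real database).
example = {
    "lang_a": {"territory_1", "territory_2"},
    "lang_b": {"territory_2"},
    "lang_c": {"territory_3"},
}
print(territory_overlaps(example))
```

Pairs of languages that share no territory do not appear in the result, so the output lists only actual coexistence cases.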

May 2019

June 2019

  • We released the CCC dataset publicly for the research community and presented it at the ICWSM conference in Munich, June 11-13th (Program). Reference: Miquel-Ribé, M., & Laniado, D. (2019). Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. Proceedings of the 13th International AAAI Conference on Web and Social Media (pdf). ICWSM. ACM.
  • We analyzed the world's languages according to their geographical extent, social status, and number of speakers in order to determine both where languages coexist in a territory and where a language is marginalized.
  • We hit a bottleneck with the MySQL database replicas and had to rewrite the functions in several different ways to get them to work.
  • Marc has been contributing to the Diversity Working Group with recommendations aimed at expanding the horizons of the observatory and addressing other problems in the diversity area.
  • We received feedback and made extensive changes and edits to the chapter for “Wikipedia@20” and participated in the reviewing process of other chapters.

July 2019

  • We attended the Wikimedia conference Celtic Knot and presented “Languages Matter to Cultural Diversity: Finding Missing Languages and Bridging the Gaps in Minority Languages” (slides).
  • We designed the database and main code for the monthly analysis (stats_generation.py).
  • We contacted the Wikitech and Analytics teams to consult them about the bottleneck and started rewriting the code of the whole WCDO framework so that it uses the SQL dumps (those corresponding to the replica tables).
  • We finished computing the “Missing CCC” dataset: for every language, it contains the articles that should exist because they belong to that language's local context but exist only in a language of higher status (e.g. articles on Uganda that do not exist in Luganda Wikipedia but exist in English Wikipedia).
  • Marc has been contributing to the Diversity Working Group by writing recommendations and joining weekly calls, and has met some members of the WG in Rome.
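The “Missing CCC” idea described above can be sketched as a set comparison. This is only an illustration under stated assumptions, not the WCDO implementation: the function name, the use of plain sets, and the placeholder item identifiers are all hypothetical.

```python
def missing_ccc(ccc_items, local_articles, other_articles):
    """Return CCC items with no local article that do exist in another edition.

    All three arguments are hypothetical sets of item identifiers:
    - ccc_items: items belonging to the language's cultural context
    - local_articles: items that have an article in the local edition
    - other_articles: items that have an article in a higher-status edition
    """
    return sorted(
        item
        for item in ccc_items
        if item not in local_articles and item in other_articles
    )


# Toy data with placeholder identifiers (not real Wikidata items).
luganda_ccc = {"item_a", "item_b", "item_c"}
luganda_articles = {"item_a"}
english_articles = {"item_a", "item_b", "item_x"}

print(missing_ccc(luganda_ccc, luganda_articles, english_articles))
```

In this toy case only item_b is reported: item_a already has a local article, and item_c has no article in the other edition either.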

August 2019

  • We attended meetings to offer help to projects such as GLOW and Intercultur, among others.

September 2019

  • We generated a new version of the CCC dataset, although it took three times longer than before and some features are still unavailable (those based on the revision table: number of editors, number of edits, etc.).
  • Together with Alex Stinton and Satdeep, we explored the possibility of uploading the Top CCC lists to Wikidata by creating new properties.
  • We made some specific analyses for the Arabic and Egyptian Arabic Wikipedias in order to prepare the presentation for the WikiArabia conference.
  • We started coding some new data visualizations (topical analyses) that are not available yet.

October 2019

November 2019

December 2019

