Hi. My name is Martin. I joined the Wikimedia Foundation in September 2019 as a Research Scientist in the Research team. My background is in physics, where I worked on understanding the dynamics of complex social systems. I currently live and work (mostly) in Berlin, Germany.
My work focuses on Knowledge Gaps, with the aim of understanding and addressing structural inequalities in Wikimedia projects and the online ecosystem more generally. I have been contributing to this program in three main areas: i) understanding readers and how they navigate Wikipedia; ii) developing models for structured tasks to make it easier for newcomer editors to contribute; and iii) developing models to reliably assess the readability of content in Wikipedia. For more details about ongoing and past projects, see below.
A collection of things that I have been working on.
Some of the tools that I have developed or helped to develop.
- List-building models: This tool builds a list of articles related to a "seed" article based on various models.
- WikiNav: This tool provides insights into how readers of Wikipedia explore content when learning about a given topic, using the clickstream dataset. See also our post on the Wikimedia Tech Blog.
- Readability tool: This tool provides scores for an article's readability in different languages (under development).
- Wiki-Visibility tool: This tool provides recommendations to increase the visibility of orphan articles.
Some Python packages that make it easier to work with Wikimedia data:
- mwparserfromhtml: parsing Wikipedia HTML (Parsoid output). See also our post on the Wikimedia Tech Blog.
- mwtokenizer: word / sentence tokenization for text in (almost) all languages in Wikipedia
- Research:Improving multilingual support for link recommendation model for add-a-link task
- Research:Multilingual Readability Research
- Research:Develop a model for text simplification to improve readability of Wikipedia articles
- Research:Recommending links to increase visibility of articles
- Research:Understanding Curious and Critical Readers
- Research:Characterizing Readers Navigation
- Copyediting as a Structured Task
- Link recommendation
- Language-agnostic list-building for ad-hoc topic modeling
- Developing metrics for content gaps in the taxonomy of knowledge gaps
- Metrics for quantifying gender content gaps
- List of covid-related articles based on reader sessions (reader interest)
- New user reading patterns
- Usage of talk pages
- Akhil Arora, Robert West, Martin Gerlach. 2023. Orphan Articles: The Dark Matter of Wikipedia. https://arxiv.org/abs/2306.03940
- Tiziano Piccardi, Martin Gerlach, Robert West. 2023. Curious Rhythms: Temporal Regularities of Wikipedia Consumption. https://arxiv.org/abs/2305.09497
- Tiziano Piccardi, Martin Gerlach, Akhil Arora, and Robert West. 2023. A Large-Scale Characterization of How Readers Browse Wikipedia. ACM Transactions on the Web. https://arxiv.org/abs/2112.11848
- Tiziano Piccardi, Martin Gerlach, Robert West. 2022. Going Down the Rabbit Hole: Characterizing the Long Tail of Wikipedia Reading Sessions. WikiWorkshop 2022: In Companion Proceedings of The Web Conference 2022 (WWW '22). https://arxiv.org/abs/2203.06932
- Akhil Arora, Martin Gerlach, Tiziano Piccardi, Alberto García-Durán, Robert West. 2022. Wikipedia Reader Navigation: When Synthetic Data Is Enough. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (WSDM '22). https://arxiv.org/abs/2201.00812
- Martin Gerlach, Marshall Miller, Rita Ho, Kosta Harlan, Djellel Difallah. 2021. A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach. 30th ACM International Conference on Information and Knowledge Management (CIKM '21). https://arxiv.org/abs/2105.15110
- Isaac Johnson, Martin Gerlach, Diego Sáez-Trumper. 2021. Language-agnostic Topic Classification for Wikipedia. WikiWorkshop 2021: In Companion Proceedings of The Web Conference 2021 (WWW '21). https://arxiv.org/abs/2103.00068
- Miriam Redi, Martin Gerlach, Isaac Johnson, Jonathan Morgan, Leila Zia. 2021. A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft). https://arxiv.org/abs/2008.12314
- 2023-02: Blog post titled "From hell to HTML: releasing a Python package to easily work with Wikimedia HTML dumps" on the Wikimedia Tech Blog with Isaac Johnson and Nazia Tasnim
- 2021-09: Blog post titled "Analyzing the Wikipedia clickstream just got easier with WikiNav" on the Wikimedia Tech Blog with Muniza A. and Isaac Johnson (WMF)
- 2021-09: Blog post titled "World Suicide Prevention Day and the opportunity to increase access to mental health information on Wikimedia projects" on the Diff blog with Cristina Butoiu (WMF) and Leighanna Mixter (WMF)
- 2023-06: Invited talk at the Computational Social Science Seminar at Centre Marc Bloch on "Going down the rabbit hole: Understanding information seeking in Wikipedia"
- 2023-02: Presentation at FOSDEM on "Building open tools to support research on Wikimedia projects"
- 2022-04: Presentation at the Wikimedia Foundation April 2022 Staff meeting on "5 Learning from Research on Reader Navigation"
- 2021-08: Lecture on "Editing with machine learning: a case study on link recommendations" at Wikimania 2021 with Kosta Harlan (WMF), Marshall Miller (WMF), Rita Ho (WMF), and Morten Warncke-Wang (WMF)
- 2021-08: Workshop on "Indicators for the Wikimedia Projects" at Wikimania 2021 with Marc Miquel, Pablo Aragón (WMF), Miriam Redi (WMF), David Laniado, and Cristian Consonni
- 2021-06: Keynote on "The science of knowledge equity at the Wikimedia Foundation" at the PCNet21 satellite workshop on political communication networks, part of Networks 21: A Joint Sunbelt and NetSci Conference