NLP for Wikipedia (ACL 2025)/Program
2nd WikiNLP: Advancing Natural Language Processing for Wikipedia
Co-located with ACL 2025. Room 2.31
Attending
Registration
Workshop registration is handled through the main ACL conference. Details may be found here: https://2025.aclweb.org/
Invited speakers
- Keynote: Monica Lam (Stanford)
- Title: Accelerating Knowledge Discovery with LLM-Based Research Assistants
- Abstract: Advances in LLMs have the potential to greatly accelerate the discovery of knowledge. By methodically automating cognitive processes such as reading, retrieving structured and unstructured data, and pre-writing, we have developed a research assistant that helps historians, journalists, and consumers perform research on general topics. Research on specialized topics requires a more sophisticated knowledge discovery pipeline on sets of documents. This talk presents an overview of the research, the impact of the research, and lessons learned.
- Keynote: Matthias Gallé (Cohere)
- Title: Multilingual Research in 2025: Deeper & Broader
- Abstract: Like many other branches of NLP, research in multilinguality has undergone an 'existential crisis' with the advent of powerful LLMs, which seem to throw entire research areas into irrelevancy. What are the challenges for multilinguality in 2025? This talk argues that there are two main areas we should focus on. On one side, multilingual models still need to get substantially better: in any language-parallel benchmark, non-English performance trails English performance, betraying the benefit of multi-task learning we were promised. Similarly, different languages have challenges that are not present when focusing on English only, and we will review a few of those. On the other side, the promise of universality is far from reached: many languages lack support in frontier models, and the curse of multilinguality appears to be a blocking hurdle.
- Discussion with a Wikimedian: Ciell
Program
The schedule below is subject to change. Note that the poster session time changed on 31 July.
- 09:00 – 09:05: Opening
- 09:05 – 10:00: Keynote: Monica Lam, "Accelerating Knowledge Discovery with LLM-Based Research Assistants"
- 10:00 – 10:30: Dataset Panel
- 10:30 – 11:00: Coffee Break & Playback of Recorded Videos
- 11:00 – 12:00: Keynote: Matthias Gallé, "Multilingual Research in 2025: Deeper & Broader"
- 12:00 – 14:00: Lunch
- 13:15 – 14:00: Informal Conversation about the Checklist (during lunch)
- 14:00 – 15:00: Poster Session
- 15:00 – 16:00: Wikimedian Session: Ciell
- 16:00 – 16:30: Closing
Accepted papers
Track 2: Datasets (archival)
- Wikivecs: A Fully Reproducible Vectorization of Multilingual Wikipedia
- WETBench: A Benchmark for Detecting Task-Specific Machine-Generated Text on Wikipedia
- Proper Noun Diacritization for Arabic Wikipedia: A Benchmark Dataset
Track 3: Ongoing or published work on NLP for Wikimedia (non-archival)
- LLMaEL: Large Language Models are Good Context Augmenters for Entity Linking
- Toxic comments are associated with reduced activity of volunteer editors on Wikipedia
- Women, Infamous, and Exotic Beings: What Honorific Usages in Wikipedia Reflect on the Cross-Cultural Sociolinguistic Norms?
- Mind the Gap: Assessing Wiktionary’s Crowd-Sourced Linguistic Knowledge on Morphological Gaps in Two Related Languages
- Characterizing Knowledge Manipulation in a Russian Wikipedia Fork
- Can LLMs Improve Image Accessibility on Wikipedia?
- Detecting Sockpuppetry on Wikipedia Using Meta-Learning
- Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia
- NEWELL: Never-Ending Wikipedia Editing with LLM Agents
- Wikipedia in the Era of LLMs: Evolution and Risks
- WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts (ACL Main/Findings Track)
- Hierarchical Memory Organization for Wikipedia Generation (ACL Main/Findings Track)
- EvoWiki: Evaluating LLMs on Evolving Knowledge (ACL Main/Findings Track)