
Wikimedia CEE Meeting 2025/Submissions/Wikidata-Wikimedia's knowledge graph in a world of Gen AI

From Meta, a Wikimedia project coordination wiki
ID: 105
Title: Wikidata-Wikimedia's Knowledge Graph in a World of Gen AI
Author(s): Alan Ang, Senior Partner Manager, Wikidata
Username(s): Alan Ang (WMDE)
Type of submission: Lecture
Affiliation: Wikimedia Deutschland
Theme(s): Technology
Abstract:

Wikidata is the world’s largest collaboratively created open knowledge graph, providing structured, multilingual, and standardized data for a wide range of applications. Since Wikimedia Deutschland launched it in 2012, Wikidata has grown to over 118 million entities. It is supported by more than 24,000 monthly contributors and is available in more than 300 languages. Unlike Wikipedia’s natural-language articles, Wikidata’s structured format can be read directly by computers. This makes it a foundation for applications, data analysis, and digital enrichment across diverse domains.

The rapid rise of large language models (LLMs) and generative AI, however, poses new challenges. These systems typically rely on unstructured text and deep semantic representations, which makes structured data less directly accessible to them. To bridge this gap, Wikimedia Deutschland launched the Wikidata Embedding Project in partnership with Jina.AI and DataStax. By generating vector-based representations (embeddings) of Wikidata entities, the project enables semantic search and makes Wikidata easier to integrate into AI pipelines.
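Semantic search over embeddings generally works by ranking items according to how close their vectors are to a query vector. The following is a minimal, hypothetical sketch of that idea using cosine similarity over hand-made toy vectors. The vectors, labels, and the function names `cosine_similarity` and `semantic_search` are assumptions for illustration only. They are not the Wikidata Embedding Project's actual models, data, or API.

```python
import math

# Toy, hand-made 3-dimensional "embeddings" (hypothetical values, NOT output
# of any real model). Real embedding models produce much longer vectors.
ENTITY_VECTORS: dict[str, list[float]] = {
    "Berlin (capital of Germany)": [0.9, 0.1, 0.0],
    "Paris (capital of France)": [0.8, 0.2, 0.1],
    "photosynthesis (biological process)": [0.0, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector: list[float], top_k: int = 2) -> list[tuple[str, float]]:
    """Hypothetical helper: rank entities by cosine similarity, highest first."""
    scored = [
        (label, cosine_similarity(query_vector, vec))
        for label, vec in ENTITY_VECTORS.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical vector standing in for a query like "European capital cities".
    query = [0.85, 0.15, 0.05]
    for label, score in semantic_search(query):
        print(f"{score:.3f}  {label}")
```

With this toy query, the two capital cities score far above the unrelated biology concept, even though no keyword is shared. In practice, the vectors would come from an embedding model and usually live in a vector database, because a linear scan like this one only suits small collections.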

All models and tools are released openly, enabling developers and researchers to build AI systems that inherit Wikidata’s principles of inclusivity, transparency, and community collaboration. Because the project supports many languages, it broadens global access, and it encourages the Wikidata community to contribute actively and improve data quality. Ultimately, this initiative strengthens the connection between structured knowledge and advanced AI, opening new opportunities for research, applications, and participation worldwide.

Slides:
Wikidata-Wikimedia's knowledge graph in a world of generative AI (2025)
Level of advancement: Basic
Special requirements:
Recording (Yes/No): Yes
Photography (Yes/No): Yes
How will this session be beneficial for the communities in the region of CEE?

Participants will gain practical insights into how the Wikidata Embedding Project can power AI and data-driven applications, and learn how new embedding techniques make structured knowledge accessible to LLMs and generative AI. They will leave better equipped to build more powerful, multilingual, and community-driven AI systems.


Interested participants


(register below and ask the session organiser your questions now)

Documentation
