Grants:Programs/Wikimedia Community Fund/Rapid Fund/zelph:Wikidata Contradiction Detection and Constraint Integration (ID: 23553409)
Applicant details
- Main Wikimedia username. (required)
Acrion-dev
- Organization
N/A
- If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)
N/A
- Describe all relevant roles with the name of the group or organization and description of the role. (required)
Main proposal
- 1. State the title of your proposal. This will also be the Meta-Wiki page title.
zelph: Wikidata Contradiction Detection and Constraint Integration
- 2. and 3. Proposed start and end dates for the proposal.
2025-11-01 - 2025-12-31
- 4. What is your tech project about, and how do you plan to build the product?
Include the following points in your answer:
- Project goal and problem you solve
- Product strategy or project roadmap
- Technical approach (infrastructure, tech stack, key tools and services)
- Integrations or dependencies (if any)
Project goal and problem you solve: Wikidata is a cornerstone of the free knowledge ecosystem, but its vast scale makes it susceptible to hidden logical contradictions and inconsistencies. These issues can undermine data quality and propagate to other projects. The goal of this project is to use zelph, a semantic network system with a rule-based inference engine, to automatically detect these contradictions and surface them to Wikidata editors. By providing a clear, prioritized view of inconsistencies, the project helps the community improve the logical integrity of Wikidata's knowledge graph.
Product strategy or project roadmap: This project focuses on two foundational deliverables as the first step toward a deeper integration of zelph with Wikidata:
1. Property Constraint Converter: Develop a tool within zelph to automatically parse and convert Wikidata's property constraints into zelph's native inference rule format. This is a critical step to fully utilize Wikidata's own defined semantics for contradiction detection.
2. Reporting Dashboard: Create a static, publicly accessible web page on zelph.org that presents the detected contradictions. The report will be prioritized based on (a) the simplicity of the deduction chain leading to the contradiction and (b) the usage count of the affected Wikidata item, allowing editors to focus on the most impactful issues first.
This grant will lay the groundwork for future tools, such as the previously discussed real-time browser gadget, by first establishing a robust backend for contradiction analysis.
Technical approach (infrastructure, tech stack, key tools and services): The core of the project is zelph, a high-performance semantic network engine written in C++17 and built with CMake.
1. The Property Constraint Converter will be implemented by extending zelph's existing Wikidata JSON parser. It will read the constraint information for each property (P-entity) and generate corresponding rules in the zelph scripting language (.zph).
2. The Reporting Dashboard will be a static website generated directly by the zelph command-line application using its .run-md command. zelph writes the report as Markdown, which is then rendered to HTML by the existing MkDocs setup for zelph.org. This approach keeps maintenance overhead minimal and requires no dynamic backend or database.
The project will be hosted on my existing server infrastructure, which already hosts zelph.org. No additional infrastructure costs are required.
Integrations or dependencies: The project's primary dependency is the publicly available Wikidata JSON dump. The output (the Reporting Dashboard) will be tightly integrated with the Wikidata ecosystem by providing direct hyperlinks to the relevant Wikidata items and properties, enabling editors to quickly navigate to the source and resolve issues. No other external services are required.
- 5. What is the expected impact of your project, and how will you measure success?
Include the following points in your answer:
- Milestones and progress tracking
- Project impact and success metrics
Project impact and success metrics: The primary impact will be a measurable improvement in the data quality and logical consistency of Wikidata. By making contradictions visible and actionable, the project directly supports the work of Wikidata curators and contributes to the reliability of Wikidata as a source of structured data.
Success will be measured by the following:
1. Primary Metric: Successful delivery and public deployment of the two key deliverables: the functional Property Constraint Converter and the live, prioritized Reporting Dashboard.
2. Secondary Metric: The number of unique, high-priority contradictions identified and published on the dashboard upon completion of the initial analysis run. This will serve as a baseline for the tool's effectiveness.
- 6. Who is your target audience, and how have you confirmed there is demand for this project? How did you engage with the Wikimedia community?
Include the following points in your answer:
- Project demand and target audience description
- Links to interaction(s) with Wikimedia community
- Evidence from community consultation such as the [Community Wishlist]
Project demand and target audience description: The target audience consists of experienced Wikidata editors, data curators, and members of WikiProjects focused on data quality and consistency, such as WikiProject Ontology. These community members are actively working to maintain and improve the logical structure of Wikidata and will benefit directly from a tool that automates the detection of complex inconsistencies.
Links to interaction(s) with Wikimedia community:
Wikimedia Hackathon 2025: Participation and presentation in the official program.
Wikimania 2025 Poster: [1]
Wikidata WikiProject zelph: [2]
Evidence from community consultation: Demand for this project has been confirmed through sustained, proactive engagement with the Wikimedia technical community. I attended the Wikimedia Hackathon 2025 (May 1-5) at my own expense to present zelph and gather feedback.
During the hackathon, I:
(1) Presented zelph in an official speaking slot, explaining its architecture and potential for Wikidata.
(2) Held in-depth discussions with key community members and Wikimedia staff, including Mohammed Sadat, Adam Shorland, and Ollie Hyde.
(3) Engaged in direct technical collaboration, successfully resolving a GitHub issue for zelph that was filed by Ollie Hyde during the event.
This direct engagement demonstrates both community interest and the project's ability to integrate with community development workflows. To further broaden awareness, I created and submitted a poster for Wikimania 2025. As a direct result of these discussions and the project's positive reception, Lydia Pintscher (Portfolio Lead for Wikidata) recommended that I apply for a Rapid Fund grant. The WikiProject page was created to provide a central hub for these ongoing community efforts.
- 7. How will your team predict and manage potential user security and privacy risks, and what risks do you currently see?
Include the following points in your answer:
- The level of in-house or consulted security and privacy expertise you will have available to you during delivery of this project
- How your development, testing, and deployment processes mitigate the introduction of unnecessary security or privacy risks
This project presents minimal to no security or privacy risks.
- Data Source: The project exclusively processes the public Wikidata JSON dump, which contains no private user data.
- No User Input: The proposed tools do not accept or store any user-submitted data. There are no user accounts, login systems, or interactive forms that could expose personal information.
- Static Output: The Reporting Dashboard is a collection of static HTML files. This keeps the attack surface very small, as there is no server-side code execution, database, or API that could be exposed to common web attacks.
- No PII: The system does not handle, store, or display any Personally Identifiable Information (PII).
- The level of in-house or consulted security and privacy expertise: As the sole developer with over 20 years of experience in software engineering, I am well-versed in security best practices. Given the project's simple, data-processing nature, my expertise is sufficient to manage the negligible risks involved.
- How your development, testing, and deployment processes mitigate risks: The architecture mitigates risks by design. The C++ application runs in a controlled, offline environment to process the data dump. The only public-facing component is the static website, which eliminates entire classes of security vulnerabilities. This approach follows the principle of least privilege and minimizes the attack surface.
- 8. Who is on your team, and what is your experience?
Include the following points in your answer:
- Your experience as a developer, relevant past projects
- Wikimedia SUL (developer), Gerrit, Github, Gitlab or other relevant public account handles
- Other team members, their roles and expertise
Your experience as a developer, relevant past projects: I am Stefan Zipproth, the sole developer for this project. I am the creator and maintainer of zelph. I have extensive experience in C++ development, semantic networks, and processing large-scale datasets. A key demonstration of my capability is that zelph can already process the entire 1.4 TB Wikidata JSON dump on a single machine, which is a non-trivial engineering challenge. My community engagement includes presenting zelph at the Wikimedia Hackathon 2025, demonstrating my ability to communicate complex technical topics to a developer audience.
Wikimedia SUL (developer), Gerrit, Github, Gitlab or other relevant public account handles:
GitHub: [3]
Other team members, their roles and expertise: This is a solo project.
- 9. How will the project be maintained long-term?
Include the long-term maintenance plan with maintainer(s) in your answer. If you expect the long-term maintenance to incur expenses, please list those and the plan for long-term expense coverage.
zelph is a long-term personal project driven by my passion for semantic technology and open knowledge. I am fully committed to its maintenance and future development beyond the scope of this grant.
- Maintainer: I, Stefan Zipproth, will be the long-term maintainer.
- Maintenance Plan: The deliverables are designed for near-zero maintenance. The Property Constraint Converter will be integrated into the main zelph codebase and maintained as part of the overall project. The Reporting Dashboard is a static site that is regenerated when a new Wikidata dump is processed; it requires no active maintenance.
- Long-term Expenses: There are no anticipated long-term expenses. The project is hosted on my existing infrastructure, which I maintain for other purposes. The project will continue to be sustained by my volunteer effort after the grant period.
- 10. Under what license will your code be released, and how will you ensure the product is well documented?
Include the following points in your answer:
- Code license and compatibility with Wikimedia projects
- Documentation plan
- Code license and compatibility with Wikimedia projects: The zelph source code is and will continue to be released under the AGPL v3 or later. This is an OSI-approved open-source license compatible with the Wikimedia ecosystem.
- Documentation plan: The project is already well-documented.
- Primary technical documentation is maintained in the project's README.md file on GitHub, which provides a comprehensive overview, build instructions, and usage examples.
- Community-facing documentation and project updates will be posted on the Wikidata WikiProject page: [4]
- I will follow best practices, such as those outlined in MediaWiki's Documentation Toolkit, to ensure all new features are clearly explained.
- 11. Will your project depend on or contribute to third-party tools or services?
The project has minimal dependencies. It relies on the public Wikidata JSON dump as its data source and uses standard open-source tooling (a C++17 compiler, CMake, and Git). It does not depend on any proprietary or third-party online services, and the output is self-hosted.
- 12. Is there anything else you’d like to share about your project? (optional)
I am deeply committed to making zelph a valuable tool for the Wikidata community. This grant would be a significant catalyst, allowing me to dedicate focused time to develop these crucial integration features. To underscore my personal investment in this project's success and my respect for community funds, I have based my budget on a heavily discounted hourly rate (32.50 CHF) compared to my standard commercial rate (120 CHF). This grant is not about financial gain but about enabling a meaningful contribution to the free knowledge movement. This project represents a strategic first step, and its success will pave the way for more advanced tools to further enhance Wikidata's quality and utility.
Budget
- 13. Upload your budget for this proposal or indicate the link to it. (required)
https://docs.google.com/spreadsheets/d/1wUq4WTDBhjGZGtbThZqeZldE8IQiSmEpmdASASDp1Ls/edit?usp=sharing
- 14. and 15. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)
3900 CHF
- 16. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 and 5,000 USD.
4869.6 USD
- We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.
Yes
Endorsements and Feedback
Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.
Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:
- Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
- Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
- Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
- Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
- Analyzing whether the proposal is coherent in terms of its objectives, strategies, budget, and expected results (metrics).
This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.
