Grants:Programs/Wikimedia Community Fund/Rapid Fund/zelph:Wikidata Contradiction Detection and Constraint Integration (ID: 23553409)/Final Report
Application type: Tech project
Parts 1-3: Project and impact
1. What was built or achieved during the project, and how did it align with your original goals, milestones and technical plan? (required)
During the project, zelph advanced from an alpha to a stable beta version (0.9), with significant enhancements to its core architecture. Key achievements include:
(1) Property Constraint Converter Framework: Instead of a fully generalized importer (which proved impossible due to the lack of formalized semantics in Wikidata's property constraints), I implemented a flexible framework using lambda functions to generate zelph rules for individual constraint types. This was applied to two high-impact constraints: conflicts-with (Q21502838) and none-of (Q52558054). The framework processed the Wikidata dump (wikidata-20250127-all.json), generating 8,709 contradiction-detection rules, published at https://zelph.org/constraint-list/. Applying these rules to the imported data revealed millions of violations, with samples published at https://zelph.org/constraints/! (full list available for download at the bottom of the page). This aligns with the original goal of constraint integration but adapted to focus on implementable constraints, avoiding qualifiers (which require future import extensions).
(2) Reporting System: The planned dashboard pivoted from generic deductions to targeted contradiction detection, informed by collaborations with the Wikidata Ontology Cleaning Task Force. Reports were generated for specific violation types: Split Order Class Violations (744,517 cases, published at https://zelph.org/grant-report/), Disjointness Violations (https://zelph.org/disjointness-violations/), and Mereology Violations (https://zelph.org/mereology-violations/). These were enabled by deep technical improvements: a 1000x faster, non-iterative reasoning engine; RAM optimization to 210 GB for the full dump using unordered_dense containers; and binary serialization of the network state, published on Hugging Face ([1]) with a pruned version requiring only 16 GB RAM. A video presentation detailing these outcomes is at https://zelph.org/presentation/.
These deliverables aligned with the milestones by delivering functional tools and reports, though adapted based on community insights (e.g., direct contradictions over deductions). The technical plan was enhanced for scalability, ensuring complete Wikidata processing on a single machine without external dependencies.
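As a rough illustration of the converter idea in (1), the following Python sketch registers one lambda per constraint type and turns a constraint definition into simple contradiction rules. It is not zelph's actual code or rule syntax: the `Pattern` and `Rule` classes, the `property` and `items` parameter keys, and the placeholder IDs `PA`, `PB`, `QX`, `QY` are all assumptions made for illustration. Only the two constraint type IDs come from the report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Pattern:
    """One statement pattern; value=None matches any value (hypothetical model)."""
    prop: str
    value: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    """A contradiction rule: an item violates it if ALL patterns match."""
    patterns: Tuple[Pattern, ...]


# One lambda per constraint type. Each receives the constrained property and
# the constraint parameters, and returns the generated rules.
CONVERTERS: Dict[str, Callable[[str, dict], List[Rule]]] = {
    # conflicts-with: using `prop` together with the conflicting property
    # (optionally restricted to listed values) is a contradiction.
    "Q21502838": lambda prop, c: [
        Rule((Pattern(prop), Pattern(c["property"], v)))
        for v in (c.get("items") or [None])
    ],
    # none-of: using `prop` with any of the listed values is a contradiction.
    "Q52558054": lambda prop, c: [Rule((Pattern(prop, v),)) for v in c["items"]],
}


def convert(prop: str, constraint_type: str, params: dict) -> List[Rule]:
    """Generate rules for one constraint; unsupported types yield no rules."""
    converter = CONVERTERS.get(constraint_type)
    return converter(prop, params) if converter else []


def violates(statements: Set[Tuple[str, str]], rule: Rule) -> bool:
    """Check one item's (property, value) statements against a rule."""
    return all(
        any(p == pat.prop and (pat.value is None or v == pat.value)
            for p, v in statements)
        for pat in rule.patterns
    )
```

Supporting a further constraint type in this sketch means adding one more entry to `CONVERTERS`, which mirrors the per-type approach described above. Constraint types without an entry are simply skipped.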
2. Share links that demonstrate your project's impact, usage, and technical outcomes. (required)
Required links:
- Project page on relevant Wikimedia spaces (e.g. Phabricator, Wikimedia projects, Toolforge)
- Code repository (e.g. Gerrit, GitHub or GitLab)
- Documentation or user guides
- Dashboards, metrics tools, or analytics used to track usage or contributions
Optional links you may include:
- Diff or mailing list announcements
- Community feedback
- Demos or product presentations
- Survey results or user testing feedback
- Examples of integrations or usage within Wikimedia projects
Required links:
- Project page on relevant Wikimedia spaces: [2]
- Code repository: [3]
- Documentation or user guides: [4] (main site with guides, e.g., [5] for usage)
- Dashboards, metrics tools, or analytics: [6] (includes metrics like 744,517 contradictions detected); no dynamic dashboards due to static design, but full violation lists downloadable (e.g., [7])
Optional links:
- Diff or mailing list announcements: N/A (community engagement via Task Force meetings, see [8] and [9])
- Community feedback: Discussions in Wikidata Ontology Cleaning Task Force (e.g., weekly rounds, no public logs; feedback led to pivots)
- Demos or product presentations: [10] (video demo from Task Force talk)
- Survey results or user testing feedback: Informal feedback from Task Force members (e.g., Peter Patel-Schneider, Ege Atacan Doğan) on violation reports; no formal survey
- Examples of integrations or usage within Wikimedia projects: Binaries on Hugging Face enable community reuse ([11])
3. What are the key lessons you learned during this project, both technical and non-technical? (required)
Technical Lessons:
- Wikidata's property constraints lack semantic formalization, making generalized rule generation challenging; a modular lambda-based framework proved more practical, successfully implemented for two constraints and scalable for more.
- Initial assumptions about needing deductions for contradictions were incorrect - direct violations are abundant; pivoting to targeted rules (e.g., for Split Order, Disjointness) yielded immediate impact.
- Processing the full 1.7 TB dump required aggressive optimizations: a non-iterative reasoning engine (1000x speedup) and custom containers reduced RAM usage to 210 GB, which makes offline analysis of the complete dataset practical on a single machine.
- Binary serialization and pruned datasets (16 GB RAM) democratize access, but qualifiers are not yet imported at this development stage.
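For readers who want to experiment with the dump themselves, the sketch below shows one common way to read a Wikidata JSON dump entity by entity without loading it into memory. It relies on the dump's layout of one entity per line inside a JSON array. It is not zelph's importer, and the function names `iter_entities` and `count_property_usage` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per dump line, skipping the array brackets."""
    for line in lines:
        # Each entity line ends with a comma, except the last one.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def count_property_usage(lines: Iterable[str], prop: str) -> int:
    """Count entities that have at least one claim for the given property."""
    return sum(1 for entity in iter_entities(lines)
               if prop in entity.get("claims", {}))
```

With a decompressed dump on disk, the file object can be passed directly, for example `with open(path, encoding="utf-8") as f: count_property_usage(f, "P31")`, where `path` is whatever local file name the dump was saved under.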
Non-Technical Lessons:
- Community collaboration (e.g., Ontology Cleaning Task Force) is crucial - weekly discussions revealed real pain points, leading to pivots that aligned better with Wikidata needs.
- Solo development has limits; recruiting additional developers via future funding will accelerate progress.
- Documentation and transparency (e.g., publishing reports and binaries) build trust and encourage adoption.
- Balancing ambition with feasibility: The project exceeded goals in scalability but adapted scope to data realities, emphasizing iterative community feedback.
4. How did the Wikimedia community or your target audience engage with your project during its development or release? (required)
The target audience - Wikidata editors, Ontology Cleaning Task Force members, and Mereology Task Force participants - engaged actively through weekly virtual meetings. I presented progress, including the video at [12], receiving feedback from experts like Peter Patel-Schneider and Ege Atacan Doğan. This led to pivots, such as focusing on specific violations.
No broad surveys were conducted, but informal endorsements confirmed demand. Post-release, binaries on Hugging Face saw initial downloads, indicating early adoption.
5. What risks or challenges did you encounter (related to delivery, safety, or security), and how did you address them? (required)
Delivery Challenges:
- Lack of semantic formalization in constraints delayed the converter; addressed by building a lambda framework and implementing two key types, generating 8,709 rules.
- Scale of Wikidata dump risked incomplete analysis; mitigated by engine optimizations (1000x speedup, RAM reduction).
- The pivot from deductions to direct violations, informed by the Task Force, changed the scope mid-project; it kept the work relevant without derailing timelines.
Safety/Security Risks:
- No user data handled - only public dumps; static reports minimize vulnerabilities.
- No privacy issues, as outputs link to public Wikidata items.
All challenges were resolved within the grant period, with no security incidents.
6. Who will maintain the project going forward, and what is your plan for long-term maintenance? (required)
I, Stefan Zipproth, will remain the primary maintainer, continuing volunteer efforts post-grant. Long-term maintenance includes integrating features into zelph's core, with minimal overhead due to static reports and offline design.
To scale, I plan a follow-up Rapid Fund application to recruit additional developers, allocating funds for their contributions. This will focus on essentials like qualifier imports. Ultimately, I aim for a Research Grant for broader impact, building on community ties from the Task Forces. No ongoing costs anticipated; infrastructure is self-hosted.
(questions 7-9 are skipped)
Part 4: Financial reporting
10. Please state the total amount spent in your local currency. (required)
3900
11. Please state the total amount spent in US dollars. (required)
5087.67
12. Report the funds spent in the currency of your fund. (required)
Provide the link to the financial report https://docs.google.com/spreadsheets/d/138yvQZ3iVZOlGZ0d4VK2S1UvVdn1ArbgbSvFm6VBPjg/edit?usp=sharing
12.2. If you have not already done so in your financial spending report, please provide information on changes in the budget in relation to your original proposal. (optional)
13. Do you have any unspent funds from the Fund?
No
13.1. Please list the amount and currency you did not use and explain why.
N/A
13.2. What are you planning to do with the underspent funds?
N/A
13.3. Please provide details of how you hope to spend these funds.
N/A
14.1. Are you in compliance with the terms outlined in the fund agreement?
Yes
14.2. Are you in compliance with all applicable laws and regulations as outlined in the grant agreement?
Yes
14.3. Are you in compliance with provisions of the United States Internal Revenue Code (“Code”), and with relevant tax laws and regulations restricting the use of the Funds as outlined in the grant agreement? In summary, this is to confirm that the funds were used in alignment with the WMF mission and for charitable/nonprofit/educational purposes.
Yes
15. If you have additional recommendations or reflections that don’t fit into the above sections, please write them here. (optional)