
Grants:Programs/Wikimedia Community Fund/Rapid Fund/zelph:Transitive Reasoning, Qualifier Support, and SPARQL-Subset Integration (ID: 23759260)

From Meta, a Wikimedia project coordination wiki
status: Under review
title: zelph: Transitive Reasoning, Qualifier Support, and SPARQL-Subset Integration
request or grant ID: R-RF-2601-21777
proposed start date: 2026-04-03
proposed end date: 2026-06-02
requested budget (local currency): 3796 CHF
requested budget (USD): 4932.26 USD
grant type: Individual
funding region: NWE
decision fiscal year: 2025-26
applicant: Acrion-dev
organization (if applicable): N/A

Applicant details

Main Wikimedia username. (required)

Acrion-dev

Organization

N/A

If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)

N/A

Describe all relevant roles with the name of the group or organization and description of the role. (required)

Main proposal

1. State the title of your proposal. This will also be the Meta-Wiki page title.

zelph: Transitive Reasoning, Qualifier Support, and SPARQL-Subset Integration

2. and 3. Proposed start and end dates for the proposal.

2026-04-03 - 2026-06-02

4. What is your tech project about, and how do you plan to build the product?

Include the following points in your answer:

  • Project goal and problem you solve
  • Product strategy or project roadmap
  • Technical approach (infrastructure, tech stack, key tools and services)
  • Integrations or dependencies (if any)
Project goal and problem you solve

While the previous grant successfully established zelph as a high-performance engine for detecting direct contradictions in Wikidata (e.g., Split Order Class violations), crucial semantic layers remain inaccessible.

  1. The "Disjointness" Gap: Research (Doğan/Patel-Schneider) identified over 14,000 disjointness violations. Detecting these requires two missing features:
    • Qualifiers (The Definition): Wikidata defines disjoint classes using qualifiers on property P2738. Without qualifier support, zelph relies on static, external CSV files to know what is disjoint.
    • Transitive Paths (The Violation): The violations often occur deep in the hierarchy. Without transitive closure support (P279+), zelph cannot traverse the path to find the culprits.
  2. Constraint Incompleteness: The majority of standard Property Constraints rely on qualifiers. While zelph handles direct relations, it ignores qualifiers, making it impossible to enforce structural constraints defined by them (e.g., "Property Scope").
  3. Usability: Users currently face a steep learning curve due to zelph's custom syntax compared to SPARQL.

This project aims to close these gaps by importing qualifiers (to read definitions) and implementing transitive reasoning (to detect violations), creating a holistic, autonomous detection engine.

Product strategy or project roadmap
  1. Qualifier Import: Extend the C++ JSON parser to import qualifiers as first-class nodes. Consistent with zelph's architecture, qualifiers will be treated as equal nodes in the network, enabling meta-reasoning about them. This enables zelph to natively parse the definition of disjoint sets (via P2738) and unlocks qualifier-dependent structural constraints (such as Property Scope or Item-requires-statement).
  2. Janet & SPARQL Integration: Embed the Janet programming language. This will serve two purposes: providing a Turing-complete scripting environment for zelph and enabling a PEG-based parser that translates a subset of SPARQL syntax directly into zelph rules/queries.
  3. Transitive Path Reasoning: Enhance the unification engine to support transitive closures (syntax P279+). This addresses the specific limitation identified in the "Disjointness Violations" report, allowing zelph to detect contradictions regardless of path length. Combined with Step 1, this enables end-to-end detection of Disjointness Violations from definition to detection.
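To make step 1 concrete, statement reification might be sketched as follows. This is a minimal illustration with hypothetical names, not zelph's actual API: each imported statement receives its own node id, so qualifiers attach to the statement exactly like ordinary edges and become first-class nodes the engine can reason about.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using NodeId = std::uint64_t;

struct Edge {
    NodeId subject;
    NodeId property;
    NodeId object;
};

struct Graph {
    std::vector<Edge> edges;  // plain subject-property-object triples
    // statement node -> qualifier edges attached to that statement
    std::unordered_map<NodeId, std::vector<Edge>> qualifiers;
    NodeId next_statement_id = 1'000'000'000;  // ids outside the entity range

    // Import a statement and return the node id that reifies it.
    NodeId add_statement(NodeId s, NodeId p, NodeId o) {
        edges.push_back({s, p, o});
        return next_statement_id++;
    }

    // A qualifier is an ordinary edge whose subject is the statement node,
    // so the same matching machinery can be applied to qualifiers.
    void add_qualifier(NodeId statement, NodeId qprop, NodeId qvalue) {
        qualifiers[statement].push_back({statement, qprop, qvalue});
    }
};
```

Under this scheme, a P2738 statement defining a disjoint set would be imported via `add_statement` with its defining qualifiers attached via `add_qualifier`, after which rules can match on the qualifier edges directly instead of consulting external CSV files.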
Technical approach
  • Core Engine (C++17): The import mechanism will be refactored to reify statements, allowing qualifiers to attach to relation edges. The inference engine will be updated to handle transitive closures, either via efficient iterative materialization (as discussed with expert Peter Patel-Schneider) or dynamic path traversal, ensuring deep ontological violations are caught.
  • Embedded Scripting & SPARQL Subset: I will embed the Janet interpreter. Instead of writing a standalone SPARQL parser in C++, I will use Janet's Parsing Expression Grammars (PEGs).
    Scope of the SPARQL subset: The implementation will be strictly scoped to map directly to zelph's native query capabilities:
    • Basic Graph Patterns (BGP): Sets of triple patterns (logical AND), mapping to zelph's multi-condition queries.
    • Transitive property paths: Support for the one-or-more operator (e.g., wdt:P279+), mapping to the new transitive closure engine.
    • Projections: Variable selection (SELECT ?x ?y). (Note: complex operators like FILTER, OPTIONAL, and aggregations are explicitly out of scope for this iteration to ensure performance and feasibility.)
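To illustrate the iterative-materialization option mentioned above, here is a minimal, self-contained sketch of a transitive-closure fixpoint. It uses bare integer node ids and a naive join; zelph's engine would operate on its internal node representation, so this only shows the shape of the computation, not its implementation.

```cpp
#include <set>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// Compute the transitive closure of a base relation (e.g. subclass-of links)
// by repeatedly joining newly derived pairs with base edges (right-linear
// recursion: T(a,d) <- T(a,b), E(b,d)) until no new facts appear.
std::set<Pair> transitive_closure(const std::vector<Pair>& base) {
    std::set<Pair> closure(base.begin(), base.end());
    std::set<Pair> delta = closure;  // facts derived in the previous round
    while (!delta.empty()) {
        std::set<Pair> next;
        for (auto [a, b] : delta)
            for (auto [c, d] : base)
                if (b == c && !closure.count({a, d}))
                    next.insert({a, d});
        closure.insert(next.begin(), next.end());
        delta = std::move(next);  // only new facts participate next round
    }
    return closure;
}
```

On a chain 1 → 2 → 3 → 4, this derives the deep pairs (1,3), (2,4), and (1,4), which is exactly the capability needed to catch disjointness violations far down the hierarchy.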
Integrations or dependencies

The project depends on the Wikidata JSON dump and the Janet language library (embedded). Output will continue to be integrated into the zelph CLI and generated reports.

5. What is the expected impact of your project, and how will you measure success?

Include the following points in your answer:

  • Milestones and progress tracking
  • Project impact and success metrics
Project impact

This project transforms zelph from a "direct contradiction finder" into a deep semantic analysis tool.

  1. Autonomous Deep Detection: By combining qualifier import (to identify disjoint classes dynamically) and transitive closures (to find deep violations), zelph will be able to replicate the deep analysis logic that identified 14,480 disjointness violations in academic research (Doğan/Patel-Schneider). These rely on deep hierarchy checks currently impossible with standard shallow parsing.
  2. Expanded Constraint Coverage: Importing qualifiers allows zelph to enforce context-dependent structural constraints. An example is the Property Scope Constraint (Q53869507), which dictates whether a property allows usage as a main value or only as a qualifier. This is currently impossible to check but becomes feasible with this update.
  3. Accessibility: The SPARQL-subset integration allows the existing community to use zelph without learning a new query paradigm from scratch.
Success Metrics
  • Milestone 1: Successful import of Wikidata Qualifiers and demonstration of (a) a structural Property Constraint rule (e.g., Property Scope) that depends on them and (b) querying disjointness definitions (P2738) directly from the graph.
  • Milestone 2: A working REPL where the defined subset of SPARQL (Basic Graph Patterns and Transitive Paths) returns correct results from the zelph network via the Janet integration.
  • Milestone 3: Successful detection of Disjointness Violations using transitive closure syntax (P279+) on the full Wikidata dataset.
  • Validation: Positive evaluation of the new capabilities by the funded "Semantic Logic & SPARQL Consultant" across defined milestones.
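As an illustration of what the Milestone 2 subset covers, a conforming query could look like the following. This is an example only: wd:Q35120 ("entity") serves merely as a sample root class, and the exact surface syntax accepted by the PEG parser is an implementation detail.

```sparql
# Allowed: Basic Graph Patterns, one-or-more property paths, projection.
# Not allowed in this iteration: FILTER, OPTIONAL, aggregations.
SELECT ?item ?class WHERE {
  ?item wdt:P31 ?class .        # direct triple pattern (BGP)
  ?class wdt:P279+ wd:Q35120 .  # transitive subclass-of path
}
```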
6. Who is your target audience, and how have you confirmed there is demand for this project? How did you engage with the Wikimedia community?

Include the following points in your answer:

  • Project demand and target audience description
  • Links to interaction(s) with Wikimedia community
  • Evidence from community consultation such as the [Community Wishlist]
Target audience

The audience includes advanced Wikidata maintainers, the Wikidata Ontology Cleaning Task Force, and the Mereology Task Force. These groups require tools that go beyond the timeout limits of public SPARQL endpoints to analyze deep structural issues.

Confirmation of demand & engagement: Demand is confirmed through my active, weekly participation in the specialized Task Forces.

  • The need for Disjointness Violation detection was explicitly highlighted by research presented in the Ontology Cleaning Task Force. My previous grant report confirmed that without transitive closures, zelph cannot detect these specific deep-hierarchy violations.
  • The limitation of missing qualifiers was identified as the primary blocker for implementing the remaining Property Constraints during my previous grant work.
  • Feedback during my video presentation to the Task Force highlighted the need for clearer query syntax and deeper reasoning capabilities.
  • Ontology Cleaning Task Force: [1] (regular participation).
  • Mereology Task Force: [2] (regular participation).
  • Video presentation: [3] (my talk to the Task Force on zelph's current status and future needs).
  • Project context: [4] (report highlighting the pivot to contradiction detection).
  • Identified gap: [5] (my analysis demonstrating the need for transitive closure support).
7. How will your team predict and manage potential user security and privacy risks, and what risks do you currently see?

Include the following points in your answer:

  • The level of in-house or consulted security and privacy expertise you will have available to you during delivery of this project
  • How your development, testing, and deployment processes mitigate the introduction of unnecessary security or privacy risks
Security and Privacy Expertise

As a senior software engineer with 20+ years of experience, I follow standard secure development lifecycles. Since zelph operates offline on public data dumps, the attack surface is minimal.

Mitigation
  • No PII: The system processes only public Wikidata JSON dumps.
  • Sandboxing: The new Janet integration is an embedded scripting environment. It will be configured to run within the zelph process context without exposing unsafe system calls to the external interface.
  • Static output: Reports are generated as static HTML/Markdown, which eliminates injection vectors such as XSS on the client side.
8. Who is on your team, and what is your experience?

Include the following points in your answer:

  • Your experience as a developer, relevant past projects
  • Wikimedia SUL (developer), Gerrit, Github, Gitlab or other relevant public account handles
  • Other team members, their roles and expertise

Stefan Zipproth (Acrion-dev):

  • Role: Lead Developer & Maintainer.
  • Experience: I am the creator of zelph. In the previous Rapid Fund grant, I successfully optimized the engine to process the 1.7 TB Wikidata dump on a single machine (reducing RAM usage to 210 GB via custom data structures) and accelerated the reasoning engine by a factor of 1000.
  • Handles: GitHub: acrion/zelph, Wikidata: User:Acrion-dev.

Semantic Logic & SPARQL Consultant (To be hired):

  • Role: Domain Expert. This role will verify that the SPARQL subset implementation aligns with standard expectations and that the Disjointness detection logic matches ontological requirements.
  • Plan: I have allocated 20% of the budget to hire a qualified expert from the community (recruitment supported by contacts like Ege Atacan Doğan). Note: This will be a paid contributor distinct from Task Force members who have conflicting funding.
9. How will the project be maintained long-term?

Include the long-term maintenance plan with maintainer(s) in your answer. If you expect the long-term maintenance to incur expenses, please list those and the plan for long-term expense coverage.

Maintainer: Stefan Zipproth.

Plan: zelph is a long-term personal open-source project. The core C++ engine is designed for stability. The integration of Janet actually reduces the long-term maintenance burden by moving complex parsing logic from rigid C++ into flexible scripts.

Expenses: No long-term expenses. Infrastructure is self-hosted.

10. Under what license will your code be released, and how will you ensure the product is well documented?

Include the following points in your answer:

  • Code license and compatibility with Wikimedia projects
  • Documentation plan
License

AGPL v3 or later (Compatible with Wikimedia).

Documentation
  • Technical documentation on GitHub.
  • User tutorials on zelph.org.
  • Specific new features (Janet API, Transitive Syntax) will be documented with examples on zelph.org, [6] and the WikiProject page.
11. Will your project depend on or contribute to third-party tools or services?

The project will integrate the Janet programming language (MIT Licensed) as an embedded library. It continues to rely on Wikidata JSON dumps.

12. Is there anything else you’d like to share about your project? (optional)

This proposal is a direct follow-up to the successful Beta release of zelph. While the first grant proved that we can hold Wikidata in RAM and find direct contradictions (like Split Order classes), this second phase is about depth. By implementing transitive reasoning and qualifier support, we enable zelph to detect the "hard" problems - logical violations hidden deep in the ontology hierarchy - that standard SPARQL endpoints cannot find before timing out. The budget includes a paid community role to ensure these advanced features strictly meet the needs of power users.

Budget

13. Upload your budget for this proposal or indicate the link to it. (required)

https://docs.google.com/spreadsheets/d/1uEu8e_xnEZgrmafn6d21SzYRkIkdOOtL0kscJjb8-W8/edit?usp=sharing


14. and 15. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)

3796 CHF

16. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.

4932.26 USD

We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.

Yes

Endorsements and Feedback


Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.

Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:

  • Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
  • Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
  • Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
  • Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
  • Analysing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).

Endorse


This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.