
Talk:Abstract Wikipedia

From Meta, a Wikimedia project coordination wiki

Request for Architectural Validation: The "Hybrid" Solution to the LLM/Constructor Deadlock


@Denny @ATsay-WMF

I am creating a new topic to ensure this technical proposal is not buried in the previous thread during the holiday break.

The Context: In the discussion above, Denny correctly identified that pure LLMs fail on low-resource languages due to hallucinations. However, the alternative—manual constructor writing—does not scale.

The Solution (Abstract Wiki Architect v2.1): I have spent the last month of intense development (backed by a year of specialized training) building a working Neuro-Symbolic Engine that resolves this deadlock.

  • It uses AI Agents to automate the coding (solving the speed issue).
  • It uses Grammatical Framework (GF) to enforce strict morphology (solving the hallucination issue).
  • It implements Weighted Topology (Udiron) to correctly linearize Tier 3 languages like Zulu.

My Request: I am a developer with limited resources who has dedicated significant personal time to solving this engineering hurdle for the community. I am not just offering code; I am asking for a professional review.

I specifically ask that you validate this architecture. Does a Hybrid Neuro-Symbolic engine running Ninai protocols align with the Foundation's roadmap?

If there is no interest, or no will to validate my work, I will have to accept that the Foundation is not the right home for this technology. I will then look to apply this knowledge in the private sector to recover my investment, though my strong preference remains to donate this solution to Abstract Wikipedia.

I await your technical feedback. Réjean McCormick (talk) 16:06, 1 January 2026 (UTC)

Technical Addendum: Architecture & Capabilities (v2.1)
To facilitate the validation process, here is the specific technical breakdown of the Abstract Wiki Architect engine and why it solves the current roadmap blockers.
1. The "Hybrid Factory" Architecture
The engine does not rely on a single method. It uses a Four-Layer Hexagonal Architecture to separate Logic, Data, and Presentation:
  • Layer A (Lexicon): Usage-based sharding mapped to Wikidata QIDs. It supports lazy-loading of massive dictionaries (380k+ words) to solve the "Cold Start" problem.
  • Layer B (Grammar Matrix): This is the core innovation. It uses a Dual-Tier Strategy:
    • Tier 1 (High Resource): Uses the official GF Resource Grammar Library (RGL) for languages like English/French to guarantee perfect morphology (cases, genders).
    • Tier 3 (Low Resource): Uses Weighted Topology (adapted from the Udiron project) to automate linearization for languages like Zulu or Hausa without needing handcrafted grammars.
2. Native Protocol Support (Ninai & Z-Objects)
Unlike generic LLM wrappers, this engine is built specifically for the Abstract Wikipedia ecosystem:
  • The Ninai Bridge: I implemented a Recursive JSON Object Walker (not brittle regex) that natively parses ninai.constructors.* trees. It translates Z-Object intent directly into Abstract Syntax Trees.
  • Construction-Time Tagging: Since the engine builds the sentence rather than just predicting tokens, it automatically outputs Universal Dependencies (CoNLL-U) tags. This allows us to mathematically verify the output against standard Treebanks, solving the evaluation crisis.
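To illustrate the "Recursive JSON Object Walker" idea, here is a minimal sketch. The node shape ({"function": ..., "args": [...]}) follows the Ninai example later in this thread; the function name walk_functions and the traversal details are my own assumptions, not the engine's actual code.

```python
# Sketch of a recursive walker over Ninai-style JSON trees (not regex-based).
# The node shape follows the Ninai example in this thread; walk_functions
# is an illustrative name, not the engine's real API.

def walk_functions(node):
    """Yield every "function" name in a nested Ninai-style tree, depth-first."""
    if isinstance(node, dict):
        if "function" in node:
            yield node["function"]
        for value in node.values():
            yield from walk_functions(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk_functions(item)

tree = {
    "function": "ninai.constructors.Statement",
    "args": [
        {"function": "ninai.types.Bio"},
        {"function": "ninai.constructors.Entity", "args": ["Q7186"]},
    ],
}
print(list(walk_functions(tree)))
# ['ninai.constructors.Statement', 'ninai.types.Bio', 'ninai.constructors.Entity']
```

Because the walker recurses over dicts and lists rather than matching text, it tolerates extra fields and arbitrary nesting, which is the claimed advantage over brittle regex parsing.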
3. The "Self-Healing" Agentic Pipeline
To solve the "speed vs. accuracy" trade-off, I deployed three specialized AI Agents:
  • The Architect: Generates the raw .gf source code for new languages from scratch.
  • The Surgeon: Reads compiler error logs and automatically patches broken grammar files in a loop.
  • The Judge: Performs QA by comparing generated output against a "Gold Standard" dataset and auto-files issues for regression.
Why this is powerful:
This architecture allows us to onboard a new language in minutes (via the Architect Agent) while maintaining the mathematical guarantees of the Grammatical Framework. It creates a "Hallucination Firewall"—if the AI generates invalid logic, the GF compiler rejects it before it reaches the user.
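The "Surgeon" agent described above amounts to a compile-inspect-patch-retry control loop. A minimal sketch, with toy stand-ins for the compiler and the patching step (these callables are assumptions for illustration, not the real GF compiler or a real repair agent):

```python
# Control-loop sketch of the "Surgeon" agent: compile, read errors, patch,
# retry. compile_grammar and patch_from_errors are caller-supplied stand-ins,
# not the real GF compiler or repair agent.

def repair_loop(source, compile_grammar, patch_from_errors, max_rounds=3):
    """Retry compilation until it succeeds or the round budget is spent."""
    for _ in range(max_rounds):
        ok, errors = compile_grammar(source)
        if ok:
            return True, source
        source = patch_from_errors(source, errors)
    return False, source

# Toy stand-ins: the "compiler" rejects sources containing "BUG",
# and the "patcher" simply removes that token.
def fake_compile(src):
    return ("BUG" not in src), (["unexpected token BUG"] if "BUG" in src else [])

def fake_patch(src, errors):
    return src.replace("BUG", "").strip()

ok, fixed = repair_loop("lin Bio x = x ; BUG", fake_compile, fake_patch)
print(ok, repr(fixed))
```

The key property is that failure is bounded and visible: either the compiler eventually accepts the patched source, or the loop gives up after a fixed number of rounds, which is the "fails safely at build time" behavior claimed above.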
I am ready to demo this pipeline or walk the engineering team through the codebase. Réjean McCormick (talk) 16:09, 1 January 2026 (UTC)
So Architect is built with AI, and uses AI for these tasks:
Fixing the Grammatical Framework data, which is still in draft (some grammars are complete, but some are under development)
Generating reference "good answers" so Architect can be tested and validated by comparison, but this is only for the testing phase
Otherwise, I did my best to gather every resource into one comprehensive system, Architect. There, efforts from Grammatical Framework, Udiron, and Ninai have been orchestrated together.
I'm resolving the remaining bugs very fast, always seeking the optimal architecture and best practices. It's not patched together; it's built straight. It's fairly beyond the scale of what could have been expected, but here it is: Architect.
I'm not a linguist or a coder, but a conceptor, a debugger, and a few other things on the side.
I can't wait to show the result. You'll hear about it, for sure ;) Réjean McCormick (talk) 00:05, 4 January 2026 (UTC)
Architect doesn't use AI to generate text. It's deterministic. Grammatical Framework grammars are grammars turned into code; Architect uses them. There's quite a lot in Architect, and it's new for me as well, so I'm getting the hang of navigating it. Réjean McCormick (talk) 02:37, 4 January 2026 (UTC)
Hi Prototyperspective, I appreciate the detailed critique. I will collapse the previous technical details to keep the visual flow of your thread clean, but the points raised there are essential to this engineering challenge.
To directly address your two remaining concerns regarding the Architect solution:
1. On User Complexity (Editing Rules): There is a misunderstanding of the workflow. In the Architect model, the end-user never edits the "super abstract grammatical rules." The User edits the Data/Intent (simple JSON or a visual form). The Agent (Architect) maintains the Grammar (the complex GF code). The user is the driver; the AI is the mechanic. This shields the user entirely from the complexity you are worried about.
2. On AI Errors (The "Factory" Risk): You are right that AI can make errors in grammar generation. However, this is exactly why the Compiler (C-Runtime) exists. If an MT model hallucinates, it produces plausible but false text (hard to detect). If the Architect Agent hallucinates bad grammar, it produces invalid code, which the compiler rejects immediately (easy to detect). The system fails safely at build time, rather than failing silently at read time.
We definitely agree on the core philosophy: "Fix once, scale everywhere." The Architect simply ensures that what we are scaling is mathematically verifiable structure, not probabilistic translation.
Réjean McCormick (talk) 20:25, 5 January 2026 (UTC)
I suggest you better go to the private sector with this idea. The chances it will be used here for Abstract Wikipedia, at least in the near and mid-term future, are low from my point of view. It seems to me to contain a lot of buzzwords, and I have not yet understood how it is possible to find the grammar rules, especially in languages with a small corpus of available digitized texts. Validation of the output is, from my point of view, very important, and it is necessary to have a human who understands the generated code and validates the output. Using an agent for generation is possible if the result is validated. For the Wikimedia project Abstract Wikipedia it is, from my point of view, very important to have a technology for generating the text that is easy to understand. If the rules are too complex, it can be difficult to have enough people who understand them. At least I think that for the next three years, content in Abstract Wikipedia will be simple sentences, and so there is no need for a complex technology to generate it. Hogü-456 (talk) 19:53, 19 January 2026 (UTC)
If I have any conclusion or opinion on this Architect development, it's that "I suggest you better go to the private sector with the idea" is very bad advice, and that Wikimedia contributors interested in making Abstract Wikipedia a success should look into it rather than ignore the work and input of this user, which seems to address a few of the core weaknesses of the AW approach. I'm personally not so invested in AW being successful, since I think a viable alternative approach exists that I'm interested in. However, again, without clear, simple, real-world demonstrations, it's not likely to get much feedback. For example, if a "technology for generating text that is easy to understand" can't achieve lots of comprehensive articles, then this proposed criterion is simply unfit if AW is to become a success in terms of usefulness and use.
Basically, I would rather conclude that maybe this is something for a non-Wikimedia free-knowledge-related nonprofit organization (particularly one more interested in / open to innovation and the development and use of technologies for good), and/or that it needs more development so that concrete illustrative demonstrations become possible. Prototyperspective (talk) 22:31, 19 January 2026 (UTC)
Well, these aren't buzzwords; this is specific language for a complex architecture.
I'm a bit stubborn, so I still wish to make it a shared resource. But it seems I can't find motivation, or something, here. I'll keep it common, but as you pointed out, here might not be the place, even though the call for it comes from here! Anyway, I don't rely on the community here; I still haven't been able to really discuss with those who made the call, even after my huge contribution. So I won't go to the private sector, but I might just finish it on my side and then share it elsewhere. In the end, I only stumbled here because I understood the power of semantics. I did not develop Architect following the Abstract Wiki call; it simply intersects with what I do. I hoped to find a community to develop with and merge efforts, but I guess I don't quite get it. Anyway, I did read your invitation to go away.
Still, it looks like you're waiting for me to say "It all works, job done", instead of finishing building it as a team. *sigh*
Response to concerns (buzzwords / low-resource grammars / validation / “simple sentences first”)
  • This is not "grammar discovery from corpora".
    • Nothing here depends on “learning grammar rules from large text corpora”.
    • The core is rule-based generation: given a structured input (a frame / function call), a grammar linearizes it into text.
    • For low-resource languages, the *minimum viable mode* is explicitly simple: small slot-templates (e.g. Bio-like “X is a Y from Z”) plus a tiny lexicon. No big corpus required.
  • Low-resource languages: start with explicit defaults, then improve incrementally.
    • The initial goal is not perfect linguistic coverage; it is usable, predictable starter sentences.
    • “Tier-3 / safe-mode” is intentionally boring: deterministic word-order defaults + slot filling, so communities can get correct/simple outputs early, then refine grammar/lexicon over time.
  • Validation and human understanding are mandatory, not optional.
    • Generated text is only useful if it is reviewable. The approach is the same as normal software:
      • human-readable source rules (grammar code + lexicon files)
      • regression tests / “gold outputs” that show diffs when anything changes
      • human review before enabling broader use
    • If any “agent” is used, it is only to scaffold boilerplate faster; it does not remove the need for review. Output remains explicit code and testable results.
  • Complexity control: keep things easy-to-understand by design.
    • The public interface stays simple (frames with named fields). Contributors can work at the “template” level without understanding the whole grammar stack.
    • The internal grammar stack is layered so newcomers can contribute safely:
      • add missing lexicon entries (names/terms)
      • add/adjust one simple template realization
      • expand to richer grammar rules only when there is community capacity
  • Near-term scope (next ~3 years): yes, simple sentences first.
    • I agree that early Abstract Wikipedia will mostly be simple declarative sentences.
    • That is exactly what the current frame/template approach targets: starter sentences that are easy to generate, easy to debug, and easy to review.
    • Longer-term “full articles” is a separate orchestration problem (planning + ordering + grouping many small sentences). That can be added later as an *optional layer*, without making the sentence generator itself complex.
  • About “private sector”: this is precisely the type of tooling Wikimedia needs to keep control.
    • If we want AW to be sustainable, the generation logic must remain transparent, editable, and testable by the community.
    • Keeping the rules in plain files + tests + review workflows is the opposite of a black-box vendor dependency.
  • Why demos matter (and what “success” looks like here).
    • The realistic measure of progress is not big promises, but small repeatable demos:
      • generate 1–3 correct starter sentences for many entities and languages
      • show deterministic rebuilds (same input → same output)
      • show that regressions are caught by tests
    • This is the “boring but scalable” path: small sentences now, richer structures later.
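The "safe-mode" path described above (slot templates plus a tiny lexicon, deterministic rebuilds, gold outputs as regression tests) can be made concrete with a short sketch. The template strings, lexicon entries, and language codes here are invented for illustration, not the project's real data files.

```python
# Minimal "Tier-3 / safe-mode" sketch: slot templates plus a tiny lexicon.
# All templates, lexicon entries, and language codes are illustrative
# assumptions, not Architect's actual data.

LEXICON = {
    "en": {"physicist": "physicist", "poland": "Poland"},
    "fr": {"physicist": "physicienne", "poland": "Pologne"},
}
TEMPLATES = {
    "en": "{name} is a {profession} from {place}.",
    "fr": "{name} est une {profession} de {place}.",
}

def realize(lang, name, profession, place):
    """Deterministically fill one starter-sentence template for `lang`."""
    lex = LEXICON[lang]
    return TEMPLATES[lang].format(name=name, profession=lex[profession], place=lex[place])

# Same input always yields the same output, so a stored "gold" sentence
# acts as a regression test that catches any change in behavior.
gold = "Marie Curie is a physicist from Poland."
assert realize("en", "Marie Curie", "physicist", "poland") == gold
print(realize("fr", "Marie Curie", "physicist", "poland"))
# Marie Curie est une physicienne de Pologne.
```

This is exactly the "boring but scalable" property: contributors can add lexicon entries or adjust one template without touching anything else, and a diff against the gold file shows the effect of any change.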
Réjean McCormick (talk) 15:46, 21 January 2026 (UTC)
So, just to be clear, you do suggest that I go with Musk and help him build his private encyclopedia? @Hogü-456 @Prototyperspective
About buzzwords: these are not texts I composed; they're made with AI. AI doesn't throw in words just to make a show! And I'm not telling the AI what to say; I ask the AI (ChatGPT or Gemini) to analyse and answer. So you can fairly assume there's no bluff and no empty words. Réjean McCormick (talk) 18:34, 21 January 2026 (UTC)
@Réjean McCormick Hello and thanks for reaching out. We discussed your proposal at length internally over the last couple of weeks, and we would like to ask you to give us a demonstration of your project and to ask you more questions about it. Are you ok with that? If yes, I'll contact you privately with a number of potential dates and times to have the demo. Thank you in advance! Sannita (WMF) (talk) 15:03, 20 January 2026 (UTC)
I'm good with it! Please contact me by email first, so we can settle on the most relevant communication methods. I suggest I provide the following: screenshots of the interfaces and an updated status (what exactly is built, and what remains to adjust, fix, or build). Rest assured, it's so advanced and so well built that from this new milestone we can foresee the end result.
I'm very proactive, so feel free to send me questions right now, privately or publicly on the Architect wiki, so you can get accurate information about Architect and my activities. I understand my other activities can raise concerns. Rest assured that I'm fully willing to adapt to the Wikimedia community framework, given minimal adaptation time, so your scope is respected.
Also, note that given the power of the machine I'm building (did you get the overview?), I'm slowing down development, because I believe ethical safeguards are relevant at this point, as well as a coordination structure (human networking). This is one reason I'm happy you are answering my call, so we can assess the situation and move forward with renewed knowledge, safely for all.
So, I can provide reports with technical details, or stay at the conceptual level for the non-techies. Well, I can even make a song about it, to convey the feeling. Let's go! :D Réjean McCormick (talk) 16:09, 20 January 2026 (UTC)
Hi Denny / Hi everyone,
I wanted to share a brief update regarding the progress of Architect.
The prototypes have shown very promising results regarding deterministic content generation and "Safety-by-Design" validation—key challenges I am working to solve for the ecosystem.
To ensure this momentum continues without relying on immediate internal bandwidth, I am moving to professionalize the development of these tools.
I will be submitting funding proposals to both external sources (such as NLnet) and the relevant Wikimedia technical grant channels.
My goal is to secure the resources needed to maintain this infrastructure as a reliable open-source utility for the community.
I will continue providing updates on this channel if I receive confirmation that the project's original call for a semantic engine is still active, and if I can find proper guidance within the community.
After two months of building and attempting to gather support—and having to discover these funding pathways independently—I need to verify that there is actual alignment here before investing further effort in communication.
Best regards,
Réjean McCormick
@Denny Réjean McCormick (talk) 14:51, 25 January 2026 (UTC)
Hello,
I haven't had any hand extended or any show of interest beyond a polite "we'll look into it".
So I guess it's OK to publish the figures, in case some people are interested in finishing Abstract Wiki using Architect. Architect is way beyond the current state of Abstract Wiki.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Tools/abstract-wiki-architect
I hope the megabyte I took to store those figures on your server doesn't bother you; I can also just go away, as I've been invited to by a member of this community, on this page. Still, I guess some people might still be interested in moving Abstract Wiki toward a completion stage, so my invitation remains open.
You must understand the resistance: I've been moving really fast, so it looks like the leaders here are stunned or puzzled, I don't know; they don't speak to me. Architect and what surrounds it is definitely gigantic, so they need time to process it. I'm left hanging in the meanwhile, but I don't give up; I'm just leaving Architect as-is, so people still have the opportunity to build a part of it, a part of the history of the semantic web. There's not much left to do, so hurry if you want your name on the machine ;)
@Sannita (WMF) Réjean McCormick (talk) 14:18, 28 January 2026 (UTC)

Boilerplate templates


As I wrote before, I think at the beginning most sentences in Abstract Wikipedia will be very simple. Such simple sentences can be generated using templates with positions for variables; as far as I know, such a thing is also called a boilerplate template. From my point of view it is important to have such an option available in Abstract Wikipedia at its launch. A good example of a collection of possible sentences is User:Dnshitobu/Dagbani_Fragments. I wish for a way that makes it easy to generate such sentences at the launch of Abstract Wikipedia. It is a monolingual approach that requires mapping Wikidata statements to boilerplate templates for a specific language. Is there so far a plan to offer such a solution for Abstract Wikipedia at its launch? From my point of view it may be possible to derive more complex rules from the simple templates to generate advanced sentences. Hogü-456 (talk) 20:02, 19 January 2026 (UTC)

Architect already supports the “simple first sentence” approach using frame-based templates. A frame like BioFrame is essentially a boilerplate template with placeholders, and each language’s concrete grammar is the template that realizes it correctly in that language. This is designed to be accessible: people can generate these sentences through the UI, or by calling the REST endpoint (e.g. POST /api/v1/generate/{lang_code}) with a JSON frame. What’s still missing compared to your Dagbani fragments idea is a large, language-specific library that maps many Wikidata properties into many different sentence-frame templates automatically. Réjean McCormick (talk) 20:40, 19 January 2026 (UTC)
Can you please send me an example of such a frame, like BioFrame? Hogü-456 (talk) 20:47, 20 January 2026 (UTC)
Detailed Explanation
Here are the details and examples for BioFrame, the primary semantic structure used for generating biographical sentences.
1. What is the BioFrame?
The BioFrame is a strict, flat JSON object used in the "Strict Path" of the API. It represents the semantic intent to generate a biographical sentence (e.g., "Marie Curie is a French physicist"). In v2.1, it is defined as a Pydantic model in the Core Domain layer.
2. JSON Structure & Fields
The schema is strictly flat (no nested intent objects).
Field       | Type           | Required | Description
----------- | -------------- | -------- | -----------
frame_type  | Literal["bio"] | Yes      | Discriminator field. Must be exactly "bio".
name        | str            | Yes      | The subject's proper name (e.g., "Alan Turing").
profession  | str            | Yes      | Lookup key for the Lexicon (e.g., "computer_scientist").
nationality | str            | No       | Lookup key for the Lexicon (e.g., "british").
gender      | str            | No       | "m", "f", or "n" (critical for correct inflection).
context_id  | UUID           | No       | Used by the Discourse Planner to link sentences (enables "She" vs. "Marie").
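For illustration only, the field schema above can be approximated with a standard-library dataclass. The post says the real engine defines BioFrame as a Pydantic model; this stdlib sketch only mirrors the fields and the required/optional split, and is not the actual code.

```python
# Stdlib approximation of the BioFrame schema described above. The real
# engine reportedly uses a Pydantic model; this dataclass only mirrors the
# field layout for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BioFrame:
    frame_type: str                    # discriminator: must be exactly "bio"
    name: str                          # subject's proper name
    profession: str                    # lexicon lookup key
    nationality: Optional[str] = None  # lexicon lookup key, optional
    gender: Optional[str] = None       # "m", "f", or "n"
    context_id: Optional[str] = None   # links sentences for pronominalization

    def __post_init__(self):
        # Enforce the discriminator the way a Pydantic Literal["bio"] would.
        if self.frame_type != "bio":
            raise ValueError('frame_type must be "bio"')

frame = BioFrame(frame_type="bio", name="Alan Turing",
                 profession="computer_scientist", nationality="british", gender="m")
print(frame.name, frame.nationality)  # Alan Turing british
```

The flat structure (no nested intent objects) is what makes the frame easy to validate and easy to produce from a form or from Wikidata lookups.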
3. Usage Examples
A. Standard Request (Perfect Data)
This generates a full sentence using the mkBioFull grammar function.
POST /api/v1/generate/en
{
  "frame_type": "bio",
  "name": "Alan Turing",
  "profession": "computer_scientist",
  "nationality": "british",
  "gender": "m"
}

Output: "Alan Turing is a British computer scientist."

B. Partial Request (Missing Nationality)
The v2.1 architecture handles missing data via Overloading. If nationality is omitted, the engine automatically selects mkBioProf instead of failing.
POST /api/v1/generate/fr
{
  "frame_type": "bio",
  "name": "Marie Curie",
  "profession": "physicist",
  "gender": "f"
}

Output: "Marie Curie est une physicienne." (Note the gender agreement)

C. Context-Aware Request (Pronominalization)
If a context_id is provided and matches a previous session where "Marie Curie" was the focus, the system replaces the name with a pronoun.
{
  "frame_type": "bio",
  "name": "Marie Curie",
  "profession": "chemist",
  "context_id": "session_123"
}

Output: "She is a chemist."

4. How it Maps to the Engine (The "Triangle of Doom")
To generate text, the BioFrame travels through three layers defined in the Schema Alignment Protocol:
  1. Input (API): The user sends the JSON BioFrame.
  2. Logic (Adapter): The GrammarEngine maps the JSON fields to Abstract Grammar functions:
    • All fields present: Maps to mkBioFull : Entity -> Profession -> Nationality -> Statement
    • Missing Nationality: Maps to mkBioProf : Entity -> Profession -> Statement
  3. Render (GF):
    • Tier 1 (High Road): Uses RGL macros (e.g., mkS (mkCl s (mkVP n p))) for perfect grammar.
    • Tier 3 (Factory): Uses Weighted Topology (e.g., sorting Subject, Verb, Object weights) to glue strings together.
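The Tier 3 "sort constituents by weight" step can be illustrated with a toy linearizer. The weights and language orderings below are invented examples for illustration, not Udiron's or Architect's actual data.

```python
# Toy illustration of the "Weighted Topology" idea: each constituent role
# gets a position weight per word order, and linearization simply sorts by
# weight and joins the strings. Weights here are invented examples.

WEIGHTS = {
    "svo": {"S": 0, "V": 1, "O": 2},  # e.g. English-like order
    "sov": {"S": 0, "O": 1, "V": 2},  # e.g. Japanese-like order
}

def linearize(order, constituents):
    """Sort (role, text) pairs by the order's weights and join them."""
    w = WEIGHTS[order]
    return " ".join(text for role, text in sorted(constituents, key=lambda rt: w[rt[0]]))

parts = [("V", "reads"), ("S", "Ada"), ("O", "a book")]
print(linearize("svo", parts))  # Ada reads a book
print(linearize("sov", parts))  # Ada a book reads
```

The appeal of this scheme is that adding a language in "safe mode" only requires a small weight table, not a handcrafted grammar; the weights can later be replaced by a full RGL grammar without changing the calling code.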
5. Ninai Compatibility
While BioFrame is the internal format, the system also accepts Ninai (Abstract Wikipedia) trees and flattens them into a BioFrame automatically via the NinaiAdapter.
Ninai Input:
{
  "function": "ninai.constructors.Statement",
  "args": [
    { "function": "ninai.types.Bio" },
    { "function": "ninai.constructors.Entity", "args": ["Q7186"] },   // Marie Curie
    { "function": "ninai.constructors.Entity", "args": ["Q169470"] }  // Physicist
  ]
}
Internal Conversion:
The adapter extracts the QIDs, looks them up in the Lexicon (e.g., Q169470 -> "physicist"), and constructs the flat BioFrame.
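As a hedged sketch of that flattening step: the lexicon contents and the argument layout below are assumptions based on the example above, not the real NinaiAdapter.

```python
# Sketch of the flattening step: pull Entity QIDs out of a Ninai-style tree,
# resolve them against a tiny lexicon, and emit a flat bio frame. Lexicon
# contents and argument layout are illustrative assumptions.

QID_LEXICON = {"Q7186": "Marie Curie", "Q169470": "physicist"}

def flatten_to_bioframe(tree):
    """Build a flat dict frame from a Statement(Bio, Entity, Entity) tree."""
    qids = [node["args"][0]
            for node in tree["args"]
            if node.get("function") == "ninai.constructors.Entity"]
    name_qid, profession_qid = qids
    return {"frame_type": "bio",
            "name": QID_LEXICON[name_qid],
            "profession": QID_LEXICON[profession_qid]}

ninai_tree = {
    "function": "ninai.constructors.Statement",
    "args": [
        {"function": "ninai.types.Bio"},
        {"function": "ninai.constructors.Entity", "args": ["Q7186"]},
        {"function": "ninai.constructors.Entity", "args": ["Q169470"]},
    ],
}
print(flatten_to_bioframe(ninai_tree))
```

Keeping the adapter as a thin, testable function like this is what lets the same generation core serve both the internal flat-frame API and Ninai-style callers.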

Réjean McCormick (talk) 22:27, 20 January 2026 (UTC)

About the last complexity layer, see "Adding the last complexity layer with SwarmCraft":
https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Tools/abstract-wiki-architect#Proposal:_SwarmCraft_as_an_Article-Orchestration_Layer_above_Abstract_Wiki_Architect Réjean McCormick (talk) 15:48, 21 January 2026 (UTC)

Integrating Abstract Wikipedia


There are discussions about integrating Abstract Wikipedia happening on Wikifunctions after a new status update was published. If you are interested, you can join the discussion and also write down your thoughts about the right place for it. From my point of view, discussions about Abstract Wikipedia should happen in Abstract Wikipedia, so I hope it will exist as its own wiki soon. Hogü-456 (talk) 16:07, 1 February 2026 (UTC)