
Talk:Abstract Wikipedia/Archive 4

Re: Abstract Wikipedia/Updates/2022-05-05

I read the status update, and it raises an important question. From my point of view, functions are interesting; when I tell people what Wikifunctions is about, I say that it is a collection of rules that are used to generate texts in different languages from an abstract notation. The collection also includes other functions that can be used outside of that context, for other topics. The example of computing the volume of a pyramid is an interesting description and helps to understand it better. I think it is possible to find a sentence and, based on that sentence, include the computations it contains. I think it is important to explain what the functions for generating text are needed for, and what can be done based on decision tables and a predefinition of the words and word forms in a database. This is something where I am sometimes not sure whether it is correct when I say that the functions for generating the text are located in Wikifunctions. At the beginning, I think more detailed information is more interesting from a technical point of view. In Germany there are some podcasts related to the Chaos Computer Club; it could be interesting to talk to them and ask whether they would be interested in doing an episode about Wikifunctions. As far as I know, Wikipedia was also promoted at an early phase at the Chaos Communication Congress.--Hogü-456 (talk) 20:55, 9 May 2022 (UTC)
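For illustration, the pyramid-volume example mentioned above is exactly the kind of small, general-purpose function Wikifunctions could hold; a minimal Python sketch (the formula is V = base area × height / 3):

    def pyramid_volume(base_area: float, height: float) -> float:
        # Volume of any pyramid (or cone): one third of base area times height.
        return base_area * height / 3.0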

Wikimania Hackathon 2022

Will the Wikimedia Foundation's development team for Wikifunctions attend Wikimania, or the Hackathon that takes place during Wikimania 2022? I am interested in meeting some of the people who work on that team at the Wikimedia Foundation. During the Hackathon, my plan is to continue working on the conversion of spreadsheet functions into code. I am interested in creating graphical user interfaces for programs in the programming language R, and in finding ways to create web interfaces and connect them with a program in the programming language of choice in the background. Currently I can write programs and create user interfaces based on web pages, but I do not know how to connect these two things. Do you have experience with graphical user interfaces for R, or with creating a web application that transfers data to a server, where the data is processed using the programming language of choice and then delivered back? From my point of view, enabling people to create user interfaces and connect them with the code in the background is a big challenge, and something that is important to me and useful for Wikifunctions and its users. Hogü-456 (talk) 19:37, 20 July 2022 (UTC)
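For readers with the same question, a minimal sketch of the round trip described above, using Python's Flask purely as an illustrative assumption (the R side could equally be served by frameworks such as Shiny or plumber; the endpoint name and payload here are hypothetical):

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/process", methods=["POST"])
    def process():
        data = request.get_json()               # data sent by the user interface
        result = {"sum": sum(data["values"])}   # stand-in for the real processing
        return jsonify(result)                  # delivered back to the interface

    if __name__ == "__main__":
        app.run()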

@Hogü-456 Yes! We just had our session proposal accepted. There will also be another related session by Mahir256. Some of the developers will also be around during the Hackathon (depending on timezones, of course).
I'm not sure who is familiar with R, but I'd suggest writing some specific details about your plan/idea/questions for further discussion and detailing (perhaps in a user-subpage to start with, which could then be transferred to phabricator once it's clearer). Hope that helps. Quiddity (WMF) (talk) 21:33, 22 July 2022 (UTC)

FAQ

Abstract Wikipedia/FAQ seems to be outdated: We plan to have this discussion in early 2022. --Ameisenigel (talk) 19:04, 21 August 2022 (UTC)

@Ameisenigel Thanks, I've changed that item to "late 2022" for now, and will take a closer look at the rest of the page later. Quiddity (WMF) (talk) 18:26, 22 August 2022 (UTC)
Thanks! --Ameisenigel (talk) 18:36, 22 August 2022 (UTC)

When will the Code of Conduct be drafted?

I noticed that there is no Code of Conduct yet for Wikifunctions. However, the beta is already out, and editors are starting to come in. Could somebody give details on what a Code of Conduct could look like, and when it would be released? 2601:647:5800:1A1F:CCA8:DCA6:63BA:A30A 01:00, 2 September 2022 (UTC)

There will be discussion about this before launch. We are already planning for it. -- DVrandecic (WMF) (talk) 19:42, 24 October 2022 (UTC)
Please see Abstract Wikipedia/Updates/2022-11-17 for a newsletter post on this topic, and request for input and ideas. Thanks! Quiddity (WMF) (talk) 02:35, 18 November 2022 (UTC)

Translation accuracy?

"In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way. A particular language Wikipedia can translate this language-independent article into its language. Code does the translation" -> this sounds like machine translation to me. How do we make sure that the translation is 100% accurate? It's impossible for the machine translation to be always correct. X -> machine translation -> Y. X & Y are 2 different languages. Depending on which languages they are, the accuracy could be as low as 50%. Nguyentrongphu (talk) 00:59, 12 November 2022 (UTC)

Hi @Nguyentrongphu. There are many slow ongoing discussions about how the system could/should work. In a nutshell, it will not be using a plain machine-translation system; instead, there will be some kind of system(s) for editors to write "abstract sentences", that use (rely upon) the structured data in Wikidata's Lexemes and Items, to create properly localized sentences. A recent overview of a few aspects, including comparisons to some existing Wikimedia tools, is in Abstract Wikipedia/Updates/2022-06-07. Following the links from that page will lead to many more details and discussions. I hope that helps! Quiddity (WMF) (talk) 19:46, 14 November 2022 (UTC)
The first approach is basically automatic translation using Wikidata items. The end results are almost identical to a typical machine translation.
The second approach looks to me like a version of machine translation with some tweaking: machine translation + some tweaking by humans + a lot of sentence simplification. Even then, it's still flawed in some ways. If translation could be done automatically and correctly, the world wouldn't need translators or interpreters anymore. The human tweaking process is labor-intensive, though. Based on what I read, it's done manually, sentence by sentence. Instead of tweaking the function, one could use that time to translate the article manually (probably faster), and the article would sound more natural (grammatically) and be more correct (quality of translation). Sadly, I don't see any utility in this approach unless AI (artificial intelligence) becomes much more advanced in the future (20 more years, perhaps?).
If understanding the gist of an article is all one needs, Google Translate is doing just fine (for that purpose), with 6.5 million articles available on English Wikipedia to "read" in any language. If articles were 100% machine-translated into another language's Wikipedia, they would be deleted. Nguyentrongphu (talk) 23:52, 14 November 2022 (UTC)
Re: Google/machine translation - Unfortunately, those systems only work for some languages (whichever ones a company decides are important enough, versus the 300+ that are supported in Wikidata/Wikimedia), and as you noted above, the results are very inconsistent, with some results being incomprehensible. We can't/won't/don't want to use machine translation, for all the reasons you've described, and more.
Re: "one can just use that time to just translate the article manually" - One benefit of the human-tweaked template-style abstract-sentences, and one reason why it is best if they are simple sentences, is that they can then potentially be re-used in many articles/ways.
E.g. Instead of having a bot that creates thousands of stub articles about [species / villages / asteroids / etc], as occurred at some wikis in the years past (e.g. one example of many) (and some of which have since been mass-deleted, partially because they were falling badly out of date), we can instead have basic info automatically available and updated from a coordinated place (like some projects do with Wikidata-powered Infoboxes). And instead of having to constantly check if a new fact exists for each of those thousands of articles in hundreds of languages (such as a newer population-count for a village, or endangered-classification for a species), it could be "shown if available".
As an over-simplified example: An article-stub about a species of animal could start with just the common-name and scientific-name (if that is all that is available). But then it could automatically add a (human-tweaked/maintained) sentence about "parent taxon", or "distribution", or "wingspan" or "average lifespan" when that info is added to Wikidata for that species. Or even automatically add a "distribution map" to the article, if that information becomes available (e.g. d:Q2636280#P8485) and if the community decides to set it up that way.
I.e. the system can multiply the usefulness of a single-sentence (to potentially be used within many articles in a language), and also multiply the usefulness of individual facts in Wikidata (to many languages).
It also provides a starting point for a manually-made local article, and so helps to overcome the "fear of a blank page" that many new and semi-experienced editors have (similarly to the way that ArticlePlaceholder is intended to work, e.g. nn:Special:AboutTopic/Q845189). I.e. Abstract Wikipedia content is not intended as the final perfect state for an article, but rather to help fill in the massive gaps of "no article at all" until some people decide to write detailed custom information in their own language.
If you're interested in more technical details (linguistic and programming), you might like to see Abstract Wikipedia/Updates/2021-09-03 and Abstract Wikipedia/Updates/2022-08-19.
I hope that helps, and apologize for the length! (It's always difficult to balance/guess at everyone's different desires for conciseness vs detail). Quiddity (WMF) (talk) 03:51, 15 November 2022 (UTC)
Thank you! I like your very detailed answer. I think I understand everything now. Abstract Wikipedia is basically an enhanced version of machine translation (plus human tweaking) with the ultimate goal of creating millions of stubs in less developed Wikipedias. While it certainly has its own merits, I'm not so sure the benefits outweigh the cost (a lot of money + years of effort invested in it). First, good quality articles can't be composed of just simple sentences. Second, creating millions of stubs is a good seeding event, but bots can do the job just fine (admittedly, one has to check for new information once in a while; once every 5 years is fine). Plus, machine translation can also be fine-tuned to focus on creating comprehensible stubs, and that has been done already. Third, it's true that Google Translate does not include all languages, but it covers enough to serve 99.99% of the world population. Fourth, any information one can gain from a stub, one can also get from reading a Google translation of English Wikipedia. Stubs are not useful except as a seeding event. Again, that job has been done by bots for many Wikipedias for more than 10 years. Sadly, with the current utility of Abstract Wikipedia, one can't help feeling that this is a wasteful venture. Money and effort could be better spent elsewhere to get us closer to "the sum of all human knowledge". I don't know the solution myself, but this is unlikely to be the solution we've been looking for. Nguyentrongphu (talk) 22:30, 17 November 2022 (UTC)
@Nguyentrongphu Thanks, I'm glad the details were appreciated! A few responses/clarifications:
Re: stubs - The abstract articles will be able to go far beyond stubs. Long and highly detailed articles could be created, with enough sentences. And then when someone adds a new abstract-sentence to an abstract-article, it will immediately be available in all the languages if/when they have localized the elements in that sentence's structure. -- I.e. Following on from my example above: Most species stubs start off with absolutely minimal info (e.g. w:Abablemma bilineata), but if there was an abstracted sentence for "The [animal] has a wingspan of [x] to [y] mm." (taken from w:Aglais_io#Characteristics), they could then add it to the Abstract Wikipedia article for "Abablemma bilineata", and the numerical facts into Wikidata (via d:Property:P2050), and suddenly the articles in hundreds of languages are improved at once!
Re: bots - Bots are good at simple page creations, or adding content to rigid structures, but not so good at updating existing pages with new details in specific ways, because us messy and inconsistent humans have often edited the pages to change things around.
Re: machine-translation and Enwiki - The problems with that include that they don't help spread the local knowledge that is hidden away in the other Wikipedias, which don't have machine-translation support. It also excludes monolingual speakers from contributing to a shared resource. And they have to know the English (etc) name for a thing in order to even find the English (etc) article. -- E.g. the article on a village, or on a cultural tradition, or locally notable person, might be very detailed in the local language version, but still remain a stub or non-existent at most/all other Wikipedias for many more decades, with our current system. See for example this image.
Re: "good quality articles can't be composed of just simple sentences" - I agree it probably won't be "brilliant prose" (as Enwiki used to refer to the Featured Article system (w:WP:BrilliantProse)), but simple sentences can still contain any information, and that is vastly better than nothing.
I hope that helps to expand how you see it all, and resolve at least some of your concerns. :) Quiddity (WMF) (talk) 00:00, 18 November 2022 (UTC)
"It also provides a starting point for a manually-made local article, and so helps to overcome the 'fear of a blank page'" + "far beyond stubs" -> you're contradicting yourself. It can't be that far beyond stubs.
Abstract sentences can only work if all the articles involved share a similar basic structure, for example species, villages, or asteroids. All species share some basic information structure, but things quickly diverge after the introduction. With this constraint in mind, it's impossible to go far beyond stubs (introduction level at best). It sounds like wishful thinking to me, which is not practical.
"Because us messy and inconsistent humans have often edited the pages to change things around" -> Abstract Wikipedia will also face this problem. "Adding a new abstract-sentence to an abstract-article" -> what if someone has already added that manually, or changed the article in some way beforehand? It's impossible for a machine to detect whether an abstract sentence (or a sentence with a similar meaning) has already been added, since there are infinitely many ways someone else may have already changed an article. Plus, the insertion location is also a concern. If an article has been changed in some way beforehand, how does the machine know where to add the sentence? Adding it at a random location will make the article incoherent.
Far beyond stubs + the fact that Abstract Wikipedia is only possible with simple sentences -> it sounds like Abstract Wikipedia is trying to create a Simple French Wikipedia, Simple German Wikipedia, Simple Chinese Wikipedia, etc. (similar to the Simple English Wikipedia). This is a bad idea. Nobody cares about the Simple English Wikipedia; non-native English speakers don't even bother with it. This is an encyclopedia, not Dr. Seuss books.
"And they have to know the English (etc) name for a thing in order to even find the English (etc) article" -> Google Translate does come in handy in these situations (it can help them find the English name). Again, Google Translate supports enough languages to serve 99.99% (an estimate) of the world population.
"The problems with that include that they don't help spread the local knowledge that is hidden away in the other Wikipedias" -> we need more manpower for this huge goal and task. Abstract Wikipedia is unlikely to solve this problem. Local knowledge is unlikely to fit the criteria for utilizing abstract sentences; local knowledge is not simply a species, a village, or an asteroid.
"The article on a village, or on a cultural tradition, or locally notable person, might be very detailed in the local language version, but still remain a stub or non-existent at most/all other Wikipedias for many more decades" -> Google Translate has worked so far. Local language version -> Google Translate -> translate to the reader's native language. That's good enough to get the gist of an article.
I'm not talking about Featured Article system. I'm talking about this. It's impossible to even reach this level with Abstract Wikipedia. We need a human to do the work to actually achieve Good Article level.
"It also excludes monolingual speakers from contributing to a shared resource" -> this shared resource is heavily constrained by abstract sentences. The criteria to utilize abstract sentences is also quite limited. Also, each Wikipedia language needs someone (or some people) to maintain abstract sentences. Plus, building and maintaining abstract sentences requires a very intensive process (manually translating it is easier, much more efficient and sound better instead of just simple sentences). It won't make a big impact as one would hope for, not any more impact than articles created by bots. Even today, many Wikipedias still retain millions of articles created by bots as seeding events.
Abstract Wikipedia is useful only for the languages that are not supported by Google translation. Spending too much money, time + efforts to serve the 0.01% of the world population is not a good idea. I'm not saying to ignore them all together, but this is not a good, efficient solution. It ultimately comes down to benefit vs cost analysis that I mentioned earlier. There is no easy solution, but we (humanity) need to discuss a lot more and thoroughly to move forward.
P/S: this is a good scholarly debate, which is very stimulating and interesting! I like it! Nguyentrongphu (talk) 23:45, 18 November 2022 (UTC)

I've tried to connect to https://wikifunctions.beta.wmflabs.org/wiki/Wikifunctions:Main_Page but it doesn't work. Is there a new url? PAC2 (talk) 04:08, 25 July 2023 (UTC)

Hi @PAC2. It should be working fine, and currently loads for me. Perhaps it was a temporary problem? Please could you check again, and if you still see a problem, then let me know what specific error message it provides (or what the browser does and how long it takes). Please also share any other relevant details about your connection (e.g. if you use a VPN, or many browser-extensions that affect connections). Also, check one of the other beta-cluster sites, such as https://wikidata.beta.wmflabs.org/, to see if it's all of them or just one. Thanks! Quiddity (WMF) (talk) 19:59, 25 July 2023 (UTC)
It works again. Thanks for the feedback PAC2 (talk) 05:18, 26 July 2023 (UTC)

Questions/thoughts about components of WikiLambda

Hi all, first of all I'm really excited for this project; I think it can have a lot of potential. I have some questions and thoughts that came to me while implementing an executor for Rust, and I wanted to share them here. Here they are:

  • What is the role of the function orchestrator? What inputs does it take, and what does it output? I assume an API definition exists somewhere, but it would be better if explained in the README or somewhere on-wiki. (I'm also interested in efforts to rewrite the function orchestrator)
  • What is the input format and output format of an evaluator? This is for the evaluator components that read stdin and print to stdout, not the function evaluator API. Am I right in assuming the input should be like this? (A hedged guess at the shape is sketched after this list.)
  • What does the API of the function evaluator look like? Does it accept a Z7 object and needs to evaluate some values before proceeding or is it just responsible for dispatching evaluation based on programming language requested?
  • Since the function evaluator runs on Node.js, it would be nice if evaluator impls for languages were allowed to compile to wasm and return the value returned by wasm (I don't think wasm has introspection abilities, so it would have to be a self-contained binary that handles serialization of return values itself)
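A hedged guess at the shape asked about in the second bullet, written as JSON-ish Python; this is based on the canonical ZObject form, and Z10001 is a hypothetical function ZID, so treat it as an assumption rather than the documented wire format:

    function_call = {
        "Z1K1": "Z7",         # the type of this object: Z7, a function call
        "Z7K1": "Z10001",     # reference to the (hypothetical) function being called
        "Z10001K1": "hello",  # first argument, a canonical Z6/string
    }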

0xDeadbeef (talk) 06:18, 29 July 2023 (UTC)

What is the role of the function orchestrator? You can find an overview of the WikiLambda architecture described here, with more detail on the architecture page. The main goal of separating the orchestrator from the evaluator is: the evaluator runs user-provided code and will run in a very restricted setting, whereas the orchestrator is under developer control and can do things like calling out to websites such as Wikidata. If you are implementing your own version, there is no need to follow that architecture; you can also have just one system that doesn't split (in fact, my previous implementation of a similar system was monolithic like that). -- DVrandecic (WMF) (talk) 20:28, 8 August 2023 (UTC)
WASM instead of Node. Yes, I agree, that is a very interesting path, and it is one of our potential ways forward. We are experimenting right now. If you have thoughts or experiences, they would be very welcome. Some early thoughts are on T308250, I want to write up my own experiments about this there too. -- DVrandecic (WMF) (talk) 20:31, 8 August 2023 (UTC)

Questions/thoughts on the function model

  • It looks like Z881/typed lists are currently implemented as nested tuples (or in RustSpeak, type List<T> = Option<(T, List<T>)>), according to this test; a sketch of that shape follows this list. There seems to be only a built-in impl. Are there plans to make impls based on sum-types and composition of type constructors? Specifically, an Optional/Maybe monad seems helpful.
  • What is the situation for optional fields? It looks like Z14/implementations can have omitted fields. Are there any plans to formalize the model so that fields are required? (those fields can use the optional monad)
  • Is it okay for my executor to reject Z881/typed lists where elements are not of the same type? I saw test cases for the JavaScript and Python executors that pass parameters like this. It seems beneficial to enforce that elements are of the same type for a list like this, for better correspondence with more strongly typed programming languages.
  • From that point, since types are first-class values, we could have type constructors that take a list of types as input. This could help build n-ary tuples, for example.
  • Why is Z883/typed map implemented as an inline Z4/type when passed to the function evaluator? Both Z881/typed list, Z882/typed pair are implemented as Z7/function calls. It would be cleaner to parse if Z883/typed map is also a Z7/function call.
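The nested-tuple shape from the first bullet, sketched in JSON-ish Python; this reflects one reading of the function model, where K1 is the head, K2 the tail, and an empty list is the bare type, so the exact serialization may differ:

    LIST_OF_Z6 = {"Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6"}  # the type, itself a Z7 call

    two_element_list = {
        "Z1K1": LIST_OF_Z6,
        "K1": "a",                       # head
        "K2": {                          # tail, again a typed list of Z6
            "Z1K1": LIST_OF_Z6,
            "K1": "b",
            "K2": {"Z1K1": LIST_OF_Z6},  # empty list: the type with no K1/K2
        },
    }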

0xDeadbeef (talk) 06:18, 29 July 2023 (UTC)

"Are there plans to make impls based on sum-types and composition of type constructors?"
yes, eventually; the plan is to have something like Option or Maybe or Either. It might already partially work. You can try things out on the Beta. Agreed that this would be helpful in several places; just for now, we don't promise yet that it works. See also phab:T287608. Thoughts are appreciated.
"Are there any plans to formalize the model so that fields are required?"
yes. Currently, on user-defined types, all keys are required (for the internal types, we cheat a bit). We want to somehow allow this declaratively. There are a number of possibilities to make it so, e.g. by having an either, or a maybe, or something like that. See also phab:T282062. We need to fix this before we can start using types with optional keys (or functions with optional arguments). The same here: we have not yet really landed on a solution, and we are looking for a good solution that ideally vibes well with our current implementation. --DVrandecic (WMF) (talk) 21:56, 29 September 2023 (UTC)
"is it okay for my executor to reject Z881/typed lists where elements are not of the same type?"
almost. If the typed list says "typed list of Z1/Object", then this means that the elements can be anything, i.e. it is not typed at all. So ["hello", "this", "is", Z41/true] is OK if it is a typed list of Z1. For a typed list of any other type, e.g. a typed list of Z6, yes, in that case all elements have to be instances of the given type, e.g. Z6. I don't think typed lists of Z1 will be easily serializable and deserializable into native code.
"from that point, since types are first class values, we could have type constructors that take a list of types as input. This could help building n-ary tuples for example."
yes, that should eventually be possible, even though I feel squeamish about it. Currently, generic types are not well supported in the front end though, so it will not work right now. But eventually it should.
"Why is Z883/typed map implemented as an inline Z4/type when passed to the function evaluator? Both Z881/typed list, Z882/typed pair are implemented as Z7/function calls. It would be cleaner to parse if Z883/typed map is also a Z7/function call."
good question, I need to play around with that part myself a bit. I will for now refer to @CMassaro (WMF). --DVrandecic (WMF) (talk) 21:56, 29 September 2023 (UTC)
Hmm, can you expand on the question? The evaluator accepts either Z4 or Z7 for inputs of any of these types. If it produces one of these types as output, it always makes the type a Z7. Have you found a case where this isn't true? Or is this related to what the orchestrator passes to the evaluator? CMassaro (WMF) (talk) 18:42, 2 October 2023 (UTC)

Abstract Wikipedia/User stories

Is this still a thing? It's been tagged as a draft since January 2021. * Pppery * it has begun 03:54, 31 July 2023 (UTC)

Ditto with Abstract Wikipedia/Smoke tests - are these drafts going to be finished some day? * Pppery * it has begun 03:55, 31 July 2023 (UTC)
I'm looking into this. Sorry for the delay in replying, and clarifying those pages. Quiddity (WMF) (talk) 21:56, 9 August 2023 (UTC)

Login

Hi, may I suggest supporting Wikipedia login on wikifunctions.org, please? Fgnievinski (talk) 16:50, 31 July 2023 (UTC)

Wikipedia login worked for me there (and I have wikifunctions in Special:CentralAuth/Lockal now). Lockal (talk) 17:51, 31 July 2023 (UTC)
You're right, it worked for me, too. I just didn't realize the login credentials would be the same. I've opened a ticket at phabricator suggesting mentioning "global login" in the login page: https://phabricator.wikimedia.org/T343153 Fgnievinski (talk) 18:40, 31 July 2023 (UTC)

Clarification required: what is the meaning of "abstract"

What is the intended meaning of "abstract" in "Abstract Wikipedia"? Is it the "abstract" as in "abstract art", or does it mean "summary" like in scientific publications? MilkyDefer 08:02, 9 August 2023 (UTC)

You can read the definition in the glossary. --Ameisenigel (talk) 16:23, 10 August 2023 (UTC)

wiktionary forms

Hi, does Wikifunctions have a plan to create Wiktionary forms for each Wiktionary, for example the English Wiktionary? Amirh123 (talk) 18:10, 11 September 2023 (UTC)

Hi. The plan is for functions to be able to be used within all other Wikimedia wikis (see the related FAQ entry and the related documentation). It will be up to the related communities to determine where and how functions are used. Quiddity (WMF) (talk) 20:44, 14 September 2023 (UTC)

Outdated?

It seems to me that a content page entitled "Abstract Wikipedia" should include the current state of the project. The page describes the project schedule, but does not describe what has been done and what remains to be done. For example, did the "Abstract Wikipedia part of the project ... start in roughly 2022"? Did the Wikifunctions Beta launch in 2022? Finell (talk) 04:17, 3 October 2023 (UTC)

@Finell Thanks for reporting this, and sorry for taking this long to answer. Yes, the main page needs to be updated, and we will do it soon. We just have been caught in a lot of work behind the scenes since the launch of the project. Sannita (WMF) (talk) 15:17, 8 February 2024 (UTC)


Suggestions

I have some criticisms. In this article, these examples are given:

subclassification_string_from_n_n_language(n_wikipedia, n_encyclopedia, English)

English : Wikipedias are encyclopedias.

subclassification_string_from_n_n_language(n_wikipedia, n_encyclopedia, German)

German : Wikipedien sind Enzyklopädien.

Subclassification(Wikipedia, Encyclopedia)

English : Wikipedias are encyclopedias.

German : Wikipedien sind Enzyklopädien.

However, what bothers me is the claim that this is language-neutral, when it's so obviously Anglocentric. If it were German-centred instead, for example, the code would look like this:

Zeichenfolge_für_die_Unterklassifizierung_von_n_n_Sprache(n_Wikipedia, n_Enzyklopädie, Englisch)

Zeichenfolge_für_die_Unterklassifizierung_von_n_n_Sprache(n_Wikipedia, n_Enzyklopädie, Deutsch)

Unterklassifikation(Wikipedia, Enzyklopädie) (thanks, google translate!)

So, what's the point of lying? Also, I think that the functions should be stratified (to avoid nasty things like self-referential paradoxes), like predicate logic or set theory. A predicate is a special case of a function whose output is T or F. Logical operators are similar in this regard, except that the input is also T or F.

If we're mirroring predicate logic, then we could instead have:

Encyclopedia(Wikipedia) or Enzyklopädie(Wikipedia)

Similarly, the set {x|Encyclopedia(x)} (or {x|Enzyklopädie(x)}) can be defined.

Quantifiers should also be included, so that one could state, for example, that all humans are mortal. Now, I have an idea for stratification (if you dislike it, feel free to modify it; this is purely illustrative. If I'm unclear, feel free to ask for clarification; I'm $h!t at explaining things :))

0th order: Objects

1st order: Functions (incl. predicates) between objects (we'll state that these functions quantify over objects, in analogy to predicates in predicate logic), statements that only quantify over objects, and sets defined by said functions. All of these collectively will be referred to as first-order objects.

2nd order: Functions between first-order objects (but no higher), statements quantifying over first order objects (but no higher), and sets defined by said functions. All of these collectively will be referred to as second-order objects.

3rd order: Functions between second-order objects (but no higher), statements quantifying over second-order objects (but no higher), and sets defined by said functions. All of these collectively will be referred to as third-order objects.

etc.

Transfinite induction is obvious:

ω-order: Functions between nth-order objects (for n<ω), statements quantifying over nth-order objects (for n<ω), and sets defined by said functions. All of these collectively will be referred to as ω-order objects. (this can be applied to other limit ordinals too)

All functions, statements, and sets will be assigned the lowest ordinals consistent with the definitions provided. Username142857 (talk) 17:34, 2 January 2024 (UTC)
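A compact restatement of the stratification above in LaTeX, assuming the cumulative reading (one possible formalization, not the only one):

    \[
    O_0 = \text{objects}, \qquad
    O_\alpha = \Bigl\{\, x \ \Bigm|\ x \text{ is a function, statement, or set ranging only over } \bigcup_{\beta<\alpha} O_\beta \,\Bigr\}
    \quad \text{for } \alpha > 0,
    \]

with each such \(x\) assigned the least ordinal \(\alpha\) for which \(x \in O_\alpha\).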

Change project name to "Machine Wikipedia"

Hi, according to this discussion, I propose to change this project's name to "Machine Wikipedia", to match Tim Berners-Lee's vocabulary proposal about Web 3.0 and making the web machine-readable.

I also propose making fully textual articles, called "Machine articles", written in RDF; like other editions of Wikipedia, these fully textual articles can be filled in by humans and by AI (via NLP).

The implementation of "Machine Wikipedia" could be done very quickly, and bots and machines could benefit from this edition very much. Cheers. Hooman Mallahzadeh (talk) 12:33, 7 November 2024 (UTC)

@DVrandecic (WMF): Feeglgeef (talk) 00:51, 17 November 2024 (UTC)
I would just like to add here that this project is more about giving more humans access to more knowledge, not giving robots a way to get knowledge. I'm also not quite sure what you mean by "machine-readable". All the AI scraper bots (see f:wf:Status_updates/2024-09-13#Site_reliability_issues for our own struggles with this) can already read and comprehend (to some capacity) all of the Wikimedia projects. "Abstract" more accurately describes the end goal of the project: being able to generate articles in every language from an Abstract language; any impact on AI/machine use of Wikipedia is not really the goal. We plan to use Wikidata-based objects, instead of RDF, for the generation of content, which you can read more about at f:wf:Status_updates/2024-10-17. Generally, the Wikimedia community is skeptical about AI involvement in content creation; this project is not intended to change that. Thanks! Feeglgeef (talk) 00:59, 17 November 2024 (UTC)
@Feeglgeef Besides its benefits for chatbots and machines, encoding editions of Wikipedia to RDF can help to create some symmetry or harmony between the existing human-readable Wikipedias.
For example: if English Wikipedia lacks some data about Steve Jobs (for example, his father's name) but French Wikipedia has it, we can detect that deficiency in the English edition by using "Machine Wikipedia", because after encoding the English edition to RDF, the result lacks this data. Then we can alert editors of the English edition to add a sentence for it, based on the sentence existing in the French edition.
I should note that Machine Wikipedia could be an accumulation of all human-readable Wikipedias (English, French, etc.), containing all their data but without any redundant information. Hooman Mallahzadeh (talk) 04:36, 17 November 2024 (UTC)
Actually, this problem would really be fixed by Abstract Wikipedia. The ideal ending for the project is for every Wikipedia to disappear in favor of one Abstract one, where f: code will be used to convert to a user's language. An RDF Wiki, separate from this, might be useful, but I would redirect you to Requests for new languages. You can create a proposal there. Feeglgeef (talk) 06:27, 17 November 2024 (UTC)
@Feeglgeef I proposed the idea here. Please inspect that. Thanks. Hooman Mallahzadeh (talk) 06:59, 17 November 2024 (UTC)
I'm not so sure that's quite the "ideal ending" for Abstract Wikipedia, for what it's worth: there will always be something that the Wikifunctions code can't capture. I don't think an Abstract Wikipedia-generated article could compare with a Featured Article on enwiki, for example. My vision is more like "allowing smaller wikis to have broader coverage than they otherwise would". —‍Mdaniels5757 (talk • contribs) 19:43, 10 January 2025 (UTC)

Add DSantamaria-WMF to the team

Please add me to the team:


DSantamaria-WMF (talk) 12:49, 3 April 2025 (UTC)

Done --Sannita (WMF) (talk) 13:30, 3 April 2025 (UTC)

RfC? Only in English???

What's this supposed to be? Untranslatable, so obviously intended only for projects of native English speakers, but with mass messages even to deWP, which will completely reject such machine wiki-junk, and presumably even the possible contamination of a real wiki by having such stuff anywhere near it. Everything here is only in English, so international collaboration is obviously unwanted. Please don't even start thinking that anybody outside of this bubble would want this stuff. Grüße vom Sänger ♫(Reden) 13:21, 23 May 2025 (UTC)

It has since been translated. We have also taken this to heart, and next time we will make sure from the start that translations are available. --DVrandecic (WMF) (talk) 16:06, 21 October 2025 (UTC)

Wikifunctions Fragment experiments

There is a page on Wikifunctions about experiments in generating fragments. If you are interested, you can look at this page. It is a demonstration of how generating sentences can work. From my point of view, it is important to get many views on how to generate them, and to try to understand what can help make it easier. I am writing this down here as an announcement, so maybe some people will read it after reading the page for the naming contest. Hogü-456 (talk) 20:13, 21 October 2025 (UTC)

Abstract Wiki Architect: family-based NLG toolkit

I have developed a toolkit called Abstract Wiki Architect, a family-based NLG system that explores an architecture for industrial-scale renderers for Abstract Wikipedia and Wikifunctions (shared engines per language family, per-language JSON configs, a lexicon subsystem, and QA test suites). The code and documentation are available at https://github.com/Rejean-McCormick/abstract-wiki-architect.

The core architecture is in place and I am currently working through some remaining issues and tests before running broader cross-language suites. I would be interested in feedback from people working on constructors, renderers, lexica, or related tools, especially on how this architecture might align (or conflict) with current Abstract Wikipedia and Wikifunctions designs.

Réjean McCormick (talk) 19:07, 4 December 2025 (UTC)

Implementing the project using LLMs and renaming project to "Machine Wikipedia"

Hi, the idea of this project originally comes from Tim Berners-Lee, who proposed the Semantic Web in 1999. So I propose the following for this project:

  1. Rename project to Machine Wikipedia
  2. Use RDF for structured data
  3. Use RDF-Schema for Constructors

Nowadays, LLMs can extract RDF (Resource Description Framework) triples from text with high accuracy. So, using LLMs, we can implement this quickly and with high accuracy. You can test it: just open https://chatgpt.com/ and use a prompt such as:

Extract RDF triples from the following text.

Text:
"Albert Einstein was born in Ulm in 1879."

Output the result as subject–predicate–object triples.
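The kind of output such a prompt typically yields (illustrative only; actual model output varies, and the predicate names here are assumptions), written as Python tuples:

    triples = [
        ("Albert Einstein", "birthPlace", "Ulm"),
        ("Albert Einstein", "birthYear", "1879"),
    ]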

I had some proposal about it at: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(idea_lab)#Synchronizing_and_removing_inconsistencies_of_all_human-written_editions_of_Wikipedia_by_implementing_Machine_Wikipedia and they refer me here.

Thanks, Hooman Mallahzadeh (talk) 14:22, 14 December 2025 (UTC)

@Quiddity (WMF)@Jdforrester (WMF)@Denny @Syunsyunminmin Hi, please read the above ideas. I really think implementing Web 3.0 for Wikipedia is a piece of cake. If we implement it, we can benefit dramatically from it. That is, promoting the "Wikipedia project" to a "machine-readable project". Hooman Mallahzadeh (talk) 08:33, 15 December 2025 (UTC)
Also @ATsay-WMF. The only problem is the hallucination of LLMs, but I think its probability is not high, and many triples are extracted accurately. Nowadays LLMs are getting more and more accurate. Hooman Mallahzadeh (talk) 08:45, 15 December 2025 (UTC)
I seem to have been pinged. But I don't know why I'm here.
I have very little knowledge about Abstract Wikipedia. How can I help you? Syunsyunminmin 🗨️talk 16:17, 15 December 2025 (UTC)
Hi @Hooman Mallahzadeh! Thank you for your enthusiasm. I discussed these approaches and their limitations in the paper introducing the Abstract Wikipedia approach. Whereas machine-learned models have become impressively better than they were in 2020, when the paper was published, the triple-based approach has fundamental limitations, and current language models are still far from supporting many of the 340 languages Wikipedia supports. Do you have some metrics regarding the statement that "nowadays LLMs are getting more and more accurate"? --denny (talk) 13:27, 15 December 2025 (UTC)
@Denny Hi, and thanks for your response. You said

triple-based approach has fundamental limitations

I think RDF Schema is nowadays really strong at capturing the main facts of sentences. It has been intensely investigated by major companies. It is now in a stable version. Additionally, we can add new classes to it.
I propose looking at the original paper by Tim Berners-Lee here, and also at Semantic Web on Wikipedia.
Do you agree that the Semantic Web and machine readability are the aim of Abstract Wikipedia? That is my first question. Is that true? Hooman Mallahzadeh (talk) 13:50, 15 December 2025 (UTC)
The main goal of Abstract Wikipedia is to provide high-quality encyclopedic articles to fill knowledge gaps in languages where this knowledge is currently not available.
You might enjoy my talk at the International Semantic Web Conference this year, where I went into this in more detail. The talk should be available online in the next few weeks. --denny (talk) 15:43, 15 December 2025 (UTC)
@Denny Do you think that LLMs like ChatGPT could implement the Semantic Web? I propose this scenario for "filling knowledge gaps in languages where this knowledge is currently not available":
  1. Encode source language to RDF where this knowledge is currently available
  2. Decode RDF to the target language where this knowledge is currently not available
Is that possible? Hooman Mallahzadeh (talk) 15:51, 15 December 2025 (UTC)
I think I understood your proposal. The answers to your question are in the paper and in the talk I pointed you to. --denny (talk) 11:33, 16 December 2025 (UTC)
@Denny Your article is from 2020, when LLMs had not progressed far enough, but nowadays LLMs are very powerful. I propose giving LLMs a try to implement "Abstract Wikipedia" conveniently and quickly, and then assessing their accuracy against human-written articles, for at least 2 or 3 articles. If the accuracy is low, then drop my idea, but if the accuracy is significant, please consider it. Thanks again. Hooman Mallahzadeh (talk) 06:54, 18 December 2025 (UTC)
@Denny In my opinion, the architecture proposed in this article is a layer above RDF. So I propose
  1. Convert Article to RDF
  2. Convert RDF to "Architecture for a multilingual Wikipedia"
Do you agree? Hooman Mallahzadeh (talk) 07:16, 18 December 2025 (UTC)
Topic: Technical Proof of Concept: Neuro-Symbolic Architecture (Abstract Wiki Architect)
@Hooman Mallahzadeh @Denny
I have implemented a working prototype ("Abstract Wiki Architect") that technically resolves the conflict between the LLM-based approach (Machine Wikipedia) and the Constructor-based approach (Abstract Wikipedia) discussed here.
The implementation uses a Neuro-Symbolic architecture that addresses both the automation needs Hooman raised and the accuracy/structure concerns Denny raised.
1. Solving "Hallucination" (LLM as Architect, not Author) The codebase demonstrates that we do not need to choose between "risky LLMs" and "manual coding."
  • Implementation: The system uses an LLM (Gemini) via an ArchitectAgent, but not to generate the final text.
  • Mechanism: The LLM generates Grammatical Framework (GF) source code. This code is then compiled. If the LLM hallucinates a rule, the compiler throws an error (caught by a SurgeonAgent in the pipeline).
  • Result: The final output is generated by the deterministic GF engine, guaranteeing grammatical correctness and fidelity to the data, while still benefiting from the speed of LLM automation. (A schematic sketch of this loop follows after point 3.)
2. Data Structure: Semantic Frames vs. RDF
Regarding the debate on RDF triples:
  • Implementation: My system replaces flat RDF triples with Hierarchical JSON Semantic Frames (defined in schemas/frames/).
  • Reasoning: As noted in the discussion, triples are often too flat for complex narratives. The JSON Schema approach allows for nested structures (capturing causality, temporal scope, and roles) that are necessary for generating encyclopedic-quality text, and the system validates these programmatically before generation.
3. Multilingual Scaling
For "Tier 3" languages (complex morphology), the system utilizes the GF Resource Grammar Library. The LLM handles the abstract logic, while the pre-existing mathematical rules of GF handle the morphology. This avoids the "decoding" errors common in pure LLM translation for agglutinative languages.
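As a schematic illustration of the loop in point 1 (and the frames in point 2), a Python sketch; generate_gf_source, compile_gf, and repair_source are hypothetical stand-ins for the Architect Agent, the GF compiler, and the Surgeon Agent, and the frame data is likewise illustrative:

    # Hypothetical semantic frame (cf. point 2): flat, structured data, not free text.
    frame = {"type": "BioFrame", "subject": "Marie Curie", "occupation": "physicist"}

    def build_grammar(frame: dict, max_attempts: int = 3) -> str:
        source = generate_gf_source(frame)            # LLM "Architect" writes GF code
        for _ in range(max_attempts):
            ok, errors = compile_gf(source)           # deterministic GF compiler checks it
            if ok:
                return source                         # only code that compiles ever ships
            source = repair_source(source, errors)    # LLM "Surgeon" fixes the errors
        raise RuntimeError("grammar failed to compile after repairs")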
You can view the architecture as AI writing Symbolic Code rather than AI writing text directly. Réjean McCormick (talk) 15:21, 28 December 2025 (UTC)

Not putting all eggs in one basket: the alternative to Abstract Wikipedia

I doubt Abstract Wikipedia could ever be used to write long comprehensive articles. It doesn't have the potential to be used to write articles like en:Lymphatic system or en:Expressionism or en:Cosmology and is quite limited in what can be achieved with it.

This is partly because it takes much longer for people to write anything in its functions syntax; it's much more difficult to create and edit; and it's much more text. For an example, see what's needed for the short sentence in en:Abstract Wikipedia#Example, which is data in Wikidata turned into natural language; I think that is what most sentences will probably be like (existing WD data, and new data inserted into it).

How useful it is to have a number of multilingual articles that are just a few paragraphs or less about datapoints like, simply speaking, "x is the capital of y and as of year has z inhabitants" is of course debatable, as is whether AW can ever achieve much more than that (I'm not saying it definitely can't).

Outline (not as simple as it sounds but also quite feasible)

Especially with all this in mind, it's not wise to put all eggs in this one basket and not explore and work on alternative approaches. Doing so is not a good idea in general: if you just do one thing, put all resources and focus onto it, and it turns out to be far inferior to other approaches or more limited than these, then you may miss out on important innovations.

Thus, I recommend people interested in this project check out the proposal below. It could feasibly double Wikipedia readers/reads in a short time if implemented like proposed so I think it's important to consider or start some work on it. Two key aspects here are that articles can be found by people searching the Web (with DuckDuckGo, Google, etc) in their own language and that human editors can effectively correct flaws in translations. The image on the right summarizes the architecture but more details, better explanations, and basically a FAQ can be found on the page:

W78: Wikipedia Machine Translation Project
Prototyperspective (talk) 14:00, 19 November 2025 (UTC)

Technical details regarding Abstract Wiki Architect
Topic: Automation Solves the "Verbosity" Problem (vs. Machine Translation)
@Prototyperspective
You hit the nail on the head regarding the bottleneck of Abstract Wikipedia: "...it takes much longer for people to write anything in its functions syntax."
If we rely on humans to manually write constructors for every sentence, you are absolutely right—it won't scale to complex articles like "Lymphatic System."
However, the solution isn't to retreat to Machine Translation (your W78 proposal), which introduces "Translationese" and propagates English biases. The solution is to automate the Abstract layer.
I have implemented a Neuro-Symbolic Architecture (Abstract Wiki Architect) that solves exactly the issues you raised:
  1. Solving the "Too Difficult to Write" Problem: You mentioned that writing functions is hard. In my implementation, humans don't write the functions.
    • I use an LLM (Architect Agent) to convert high-level intent (JSON data) into the complex Grammatical Framework (GF) code required by Abstract Wikipedia.
    • This removes the manual toil while keeping the mathematical precision of the Abstract layer.
  2. Complex Articles (Beyond "X is capital of Y"): You incorrectly assume Abstract Wikipedia is limited to simple data points.
    • By using Recursive Semantic Frames (schema-based structures for narratives, causality, and time), we can represent complex concepts.
    • The system generates deep nested structures (e.g., "The war started because X, despite Y attempting to intervene..."), which are then rendered natively into target languages.
  3. Why this is better than Machine Translation (W78): Your proposal suggests translating English articles.
    • The MT Flaw: If you translate English -> Zulu, you get "English grammar with Zulu words."
    • The Abstract Solution: My system generates Zulu from abstract meaning using Zulu morphology rules (via GF RGL). The result is native-sounding text, not a translation.
We shouldn't put all eggs in the "Manual Abstract" basket, nor the "Machine Translation" basket. The third way is AI-Architected Abstract Wikipedia. Réjean McCormick (talk) 15:24, 28 December 2025 (UTC)
It's not meant as a solution to any of the problems of Abstract Wikipedia. What you described may or may not be a solution to one of the problems of Abstract Wikipedia, but it's not addressing what would be addressed in the proposed project. I did think about whether something like what you described may be possible, but it remains to be seen if and how well that works; even if it works well, correcting issues would be extremely cumbersome with the functions syntax. Also, if it worked well, machine-translation systems would make use of this approach.
And no, if you read the proposal in full you'd know it wouldn't only use English Wikipedia articles as the source articles, but whichever article is identified as the highest-quality. The largest fraction of these would be in English Wikipedia, but articles in other WPs would also be used as foundation/source articles. Moreover, it doesn't propose machine-translating into languages where MT does not work well (it works well for English<->Spanish, for example), such as Zulu.
Other than these things, your ideas are interesting and I'll try to keep up to date with how it goes; thanks for your work on it. Prototyperspective (talk) 21:21, 3 January 2026 (UTC)
Technical details regarding Abstract Wiki Architect
Thanks for the feedback, but your critique relies on a misunderstanding of how v2.0 works. You are looking for manual files, but this is a neuro-symbolic engine.
1. The "Zulu Gap" is Solved via Automation You claimed this approach fails for languages like Zulu. That is incorrect. My system defines Zulu as a Tier 3 (SVO) Target and procedurally generates the grammar structure without human intervention.
  • Evidence: The grammar isn't missing; it is auto-generated at generated/src/WikiZul.gf by the Weighted Topology Factory.
  • Why this matters: We don't need a massive corpus or manual authors to bootstrap a language. We only need the topology configuration, which the system already has.
2. Editing is Not "Cumbersome"
You assume humans are writing the function syntax (ninai.constructors). They aren't.
  • The Workflow: Users interact with high-level Semantic Frames (JSON). The Architect Agent (LLM) compiles this into the rigorous GF code, and the Surgeon Agent fixes compilation errors automatically.
  • The Result: We get the mathematical precision of Abstract Wikipedia without the manual toil you are worried about.
3. Verifiability vs. Hallucination
Machine Translation is statistical and prone to hallucination (especially for numbers and facts). My engine is deterministic. If it linearizes, it is grammatically valid and factually identical to the source data. For an encyclopedia, that verifiable truth is non-negotiable.
This isn't a theoretical proposal; the factory is already built. Réjean McCormick (talk) 21:31, 3 January 2026 (UTC)
You misunderstood my reply. I replied to your comment about my proposal and didn't critique your proposal much. Re 1: this is a misunderstanding. Re 2: I meant editing functions after the tool has generated an article's functions, for example to correct errors. Re 3: MT is not "prone to hallucinations"; that's something LLMs do, and LLMs aren't used by MT systems like DeepL but are used by your proposed tool. So if anything, this is an issue (or potential issue) of your tool. I don't see how it's deterministic, but I wish not to discuss this here further. Please use the thread below to discuss your proposal and this thread to discuss my proposal, thanks. And I had understood that it has already been built (I just haven't tested it or seen a demo of it, etc.). Prototyperspective (talk) 21:38, 3 January 2026 (UTC)
Technical details regarding Abstract Wiki Architect
Thanks for the discussion. I respect your request to separate the threads, so I will keep this strictly technical regarding the architecture's determinism and the specific role of AI here.
There are three key architectural distinctions in Abstract Wiki Architect v2.1 that address your concerns:
1. The "Hallucination" Misconception (Compile Time vs. Run Time)
You noted that LLMs hallucinate. You are correct, but in my architecture, the LLM is not writing the article.
  • The Architect Agent (LLM) is only used at Compile Time to write the grammar rules (the software code).
  • The Compile Loop: If the LLM "hallucinates" invalid code, the Two-Phase Build System detects the compilation error immediately and rejects it.
  • The Runtime: When a user requests an article, the system uses the binary AbstractWiki.pgf. This binary is purely deterministic and rule-based. It cannot hallucinate a date or name because it is rendering structured data (JSON), not predicting the next token.
2. Determinism vs. Probability
Neural Machine Translation (NMT) is probabilistic: it guesses the most likely translation vector. My system uses Grammatical Framework (GF), which is a formal logical framework.
  • Data: {"name": "Marie Curie", "prop": "physicist"}
  • Rule: mkBio p n = mkS (mkCl p n)
  • Output: The output is mathematically guaranteed to be "Marie Curie is a physicist" (English) or "Marie Curie est une physicienne" (French). There is zero "temperature" or randomness in the rendering pipeline.
3. The Editing Workflow (Intent vs. Syntax)
You mentioned editing functions is cumbersome. In v2.1, we implemented Semantic Frames (e.g., BioFrame).
  • Editors do not touch the raw Ninai constructors or GF code. They edit a flat JSON object (or a UI form representing it).
  • The Ninai Adapter handles the recursive complexity automatically.
  • If an error exists, you don't "fix the translation" (as in MT); you fix the source data (Wikidata QID) or the underlying grammar rule once, and it fixes it for all future generations.
In short: I use AI to build the factory (the grammar), but the assembly line (the engine) is rigorous code. This ensures the 100% verifiability required for an encyclopedia, which statistical MT cannot guarantee.
Réjean McCormick (talk) 23:49, 3 January 2026 (UTC)
Re 2: if you're not proposing that abstract WP articles be locked from editing, the issue remains; additionally, editing highly abstract grammatical rules is even more complicated for end-users than editing the function that generates the sentence. Maybe you have something to say to address that, but again, the thread topic is not your tool, and it also remains to be seen how this works in practice, which is what I'm interested in more than theoretical things. My main issue here is that a thread about another proposal is now cluttered with walls of text about another idea; maybe it would be best to collapse all this – the thread is about something else, and you have enough threads below to discuss your proposal. Re 1, and relating to the prior point: the errors the AI system can make – whether called hallucinations or not – could also be in the grammar rules etc. (what you call the factory).
Lastly, "you don't 'fix the translation' (as in MT); you fix […] or the underlying grammar rule once" – this is quite similar to the key part of my proposal: its error-correction system is about correcting flaws (and suboptimal text) across articles at scale, quite similar to editing grammar rules. Prototyperspective (talk) 19:14, 5 January 2026 (UTC)
I answered in my thread. But, quickly said, as a conclusion to my intervention here:
Regarding complexity: You are right, editing grammar rules manually is too hard. That is exactly why the system includes a Frontend UI (Next.js). This interface allows users to edit simple Data (facts/intent), while the AI Agent writes the complex Grammar code in the background. The user drives via the dashboard; they never touch the engine code. Also, unlike MT which fails silently, if the AI writes bad code, the compiler rejects it immediately (safety by design). Réjean McCormick (talk) 20:35, 5 January 2026 (UTC)

Planned Launch Date

What is the planned launch date for Abstract Wikipedia as its own wiki? From my point of view it is possible to create a wiki before the whole software for generating text is ready. I prefer having discussions about a new Wikimedia project at its own wiki once acceptance and a naming decision exist, which is the case here. As I wrote before, I really like boilerplate templates, so I think the more advanced way of generating sentences may not be used that much at first, and it is possible to start without it. So far I still see challenges in getting people involved: it seems too complicated to contribute, or hard to see the advantages of doing so. To get more people involved, proposals are welcome, and I hope other people will propose ways to make contributing easier. As I like spreadsheet functions, it helps me when I can contribute using spreadsheet functions. I wish that a wiki of its own for Abstract Wikipedia will be created very soon. What do you think about this? Hogü-456 (talk) 22:23, 7 January 2026 (UTC)

Hi @Hogü-456, we still don't have a planned date for Abstract Wikipedia. We are still working on that, but the community will be informed first here about the date, once we have it. Sannita (WMF) (talk) 14:49, 12 January 2026 (UTC)
What do you think about setting up a wiki early, before the software for generating texts is ready? This could help bring the discussions to one place. At the moment I do not know of a place where I can discuss topics related to Abstract Wikipedia and reach at least some people. Hogü-456 (talk) 18:35, 12 January 2026 (UTC)
Hi, I'm definitely the person who is the furthest along in Abstract Wiki development. You can verify this; it is not bragging, it's a fact I'm stating in the most polite way I can imagine!
So, although the question is not addressed to me, since I've been given no credibility or leadership so far, let me tell you candidly:
The Abstract Wiki project is over 99% complete, exceeding hopes in many ways. Since I can't seem to rally people around it (I must admit I have no skill in this domain), next month I might be able to finish it 100% and close it here.
I don't know the address I'll use.
So, anyone, you are welcome to validate how what I built solves the Semantic Web. You are welcome to doubt me; please stay relevant and I'll answer politely. If you want to start a big discussion about it, please start another thread so as not to hijack this one.
So, @Hogü-456, although I'm not associated with Wikimedia in any way, Architect is located here: https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Tools/abstract-wiki-architect
The other tools for the Semantic Web (SenTient, the Kristal file format, and Konnaxion) are on my GitHub. In any case, to simply answer your question: although I'm pushed aside in this community, the latest discussions and most advanced topics on Abstract Wiki actually happen under the Architect name, even though it does feel lonely there! We can all excuse Wikimedia for having ignored me; I came all happy with my achievements, and it might have frightened them. Réjean McCormick (talk) 14:16, 16 January 2026 (UTC)
@Réjean McCormick I think what is needed for Abstract Wikipedia is a wiki, already in the beginning, before the software is ready. My thoughts about your tool will be given in another section. What I want to discuss here is whether it is possible to soon start a new wiki, available at its own domain and hosted by the Wikimedia Foundation. Hogü-456 (talk) 19:39, 19 January 2026 (UTC)
I think leaders should step up and consolidate efforts. I do have a very comprehensive wiki standing. A gentleman pointed me toward Udiron and Ninai, which I included in my solution. But networking and consolidation efforts have failed so far. There's a need for validation of solutions and progress. Réjean McCormick (talk) 23:04, 19 January 2026 (UTC)

Request for Architectural Validation: The "Hybrid" Solution to the LLM/Constructor Deadlock

@Denny @ATsay-WMF

I am creating a new topic to ensure this technical proposal is not buried in the previous thread during the holiday break.

The Context: In the discussion above, Denny correctly identified that pure LLMs fail on low-resource languages due to hallucinations. However, the alternative—manual constructor writing—does not scale.

The Solution (Abstract Wiki Architect v2.1): I have spent the last month of intense development (backed by a year of specialized training) building a working Neuro-Symbolic Engine that resolves this deadlock.

  • It uses AI Agents to automate the coding (solving the speed issue).
  • It uses Grammatical Framework (GF) to enforce strict morphology (solving the hallucination issue).
  • It implements Weighted Topology (Udiron) to correctly linearize Tier 3 languages like Zulu.

My Request: I am a developer with limited resources who has dedicated significant personal time to solving this engineering hurdle for the community. I am not just offering code; I am asking for a professional review.

I specifically ask that you validate this architecture. Does a Hybrid Neuro-Symbolic engine running Ninai protocols align with the Foundation's roadmap?

If there is no interest, or no will to validate my work, I will have to accept that the Foundation is not the right home for this technology. I will then look to apply this knowledge in the private sector to recover my investment, though my strong preference remains to donate this solution to Abstract Wikipedia.

I await your technical feedback. Réjean McCormick (talk) 16:06, 1 January 2026 (UTC)

Technical Addendum: Architecture & Capabilities (v2.1)
To facilitate the validation process, here is the specific technical breakdown of the Abstract Wiki Architect engine and why it solves the current roadmap blockers.
1. The "Hybrid Factory" Architecture
The engine does not rely on a single method. It uses a Four-Layer Hexagonal Architecture to separate Logic, Data, and Presentation:
  • Layer A (Lexicon): Usage-based sharding mapped to Wikidata QIDs. It supports lazy-loading of massive dictionaries (380k+ words) to solve the "Cold Start" problem (see the sketch after this list).
  • Layer B (Grammar Matrix): This is the core innovation. It uses a Dual-Tier Strategy:
    • Tier 1 (High Resource): Uses the official GF Resource Grammar Library (RGL) for languages like English/French to guarantee perfect morphology (cases, genders).
    • Tier 3 (Low Resource): Uses Weighted Topology (adapted from the Udiron project) to automate linearization for languages like Zulu or Hausa without needing handcrafted grammars.
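To illustrate Layer A, here is a minimal sketch of a lazily loaded, sharded lexicon; the on-disk layout (JSON shard files bucketed by numeric QID) is an assumption for illustration, not Architect's actual scheme.
Python
# Hypothetical usage-based lexicon sharding: entries live in shard files
# bucketed by QID and are loaded only on first access, so a 380k-word
# dictionary never has to be fully resident in memory ("Cold Start" fix).
import json
from pathlib import Path

class ShardedLexicon:
    def __init__(self, shard_dir: Path, shard_size: int = 10_000):
        self.shard_dir = shard_dir
        self.shard_size = shard_size
        self._cache: dict[str, dict] = {}     # shard file name -> entries

    def _shard_for(self, qid: str) -> str:
        # e.g. Q169470 -> "shard_16.json"; the bucketing scheme is an assumption
        return f"shard_{int(qid.lstrip('Q')) // self.shard_size}.json"

    def lookup(self, qid: str) -> dict:
        shard = self._shard_for(qid)
        if shard not in self._cache:          # lazy load on first touch
            self._cache[shard] = json.loads((self.shard_dir / shard).read_text())
        return self._cache[shard][qid]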
2. Native Protocol Support (Ninai & Z-Objects)
Unlike generic LLM wrappers, this engine is built specifically for the Abstract Wikipedia ecosystem:
  • The Ninai Bridge: I implemented a Recursive JSON Object Walker (not brittle regex) that natively parses ninai.constructors.* trees. It translates Z-Object intent directly into Abstract Syntax Trees.
  • Construction-Time Tagging: Since the engine builds the sentence rather than just predicting tokens, it automatically outputs Universal Dependencies (CoNLL-U) tags. This allows us to mathematically verify the output against standard Treebanks, solving the evaluation crisis.
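A minimal sketch of construction-time tagging: because the sentence is assembled from a known tree, each token's lemma, part of speech, head and relation are available for free. The token analysis for "Marie Curie is a physicist." is written by hand here for illustration; in the described engine these fields would come from the syntax tree itself.
Python
# Emit standard 10-column CoNLL-U rows for a constructed sentence.
# Token facts are hand-written here; the engine would read them off its tree.

TOKENS = [
    # (form, lemma, upos, head, deprel)
    ("Marie",     "Marie",     "PROPN", 5, "nsubj"),
    ("Curie",     "Curie",     "PROPN", 1, "flat"),
    ("is",        "be",        "AUX",   5, "cop"),
    ("a",         "a",         "DET",   5, "det"),
    ("physicist", "physicist", "NOUN",  0, "root"),
]

def to_conllu(tokens) -> str:
    # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    return "\n".join(
        f"{i}\t{form}\t{lemma}\t{upos}\t_\t_\t{head}\t{deprel}\t_\t_"
        for i, (form, lemma, upos, head, deprel) in enumerate(tokens, start=1)
    )

print(to_conllu(TOKENS))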
3. The "Self-Healing" Agentic Pipeline
To solve the "speed vs. accuracy" trade-off, I deployed three specialized AI Agents:
  • The Architect: Generates the raw .gf source code for new languages from scratch.
  • The Surgeon: Reads compiler error logs and automatically patches broken grammar files in a loop.
  • The Judge: Performs QA by comparing generated output against a "Gold Standard" dataset and auto-files issues for regression.
Why this is powerful:
This architecture allows us to onboard a new language in minutes (via the Architect Agent) while maintaining the mathematical guarantees of the Grammatical Framework. It creates a "Hallucination Firewall"—if the AI generates invalid logic, the GF compiler rejects it before it reaches the user.
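A sketch of that firewall loop, assuming placeholder llm_generate/llm_patch hooks for the Architect and Surgeon agents; the exact gf command-line flags vary by GF version and are an assumption here.
Python
# "Hallucination Firewall" as a loop: generated grammar code must pass the
# GF compiler before it can ever reach the runtime. llm_generate/llm_patch
# are placeholders; the gf invocation below is an assumption.
import pathlib
import subprocess
import tempfile

def compile_gf(source: str) -> tuple[bool, str]:
    """Try to compile a candidate .gf file; return (ok, compiler log)."""
    with tempfile.TemporaryDirectory() as d:
        path = pathlib.Path(d) / "Candidate.gf"
        path.write_text(source)
        proc = subprocess.run(["gf", "-make", str(path)],
                              capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

def build_grammar(spec: str, llm_generate, llm_patch, max_rounds: int = 5) -> str:
    source = llm_generate(spec)          # The Architect drafts the grammar
    for _ in range(max_rounds):
        ok, log = compile_gf(source)
        if ok:
            return source                # only code that compiles is shipped
        source = llm_patch(source, log)  # The Surgeon reads the error log
    raise RuntimeError("Grammar never compiled; nothing reaches the user.")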
I am ready to demo this pipeline or walk the engineering team through the codebase. Réjean McCormick (talk) 16:09, 1 January 2026 (UTC)
So Architect is built with AI, and uses AI for these tasks:
Fixing the Grammatical Framework data, which is still in draft (some grammars are perfect, but some are under development)
Generating reference "good answers" so Architect can be tested and validated by comparison, though this is only for the testing phase
Otherwise, I did my best to gather every resource into one comprehensive system, Architect. There, efforts from Grammatical Framework, Udiron, and Ninai have been orchestrated together.
I'm resolving the remaining bugs very fast, always seeking the optimal architecture and best practices. It's not patched together; it's straight. It's fairly beyond the scale of what could have been expected, but here it is: Architect.
I'm not a linguist or a coder, but a designer, a debugger, and some other things on the side.
I can't wait to show the result. You'll hear about it, for sure ;) Réjean McCormick (talk) 00:05, 4 January 2026 (UTC)
Architect doesn't use AI to generate text. It's deterministic. Grammatical Framework grammars are grammars turned into code; Architect uses them. There's quite a lot in Architect, and it's new for me also, so I'm getting the hang of navigating it. Réjean McCormick (talk) 02:37, 4 January 2026 (UTC)
Hi Prototyperspective, I appreciate the detailed critique. I will collapse the previous technical details to keep the visual flow of your thread clean, but the points raised there are essential to this engineering challenge.
To directly address your two remaining concerns regarding the Architect solution:
1. On User Complexity (Editing Rules): There is a misunderstanding of the workflow. In the Architect model, the end-user never edits the "super abstract grammatical rules." The User edits the Data/Intent (simple JSON or a visual form). The Agent (Architect) maintains the Grammar (the complex GF code). The user is the driver; the AI is the mechanic. This shields the user entirely from the complexity you are worried about.
2. On AI Errors (The "Factory" Risk): You are right that AI can make errors in grammar generation. However, this is exactly why the Compiler (C-Runtime) exists. If an MT model hallucinates, it produces plausible but false text (hard to detect). If the Architect Agent hallucinates bad grammar, it produces invalid code, which the compiler rejects immediately (easy to detect). The system fails safely at build time, rather than failing silently at read time.
We definitely agree on the core philosophy: "Fix once, scale everywhere." The Architect simply ensures that what we are scaling is mathematically verifiable structure, not probabilistic translation.
Réjean McCormick (talk) 20:25, 5 January 2026 (UTC)
I suggest you rather take the idea to the private sector. The chances that it will be used here for Abstract Wikipedia, at least in the near- and mid-term future, are low from my point of view. It seems to me that it contains a lot of buzzwords, and I have not yet understood how it is possible to find the grammar rules, especially in languages with a small corpus of available digitized texts. Validation of the output is, from my point of view, very important, and it is necessary to have a human who understands the generated code and validates the output. Using an agent for generation is possible if the result is validated. For the Wikimedia project Abstract Wikipedia it is, from my point of view, very important to have a technology for generating text that is easy to understand. If the rules are too complex, it can be difficult to have enough people who understand them. For at least the next three years, I think content in Abstract Wikipedia will consist of simple sentences, so there is no need for a complex technology to generate it. Hogü-456 (talk) 19:53, 19 January 2026 (UTC)
If I have any conclusion or opinion on this Architect development, it is that "I suggest you to better go to the private sector with the idea" is a very bad idea, and that Wikimedia contributors interested in making Abstract Wikipedia a success should rather look into it and not ignore the work and input of this user, which seems to address a few of the core weaknesses of the AW approach. I'm personally not so invested in AW being successful, since I think a viable alternative approach exists that I'm interested in. However, again, without clear, simple real-world demonstrations, it's not likely to get much feedback. For example, if a "technology for generating the text that is easy to understand" can't achieve lots of comprehensive articles, then this proposed criterion is simply unfit if AW is to become a success in terms of usefulness and use.
Basically, I would rather conclude that maybe this is something for a non-Wikimedia free-knowledge-related nonprofit organization (particularly one that is more interested in and open to innovation, development and use of technologies for good), and/or that it needs more development so that concrete illustrative demonstrations become possible. Prototyperspective (talk) 22:31, 19 January 2026 (UTC)
Well, this isn't buzzwords; this is specific language for a complex architecture.
I'm a bit stubborn, so I still wish to make it a shared resource. But it seems I can't find motivation, or something, here. I'll keep it common, but as you pointed out, here might not be the place, even though the call for it comes from here! Anyway, I don't rely on the community here; I still haven't been able to really discuss with those who made the call, even after my huge contribution. So, I won't go to the private sector, but I might just finish it on my side and then share it elsewhere. In the end, I only stumbled in here because I understood the power of semantics. I did not develop Architect following the Abstract Wiki call; it simply intersects with what I do. I hoped to find a community to develop with and merge efforts, but I guess I don't quite get it. Anyway, I did read your invitation to go away.
Still, it looks like you are waiting for me to say "It all works, job done", instead of finishing building it as a team. *sigh*
Response to concerns (buzzwords / low-resource grammars / validation / “simple sentences first”)
  • This is not “grammar discovery from corpora”.
    • Nothing here depends on “learning grammar rules from large text corpora”.
    • The core is rule-based generation: given a structured input (a frame / function call), a grammar linearizes it into text.
    • For low-resource languages, the *minimum viable mode* is explicitly simple: small slot-templates (e.g. Bio-like “X is a Y from Z”) plus a tiny lexicon. No big corpus required.
  • Low-resource languages: start with explicit defaults, then improve incrementally.
    • The initial goal is not perfect linguistic coverage; it is usable, predictable starter sentences.
    • “Tier-3 / safe-mode” is intentionally boring: deterministic word-order defaults + slot filling, so communities can get correct/simple outputs early, then refine grammar/lexicon over time.
  • Validation and human understanding are mandatory, not optional.
    • Generated text is only useful if it is reviewable. The approach is the same as normal software:
      • human-readable source rules (grammar code + lexicon files)
      • regression tests / “gold outputs” that show diffs when anything changes
      • human review before enabling broader use
    • If any “agent” is used, it is only to scaffold boilerplate faster; it does not remove the need for review. Output remains explicit code and testable results.
  • Complexity control: keep things easy-to-understand by design.
    • The public interface stays simple (frames with named fields). Contributors can work at the “template” level without understanding the whole grammar stack.
    • The internal grammar stack is layered so newcomers can contribute safely:
      • add missing lexicon entries (names/terms)
      • add/adjust one simple template realization
      • expand to richer grammar rules only when there is community capacity
  • Near-term scope (next ~3 years): yes, simple sentences first.
    • I agree that early Abstract Wikipedia will mostly be simple declarative sentences.
    • That is exactly what the current frame/template approach targets: starter sentences that are easy to generate, easy to debug, and easy to review.
    • Longer-term “full articles” is a separate orchestration problem (planning + ordering + grouping many small sentences). That can be added later as an *optional layer*, without making the sentence generator itself complex.
  • About “private sector”: this is precisely the type of tooling Wikimedia needs to keep control.
    • If we want AW to be sustainable, the generation logic must remain transparent, editable, and testable by the community.
    • Keeping the rules in plain files + tests + review workflows is the opposite of a black-box vendor dependency.
  • Why demos matter (and what “success” looks like here).
    • The realistic measure of progress is not big promises, but small repeatable demos:
      • generate 1–3 correct starter sentences for many entities and languages
      • show deterministic rebuilds (same input → same output)
      • show that regressions are caught by tests (a sketch of such a check follows this list)
    • This is the “boring but scalable” path: small sentences now, richer structures later.
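A minimal sketch of that regression check, assuming a hypothetical gold_outputs.json file and a deterministic render(frame, lang) function; because rendering is deterministic, any diff is a real regression, not noise.
Python
# Gold-output regression check: regenerate every stored case and diff it
# against the expected sentence. File name, file layout and the imported
# renderer are illustrative assumptions.
import json
import sys

def check_gold(render, gold_path: str) -> int:
    """Return the number of regressions against the gold file."""
    failures = 0
    with open(gold_path) as f:
        cases = json.load(f)   # [{"frame": {...}, "lang": "en", "expected": "..."}]
    for case in cases:
        got = render(case["frame"], case["lang"])
        if got != case["expected"]:
            failures += 1
            print(f"REGRESSION [{case['lang']}]: "
                  f"expected {case['expected']!r}, got {got!r}")
    return failures

if __name__ == "__main__":
    from renderer import render          # hypothetical deterministic renderer
    sys.exit(1 if check_gold(render, "gold_outputs.json") else 0)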
Réjean McCormick (talk) 15:46, 21 January 2026 (UTC)
So, just to be clear, you are suggesting I go with Musk and help him build his private encyclopedia? @Hogü-456 @Prototyperspective
About buzzwords: these are not texts I composed; they're made with AI. The AI doesn't throw in words just to make a show! And I'm not telling the AI what to say; I ask the AI (ChatGPT or Gemini) to analyse and answer. So you can fairly assume there's no bluff and no empty words. Réjean McCormick (talk) 18:34, 21 January 2026 (UTC)
@Réjean McCormick Hello and thanks for reaching out. We discussed your proposal at length in the last couple of weeks internally, and we would like to ask you to give us a demonstration of your project, to ask you more questions about it. Are you ok with that? If yes, I'll contact you privately with a number of potential dates and times to have the demo. Thank you in advance! Sannita (WMF) (talk) 15:03, 20 January 2026 (UTC)
I'm good with it! Please first contact me by email, so we can settle on the most suitable communication methods. I suggest I provide this: screenshots of the interfaces and an updated status (what exactly is built, and what remains to adjust, fix, or build). Rest assured, it's so advanced and so well built that from this new milestone we can foresee the end result.
I'm very proactive, so you might send me questions right now, privately or publicly on the Architect wiki, so you can get accurate information about Architect and my activities. I understand my other activities can raise concerns. Rest assured that I'm fully willing to adapt to the Wikimedia community framework, given minimal adaptation time, so your scope is respected.
Also, note that given the power of the machine I'm building (did you get the overview?), I'm slowing down development because I believe ethical safeguards are relevant at this point, as well as a coordination structure (human networking). This is one reason I'm happy you are answering my call, so we can assess the situation and move forward with renewed knowledge, safely for all.
So, I can provide reports with technical details, or stay more at the conceptual level for the non-techies. Well, I can even make a song about it, to convey the feeling. Let's go! :D Réjean McCormick (talk) 16:09, 20 January 2026 (UTC)
Hi Denny / Hi everyone,
I wanted to share a brief update regarding the progress of Architect.
The prototypes have shown very promising results regarding deterministic content generation and "Safety-by-Design" validation—key challenges I am working to solve for the ecosystem.
To ensure this momentum continues without relying on immediate internal bandwidth, I am moving to professionalize the development of these tools.
I will be submitting funding proposals to both external sources (such as NLnet) and the relevant Wikimedia technical grant channels.
My goal is to secure the resources needed to maintain this infrastructure as a reliable open-source utility for the community.
I will continue providing updates on this channel if I receive confirmation that the project's original call for a semantic engine is still active, and if I can find proper guidance within the community.
After two months of building and attempting to gather support—and having to discover these funding pathways independently—I need to verify that there is actual alignment here before investing further effort in communication.
Best regards,
Réjean McCormick
@Denny Réjean McCormick (talk) 14:51, 25 January 2026 (UTC)
Hello,
I haven't had any hand extended or any show of interest beyond a polite "we'll look into it".
So I guess it's OK to publish the figures, in the eventuality that some people are interested in finishing Abstract Wiki using Architect. Architect is well beyond the current state of Abstract Wiki.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Tools/abstract-wiki-architect
I hope the megabyte I took to store those figures on your server doesn't bother you; I can also just go away, as I've been invited to by a member of this community, on this page. Still, I guess some people might still be interested in moving Abstract Wiki toward a completion stage, so my invitation remains open.
You must understand the resistance: I've been moving really fast, so it looks like the leaders here are stunned or puzzled; I don't know, they don't speak to me. Architect and what surrounds it is definitely gigantic, so they need time to process it. I'm left hanging in the meanwhile, but I don't give up. I'm just leaving Architect as-is, so people still have the opportunity to build a part of it, a part of the history of the Semantic Web. There's not much left to do; hurry if you want your name on the machine ;)
@Sannita (WMF) Réjean McCormick (talk) 14:18, 28 January 2026 (UTC)

Boilerplate templates

As I wrote before, I think at the beginning most sentences in Abstract Wikipedia will be very simple. Such simple sentences can be generated using templates with positions for variables; as far as I know, such a thing is also called a boilerplate template. From my point of view it is important to have such an option available in Abstract Wikipedia at its launch. A good example of a collection of possible sentences is User:Dnshitobu/Dagbani_Fragments. I wish for a way that makes it easy to generate such sentences at the launch of Abstract Wikipedia. It is a monolingual approach that requires mapping Wikidata statements to boilerplate templates for a specific language. Is there so far a plan to offer such a solution for Abstract Wikipedia at its launch? From my point of view it may also be possible to derive more complex rules from the simple templates to generate more advanced sentences. Hogü-456 (talk) 20:02, 19 January 2026 (UTC)
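To illustrate the boilerplate idea above, here is a minimal sketch keyed by Wikidata property and language code. P106 (occupation) and P27 (country of citizenship) are real Wikidata properties, and the QIDs are taken from the examples further down this page; the templates and the label table are illustrative.
Python
# Monolingual boilerplate templates: each (property, language) pair maps to a
# fixed sentence with slots; labels come from Wikidata. Templates and the
# label table are illustrative.

TEMPLATES = {
    ("P106", "en"): "{subject} is a {object}.",              # occupation
    ("P27",  "en"): "{subject} is a citizen of {object}.",   # country of citizenship
}

def realize(statement: dict, labels: dict, lang: str) -> str | None:
    """Render one statement, or None if no template exists for the language."""
    template = TEMPLATES.get((statement["property"], lang))
    if template is None:
        return None    # no template: fall back silently instead of guessing
    return template.format(subject=labels[statement["subject"]],
                           object=labels[statement["object"]])

labels = {"Q7186": "Marie Curie", "Q169470": "physicist"}
print(realize({"subject": "Q7186", "property": "P106", "object": "Q169470"},
              labels, "en"))
# -> Marie Curie is a physicist.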

Architect already supports the “simple first sentence” approach using frame-based templates. A frame like BioFrame is essentially a boilerplate template with placeholders, and each language’s concrete grammar is the template that realizes it correctly in that language. This is designed to be accessible: people can generate these sentences through the UI, or by calling the REST endpoint (e.g. POST /api/v1/generate/{lang_code}) with a JSON frame. What’s still missing compared to your Dagbani fragments idea is a large, language-specific library that maps many Wikidata properties into many different sentence-frame templates automatically. Réjean McCormick (talk) 20:40, 19 January 2026 (UTC)
Can you please send me an example of such a frame, like BioFrame? Hogü-456 (talk) 20:47, 20 January 2026 (UTC)
Detailed Explanation
Here are the details and examples for BioFrame, the primary semantic structure used for generating biographical sentences.
1. What is the BioFrame?
The BioFrame is a strict, flat JSON object used in the "Strict Path" of the API. It represents the semantic intent to generate a biographical sentence (e.g., "Marie Curie is a French physicist"). In v2.1, it is defined as a Pydantic model in the Core Domain layer.
2. JSON Structure & Fields
The schema is strictly flat (no nested intent objects).
Field | Type | Required | Description
frame_type | Literal["bio"] | Yes | Discriminator field. Must be exactly "bio".
name | str | Yes | The subject's proper name (e.g., "Alan Turing").
profession | str | Yes | Lookup key for the Lexicon (e.g., "computer_scientist").
nationality | str | No | Lookup key for the Lexicon (e.g., "british").
gender | str | No | "m", "f", or "n" (critical for correct inflection).
context_id | UUID | No | Used by the Discourse Planner to link sentences (enables "She" vs. "Marie").
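Rendered as code, the table corresponds roughly to the following Pydantic model. The field names, types and requiredness follow the table above; tightening gender to a Literal and the optional defaults are assumptions.
Python
# The BioFrame schema from the table above as a Pydantic model. Fields follow
# the table; constraining gender to a Literal is an assumption (the table only
# lists the allowed values).
from typing import Literal, Optional
from uuid import UUID
from pydantic import BaseModel

class BioFrame(BaseModel):
    frame_type: Literal["bio"]                        # discriminator
    name: str                                         # e.g. "Alan Turing"
    profession: str                                   # lexicon lookup key
    nationality: Optional[str] = None                 # lexicon lookup key
    gender: Optional[Literal["m", "f", "n"]] = None   # drives inflection
    context_id: Optional[UUID] = None                 # links sentences

frame = BioFrame(frame_type="bio", name="Alan Turing",
                 profession="computer_scientist",
                 nationality="british", gender="m")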
3. Usage Examples
A. Standard Request (Perfect Data)
This generates a full sentence using the mkBioFull grammar function.
POST /api/v1/generate/en
JSON
{
  "frame_type": "bio",
  "name": "Alan Turing",
  "profession": "computer_scientist",
  "nationality": "british",
  "gender": "m"
}

Output: "Alan Turing is a British computer scientist."

B. Partial Request (Missing Nationality)
The v2.1 architecture handles missing data via Overloading. If nationality is omitted, the engine automatically selects mkBioProf instead of failing.
POST /api/v1/generate/fr
JSON
{
  "frame_type": "bio",
  "name": "Marie Curie",
  "profession": "physicist",
  "gender": "f"
}

Output: "Marie Curie est une physicienne." (Note the gender agreement)

C. Context-Aware Request (Pronominalization)
If a context_id is provided and matches a previous session where "Marie Curie" was the focus, the system replaces the name with a pronoun.
JSON
{
"frame_type": "bio",
"name": "Marie Curie",
"profession": "chemist",
"context_id": "session_123"
}

Output: "She is a chemist."

4. How it Maps to the Engine (The "Triangle of Doom")
To generate text, the BioFrame travels through three layers defined in the Schema Alignment Protocol:
  1. Input (API): The user sends the JSON BioFrame.
  2. Logic (Adapter): The GrammarEngine maps the JSON fields to Abstract Grammar functions (a dispatch sketch follows this list):
    • All fields present: Maps to mkBioFull : Entity -> Profession -> Nationality -> Statement
    • Missing Nationality: Maps to mkBioProf : Entity -> Profession -> Statement
  3. Render (GF):
    • Tier 1 (High Road): Uses RGL macros (e.g., mkS (mkCl s (mkVP n p))) for perfect grammar.
    • Tier 3 (Factory): Uses Weighted Topology (e.g., sorting Subject, Verb, Object weights) to glue strings together.
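A minimal sketch of the overload selection in step 2, using the function names from the mapping above; the dispatch code itself is an assumption.
Python
# Overload selection: pick the abstract grammar function according to which
# optional fields are present. Function names come from the mapping above;
# the dispatch logic is a sketch.

def select_abstract_function(frame: dict) -> tuple[str, list[str]]:
    if frame.get("nationality"):
        # mkBioFull : Entity -> Profession -> Nationality -> Statement
        return "mkBioFull", [frame["name"], frame["profession"],
                             frame["nationality"]]
    # mkBioProf : Entity -> Profession -> Statement
    return "mkBioProf", [frame["name"], frame["profession"]]

fn, args = select_abstract_function(
    {"frame_type": "bio", "name": "Marie Curie", "profession": "physicist"})
assert fn == "mkBioProf"      # missing nationality degrades, never fails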
5. Ninai Compatibility
While BioFrame is the internal format, the system also accepts Ninai (Abstract Wikipedia) trees and flattens them into a BioFrame automatically via the NinaiAdapter.
Ninai Input:
JSON
{
  "function": "ninai.constructors.Statement",
  "args": [
    { "function": "ninai.types.Bio" },
    { "function": "ninai.constructors.Entity", "args": ["Q7186"] },    // Marie Curie
    { "function": "ninai.constructors.Entity", "args": ["Q169470"] }   // Physicist
  ]
}
Internal Conversion:
The adapter extracts the QIDs, looks them up in the Lexicon (e.g., Q169470 -> "physicist"), and constructs the flat BioFrame.
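A minimal sketch of that flattening step: walk the tree recursively, collect the Entity QIDs, and resolve them against a lexicon. The in-memory lexicon here is an illustrative stand-in, and a real adapter would handle more frame types and arguments.
Python
# Flatten a Ninai constructor tree into a BioFrame: recursively collect QIDs
# from Entity nodes, then resolve them. The lexicon is an illustrative
# stand-in for the sharded lookup.

LEXICON = {"Q7186": "Marie Curie", "Q169470": "physicist"}

def collect_qids(node) -> list[str]:
    """Recursively gather QIDs from ninai.constructors.Entity nodes."""
    qids = []
    if isinstance(node, dict):
        if node.get("function") == "ninai.constructors.Entity":
            qids.extend(node.get("args", []))
        for arg in node.get("args", []):
            qids.extend(collect_qids(arg))
    return qids

def flatten_to_bioframe(tree: dict) -> dict:
    subject_qid, profession_qid = collect_qids(tree)[:2]
    return {"frame_type": "bio",
            "name": LEXICON[subject_qid],
            "profession": LEXICON[profession_qid]}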

Réjean McCormick (talk) 22:27, 20 January 2026 (UTC)

Regarding the last complexity layer, see:
Adding the last complexity layer with SwarmCraft
https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Tools/abstract-wiki-architect#Proposal:_SwarmCraft_as_an_Article-Orchestration_Layer_above_Abstract_Wiki_Architect Réjean McCormick (talk) 15:48, 21 January 2026 (UTC)

Integrating Abstract Wikipedia

There are discussions about integrating Abstract Wikipedia happening at Wikifunctions after a new status update was published. If you are interested, you can join the discussion and also write down your thoughts about the right place for it. From my point of view, discussions about Abstract Wikipedia should happen on Abstract Wikipedia, so I hope it will exist as its own wiki soon. Hogü-456 (talk) 16:07, 1 February 2026 (UTC)

Text to Statements

From my point of view there are use cases where expressing statements as a sentence can help in understanding what to express. The following form, created in part with the help of the AI model GPT-OSS 120B (available, for example, at Duck AI), enables adding statements to Wikidata using a sentence template to express the statement. The form creates a QuickStatements upload link, and it is available at my PAWS profile. From my point of view, collecting such sentences can be helpful for Wikidata and Abstract Wikipedia. What can be done in one direction, adding statements through sentences, can be done the other way around too: converting a statement pair into a sentence. If there are enough sentence templates, they can cover the most important facts available on Wikidata. From my point of view there should be a mapping of such statement pairs to their language-specific templates; then it is possible to display a specific statement pair in each available language. I hope Abstract Wikipedia will be available soon and that there will be enough activity. When I look at this page, it seems that the community interested in contributing is so far very small; at least the number of people taking part in the discussions is very small. Hogü-456 (talk) 22:53, 13 February 2026 (UTC)
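As an illustration of the sentence-to-statement direction described above, here is a minimal sketch with one hand-written English template for the occupation property (P106) and a stub label-to-QID lookup; a real version would resolve labels against Wikidata and wrap the result in a QuickStatements upload link.
Python
# Sentence-to-statement: match a sentence against a language-specific
# template, resolve labels to items, and emit a QuickStatements V1 command
# (TAB-separated item/property/value). Template and lookup are illustrative.
import re

TEMPLATE = re.compile(r"^(?P<subject>.+) is a (?P<object>.+)\.$")  # maps to P106
QIDS = {"Marie Curie": "Q7186", "physicist": "Q169470"}            # stub lookup

def sentence_to_quickstatements(sentence: str) -> str | None:
    m = TEMPLATE.match(sentence)
    if not m:
        return None
    subj, obj = QIDS.get(m["subject"]), QIDS.get(m["object"])
    if not (subj and obj):
        return None                   # unresolved label: do not guess
    return f"{subj}\tP106\t{obj}"

print(sentence_to_quickstatements("Marie Curie is a physicist."))
# -> Q7186    P106    Q169470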