Talk:Abstract Wikipedia

This page is for discussions related to the Abstract Wikipedia page.

  Archives: 1 2



Google's involvement[edit]

Moved from #Statement of opposition

This project is proposed by Google (see here). It is very shocking that Wikimedia is becoming a lobby organisation for this company, which stands for monopolisation and against data protection on the Internet. Habitator terrae (talk) 11:36, 9 May 2020 (UTC)

@Habitator terrae: I am sure it won't change your mind, but just to correct the facts: this is a proposal by me in my personal capacity, and it is not a proposal by my employer. Also, to bring receipts: I made a previous version of this proposal in 2013, before I was employed by Google. I have been working on this for many years. --denny (talk)
@Denny: This statement contradicts your paper: why did you write "Denny Vrandečić Google" there??? Habitator terrae (talk) 14:39, 9 May 2020 (UTC)
@Habitator terrae: Because that is my name and my employer. What else would I write on a research publication? --denny (talk) 14:56, 9 May 2020 (UTC)
OK, I think I get it: I make a distinction between an official Google proposal and a proposal by someone employed by Google. You seem not to make that distinction. I guess that's the point where we can agree to disagree here. Sounds right? --denny (talk) 15:04, 9 May 2020 (UTC)
The point is that this isn't only your personal idea. This paper was obviously written and published in your role as an employee of Google. This isn't only your hobby. It's your work. Right? Habitator terrae (talk) 15:27, 9 May 2020 (UTC)
@Habitator terrae: I don't understand what you mean by "this isn't only my personal idea". I want to make sure we have a common understanding of that before it leads to more misunderstandings, particularly since you say that this is the point, and I don't get it yet. I would appreciate some clarification on this.
And yes, research is my work, it is not just a hobby of mine, and publishing research results is an integral part of research and thus of my work as a researcher. --denny (talk) 15:47, 9 May 2020 (UTC)
To clarify: you were paid by Google to research this topic. Because of this research, you propose this project in your role as a researcher at Google. Therefore it is obvious that this is a proposal "paid for" by Google and not only a result of your personal intention. In conclusion, the proposal is from Google. Habitator terrae (talk) 21:00, 9 May 2020 (UTC)
Denny has claimed the exact opposite just a few lines above (“this is a proposal by me in my personal capacity, and it is not a proposal by my employer”), and I don’t see any reason not to trust him. This is a really bizarre conversation here. —MisterSynergy (talk) 22:48, 9 May 2020 (UTC)
The reason is that his whole role at Google is about Wikimedia. I see no reason to believe that this had no influence on this proposal. Habitator terrae (talk) 22:26, 10 May 2020 (UTC)
@Habitator terrae: Yes. As I described in that mail you link, my current role is to 'facilitate the improvement of Wikimedia content by the Wikimedia communities', 'improving the coverage and quality of the content, and about pushing the projects closer towards letting everyone share in the sum of all knowledge.' I think with this proposal I am doing exactly that. --denny (talk) 00:50, 11 May 2020 (UTC)
So we could conclude, developing the concept proposed here was part of your work at Google. Habitator terrae (talk) 15:14, 11 May 2020 (UTC)
@Habitator terrae: If it were a proposal from Google, that would be better, because Google would put some effort into it. Not telling about the support of a multimillion corporation would not be very clever, if such support were implied. So, as I see it, you are just being unethical toward Denny, in effect calling his words a lie. Carn (talk) 06:43, 18 May 2020 (UTC)
@Carn: Two questions:
  1. Why did you ping me, if you already see I'm "just being unethical"?
  2. Where was I "in fact calling his words a lie"? My whole argumentation is based upon his words.
"if in was propousal from Google, it would be better, becouse Google would put some effort into it."
Google uses Wikipedia-Information for example in en:Knowledge Graph. The Community of this Wiki would transfer the information in a much more useful form for Google. I guess it is possible such Community would be much more effective, compared to bots. Further with your argumentation there would be no justifications for projects like mw:Content translation/Machine Translation/Google Translate, in the wake which Wikimedia called Wikimedia and Google partners.
Habitator terrae (talk) 07:59, 18 May 2020 (UTC)
These words of yours - Therefore it is obvious that this is a proposal "paid for" by Google - are an implication that Denny lied to you. Sorry for pinging you, but this project is not a main one for most people; I myself prefer to know when someone is talking to me. Thank you for admitting that Google and Wikimedia are helpful to each other. Carn (talk) 08:07, 18 May 2020 (UTC)
Denny already stated what you call "the implication that [he] lied to [me]", only in other words:
"As I described in that mail you link, my current role [paid by Google] is to 'facilitate the improvement of Wikimedia content by the Wikimedia communities', 'improving the coverage and quality of the content, and about pushing the projects closer towards letting everyone share in the sum of all knowledge.' I think with this proposal I am doing exactly that [what I'm paid for]."
Further, many actions by Google fight the mission of a free Internet: monopolizing the use of knowledge, collecting personal data from every person possible, or sorting people into filter bubbles with closed algorithms. Therefore it is contradictory to Wikimedia's mission to be a partner of Google.
Habitator terrae (talk) 13:04, 18 May 2020 (UTC)
I perceived these words as saying that he should do socially useful work for 20% of his time at Google, and that he himself decided that this proposal was such socially useful work - not that his boss said, "We have a plan to conquer the Internet, you are responsible for sub-clause C of this plan, start work on lobbying our interests on Wikipedia!" That is the impression created by what you say about the situation.
If Wikipedia had set itself against corporations, we would forbid the commercial use of the created content, which we do not, so it seems to me your tilting at windmills is not shared by the Wikipedia community. Carn (talk) 08:10, 19 May 2020 (UTC)
Hello Carn, first I must correct some of your statements:
  1. "I perceived this words as saying that a he should do socially useful work for 20% of his time in Google."
    No, the 20% Project (as linked by Denny) is time of work, for example AdSense for advertisement (not social) was created with this 20%.
  2. "And he himself decided that this proposal was such a socially useful [=20%, as stated before this is incorrect] work".
    Yes, but now it is his fulltime job, decided by himself and Google, see statement here.
  3. "and not his boss said - ' We have a plan to conquer the Internet, you are responsible for sub-clause C of this plan, start work on lobbying our interests on Wikipedia! ' - this impression is created from what you say about the situation."
    Sorry, the intended expression was: Google already "conquered" the Internet.
In my view, Wikipedia should side with corporations that do not fight against the freedom of the Internet. This isn't a question of commercial or not.
The copyleft licenses of Wikipedia also express its will that the freedom of knowledge should be protected, even when it is used commercially.
Habitator terrae (talk) 17:20, 27 May 2020 (UTC)
Denny: I think you need to clarify whether you are being paid by Google to develop Wikilambda and Abstract Wikipedia. What you're saying gives the impression that you're trying to muddy the waters. On the one hand you're trying to distinguish "an official Google proposal and a proposal by someone employed by Google" to give the impression that you're working on it outside your Google working hours in your Wikimedia volunteer capacity, but on the other hand you stated your affiliation as Google on your paper (why not use, say, "Wikidata" or "Croatian Wikipedia" if you were doing it as a volunteer? I certainly make that distinction when I make my Wikimania submissions depending on whether my university or a WM affiliate is paying for my attendance that year) and you said your role at Google is to improve connections with Wikipedia and Wikidata. We need to know whether Google will be contributing your working hours to Wikilambda (in which case there will be resource negotiations with Google regarding hiring additional staff developers) or not (in which case Denny will not be able to manage all of it himself). Deryck C. 13:40, 27 May 2020 (UTC)
I'll try to clarify. Please keep asking if I don't clarify sufficiently.
I worked on this mostly in my 20% time. Googlers are being paid while working on their 20% time. I published the paper with my Google affiliation because I was paid by Google while working on the paper.
The project is not an official Google project / product. There are many publications and open source projects written by Googlers with this status. Check out this search. An official Google product has organized company support, usually with a team, etc. This project doesn't.
I hope this makes the situation clearer, and doesn't muddy the waters. --denny (talk) 17:12, 27 May 2020 (UTC)

Denny: Thank you, your explanation about your current position with this project is very clear. I think the next question that the Wikimedia community ought to know is: How much support should Wikilambda expect from Google if this goes ahead? Will we get some of Denny's 80% from Google to work on Wikilambda? How many Google employees in addition to Denny should we expect to have on the project if WMF and the WM community approve this? Deryck C. 22:02, 27 May 2020 (UTC)

@Deryck Chan: Sorry for not answering this question earlier, but I guess now you know why :) For the record: I left Google and joined the Wikimedia Foundation in order to lead the development of this project. --DVrandecic (WMF) (talk) 03:33, 15 July 2020 (UTC)
@DVrandecic (WMF): Will Google continue to contribute to Abstract Wikipedia in any way? Deryck C. 11:09, 15 July 2020 (UTC)
@Deryck Chan: I honestly don't know. --DVrandecic (WMF) (talk) 17:23, 15 July 2020 (UTC)

Distillation of existing content[edit]

I wonder whether we have hold of the right end of the stick. From the little I have so far read of this project, the big idea seems to be to help turn data into information. The goal, in practice, seems to be a satisfactory natural language presentation of previously defined subjects and predicates (content). Or, rather, the goal is to develop the capability to present content in any natural language. That, I say, is a worthy ambition.

As a contributor who mainly edits articles written by others, I feel the need to look at this from the opposite direction. Missing content is, of course, a problem. But so are inconsistent content, misleading content and confusing content. For me, an "abstract wikipedia" distils the content of an article, identifying the subjects, their qualities and the more or less subtle inter-relationships between them. That is, I imagine we begin with natural language content and automatically analyse it into an abstract form. I believe this is one way that non-technical contributors can really help us move towards our goal. If the person editing an article can see a "gloss" of the article in its target language, it is easy to imagine how they might adjust the original text so as to nudge the automated abstract towards a more accurate result. At the same time, the editor could be providing hints for more natural "rendering" (or re-rendering) of the abstract content into the natural language. In practice, this is what we already do when we provide alternative text in a link.

In my view, this sort of dialogue between editor and machine abstraction will help both. The editor gets objective feedback about what is ambiguous or unclear. I imagine some would prefer to "explain" to the machine how it should interpret the language ("give hints") while others might prefer to change the language so that it is more easily interpreted by the machine. Either way, the editor and the machine converge, just as collaborative editors already do, over time.

The crucial point here, I suppose, is that the re-rendering of abstracted content can be more reliably assessed at the editing stage. To the editor, it is just another tool, like "Show preview" or "Show changes" (or the abstract results appear with the text when those tools are used). Giving hints becomes as natural to the editor as fixing redlinks; how natural taking hints might become, time alone can tell.

Congratulations on getting this exciting project started.--GrounderUK (talk) 01:21, 5 July 2020 (UTC)

tldr: Once the content becomes abstract, it will be a technical burden for the community to contribute.
That is why the content should be maintained as part of the wikipedia editing process, with the editor only (and optionally) guiding the way the content is mapped back into Wikidata or verifying that it would be rendered correctly back into the source language WikiText (which is still fully concrete and always remains so).--GrounderUK (talk) 22:37, 5 July 2020 (UTC)

I do not think automated abstraction is a good thing to do - it means we would introduce a machine translation system that generates an interlingua with unclear semantics and cannot easily be reused. I am also very skeptical about making all articles fully abstract.--GZWDer (talk) 21:22, 5 July 2020 (UTC) [signature duplicated from new topic added below]

I would be sceptical too. In fact, I would strongly oppose any such proposal. My suggestion (in case you have misunderstood me) is for a human-readable preview of the machine's interpretation of the natural language WikiText.--GrounderUK (talk) 22:37, 5 July 2020 (UTC)
@GrounderUK: Yes! Thank you for the congratulations, and yes, I agree with the process you describe. The figure on page 6 sketches a UI for that: the contributor enters natural language text, the system tries to automatically guess the abstraction, and at the same time displays the result in the different languages the contributor chooses. The automatic guess in the middle can be directly modified, and the result of the modification is displayed immediately.
I also very much like how you describe the process of contributors fixing up ambiguities (giving hints). I also hope for such a workflow, where failed renderings get into a queue and allow contributors to go through them and give the appropriate hints, filtered by language.
But in my view nothing of this happens fully automated, it always involves the human in the middle. We don't try to automatically ingest existing articles, but rather let the contributors go and slowly build and grow the content.
Thank you for the very inspiring description of the process, I really enjoyed reading it. --DVrandecic (WMF) (talk) 05:15, 14 July 2020 (UTC)
@DVrandecic (WMF): You are very welcome, of course! Thank you for the link to your paper. You say, "The content has to be editable in any of the supported languages. Note that this does not mean that we need a parser that can read arbitrary input in any language." I think we agree. I would go further, however, and suggest that we do need a parser for every language for which there is a renderer. I invite you to go further still and view a parser as the precise inverse function of a renderer. If that sounds silly, it may be because we think of rendering as a "lossy" conversion. But the lesson from this should be that rendering (per se) should be constrained to be "lossless", meaning neither more nor less than that its output can be the input to the inverse function (the "parser"), returning as output the exact original input to the renderer. Such an approach implies that there will be subsequent lossy conversion required to achieve the required end result, but we need to think about ways in which the "final losses" can be retained (within comments, for example) so that even the end result can be parsed reliably back into its pre-rendered form. More importantly, an editor can modify the end result with a reasonable expectation that the revised version can be parsed back into what (in the editor's opinion) the pre-rendered form should have been.
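To make the round-trip constraint concrete, here is a minimal Python sketch of the idea (the function names and toy data layout are invented for illustration, not a proposed implementation): render() keeps every distinction it is given inside its output, so parse() can be its exact inverse, and any lossy step happens only afterwards.

    def render(content: dict) -> str:
        # Render an abstract link as wikitext, retaining the link target
        # (information that a display-only rendering would lose).
        return "[[{target}|{text}]]".format(**content)

    def parse(wikitext: str) -> dict:
        # Exact inverse of render(): recover the original abstract content.
        target, text = wikitext.strip("[]").split("|", 1)
        return {"target": target, "text": text}

    content = {"target": "Judi Dench", "text": "Dame Judi"}
    assert parse(render(content)) == content  # lossless round trip

A subsequent lossy conversion (displaying only "Dame Judi") would then need to retain its "final losses" somewhere, as the comment above suggests.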
To relate this back to your proposed interaction, where you say, "The automatic guess in the middle can be directly modified", I hesitate. I would prefer to think of the editor changing the original input or the implied output, rather than the inferred (abstract) content (although I doubt this will always be possible). We can explore the user experience later, but I would certainly expect to see a clear distinction between novel claims inferred from the input and pre-existing claims with which the input is consistent. Suppose the editor said that San Francisco was in North Carolina, for example, or that it is the largest city in Northern California.
I agree about always involving the human in the middle. However... If we develop renderers and parsers as inverses of each other, we can parse masses of existing content with which to test the renderers and understand their current limitations.--GrounderUK (talk) 23:11, 19 July 2020 (UTC)
@GrounderUK: Grammatical Framework has grammars that are indeed bi-directional, and some of the other solutions have that too. To be honest, I don't buy it - it seems that all renderings are always lossy. Just go from a language such as German that uses gendered professions to a language such as Turkish or English that does not. There is a necessary loss of information in the English and Turkish translation. --DVrandecic (WMF) (talk) 00:47, 5 August 2020 (UTC)
@DVrandecic (WMF): Yes, that's why I said "we think of rendering as a "lossy" conversion. But the lesson from this should be that rendering (per se) should be constrained to be "lossless" [...which...] implies that there will be subsequent lossy conversion required to achieve the required end result, but we need to think about ways in which the "final losses" can be retained ... so that ... an editor can modify the end result with a reasonable expectation that the revised version can be parsed back..." [emphasis added]. In your example, we (or she) may prefer "actor" to "actress", but the rendering is more like "actor"<f> (with the sex marker not displayed). In the same way, the renderer might deliver something like "<[[Judi Dench|Dame >Judi<th Olivia> Dench< CH DBE FRSA]]>" to be finally rendered in text as "Judi Dench", or "<[[Judi Dench|>Dame Judi<th Olivia Dench CH DBE FRSA]]>" for "Dame Judi", or just "<[[Judi Dench|>she<]]>" for "she". (No, that's not pretty, but you get the idea. "...we need to think about ways in which the "final losses" can be retained..." ["she", "[[Judi Dench|", "]]"]?)--GrounderUK (talk) 20:48, 6 August 2020 (UTC)
@GrounderUK: Ah, OK, yes, that would work. That seems to require some UX to ensure that these hidden fields get entered when the user writes the content? --DVrandecic (WMF) (talk) 22:11, 6 August 2020 (UTC)
@DVrandecic (WMF): Not sure... I'm talking about rendered output that might be tweaked after it is realised as wikitext. If the wikitext is full of comments, a quick clean-up option would be great. But if the final lossy conversion has already happened, then it's a question of automatically getting back the "final losses" from wherever we put them and merging them back into our wikitext before we start the edit. You might even create the page with the heavily commented wikitext (which looks fine until you want to change it) and then "replace" it with the cleaned-up version, so an editor can go back to the original through the page history, if the need arises.
If you're talking about creating the language-neutral content from natural-language input in the first place, that's a whole other UX. But with templates and functions and what-have-you, we could go quite a way with just the familiar wikitext, if we had to (and I have to say that I do usually end up not using the visual editor when it's available, for one reason or another).
Either way, there might be a generic template wrapping a generic function (or whatever) to cover the cases where nothing better is available. Let's call it "unlossy" with (unquoted) arguments being, say, the end-result string and (all optional) the page name, the preceding lost-string, the following lost-string and a link switch. So, {{unlossy|she|Judi Dench| | |unlinked}} or {{unlossy|Dame Judi|Judi Dench| |th Olivia Dench CH DBE FRSA|linked}}. (There are inversions and interpolations etc to consider, but first things first.)
In general, though, (and ignoring formatting) I'd expect there would be something more specific, like {{pronoun|Judi Dench|subject}} or {{pronoun|{{PAGENAME}}|subject}} or {{UKformal|{{PAGENAME}}|short}}. If {{PAGENAME}} and subject are the defaults, it could just be {{pronoun}}. Now, if humans have these templates (or function wrappers) available, why wouldn't the rendering decide to use them in the first place? Instead of rendering, say, {"type":"function call", "function":"anaphora", "wd":"Q28054", "lang":"en", "case":"nominative"} as "she", it renders (also?) as "{{pronoun|Judi Dench}}", which is implicitly {{pronoun|Judi Dench|subject}}, which (given enwiki as the context) maps to something like {"type":"function call", "function":"anaphora", "wd":"Q28054", "lang":"en", "case":"nominative"} (Surprise!). As an editor, I can now just change "pronoun" to "first name" or "UKformal"... and all is as I would expect. That's really not so painful, is it?--GrounderUK (talk) 03:24, 7 August 2020 (UTC)
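As a rough illustration of that mapping (a sketch only: the function-call layout is taken from the comment above, while PRONOUNS, LABELS and the Python function names are invented), the same abstract call can be realised either as final text or as the editable template form:

    PRONOUNS = {("Q28054", "en", "nominative"): "she"}  # toy lookup table
    LABELS = {"Q28054": "Judi Dench"}

    def render_final(call: dict) -> str:
        # The final, lossy surface text, e.g. "she".
        return PRONOUNS[(call["wd"], call["lang"], call["case"])]

    def render_editable(call: dict) -> str:
        # The recoverable wikitext form; an editor could change "pronoun"
        # to "first name" or "UKformal" here. "subject" is the default case.
        return "{{pronoun|%s|subject}}" % LABELS[call["wd"]]

    call = {"type": "function call", "function": "anaphora",
            "wd": "Q28054", "lang": "en", "case": "nominative"}
    print(render_final(call))     # she
    print(render_editable(call))  # {{pronoun|Judi Dench|subject}}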

Hybrid article[edit]

My proposed solution is a "hybrid article", where every part of the article may contain two parts:

  • A "abstract" or "semantic" part, which is a rendered result of abstract content.
  • A "opaque" part, which is language-specific text.

Parts of an article may be integrated with abstract content in these modes (a code sketch follows after this comment):

  • Having "abstract" part only - this will be the default for most short articles especially bot-generated ones, and some parts of article like the infobox.
  • Having both "abstract" and "opaque" part - the "opaque" part is shown, overriding the "abstract" one. We may have some way to indicate whether the override is permanent (usually it should not). If it is not, eventually the abstract content should be edited and the override removed.
  • Having "opaque" part - all existing articles can be easily transformed to this mode. This will generate some placeholder abstract content like {{#switch:{{int:lang}}|en=Text1|de=Text2}} where the content is automatically updated with local Wikipedias (if exists and local Wikipedias uses them), and may be converted to real abstract content in the future.

For editing an article:

  • Sections containing only an "abstract" part - you may create opaque content for them (with the rendered result as the default text), which may be converted to abstract content by technically competent users; or, more recommended, you may modify the abstract content directly, though this requires some technical skill (until we develop a rather advanced high-level editor).
  • Sections containing both an "abstract" and an "opaque" part - you may modify either the opaque or the abstract content, though modifying the abstract content has no effect on the rendered text (so vandalism detection may be a challenge if some content is orphaned); you may also remove the opaque content, so that the abstract content is rendered again.
  • Sections containing only an "opaque" part - you may modify the "opaque" part (which automatically updates the placeholder abstract content); translate the "opaque" part into another language, even if there is no (explicit) article in that language; or create real abstract content (recommended, but maybe not user-friendly).

BTW: we may expect that there will be very few languages in which an abstract article can be fully rendered by completing all the renderers the article uses; the distribution of constructors - if no opaque content is involved - will follow Zipf's law and Heaps' law. --GZWDer (talk) 21:22, 5 July 2020 (UTC)
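A minimal sketch of the three display modes described above (the field names are invented; the override rule is the one given in the comment: opaque text, where present, wins over the rendered abstract content):

    def render_abstract(abstract: str, lang: str) -> str:
        # Stand-in for the real renderer pipeline.
        return "[%s rendering of %s]" % (lang, abstract)

    def display(part: dict, lang: str) -> str:
        opaque = part.get("opaque", {}).get(lang)
        if opaque is not None:   # "opaque" and "hybrid" modes: override
            return opaque
        return render_abstract(part["abstract"], lang)  # "abstract"-only mode

    part = {"abstract": "Constructor(Q90)",
            "opaque": {"en": "Paris is the capital of France."}}
    print(display(part, "en"))  # the opaque English text is shown
    print(display(part, "de"))  # German is rendered from the abstract content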

[I have created a new topic since what follows does not appear to relate to "Distillation of existing content"]
I think you are right to identify two separate functional components. Here, I shall call them InfoText and ArticleSkeleton.
InfoText is like an InfoBox, only it is natural language text. The values in the InfoText come from Wikidata and the editor is free to remove or replace these. The text around the values initially comes from a rendering of a language-neutral structure, also from Wikidata (This, I think, is your "semantic" part). I imagine this will be natural language WikiText which will be changed by the editor in the current way (your "opaque" part). However, I can see advantages in being able to mark sections of text as needing to remain automated (for the time being or indefinitely: your "hybrid").
ArticleSkeleton is the framework for an entire Article. Most simply, it is a sequence of article components (Infoboxes, media, categories and so on, and, of course, InfoTexts). It is probably inevitable that there will be optional and conditional components, but I see this as a requirement common to each level. From the top down, you get (skeleton) Articles, sections and sub-sections; from the bottom up, you get Wikidata values, properties and items, InfoText and nested compositions of InfoText. (Articles might also come in sets, the nested composition of which would ultimately be a Wikipedia or WikiWhatever.)
I don't see any need for an article having only an "abstract" part; it may be an optimisation but it looks to me exactly like a re-rendered ArticleSkeleton (a transclusion or "virtual article"). But if there is a benefit to transcluded InfoText at any level, you might get the option to keep the article virtual by default. The major benefit of the hybrid article, as I see it, is that the editor does not have to take responsibility for the entire article at the outset; multiple edits can be used to convert the instantiated virtual article into a fully adopted project article. In practice, I think the editor would instantiate a particular InfoText by changing its WikiText. Or the virtual article would be fully instantiated at outset and the editor would have the option to "re-virtualize" any particular InfoText (which marks it up for bot-editing).--GrounderUK (talk) 14:34, 6 July 2020 (UTC)
@GZWDer and GrounderUK: thanks for these descriptions, and yes, I agree. The assumption that an article either comes from the Abstract Wikipedia or is written locally is too simple, and reality might be much more interesting; you describe a number of interesting workflows and interplays that might happen. Whether there are skeletons or not is something that should be entirely within the hands of the community, not decided by the development team, and the system should support their creation but also work well without them.
Having this more fine-grained model, where abstract and local parts are arbitrarily mixed, indeed allows further interesting workflows, as GZWDer points out, where a local editor, without having to show any regard for the abstract content, can simply materialize a rendering as text and then change it manually. If we manage to keep track of these changes, we can have queues where other contributors can then try to migrate these changes back into the abstract content so it can propagate into the other languages. Or not. As said, the workflows and processes regarding editorial decisions should always remain with the community, and never with the algorithms.
Again, thanks for the vivid description of use cases, I very much agree with them! --DVrandecic (WMF) (talk) 05:27, 14 July 2020 (UTC)
@ABaso (WMF): suggested continuing our discussion here. What I am envisioning is probably 3 or more layers of article content: (1) Language-specific, topic-specific content as we have now in the wikipedias, (2) Language-independent topic-specific content that would be hosted in a common repository and available to all languages that have appropriate renderers for the language-independent content (abstract wikipedia), (3) Language-independent generic content that can be auto-generated from Wikidata properties and appropriate renderers (generic in the sense of determined by Wikidata instance/subclass relations, for instance). I'm not entirely sure how the existing Reasonator and Article Placeholder functionalities work, but my impression was they are sort of at this 3rd or maybe even a lower 4th level, being as fully generic as possible. That probably makes them a useful starting point or lowest-functionality-level for this at least. The question of how to mix these various pieces together, and even harder how to present a useful UI for editing them, is definitely going to be tricky! ArthurPSmith (talk) 20:38, 28 July 2020 (UTC)
@ArthurPSmith: (squeezing in) Yes, that is a great model to view it through! Fully agreed on all three levels (what is the fourth level?). I totally expect those three levels to develop, and we intend to support all of them. And in addition to that, have more or less generic functions for certain types of content, e.g. taxons, researchers, locations, etc., similar to LSJbot. --DVrandecic (WMF) (talk) 00:52, 5 August 2020 (UTC)
@ArthurPSmith: I like to think of it from the bottom up, as you suggest. Every statement in Wikidata should have a fairly straightforward natural language representation. This is to convey the information that Article Placeholder pops into boxes at the moment (except that it leaves out some kinds of statement, based on a simple "blacklist", as I understand it, but that may just be the final filter). Let's look at a non-existent nynorsk article. Slightly less boring than most is Grammatical Framework (in Norwegian). You can probably work out what it says without knowing Norwegian. If you compare it with its Wikidata item you'll see the same information in the language of your choosing anyway. Notice the P348 statement. In Wikidata English it says "software version identifier 3.10, publication date 2 December 2018. 1 reference: reference URL https://www.grammaticalframework.org/download/release-3.10.html retrieved 4 May 2020". One way to render that into natural English would be "Version 3.10 was published on 2 December 2018 (according to information here on 4 May 2020)." The nynorsk page puts the reference in a footnote, as you might expect. So its box says "versjon 3.10[2] utgjevingstidspunkt: 2. desember 2018" and the footnote says "2.↑https://www.grammaticalframework.org/download/release-3.10.html, 4. mai 2020". It's a fairly simple, real-world example, but notice that there is a little hierarchy there already: the value (3.10), the value's date, and the source (logically, for both value and date) and when that was accessed. Of course, that is just the start (as I see it), but I thought it was worth going through just to see what sort of questions it raises.
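For illustration, a toy rendering of that P348 example into the English sentence above (the claim layout here is simplified and invented, not the real Wikidata JSON):

    claim = {
        "property": "P348",
        "value": "3.10",
        "qualifiers": {"publication date": "2 December 2018"},
        "reference": {
            "url": "https://www.grammaticalframework.org/download/release-3.10.html",
            "retrieved": "4 May 2020",
        },
    }

    def render_version_claim(c: dict) -> str:
        # Realises the little hierarchy noted above: the value, the value's
        # date, the source, and when the source was accessed.
        return ("Version %s was published on %s (according to information "
                "at %s on %s)." % (c["value"],
                                   c["qualifiers"]["publication date"],
                                   c["reference"]["url"],
                                   c["reference"]["retrieved"]))

    print(render_version_claim(claim))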
The next level is: when and how to combine two Wikidata statements (or more). Here, the first three look like they would go into a sentence beginning with the name of the Wikidata Item. We can see that, but how does an automated process achieve that? I'm pretty sure it can make good progress by considering a hierarchy of properties. "The one to beat" is instance of (P31) but some of these can be rather artificial or a bit weak; we want to be perhaps a bit more general, perhaps a bit more specific. So, yes, Grammatical Framework is a programming language (according to their website, accessed...) but it's more specifically "A programming language for multilingual grammar applications". That's the sort of "enriching" that human contributors to Wikipedias probably can't help wanting to do, once we hear it expressed in our own language. It's the difference between form-filling and writing. "Grammatical Framework is a programming language developed by Xerox Research Centre Europe, starting in 1998. The latest release, 3.10, came out at the end of 2018." Who uses it? What for? And so on...
Maybe there's some evidence from Norway or Wales or wherever the other Article Placeholder Wikipedias are of contributors changing Wikidata or creating articles to replace the placeholder. Neither of those is easy or likely to be undertaken lightly. Even if we could just insert some persistent text, associated with the Wikidata statement, you might be looking at a tenfold increase in interaction (almost certainly more or less than that). That's a signal. I'd hope that, like translators, some contributors would be able to check these annotations and see how they should be translated into Wikidata. That seems like a workflow that might be used more widely across WMF projects, but the particular challenge for the Article Placeholder is that there is no page to change. Denny talks about materialising a rendering as text (above) and that's a nice place to end up. But clicking on an Article Placeholder box or a Statement in Wikidata and typing a note as text in your own language, a note that pings itself off to a Wikidata translator... I don't know, how hard can it be? It's a start.--GrounderUK (talk) 02:17, 29 July 2020 (UTC)
I feel like you're trying to keep things at my "level 3" (or 4) - the key to what we're trying to do here though is I think "level 2" - how to express an interesting article ABOUT (say) Grammatical Framework (Q5593683) in a way that's independent of language, but specific to that particular topic. So there's no need to automatically select what things to say, the abstract content does that selecting for that specific topic: it could use P31 and P348 and ... together in this first sentence/paragraph, use these other properties in second and third etc., leave out certain properties, add additional information from other QItems that are related, add references here or there but not everywhere, etc. ArthurPSmith (talk) 14:30, 29 July 2020 (UTC)
@ArthurPSmith: I mostly agree with what you say, except that I'm not trying to keep things at any particular level. What I am suggesting is that the problem is self-similar at any conceivable level. I sum that up as "a nested composition of infoTexts". Well, theoretically infinite recursions are fine by me, but they must bottom out in the real world. And where it must bottom out, for our purposes, is the "value in context", the triple that represents... well... anything. So, what you're calling "level 2" is what I called "ArticleSkeleton": the things on a page (article), each represented by its identifier (a value) and the identifier's link to the identifier of the "Article". You can create that explicitly or you can derive it from your Wikidata Items. And that is true at the next level up (Category, as just one example) and it is true at the next level down (section, sub-section, paragraph, sub-paragraph, sentence... it really doesn't matter).
That's why I invented the term infoText (which I'm not attached to). An infoText is (defined to be) a composition of infoTexts "from the bottom up, you get Wikidata values, properties and items, InfoText and nested compositions of InfoText". An "elementary" infoText contains only values, properties and item identifiers; loosely, it corresponds to a Wikidata statement but, as the example above shows, you could/would decompose the statement into more fundamental infoTexts. Then, the statement's infoText is a composition of infoTexts, and the composition is fairly shallow and fairly narrow. We would expect some rendition of that infoText in the context of an article about Grammatical Framework, but also in the context of a paragraph about GF on a natural-language generation page or in a list of products on the Xerox page or in any other context (in any available language). But there we would expect a link to the GF page (and we may not care whether such a page exists) but no link to the page we're on.
Enough for now. Yes, given a particular Q, we can return a derived or pre-defined infoText (speaking hypothetically) but I doubt we'll have a separate pre-defined result for every conceivable use case. We can imagine what a "full" set might look like, and maybe a "minimal" set ("short description"), but less-than-full or more-than-minimal...? Pre-defined, explicitly, yes: just go ahead and define it as if it were an article. Derived from style guidelines and editorial policy but not specific to a particular Q, maybe: I guess our "full" set would respect some express cross-topic guidelines, which should be "adjustable" (level of language, level of subject expertise etc). Let's see what people want.--GrounderUK (talk) 17:19, 29 July 2020 (UTC)
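A rough sketch of that self-similar structure (class and field names invented for illustration): an elementary infoText is the "value in context" triple, and a composite infoText nests claims and other infoTexts at every level.

    from dataclasses import dataclass, field
    from typing import List, Union

    @dataclass
    class Claim:
        # Elementary infoText: the "value in context" triple.
        item: str   # e.g. "Q5593683" (Grammatical Framework)
        prop: str   # e.g. "P348" (software version identifier)
        value: str  # e.g. "3.10"

    @dataclass
    class InfoText:
        # Composite infoText: a nested composition of claims and infoTexts.
        parts: List[Union[Claim, "InfoText"]] = field(default_factory=list)

    # Statement -> section -> article skeleton: the same shape at each level.
    statement = InfoText([Claim("Q5593683", "P348", "3.10")])
    section = InfoText([statement])
    article_skeleton = InfoText([section])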
I really don't think it's self-similar. Not every Wikidata item will have an "ArticleSkeleton" (level 2 content) - for those that don't presumably something like Article Placeholder is probably fine. For those that do, though, it will in my view be a fully defined page of content, perhaps calling in other things like templates and infoboxes but those are just like images, not really part of the main content of the page. I don't see how sections, paragraphs etc. can be defined separately from the main content on a specific topic (a single Wikidata item). ArthurPSmith (talk) 18:01, 29 July 2020 (UTC)
@ArthurPSmith: I agree that not every Wikidata item will need a bespoke ArticleSkeleton. Some probably shouldn't even have an Article Placeholder (Q22665571, for example). So we have a Requirement [a] to be able to identify these, perhaps with a specific Property statement about the QItem, perhaps according to rules relating to the QItem's properties (or values within some QItem properties or combinations of values and properties for the QItem and/or other QItems related in one way or another...).
The sub-class of QItems that might have an ArticleSkeleton, we might as well call encyclopedic QItems. One Wikipedia has a page on the subject, another might just have a section or an infobox. They have their view on the subject and they don't want Wikidata's. We might say it's all or nothing, or we might say you can opt in to some categories and opt out of others, and if you have categories you opt into, you might want to opt out of particular sub-categories, including individual QItems. So that's another possible Requirement [b]. And, self-similar or not, there might be certain images that certain Wikipedias would object to, and there might be certain claims... So this might be another possible Requirement [c].
Well, if the language-neutral ArticleSkeleton is going to include templates and images and infoboxes and anything else that might be present on a current Wikipedia page (and why would it not?), then they absolutely must be considered to be "really part of the main content of the page". Maybe the inclusion of a particular image or one of a category of images would, by implication, lead to the whole article being unacceptable, as in Requirement [b], but I think we should consider whether implied unacceptability is a different Requirement [d]
So, we should consider:
  • Requirement [a]: Implicit and/or explicit exclusion of a Wikidata Item (explicit could be a special case of implicit but not vice versa)
  • Requirement [b]: Wikipedias can opt out of sets of Items (including a single Item)
  • Requirement [c]: Wikipedias can opt out of sets of images, templates etc (including sets of one), and of specific Wikidata claims (or types of claim, loosely defined...)
  • Requirement [d]: Inclusion of anything subject to [c], if unconditional, implies opt-out by Wikipedias opting out under [c] (opt-outs are inherited upwards by unconditional inclusion)
Do those requirements make sense? Feel free to suggest amendments and additions, obviously. It might be worth splitting [c] up.--GrounderUK (talk) 20:23, 29 July 2020 (UTC)

I just wanted to say, that yes, I agree that we should have the possibility to create hybrid articles. Have the lead written locally, include an infobox and a section, etc., and it should be a seamless experience for the reader - and it will be difficult to get the experience for the contributors just right (but that's already true today with the mix of templates and module calls, etc.). In general I see that there will be some types of articles which will be very much generated with a simple call and assembled from data on Wikidata (Level 3, as described by ArthurPSmith), and this will be an important contribution from the new system - but I also really want us to get to the point where we have Level 2 content. I am convinced that many very interesting topics (such as photosynthesis or World War II or language) will be really hard to do with an approach rooted in Level 3, and will be only accessible to a Level 2 approach (hopefully - it would be sad if we learn that only a Level 1 approach works for them). But thanks for this interesting discussion! The important thing is to ensure, and I think we are doing that, to allow for all the different levels mentioned here - and also for a mix of these per article. --DVrandecic (WMF) (talk) 01:07, 5 August 2020 (UTC)

Transcluded talk pages[edit]

Non-Wikipedia Content[edit]

It appears that the abstract content will be stored on Wikidata? Is that correct? I'm curious where non-Wikipedia content would be stored. For instance, let's say we hypothetically wanted to create an "Abstract Wikivoyage": would that be a new wiki, or would that abstract content be stored on Wikidata as well? --DBarratt (WMF) (talk) 19:27, 2 July 2020 (UTC)

This can easily be achieved by creating a specific part of content. See Talk:Abstract_Wikipedia#Two_small_questions. An article on Paris will have a "main" part and a "Wikivoyage" part, where the "Wikivoyage" part may transclude a "Wikivoyage-Getaround" part and a "Wikivoyage-See" part. (In my opinion, the "Wikivoyage" part will simply be a GeneralWikivoyageArticle(), which automatically transcludes other parts; the "Wikivoyage-See" part will be a WikivoyageSee(), which queries information from Wikidata and formats it into prose. Neither will require manual maintenance any longer. The "Wikivoyage-Getaround" part will still be a complex Content.)--GZWDer (talk) 14:21, 3 July 2020 (UTC)
That's a great question we don't have an answer for yet. What GZWDer suggests could work. It could also be stored in Wikilambda directly. Or we could add another namespace in Wikidata? (Probably not.) If I see that correctly, Wikivoyage would, fortunately, be the only other project that has that issue, as it overlaps so much with Wikipedia regarding items but has such different content. So yes, once we get to that point we would need to figure out a solution together with the Wikivoyage communities. Great question! --DVrandecic (WMF) (talk) 01:14, 5 August 2020 (UTC)
Motivated by this question: the other sister projects won't have the Abstract Wikipedia functionality, will they? I'm an admin at SqQuote and I've been wanting to ask for a long time, but in the beginning I thought it was Wikipedia-only. This question made me curious now. - Klein Muçi (talk) 09:56, 7 August 2020 (UTC)
Adding to the above question: what about help pages in Wikipedia, located in the Wikipedia/Help namespaces? - Klein Muçi (talk) 12:23, 7 August 2020 (UTC)

From Talk:Abstract_Wikipedia/Plan

Primary goals[edit]

"Allowing more people to read more content in their language"

Rather than "in their language", I think it should be "in the language they choose". Some people have more than one language. Some people may want to read content in a language that is not one of their own, either because they are learning that language, or because they are studying in that language, or for other reasons.--GrounderUK (talk) 15:58, 6 July 2020 (UTC)

@GrounderUK: Good point, and I've tweaked it to your suggestion. That seems to keep it concise, whilst being a bit clearer. Thanks. Quiddity (WMF) (talk) 05:51, 21 July 2020 (UTC)

From Talk:Abstract_Wikipedia/Goals

Renderers[edit]

"Solution 4 has the disadvantage that many functions can be shared between the different languages, and by moving the Renderers and functions to the local Wikipedias we forfeit that possibility. Also, by relegating the Renderers to the local Wikipedias, we miss out on the potential that an independent catalog of functions could achieve."

It seems that we are not neutral on this question! It is not obvious that we cannot have the functions in Wikilambda, the data in Wikidata and the implementation in the local Wikipedias. I don't propose that we should, but would such an architecture be "relegating" Renderers? Or is it not a form of Solution 4 at all?
I think we need a broader view of the architecture and how it supports the "Content journey". In this context, Content means the Wikipedia (or other WikiCommunity) articles and their components. Consumption of Content is primarily by people reading articles in a language of their choice, in accordance with the project's first Primary Goal. I take this to be the end of the "Content journey". I tend to presume that it is also the start of the journey: Editors enter text into articles in a language of their choice. In practice, this means that they edit articles in the Wikipedia for their chosen language. This seems to be the intent of the project's second Primary Goal but it is not clear how the architecture supports this.

"We think it is advantageous for communication and community building to introduce a new project, Wikilambda, for a new form of knowledge assets, functions, which include Renderers. This would speak for Solution 2 and 3."

Clearly, Renderers are functions and, as such, they should reside with other global functions in what we are calling Wikilambda. However, Renderers are not pure function; they are function plus "knowledge". Since some of that knowledge resides within editors of the natural language Wikipedias, whose primary focus may well be the creation and improvement of content in their chosen language, I am inclined to conclude that natural language knowledge should be acquired for the Wikilambda project from the natural language Wikipedias and their editors' contributions. As with encyclopedic content, there may well be a journey into Wikidata, with the result that the Renderers are technically fully located within Wilkilambda and Wikidata (which is not quite Solution 2).

"Because of these reasons, we favor Solution 2 and assume it for the rest of the proposal. If we switch to another, the project plan can be easily accommodated (besides for Solution 4, which would need quite some rewriting)."

I'd like to understand what re-writing Solution 4 would demand. I take for granted that foundational rendering functions are developed within Wikilambda and are aligned to content in Wikidata, but is there some technical constraint that inhibits a community-specific fork of their natural language renderer that uses some of the community's locally developed functionality?--GrounderUK (talk) 18:31, 6 July 2020 (UTC)

Hmm. Your point is valid. If we merely think about overwriting some of the functions locally, then that could be a possibility. But I am afraid that would end up in difficulties in maintaining the system, and possibly hamper external reuse. Also, it would require adding the functionality to maintain, edit, and curate functions to all existing projects. Not impossible, but much more intrusive than adding it to a single project dedicated to it. So yes, it might be workable. What I don't see though, and maybe you can help me with that - what would be the advantage of that solution? --DVrandecic (WMF) (talk) 01:22, 5 August 2020 (UTC)

@DVrandecic (WMF): I'm happy to help where I can, but I'm not sure it's a fair question. I don't see this as a question of which architecture is better or worse. As I see it, it is a matter of explaining our current assumptions, as they are documented in the main page. What I find concerning is not that Solution 2 is favored, nor the implication that Solution 4 is the least favored, it is the vagueness of "quite some rewriting". It sounds bad, but I've no idea how bad. This is more about the journey than the destination, or about planning the journey... It's like you're saying "let's head for 2, for now, we can always decide later if we'd rather go to 1 or 3; so long as we're all agreed that we're not going to 4!"
Not to strain the analogy too much(?), my reply is something like, "With traffic like this we'll probably end up in 4 anyway!" The "traffic" here is a combination of geopolitics and human nature, the same forces that drove us to many Wikipedias, and a Wiktionary in every language for all the words in all the languages ...translated. Wikidata has been a great help (Thank You!) and I'm certainly hoping for better with added NLG. But NLG brings us closer to human hearts and may provoke "irrational" responses. If a Wikipedia demands control over "its own" language (and renderer functions), or a national government does so, how could WMF respond?
In any event (staying upbeat), I'm not sure that "Renderer" is best viewed as an "architectural" component. I see it as more distributed functionality. Some of the more "editorial" aspects (style, content, appropriate jargon...) are currently matters of project autonomy. How these policies and guidelines can interact with a "renderer's" navigation of language-neutral encyclopedic and linguistic content is, of course, a fascinating topic for future elaboration.--GrounderUK (talk) 13:49, 5 August 2020 (UTC)

@GrounderUK: Ah, to explain what I mean by "quite some rewriting": I literally mean that the proposal would need to be rewritten in some parts, because it is written with Solution 2 in mind. So, no, that wouldn't be a real blocker and it wouldn't be that bad - we are talking merely about the proposal.

I think I understand your point, and it's a good one, and here's my reply to it: when we speak of "the renderers" it sounds like a monolithic thing, but in fact they are built from many different pieces. And the content, as well, is not a monolith, but a complex beast with parts and sections. The whole thing is not an all-or-nothing thing: a local Wikipedia will have the opportunity either to pull in everything from the common abstract repository, or to pull in only certain parts. They can also create alternative renderers and functions in Wikilambda, and call these instead of the (standard?) renderers. In the end, the local Wikipedia decides which renderer to call with which content and with which parameters.

So there really should be no need to override individual renderers in their local Wikipedia, as they can create alternative renderers in Wikilambda and use those instead. And again, I think there is an opportunity to collaborate: if two language communities have a common concern around some type of content, they can develop their alternative renderers in Wikilambda and share the work there. I hope that makes sense. --DVrandecic (WMF) (talk) 22:01, 6 August 2020 (UTC)
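A minimal sketch of that dispatch (all names invented): renderers live in a shared catalog, and each wiki's local configuration decides which one is called for a given kind of content, so two wikis can share an alternative renderer without overriding anything locally.

    WIKILAMBDA = {  # shared catalog of renderer functions
        "standard_city_renderer":    lambda item, lang: "%s is a city." % item,
        "alternative_city_renderer": lambda item, lang: "%s (a city)." % item,
    }

    # Each wiki chooses per content type; two wikis can pick the same one.
    LOCAL_CHOICE = {"enwiki": "standard_city_renderer",
                    "xxwiki": "alternative_city_renderer"}

    def render_for(wiki: str, item: str, lang: str) -> str:
        return WIKILAMBDA[LOCAL_CHOICE[wiki]](item, lang)

    print(render_for("enwiki", "San Francisco", "en"))
    print(render_for("xxwiki", "San Francisco", "xx"))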

@DVrandecic (WMF): What a terrible tool this natural language can be! Thanks, Denny. That is a whole lot less worrying and more understandable (well, I think we basically agree about all of this except, perhaps, "standard?" would be "existing").--GrounderUK (talk) 22:29, 6 August 2020 (UTC)

From Talk:Abstract_Wikipedia/Architecture

-- end of transclusions --[edit]

Wikilambda vs. rebranding[edit]

The project seems to be called Wikilambda, which is in line with the other projects. Then why is it called Abstract Wikipedia almost everywhere? Is this related to the (senseless) rebranding of Wikimedia to Wikipedia? Is this an attempt for this project to get a head start and capitalize on the reputation of Wikipedia? Don't rebrand, don't call things that aren't Wikipedia "Wikipedia", and don't behave like a commercial company. Zanaq (talk) 07:43, 13 July 2020 (UTC)

This project will be more related to Wikipedia than the other sister projects are. Imagine it as a supplement to the existing content of the Wikipedias in any language version. So I think the name may be good. Note this name may be changed. Wargo (talk) 20:49, 13 July 2020 (UTC)
If you asked 100 people to explain the idea behind the project to a journalist from their local newspaper, how many would use the word "Wikipedia" in the first sentence? --Gnom (talk) 22:03, 13 July 2020 (UTC)
Abstract Wikipedia is a development project for all the ideas. There will likely not be an actual Wikimedia project with this name.--GZWDer (talk) 03:22, 14 July 2020 (UTC)

@Zanaq: Both Wikilambda and Abstract Wikipedia are distinct parts of the proposal and have always been (Wikilambda is the wiki for functions, and Abstract Wikipedia the repository for abstract content), and in communications it was just easier to use one name. Both names are preliminary and will be replaced within the year anyway, and the community will be choosing the final name. If you have good proposals for the name, feel free to add them on this page, and join us on the mailing list to discuss the names later this summer! Thanks! --DVrandecic (WMF) (talk) 03:31, 15 July 2020 (UTC)

  • @DVrandecic (WMF): That leaves the question of why you didn't choose the less controversial name and call it Wikilambda. It would have been easy to make a choice that produced less community pushback. How about taking care not to waste your political capital on fights that aren't important to you? Especially going forward? ChristianKl❫ 13:55, 15 July 2020 (UTC)
    @ChristianKl: I thought the name 'Abstract Wikipedia' would be more evocative of what it is doing than 'Wikilambda' for this particular communication. Since it was explicitly stated that the name would change, I thought let's use the name that makes the explanation easier to understand. --DVrandecic (WMF) (talk) 17:29, 15 July 2020 (UTC)
  • I'd personally keep a lambda in the logo for geekiness/simplicity, regardless of what the eventual name will be. John M Wolfson (talk) 03:36, 16 July 2020 (UTC)

Wikispore Day July 19[edit]

You're welcome to join us at Wikispore Day on Sunday July 19, it starts at 1pm Eastern / 17:00 UTC, and it will run for about 2 hours. You can RSVP on that page and possibly give a lightning talk on any Wikispore-adjacent topic. You will also be able to participate and ask questions via the YouTube livestream here.--Pharos (talk) 12:33, 19 July 2020 (UTC)

Wikidata VS Wikipedia - General structure[edit]

Hello everyone! I'm a strong supporter of using multi-project infrastructures. (Something like that has long been needed for templates and modules, but that's another subject.) I like the idea behind this new project, but I have a somewhat technical dilemma that I believe arises from me not being tech-savvy enough. While the idea of auto-generating articles from Wikidata is good in itself, why not use the already-built multilingual structure we have on Wikipedia itself? Wouldn't it be easier, from a technical point of view, to transform Wikipedia into a multilingual project in the style of Wikidata/Commons/Meta, where articles appear in different languages according to the user's preferences?

I understand that two problems may arise from this:

  1. Unlike Meta/Wikidata pages, there are no homologous articles for every article in every language;
  2. The same article can change in POV depending on the culture (especially in small wikis, where neutrality is not strong enough);

But I believe there are enough articles in the Wikipedias in different languages to start a transition like this and fix things along the way. And maybe after Wikipedia, other wiki projects can join in this style too.

So, the question I ask is: why is there a need to auto-generate articles from Wikidata when we already have a multilingual structure we can use?

I must make it clear that I'm not against the idea per se. I just want to better understand the drive behind the need to auto-generate articles from Wikidata. (I'm not even against the auto-generation itself.) - Klein Muçi (talk) 14:27, 21 July 2020 (UTC)

@Klein Muçi: Before I reply, please note my Strong support for this project. But I support the goal much more than the Abstract Wikipedia/Plan. My reply to you is: Let's do this! I'll ping @Pharos: here because I'm not sure Wikispore is 100% ready for multi-lingual collaboration with heavy Wikidata integration (even on a small scale). I suggest a few rules.
  1. Every translingual Wikipedia article page (Q50081413) must take its MediaWiki page title (Q83336425) from 1 Wikidata item.
  2. The Wikidata item must have 1 Wikipedia article page in >1 Wikipedia.
  3. The translingual Wikipedia article page must contain >1 claim (Q16354754), which should be embedded in wiki markup text in 1 natural language that can be (or has been) the successful target of (input/output to/from) machine translation.
  4. Words (of the natural language) which do not come from Wikidata must be present in the controlled vocabulary, where each word links to a lexeme in Wikidata (ideally one which has the required sense (Q54285715), i.e. meaning).
Try viewing this with a language other than English selected. If the Wikipedia links don't work, there could be data missing from Wikidata or pages absent from the Wikipedia in that language. It would be nice if it fell back to the Wikidata item, but even blanking the site or entering "wikidata" will not do that; you have to copy the item ID into the search field. (A sketch of rule 1's title fallback follows below.)--GrounderUK (talk) 19:40, 21 July 2020 (UTC)
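As an aside on rule 1: a minimal Python sketch of taking a page title from a Wikidata item's label, falling back to another language when the user's language has none. It uses the real wbgetentities API; the page_title helper and the English fallback are illustrative choices, not part of the proposal.

    # Minimal sketch of rule 1: take the page title from a Wikidata item's
    # label, falling back when the user's language has no label.
    # Error handling is omitted for brevity.
    import requests

    def page_title(qid: str, lang: str, fallback: str = "en") -> str:
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbgetentities",
                "ids": qid,
                "props": "labels",
                "languages": f"{lang}|{fallback}",
                "format": "json",
            },
        ).json()
        labels = resp["entities"][qid].get("labels", {})
        choice = labels.get(lang) or labels.get(fallback)
        return choice["value"] if choice else qid  # last resort: the bare Q-number

    print(page_title("Q222", "sq"))  # "Shqipëria", assuming the label exists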
@GrounderUK: I'm really sorry but I can barely understand what you have written. First of all, I think you haven't answered my question. Maybe you haven't understood what I was asking or maybe I'm not technically informed enough to understand how your answer relates to it. Secondly, using the Wikidata items as words is SO CONFUSING to me. I can literally not understand one single sentence. Maybe it's just with Albanian (my language) but most of the translations (if they exist) are really bad. - Klein Muçi (talk) 00:08, 22 July 2020 (UTC)
@Klein Muçi: I'm sorry too. I don't speak Albanian but I asked Google to translate its Albanian back into English, French and German and it seemed to manage. I hope this helps.
@Klein Muçi: Para se të përgjigjem, ju lutem vini re votën time të mbështetjes së fortë të Symbol.svg Mbështetje e fortë për këtë projekt. Por unë mbështes mbështesin qëllimin shumë më tepër sesa Wikipedia / Plani Abstrakt. Përgjigja ime për ju është: Le ta bëjmë këtë! Unë do të ping @Pharos: këtu sepse nuk jam i sigurt se Wikispore është 100% gati për bashkëpunim shumë-gjuhësor me integrimin e rëndë të Wikidata (madje edhe në një shkallë të vogël). Unë sugjeroj disa rregulla.
  1. Everydo faqe artikull transkriptues i Wikipedia (Q50081413) duhet të marrë titullin e faqes së saj MediaWiki (Q83336425) nga 1 artikull Wikidata.
  2. Artikulli Wikidata duhet të ketë 1 faqe të artikullit të Wikipedia në >1 Wikipedia.
  3. Faqja e artikujve ndërgjuhësorë të Wikipedia duhet të përmbajë >1 kërkesë (Q16354754), e cila duhet të futet në tekstin e markup-it në 1 gjuhë natyrore që mund të jetë (ose ka qenë) shënjestra e suksesshme e përkthimit të makinës (hyrje / dalje nga / nga).
  4. Fjalët (të gjuhës natyrore) të cilat nuk vijnë nga Wikidata duhet të jenë të pranishme në fjalorin e kontrolluar, ku secila fjalë lidhet me një leksemë në Wikidata (në mënyrë ideale ajo që ka kuptimin e kërkuar (Q54285715), d.m.th. kuptimin).--GrounderUK (talk) 00:37, 22 July 2020 (UTC)
@GrounderUK: Oh, thank you for taking the long way to make it easier for me to understand what you meant. Unfortunately I'm still not sure how that answer relates to what I asked in the beginning. I'll try to rewrite my question in a very short way.
@Klein Muçi: I did not give a direct answer to your question. My point was that we can do what you suggest right now ("Let's do this!").
What I think Abstract Wikipedia will bring (maybe I've got it wrong): a new project on a new domain, populated by articles written automatically in different languages from the information that already exists on Wikidata.
I don't think so. The articles written automatically would be in the existing Wikipedias, if they were asked for. But, yes, the information would come from Wikidata.
What I think could maybe be a better idea (since we're talking innovations): Don't create a new project on a new domain. Transform Wikipedia so it works on a single domain, populated by the articles that already exist in different languages, not written automatically. A bit similar to how Meta works. The content's language is determined by the user's preferences.
I'm not sure how this is different from what we have now. (Well, I can see it's not the same, but maybe we could already be doing this). Meta works for content because people translate content that is marked up for translation. People can also translate a Wikipedia article into a different language (although not in the "same domain"). But translation is hard work and mostly (outside of meta) it doesn't get done. If you want to say "If there's no Albanian page, show me the English page", that is quite hard at the moment. It's not too hard for us, the readers, but it could certainly be made easier.
Now, I'm not sure my idea is better, but it is there mostly to allow me to better explain what I'm asking: Where does this drive/need/desire for auto-creation of articles on a newly made domain come from? I feel like we already have a good enough multilingual infrastructure (with not much auto-creation) which we can perfect further, if we so choose. Why do we need a new domain made specifically for new auto-generated articles?
Well, I said we don't need a new domain. What we have is not good enough, however, because most languages have a fairly small number of articles, compared with the larger Wikipedias. This has always been true and the smaller Wikipedias don't seem to be catching up (perhaps there are a few exceptions). Nothing is stopping us from trying but time is always limited. If we can make the information in Wikidata understandable to average or typical readers, no matter what languages they speak, they will have more information available sooner. And if everyone can improve the information in Wikidata, then everyone can share the better information immediately. I think it is here we need what you call "a new domain". To me, it's just a different way of editing Wikidata, one where we can also say how we want the information to appear in different languages.
I know there must be benefits which I'm overlooking and that's why I asked. Hope I've been a bit more clear now. - Klein Muçi (talk) 01:08, 22 July 2020 (UTC)
You were clear before. It's not for me to justify Wikimedia Foundation decisions, but I hope you can see why I do support this one. But this new project will take a long time and may not be able to give us what we hope for. So I also support suggestions like yours, which might give us benefits today.
A small multi-lingual Wikipedia that follows my four rules is a project that interests me. It might help us to find out what is going to be hard for "Abstract Wikipedia". Perhaps you can see that my first rule addresses your first problem. By making the Wikidata Item ID the title of the page, it should display in the user's current language (and we can fix Wikidata if it does not). My second and third rules aim to address your POV concerns; we automatically inherit Wikidata and multiple Wikipedia neutrality (and notability). The automatic translation in rule three is to help monitor any bias that might be present in the surrounding natural-language text. Rule 4 also helps with this; it makes it easier to use words that have been used before rather than words with a similar meaning that might be "loaded" (not neutral POV) terms in some contexts. It also encourages us to improve the lexical information in Wikidata, which is supposed to be used for our automatic articles (eventually).--GrounderUK (talk) 03:11, 22 July 2020 (UTC)

@GrounderUK: Oh, okay. Now I fully understand what you mean. The problem was that I didn't expect people to like my suggestion; I was just looking for an explanation of my question and of why what I wanted was unfavorable. You agreeing with it (and even with the Abstract Wikipedia plan) confused me. Sorry for taking so much time to understand that.

As for "I'm not sure how this is different from what we have now", you've explained it yourself. We could have one single domain with all the languages, where you get shown the article/main page in your chosen language. If it doesn't exist yet, you get shown:

  • a) the English version (?)
  • b) an auto-generated version from Wikidata (in your language)

You also have unified gadgets, templates and modules in multi-lingual versions.

What I propose is not ideal, though, because I think the worldwide communities need to have some autonomy and not be fully merged (I think that wouldn't benefit the project overall, for many reasons), and there are also the not-so-small technical details that all the communities have set up differently from each other and would want to keep (different templates/modules etc.). Again, part of the autonomy problem.

Maybe we can keep what we already have and just implement the unified gadgets/modules/templates and add option B from above: auto-generated versions for articles missing in some languages. But the idea of creating a new domain to be populated just by auto-generated articles from Wikidata seems odd to me. (Even though you mentioned above that that is NOT the case with Abstract Wikipedia.) If it were more like a tool function, where you get an auto-generated article from WD when an article is missing in your language (basically what I described above), that would be good. But having a whole domain filled with these articles... I don't know, it seems odd. The Content Translation Tool is already great at filling the language gap. Many new articles in small wikis, if not all, come from it. It's basically auto-generated in your language, and it's not fully automatic, so the semantics are good enough too. We could make CTT even more powerful and just work towards some unification of the technical aspects like gadgets/modules/templates. We could even have the WD auto-articles as a "tool". It's just the idea of a whole new domain filled with auto-generated articles that doesn't feel right to me. And I do believe bot translation can be of good quality if the process is set up right. I'm just appalled that we are thinking of things like these NOW, after more than 15 years of existence, after having Wikipedias with millions of articles in different languages, after developing tools like CTT... Now it just doesn't look that needed as a feature, especially one deserving as much attention as a whole new domain. That's why I wrote in the first place: to quench my curiosity about the drive behind the approach that was chosen and why it is not an "outdated" approach, as I say above. But I do thank you a lot for finding the time to fully explain yourself to me and also for offering new approach ideas on this subject. :) - Klein Muçi (talk) 10:26, 22 July 2020 (UTC)

Ju lutem, @Klein Muçi: You're welcome. (I hope that's right.)
As I said, it's the goal that interests me more than how we get there. Unified gadgets/modules/templates are part of the plan. In fact, support for these is the first part of the plan: a wiki of functions.
Someone else will have to explain why something like your a and b can't be done right now. As a human being, I can go to Wikidata, search for my new topic, select the likely Wikidata item and follow the link to any Wikipedia that has an article on that topic. (OK, it's not super helpful that the links are nearly at the bottom.) But once I know Albania is Q222, I can go straight to any Wikipedia in any language. Your language's Wikipedia page for Albania (?), for example. This is the same as clicking the right language in the sidebar of any Wikipedia page ("Në gjuhë të tjera"). So... it should be really easy for a Wikipedia page to provide links to all the articles in other Wikipedias, given only the Q-number (because they already do so, in the sidebar). I say "should be" because I don't recall seeing such a thing.
Maybe that helps. But keep focusing on the goal and share all the ideas you have. That way, someone might help you get to where you want to be sooner rather than later. See mw:Extension:Article Placeholder for an extension that some smaller wikis have; it returns some Wikidata content if a page does not already exist on that Wikipedia. nn:Spesial:AboutTopic/Q845189 is an example (and it gives other Wikipedias in the sidebar).--GrounderUK (talk) 12:06, 22 July 2020 (UTC)
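A minimal sketch of that "given only the Q-number" lookup, using the real wbgetentities API; the wikipedia_links helper and the crude "*wiki" filter are illustrative assumptions, not an existing feature.

    # Sketch of the sidebar idea: given only a Q-number, list every Wikipedia
    # article on that topic via the item's sitelinks.
    import requests

    def wikipedia_links(qid: str) -> dict[str, str]:
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={"action": "wbgetentities", "ids": qid,
                    "props": "sitelinks/urls", "format": "json"},
        ).json()
        sitelinks = resp["entities"][qid].get("sitelinks", {})
        # Crude filter: Wikipedia site IDs end in "wiki" (so do a few
        # sister projects, e.g. "commonswiki").
        return {site: link["url"] for site, link in sitelinks.items()
                if site.endswith("wiki")}

    for site, url in sorted(wikipedia_links("Q222").items()):
        print(site, url)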
@GrounderUK: haha Yes, you're right!
Yes, I understand what you mean. I'm an interface administrator for SqWiki and SqQuote, and I take great interest in the way things appear and how to make them more intuitive, especially for new editors, since I'm involved in a lot of offline wiki-workshops. To be honest, Wikipedia's interface does seem a bit outdated in some ways for the new generation of users (take a look here if you are interested in knowing more of my opinion on this subject), and Wikidata could use some reworking to make it more new-user friendly, but I know that if you learn how to navigate it, it's a powerful tool. (You can also take a look here if you are interested in knowing my opinion on intersite/foreign-friendliness traffic.) Given my work on subjects like these, I've literally gone through every extension in here one by one and asked in Phabricator for a couple of them to be activated for SqWiki/SqQuote, but was denied because they were not part of the Wikimedia bundle yet. We would benefit so much from BoilerRoom on SqQuote but, alas, we can't have that. I've seen what you suggest, and I was interested in activating Article Placeholder for SqWiki before, but then I saw it required you to go to the special pages and look for the missing article before it shows the article to you. After seeing that, it didn't look that beneficial anymore, since "special pages" are part of the "dark unknown" places for average readers/contributors. That's why this new project we're discussing looks interesting to me (as it does to you). And to prove I'm not against auto-created articles or artificial help in general, I'd invite you to take a look here.
Anyway, in general I support everything that leads to this. I spend far too much time periodically looking after the citation modules and templates on SqWiki and SqQuote as they continuously evolve on EnWiki, and most of that work is just copy-pasting with little change. And I can't just literally copy-paste the whole code, because that would break the translations in between, so the only way is to copy-paste line by line. Global templates/modules/gadgets would save so much time in these situations. - Klein Muçi (talk) 15:31, 22 July 2020 (UTC)

@Klein Muçi: I am not against your solution, not at all. And @Cscott: had a presentation at Wikimania last year proposing that. I would love to see that happen. I think that the issues with that solution are much less technical and much more social / project-political - and definitely nothing I want the Abstract Wikipedia to block on. So, yes, please, go ahead and gather support for that idea! --DVrandecic (WMF) (talk) 01:45, 5 August 2020 (UTC)

@DVrandecic (WMF): Oh, I didn't know there was already an ongoing discussion about my proposal. Thank you for mentioning that! Just to make sure I'm not misunderstood by you, though... I want to emphasize that I'm not against the AW project. It just seemed strange to me that we would need auto-generated articles now that we already have high coverage of many subjects in different languages. But maybe I'm wrong, and my POV is too localized and on the global scale that's not true. - Klein Muçi (talk) 09:25, 5 August 2020 (UTC)
@Klein Muçi: I am surprised you say that there is high coverage of many subjects in many different languages. Your home wiki is the Albanian one, right? Albanian has about 80,000 articles, and that is great - but when I take a look for the island I come from, Brač, I can see that none of the villages and towns have articles. Or when I look at the list of Nobel prize winners in Physics, it is somewhat out of date, and many of the Nobel prize winners have no articles. Wouldn't it be nice if those could be included as baseline content from a common source until you get around to writing the articles? It would also allow the Albanian community to focus on the topics they really care about, and even to bring knowledge about Albanian topics into the common baseline content. I would be really interested in your thoughts on that. --DVrandecic (WMF) (talk) 21:21, 5 August 2020 (UTC)
@DVrandecic (WMF): yes, it sure would. But wouldn't it be better/easier if that baseline content came from existing articles on other wikis (mostly EnWiki, since it is the biggest), auto-translated/auto-generated by a tool like CTT, with users fixing the machine-translation problems? Why do we have to have a totally new domain and new code written "from scratch" just to "teach the AI" how to write articles? Maybe it's technically easier that way? Is that the reason? I'm just trying to understand the drive behind this approach, as I've said many times. Not really opposing it. - Klein Muçi (talk) 23:28, 5 August 2020 (UTC)
@Klein Muçi: There are a number of issues with translation. First, we don't have automatic translation for many languages. Second, even for those we do, we need someone to check the translation results before incorporating them. The hope is that Abstract Wikipedia will generate content of such consistently high quality that it can be incorporated by the local Wikipedias without the need to check each individual piece of content. And third, translation doesn't help with updates. If the world changes and the English Wikipedia article gets updated, there is nothing that keeps the local translation current. CTT is a great tool, and I love it, and there will be great use cases for it, and the Foundation will continue to develop and improve it. Abstract Wikipedia aims at a slightly different mode of operation, and will basically complement it with some new use cases, like the ability to keep content fresh and current. I hope that helps to explain the difference. Let me know if it is still unclear. Oh, also, there is no AI writing the articles - it is entirely driven and controlled by the community. --DVrandecic (WMF) (talk) 22:07, 6 August 2020 (UTC)
@DVrandecic (WMF): ah, automatic content updates. I hadn't thought of that feature. Well, that's a big plus over simple translation. No, it's clear enough now.
PS: I know there is no real AI. :P I used it to imply the automatic creation driven, of course, by human users - contrasting that with the more organic way CTT works. Thank you for the explanations! :) - Klein Muçi (talk) 23:00, 6 August 2020 (UTC)

Logo[edit]

I don't know if a potential logo has been discussed, but I propose the Wikipedia Logo with Pictograms replacing the various characters from different languages. I might mock one up if I feel like it. If you have any ideas, or want to inform me there already is a logo, please ping me below. Cheers, WikiMacaroons (talk) 18:00, 26 July 2020 (UTC)

The logo discussion has not started yet. We're going to have the naming discussion first, and once the name is decided, the logo will follow up. Feel free to start a subpage for the logo, maybe here if you want, to collect ideas. --DVrandecic (WMF) (talk) 01:46, 5 August 2020 (UTC)
Thanks, DVrandecic (WMF), perhaps I shall :) WikiMacaroons (talk) 21:57, 7 August 2020 (UTC)

Technical discussion and documentation[edit]

  • We need a page documenting how development is progressing. I have drafted Abstract Wikipedia/ZObject, but this is obviously not enough.
  • We need a dedicated page to discuss issues related to development. For example:
    • phab:T258894 does not say how non-string data is handled. How to store numbers? (Storing a floating-point number as a string is not a good approach; see the sketch below.) What about associative arrays (aka dict/map/object)?
    • We need a reference type to distinguish the object Z123 from the string "Z123", especially when we have functions that accept arbitrary types.

--GZWDer (talk) 14:21, 29 July 2020 (UTC)
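As an illustration of the number-storage concern in the first bullet, a minimal sketch contrasting naive float parsing with a hypothetical typed number literal; the ZID "Z999" and the number_value helper are invented for illustration and defined nowhere in the proposal.

    # A bare string is ambiguous, and naive float parsing drifts; a typed
    # literal can keep the lexical form and parse it exactly.
    from decimal import Decimal

    def number_value(zobject: dict) -> Decimal:
        assert zobject["Z1K1"] == "Z999"        # hypothetical number type
        return Decimal(zobject["Z999K1"])       # keep lexical form, parse exactly

    print(float("0.1") + float("0.2") == 0.3)   # False: binary floats drift
    print(number_value({"Z1K1": "Z999", "Z999K1": "0.1"})
          + number_value({"Z1K1": "Z999", "Z999K1": "0.2"})
          == Decimal("0.3"))                    # True: exact decimal arithmetic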

@GZWDer: I am not sure what you mean about the page. We have the function model that describes a lot. I will update the ZObject page you have created. We also have the task list and the phases; the latter I am still working on, trying to connect it to the respective entries in Phabricator. Let me know what remains uncovered.
Yes, we should have pages (or Phabricator tickets) where we discuss issues related to development, like "How to store numbers?". That's a great topic. Why not discuss that here?
Yes. My suggestion for that is to continue to use the reference and the string types, as suggested in the function model and implemented in AbstractText. Does that resolve the issue? --DVrandecic (WMF) (talk) 21:20, 29 July 2020 (UTC)
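To make that distinction concrete, a minimal sketch of how Z6/String and Z9/Reference serializations keep the object Z123 apart from the string "Z123", as I read the function model; the is_reference helper is illustrative.

    # The two serialize differently, so a function taking an arbitrary
    # type can tell them apart.
    ref_to_z123 = {"Z1K1": "Z9", "Z9K1": "Z123"}     # the ZObject with ID Z123
    literal_string = {"Z1K1": "Z6", "Z6K1": "Z123"}  # the four characters "Z123"

    def is_reference(zobject: dict) -> bool:
        return zobject.get("Z1K1") == "Z9"

    assert is_reference(ref_to_z123) and not is_reference(literal_string)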

Some questions and ideas[edit]

(more will be added)

@GZWDer: Thank you for these! They are awesome! Sorry for getting to them slowly, but there's a lot of substance here! --DVrandecic (WMF) (talk) 02:13, 5 August 2020 (UTC)
And if a function is paired with its inverse, you can automatically check that the final output is the same as the initial input. As under #Distillation of existing content, this is an important consideration for a rendering sub-function, whose inverse is a parser sub-function.--GrounderUK (talk) 23:26, 29 July 2020 (UTC)
I really like the idea of using the inverse to auto-generate tests! That should be possible, agreed. Will you create a phab-ticket? --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
As a second thought, we could fold Z20 into Z7 and eliminate Z20 (the Z7 would be required to return a boolean true value to pass).--GZWDer (talk) 07:47, 30 July 2020 (UTC)
Maybe. My thought: by separating the code that creates the result from the code that checks the result, wouldn't it be easier to be sure that we are testing the right thing? --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
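A toy illustration of the auto-generated round-trip tests discussed above; render and parse are invented stand-ins for a renderer/parser pair.

    # If parse is the inverse of render, every stored example doubles
    # as a round-trip test.
    def render(tree: tuple) -> str:          # toy "renderer": tree -> text
        op, a, b = tree
        return f"({a} {op} {b})"

    def parse(text: str) -> tuple:           # toy "parser": text -> tree
        a, op, b = text.strip("()").split()
        return (op, int(a), int(b))

    def roundtrip_tests(examples):
        for tree in examples:
            assert parse(render(tree)) == tree, f"round-trip failed for {tree}"

    roundtrip_tests([("+", 1, 2), ("*", 3, 4)])  # no output means all passed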
  • I propose a new key for Z1 (Z1K2 "quoted"). A "quoted" object (and all its subobjects) is never evaluated and is left as is unless unquoted using the Zxxx/unquote function (exception: an argument reference should be replaced if necessary even within a quoted object, unless the reference is itself quoted). (There will also be a Zyyy/quote function.) Z3 will have an argument Z3K5/quoted (default: false) to specify the behavior of the constructor (for example, Z4K1 should have Z3K5=true). Similarly we have a Z17K3. In addition, we add a field K6/feature for all functions which may take a list of ZObjects, including Zxxx/quote_all.--GZWDer (talk) 08:13, 30 July 2020 (UTC)
    Yes, that's a great idea! I was thinking along similar lines. My current working idea was to introduce a new type, "Z99/Quoted object", and additionally to have a marker on the Z3/Key, just as you say, to state that it is "auto-quoted", and then have an unquote function. Would that be sufficient for all use cases, or am I missing something? All the keys marked as identity would be auto-quoted. But yes, I really like this idea, thanks for capturing it. We will need that. Let's create a ticket! --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
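A minimal sketch of the quoting semantics under discussion, with invented dict shapes (a bare "quoted" flag rather than the real Z1K2/Z99 machinery): the evaluator returns a quoted subtree verbatim, and evaluating the result again plays the role of unquote.

    def evaluate(obj):
        if isinstance(obj, dict) and obj.get("quoted"):
            return obj["value"]                  # returned verbatim, not evaluated
        if isinstance(obj, dict) and obj.get("call"):
            fn = obj["call"]
            args = [evaluate(a) for a in obj["args"]]
            return fn(*args)
        return obj

    quoted_call = {"quoted": True,
                   "value": {"call": lambda x, y: x + y, "args": [1, 2]}}
    print(evaluate(quoted_call))             # the unevaluated call structure
    print(evaluate(evaluate(quoted_call)))   # "unquoting" by evaluating again: 3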
  • I propose to introduce a Z1K3/refid field (originally proposed as Z2K4) that will hold the ZID of a persistent object. As with Z2K1, the value is always a string, but it is held in the value, like the hypothetical reconstruction below:
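The original example here appears to have been lost; a hypothetical reconstruction, with a toy ZID(), might look like the following. The dict shapes are simplified for illustration and are not the exact function model serialization.

    # The Z1K3/refid sits inside the Z2K2/value, and ZID() simply reads it,
    # returning "" for objects created on the fly.
    persistent_one = {
        "Z1K1": "Z2",
        "Z2K1": "Z382",                              # the page's own ID
        "Z2K2": {"Z1K1": "Z70", "Z1K3": "Z382"},     # the value carries its refid
    }

    def zid(value: dict) -> str:
        return value.get("Z1K3", "")                 # transient objects yield ""

    assert zid(persistent_one["Z2K2"]) == "Z382"
    assert zid({"Z1K1": "Z70"}) == ""                # created on the fly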

The ID will be removed when a new ZObject is created on the fly. This will support a ZID() function which returns the ZID of a specific object (e.g. ZID(one)="Z382" and ZID(type(one))="Z70"). Any object created on the fly has an empty string as its ZID. (Note this is not the same as Z4K1/(identity or call), as Z1K3 may only be a string and may only be found in the value of all kinds of Z2s (not ad-hoc created objects) regardless of type.--GZWDer (talk) 15:50, 30 July 2020 (UTC))--GZWDer (talk) 08:23, 30 July 2020 (UTC)

I'm not sure what your "one" represents. Are you saying ZID(Z382) returns the string "Z382"? So, the same as ZID(Z2K4(Z382))?--GrounderUK (talk) 10:52, 30 July 2020 (UTC)
Yes, ZID(Z382) will return the string "Z382". For the second question: a key is not intended to be used as a function, so we need to define a builtin function such as getvalue(Z382, "Z2K4"), which will return an ad-hoc string "Z382" that does not have a ZID.--GZWDer (talk) 11:03, 30 July 2020 (UTC)
Ok, thanks. So, ZID() just returns the string value of whatever is functioning as the key, whether it's a Z2K1 (like "Z382") or a Z1K1 (like "Z70")...? But "Z70" looks like the Z1K1/type of the Z2K2/value rather than the Z1K1/type of Z382 (which is "Z2", as I interpret it). --GrounderUK (talk) 12:34, 30 July 2020 (UTC)
"A Z9/Reference is a reference to the Z2K2/value of the ZObject with the given ID, and means that this Z2K2/value should be inserted here." and the parameter of ZID is implicitly a reference (so only Z2K2 is passed to the function). The ZID function can not see other part of a Z2 like label. Second thought, the key should be part of Z1 instead of Z2.--GZWDer (talk) 15:14, 30 July 2020 (UTC)
Thank you for bearing with me; I missed the bit about Z9/reference! To be clear, the argument/parameter to the proposed function is or resolves to "just a string", which might be Z9/reference. If it's not a Z9/reference, the function returns an empty string ("")? When it is a Z9/reference, it is implicitly the Z2K2/value of the referenced object. The function then returns the Z1K3/refid of the referenced object (as a string) or, if there is no such Z1K3/refid (it is a non-existent object or a transient object), it returns an empty string. I'm not sure, now, whether the intent is for the Z1K3/refid to remain present in a Z2/persistent object (and equal to the Z2K1/id if the object is well-formed). Your new note above (underlined) does not say that the Z1K3 must be a string and must be present in every Z2/persistent object and must not be present in a transient object (although a transient object does not support Z9/reference, as I see it).--GrounderUK (talk) 08:47, 31 July 2020 (UTC)
Ah, I missed this discussion before I made this edit today. Would that solve the same use cases? I am a bit worried that Z1K3 would cause issues such as "sometimes when I get a number 2 it has a pointer to the persistent object with its labels, and sometimes it doesn't", which have crept up repeatedly. --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
  • Ambiguity of Local Keys: In this document it is unclear how local keys are resolved.
The de facto practice is: if the Z1K1 of a ZObject is Z7, then local keys refer to its Z7K1; otherwise they refer to its Z1K1.--GZWDer (talk) 14:58, 30 July 2020 (UTC)
Sorry, maybe I am missing something, but in the given example it seems clear for each K1 and K2 what they mean? Or is your point that over the whole document, the two different K1s and K2s mean different things? The latter is true, but not problematic, right? Within each level of a ZObject, it is always unambiguous what K1 and K2 mean, I hope (if not, that would indeed be problematic). There's an even simpler case where K1 and K2 have different meanings: every time you have embedded function calls with positional arguments, e.g. add(K1=multiply(K1=1, K2=1), K2=0). That seems OK? (Again, maybe I am just being dense and missing something). --DVrandecic (WMF) (talk) 02:22, 5 August 2020 (UTC)
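A minimal sketch of the de facto resolution rule GZWDer describes; the resolve_local_key helper and the example IDs are illustrative only.

    # Local keys like K1 resolve against the function being called (Z7K1)
    # when the object is a function call, else against the object's own
    # type (Z1K1).
    def resolve_local_key(zobject: dict, local_key: str) -> str:
        owner = zobject["Z7K1"] if zobject["Z1K1"] == "Z7" else zobject["Z1K1"]
        return f"{owner}{local_key}"        # e.g. "Z802" + "K1" -> "Z802K1"

    call = {"Z1K1": "Z7", "Z7K1": "Z802", "K1": "arg"}
    assert resolve_local_key(call, "K1") == "Z802K1"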
  • I propose a new type of ZObject, Zxxx/module. A module has a key ZxxxK1 with a list of Z2s as value. The Z2s may have the same IDs as global objects. We introduce a new key Z9K2/module and a new ZObject Zyyy/ThisModule, which make ZObjects in a module able to refer to ZObjects in other modules. This will make functions portable and prevent polluting the global namespace. We may also introduce a modulename.zobjectname (or modulename:zobjectname) syntax for referring to an individual ZObject of a module in a composition expression. (Note a module may use ZObjects from another module, but it would be better to create a function that takes the required module as a parameter, so that a module will not rely on global ZObjects other than builtins.)--GZWDer (talk) 19:13, 30 July 2020 (UTC)
    I understand your use case and your solution. I think having a single flat namespace is conceptually much easier. Given the purely functional model, I don't see much advantage to this form of information hiding - besides modules being more portable between installations (a topic that I intentionally dropped for now). But couldn't portability be solved by a namespacing model similar to the way RDF and XML do it? --DVrandecic (WMF) (talk) 02:25, 5 August 2020 (UTC)
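For what that RDF/XML-style alternative might look like, a minimal sketch with an invented prefix map; the URLs are placeholders, not a real scheme.

    # Instead of module objects, a prefix map expands short names to
    # globally unique identifiers, keeping the namespace itself flat.
    PREFIXES = {"core": "https://example.org/core/",
                "myfns": "https://example.org/myfns/"}

    def expand(prefixed_name: str) -> str:
        prefix, local = prefixed_name.split(":", 1)
        return PREFIXES[prefix] + local

    assert expand("myfns:Z123") == "https://example.org/myfns/Z123"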
  • Placeholder type: We introduce a "placeholder" type, to provide a (globally or locally) unique localized identifier. Every placeholder object is different. See below for the use case.--GZWDer (talk) 03:56, 31 July 2020 (UTC)
  • Class member: I propose to add
    • a new type Zxxx/member; an instance of member has two keys: ZxxxK1/key and ZxxxK2/value, both arbitrary ZObjects. If the key does not need to be localized, it may simply be a string. If it needs to be localized, it is recommended to use a placeholder object for it.
    • a new key Z4K4/members, whose value is a list of members. Note all members are static and not related to a specific instance.
    • new builtin functions member_property and member_function: member_property(Zaaa, Zbbb) will return the ZxxxK2/value of the member of the type of Zaaa (i.e. the Z1K1 of Zaaa, not Zaaa itself) whose ZxxxK1/key equals Zbbb. member_function is similar to member_property, but only works if the member's value is a function; the result is a new function with the first parameter bound to Zaaa. It therefore returns a function related to a specific instance. (A sketch follows after this list.)

--GZWDer (talk) 03:56, 31 July 2020 (UTC)
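A minimal sketch of the member proposal's semantics (members live on the type, and member_function binds the instance as first argument); the Zpair type and its members are invented for illustration.

    # Members hang off the *type*, not the instance.
    TYPES = {
        "Zpair": {"members": {
            "first": lambda pair: pair["K1"],
            "swap": lambda pair: {"K1": pair["K2"], "K2": pair["K1"]},
        }},
    }

    def member_property(instance, key):
        return TYPES[instance["Z1K1"]]["members"][key]   # resolved via the type

    def member_function(instance, key):
        fn = member_property(instance, key)
        return lambda *args: fn(instance, *args)         # first parameter bound

    p = {"Z1K1": "Zpair", "K1": 1, "K2": 2}
    swap = member_function(p, "swap")                    # a function tied to p
    assert swap() == {"K1": 2, "K2": 1}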

  • Inherit type: I propose to add a new key Z4K5/inherit, whose value is a list of ZObjects.
    • An inherited type will have all members of the parent type(s), and also all keys. (The parent type should be persistent, so that it will be possible to create an instance with specific keys - the keys may consist of those defined in the child type and those defined in the parent type. This may be overcome - I have two schemes to define a relative key, but both have some weak points. One solution is discussed in "Temporary ZID" below--GZWDer (talk) 18:11, 4 August 2020 (UTC)) Members defined in the child type will override any member defined in the parent type(s).
    • We introduce a builtin function isderivedfrom() to query whether a type is a child type of another.
    • This will make it possible to build functions for arbitrary types derived from a specific interface (itself a type with no keys), such as Serializable or Iterator.
      • An iterator (a type derived from the Iteratible type) is simply any type with a next function, which generates a "next state" from the current state. Some examples are a generator of all (infinitely many) prime numbers, or a cursor over database query results.
      • We would be able to create a general filter function for an arbitrary iterator, which will itself return an iterator (see the sketch below).

--GZWDer (talk) 03:56, 31 July 2020 (UTC)
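A toy sketch of the iterator idea, using Python generators to stand in for the next-state function; the prime generator and the generic filter mirror the examples above and are illustrative only.

    from itertools import count, islice

    def primes():                              # infinite iterator of primes
        found = []
        for n in count(2):
            if all(n % p for p in found):      # trial division by found primes
                found.append(n)
                yield n

    def generic_filter(predicate, iterator):   # filter over any iterator,
        return (x for x in iterator if predicate(x))  # itself an iterator

    print(list(islice(generic_filter(lambda p: p % 4 == 1, primes()), 5)))
    # [5, 13, 17, 29, 37]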

  • @GZWDer: I raised inheritance with Denny a while back; I agree there needs to be a mechanism for it, and it's already implicit in the way Z1 keys work. But I wonder if it needs to be declared... If a ZObject has keys that belong to another, doesn't that implicitly suggest it inherits the meaning of those keys? Or there could be subdomains of ZObjects, under some of which typing is stricter than for others (i.e. inherit from "typed object" vs inherit from plain "ZObject")? Typing and inheritance can get pretty complicated, though, and are perhaps only necessary for some purposes. ArthurPSmith (talk) 17:27, 31 July 2020 (UTC)
    @GZWDer: Agreed, that would be a way to implement OO mechanisms in the suggested model. And I won't stop anyone from doing it. My understanding is that this would all work *on top* of the existing model. I hope to be able to avoid putting inheritance and subtyping into the core system, as that keeps the system much simpler. But it should be powerful enough to implement them on top. Fortunately, this would not require a change in the current plans, if I see this correctly. Sounds right? --DVrandecic (WMF) (talk) 02:33, 5 August 2020 (UTC)
  • I propose a new key Z2K4/Serialization version to mitigate breaking changes to the serialization format. For example, "X123abcK1" is not currently a valid key, but I propose to use such keys below.--GZWDer (talk) 18:11, 4 August 2020 (UTC)
    This might be needed. The good thing is that we can assume it to be already here, with a default value of "version 1", and introduce that key and the next value whenever we need it. So, yes, it will probably be needed at some point. (I hope to push this point as far into the future as possible :) ) --DVrandecic (WMF) (talk) 02:36, 5 August 2020 (UTC)
  • Temporary ZID: we introduce a new key Z1K4/Temporary ZID. A temporary ZID may have the format Xabc123, where abc123 is a random series of hexadecimal digits (alternatively we can use only decimal digits). For a transient object without a Z1K4 specified, a random temporary ZID will be generated (which is not stable). Use cases (see the sketch after this list):
    • As the key of a basic unit of ZObject used by evaluators; i.e. when evaluating, we use a pool of ZObjects to reduce redundant evaluation.
    • One of the solutions for "relative keys" (see above) - the XID may easily be used to form a key like X123abcK1.
    • A new serialized format to reduce duplication: a ZObject may have a large number of identical embedded ZObjects.
    Some other notes:
    • For easier evaluation, the temporary ZID should be globally unique. However, it is not easy to guarantee this, especially if the temporary ZID is editable.
    • When a ZObject is changed, it should get a new temporary ZID. But similarly, this is not easy to guarantee.
    • We introduce a new function is() to check whether two objects have the same temporary ZID (ZObjects created on the fly have random temporary ZIDs, so they are not equivalent to other ZObjects).
    • This may make it possible to have ZObjects with subobjects relying on each other (such as a pair whose first element points to the pair itself). We should discuss whether this should be allowed. Such objects do not have a finite (traditional) serialization and may break other functions such as equals_to. If such objects are not allowed, an alternative is to use a "hash" to refer to a specific ZObject.
      • Note that how equal_to will work for two custom objects is itself an epic.

--GZWDer (talk) 18:04, 4 August 2020 (UTC)
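A minimal sketch of temporary ZIDs and the proposed is() check; the XID format and the helper names are assumptions for illustration.

    import secrets

    def with_temp_zid(zobject: dict) -> dict:
        # Assign a random XID if none is present; e.g. "X1a2b3c".
        zobject.setdefault("Z1K4", "X" + secrets.token_hex(3))
        return zobject

    def is_same(a: dict, b: dict) -> bool:   # the proposed is() check
        return a["Z1K4"] == b["Z1K4"]        # compares identity, not value

    one = with_temp_zid({"Z1K1": "Z70"})
    assert is_same(one, one)
    assert not is_same(one, with_temp_zid({"Z1K1": "Z70"}))  # equal value, new identity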

I agree with the use cases for Z1K4. In AbstractText I solved these use cases by either taking a hash of the object or a string serialization - i.e. the object representation is its own identity. Sometimes the evaluator internally added such a temporary ID and used that, IIRC. But that all seems to be something that is only interesting within the confines of an evaluation engine, right? And an evaluation engine should be free to make these modifications (and many more) as it wishes. And there such solutions will be very much needed - but why would we add them to the function model and to Z1 in general? We wouldn't store them in the wiki; they would just be used in the internal implementation of the evaluator - or am I missing something? So yes, you are right, this will be required - but if I understand it correctly, it is internal, right? --DVrandecic (WMF) (talk) 02:42, 5 August 2020 (UTC)
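For contrast, a sketch of the hash-of-the-object approach described above, where a canonical serialization is its own identity; the identity helper name is mine, not AbstractText's.

    import hashlib, json

    def identity(zobject: dict) -> str:
        # Canonicalize (sorted keys, no whitespace), then hash: equal
        # objects get equal identities, with no stored temporary ID.
        canonical = json.dumps(zobject, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    a = {"Z1K1": "Z70", "Z70K1": "2"}
    b = {"Z70K1": "2", "Z1K1": "Z70"}        # same object, different key order
    assert identity(a) == identity(b)        # same canonical form, same identity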