Talk:Abstract Wikipedia/Archive 2

Review from a computational linguistics perspective

Preamble: I especially like the outrageous idea behind the Abstract Wikipedia.

I've watched the talk at the Knowledge Graph conference[1] and gone through the working paper.[2] You can find below my stream of consciousness. I hope it helps.

Keep on keepin' on! --Hjfocs (talk) 20:14, 17 May 2020 (UTC)

Concepts

A Constructor looks to me like a frame definition,[3] with a frame and a set of frame elements (called Keys):

  • it has a JSON-like representation (see the sketch after this list), which seems close to a verbose context-free grammar (CFG);[4]
  • it encodes semantics through a lexical approach;
  • it also has similarities to a Wikipedia template.
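
To make the analogy concrete, here is a minimal, hypothetical sketch of what such a frame-like Constructor could look like if written down as a Python dictionary; the constructor name, its Keys, and the type labels are purely illustrative and are not taken from the working paper.

  # A hypothetical, minimal sketch of a frame-like Constructor definition.
  # The constructor name, its Keys, and the type labels are illustrative only;
  # they are not taken from the Abstract Wikipedia proposal.
  election_constructor = {
      "constructor": "Election",      # the frame
      "keys": {                       # the frame elements ("Keys")
          "elected_person": "wikidata-item",
          "elected_office": "wikidata-item",
          "electing_body": "wikidata-item",
          "reason": "constructor",    # slots may nest other Constructors
      },
  }

  def validate_content(content: dict, constructor: dict) -> bool:
      """Check that a Content instance fills exactly the Keys its Constructor declares."""
      return set(content) == set(constructor["keys"])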

This reminds me of what I called a frame repository in my work on n-ary relation extraction.[5] The research effort led to the StrepHit project,[6] which aimed at parsing natural language into Wikidata statements, while Wikilambda targets the opposite process.

I'd like to highlight what I consider a crucial aspect when talking about knowledge representation: top-down versus bottom-up approaches. The StrepHit frame repository is naturally built from evidence in the input data, thus following a fully bottom-up direction. In my experience, it's hard to think that a top-down approach is feasible, so I'm curious to hear how you can avoid data-driven creation of Constructors.

Content is a way to fill frame slots with Wikidata items:

  • it contains a mix of Wikidata terminal symbols and lexicon;
  • it caters for syntax and morphology.

A Renderer is a language-specific CFG with terminal symbols: well, actually simpler, pretty much like a sentence template.
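
Continuing the hypothetical sketch from above: a Content instance fills the Keys with Wikidata items, and a per-language Renderer can be little more than a sentence template. The item identifiers below are placeholders, not real Wikidata IDs, and the template strings are invented for illustration.

  # Hypothetical Content instance for the "Election" constructor sketched above.
  # The Q-identifiers are placeholders, not real Wikidata items.
  content = {
      "elected_person": "Q000001",   # e.g. Mark Farrell
      "elected_office": "Q000002",   # e.g. Mayor of San Francisco
      "electing_body": "Q000003",    # e.g. Board of Supervisors
  }

  # Hypothetical per-language Renderers: little more than sentence templates.
  renderers = {
      "en": "{electing_body} elected {elected_person} as {elected_office}.",
      "de": "{electing_body} wählte {elected_person} zum {elected_office}.",
  }

  def render(content: dict, language: str, labels: dict) -> str:
      """Fill the language-specific template with each item's label in that language."""
      resolved = {key: labels[language][item] for key, item in content.items()}
      return renderers[language].format(**resolved)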

Challenges

  1. Ontologies suffer from the huge problem of putting together concepts that may have different representations across cultures, as also mentioned by Kaldari.[7] I believe we should try to avoid encoding abstract concepts as much as possible;
  2. Constructors are potentially language-agnostic, but one Renderer is needed for each language. No renderer means no article, which would lead to gaps again;
  3. it is known that CFGs don't scale: the approach is rule-based, so a "rule explosion" is unavoidable if we aim at expanding coverage. The proposed solution is to outsource the scalability issue to volunteer contributors. It's hard for me to think that the process can be made easy enough to expect contributions from a broad audience. The ideal target community might be composed of computational linguists with multilingual skills: how many people do you expect to attract?

Questions and comments over the technical paper

  • In Section 4 you say that we "could use a parser to make a suggestion in the form". "A simple classifier will probably be sufficient": I'd love to hear more details on this idea;
  • in Section 5.5 you say that "looking at the state of natural language generation research, there seems to be no consensus committing to any specific linguistic theory", but I think that you definitely propose a frame-based representation with CFGs;
  • in Section 8 I'm not sure I understand the fourth bullet point at all;
  • in Section 10.4 what do you mean by "In order to support contributions through natural language we need to have classifiers for these texts"?

Answer

@Hjfocs: thank you so much for reading the paper and the detailed comments! I am happy to answer them as far as I can.

Challenge #1: that is correct, and I discussed this one with Kaldari above, with the 'love' example. It won't be easy, but I think it is possible.

Challenge #2: that is correct. That will be a main challenge. The big advantage is that with comparably little effort we can create a lot of Wikipedia content with high coverage, correctness, and currency: a single small grant can probably go far in any of the languages to create hundreds of thousands of articles which will remain up-to-date. So yes, it is true, but it will improve the current situation considerably and enable the efficient creation of a lot of good content.

Challenge #3: the scaling issues of CFGs concern parsing. But here we only need generation, and that is far less computationally complex, particularly because we never have free options. Unless I am gravely mistaken - and please tell me if I am - given these constraints that is not a problem.

Question 1 (re Section 4): imagine a classifier that takes a sentence and tries to figure out which frames are probably being instantiated by it. For Information Extraction, where you have the same task, you would want a high-precision answer, because there you want the process to be fairly automated. For Abstract Wikipedia we don't need that kind of precision, because it is people typing in the sentence, seeing suggestions by the classifier, then choosing one of the suggestions if they like it and modifying it further. If the classifier makes mistakes, that doesn't matter so much, because we will have a human in the loop before the text is submitted. Also, the text is written by the contributor at that point. They are not tasked with confirming whether some other text supports an extraction; rather, they type some text, select the correct classification, fix it, and then see the result rendered again. This also means that the contributor is getting feedback on what the system probably understands and will adapt to that. I think that might be an interesting UX.

Also, this whole UX with the magic box allowing arbitrary free text entry is optional. So if it doesn't work, we still have the baseline form-based editor.

Question 2 (re Section 5.5): that appears only in the examples and as a possible suggestion. Since Wikilambda allows for arbitrary functional algorithms to be written, the system could be used for a different approach as well. We have about a year after the project starts to figure out whether the proposal of using frames and generation makes sense. If it doesn't and we're lucky, we can correct course; there's enough time for that. If it doesn't and we're not lucky, we can still restart later with a different solution within Wikilambda.

Question 3 (re Section 8): The bullet point says "in general, the exact surface text that is rendered is not so important, as long as it contains the necessary content and, even more importantly, does not introduce falsehoods". What I meant to say with that, taking the San Francisco mayor example, is that it doesn't matter whether the rendered text reads "In order to deny London Breed the advantage of incumbency, the Board of Supervisors elected Mark Farrell as Mayor of San Francisco." or "The Board of Supervisors chose Mark Farrell to be the Mayor of San Francisco because they wanted London Breed not to have the advantage of incumbency." It doesn't matter whether the result is one or the other, as long as the result is correct. Does this make more sense?

Question 4 (re Section 10.4): This ties in with Question 1. In case we want to have this optional magic box UX for a given language, we would need classifiers for that language. We can always fall back to the form-based solution, so we wouldn't block on it. One neat thing is that we can actually create training data with the Renderers that we will have by then, and then use some background text model like BERT in addition to that to create a classifier even for low-resource languages. I have no idea if that will work, but it would be a very neat result.
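
To illustrate, in a very hedged way, what "create training data with the Renderers" could mean in practice: the sketch below treats rendered sentences as labelled training examples and fits an off-the-shelf bag-of-words classifier. All sentences and constructor names are invented, and a real system might fine-tune BERT or a similar pretrained model instead.

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  # Hypothetical renderer output used as synthetic training data:
  # each rendered sentence is labelled with the Constructor that produced it.
  training_sentences = [
      ("Marie Curie was born in Warsaw in 1867.", "Birth"),
      ("The Board of Supervisors elected Mark Farrell as mayor.", "Election"),
      ("Warsaw is the capital of Poland.", "Capital"),
      ("Alan Turing was born in London in 1912.", "Birth"),
  ]
  texts, constructors = zip(*training_sentences)

  # A deliberately simple classifier: bag-of-words features + logistic regression.
  # The point is only that renderer output can bootstrap such a model.
  suggester = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
  suggester.fit(texts, constructors)

  # Suggest a candidate Constructor for a sentence typed by a contributor;
  # the contributor, not the classifier, makes the final decision.
  print(suggester.predict(["Ada Lovelace was born in London in 1815."]))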

Again, thank you for reading the proposal and watching the videos! I will also use your questions to improve the next version of the paper and make it clearer, thank you. If I answered the wrong question, or answered a question unsatisfactorily, please feel free to ask more! --denny (talk) 03:22, 18 May 2020 (UTC)

Additional brainstorming

Hey @Denny: thanks for your response, it gave me quite a lot of food for thought.

First, your answer to question 3 completely clears up my confusion.
Next, I'd like to shed more light on the scalability of CFGs / rules (challenge 3) and on top-down versus bottom-up approaches:[8] I think they are intertwined with your answer on the classifier (question 1) and with question 4, hence with the overall vision of the contribution workflow. My understanding of it is as follows (with a rough code sketch after the list):

  1. a contributor types a sentence;
  2. a Constructor / frame classifier parses it;
  3. the classifier suggests Constructor and Content candidates;
  4. the contributor curates and ultimately defines them.
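
Just to make my reading explicit, here is a rough, purely illustrative sketch of this suggest-then-curate loop in code; all function names are placeholders, and the classifier is the optional piece discussed above.

  def contribute(sentence, suggest, curate, render):
      """Hypothetical contribution loop; all callables are placeholders.

      `suggest` is the optional classifier, `curate` stands for the contributor's
      interactive editing in the form-based interface, and `render` turns the
      resulting Content into text so the contributor can check their intent.
      """
      # Steps 1-3: the contributor types a sentence; the optional classifier
      # proposes candidate Constructors / Content.
      candidates = suggest(sentence) if suggest else []

      # Step 4: the contributor picks, corrects, or builds the Content directly;
      # classifier mistakes are harmless because a human decides here.
      content = curate(sentence, candidates)

      # Feedback: the renderings show the contributor what the system understood.
      for language in ("en", "fr"):
          print(render(content, language))
      return content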

I totally agree that the classifier doesn't need to be performant, and that's indeed a major advantage. On the other hand, it looks like we still need to start from natural language evidence, whether or not we implement the optional steps. I believe that evidence would be essential for:

  • the very beginning, i.e., building the initial abstract content repository;
  • bootstrapping the optional classifier;
  • the contribution itself.

As a result, we would still have to parse text, and ultimately go for a bottom-up approach to create abstract content. That's where the scaling issue arises: the more we add to or refine our textual evidence, the more rules we need to cover it. Perhaps we can tackle this challenge by defining a reasonable trade-off between the simplicity of the generated text and the complexity of the rules: if we limit ourselves to very simple text, we minimize the number of rules needed to generate it.

I'm also thinking of tree-adjoining grammars (TAG),[9] which may compress lexical and syntactic rules together, since they are lexicalized grammars. However, I'm not sure how we can apply them to the semantic layer (Constructors). Do you have any pointers on the topic?

I'd really love to see this kind of discussion grow and the abstract Wikipedia happen!

Best,
Hjfocs (talk) 17:35, 29 May 2020 (UTC)

@Hjfocs: I think I now see where I need to be clearer: in the workflow you describe above I consider steps 1-3 entirely optional. The whole step from natural language to suggestion is optional. I expect the first edits to be created directly via a form-based interface in abstract notation, and this can start in any language immediately.
Adding the UI that allows for input in free natural text is then a convenience to speed up finding the right constructors for the form-based interface. If parsing fails, or if the classifier happens to suggest the wrong constructors, that really doesn't matter much, because the main decision maker is the contributor in between, who is actually building the content with constructors and, based on the renderings, making sure they capture their intent.
And for the classifiers, I am not even sure I would aim for going through a syntactic parse first. I mean, sure, we can try - that's the beauty of Wikilambda: we can, in fact, implement the classifier and parser in Wikilambda itself and then just use those per language or even per user - but we can also use a classifier that directly suggests constructors without ever doing syntactic parses. And then we can see what works better.
And the classifiers themselves can be trained on content created from Wikilambda. Does this make sense? --denny (talk) 04:42, 5 June 2020 (UTC)

@Denny: this is an interesting project! With respect to question 2 above, I had a similar confusion when reading section 5.5. I would definitely advocate for explaining somewhere what our constraints are even with this constructor/renderer model in general as opposed to some other boxology approach (if that makes sense). --Chris.Cooley (talk) 06:05, 4 July 2020 (UTC)

References

Revisiting and twisting a Kaldari concern

My question has already been touched on very briefly above, but I would like to return to it from a different angle. As my concern is a bit more complicated, it will unfortunately be a bit long, for which I apologize. My concern is the representation of knowledge.

In a nutshell:

  • Wikilambda will result in participants from technically and economically privileged cultures dominating the content of language versions of less privileged cultures.

Wikipedias as essentially text-based projects are embedded in cultural spaces. The underlying language of a WP essentially limits the group of participants to its speakers, who are usually also participants in the same cultural spaces. You have already responded to Kaldari's concern about cultural POVs above with the remark that the English, Spanish, French and Portuguese WP are already multicultural and that this would work well. Unfortunately, it doesn't: your answer ignores the question of how strong the participation of postcolonial cultures in these Wikipedias is. The en-WP is dominated by participants from North America and Great Britain (~60%), the much larger group of Indo-Pakistani English speakers hardly appears in it (~6%), Africa not at all. This is even more drastic in French (90% FR/CN/B, North Africa together 1.7%, the rest of Africa does not appear), it is better in Spanish (~32% ES, ~60% South America), in pt-WP Brazil is predominant, but African states are almost completely absent. ([1]) The problem is of course not only related to the inside of language versions, but also to the representation of the separate language versions; we all know that. Important factors here are internet access and education, which "we" can hardly change.

Nevertheless, up to now smaller language versions have been given the opportunity to grow successively, according to their own standards, through the inherent restriction to participants from their own culture, with all the problems and also deficits that Wikipedia language versions have in their beginnings. Wikilambda will make this border permeable. That sounds great at first, but it also means that in the future Wikipedia content will be distributed globally that originates mainly from the keyboards of white European and North American men. That is where the community is that will use these tools. You wrote that from the confrontation of different points of view a better knowledge is created, ideally without bias. But that only works if all perspectives are equally represented in a constructive debate. How little this is possible when the technical and social conditions are so different can be seen in the results of the Wikipedias I wrote of above. And this leads to your two sentences from the proposal, "This allows everyone to read the article in their language" and "This will get us closer to a world where everyone can share in the sum of all knowledge.", not being synonymous.

I know that's not your intention, quite the contrary. And I also know that at this stage Wikilambda will be unstoppable. But I would be happy if you would try to find a solution to this important problem. Denis Barthel (talk) 19:35, 10 June 2020 (UTC)

@Denis: Thank you for raising this important point.
You refer to the two sentences "This allows everyone to read the article in their language" and "This will get us closer to a world where everyone can share in the sum of all knowledge.", and yes, I very much agree that they are not synonymous, and I chose both of them consciously. The first one is about consumption; in the second one we have the (in English quite awkward) construction "share in", which is meant to convey not only access to the knowledge, but also the ability to participate in it, to contribute to it.
Right now I have no ability to contribute to the Japanese Wikipedia. Right now no Japanese monolingual speaker has the ability to contribute to the Croatian Wikipedia. Right now no one from the Global South who does not speak English can contribute to the English Wikipedia. All of these paths are currently unavailable. I am suggesting to make these barriers more permeable. Larger communities tend to lead to better outcomes - more eyeballs, better results.
That does not mean that any Wikipedia will be forced to take this content from Abstract Wikipedia. No. Each community will still be able to choose how much of this content they want to use, if any, and how much they want to write themselves. It is an offer. And my expectation is that for the large majority of the content, some Wikipedias will be happy to rely on the content offered by Abstract Wikipedia, which will allow them to focus on the content they care about most. If they want to pull in content from Abstract Wikipedia on medical topics, on geography of far-away places, and on French and Indian movies and Russian and Argentinian authors, they can do so. If they want to write the articles about their local histories and locations and cultures in their own Wikipedia in their own voice, they can do so. If they want to contribute that knowledge upstream to Abstract Wikipedia, they can do so.
I understand your worry, and I take it seriously. But would you want to restrict certain communities from access to Linux and rather tell them to create their own operating system, because it might be a vector for a Western value system?
Here are the two questions I would like to ask you, since you expressed your opposition to the proposal:
  1. Given that you describe Wikilambda as unstoppable, which I am interpreting as you thinking that the time for an idea like this one has come, would it be better for such a project to be stewarded by the Wikimedia movement, or by another entity, potentially with commercial interests?
  2. Do you think that people who only speak languages where we don't have much Wikipedia content will, when having an information need, benefit from Wikilambda more than not?
Here's the scenario I am thinking of: a patient is told by their doctor that they have a specific disease, or that they need to take a certain medicine - is it better for them to have to find content in a different language and then use a machine translation tool? To make it concrete, is it better for someone from the Ivory Coast who speaks French to have an article on adenoidectomy in French written by a person from France, than to not have an article on it at all, when they need it?
So I strongly believe that this proposal will improve the situation overall. I do not deny the potential risks you are outlining, but I think the potential benefit outweighs them. I believe that this proposal is strongly aligned with the Wikimedia vision and the 2030 strategy.
You are asking me to try to find a solution for this important problem. Bear with me, but I would like to ask you back: do you already have an idea of what a solution to this problem would look like? Is it technical, is it processes, is it policy, or a mix? If not, can you outline the conditions a solution has to fulfil? --denny (talk) 18:49, 13 June 2020 (UTC)
Hello Denny,
Thank you very much for your detailed answer and your efforts to clarify the matter.
In fact, you are of course right that the availability of knowledge is a high value, and yes, WL could be a way to bring such knowledge to places where it is currently not widely available through Wikimedia platforms. The examples you have given emphasize content on the natural sciences as especially valuable in this respect; your point is valid, and one has to be careful not to overshoot the mark in criticism.
Passive participation (vulgo: reading :) ) in natural-science or purely factual content can thus clearly be considered a plus of WL. However, this quickly becomes much more difficult in the case of lemmas that convey identity-political content. Whether politics, history, art, music, literature, or emancipation movements, the list could be extended at will. I see that it is good and unproblematic if a good text on adenotomy is available in all languages in almost identical form. But if the lemma is called, e.g., Winston Churchill, Vietnam War, Algerian War or the genocide of the Herero and Nama, then the origin of the texts is much more relevant. For in view of the fact that the community is dominated by mostly white, European/American male participants, these texts are determined by their choice of sources and narratives, and possibly even by their mission. This is, so to speak, a reversal of your example of the Croatian Wikipedia, for which you hope that the majority community will defuse the local bias. But what if it is the majority community that carries a bias into smaller language versions? Given the demographic situation, that is quite likely. This can certainly be exacerbated if it is precisely this community that also dominates the internal atmosphere of WL; I don't have to tell you how stubborn and unruly the existing communities are when it comes to structural problems of representation.
Your scenarios and mine are not contradictory here. All of them are possible and will happen, and probably more, even beyond the specific problem discussed here. You have justifiably asked for solutions, or rather ways to approach a solution. Individual measures may help in some cases (à la "Only accounts from home wikis can import articles"), but it would be much more efficient to build a kind of "advocatus diaboli" into the planning process from the very beginning: an instance that permanently tries to anticipate where undesirable social developments could occur and works out solutions together with all parties involved. Possible problems of appropriate representation and optimal inclusion can best be countered if they are considered from the outset. Naturally, little or no such work was done during the development of Wikimedia's current platforms, so today we have to deal with the well-known problems that are difficult to solve (keywords: gender gap, Global South). WL can perpetuate them, but starting from scratch we can also do something about them. So that both sentences come true: "everyone to read" and "everyone can share in the sum of all knowledge."
I hope this helps and thank you for your constructive questions. Best regards, Denis Barthel (talk) 09:40, 15 June 2020 (UTC)
@Denis Barthel: Thanks for the further clarification and concrete suggestion, I really appreciate that. And I agree, both our scenarios are not contradictory, the scene is indeed large enough for both of them to play out.
Here's one point that I am struggling with: my assumption is that when we develop the project, we will have a separation between the community, responsible for the content, and the development team, responsible for the software and running it. As with all other projects, the influence of the development team on the content of the project should be very limited. And whereas I would appreciate if the community was thoughtful and considerate regarding the points you raise, I think that it should not be the task of the development team to play an outsized role in the creation of content and guidelines of the project.
Now, sure, the technical implementation has a certain influence on the social aspects of the project, but I doubt that this will be the major driver regarding your concerns. Compared to the social dynamics of the project community itself, it will be a minor factor, I think.
So yes, we could write the role of such an Advocatus into the project plan, but I wonder whether that would be actually effective or merely a feel-good action.
What are your thoughts? --denny (talk) 22:12, 19 June 2020 (UTC)
@Denny:
You are right, of course, there is a need to separate development and community-based content creation. But the developers' withdrawal from this task does not need to begin until the first person creates an account and begins editing. Before that, isn't it the developers' responsibility to already imagine the future community and to design the software accordingly? The community is already included in your proposal, as a cryptic assumption about the way vision and software are connected. You proposed a vision and a software tool, both in a precise and clear manner. But there is no direct, linear relationship between the two; instead there is a magical triangle of vision, tool, and future community.
I plead for formulating this third factor as precisely and clearly as you have already done with the other two. And in my opinion it is essential to do so. There is a lot we know about wikis and communities, and conditions for a more inclusive community can be determined, and at least promoted, based on this.
It is possible to forgo this. A community will also simply emerge and grow, like on a wasteland, in a wild way. But weeding afterwards is destructive; gardening at the beginning is not. And you are right again - we don't know if it will work. So should you try it? In the worst case, there will have been a meaningless instance on which you have spent time and money; in the best case, there will be a wiki that has solved (some) problems of representation in advance.
I have to admit: that's pretty rough thinking; a real process needs to be more focused and straightforward and requires much more research, but I hope that it can at least serve as a sketch.
Thank you very much for your persistent patience and your open-mindedness. I would really like to hear what you think about it. Denis Barthel (talk) 23:04, 20 June 2020 (UTC)
@Denis Barthel: Please forgive me for taking so long to answer, but the last two or three weeks I have been busy with transitioning to a full-time role to work on this project. This also means, future answers should come faster!
I am currently preparing a list of early topics for discussion, and I will explicitly add a discussion regarding diversity and sustainability, where the topics you have raised here should be covered. The goal of this discussion is to inform the further development of the project, and to figure out how to properly take into account the concerns you and others have raised regarding diversity, representation, and community growth and sustainability.
If you are willing to do so, I would like to ping you when that discussion starts. This way you don't have to follow all of the discussions regarding Abstract Wikipedia, and I hope that we will be able to come to a modus operandi that looks good to all of us. How does this sound? --DVrandecic (WMF) (talk) 22:22, 7 July 2020 (UTC)

@DVrandecic (WMF): First of all, congratulations; I am glad that after all the long preparations you can finally get started. And personally, thank you again for your willingness to let the topics raised here flow into the further development. I am sure it will serve the project well, and I look forward to your ping; I am happy to keep participating. Best regards, Denis Barthel (talk) 22:44, 7 July 2020 (UTC)

Sub-pages

Articles

Talk pages

--GrounderUK (talk) 13:41, 7 July 2020 (UTC)

Carn's comment

"Ваш проект очень хороший, но..." Обычно после "но" стоит сразу остановиться. Я сам писал о том, что вы говорите - en:WP:MONDIAL, Systemic bias.
Вы говорите исходя из позиций монолинвистического проекта. Одного - английского, другого - французского, третьего-испанского. Вы могли бы проанализировать такой мультилингвистический проект, как Викиданные. Это было бы ближе к реальной картине культурного состава будущего википроекта.
В русскоязычном (и, уверен, иных) разделе существует посредничество ru:ВП:АА, которое связано с конфликтом между Арменией и Азербайджаном, и столкновение культур и взглядов на исторические события порождает зачастую совершенно неконструктивные дискуссии. И чего не хватает в подобном случае - это понятных и воспроизводимых механизмов принятия решений.
В России есть много народов, которые вымирают. Их языки сейчас исчезают, и вот создавать проект Википедии на таком языке совершенно невозможно. Даже немаленьким языкам и культурам трудно преодолеть барьер и стать по-настоящему живым местом. У нас нету ресурсов, чтобы привлечь все их них, некоторые не спасти. Должны ли мы тратить свои ресурсы на исправление этого положения? Да, но только самым эффективным путём. Возможно нужно привлекать больше таких людей из разных культур. Может ли Фонд Викимедиа заняться вопросом предоставления техники для участия в Википедии тем, кого вы назвали и кто является недопредставленным в Википедии? Возможно да, возможно нет, но это точно не предмет обсуждения на этой странице. Попытки привлекать школьников часто проваливались, так как качество создаваемого контента было низким. На удивление хорошо пошёл проект вики-бабушек.
Есть искуственные языки, тот же эсперанто. В качестве витрины можно выбрать его. Или есть toki pona к примеру - язык, который занимается разложением смыслов - что это, как не читаемый код семантической кодировки понятий. Кто пытался записать что-то простое в Викидату, подобрать свойство, значение и квалификатор так, чтобы получилось похоже на реальность - тот знает как далека Викидата от генерации текстов статей.
Нам нужно верить в идею Википедии и улучшать мир!
"Your project is very good, but ..." Usually after a "but" it is worth stopping right away. I myself wrote about what you say - en:WP:MONDIAL, Systemic bias

You speak from the standpoint of a monolingual project - for one it is English, for another French, for a third Spanish. You could instead analyze a multilingual project like Wikidata. That would be closer to the real picture of the cultural composition of the future Wikilambda.

In the Russian-language section of Wikipedia (and, I am sure, in others), there is a mediation process, ru:WP:АА, connected with the conflict between Armenia and Azerbaijan, and the clash of cultures and views on historical events often gives rise to completely unconstructive discussions. What is missing in such cases is clear and reproducible decision-making mechanisms.

In Russia there are many indigenous peoples that are dying out. Their languages are disappearing, and creating a Wikipedia project in such a language is completely impossible. Even wikis of comparatively large languages and cultures find it difficult to overcome the barrier and become a truly living place. We do not have the resources to attract all of them, and some cannot be saved. Should we spend our resources on rectifying this situation? Yes, but only in the most efficient way. Perhaps we need to attract more such people from different cultures. Can the Wikimedia Foundation deal with the issue of providing equipment for participation in Wikipedia to those whom you named and who are underrepresented on Wikipedia? Maybe yes, maybe no, but it is definitely not the subject of discussion on this page. Attempts to attract schoolchildren often failed because the quality of the content they created was low. Surprisingly, the wiki-grandmothers project went well.

There are artificial languages, such as Esperanto; it could be chosen as a showcase language. Or there is toki pona, for example - a language built around decomposing meanings - what is that if not readable code for the semantic encoding of concepts? Whoever has tried to write something simple into Wikidata - to choose a property, value and qualifier so that the result resembles reality - knows how far Wikidata is from generating article texts.

We need to believe in the Wikipedia idea and improve the world! Carn (talk) 22:23, 2 July 2020 (UTC)


@Carn: Yes, I agree with you. But, as you point out, the AA conflict on the Russian Wikipedia is being mediated, and there are articles on the topic in the Russian Wikipedia. I am sure that the articles on these topics on the Azerbaijani Wikipedia and the Armenian Wikipedia don't reflect the Russian consensus, but have a more local color.

It is planned that the individual Wikipedias decide for themselves whether to use the content from Abstract Wikipedia, and I am pretty sure that the article on Nagorno-Karabakh in the Armenian and in the Azerbaijani Wikipedias will not be coming from Abstract Wikipedia, but will be written locally. But there are also a lot of topics that all these Wikipedias agree on and where they may not have too much content - my mother's village has no articles in any of those. So they can still benefit from that, and they can make this decision case by case if they want to. I hope that this is sufficient to handle such locally sensitive topics with the necessary care. --DVrandecic (WMF) (talk) 02:27, 14 July 2020 (UTC)

How was it approved

I am happy that this project was approved. But it is unclear and not transparent to me how that happened. There are currently at least 45 new projects proposed: Proposals for new projects. This one is one of them. There are also many proposals marked as stale. The support this project gained does not seem to be significantly bigger than what Wikilang had, for example. It is of the same order of magnitude as Structured Wikiquote has. The Sister Projects Committee is not formally approved (though I guess it was in the capacity of SPC member that Amqui closed the Wikilang proposal). The New project process, which I know the BoT is aware of, since it was edited by then-trustee Sj, is not yet approved. Nor is an alternative process. I acknowledge that there is no principal divergence from what Proposals for new projects describes as the de facto process, but still I have a certain degree of frustration. Rather than relying on a special committee that proactively considers submitted proposals, just as langcom does for languages, whether a new project gets created depends on how hard the proposal author pushes for it and probably how good their connections with the BoT are. This feels flawed and unfair to some extent. Again, I am personally happy that this was approved, I am happy that Denny got this approved, but I would also be happy to see some of the other projects approved or clearly rejected by the BoT. --Base (talk) 19:30, 2 July 2020 (UTC)

@Base: the Sister Projects Committee does not seem to be actively running. Anyway, the right of final decision is held by the Board.--GZWDer (talk) 22:49, 2 July 2020 (UTC)
Thanks for raising awareness for it, Base. Let me do some homework and get back to you on that one. Best, Shani (WMF) (talk) 00:07, 3 July 2020 (UTC)

@Base and Shani (WMF): Thank you for being happy about the approval! I can answer this only as someone who went through this process and cannot speak for the process itself or for the other parties involved in the process. But in order for the Board to approve a project, my understanding, based on the proposal template, is that it also needs to be submitted to the Board. Wikilang was closed by Koavf with the argument that there was no significant activity around the proposal nor on the demo. Structured Wikiquote was never submitted. The process requires making a submission to the Board, and if the submission doesn't happen I don't see how the Board can be expected to make a decision. Or do I misunderstand your point? --DVrandecic (WMF) (talk) 03:09, 14 July 2020 (UTC)

See also Talk:Proposals_for_new_projects#Marking_the_page_historical? for a related complaint. @Koavf: When closing a request, please use the "comments" field in the template and not the edit summary, so that remarks will be more visible.--GZWDer (talk) 03:16, 14 July 2020 (UTC)
@Base: Wikilang is a proposal that you may reopen or repropose at any time, but given the activity it is unlikely to be approved in the near future.--GZWDer (talk) 03:26, 14 July 2020 (UTC)

Which website or domain?

On which website or domain will the new project be hosted? Will it be part of Wikidata, or will it be a separate website? And Denny, as the Godfather of the project, will you impose the Contributor Covenant right from the start for your project? Ad Huikeshoven (talk) 19:40, 2 July 2020 (UTC)

See Abstract_Wikipedia/Plan#Task_P1.1:_Project_initialization. This has not been started yet.--GZWDer (talk) 22:45, 2 July 2020 (UTC)

@Ad Huikeshoven: As GZWDer says, the domain will depend on the name as selected by the community. Personally, I would very much like a Code of Conduct to be in place from the very beginning. --DVrandecic (WMF) (talk) 03:12, 14 July 2020 (UTC)

git

Please let there be an opportunity to participate through GitHub, for example.

It is not necessary to make everyone get used to Gerrit. Carn (talk) 21:37, 2 July 2020 (UTC)

Most Wikimedia software uses Gerrit, other than some standalone packages and libraries (I know Wikibase relies on many of them).--GZWDer (talk) 22:44, 2 July 2020 (UTC)
This is bad and drives away some participants. Git is more popular than Gerrit.
It would be nice if there were both ways though, and a wiki code editor too. Carn (talk) 06:37, 3 July 2020 (UTC)
Carn, Gerrit is a code review system built upon Git, so we have already been using Git since 2012.
But I guess that you mean to say that we should allow using another code review system in addition or instead of Gerrit. There has been some talk recently about introducing GitLab, which is not the same as GitHub, but quite similar, and much more popular than Gerrit. See mw:Wikimedia Release Engineering Team/GitLab.
But going even further, I'm not entirely sure why is this discussed here, because it sounds like code on Wikilambda will be stored on wiki pages, similarly to templates and modules, and not in a Git-like system. It's quite possible that I am missing something, however. Perhaps Denny can clarify.
A Git-like system would probably be better, at least for some kinds of functions and modules, although it may also make it even harder for new editors to join. The easy editing of a wiki page, without going through the complexities of branches and code review, makes it easy to break stuff, but also easy to fix stuff, which is very much the core of the wiki spirit that I'd really hate to lose. If the functions can have a version control system that is Git-based, but allows easy editing through a web interface without having to run commands and understand branches, it will be a good compromise. --Amir E. Aharoni (talk) 14:26, 3 July 2020 (UTC)
As I see it, in an ideal world there is a wiki page with a history of changes that can be viewed both as a wiki page and as its history (like the usual Module), and some parallel git@λ.wikipedia.org:Module:Something.git - and then you push commits - each commit can be viewed as a page diff, and vice versa. If it is hard to implement as one system, there can be a bot that synchronizes files in repositories with wiki pages.

I even started making a system for transferring, testing and debugging modules with normal development tools (Visual Studio Code, for example) - but in order to fully use all mw. functions, you will need to somehow route the requests through the API; if you know of some bot on Wikipedia written in Lua, it could help. Carn (talk) 16:42, 3 July 2020 (UTC)
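
For what it's worth, a bot along the lines described above (mirroring a module's wiki revision history into a local Git repository, one commit per edit) could be prototyped against the standard MediaWiki API. This is a rough, untested sketch; the wiki URL and page title are placeholders.

  import subprocess
  from pathlib import Path

  import requests

  API = "https://www.mediawiki.org/w/api.php"   # placeholder wiki
  PAGE = "Module:Something"                     # placeholder page title
  REPO = Path("module-mirror")

  def fetch_revisions(limit=50):
      """Fetch a page's revision history (oldest first) via the MediaWiki API."""
      params = {
          "action": "query",
          "titles": PAGE,
          "prop": "revisions",
          "rvprop": "ids|timestamp|user|comment|content",
          "rvslots": "main",
          "rvlimit": limit,
          "rvdir": "newer",
          "format": "json",
          "formatversion": 2,
      }
      data = requests.get(API, params=params).json()
      return data["query"]["pages"][0].get("revisions", [])

  def mirror_to_git():
      """Write each revision to a file and commit it, one commit per wiki edit."""
      REPO.mkdir(exist_ok=True)
      subprocess.run(["git", "init"], cwd=REPO, check=True)
      for rev in fetch_revisions():
          (REPO / "module.lua").write_text(rev["slots"]["main"]["content"])
          subprocess.run(["git", "add", "module.lua"], cwd=REPO, check=True)
          message = f'{rev["timestamp"]} {rev["user"]}: {rev.get("comment", "")}'
          subprocess.run(["git", "commit", "-m", message, "--allow-empty"],
                         cwd=REPO, check=True)

  if __name__ == "__main__":
      mirror_to_git()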

@Carn: The code base for Wikilambda will be developed using pretty much the usual process for extension development by the Foundation. But as Amir points out, the more interesting question is about the code inside Wikilambda - and there I currently have no plans of using Git or Gerrit, but indeed to start with MediaWiki. We will see how far we can get with that, before we need to think about more complex workflows like branching, merges, versioning, etc. Let's start it simple, and add complexity when needed. --DVrandecic (WMF) (talk) 03:16, 14 July 2020 (UTC)

No

Wikipedia was developed mostly on its initial impulse during the dotcom bubble. As time went on, Wikipedia calcified; editors developed a more conservative attitude towards changes, often diverting attention from big overhauls to petty grammatical or style fights. This was a natural progression, since the negative aspect of any change scales with the size of a project, while the positive aspects stay constant.

As progress died down, the Wikipedia community changed from focusing on adding content to Wikipedia to removing content and biases from it; the strong focus on sources and the prohibition of original research are the ultimate demonstration of this. Wikipedia became obsessed with the filters on what isn't allowed, so that when something passes those filters, it has value either by virtue of its contents, or by virtue of passing all of its filters.

In this stage, the editors and developers who had originally contributed arduously to Wikipedia simply became bored, and they naturally use their newfound time to devote themselves to new exciting projects, deluded into believing that lightning strikes twice in the same place. Wikipedia has a huge amount of systemic problems; now that the dust is settling, it's time to focus on them instead of creating a new sandstorm.

Regarding the proposal itself, it sounds like the kind of optimistic futurism that you would read at the beginning of a dystopian sci-fi novel. A collection of facts not tied to any language? Come on, the only thing you are going to accomplish is to embed the biases of a technocratic community of a single language (the one we are writing in) under the illusion that it is objective. Why not just call it the Ministry of Truth of Wikipedia?

The collaboration of knowledge between multiple languages is a great ideal, and I hope we see more exchanges between communities of different languages, but it's not going to happen in Wikipedia.--TZubiri (talk) 00:05, 3 July 2020 (UTC)

If you stand for original research, you should go to Wikiversity, not Wikipedia. A technocratic community is better than a traditional one. A technocratic community will allow the traditional one to reflect their views; the traditional community would not allow the technocratic one to do so. Carn (talk) 06:49, 3 July 2020 (UTC)
Yes, it is a bit of futurism, but Abstract Wikipedia is optional for all Wikipedias. An individual Wikipedia may choose to show no rendered articles at all.--GZWDer (talk) 01:13, 3 July 2020 (UTC)
The other futuristic wiki, Wikidata, has the potential to damage Wikipedia. Recently Wikipedia was in the news in my country because searching for the president of my country on Google returned "thief" [1]. This was due to a Wikidata-Google integration. Changing Wikidata information cannot even be done from Wikipedia, yet this damaged the reputation of Wikipedia.
Please fix Wikidata before moving on to the next next big thing.--TZubiri (talk) 09:48, 3 July 2020 (UTC)
@TZubiri: Wikidata will be fixed by its usage. Just as Wikipedia is fixed because it has readers, users and contributors who constantly watch it, Wikidata will be OK if it’s watched by the same set of users. And one step for this is by integration into Wikipedias, not the other way around. When this is done, Wikidata will have the sum of all Wikipedians in all languages to « fix » it. And all Wikipedias will be better. TomT0m (talk) 10:55, 3 July 2020 (UTC)

There are a lot of things to fix in Wikipedia before we introduce new features, but if you want to introduce the next big thing, at least fix the mess left by your last next big thing.

https://www.clarin.com/politica/insolito-error-buscar-cristina-kirchner-google_0_Lw46ePc8T.html— The preceding unsigned comment was added by TZubiri (talk) 09:48, 3 July 2020 (UTC)

In my experience, unprogressive users who cannot change their routines in a more advanced way are usually the ones opposed to using Wikidata. Inaccuracies and vandalism can occur in any wiki project. This cannot be an argument for pulling us back into a stagnating past. Carn (talk) 13:15, 3 July 2020 (UTC)
It's not that we don't want to take the effort to change; it's that we believe such a change isn't positive. Wikipedia is successful because of its open nature: users can typically understand the website, its history and its discussion pages; it's all transparent. But not everyone wants to learn how to use Wikipedia, or what the tags in the code are. Things like "Edit" and "Edit Code" were great subtle advancements that put technical users and non-technical users on the same playing field. Wikidata, on the other hand, is opaque; its wiki component, the ability to be edited by everyone, is diminished because of its high cost of entry. If you make a system more complex, fewer people will use it, and the more bias you introduce. Initially Wikipedia had a very technical bias; for example, something like BGP was a page about a routing protocol instead of a disambiguation between other terms. As time went by, less technically inclined people learned to use the system precisely because it didn't change, and because of changes that aided simplicity.--TZubiri (talk) 23:15, 3 July 2020 (UTC)

@TZubiri: Thank you for raising these concerns. I am convinced that Wikidata has considerably helped the Wikipedias. It has greatly simplified the management of interwiki links, which led to the removal of hundreds of millions of lines of wikitext. It has made the data on several Wikipedias more up-to-date and correct, as I discuss in the first half of this essay. Sure, it might sometimes lead to errors, but so do the Wikipedias. The claim that Wikidata is harder to edit than the Wikipedias doesn't sit well with the fact that Wikidata has more active contributors than any Wikipedia besides English. I very much hope that the system will be sufficiently welcoming and accessible to allow for a large contributor community, because, I agree with you - if the system cannot reach an inclusive community, the consequences might be problematic. I hope you will follow the project development and will offer critical observations on our designs and decisions that will help us to improve the project. Thanks! --DVrandecic (WMF) (talk) 03:52, 14 July 2020 (UTC)

Construction Grammar

It all seems very construction grammar of course, particularly that of Croft, who had a similar focus on a universal conceptual space and syntax as language-particular.

On another note, I have seen some comparisons to efforts towards a "perfect language." My personal inclination would be to stay away from this when describing the project, as I know many who associate the concept with crankishness. --Chris.Cooley (talk) 07:30, 3 July 2020 (UTC)

@Chris.Cooley: I agree that we are definitely *not* building a perfect language. But one issue is that this project will be compared with these efforts, and there is a lot one can learn from the previous projects in that direction.
Regarding Construction Grammar, in the first 12-18 months of the project we will have time to create a survey of existing approaches, and I would really hope we manage to create a good overview that will help guide us in the second part of the project. --DVrandecic (WMF) (talk) 03:56, 14 July 2020 (UTC)

Building on the reference to Croft above, it would be interesting to look into his work on Comparative Concepts. He builds on Haspelmath's ideas and systematizes them in a forthcoming book "Morphosyntax: constructions of the world's languages". --TiagoTorrent (talk) 19:25, 13 July 2020 (UTC)

@TiagoTorrent: That sounds like an interesting book! --DVrandecic (WMF) (talk) 03:56, 14 July 2020 (UTC)

Comparison

When we discussed this project on de:wikipedia:Kurier, I compared this idea with the project to establish prices in relation to values, drawing on Marx's propositions, as the Soviet Union and other societies tried to do. Denny, you told me then that you don't see the point of this comparison. I still think that it applies very well. Marx's "value theory" was essentially, like Chomsky's deep structures, an effort of analysis, not implementation. They tried to investigate a real and living system of permanent decisions by very different actors in order to find the underlying rules, laws, and principles of this system. Both the value system and deep structures are purely theoretical ideas; they have no existence of their own, they are just analytical endeavours to find out the essentials of a living system. To be sure, both Marx and Chomsky were somewhat ambiguous about this status of their theoretical concepts and sometimes tried to derive actual measures from them, a project that was taken up by some of their followers. But in both cases, this did not prove successful. The idea of a socialist market economy resulted in a kind of en:Ersatz that could only be implemented by force in the form of a standardized and homogenized project. The targets were initially humane and good, but the results were the opposite of emancipation and autonomy. It is similar here, to my mind: language gaps (like "market gaps") can definitely be "power gaps" and can impede an autonomous development, and it can be useful to look for ideas on how to bridge them. But nothing good can come from an idea to simply override them. This is a "dream of reason that might produce monsters" (to adapt a phrase by Goya). It will result in less autonomy rather than more autonomy because it can only become reality "top-down". Mautpreller (talk) 16:32, 3 July 2020 (UTC)

Only to avoid misunderstandings: Marx and Chomsky are very important for me as analysts and "fathers" of ideas. My intention is not to deride or denigrate the idea but to point to a problem of implementing theoretically thought-out ideas in living reality. Mautpreller (talk) 16:57, 3 July 2020 (UTC)
I am very skeptical about the second part ("Abstract Wikipedia"); one reason is the technical barrier, and the other is performance. However, I will not oppose.--GZWDer (talk) 18:31, 3 July 2020 (UTC)

@Mautpreller: In many ways, the goal is not really following Chomskyan ideas - I would not be surprised if Chomsky dismissed the whole project as uninteresting from the point of view of linguistics, because, well, in many ways it is. The goal is not to capture the whole of human language, but merely a very small sliver of it, and then again, the goal is not to be able to understand text, but merely to generate it. In the end, all we claim is that we can have abstract content such as clause(house, big) and that this can be used to generate natural language sentences such as The house is big. and Das Haus ist groß. and Kuća je velika. etc. Sure, yes, Chomskyan ideas of a universal grammar inspired that, but we are far away from really exploring that properly.

I am not sure what you mean by your last sentence, that it would result in less autonomy and that it can only become reality "top-down". All autonomy will remain with the local communities. --DVrandecic (WMF) (talk) 04:51, 14 July 2020 (UTC)
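To illustrate the clause(house, big) example above, here is a minimal sketch, in Python, of what language-specific renderers for one abstract constructor could look like; the constructor shape, the tiny lexicon and the function signatures are invented for illustration and are not actual Wikilambda code:

  # Toy example: one abstract constructor, clause(subject, adjective),
  # rendered into three languages by language-specific renderer functions.
  LEXICON = {
      "house": {"en": "house", "de": "Haus", "hr": "kuća"},
      "big":   {"en": "big",   "de": "groß", "hr": "velika"},
  }

  def render_en(subject, adjective):
      return "The {} is {}.".format(LEXICON[subject]["en"], LEXICON[adjective]["en"])

  def render_de(subject, adjective):
      # German needs the article and a capitalized noun in this toy case.
      return "Das {} ist {}.".format(LEXICON[subject]["de"], LEXICON[adjective]["de"])

  def render_hr(subject, adjective):
      # Croatian drops the article; agreement is hard-coded here, which is
      # exactly the kind of thing real renderers would have to generalize.
      return "{} je {}.".format(LEXICON[subject]["hr"].capitalize(), LEXICON[adjective]["hr"])

  RENDERERS = {"en": render_en, "de": render_de, "hr": render_hr}

  def render(language, subject, adjective):
      return RENDERERS[language](subject, adjective)

  # render("de", "house", "big") -> "Das Haus ist groß."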

WikiLang

I'd have preferred WikiLang. Not so ambitious, less hubris, but certainly useful. Whereas Abstract Wikipedia is highly ambitious, interesting and engaging, but hardly useful. But I know that hubris is the second name of the WMF. Mautpreller (talk) 19:31, 3 July 2020 (UTC)

@Mautpreller: It is great that you prefer Wikilang. You are free to contribute to Wikilang here. I am sure that the project will benefit from your contributions. --DVrandecic (WMF) (talk) 04:56, 14 July 2020 (UTC)
Some thoughts: tldr - there are two barriers for new WMF projects, scalability and activity.
  1. Before Abstract Wikipedia there was a large number of proposals (WikiCode was 14 years ago). Wikilambda is really not a fresh new idea. It was already thoroughly discussed.
  2. Abstract Wikipedia (in the narrow sense) is mostly a technical project (its aim is not to create new genres of free knowledge). WikiLang is a new project intended to create a new genre of knowledge, so it should be considered by the Board (if ever) cautiously - scalability is a concern. The WMF is not ready to host hundreds of different sister projects.
  3. For content projects, a more realistic idea is to try to incubate them in an existing Wikimedia project, by widening the scope of one (e.g. WikiLang and WikiJournal could be part of Wikiversity). Only once such a project reaches significant activity (there is zero recent activity on Wikilang) should a discussion about a dedicated project be opened (and we need a consensus for it - Wikicookbook was rejected for lack of consensus). However, there may still be a concern about the activity of a dedicated project: in 2013, there was a proposal to close all Wikinews. (This point is irrelevant to Wikilambda, as the function of Wikilambda is not replaceable with other projects.)
  4. Therefore:
    1. I propose that all newly proposed "mainly content projects" should be incubated in Wikispore (note that Abstract Wikipedia is not a "mainly content project").
    2. And I also think we need to revise the new project process - in particular, we may work on a temporary version that does not involve an SPCom.

--GZWDer (talk) 20:48, 3 July 2020 (UTC)

@GZWDer: yes, I agree. It would be good to rework the new project process. I am very much in favour of using Wikispore for incubation. --DVrandecic (WMF) (talk) 04:56, 14 July 2020 (UTC)

Reducing downstream complexity and sharpening project goals

This is an exciting proposal. I love the idea of collaborating on functions. I would suggest framing the project as an additional layer for knowledge collaboration projects like Wikipedia. It does not replace Wikipedia as we know it, nor text-based collaboration (it never can!), but it gives its authors new ways to manage certain types of information more effectively at scale. This does not diminish its value -- this is its value.

One key concern I have with the proposal details is what I would summarize as downstream complexity impact. It is one thing to have a cool new project to collaborate on how to generate articles where this approach is feasible; it's another for the additional complexity to make its way into many Wikipedias (Wikilambda/Plan#Extensions to local Wikipedias), including small ones that are trying to build community. Imagine contributing to a growing Wikipedia and trying to make sense of additional "magic words", new link syntax, and so on.

Yes, there are ways to hide such complexity in tools like Visual Editor. However, the cost of adding complexity to wiki markup is orders of magnitude smaller than the cost of making it easy to use. So, when you use a phrase like "This can be integrated invisibly into the visual editor", I can't help but feel that this is a bit hand-wavy. And it's not just about markup -- it's about whether a user can develop a mental model of how the encyclopedia is constructed through simple exploration.

I think one way out of the dilemma may be to take a design first approach to thinking about Wikilambda, instead of taking what I would describe as a technical architecture first approach, and try to find a way for the two to meet in the middle. For example, I think it would be extremely valuable for a team of talented designers to take a stab at thinking about exactly how to present articles that are derived from functions.

I recognize that you mention design research at critical stages of the project -- in my view, this is a critical stage of the project. The architecture and goals of the project must be influenced from the very beginning by design explorations, when it is still flexible enough to be fundamentally reshaped and reconsidered.

Some ideas:

  • Think of "Abstract Wikipedia" articles displayed in a local Wikipedia context as placeholders. This is not a new idea, of course; the ArticlePlaceholder extension introduced it, even if its actual functionality is very basic. You may disagree with this, but philosophically, my view is that anything displayed in the context of a traditional Wikipedia must be considered a placeholder until such a point that the community chooses to replace it with text that is open to collaboration in the traditional wiki way.
  • Translate the concept of placeholders into consistent UI design patterns. If a community opts into this functionality, there is no reason why search results (including autocompletion), links, hovercards, and other core MediaWiki UI elements can't make it clear when placeholder content is available for a given subject: through icons, subtle hints, perhaps (gasp!) even a new link type.
  • Consider whether placeholders should be a different page type altogether. A key concern with the bulk creation of content in many communities is that it creates maintenance burden and inflates statistics. That concern may be lessened if community members are empowered to create and review pages of an atomic placeholder type, which are counted separately in statistics and can only be published, transformed into an article, or unpublished. This would potentially take away some features Wikipedians love such as categories, but it may be a tradeoff worth making to keep this feature simple and easy for communities to gradually adopt.
  • Impose a "no new markup" constraint on the first useful iteration of the project. A constraint like this can inspire creative solutions that would not otherwise have been considered. Complexity debt is rarely recognized in the same way that other categories of technical debt are.

I'm not saying these are the right ideas -- but I think these are the kinds of directions you'll explore more if you bring design and user experience thinking into this project at this stage. Moreover, sharing design ideas will provoke strong (and useful!) reactions because it moves the project out of the realm of the completely abstract (no pun intended).

This might also help to sharpen the near-term project pitch, such as "Article Placeholders on steroids", to make it clearer to communities how they would benefit from the existence of Wikilambda. I recognize that the vision of the project is greater than just this use case -- but just as Wikidata benefited from having an immediate use case that was clearly understandable (interlanguage links), having a similar immediate use case could help Wikilambda, as well.--Eloquence (talk)

@Eloquence: Thank you for your great suggestions. I pointed to your comment on the task about the design study, to make sure that it is taken into account. I also added, to each of the three components that introduce new markup - F5, F6, and F7 - a remark to consider whether we can achieve this functionality in a different way. That's a great point.
As you point out, the project has a task to design this interaction explicitly. But you're right that we shouldn't predefine the results of these design activities. The timing is that the first line of code regarding the integration into the Wikipedias won't happen until well into the second year of the project, and the design task starts half a year earlier than that. The assumption is that this half a year will give enough time to explore the possible designs and discuss them with the communities. I really like many of your suggestions, especially the "no new markup" constraint and the considerations about how to count this kind of content for the existing statistics. I hope that with this time frame there is enough time for these discussions.
Hmm, maybe that design work should start even earlier. It could be used for writing user stories, and to make mock ups, and to make the whole thing much more concrete and thus involve the community earlier and more effectively. I think that's good. We did the same for Wikidata, and the mock ups really helped, but we didn't have it as an explicit point in the project plan. I think it is a good idea to add that explicitly. Will do so as soon as possible.
Thank you for your support this time, and the last. I am glad to see that you support the idea, and I will be happy about and appreciate your advice going forward! --denny (talk) 18:32, 14 May 2020 (UTC)
I now added explicitly that we should avoid new magic words, as one example of the ideas you suggested 1, 2 --denny (talk) 01:14, 15 May 2020 (UTC)
@Denny: Thank you for the responses, and for all the time you're taking to engage with feedback in the early stages of this project. :-) I agree, it would be great to see user stories and mockups early in the process. For a project I'm currently working on, we spent a couple of months on user interviews and RITE-style testing and iteration, and it really helped in prioritizing user stories, understanding users' mental models, and figuring out the final user experience we wanted to implement.
I'm wondering what the best ways would be to open up an iterative design and research process for a project like Wikilambda, potentially with public and/or targeted calls for participation. Is that something you previously tried for Wikidata, or that could be appropriate here? ---Eloquence (talk) 19:47, 15 May 2020 (UTC)
@Eloquence: Thanks for pointing to RITE! That sounds like a very promising method.
Yes, for Wikilambda we aimed and sometimes achieved a form of participatory design. We had storyboards on-wiki which were discussed and refined. Here is a link to one example. It is the third iteration, the two previous ones are linked, and you can see on the discussion pages (for v1, for v2) that we had quite a bit of discussion going on.
I sure hope to repeat that, as it was improving the result quite a bit (as you can see with the three iterations). --denny (talk) 00:42, 18 May 2020 (UTC)
@Denny: Those look great and the on-wiki discussion process looks like it was super-valuable! In case it's helpful, for recent rapid prototype testing work I've helped organize, we set up interactive clickthrough prototypes (we mainly used Invision, but lightweight HTML/CSS prototypes work well, too) and then scheduled 30-60 minute video call sessions with individual prospective users. Those sessions followed a script, where the user screen-shares and narrates as they click through the prototype. We then prepare a synthesis spreadsheet from all the tests, and use that to generate the next prototype. Rinse and repeat. Here's an example of a synthesis spreadsheet we used.
We also used remote user studies via UserFeel (for quick tests where we could use a broad public pool of testers, and didn't need to supervise individual tests).
You may have done all this with Wikidata or in other contexts, just sharing in case it's useful. :-) I find this kind of research incredibly valuable, because it gives you an opportunity to really observe a user's first reactions to a concept that's visually realized, without them necessarily even having much of a mental model yet of what you're trying to do. In future tests I also hope to use some more at-scale testing tools for card sorting exercises or validation of UI language. I know WMF and maybe WMDE have user research teams that may have insights on ways user research could contribute to the Wikilambda project plan early.
Really excited about this initiative - thanks again for all your work putting this proposal together! --Eloquence (talk) 08:51, 20 May 2020 (UTC)
denny, when you are writing Thank you for your support this time, and the last (my emphasis), it is a very cute understatement :)
I agree whole-heartedly. --denny (talk)
I agree with Eloquence, especially about these points:
  • The cost of adding complexity to wiki markup: this more or less corresponds to what I wrote here earlier in the section #VisualEditor support. User:Strainu wrote something similar, too.
  • Design first vs. technical architecture first approach: This is the approach I took when writing mw:Global templates/Proposed specification, and I even explicitly said so there in the lead section: "This document does not try to go into the details of technical implementation", and so on.
  • no new markup on the first useful iteration: this is also the approach I took with the "Global templates" proposal, where I wrote: "The syntax for developing templates and modules, and the general template maintenance and deployment cycle will not change".
I also agree that ArticlePlaceholder-like functionality should be developed early, and that's one of the things at which I hinted in the section #Implicit article creation. However, a modules repository should probably come even earlier, not because that's my big dream, but because it's probably just impossible to implement implicit article creation without at least a basic global code repository. --Amir E. Aharoni (talk) 07:59, 19 May 2020 (UTC)
Yes, agreed. This shows me that we really should start very early with the design explorations and user stories for Abstract Wikipedia. I originally thought about three months prior to the start on Part 2, Erik already convinced me to move it six months earlier - maybe even sooner is warranted (given Wikilambda, that person wouldn't be bored in the meantime either). So yes, maybe we should start with those explorations and designs and user stories even earlier. Thanks for the feedback on this! --denny (talk) 23:04, 24 May 2020 (UTC)

Good suggestions above. I'm also confused about the difference between this project and the ArticlePlaceholder, alias Wikidata/Notes/Article generation, alias Scribe (a terrible name, by the way), apart from "let's throw massively more resources at it". Nemo 07:33, 3 July 2020 (UTC)

@Nemo bis: Much more flexibility in the creation of the content, and also the ability to have much more knowledge than what can be expressed in Wikidata. These are the two main differences to the great work on ArticlePlaceholder. --DVrandecic (WMF) (talk) 03:35, 15 July 2020 (UTC)

MassMessage to every project?

@Elitre (WMF): I really think that messaging every single project to announce the approval of Abstract Wikipedia is really overdoing it. --Yair rand (talk) 19:59, 9 July 2020 (UTC)

Thanks for your feedback. This is the first new sister project in several years and exposing it to all the languages, especially the smallest ones, is crucial for its success, hence why it makes sense to us to use mass message at least for this very first announcement. (I wish I had a better tool to do this of course.) --Elitre (WMF) (talk) 20:05, 9 July 2020 (UTC)
...There are substantial flaws in that argument, but you just went ahead and spammed the rest of the wikiverse already, so there's not much point in continuing this, I guess. --Yair rand (talk) 20:14, 9 July 2020 (UTC)
By all means, feel free to point them out. Denny already wants to hear everything about how to reach the farthest corners of our movement in a sustainable and sane way. --Elitre (WMF) (talk) 20:17, 9 July 2020 (UTC)

Not to mention that the text itself stands in opposition to everything that Wikipedia is. Creation of knowledge? So OR is out of the window? Lukasz Lukomski (talk) 23:08, 9 July 2020 (UTC)

@Lukasz Lukomski: No, Original Research is not expected to be allowed. That's a pillar of Wikipedia, and not to be changed by this project. That is not meant to be implied by the announcement. --DVrandecic (WMF) (talk) 13:19, 14 July 2020 (UTC)
@Yair rand and Lukasz Lukomski: As a bureaucrat of a small Wikipedia I appreciated the announcement, which has the same prominence as the monthly tech news. I think the opportunity for this new project to bring different language editions of Wikipedias together in terms of content is certainly worth more prominence than monthly updates on software features! Deryck C. 22:10, 10 July 2020 (UTC)
The text is unfortunately worded. It is a mystery to me why the text didn't mention the strong ratio of support that this proposal got, with 105 votes in favour and only 6 against. Referencing it would have been a token gesture of caring about community consensus. Talking instead about knowledge construction is a bad way to signal what the WMF cares about.
It might have the same prominence as the monthly updates on software features on some wikis, but on others tech news is less prominent. I think a broad announcement of Abstract Wikipedia was the right decision, as it allows people who might not yet have heard of it to participate in the setup of the early foundations of the project. Further updates might go better via the technews channel. ChristianKl 09:12, 11 July 2020 (UTC)

@Yair rand, ChristianKl, and Deryck Chan: Thanks for your perspective. I am indeed very interested in how to balance how much communication we should be doing versus sending out too much information too widely. I would like to avoid the situation that we had with Wikidata where people came in a few months after launch and said they were never informed, but I also don't want to keep pushing about it all the time. So I would be very happy to hear more thoughts on that and what appropriate channels are and how much to send out. These topics are also early discussion topics for the project moving forward, and if you agree, I will ping you about this when we get there. Thanks! --DVrandecic (WMF) (talk) 13:19, 14 July 2020 (UTC)

A few questions and concerns

First off, while I have issues with this, I will say congratulations on getting this idea approved so swiftly and for the successes of Wikidata the past seven years or so.

  1. How did this get approved so quickly? As noted in the mass message sent to many (all?) WMF projects, this is the first project approved in seven years, and Wikidata and Wikivoyage were the first projects approved in eight. Yet this proposal seemed to glide through the approval process in a couple of months? I'm flabbergasted. There are dozens of proposals (e.g. WikiJournal) that have community support but haven't been implemented. How is this different?
  2. Which domain will this live on?
  3. Is this actually a separate knowledge project like Wikiquote or Wikinews or is it just a kind of backend tool like Content translation or some interface to Wikidata? It seems like the latter to me: it is just a way of essentially rearranging and publishing data that you draw from an existing project (e.g. Wikidata or a given edition of Wikipedia). If it's the latter, then is this really a "sister project" anymore than Visual Editor or the search engine are "sister projects"?
  4. Why Abstract Wikipedia? Why not Wikibooks? Or Wikiversity? Is this somehow particular to encyclopedia articles but not travel guides or news articles?

Again, I don't want to be excessively negative here and I can see some value in an endeavor like this but it's really shocking to see this play out. —Justin (koavf)TCM 08:16, 10 July 2020 (UTC)

  1. For this point, see Talk:Abstract_Wikipedia#WikiLang - tldr: Someone created a Wikilang as a subproject of Wikiversity, but it is mostly abandoned. Hosting many inactive projects is not a good idea. See Wikispore for a proposed place to incubate new projects.
  2. Undetermined for now - see Abstract_Wikipedia/Name
  3. See Abstract_Wikipedia/Architecture - Wikidata will store abstract content, rendered by functions defined in Wikilambda. Abstract Wikipedia is the name of the whole development project; there will not be an actual Wikimedia project named Abstract Wikipedia.
  4. It is possible to create other texts, see Abstract_Wikipedia/Tasks#Task_O12:_Health_related_text_creation; See also Talk:Abstract_Wikipedia/Plan for my previous comment.
--GZWDer (talk) 22:48, 10 July 2020 (UTC)
@GZWDer: I'm one of the most active Spore-ers but that doesn't answer my question: this sister project was almost immediately approved with no real proof of concept, etc. Note how this contrasts with what I mentioned (WikiJournal), which is a fully functional project within Wikiversity. (I don't think it should be spun off but nonetheless, there is very strong community support for it.) Additionally, I have imported all of this content into wikispore:, so I have some familiarity with it but these pages don't really answer the question. It seems like you're confirming that this is a tool or technological approach rather than a sister project as such. —Justin (koavf)TCM 02:29, 11 July 2020 (UTC)
It seems to me that on the one hand this proposal had strong support, and on the other Denny had conversations with the WMF that led the WMF to agree to make this project happen and hire Denny.
We currently seem to lack an active process for creating new projects that doesn't involve a lot of backroom conversation. Given the strategic vision, I would guess that such a mechanism is desirable from both the WMF's point of view and the community's point of view. Maybe there should be a committee tasked with creating a process, the same way that there will be a committee tasked with the code of conduct? (Of course, in either case it would be great to have some process afterwards to seek community agreement for the adoption.) ChristianKl 09:22, 11 July 2020 (UTC)
See the Sister Projects Committee, which has never been actively running.--GZWDer (talk) 09:54, 11 July 2020 (UTC)
It's one thing to write such a policy and have an open list where people can express interest in being on the committee. What would be needed is a system in which both the community and the WMF have buy-in. ChristianKl 11:21, 13 July 2020 (UTC)

@Koavf, ChristianKl, and GZWDer: Thank you for the congratulations! And thank you for your questions.

Re 1.: It didn't feel quick at all to me. I first presented this idea on this wiki in 2013, when we needed the lexicographic extension on Wikidata in order to proceed with that proposal, which was finally launched in early 2018, and then I started to give presentations and write publications about the multilingual Wikipedia in 2018, which I shared with the communities. Eventually, I made a detailed project proposal here, and ChristianKl suggested following the proposals for new projects process; although I was at first hesitant due to the state of that process (as pointed out here as well by GZWDer), I thought, well, it's the best thing we have. I also talked with people at the Foundation to make sure we are aligned about this, and then I submitted the proposal to the Board, and given the state of the discussion here, and I guess based on the proposal and previous discussions over the years, the Board voted to approve.

You say there are dozens of proposals with similar support. But I only see WikiJournal having been submitted to the Board, and WikiJournal landed on the Board’s agenda. I don’t see any other proposals that have similar or better community support and that have been submitted.

I think WikiJournal is a great initiative. It is great to see the strong support by the Community for the project. If I understand correctly, the Board decided that the proposal was not yet in a state that would warrant immediate approval, and asked to go through the Product Committee (see previous link) for an additional round of evaluation and possibly for a round of comments. The Foundation will be assessing WikiJournal as directed by the Board. I am looking forward to seeing progress on the proposal.

But yes, I fully agree with the discussion above - I really think that this process needs an overhaul and should be improved. I think that WikiSpore should play a major part in the new process, particularly for projects that don’t require implementation work. Or we can accept the situation where this just doesn’t work through processes, but has to be done ad-hoc and per project. I don’t think that’s a good idea.

Re 2.: we first need to decide on the name. The domain will follow from that.

Re 3.: there are several parts to the overall proposal, and one of them is a new separate knowledge project, i.e. a new Wikimedia sister project, which is called Wikilambda in the proposal and is meant to create a new collaborative space for creating, maintaining, and using a library of "functions". You can read a little bit more about that at Abstract Wikipedia/Plan#Task P1.2: Initial development. The second part, which we're currently referring to as the Abstract Wikipedia, is currently suggested to just be part of Wikidata, but there’s discussion needed about that. We’ll need to figure that out. You can read some detailed ideas of how the communities might each choose to use the new functionality/content (from the combination of both new parts, plus other existing parts) within their existing Wikipedias, at Abstract Wikipedia/Plan#Components. Overall, it might help you to think about it as the next step in the evolution of mw:Extension:ArticlePlaceholder, e.g. ht:Special:AboutTopic/Q14384.

Re 4.: It can be used for other content too, but just as we did with Wikidata our first priority will be the encyclopedia, as it has a bit more traction than Wikiversity. The other projects, again just as with Wikidata, will be able to use it too, but they are not the first priority.

Thank you for your questions, Justin, and I hope the answers make sense. --DVrandecic (WMF) (talk) 03:28, 15 July 2020 (UTC)

Google's involvement

Moved from #Statement of opposition

This project is proposed by Google (see here). It is very shocking that Wikimedia is becoming a lobby organisation for this company, which stands for monopolisation and against data protection on the Internet. Habitator terrae (talk) 11:36, 9 May 2020 (UTC)

@Habitator terrae: I am sure it won't change your mind, but just to correct the facts: this is a proposal by me in my personal capacity, and it is not a proposal by my employer. Also, to bring receipts: I made a previous version of this proposal in 2013, before I was employed by Google. I have been working on this for many years. --denny (talk)
@Denny: This statement is contradictory to your paper: Why did you write "Denny Vrandečić Google" there??? Habitator terrae (talk) 14:39, 9 May 2020 (UTC)
@Habitator terrae: Because that is my name and my employer. What else would I write on a research publication? --denny (talk) 14:56, 9 May 2020 (UTC)
OK, I think I get it: I make a distinction between an official Google proposal and a proposal by someone employed by Google. You seem not to make that distinction. I guess that's the point where we can agree to disagree here. Sounds right? --denny (talk) 15:04, 9 May 2020 (UTC)
The point is that this isn't only your personal idea. This paper was obviously written and published in your role as an employee of Google. This isn't only your hobby. It's your work. Right? Habitator terrae (talk) 15:27, 9 May 2020 (UTC)
@Habitator terrae: I don't understand what you mean by "this isn't only my personal idea". I want to make sure we have a common understanding of that before it leads to more misunderstandings, particularly since you say that this is the point, and I don't get it yet. I would appreciate some clarification on this.
And yes, research is my work, it is not just a hobby of mine, and publishing research results is an integral part of research and thus of my work as a researcher. --denny (talk) 15:47, 9 May 2020 (UTC)
To clarify: You were paid by Google to do research on this topic. Because of this research, you propose this project in your role as a researcher at Google. Therefore it is obvious that this is a proposal "paid" by Google and not only the result of your personal intention. In conclusion, the proposal is from Google. Habitator terrae (talk) 21:00, 9 May 2020 (UTC)
Denny has claimed the exact opposite just a few lines above (“this is a proposal by me in my personal capacity, and it is not a proposal by my employer”), and I don’t see any reason not to trust him. This is a really bizarre conversation here. —MisterSynergy (talk) 22:48, 9 May 2020 (UTC)
The reason is that his whole role at Google is about Wikimedia. I see no reason to believe that this had no influence on this proposal. Habitator terrae (talk) 22:26, 10 May 2020 (UTC)
@Habitator terrae: Yes. As I described in that mail you link, my current role is to 'facilitate the improvement of Wikimedia content by the Wikimedia communities', 'improving the coverage and quality of the content, and about pushing the projects closer towards letting everyone share in the sum of all knowledge.' I think with this proposal I am doing exactly that. --denny (talk) 00:50, 11 May 2020 (UTC)
So we could conclude that developing the concept proposed here was part of your work at Google. Habitator terrae (talk) 15:14, 11 May 2020 (UTC)
@Habitator terrae: If it were a proposal from Google, that would be better, because Google would put some effort into it. Not mentioning the support of a multimillion-dollar corporation would not be very clever, if such support were implied. So, as I see it, you are just being unethical toward Denny, in fact calling his words a lie. Carn (talk) 06:43, 18 May 2020 (UTC)
@Carn: Two questions:
  1. Why did you ping me, if you already see I'm "just being unethical"?
  2. Where am I "in fact calling his words a lie"? My whole argumentation is based upon his words.
"If it were a proposal from Google, that would be better, because Google would put some effort into it."
Google uses Wikipedia information, for example in en:Knowledge Graph. The community of this wiki would transfer the information into a form much more useful for Google. I guess it is possible that such a community would be much more effective than bots. Furthermore, with your argumentation there would be no justification for projects like mw:Content translation/Machine Translation/Google Translate, in the wake of which Wikimedia called Wikimedia and Google partners.
Habitator terrae (talk) 07:59, 18 May 2020 (UTC)
These words of yours - "Therefore it is obvious that this is a proposal "paid" by Google" - are an implication that Denny lied to you. Sorry I pinged you, but this project is not the main one for most people; I myself prefer to know when someone is talking to me. Thank you for admitting that Google and Wikimedia are helpful to each other. Carn (talk) 08:07, 18 May 2020 (UTC)
Denny already stated what you call "the implication that [he] lied to [me]", only in other words:
"As I described in that mail you link, my current role [paid by Google] is to 'facilitate the improvement of Wikimedia content by the Wikimedia communities', 'improving the coverage and quality of the content, and about pushing the projects closer towards letting everyone share in the sum of all knowledge.' I think with this proposal I am doing exactly that [what I'm paid for]."
Further, many actions by Google fight the mission of a free Internet: monopolizing the use of knowledge, collecting personal data from every person possible, or sorting people into filter bubbles with closed algorithms. Therefore it is contradictory to Wikimedia's mission to be a partner of Google.
Habitator terrae (talk) 13:04, 18 May 2020 (UTC)
I perceived these words as saying that he should do socially useful work for 20% of his time at Google. And he himself decided that this proposal was such socially useful work, and not that his boss said - "We have a plan to conquer the Internet, you are responsible for sub-clause C of this plan, start work on lobbying our interests on Wikipedia!" - this is the impression created by what you say about the situation.
If Wikipedia set itself against corporations, then we would forbid the commercial use of the created content, which we do not; so it seems to me your war against windmills is not typical of the Wikipedia community. Carn (talk) 08:10, 19 May 2020 (UTC)
Hello Carn, first I must correct some of your statements:
  1. "I perceived this words as saying that a he should do socially useful work for 20% of his time in Google."
    No, the 20% Project (as linked by Denny) is time of work, for example AdSense for advertisement (not social) was created with this 20%.
  2. "And he himself decided that this proposal was such a socially useful [=20%, as stated before this is incorrect] work".
    Yes, but now it is his fulltime job, decided by himself and Google, see statement here.
  3. "and not his boss said - ' We have a plan to conquer the Internet, you are responsible for sub-clause C of this plan, start work on lobbying our interests on Wikipedia! ' - this impression is created from what you say about the situation."
    Sorry, the intended expression was: Google already "conquered" the Internet.
In my view Wikipedia should align itself with corporations that do not fight against the freedom of the Internet. This isn't a question of commercial or not.
Also, the copyleft licenses of Wikipedia express its will that the freedom of knowledge should be protected, even when it is used commercially.
Habitator terrae (talk) 17:20, 27 May 2020 (UTC)
Denny: I think you need to clarify whether you are being paid by Google to develop Wikilambda and Abstract Wikipedia. What you're saying gives the impression that you're trying to muddle the waters. On the one hand you're trying to distinguish "an official Google proposal and a proposal by someone employed by Google" to give the impression that you're working on it outside your Google working hours in your Wikimedia volunteer capacity, but on the other hand you stated your affiliation as Google on your paper (why not use, say, "Wikidata" or "Croatian Wikipedia" if you were doing it as a volunteer? I certainly make that distinction when I make my Wikimania submissions depending on whether my university or a WM affiliate is paying for my attendance that year) and you said your role at Google is to improve connections with Wikipedia and Wikidata. We need to know whether Google will be contributing your working hours to Wikilambda (in which case there will be resource negotiations with Google regarding hiring additional staff developers) or not (in which case Denny will not be able to manage all of it himself). Deryck C. 13:40, 27 May 2020 (UTC)
I'll try to clarify. Please keep asking if I don't clarify sufficiently.
I worked on this mostly in my 20% time. Googlers are being paid while working on their 20% time. I published the paper with my Google affiliation because I was paid by Google while working on the paper.
The project is not an official Google project / product. There are many publications and Open Source projects that are written by Googlers with this status. Check out this search. An official Google product has organized company support, usually with a team, etc. This project doesn't.
I hope this makes the situation clearer, and doesn't muddle the waters. --denny (talk) 17:12, 27 May 2020 (UTC)

Denny: Thank you, your explanation about your current position with this project is very clear. I think the next question that the Wikimedia community ought to know is: How much support should Wikilambda expect from Google if this goes ahead? Will we get some of Denny's 80% from Google to work on Wikilambda? How many Google employees in addition to Denny should we expect to have on the project if WMF and the WM community approve this? Deryck C. 22:02, 27 May 2020 (UTC)

@Deryck Chan: Sorry for not answering this question earlier, but I guess now you know why :) For the record: I left Google and joined the Wikimedia Foundation in order to lead the development of this project. --DVrandecic (WMF) (talk) 03:33, 15 July 2020 (UTC)
@DVrandecic (WMF): Will Google continue to contribute to Abstract Wikipedia in any way? Deryck C. 11:09, 15 July 2020 (UTC)
@Deryck Chan: I honestly don't know. --DVrandecic (WMF) (talk) 17:23, 15 July 2020 (UTC)

Hybrid article

My proposed solution is a "hybrid article", where every part of the article may contain two parts:

  • A "abstract" or "semantic" part, which is a rendered result of abstract content.
  • A "opaque" part, which is language-specific text.

Parts of an article may be integrated with abstract content in these modes:

  • Having "abstract" part only - this will be the default for most short articles especially bot-generated ones, and some parts of article like the infobox.
  • Having both "abstract" and "opaque" part - the "opaque" part is shown, overriding the "abstract" one. We may have some way to indicate whether the override is permanent (usually it should not). If it is not, eventually the abstract content should be edited and the override removed.
  • Having "opaque" part - all existing articles can be easily transformed to this mode. This will generate some placeholder abstract content like {{#switch:{{int:lang}}|en=Text1|de=Text2}} where the content is automatically updated with local Wikipedias (if exists and local Wikipedias uses them), and may be converted to real abstract content in the future.

For editing an article:

  • Sections containing "abstract" part only - you may create an opaque content to it (with rendered result as default text), which may be converted to abstract one by technically-competent users; or more recommended, you may modify the abstract content, however it requires some technical skill (until we developed a rather advanced high-level editor).
  • Sections containing both "abstract" and "opaque" part - you may modify either the opaque or abstract content, though modifying the abstract content have no effect on rendered text (therefore, it may be a challenge on vandalism detection if some content is orphan); you may also remove the opaque content, making the abstract one rendered.
  • Sections containing only "opaque" part - You may modify the "opaque" part (which automatically updates the placeholder abstract content; translate the "opaque" part to another language even if there is no (explicit) article in that language; or create a real abstract content (recommended but maybe not user friendly).

BTW: we may expect there to be very few languages in which an abstract article can be fully rendered, since that requires completing all the renderers the article uses; the distribution of constructors - if no opaque content is involved - will follow Zipf's law and Heaps' law. --GZWDer (talk) 21:22, 5 July 2020 (UTC)
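As a very rough sketch of the hybrid model described above, and nothing more than that, the following invented Python data structures show the core behaviour: an opaque part, where present, overrides the rendering of the abstract part. None of these names exist in the actual proposal:

  # Toy data model for a "hybrid article": each section may carry abstract
  # content (rendered per language) and/or opaque, language-specific text
  # that overrides the rendering. Purely illustrative.
  from dataclasses import dataclass, field
  from typing import Callable, Dict, List, Optional

  @dataclass
  class Section:
      abstract: Optional[dict] = None                        # e.g. {"constructor": "clause", ...}
      opaque: Dict[str, str] = field(default_factory=dict)   # language code -> hand-written text

      def text(self, lang: str, render: Callable[[dict, str], str]) -> str:
          if lang in self.opaque:                            # opaque part overrides the rendering
              return self.opaque[lang]
          if self.abstract is not None:
              return render(self.abstract, lang)
          return ""                                          # nothing available in this language

  @dataclass
  class HybridArticle:
      sections: List[Section] = field(default_factory=list)

      def render(self, lang: str, render: Callable[[dict, str], str]) -> str:
          return "\n\n".join(s.text(lang, render) for s in self.sections)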

[I have created a new topic since what follows does not appear to relate to "Distillation of existing content"]
I think you are right to identify two separate functional components. Here, I shall call them InfoText and ArticleSkeleton.
InfoText is like an InfoBox, only it is natural language text. The values in the InfoText come from Wikidata and the editor is free to remove or replace these. The text around the values initially comes from a rendering of a language-neutral structure, also from Wikidata (This, I think, is your "semantic" part). I imagine this will be natural language WikiText which will be changed by the editor in the current way (your "opaque" part). However, I can see advantages in being able to mark sections of text as needing to remain automated (for the time being or indefinitely: your "hybrid").
ArticleSkeleton is the framework for an entire Article. Most simply, it is a sequence of article components (Infoboxes, media, categories and so on, and, of course, InfoTexts). It is probably inevitable that there will be optional and conditional components, but I see this as a requirement common to each level. From the top down, you get (skeleton) Articles, sections and sub-sections; from the bottom up, you get Wikidata values, properties and items, InfoText and nested compositions of InfoText. (Articles might also come in sets, the nested composition of which would ultimately be a Wikipedia or WikiWhatever.)
I don't see any need for an article having only an "abstract" part; it may be an optimisation but it looks to me exactly like a re-rendered ArticleSkeleton (a transclusion or "virtual article"). But if there is a benefit to transcluded InfoText at any level, you might get the option to keep the article virtual by default. The major benefit of the hybrid article, as I see it, is that the editor does not have to take responsibility for the entire article at the outset; multiple edits can be used to convert the instantiated virtual article into a fully adopted project article. In practice, I think the editor would instantiate a particular InfoText by changing its WikiText. Or the virtual article would be fully instantiated at outset and the editor would have the option to "re-virtualize" any particular InfoText (which marks it up for bot-editing).--GrounderUK (talk) 14:34, 6 July 2020 (UTC)
@GZWDer and GrounderUK: thanks for these descriptions, and yes, I agree. The assumption that an article either comes from the Abstract Wikipedia or is written locally is too simple, and the reality might be much more interesting; you describe a number of interesting workflows and interplays that might happen. Whether there are skeletons or not is something that should be entirely within the hands of the community, not decided by the development team, and the system should support their creation but also work well without them.
Having this more fine-grained model, where abstract and local parts are arbitrarily mixed, indeed allows further interesting workflows, as GZWDer points out, where a local editor, without having to show any regard for the abstract content, can simply materialize a rendering as text and then change it manually. If we manage to keep track of these changes, we can have queues where other contributors can then try to migrate these changes back into the abstract content so it can propagate into the other languages. Or not. As said, the workflows and processes regarding editorial decisions should always remain with the community, and never with the algorithms.
Again, thanks for the vivid description of use cases, I very much agree with them! --DVrandecic (WMF) (talk) 05:27, 14 July 2020 (UTC)
@ABaso (WMF): suggested continuing our discussion here. What I am envisioning is probably 3 or more layers of article content: (1) Language-specific, topic-specific content as we have now in the wikipedias, (2) Language-independent topic-specific content that would be hosted in a common repository and available to all languages that have appropriate renderers for the language-independent content (abstract wikipedia), (3) Language-independent generic content that can be auto-generated from Wikidata properties and appropriate renderers (generic in the sense of determined by Wikidata instance/subclass relations, for instance). I'm not entirely sure how the existing Reasonator and Article Placeholder functionalities work, but my impression was they are sort of at this 3rd or maybe even a lower 4th level, being as fully generic as possible. That probably makes them a useful starting point or lowest-functionality-level for this at least. The question of how to mix these various pieces together, and even harder how to present a useful UI for editing them, is definitely going to be tricky! ArthurPSmith (talk) 20:38, 28 July 2020 (UTC)
@ArthurPSmith: (squeezing in) Yes, that is a great model for viewing it! I fully agree with all three levels (what is the fourth level?) and I totally expect those three levels to develop, and we will support all of them. And in addition to that, there will be more or less generic functions for certain types of content, e.g. taxons, researchers, locations, etc., similar to LSJbot. --DVrandecic (WMF) (talk) 00:52, 5 August 2020 (UTC)
@ArthurPSmith: I like to think of it from the bottom up, as you suggest. Every statement in Wikidata should have a fairly straightforward natural language representation. This is to convey the information that Article Placeholder pops into boxes at the moment (except that it leaves out some kinds of statement, based on a simple "blacklist", as I understand it, but that may just be the final filter). Let's look at a non-existent nynorsk article. Slightly less boring than most is Grammatical Framework (in Norwegian). You can probably work out what it says without knowing Norwegian. If you compare it with its Wikidata item you'll see the same information in the language of your choosing anyway. Notice the P348 statement. In Wikidata English it says "software version identifier 3.10, publication date 2 December 2018. 1 reference: reference URL https://www.grammaticalframework.org/download/release-3.10.html retrieved 4 May 2020". One way to render that into natural English would be "Version 3.10 was published on 2 December 2018 (according to information here on 4 May 2020)." The nynorsk page puts the reference in a footnote, as you might expect. So its box says "versjon 3.102 utgjevingstidspunkt: 2. desember 2018" and the footnote says "2.↑https://www.grammaticalframework.org/download/release-3.10.html, 4. mai 2020". It's a fairly simple, real-world example, but notice that there is a little hierarchy there already: the value (3.10), the value's date, and the source (logically, for both value and date) and when that was accessed. Of course, that is just the start (as I see it), but I thought it was worth going through just to see what sort of questions it raises.
The next level is: when and how to combine two Wikidata statements (or more). Here, the first three look like they would go into a sentence beginning with the name of the Wikidata Item. We can see that, but how does an automated process achieve that? I'm pretty sure it can make good progress by considering a hierarchy of properties. "The one to beat" is instance of (P31) but some of these can be rather artificial or a bit weak; we want to be perhaps a bit more general, perhaps a bit more specific. So, yes, Grammatical Framework is a programming language (according to their website, accessed...) but it's more specifically "A programming language for multilingual grammar applications". That's the sort of "enriching" that human contributors to Wikipedias probably can't help wanting to do, once we hear it expressed in our own language. It's the difference between form-filling and writing. "Grammatical Framework is a programming language developed by Xerox Research Centre Europe, starting in 1998. The latest release, 3.10, came out at the end of 2018." Who uses it? What for? And so on...
Maybe there's some evidence from Norway or Wales or wherever the other Article Placeholder Wikipedias are, of contributors changing Wikidata or creating articles to replace the placeholder. Neither of those is easy or likely to be undertaken lightly. Even if we could just insert some persistent text, associated with the Wikidata statement, you might be looking at a tenfold increase in interaction (almost certainly more or less than that). That's a signal. I'd hope that, like translators, some contributors would be able to check these annotations and see how they should be translated into Wikidata. That seems like a workflow that might be used more widely across WMF projects, but the particular challenge for the Article Placeholder is that there is no page to change. Denny talks about materialising a rendering as text (above) and that's a nice place to end up. But clicking on an Article Placeholder box or a Statement in Wikidata and typing a note as text in your own language, a note that pings itself off to a Wikidata translator... I don't know, how hard can it be? It's a start.--GrounderUK (talk) 02:17, 29 July 2020 (UTC)
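Purely to illustrate the statement-level rendering walked through in the Grammatical Framework example above, here is a toy Python function that turns a simplified P348 claim into an English sentence; the input dictionary is a hand-rolled simplification, not the real Wikidata JSON, and the wording rule is just one possible choice:

  # Toy renderer for a simplified "software version identifier" (P348) claim.
  claim = {
      "value": "3.10",
      "qualifiers": {"publication date": "2 December 2018"},
      "references": [{
          "reference URL": "https://www.grammaticalframework.org/download/release-3.10.html",
          "retrieved": "4 May 2020",
      }],
  }

  def render_version_claim(c):
      sentence = "Version " + c["value"]
      pub = c.get("qualifiers", {}).get("publication date")
      if pub:
          sentence += " was published on " + pub
      refs = c.get("references", [])
      if refs:
          sentence += " (according to information retrieved on " + refs[0]["retrieved"] + ")"
      return sentence + "."

  # render_version_claim(claim) ->
  # "Version 3.10 was published on 2 December 2018 (according to information retrieved on 4 May 2020)."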
I feel like you're trying to keep things at my "level 3" (or 4) - the key to what we're trying to do here though is I think "level 2" - how to express an interesting article ABOUT (say) Grammatical Framework (Q5593683) in a way that's independent of language, but specific to that particular topic. So there's no need to automatically select what things to say, the abstract content does that selecting for that specific topic: it could use P31 and P348 and ... together in this first sentence/paragraph, use these other properties in second and third etc., leave out certain properties, add additional information from other QItems that are related, add references here or there but not everywhere, etc. ArthurPSmith (talk) 14:30, 29 July 2020 (UTC)
@ArthurPSmith: I mostly agree with what you say, except that I'm not trying to keep things at any particular level. What I am suggesting is that the problem is self-similar at any conceivable level. I sum that up as "a nested composition of infoTexts". Well, theoretically infinite recursions are fine by me, but they must bottom out in the real world. And where it must bottom out, for our purposes, is the "value in context", the triple that represents... well... anything. So, what you're calling "level 2" is what I called "ArticleSkeleton": the things on a page (article), each represented by its identifier (a value) and the identifier's link to the identifier of the "Article". You can create that explicitly or you can derive it from your Wikidata Items. And that is true at the next level up (Category, as just one example) and it is true at the next level down (section, sub-section, paragraph, sub-paragraph, sentence... it really doesn't matter).
That's why I invented the term infoText (which I'm not attached to). An infoText is (defined to be) a composition of infoTexts "from the bottom up, you get Wikidata values, properties and items, InfoText and nested compositions of InfoText". An "elementary" infoText contains only values, properties and item identifiers; loosely, it corresponds to a Wikidata statement but, as the example above shows, you could/would decompose the statement into more fundamental infoTexts. Then, the statement's infoText is a composition of infoTexts, and the composition is fairly shallow and fairly narrow. We would expect some rendition of that infoText in the context of an article about Grammatical Framework, but also in the context of a paragraph about GF on a natural-language generation page or in a list of products on the Xerox page or in any other context (in any available language). But there we would expect a link to the GF page (and we may not care whether such a page exists) but no link to the page we're on.
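If it helps, here is one possible shape for that idea, purely as an illustration (Python, with every name invented): the same structure at every level, bottoming out in elementary value/property/item triples.

from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class ElementaryInfoText:
    item: str    # e.g. "Q5593683" (Grammatical Framework)
    prop: str    # e.g. "P348" (software version identifier)
    value: str   # e.g. "3.10"

@dataclass
class InfoText:
    label: str   # e.g. "ArticleSkeleton", "section", "statement"
    parts: List[Union["InfoText", ElementaryInfoText]] = field(default_factory=list)

# the same self-similar composition at article, section and statement level
article = InfoText("ArticleSkeleton", [
    InfoText("statement", [ElementaryInfoText("Q5593683", "P348", "3.10")]),
])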
Enough for now. Yes, given a particular Q, we can return a derived or pre-defined infoText (speaking hypothetically) but I doubt we'll have a separate pre-defined result for every conceivable use case. We can imagine what a "full" set might look like, and maybe a "minimal" set ("short description"), but less-than-full or more-than-minimal...? Pre-defined, explicitly, yes: just go ahead and define it as if it were an article. Derived from style guidelines and editorial policy but not specific to a particular Q, maybe: I guess our "full" set would respect some express cross-topic guidelines, which should be "adjustable" (level of language, level of subject expertise etc). Let's see what people want.--GrounderUK (talk) 17:19, 29 July 2020 (UTC)
I really don't think it's self-similar. Not every Wikidata item will have an "ArticleSkeleton" (level 2 content) - for those that don't presumably something like Article Placeholder is probably fine. For those that do, though, it will in my view be a fully defined page of content, perhaps calling in other things like templates and infoboxes but those are just like images, not really part of the main content of the page. I don't see how sections, paragraphs etc. can be defined separately from the main content on a specific topic (a single Wikidata item). ArthurPSmith (talk) 18:01, 29 July 2020 (UTC)
@ArthurPSmith: I agree that not every Wikidata item will need a bespoke ArticleSkeleton. Some probably shouldn't even have an Article Placeholder (Q22665571, for example). So we have a Requirement [a] to be able to identify these, perhaps with a specific Property statement about the QItem, perhaps according to rules relating to the QItem's properties (or values within some QItem properties or combinations of values and properties for the QItem and/or other QItems related in one way or another...).
The sub-class of QItems that might have an ArticleSkeleton, we might as well call encyclopedic QItems. One Wikipedia has a page on the subject, another might just have a section or an infobox. They have their view on the subject and they don't want Wikidata's. We might say it's all or nothing, or we might say you can opt in to some categories and opt out of others, and if you have categories you opt into, you might want to opt out of particular sub-categories, including individual QItems. So that's another possible Requirement [b]. And, self-similar or not, there might be certain images that certain Wikipedias would object to, and there might be certain claims... So this might be another possible Requirement [c].
Well, if the language-neutral ArticleSkeleton is going to include templates and images and infoboxes and anything else that might be present on a current Wikipedia page (and why would it not?), then they absolutely must be considered to be "really part of the main content of the page". Maybe the inclusion of a particular image or one of a category of images would, by implication, lead to the whole article being unacceptable, as in Requirement [b], but I think we should consider whether implied unacceptability is a different Requirement [d].
So, we should consider:
  • Requirement [a]: Implicit and/or explicit exclusion of a Wikidata Item (explicit could be a special case of implicit but not vice versa)
  • Requirement [b]: Wikipedias can opt out of sets of Items (including a single Item)
  • Requirement [c]: Wikipedias can opt out of sets of images, templates etc (including sets of one), and of specific Wikidata claims (or types of claim, loosely defined...)
  • Requirement [d]: Inclusion of anything subject to [c], if unconditional, implies opt-out by Wikipedias opting out under [c] (opt-outs are inherited upwards by unconditional inclusion)
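Purely as an illustration of what [a] to [d] might mean in data terms (every name below is invented, and this is a sketch rather than a proposed design):

optout_config = {
    "global_exclusions": {"items": ["Q22665571"]},        # [a] never auto-generate for these
    "per_wiki": {
        "xxwiki": {                                       # a hypothetical Wikipedia
            "excluded_item_sets": ["some category"],      # [b] opt out of sets of Items
            "excluded_media": ["File:Some image.jpg"],    # [c] opt out of specific images/templates
            "excluded_claim_types": ["P18"],              # [c] ...or of types of claim
        },
    },
}

def article_allowed(wiki, item, included_media):
    # [d]: unconditionally including excluded media implies opting out of the whole article
    cfg = optout_config["per_wiki"].get(wiki, {})
    if item in optout_config["global_exclusions"]["items"]:
        return False
    return not any(m in cfg.get("excluded_media", []) for m in included_media)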
Do those requirements make sense? Feel free to suggest amendments and additions, obviously. It might be worth splitting [c] up.--GrounderUK (talk) 20:23, 29 July 2020 (UTC)

I just wanted to say that, yes, I agree that we should have the possibility to create hybrid articles. Have the lead written locally, include an infobox and a section, etc., and it should be a seamless experience for the reader - and it will be difficult to get the experience for the contributors just right (but that's already true today with the mix of templates and module calls, etc.). In general I see that there will be some types of articles which will be very much generated with a simple call and assembled from data on Wikidata (Level 3, as described by ArthurPSmith), and this will be an important contribution from the new system - but I also really want us to get to the point where we have Level 2 content. I am convinced that many very interesting topics (such as photosynthesis or World War II or language) will be really hard to do with an approach rooted in Level 3, and will only be accessible to a Level 2 approach (hopefully - it would be sad if we learn that only a Level 1 approach works for them). But thanks for this interesting discussion! The important thing is to ensure - and I think we are doing that - that we allow for all the different levels mentioned here, and also for a mix of these per article. --DVrandecic (WMF) (talk) 01:07, 5 August 2020 (UTC)

Transcluded talk pages

Non-Wikipedia Content

It appears that the abstract content will be stored on Wikidata? Is that correct? I'm curious where non-wikipedia content would be stored? For instance, let's say we hypothetically wanted to create an "Abstract Wikivoyage" would that be a new wiki or would that abstract content be stored on Wikidata as well? --DBarratt (WMF) (talk) 19:27, 2 July 2020 (UTC)

This can easily be achieved by creating a specific part of content. See Talk:Abstract_Wikipedia#Two_small_questions. An article for Paris will have a "main" part and a "Wikivoyage" part, where the "Wikivoyage" part may transclude a "Wikivoyage-Getaround" part and a "Wikivoyage-See" part. (In my opinion, the "Wikivoyage" part will simply be a GeneralWikivoyageArticle(), which automatically transcludes other parts; the "Wikivoyage-See" part will be a WikivoyageSee(), which queries information from Wikidata and formats it into prose. Neither will then require manual maintenance. The "Wikivoyage-Getaround" part will still be a complex Content.)--GZWDer (talk) 14:21, 3 July 2020 (UTC)
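As a sketch of that composition (the constructor names are GZWDer's hypothetical ones; the Python below only shows the shape of the idea, not anything that exists):

def WikivoyageSee(qid):
    # would query Wikidata for the sights of the place and turn them into prose
    return "prose generated from Wikidata sights of " + qid

def GeneralWikivoyageArticle(qid):
    # automatically transcludes the standard travel parts
    return {
        "Wikivoyage-See": WikivoyageSee(qid),
        "Wikivoyage-Getaround": "a complex, manually maintained Content part",
    }

paris_content = {
    "main": "the encyclopedic part of the article about Paris",
    "Wikivoyage": GeneralWikivoyageArticle("Q90"),   # Q90 = Paris
}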
That's a great question we don't have an answer for yet. What GZWDer suggests could work. It could also be stored in Wikilambda directly. Or we could add another namespace in Wikidata? (Probably not). If I see that correctly, Wikivoyage would, fortunately, be the only other project that has that issue, as it overlaps so much with Wikipedia regarding items but has so much different content. So yes, once we get to that point we would need to figure out a solution together with the Wikivoyage communities. Great question! --DVrandecic (WMF) (talk) 01:14, 5 August 2020 (UTC)
Motivated by this question, I don't think the other sisters will have the Abstract Wikipedia functionalities, no? I'm an admin at SqQuote and I've been wanting to ask for a long time but I thought it was Wikipedia only in the beginning. This question made me curious now. - Klein Muçi (talk) 09:56, 7 August 2020 (UTC)
Adding on the above question: What about help pages in Wikipedia located in Wikipedia/Help namespaces? - Klein Muçi (talk) 12:23, 7 August 2020 (UTC)

From Talk:Abstract_Wikipedia/Plan

Primary goals

"Allowing more people to read more content in their language"

Rather than "in their language", I think it should be "in the language they choose". Some people have more than one language. Some people may want to read content in a language that is not one of their own, either because they are learning that language, or because they are studying in that language, or for other reasons.--GrounderUK (talk) 15:58, 6 July 2020 (UTC)

@GrounderUK: Good point, and I've tweaked it to your suggestion. That seems to keep it concise, whilst being a bit clearer. Thanks. Quiddity (WMF) (talk) 05:51, 21 July 2020 (UTC)

From Talk:Abstract_Wikipedia/Goals

Renderers

"Solution 4 has the disadvantage that many functions can be shared between the different languages, and by moving the Renderers and functions to the local Wikipedias we forfeit that possibility. Also, by relegating the Renderers to the local Wikipedias, we miss out on the potential that an independent catalog of functions could achieve."

It seems that we are not neutral on this question! It is not obvious that we cannot have the functions in Wikilambda, the data in Wikidata and the implementation in the local Wikipedias. I don't propose that we should, but would such an architecture be "relegating" Renderers? Or is it not a form of Solution 4 at all?
I think we need a broader view of the architecture and how it supports the "Content journey". In this context, Content means the Wikipedia (or other WikiCommunity) articles and their components. Consumption of Content is primarily by people reading articles in a language of their choice, in accordance with the project's first Primary Goal. I take this to be the end of the "Content journey". I tend to presume that it is also the start of the journey: Editors enter text into articles in a language of their choice. In practice, this means that they edit articles in the Wikipedia for their chosen language. This seems to be the intent of the project's second Primary Goal but it is not clear how the architecture supports this.

"We think it is advantageous for communication and community building to introduce a new project, Wikilambda, for a new form of knowledge assets, functions, which include Renderers. This would speak for Solution 2 and 3."

Clearly, Renderers are functions and, as such, they should reside with other global functions in what we are calling Wikilambda. However, Renderers are not pure function; they are function plus "knowledge". Since some of that knowledge resides within editors of the natural language Wikipedias, whose primary focus may well be the creation and improvement of content in their chosen language, I am inclined to conclude that natural language knowledge should be acquired for the Wikilambda project from the natural language Wikipedias and their editors' contributions. As with encyclopedic content, there may well be a journey into Wikidata, with the result that the Renderers are technically fully located within Wikilambda and Wikidata (which is not quite Solution 2).

"Because of these reasons, we favor Solution 2 and assume it for the rest of the proposal. If we switch to another, the project plan can be easily accommodated (besides for Solution 4, which would need quite some rewriting)."

I'd like to understand what re-writing Solution 4 would demand. I take for granted that foundational rendering functions are developed within Wikilambda and are aligned to content in Wikidata, but is there some technical constraint that inhibits a community-specific fork of their natural language renderer that uses some of the community's locally developed functionality?--GrounderUK (talk) 18:31, 6 July 2020 (UTC)

Hmm. Your point is valid. If we merely think about overriding some of the functions locally, then that could be a possibility. But I am afraid that would lead to difficulties in maintaining the system, and also possibly hamper external reuse. Also, it would require adding the functionality to maintain, edit, and curate functions to all existing projects. Not impossible, but much more intrusive than adding it to a single project dedicated to it. So yes, it might be workable. What I don't see, though - and maybe you can help me with that - is what the advantage of that solution would be. --DVrandecic (WMF) (talk) 01:22, 5 August 2020 (UTC)

@DVrandecic (WMF): I'm happy to help where I can, but I'm not sure it's a fair question. I don't see this as a question of which architecture is better or worse. As I see it, it is a matter of explaining our current assumptions, as they are documented in the main page. What I find concerning is not that Solution 2 is favored, nor the implication that Solution 4 is the least favored, it is the vagueness of "quite some rewriting". It sounds bad, but I've no idea how bad. This is more about the journey than the destination, or about planning the journey... It's like you're saying "let's head for 2, for now, we can always decide later if we'd rather go to 1 or 3; so long as we're all agreed that we're not going to 4!"
Not to strain the analogy too much(?), my reply is something like, "With traffic like this we'll probably end up in 4 anyway!" The "traffic" here is a combination of geopolitics and human nature, the same forces that drove us to many Wikipedias, and a Wiktionary in every language for all the words in all the languages ...translated. Wikidata has been a great help (Thank You!) and I'm certainly hoping for better with added NLG. But NLG brings us closer to human hearts and may provoke "irrational" responses. If a Wikipedia demands control over "its own" language (and renderer functions), or a national government does so, how could WMF respond?
In any event (staying upbeat), I'm not sure that "Renderer" is best viewed as an "architectural" component. I see it as more distributed functionality. Some of the more "editorial" aspects (style, content, appropriate jargon...) are currently matters of project autonomy. How these policies and guidelines can interact with a "renderer's" navigation of language-neutral encyclopedic and linguistic content is, of course, a fascinating topic for future elaboration.--GrounderUK (talk) 13:49, 5 August 2020 (UTC)

@GrounderUK: Ah, to explain what I mean by "quite some rewriting": I literally mean that the proposal would need to be rewritten in part, because the proposal is written with Solution 2 in mind. So, no, that wouldn't be a real blocker and it wouldn't be that bad - we are talking merely about the proposal.

I think I understand your point, and it's a good one, and here's my reply to that point: if we speak of the renderers, it sounds like this is a monolithic thing, but in fact, they are built from many different pieces. And the content, as well, is not a monolith, but a complex beast with parts and sections. The whole thing is not an all-or-nothing thing: a local Wikipedia will have the opportunity either to pull in everything from the common abstract repository, or to pull in only certain parts. They can also create alternative renderers and functions in Wikilambda, and call these instead of the (standard?) renderers. In the end, the local Wikipedia decides which renderer to call with which content and with which parameters.

So there really should be no need to override individual renderers in their local Wikipedia, as they can create alternative renderers in Wikilambda and use those instead. And again, I think there is an opportunity to collaborate: if two language communities have a common concern around some type of content, they can develop their alternative renderers in Wikilambda and share the work there. I hope that makes sense. --DVrandecic (WMF) (talk) 22:01, 6 August 2020 (UTC)

@DVrandecic (WMF): What a terrible tool this natural language can be! Thanks, Denny. That is a whole lot less worrying and more understandable (well, I think we basically agree about all of this except, perhaps, "standard?" would be "existing").--GrounderUK (talk) 22:29, 6 August 2020 (UTC)

From Talk:Abstract_Wikipedia/Architecture

Wikilambda vs. rebranding

The project seems to be called Wikilambda, which is in line with the other projects. Then why is it called Abstract Wikipedia almost everywhere? Is this related to the (senseless) rebranding of Wikimedia to Wikipedia? Is this an attempt for this project to get a head start and capitalize on the reputation of Wikipedia? Don't rebrand, don't call things that aren't Wikipedia "Wikipedia", and don't behave like a commercial company. Zanaq (talk) 07:43, 13 July 2020 (UTC)

This project will be more related to Wikipedia than to the other sister projects. Imagine it as a supplement to the existing content of the Wikipedias, or as another language version. So I think the name may be good. Note this name may be changed. Wargo (talk) 20:49, 13 July 2020 (UTC)
If you asked 100 people to explain the idea behind the project to a journalist from their local newspaper, how many would use the word "Wikipedia" in the first sentence? --Gnom (talk) 22:03, 13 July 2020 (UTC)
Abstract Wikipedia is a development project for all the ideas. There will likely not be an actual Wikimedia project with this name.--GZWDer (talk) 03:22, 14 July 2020 (UTC)

@Zanaq: Both Wikilambda and Abstract Wikipedia are distinct parts of the proposal and have always been (Wikilambda is the wiki for functions, and Abstract Wikipedia the repository for abstract content), and in communications it was just easier to use one name. Both names are preliminary and will be replaced within the year anyway, and the community will be choosing the final name. If you have good proposals for the name, feel free to add them on this page, and join us on the mailing list to discuss the names later this summer! Thanks! --DVrandecic (WMF) (talk) 03:31, 15 July 2020 (UTC)

  • @DVrandecic (WMF): That leaves the question of why you didn't choose the less controversial name and call it Wikilambda. It would have been easy to make a choice that produces less community pushback. How about taking care not to waste some of your political capital on fights that aren't important to you? Especially going forward? ChristianKl 13:55, 15 July 2020 (UTC)
    @ChristianKl: I thought the name 'Abstract Wikipedia' would be more evocative of what it is doing than 'Wikilambda' for this particular communication. Since it was explicitly stating that it would change, I thought let's use the name that makes the explanation easier to understand. --DVrandecic (WMF) (talk) 17:29, 15 July 2020 (UTC)
  • I'd personally keep a lambda in the logo for geekiness/simplicity, regardless of what the eventual name will be. John M Wolfson (talk) 03:36, 16 July 2020 (UTC)

You're welcome to join us at Wikispore Day on Sunday July 19, it starts at 1pm Eastern / 17:00 UTC, and it will run for about 2 hours. You can RSVP on that page and possibly give a lightning talk on any Wikispore-adjacent topic. You will also be able to participate and ask questions via the YouTube livestream here.--Pharos (talk) 12:33, 19 July 2020 (UTC)

Wikidata VS Wikipedia - General structure

Hello everyone! I'm a strong supporter of using multiproject infrastructures. (Something like that has long been needed for templates and modules, but that's another subject.) I like the idea behind this new project, but I had a somewhat technical dilemma that I believe arises from me not being tech-savvy enough. While the idea of auto-generating articles from Wikidata is good in itself, why not use the already-built multilingual structure we have on Wikipedia itself? Wouldn't it be easier, from a technical point of view, to transform Wikipedia into a multilingual project in the style of Wikidata/Commons/Meta, where articles appear in different languages according to the user's preferences?

I understand that 2 problems may arise from this:

  1. Unlike Meta/Wikidata pages, not every article has a homologous article in every language;
  2. The same article can change in POV depending on the culture (especially in small wikis where neutrality is not strong enough);

But I believe there are enough articles in Wikipedias in different languages to start a transition like this and fix it along the way. And maybe after Wikipedia, other wiki projects can join too in this style.

So, the question I ask is why is there a need to autogenerate articles from Wikidata when we already have a multilingual structure we can use?

I must make it clear that I'm not against the idea per se. I just want to better understand what is the drive behind the need of auto-generating articles from Wikidata. (I'm not even against the auto-generation itself.) - Klein Muçi (talk) 14:27, 21 July 2020 (UTC)

@Klein Muçi: Before I reply, please note my Strong support for this project. But I Support the goal much more than the Abstract Wikipedia/Plan. My reply to you is: Let's do this! I'll ping @Pharos: here because I'm not sure Wikispore is 100% ready for multi-lingual collaboration with heavy Wikidata integration (even on a small scale). I suggest a few rules.
  1. Every translingual Wikipedia article page (Q50081413) must take its MediaWiki page title (Q83336425) from 1 Wikidata item.
  2. The Wikidata item must have 1 Wikipedia article page in >1 Wikipedia.
  3. The translingual Wikipedia article page must contain >1 Wikidata claim (Q16354754), which should be embedded in wiki markup text in 1 natural language that can be (or has been) the successful target of (input/output to/from) machine translation.
  4. Words (of the natural language) which do not come from Wikidata must be present in the controlled vocabulary, where each word links to a lexeme in Wikidata (ideally one which has the required Wikibase sense (Q54285715), i.e. meaning).
Try viewing this with a language other than English selected. If the Wikipedia links don't work, there could be data missing in Wikidata or pages absent from the Wikipedia in that language. It would be nice if it would fall back to the Wikidata Item, but even blanking the site or entering "wikidata" will not do that; you have to copy the Item ID into the search field.--GrounderUK (talk) 19:40, 21 July 2020 (UTC)
@GrounderUK: I'm really sorry but I can barely understand what you have written. First of all, I think you haven't answered my question. Maybe you haven't understood what I was asking or maybe I'm not technically informed enough to understand how your answer relates to it. Secondly, using the Wikidata items as words is SO CONFUSING to me. I can literally not understand one single sentence. Maybe it's just with Albanian (my language) but most of the translations (if they exist) are really bad. - Klein Muçi (talk) 00:08, 22 July 2020 (UTC)
@Klein Muçi: I'm sorry too. I don't speak Albanian but I asked Google to translate its Albanian back into English, French and German and it seemed to manage. I hope this helps.
@Klein Muçi: Para se të përgjigjem, ju lutem vini re votën time të mbështetjes së fortë të Symbol.svg Mbështetje e fortë për këtë projekt. Por unë mbështes mbështesin qëllimin shumë më tepër sesa Wikipedia / Plani Abstrakt. Përgjigja ime për ju është: Le ta bëjmë këtë! Unë do të ping @Pharos: këtu sepse nuk jam i sigurt se Wikispore është 100% gati për bashkëpunim shumë-gjuhësor me integrimin e rëndë të Wikidata (madje edhe në një shkallë të vogël). Unë sugjeroj disa rregulla.
  1. Everydo faqe artikull transkriptues i Wikipedia (Q50081413) duhet të marrë titullin e faqes së saj MediaWiki (Q83336425) nga 1 artikull Wikidata.
  2. Artikulli Wikidata duhet të ketë 1 faqe të artikullit të Wikipedia në >1 Wikipedia.
  3. Faqja e artikujve ndërgjuhësorë të Wikipedia duhet të përmbajë >1 kërkesë (Q16354754), e cila duhet të futet në tekstin e markup-it në 1 gjuhë natyrore që mund të jetë (ose ka qenë) shënjestra e suksesshme e përkthimit të makinës (hyrje / dalje nga / nga).
  4. Fjalët (të gjuhës natyrore) të cilat nuk vijnë nga Wikidata duhet të jenë të pranishme në fjalorin e kontrolluar, ku secila fjalë lidhet me një leksemë në Wikidata (në mënyrë ideale ajo që ka kuptimin e kërkuar (Q54285715), d.m.th. kuptimin).--GrounderUK (talk) 00:37, 22 July 2020 (UTC)
@GrounderUK: Oh, thank you for taking the long way to make it easier for me to understand what you meant. Unfortunately I'm still not sure how that answer relates to what I asked in the beginning. I'll try to rewrite my question in a very short way.
@Klein Muçi: I did not give a direct answer to your question. My point was that we can do what you suggest right now ("Let's do this!").
What I think Abstract Wikipedia will bring (maybe I've got it wrong): A new project on a new domain populated by articles written automatically on different languages by the information that already exists on Wikidata.
I don't think so. The articles written automatically would be in the existing Wikipedias, if they were asked for. But, yes, the information would come from Wikidata.
What I think could maybe be a better idea (since we're talking innovations): Don't create a new project on a new domain. Transform Wikipedia so it works on the same domain populated by the articles that already do exist in different languages not written automatically. A bit similar to how Meta works. The content's language gets determined by users' preferences.
I'm not sure how this is different from what we have now. (Well, I can see it's not the same, but maybe we could already be doing this). Meta works for content because people translate content that is marked up for translation. People can also translate a Wikipedia article into a different language (although not in the "same domain"). But translation is hard work and mostly (outside of meta) it doesn't get done. If you want to say "If there's no Albanian page, show me the English page", that is quite hard at the moment. It's not too hard for us, the readers, but it could certainly be made easier.
Now, I'm not sure my idea is better but it is there mostly to allow me to better explain what I'm asking: Where does this drive/need/desire for auto-creation of articles in a newly made domain come from? I feel like we already have a good enough multilingual infrastructure (with not much auto-creation) which we can better perfect, if we so choose to do. Why do we need a new domain made specifically for new auto-generated articles?
Well, I said we don't need a new domain. What we have is not good enough, however, because most languages have a fairly small number of articles, compared with the larger Wikipedias. This has always been true and the smaller Wikipedias don't seem to be catching up (perhaps there are a few exceptions). Nothing is stopping us from trying but time is always limited. If we can make the information in Wikidata understandable to average or typical readers, no matter what languages they speak, they will have more information available sooner. And if everyone can improve the information in Wikidata, then everyone can share the better information immediately. I think it is here we need what you call "a new domain". To me, it's just a different way of editing Wikidata, one where we can also say how we want the information to appear in different languages.
I know there must be benefits which I'm overlooking and that's why I asked. Hope I've been a bit more clear now. - Klein Muçi (talk) 01:08, 22 July 2020 (UTC)
You were clear before. It's not for me to justify Wikimedia Foundation decisions, but I hope you can see why I do support this one. But this new project will take a long time and may not be able to give us what we hope for. So I also support suggestions like yours, which might give us benefits today.
A small multi-lingual Wikipedia that follows my four rules is a project that interests me. It might help us to find out what is going to be hard for "Abstract Wikipedia". Perhaps you can see that my first rule addresses your first problem. By making the Wikidata Item ID the title of the page, it should display in the user's current language (and we can fix Wikidata if it does not). My second and third rules aim to address your POV concerns; we automatically inherit Wikidata and multiple Wikipedia neutrality (and notability). The automatic translation in rule three is to help monitor any bias that might be present in the surrounding natural-language text. Rule 4 also helps with this; it makes it easier to use words that have been used before rather than words with a similar meaning that might be "loaded" (not neutral POV) terms in some contexts. It also encourages us to improve the lexical information in Wikidata, which is supposed to be used for our automatic articles (eventually).--GrounderUK (talk) 03:11, 22 July 2020 (UTC)

@GrounderUK: Oh, okay. Now I fully understand what you mean. The problem was that I didn't expect for people to like my suggestion and I was just looking for an explanation about my question and how what I wanted was unfavorable. You agreeing with it (and even with the Abstract Wikipedia plan) confused me. Sorry for taking too much time to understand that.

As for "I'm not sure how this is different from what we have now", you've explained it yourself. We could have 1 singular domain with all the languages, and you get shown the article/main page of your chosen language. If it doesn't exist yet, you get shown:

  • a) the English version (?)
  • b) an auto-generated version from Wikidata (in your language)

You also have unified gadgets, templates and modules in multi-lingual versions.

What I propose is not ideal, though, because I think the worldwide communities need to have some autonomy and not be fully merged (I think that wouldn't benefit the project overall, for many reasons), and also you have the not-so-small technical details that all the communities have set up differently from each other and would want to keep (different templates/modules etc.). Again, part of the autonomy problem.

Maybe we can keep what we already have and just implement the unified gadgets/modules/templates and add the B option from above. Auto-generated versions for articles missing in some languages. But the idea of creating a new domain to be populated just by auto-generated articles from Wikidata seems odd to me. (Even though you mentioned above that that is NOT the case for Abstract Wikipedia.) If it were more like a tool function, so that you get an auto-generated article from WD when an article is missing from your language (basically what I described above), that would be good. But having a whole domain filled with these articles... I don't know, but it seems odd. The Content Translation Tool is already great at filling the language gap. Many new articles, if not all, in small wikis come from it. It's basically auto-generated in your language, and it's not all auto, so the semantics are good enough too. We could make CTT even more powerful and just work towards some unification of the technical aspects like gadgets/modules/templates. We could even have the WD auto-articles as a "tool". It's just the idea of a whole new domain filled with auto-generated articles that doesn't feel right to me. And I do believe bot-translation can be of good quality if the process is set up right. I'm just appalled that we are thinking of things like these NOW. After more than 15 years of existence, after having Wikipedia-s with millions of articles in different languages, after developing tools like CTT... Now it just doesn't look that much needed as a feature, especially one deserving as much attention as a whole new domain. That's why I wrote in the first place. To quench my curiosity about what the drive was behind the approach that was chosen, and why that is not an "outdated" approach, as I say above. But I do thank you a lot for finding time to fully explain yourself to me and also giving new approach ideas on this subject. :) - Klein Muçi (talk) 10:26, 22 July 2020 (UTC)

Ju lutem, @Klein Muçi: You're welcome. (I hope that's right.)
As I said, it's the goal that interests me more than how we get there. Unified gadgets/modules/templates are part of the plan. In fact, support for these is the first part of the plan: a wiki of functions.
Someone else will have to explain why something like your a and b can't be done right now. As a human being, I can go to Wikidata, search for my new topic, select the likely Wikidata item and follow the link to any Wikipedia that has an article on that topic. (OK, it's not super helpful that the links are nearly at the bottom.) But once I know Albania is Q222, I can go straight to any Wikipedia in any language. Your language's Wikipedia page for Albania (?), for example. This is the same as clicking the right language in the sidebar of any Wikipedia page ("Në gjuhë të tjera"). So... it should be really easy for a Wikipedia page to provide links to all the articles in other Wikipedias, given only the Q-number (because they already do so, in the sidebar). I say "should be" because I don't recall seeing such a thing.
Maybe that helps. But keep focusing on the goal and share all the ideas you have. That way, someone might help you get to where you want to be sooner rather than later. See mw:Extension:Article Placeholder for an extension that some smaller wikis have; it returns some Wikidata content if a page does not already exist on that Wikipedia. nn:Spesial:AboutTopic/Q845189 is an example (and it gives other Wikipedias in the sidebar).--GrounderUK (talk) 12:06, 22 July 2020 (UTC)
@GrounderUK: haha Yes, you're right!
Yes, I understand what you mean. I'm an interface administrator for SqWiki and SqQuote and I take great interest in the way things appear and how to make them more intuitive, especially for new editors, since I'm involved in a lot of offline wiki-workshops. To be honest, Wikipedia's interface does seem a bit outdated in some ways for the new generation of users (take a look here if you are interested in knowing more of my opinion on this subject) and Wikidata could use some reworking to make it more new-user friendly, but I know that if you learn how to navigate it, it's a powerful tool. (You can also take a look here if you are interested in knowing my opinion on intersite/foreign-friendliness traffic.) Given my work on subjects like these, I've literally seen every extension in here one by one and asked for a couple of them to be activated in Phabricator for SqWiki/SqQuote but was denied because they were not part of the Wikimedia bundle yet. We would benefit so much from BoilerRoom in SqQuote but, alas, we can't have that. I've seen what you suggest and I've been interested in activating Article Placeholder for SqWiki before, but then I saw it required you to go to special pages and then look for the missing article before showing the article to you. After seeing that, it didn't look that beneficial anymore, since the "special pages" place is part of the "dark unknown" places for average readers/contributors. That's why this new project we're discussing looks interesting to me (as it does to you). And to prove I'm not against auto-created articles or artificial help in general, I'd invite you to take a look here.
Anyway, in general I support everything that leads to this. I spend far too much time periodically looking after the citation modules and templates on SqWiki and SqQuote as they continuously evolve on EnWiki, and most of that work is just copy-pasting with little change. And I can't just literally copy-paste the whole code because it would break the translation in between, so the only way is to copy-paste line by line. Global templates/modules/gadgets would save so much time in these situations. - Klein Muçi (talk) 15:31, 22 July 2020 (UTC)

@Klein Muçi: I am not against your solution, not at all. And @Cscott: had a presentation at Wikimania last year proposing that. I would love to see that happen. I think that the issues with that solution are much less technical and much more social / project-political - and definitely nothing I want the Abstract Wikipedia to block on. So, yes, please, go ahead and gather support for that idea! --DVrandecic (WMF) (talk) 01:45, 5 August 2020 (UTC)

@DVrandecic (WMF): Oh, I didn't know there was already a discussion on-going for my proposal. Thank you for mentioning that! Just to make sure I'm not misunderstood by you though... I want to emphasize that I'm not against the AW project. It just seemed strange to me that we would need auto-generated articles now that we already have high coverage of many subjects in different languages. But maybe I'm wrong and my POV is too localized and in the global scale that's not true. - Klein Muçi (talk) 09:25, 5 August 2020 (UTC)
@Klein Muçi: I am surprised you say that there is a high coverage of many subjects in many different languages. Your home wiki is the Albanian one, right? Albanian has about 80,000 articles, and that is great - but when I take a look for the island I come from, Brač, I can see that none of the villages and towns have articles. Or when I look at the list of Nobel prize winners in Physics, it is somewhat out of date, and many of the Nobel prize winners have no articles. Wouldn't it be nice if those could be included as a baseline content from a common source until you get around to write the articles? Also it would allow the Albanian community to focus on the topics they really care about. And even to bring knowledge about Albanian topics to the common baseline content. I would be really interested in your thoughts on that. --DVrandecic (WMF) (talk) 21:21, 5 August 2020 (UTC)
@DVrandecic (WMF): yes, it sure would. But wouldn't it be better/easier if that baseline content came from existing articles, auto-translated/auto-generated by a tool like CTT from other wikis (mostly EnWiki, since it is the biggest), with users fixing the machine-translation problems? Why do we have to have a totally new domain and new code being written "from scratch" just to "teach the AI" how to write articles? Maybe it is technically easier that way? Is that the reason? I'm just trying to understand what was the drive behind this approach, as I've said many times. Not really opposing it. - Klein Muçi (talk) 23:28, 5 August 2020 (UTC)
@Klein Muçi: There are a number of issues with translation. First, we don't have automatic translation for many languages. Second, even for those we do, we need someone to check the translation results before incorporating these. The hope is that Abstract Wikipedia will generate content of consistently high quality, so that it can be incorporated by the local Wikipedias without the need to check each individual piece of content. And third, translation doesn't help with updates. If the world changes and the English Wikipedia article gets updated, there is nothing that keeps the local translation current. CTT is a great tool, and I love it, and there will be great use cases for it, and the Foundation will continue to develop and improve it. Abstract Wikipedia aims at a slightly different mode of operation, and will complement it with some new use cases, like the ability to keep content fresh and current. I hope that helps to understand the difference. Let me know if it is still unclear. Oh, also, there is no AI writing the articles - it is entirely driven and controlled by the community. --DVrandecic (WMF) (talk) 22:07, 6 August 2020 (UTC)
@DVrandecic (WMF): ah, automatic content update. I hadn't thought of that feature. Well, that's a big plus towards simple translation. No, it's clear enough now.
PS: I know there is no real AI. :P I used it to imply the automatic creation driven by the, of course, human users - contrasting that with a more organic way that CTT has. Thank you for the explanations! :) - Klein Muçi (talk) 23:00, 6 August 2020 (UTC)

Confusion

After reading some comments about this project, there seems to be a lot of confusion among almost everybody. Only the person who is developing this concept has a clear idea about where he is aiming. To me, this seems like a project led by Google that needed some neutral ground, and they sent one of its minions to operate "behind the enemy lines". The way this project was approved doesn't seem to be aligned with the spirit of the Wikimedia community. That the Foundation wanted to invest so much money into a project that has so much uncertainty, and with so little community buy-in, seems suspicious and sketchy to say the least. Nice that the WMF leadership wants to appear to be doing something useful for the community other than the big waste of money that was the strategy process or the rebranding. However, I believe there are better ways to burn cash more efficiently, like offering a PeerTube instance to store all video tutorials about Free Software/Open Science, or video courses developed by the community financed by WMF grants. Sure, it is not MediaWiki-based, but I am sure your clever minds can find a way to appropriate the tech and make it better, wiki-style.--MathTexLearner (talk) 00:16, 5 September 2020 (UTC)

@MathTexLearner: Hi. It is a complicated project, but there is extensive community support for it per Talk:Abstract Wikipedia/Archive 1#Supporting signatures and statements and participation in discussions here and elsewhere. It might also be helpful to know that the person who proposed this project was also the person who started the Wikidata project (check his volunteer-account's userpage for a more extensive bio), and he had them both in mind throughout. If you want to try reading another explanation of this project, the page at w:en:Wikipedia:Wikipedia Signpost/2020-04-26/In focus might clarify some details. We will also continue working to improve the clarity of the project page here, with a major overhaul soon. I hope that helps. Quiddity (WMF) (talk) 00:48, 9 September 2020 (UTC)
Dear Community Relations Specialist Quiddity (WMF), if a website that claims to have 39000 users in English Wikipedia alone considers ~100 users (that is, 0.25% of users of that wiki) "extensive community support", then I feel in the right to call you a liar and a manipulator who uses his position of authority within a shady organization structure to push forward an agenda unwanted by and unknown to the majority of users. Given that you are so keen on using figures in such a manipulative way, I want to make you aware that the Community Wishlist Survey 2020 got 423 users involved, so I hope they get at least 4x the budget to develop things that are wanted by the community, and not this project drafted by some shady individual whose biggest life accomplishment was to piggyback on the work done by the other co-founder, who did the heavy lifting defining the basis for Wikidata. Your organization looks like a headless chicken, always looking for the "Next Big Thing" instead of improving the things that are there by actually listening to what the community of editors and readers wants *you* to do. The community, myself included, does not need a "Community Relations Specialist" to justify the unjustifiable; it needs obedience at all levels, starting from the leadership down to the lowest ranks. I am utterly disappointed about how much frustration you guys are generating from "the clouds" in San Francisco, while on Earth people value efficiency over empty promises. This "Abstract Wikipedia" thing is a disaster in the making, albeit a necessary one. I'm glad that I can watch you guys get deeper and deeper into the mud, as more and more people will start asking for heads to start rolling within the Wikimedia Foundation. If I were you, I would start looking for other employment, because once the madness and the accusations of mismanagement begin, it will be hard to stop them without burning your whole wasteful and inefficient organization down to ashes. Have good luck "managing" the community.--MathTexLearner (talk) 15:15, 9 September 2020 (UTC)
You are welcome to post specific comments on the project. However, we will not allow unfounded allegations, against staff or any other people, on this page. Quiddity (WMF) (talk) 17:21, 16 September 2020 (UTC)

Distillation of existing content

I wonder whether we have hold of the right end of the stick. From the little I have so far read of this project, the big idea seems to be to help turn data into information. The goal, in practice, seems to be a satisfactory natural language presentation of previously defined subjects and predicates (content). Or, rather, the goal is to develop the capability to present content in any natural language. That, I say, is a worthy ambition.

As a contributor who mainly edits articles written by others, I feel the need to look at this from the opposite direction. Missing content is, of course, a problem. But so are inconsistent content, misleading content and confusing content. For me, an "abstract wikipedia" distils the content of an article, identifying the subjects, their qualities and the more or less subtle inter-relationships between them. That is, I imagine we begin with natural language content and automatically analyse it into an abstract form. I believe this is one way that non-technical contributors can really help us move towards our goal. If the person editing an article can see a "gloss" of the article in its target language, it is easy to imagine how they might adjust the original text so as to nudge the automated abstract towards a more accurate result. At the same time, the editor could be providing hints for more natural "rendering" (or re-rendering) of the abstract content into the natural language. In practice, this is what we already do when we provide alternative text in a link.

In my view, this sort of dialogue between editor and machine abstraction will help both. The editor gets objective feedback about what is ambiguous or unclear. I imagine some would prefer to "explain" to the machine how it should interpret the language ("give hints") while others might prefer to change the language so that it is more easily interpreted by the machine. Either way, the editor and the machine converge, just as collaborative editors already do, over time.

The crucial point here, I suppose, is that the re-rendering of abstracted content can be more reliably assessed at the editing stage. To the editor, it is just another tool, like "Show preview" or "Show changes" (or the abstract results appear with the text when those tools are used). Giving hints becomes as natural to the editor as fixing redlinks; how natural taking hints might become, time alone can tell.

Congratulations on getting this exciting project started.--GrounderUK (talk) 01:21, 5 July 2020 (UTC)

tldr: Once the content becomes abstract, it will be a technical burden for the community to contribute.
That is why the content should be maintained as part of the wikipedia editing process, with the editor only (and optionally) guiding the way the content is mapped back into Wikidata or verifying that it would be rendered correctly back into the source language WikiText (which is still fully concrete and always remains so).--GrounderUK (talk) 22:37, 5 July 2020 (UTC)

I do not think automated abstraction is a good thing to do - it means we would introduce a machine translation system, which generates an interlingua with unclear semantics and cannot simply be reused. I am also very skeptical about making all articles fully abstract.--GZWDer (talk) 21:22, 5 July 2020 (UTC) [signature duplicated from new topic added below]

I would be sceptical too. In fact, I would strongly oppose any such proposal. My suggestion (in case you have misunderstood me) is for a human-readable preview of the machine's interpretation of the natural language WikiText.--GrounderUK (talk) 22:37, 5 July 2020 (UTC)
@GrounderUK: Yes! Thank you for the congratulations, and yes, I agree with the process you describe. The Figure on Page 6 sketches a UI for that: the contributor enters natural language text, the system tries to automatically guess the abstraction, and at the same time displays the result in different languages the contributor chooses. The automatic guess in the middle can be directly modified, and the result of the modification is displayed immediately.
I also very much like how you describe the process of contributors fixing up ambiguities (giving hints). I also hope for such a workflow, where failed renderings get into a queue and allow contributors to go through them and give the appropriate hints, filtered by language.
But in my view nothing of this happens fully automated, it always involves the human in the middle. We don't try to automatically ingest existing articles, but rather let the contributors go and slowly build and grow the content.
Thank you for the very inspiring description of the process, I really enjoyed reading it. --DVrandecic (WMF) (talk) 05:15, 14 July 2020 (UTC)
@DVrandecic (WMF): You are very welcome, of course! Thank you for the link to your paper. You say, "The content has to be editable in any of the supported languages. Note that this does not mean that we need a parser that can read arbitrary input in any language." I think we agree. I would go further, however, and suggest that we do need a parser for every language for which there is a renderer. I invite you to go further still and view a parser as the precise inverse function of a renderer. If that sounds silly, it may be because we think of rendering as a "lossy" conversion. But the lesson from this should be that rendering (per se) should be constrained to be "lossless", meaning neither more nor less than that its output can be the input to the inverse function (the "parser"), returning as output the exact original input to the renderer. Such an approach implies that there will be subsequent lossy conversion required to achieve the required end result, but we need to think about ways in which the "final losses" can be retained (within comments, for example) so that even the end result can be parsed reliably back into its pre-rendered form. More importantly, an editor can modify the end result with a reasonable expectation that the revised version can be parsed back into what (in the editor's opinion) the pre-rendered form should have been.
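A toy example of the constraint I mean (Python, with an invented marker syntax): render() keeps enough markers that parse() can recover exactly the abstract content it was given.

import re

abstract = {"function": "anaphora", "wd": "Q28054", "case": "nominative", "lang": "en"}

def render(content):
    # "lossless" rendering: the surface word plus a marker carrying what would otherwise be lost
    word = "she" if content["case"] == "nominative" else "her"
    return "{}<!-- {} {} {} -->".format(word, content["function"], content["wd"], content["case"])

def parse(text):
    # the inverse of render(): rebuild the abstract content from the retained marker
    m = re.search(r"<!-- (\w+) (Q\d+) (\w+) -->", text)
    return {"function": m.group(1), "wd": m.group(2), "case": m.group(3), "lang": "en"}   # language assumed from context

assert parse(render(abstract)) == abstract   # the round trip gives back the original

Stripping the marker out for the final page is then the lossy step, and the question is only where the stripped-out pieces get kept.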
To relate this back to your proposed interaction, where you say, "The automatic guess in the middle can be directly modified", I hesitate. I would prefer to think of the editor changing the original input or the implied output, rather than the inferred (abstract) content (although I doubt this will always be possible). We can explore the user experience later, but I would certainly expect to see a clear distinction between novel claims inferred from the input and pre-existing claims with which the input is consistent. Suppose the editor said that San Francisco was in North Carolina, for example, or that it is the largest city in Northern California.
I agree about always involving the human in the middle. However... If we develop renderers and parsers as inverses of each other, we can parse masses of existing content with which to test the renderers and understand their current limitations.--GrounderUK (talk) 23:11, 19 July 2020 (UTC)
@GrounderUK: Grammatical Framework has grammars that are indeed bi-directional, and some of the other solutions have that too. To be honest, I don't buy it - it seems that all renderings are always lossy. Just go from a language such as German that uses gendered professions to a language such as Turkish or English that does not. There is a necessary loss of information in the English and Turkish translation. --DVrandecic (WMF) (talk) 00:47, 5 August 2020 (UTC)
@DVrandecic (WMF): Yes, that's why I said "we think of rendering as a "lossy" conversion. But the lesson from this should be that rendering (per se) should be constrained to be "lossless" [...which...] implies that there will be subsequent lossy conversion required to achieve the required end result, but we need to think about ways in which the "final losses" can be retained ... so that ... an editor can modify the end result with a reasonable expectation that the revised version can be parsed back..." [emphasis added]. In your example, we (or she) may prefer "actor" to "actress", but the rendering is more like "actor"<f> (with the sex marker not displayed). In the same way, the renderer might deliver something like "<[[Judi Dench|Dame >Judi<th Olivia> Dench< CH DBE FRSA]]>" to be finally rendered in text as "Judi Dench", or "<[[Judi Dench|>Dame Judi<th Olivia Dench CH DBE FRSA]]>" for "Dame Judi", or just "<[[Judi Dench|>she<]]>" for "she". (No, that's not pretty, but you get the idea. "...we need to think about ways in which the "final losses" can be retained..." ["she", "[[Judi Dench|", "]]"]?)--GrounderUK (talk) 20:48, 6 August 2020 (UTC)
@GrounderUK: Ah, OK, yes, that would work. That seems to require some UX to ensure that these hidden fields get entered when the user writes the content? --DVrandecic (WMF) (talk) 22:11, 6 August 2020 (UTC)
@DVrandecic (WMF):Not sure... I'm talking about rendered output that might be tweaked after it is realised as wikitext. If the wikitext is full of comments, a quick clean-up option would be great. But if the final lossy conversion has already happened, then it's a question of automatically getting back the "final losses" from wherever we put them and merging them back into our wikitext before we start the edit. You might even create the page with the heavily commented wikitext (which looks fine until you want to change it) and then "replace" it with the cleaned up version, so an editor can go back to the original through the page history, if the need arises.
If you're talking about creating the language-neutral content from natural-language input in the first place, that's a whole other UX. But with templates and functions and what-have-you, we could go quite a way with just the familiar wikitext, if we had to (and I have to say that I do usually end up not using the visual editor when it's available, for one reason or another).
Either way, there might be a generic template wrapping a generic function (or whatever) to cover the cases where nothing better is available. Let's call it "unlossy" with (unquoted) arguments being, say, the end-result string and (all optional) the page name, the preceding lost-string, the following lost-string and a link switch. So, {{unlossy|she|Judi Dench| | |unlinked}} or {{unlossy|Dame Judi|Judi Dench| |th Olivia Dench CH DBE FRSA|linked}}. (There are inversions and interpolations etc to consider, but first things first.)
In general, though, (and ignoring formatting) I'd expect there would be something more specific, like {{pronoun|Judi Dench|subject}} or {{pronoun|{{PAGENAME}}|subject}} or {{UKformal|{{PAGENAME}}|short}}. If {{PAGENAME}} and subject are the defaults, it could just be {{pronoun}}. Now, if humans have these templates (or function wrappers) available, why wouldn't the rendering decide to use them in the first place? Instead of rendering, say, {"type":"function call", "function":"anaphora", "wd":"Q28054", "lang":"en", "case":"nominative"} as "she", it renders (also?) as "{{pronoun|Judi Dench}}", which is implicitly {{pronoun|Judi Dench|subject}}, which (given enwiki as the context) maps to something like {"type":"function call", "function":"anaphora", "wd":"Q28054", "lang":"en", "case":"nominative"} (Surprise!). As an editor, I can now just change "pronoun" to "first name" or "UKformal"... and all is as I would expect. That's really not so painful, is it?--GrounderUK (talk) 03:24, 7 August 2020 (UTC)
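In code, that symmetric mapping might look something like this (the key names just copy the example JSON above; none of it is real wiki syntax or an actual function in the system):

def pronoun_to_call(item="Q28054", case="nominative", lang="en"):
    # the wrapper a human edits corresponds to the function call the renderer used
    return {"type": "function call", "function": "anaphora",
            "wd": item, "lang": lang, "case": case}

def call_to_wikitext(call):
    # map the call back to the editable template form, e.g. {{pronoun|Judi Dench}}
    label = {"Q28054": "Judi Dench"}.get(call["wd"], call["wd"])
    return "{{pronoun|" + label + "}}"

print(call_to_wikitext(pronoun_to_call()))   # -> {{pronoun|Judi Dench}}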
Okay, so maybe it really is that painful! It's still not 100% clear to me exactly what an editor would type into Wikipedia to kick off a function. Say I have my stripped down pronoun function, which returns by default the nominative pronoun for the subject of the page ("he", "she", "it" or "they"). I guess it looks like pronoun() in the wiki of functions (with its English label)... but, then, where is it getting its default parameters ("Language":"English", "case":"nominative", "number":"singular", "gender":"feminine") from? Number and gender are a function of "Q28054", so we're somehow defaulting that function in; the case is an internal default or fallback, for the sake of argument; and the language is (magically) provided from the context of the function's invocation. Sounds okay, -ish. Is it too early to say whether I'm close? We don't need to think of this as NLG-specific. What would age("Judi Dench") or today() look like?--GrounderUK (talk) 16:46, 1 September 2020 (UTC)
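Purely as an illustration (reusing the keys from the anaphora call in the comment above; "pronoun" is not a defined function anywhere), the fully defaulted invocation might normalise to something like:

{
  "type": "function call",
  "function": "pronoun",
  "wd": "Q28054",
  "lang": "en",
  "case": "nominative"
}

where "number" and "gender" follow from Q28054, "case" comes from an internal default, and "lang" is supplied by the invocation context if the editor only types pronoun().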

Technical discussion and documentation

  • We need a page documenting how development is progressing. I have drafted Abstract Wikipedia/ZObject, but this is obviously not enough.
  • We need a dedicated page to discuss issues related to development. For example:
    • phab:T258894 does not say how non-string data is handled. How should numbers be stored? (Storing a floating-point number as a string is not a good approach.) What about associative arrays (aka dict/map/object)?
    • We need a reference type to distinguish the object Z123 from the string "Z123", especially when we have functions that accept arbitrary types.

--GZWDer (talk) 14:21, 29 July 2020 (UTC)

@GZWDer: I am not sure what you mean about the page. We have the function model that describes a lot. I will update the ZObject page you have created. We also have the task list and the phases; the latter I am still working on and trying to connect to the respective entries in Phabricator. Let me know what remains uncovered.
Yes, we should have pages (or Phabricator tickets) where we discuss issues related to development, like "How to store number?". That's a great topic. Why not discuss that here?
Yes. The suggestion I have for that is to continue to use the reference and the string types, as suggested in the function model and implemented in AbstractText. Does that resolve the issue? --DVrandecic (WMF) (talk) 21:20, 29 July 2020 (UTC)
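To illustrate the distinction with the string and reference types from the function model (Z123 is only an example ZID):

{
  "Z1K1": "Z6",
  "Z6K1": "Z123"
}

is the four-character string "Z123", whereas

{
  "Z1K1": "Z9",
  "Z9K1": "Z123"
}

is a reference to the ZObject Z123. If I read the function model correctly, a bare "Z123" is interpreted as a reference, so the explicit Z6 form is how the literal string would be passed to a function that accepts arbitrary types.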

Some questions and ideas

(more will be added)

@GZWDer: Thank you for these! They are awesome! Sorry for getting to them slowly, but there's a lot of substance here! --DVrandecic (WMF) (talk) 02:13, 5 August 2020 (UTC)
example
{
 "type": "test",
 "call": {
   "type": "function call",
   "function": "equal positive integer",
   "left": {
     "type": "function call",
     "function": "add",
     "left": "two",
     "right": "two"
   },
   "right": "four"
 }
}
{
 "Z1K1": "Z20",
 "Z20K1": {
   "Z1K1": "Z7",
   "Z7K1": "Z150",
   "Z144K1": {
     "Z1K1": "Z7",
     "Z7K1": "Z144",
     "Z144K1": "Z382",
     "Z144K2": "Z382"
   },
   "Z144K2": "Z384"
 }
}
And if a function is paired with its inverse, you can automatically check that the final output is the same as the initial input. As under #Distillation of existing content, this is an important consideration for a rendering sub-function, whose inverse is a parser sub-function.--GrounderUK (talk) 23:26, 29 July 2020 (UTC)
I really like the idea of using the inverse to auto-generate tests! That should be possible, agreed. Will you create a phab-ticket? --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
Eventually, yes. phab:T261460--GrounderUK (talk) 21:34, 27 August 2020 (UTC)
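As a sketch of that round-trip idea in the same readable notation as the example above (every function name and key here - equal content, parse, render, text, content - is a placeholder, not an existing ZObject):

{
  "type": "test",
  "call": {
    "type": "function call",
    "function": "equal content",
    "left": {
      "type": "function call",
      "function": "parse",
      "text": {
        "type": "function call",
        "function": "render",
        "content": "example content"
      }
    },
    "right": "example content"
  }
}

For a renderer/parser pair, such an auto-generated test checks that parsing the rendered text reproduces the original content.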
As a second thought, we could fold Z20 into Z7 and eliminate Z20 (the Z7 would be required to return a boolean true value to pass).--GZWDer (talk) 07:47, 30 July 2020 (UTC)
Maybe. My thought is that by separating the code that creates the result from the code that checks the result, wouldn't it be easier to be sure that we are testing the right thing? --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
@DVrandecic (WMF): I don't think there will be a clear distinction between call errors and check errors.--GZWDer (talk) 11:14, 13 August 2020 (UTC)
  • I propose a new key for Z1 (Z1K2/quoted). A "quoted" object (and all its subobjects) is never evaluated and will be left as is unless unquoted using the Zxxx/unquote function (exception: an argument reference should be replaced if necessary even inside a quoted object, unless the reference is itself quoted). (There will also be a Zyyy/quote function.) Z3 will have an argument Z3K5/quoted (default: false) to specify the behavior of a constructor (for example, Z4K1 should have Z3K5=true). Similarly, we have a Z17K3. In addition, we add a field K6/feature to all functions, which may take a list of ZObjects, including Zxxx/quote_all.--GZWDer (talk) 08:13, 30 July 2020 (UTC)
    Yes, that's a great idea! I was thinking along similar lines. My current working idea was to introduce a new type, "Z99/Quoted object", and additionally to have a marker on the Z3/Key, just as you say, to state that it is "auto-quoted", and then have an unquote function. Would that be sufficient for all use cases, or am I missing something? All the keys marked as identity would be auto-quoted. But yes, I really like this idea, thanks for capturing it. We will need that. Let's create a ticket! --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
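A minimal sketch of what such a quoted object might look like in the readable notation used elsewhere on this page (the type name "quoted object" and its single key "value" are only working assumptions taken from the comment above; nothing is decided):

{
  "type": "quoted object",
  "value": {
    "type": "function call",
    "function": "add",
    "left": "two",
    "right": "two"
  }
}

The embedded call would be treated as plain data by the evaluator; only applying the unquote function would hand it back for evaluation.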
  • I propose to introduce a Z1K3/refid field (originally proposed as Z2K4/refid; see the "second thought" further down) that will hold the ZID of a persistent object. Similar to Z2K1, the value is always a string, but it is held inside the value, like
example
{
  "type": "persistent object",
  "id": "Z382",
  "value": {
    "type": "positive integer",
    "refid": "Z382",
    "decimal representation": "2"
  },
  "label": {
    "type": "multilingual text",
    "texts": [
      {
        "type": "text",
        "language": "English",
        "text": "two"
      },
      {
        "type": "text",
        "language": "German",
        "text": "zwei"
      }
    ]
  }
}
{
  "Z1K1": "Z2",
  "Z2K1": "Z382",
  "Z2K2": {
    "Z1K1": "Z70",
    "Z2K4": "Z382",
    "Z70K1": "2"
  },
  "Z2K3": {
    "Z1K1": "Z12",
    "Z12K1": [
      {
        "Z1K1": "Z11",
        "Z11K1": "Z251",
        "Z11K2": "two"
      },
      {
        "Z1K1": "Z11",
        "Z11K1": "Z254",
        "Z11K2": "zwei"
      }
    ]
  }
}

The ID will be removed when a new ZObject is created on the fly. This will support a ZID() function which returns the ZID of a specific object (e.g. ZID(one)="Z382" and ZID(type(one))="Z70"). Any object created on the fly has an empty string as its ZID. (Note this is not the same as Z4K1/(identity or call), as Z1K3 may only be a string and may only be found in the value of all kinds of Z2s (not ad-hoc created objects) regardless of type.--GZWDer (talk) 15:50, 30 July 2020 (UTC))--GZWDer (talk) 08:23, 30 July 2020 (UTC)

I'm not sure what your "one" represents. Are you saying ZID(Z382) returns the string "Z382"? So, the same as ZID(Z2K4(Z382))?--GrounderUK (talk) 10:52, 30 July 2020 (UTC)
Yes, ZID(Z382) will return the string "Z382". For the second question: a key is not intended to be used as a function, so we need to define a builtin function such as getvalue(Z382, "Z2K4"), which will return an ad-hoc string "Z382" that does not have a ZID.--GZWDer (talk) 11:03, 30 July 2020 (UTC)
Ok, thanks. So, ZID() just returns the string value of whatever is functioning as the key, whether it's a Z2K1 (like "Z382") or a Z1K1 (like "Z70")...? But "Z70" looks like the Z1K1/type of the Z2K2/value rather than the Z1K1/type of Z382 (which is "Z2", as I interpret it). --GrounderUK (talk) 12:34, 30 July 2020 (UTC)
"A Z9/Reference is a reference to the Z2K2/value of the ZObject with the given ID, and means that this Z2K2/value should be inserted here." and the parameter of ZID is implicitly a reference (so only Z2K2 is passed to the function). The ZID function can not see other part of a Z2 like label. Second thought, the key should be part of Z1 instead of Z2.--GZWDer (talk) 15:14, 30 July 2020 (UTC)
Thank you for bearing with me; I missed the bit about Z9/reference! To be clear, the argument/parameter to the proposed function is or resolves to "just a string", which might be Z9/reference. If it's not a Z9/reference, the function returns an empty string ("")? When it is a Z9/reference, it is implicitly the Z2K2/value of the referenced object. The function then returns the Z1K3/refid of the referenced object (as a string) or, if there is no such Z1K3/refid (it is a non-existent object or a transient object), it returns an empty string. I'm not sure, now, whether the intent is for the Z1K3/refid to remain present in a Z2/persistent object (and equal to the Z2K1/id if the object is well-formed). Your new note above (underlined) does not say that the Z1K3 must be a string and must be present in every Z2/persistent object and must not be present in a transient object (although a transient object does not support Z9/reference, as I see it).--GrounderUK (talk) 08:47, 31 July 2020 (UTC)
Ah, I missed this discussion before I made this edit today. Would that solve the same use cases? I am a bit worried that the Z1K3 will cause issues such as "sometimes when I get a number 2 it has a pointer to the persistent object with its labels, and sometimes it doesn't", which has crept up repeatedly. --DVrandecic (WMF) (talk) 02:11, 5 August 2020 (UTC)
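For concreteness, a sketch of the proposed call in the readable notation (the argument label "object" is made up here, and ZID itself has no assigned ZObject yet):

{
  "type": "function call",
  "function": "ZID",
  "object": "two"
}

Because "two" resolves, via Z9/reference, to the Z2K2/value of Z382, the result would be the plain string "Z382"; for an ad-hoc object without a refid, the result would be the empty string.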
  • Ambiguity of Local Keys: In this document it is unclear how local keys are resolved.
example
For example, we have a Z999/cons type, which is equivalent to pair but with keys x and y, and we create a type Z22/Pair(Z999/cons(Z70/Positive Integer, Z70/Positive integer), Z70/Positive integer):


{
 "type": {
   "type": "function call",
   "function": "pair",
   "first": {
     "type": "function call",
     "function": "cons",
     "x": "positive integer",
     "y": "positive integer"
   },
   "second": "positive integer"
 },
 "first": {
   "type": {
     "type": "function call",
     "function": "cons",
     "x": "positive integer",
     "y": "positive integer"
   },
   "x": "one",
   "y": "two"
  },
 "second": "three"
}
{
 "Z1K1": {
   "Z1K1": "Z7",
   "Z7K1": "Z22",
   "Z22K1": {
     "Z1K1": "Z7",
     "Z7K1": "Z999",
     "Z999K1": "Z70",
     "Z999K2": "Z70"
   },
   "Z22K2": "Z70"
 },
 "K1": {
   "Z1K1": {
     "Z1K1": "Z7",
     "Z7K1": "Z999",
     "Z999K1": "Z70",
     "Z999K2": "Z70"
   },
   "K1": "Z381",
   "K2": "Z382"
  },
 "K2": "Z383"
}

The K1 and K2 keys do not mean the same thing in the two places.--GZWDer (talk) 11:44, 30 July 2020 (UTC)

The de facto practice is: if the Z1K1 of a ZObject is Z7, then local keys refer to its Z7K1; otherwise they refer to its Z1K1.--GZWDer (talk) 14:58, 30 July 2020 (UTC)
Sorry, maybe I am missing something, but in the given example it seems clear for each K1 and K2 what they mean? Or is your point that, over the whole document, the two different K1s and K2s mean different things? The latter is true, but not problematic, right? Within each level of a ZObject, it is always unambiguous what K1 and K2 mean, I hope (if not, that would indeed be problematic). There's an even simpler case where K1 and K2 have different meanings, e.g. every time you have embedded function calls with positional arguments, e.g. add(K1=multiply(K1=1, K2=1), K2=0). That seems OK? (Again, maybe I am just being dense and missing something.) --DVrandecic (WMF) (talk) 02:22, 5 August 2020 (UTC)
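To make the per-level resolution concrete, here is a small sketch reusing ZIDs from the test example further up (Z144/add, Z382/two), written with local keys; it corresponds to add(add(two, two), two):

{
  "Z1K1": "Z7",
  "Z7K1": "Z144",
  "K1": {
    "Z1K1": "Z7",
    "Z7K1": "Z144",
    "K1": "Z382",
    "K2": "Z382"
  },
  "K2": "Z382"
}

Under the de facto practice described above, each K1/K2 is resolved against the Z7K1 of the object it sits in, so the same local keys can appear at both levels without ambiguity.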
  • I propose a new type of ZObject, Zxxx/module. A module has a key ZxxxK1 whose value is a list of Z2s. The Z2s may have the same IDs as global objects. We introduce a new key Z9K2/module and a new ZObject Zyyy/ThisModule, which make ZObjects in a module able to refer to ZObjects in other modules. This will make functions portable and prevent polluting the global namespace. We may also introduce a modulename.zobjectname (or modulename:zobjectname) syntax for referring to an individual ZObject of a module in a composition expression. (Note a module may use ZObjects from other modules, but it would be better to create a function that takes the required module as a parameter, so that a module will not rely on global ZObjects other than builtins.)--GZWDer (talk) 19:13, 30 July 2020 (UTC)
    I understand your use case and your solution. I think having a single flat namespace is conceptually much easier. Given the purely functional model, I don't see much advantage to this form of information hiding - besides modules being more portable between installations (a topic that I intentionally dropped for now). But couldn't portability be solved by a namespacing model similar to the way RDF and XML do it? --DVrandecic (WMF) (talk) 02:25, 5 August 2020 (UTC)
  • Placeholder type: We introduce a "placeholder" type, to provide a (globally or locally) unique localized identifier. Every placeholder object is different. See below for the use case.--GZWDer (talk) 03:56, 31 July 2020 (UTC)
  • Class member: I propose to add
    • a new type Zxxx/member; an instance of member has two keys: ZxxxK1/key and ZxxxK2/value, both arbitrary ZObjects. If the key does not need to be localized, it may simply be a string. If it needs to be localized, it is recommended to use a placeholder object for it.
    • a new key Z4K4/members, whose value is a list of members. Note all members are static and not related to a specific instance.
    • new builtin functions member_property and member_function: member_property(Zaaa, Zbbb) will return the ZxxxK2/value of the member of the type of Zaaa (i.e. the Z1K1 of Zaaa, not Zaaa itself) whose ZxxxK1/key equals Zbbb. member_function is similar to member_property, but only works if the member value is a function, and the result is a new function with the first parameter of that function bound to Zaaa. It therefore returns a function related to a specific instance (a sketch of the intended call pattern follows below).

--GZWDer (talk) 03:56, 31 July 2020 (UTC)
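A sketch of the intended call pattern in the readable notation (every name below - the argument labels "instance" and "key", and the member "successor" - is a placeholder; none of these ZObjects exist):

{
  "type": "function call",
  "function": "member_property",
  "instance": "two",
  "key": "successor"
}

This would look up the member whose key is "successor" on the type of two (i.e. positive integer) and return its value; member_function with the same arguments would instead return that value as a function with its first parameter already bound to two.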

  • Inherit type: I propose to add a new key Z4K5/inherit, whose value is a list of ZObjects.
    • An inherited type will have all members of the parent type(s), and also all keys. (The parent type should be persistent, so that it will be possible to create an instance with specific keys - the keys may consist of those defined in the child type and those defined in the parent type(s). This may be overcome - I have two schemes to define a relative key, but both have some weak points. One solution is discussed in "Temporary ZID" below--GZWDer (talk) 18:11, 4 August 2020 (UTC)) Members defined in the child type will override any member defined in the parent type(s).
    • We introduce a builtin function isderivedfrom() to query whether a type is a child type of another.
    • This will make it possible to build functions for arbitrary types derived from a specific interface (an interface being itself a type with no keys), such as Serializable or Iterator (a sketch follows below).
      • An iterator (a type derived from the Iteratible type) is simply any type with a next function, which generates a "next state" from the current state. Examples are a generator of all (infinitely many) prime numbers, or a cursor over database query results.
      • We would be able to create a general filter function for an arbitrary iterator, which will itself return an iterator.

--GZWDer (talk) 03:56, 31 July 2020 (UTC)
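A rough sketch of how the proposal might read in the readable notation used elsewhere on this page (every name below is hypothetical, and the "inherit" and "members" keys exist only in this proposal):

{
  "type": "type",
  "identity": "prime generator",
  "keys": ["current prime"],
  "validator": "validate prime generator",
  "inherit": ["Iteratible"],
  "members": [
    {
      "type": "member",
      "key": "next",
      "value": "next prime"
    }
  ]
}

A generic filter function could then accept anything for which isderivedfrom(Iteratible) holds and would itself return an Iteratible.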

  • @GZWDer: I raised inheritance with Denny a while back; I agree there needs to be a mechanism for it, and it's already implicit in the way Z1 keys work. But I wonder if it needs to be declared... If a ZObject has keys that belong to another, doesn't that implicitly suggest it inherits the meaning of those keys? Or there could be subdomains of ZObjects under some of which typing is stricter than for others (i.e. inherit from "typed object" vs inherit from plain "ZObject")? Typing and inheritance can get pretty complicated though and perhaps is only necessary for some purposes. ArthurPSmith (talk) 17:27, 31 July 2020 (UTC)
    @GZWDer: Agreed, that would be a way to implement OO mechanisms in the suggested model. And I won't stop anyone from doing it. My understanding is that this would all work *on top* of the existing model. I hope to be able to avoid putting inheritance and subtyping into the core system, as leaving them out makes for a much simpler system. But it should be powerful enough to implement them on top. Fortunately, this would not require a change in the current plans, if I see this correctly. Sounds right? --DVrandecic (WMF) (talk) 02:33, 5 August 2020 (UTC)
  • I propose a new key Z2K4/Serialization version to mitigate breaking changes to the serialization format. For example, "X123abcK1" is not a valid key, but I propose to use such keys below.--GZWDer (talk) 18:11, 4 August 2020 (UTC)
    This might be needed. The good thing is that we can assume this to be already here, have a default value of "version 1", and introduce that key and the next value whenever we need it. So, yes, will probably be needed at some point. (I hope to push this point as far to the future as possible :) ) --DVrandecic (WMF) (talk) 02:36, 5 August 2020 (UTC)
  • Temporary ZID: we introduce a new key Z1K4/Temporary ZID. A temporary ZID may have the format Xabc123, where abc123 is a random series of hexadecimal digits (alternatively, we can use only decimal digits). For a transient object without a Z1K4 specified, a random temporary ZID will be generated (which is not stable). Use cases:
    • As the key for a basic unit of ZObjects used by evaluators; i.e. when evaluating, we use a pool of ZObjects to reduce redundant evaluation.
    • One of the solutions for "relative keys" (see above) - the XID can easily be used to form a key like X123abcK1.
    • A new serialization format to reduce duplication: a ZObject may contain a large number of identical embedded ZObjects.
    Some other notes:
    • For easier evaluation, the temporary ZID should be globally unique. However, it is not easy to guarantee this, especially if the temporary ZID is editable.
    • When a ZObject is changed, it should get a new temporary ZID. But similarly, this is not easy to guarantee.
    • We introduce a new function is() to check whether two objects have the same temporary ZID (ZObjects created on the fly have random temporary ZIDs, so they are not equivalent to other ZObjects).
    • This may make it possible to have ZObjects whose subobjects rely on each other (such as a pair whose first element points to the pair itself). We should discuss whether this should be allowed. Such objects do not have a finite (traditional) serialization and may break other functions such as equals_to. If such objects are not allowed, an alternative is to use a "hash" to refer to a specific ZObject.
      • Note that how equal_to will work for two custom objects is itself an epic.
example
{
  "type": "persistent object",
  "id": "Z70",
  "value": {
    "type": "type",
    "identity": "positive integer",
    "keys": ["X7cc26b14"],
    "validator": "validate positive integer",
    "refid": "Z70"
  },
  "label": {
    "type": "multilingual text",
    "texts": [
      {
        "type": "text",
        "language": "English",
        "text": "positive integer"
      },
      {
        "type": "text",
        "language": "German",
        "text": "natürliche Zahl"
      }
    ]
  },
  "X7cc26b14": {
    "type": "key",
    "value type": "string",
    "key id": "Z70K1",
    "label": "X82af0232",
    "default": "value required",
    "temporaryzid": "X7cc26b14"
  },
  "X82af0232": {
    "type": "multilingual text",
    "texts": ["Xa3321b14", "Xf4422211"],
    "temporaryzid": "X82af0232"
  },
  "Xa3321b14": {
    "type": "text",
    "language": "English",
    "text": "decimal representation",
    "temporaryzid": "Xa3321b14"
  },
  "Xf4422211": {
    "type": "text",
    "language": "German",
    "text": "Dezimaldarstellung",
    "temporaryzid": "Xf4422211"
  }
}
{
  "Z1K1": "Z2",
  "Z2K1": "Z70",
  "Z2K2": {
    "Z1K1": "Z4",
    "Z4K1": "Z70",
    "Z4K2": ["X7cc26b14"],
    "Z4K3": "Z559",
    "Z1K3": "Z70"
  },
  "Z2K3": {
    "Z1K1": "Z12",
    "Z12K1": [
      {
        "Z1K1": "Z11",
        "Z11K1": "Z251",
        "Z11K2": "positive integer"
      },
      {
        "Z1K1": "Z11",
        "Z11K1": "Z254",
        "Z11K2": "natürliche Zahl"
      }
    ]
  },
  "X7cc26b14": {
    "Z1K1": "Z3",
    "Z3K1": "Z6",
    "Z3K2": "Z70K1",
    "Z3K3": "X82af0232",
    "Z3K4": "Z25",
    "Z1K4": "X7cc26b14"
  },
  "X82af0232": {
    "Z1K1": "Z12",
    "Z12K1": ["Xa3321b14", "Xf4422211"],
    "Z1K4": "X82af0232"
  },
  "Xa3321b14": {
    "Z1K1": "Z11",
    "Z11K1": "Z251",
    "Z11K2": "decimal representation",
    "Z1K4": "Xa3321b14"
  },
  "Xf4422211": {
    "Z1K1": "Z11",
    "Z11K1": "Z254",
    "Z11K2": "Dezimaldarstellung",
    "Z1K4": "Xf4422211"
  }
}
example (alternative 2)
{
  "type": "persistent object",
  "id": "Z70",
  "value": "X6a9c1723",
  "label": {
    "type": "multilingual text",
    "texts": [
      {
        "type": "text",
        "language": "English",
        "text": "positive integer"
      },
      {
        "type": "text",
        "language": "German",
        "text": "natürliche Zahl"
      }
    ]
  },
  "X6a9c1723": {
    "type": "type",
    "identity": "positive integer",
    "keys": ["X7cc26b14"],
    "validator": "validate positive integer",
    "temporaryzid": "X6a9c1723"
  },
  "X7cc26b14": {
    "type": "key",
    "value type": "string",
    "key id": "Z70K1",
    "label": "X82af0232",
    "default": "value required",
    "temporaryzid": "X7cc26b14"
  },
  "X82af0232": {
    "type": "multilingual text",
    "texts": ["Xa3321b14", "Xf4422211"],
    "temporaryzid": "X82af0232"
  },
  "Xa3321b14": {
    "type": "text",
    "language": "English",
    "text": "decimal representation",
    "temporaryzid": "Xa3321b14"
  },
  "Xf4422211": {
    "type": "text",
    "language": "German",
    "text": "Dezimaldarstellung",
    "temporaryzid": "Xf4422211"
  }
}
{
  "Z1K1": "Z2",
  "Z2K1": "Z70",
  "Z2K2": "X6a9c1723",
  "Z2K3": {
    "Z1K1": "Z12",
    "Z12K1": [
      {
        "Z1K1": "Z11",
        "Z11K1": "Z251",
        "Z11K2": "positive integer"
      },
      {
        "Z1K1": "Z11",
        "Z11K1": "Z254",
        "Z11K2": "natürliche Zahl"
      }
    ]
  },
  "X6a9c1723": {
    "Z1K1": "Z4",
    "Z4K1": "Z70",
    "Z4K2": ["X7cc26b14"],
    "Z4K3": "Z559",
    "Z1K4": "X6a9c1723"
  },
  "X7cc26b14": {
    "Z1K1": "Z3",
    "Z3K1": "Z6",
    "Z3K2": "Z70K1",
    "Z3K3": "X82af0232",
    "Z3K4": "Z25",
    "Z1K4": "X7cc26b14"
  },
  "X82af0232": {
    "Z1K1": "Z12",
    "Z12K1": ["Xa3321b14", "Xf4422211"],
    "Z1K4": "X82af0232"
  },
  "Xa3321b14": {
    "Z1K1": "Z11",
    "Z11K1": "Z251",
    "Z11K2": "decimal representation",
    "Z1K4": "Xa3321b14"
  },
  "Xf4422211": {
    "Z1K1": "Z11",
    "Z11K1": "Z254",
    "Z11K2": "Dezimaldarstellung",
    "Z1K4": "Xf4422211"
  }
}

--GZWDer (talk) 18:04, 4 August 2020 (UTC)

I agree with the use cases for Z1K4. In AbstractText I solved these use cases by either taking a hash of the object, or a string serialization - i.e. the object representation is its own identity. Sometimes, the evaluator internally added such a temporary ID and used that, IIRC. But that all seems to be something that is only interesting within the confines of an evaluation engine, right? And an evaluation engine should be free to do these modifications (and much more) as it wishes. And there such solutions will be very much needed - but why would we add those to the function model and to Z1 in general? We wouldn't store those in the wiki, they would just be used in the internal implementation of the evaluator - or am I missing something? So yes, you are right, this will be required - but if I understand it correctly, it is internal, right? --DVrandecic (WMF) (talk) 02:42, 5 August 2020 (UTC)