Talk:Abstract Wikipedia

From Meta, a Wikimedia project coordination wiki


Might deep learning-based NLP be more practical?[edit]

First of all, I'd like to state that Abstract Wikipedia is a very good idea. I applaud Denny and everyone else who has worked on it.

This is kind of a vague open-ended question, but I didn't see it discussed so I'll write it anyway. The current proposal for Wikilambda is heavily based on a generative-grammar type view of linguistics; you formulate a thought in some formal tree-based syntax, and then explicitly programmed transformational rules are applied to convert the output to a given natural language. I was wondering whether it would make any sense to make use of connectionist models instead of / in addition to explicitly programmed grammatical rules. Deep learning based approaches (most impressively, Transformer models like GPT-3) have been steadily improving over the course of the last few years, vindicating connectionism at least in the context of NLP. It seems like having a machine learning model generate output text would be less precise than the framework proposed here, but it would also drastically reduce the amount of human labor needed to program lots and lots of translational rules.

A good analogy here would be Apertium vs. Google Translate/DeepL. Apertium, as far as I understand it, consists of a large number of rules programmed manually by humans for translating between a given pair of languages. Google Translate and DeepL are just neural networks trained on a huge corpus of input text. Apertium requires much more human labor to maintain, and its output is not as good as its ML-based competitors. On the other hand, Apertium is much more "explainable". If you want to figure out why a translation turned out the way it did (for example, to fix it), you can find the rules that caused it and correct them. Neural networks are famously messy and Transformer models are basically impossible to explain.

Perhaps it would be possible to combine the two approaches in some way. I'm sure there's a lot more that could be said here, but I'm not an expert on NLP so I'll leave it at that. PiRSquared17 (talk) 23:46, 25 August 2020 (UTC)

@PiRSquared17: thank you for this comment, and it is a question that I get asked a lot.
First, yes, the success of ML systems in the last decade has been astonishing, and I am amazed by how much the field has developed. I had one prototype that was built around an ML system, but even more than the Abstract Wikipedia proposal it exhibited a Matthew effect - the languages that were already best represented benefitted from that architecture the most, whereas the languages that needed the most help would get the least of it.
Another issue is, as you point out, that within a Wikimedia project I would expect the ability for contributors to go in and fix errors. This is considerably easier with the symbolic approach chosen for Abstract Wikipedia than with an ML-based approach.
Having said that, there are certain areas where I will rely on ML-based solutions in order to get them working. This includes an improved UX to create content, and it includes analysis of the existing corpora as well as of the generated corpora. There is even the possibility of using an ML-based system to do the surface cleanup of the text to make it more fluent - basically, to have an ML-based system do copy-editing on top of the symbolically generated text, which could have the potential to reduce the complexity of the renderers considerably and yet achieve good fluency - but all of these are ideas.
In fact, I am planning to write a page here where I outline possible ML tasks in more detail.
@DVrandecic (WMF): Professor Reiter ventured an idea or two on this topic a couple of weeks ago: "[It] may be possible to use GPT3 as an authoring assistant (eg, for developers who write NLG rules and templates), for example suggesting alternative wordings for NLG narratives. This seems a lot more plausible to me than using GPT3 for end-to-end NLG."--GrounderUK (talk) 18:22, 31 August 2020 (UTC)
@GrounderUK: With respect to GPT-3, I am personally more interested in things like for the (cross-linguistic) purposes of Abstract Wikipedia. You might be able to imagine strengthening Wikidata and/or assisting abstract content editing. --Chris.Cooley (talk) 22:33, 31 August 2020 (UTC)
@Chris.Cooley: I can certainly imagine such a thing. However it happens, the feedback into WikidataPlusPlus is kinda crucial. I think I mentioned the "language-neutral synthetic language that inevitably emerges" somewhere (referring to Wikidata++) as being the only one we might have a complete grammar for. Translations (or other NL-type renderings) into that interlingua from many Wikipedias could certainly generate an interesting pipeline of putative data. Which brings us back to #Distillation of existing content (2nd paragraph)...--GrounderUK (talk) 23:54, 31 August 2020 (UTC)
@DVrandecic (WMF): May we go ahead and create such an ML page, if you haven't already? James Salsman (talk) 19:00, 16 September 2020 (UTC)
Now it could be that ML and AI will develop at such a speed as to make Abstract Wikipedia superfluous. But to be honest (and that's just my point of view), given the development of the field in the last ten years, I don't see that moment being considerably closer than it was five years ago (but also, I know a number of teams working on a more ML-based solution to this problem, and I honestly wish them success). So personally I think there is a window of opportunity for Abstract Wikipedia to help billions of people for quite a few years, and to allow many more to contribute to the world's knowledge sooner. I think that's worth it.
Amusingly, if Abstract Wikipedia succeeds, I think we'll actually accelerate the moment when we make its existence unnecessary. --DVrandecic (WMF) (talk) 03:55, 29 August 2020 (UTC)
I will oppose introducing any deep learning technique, as it is 1. difficult to develop, 2. difficult to train, 3. difficult to generalize, and 4. difficult to maintain.--GZWDer (talk) 08:06, 29 August 2020 (UTC)
It could be helpful to use ML-based techniques for auxiliary features, such as parsing natural language into potential abstract contents for users to choose from or modify, but using such techniques for rendered text might not be a good idea, even if it's just used on top of symbolically generated text. For encyclopedic content, accuracy/preciseness is much more important than naturalness/fluency. As suggested above, "brother of a parent" could be an acceptable fallback solution for the "uncle" problem, even if it doesn't sound natural in a sentence. While an ML-based system will make sentences more fluent, it could potentially turn a true statement into a false one, which would be unacceptable. Actually, many concerns raised above, including the "uncle" problem, could turn out to be advantages of the current rule-based approach over an ML-based one. Although those are challenging issues we need to address, it would be more challenging for ML/AI to resolve them. --Stevenliuyi (talk) 23:41, 12 September 2020 (UTC)

Hello everyone. Somewhat late to the party, but I recently proposed a machine-learning data curation that is, I think, germane to this discussion. I propose using machine learning to re-index Wikipedias to paraphrases. FrameNet and WordNet have been working on the semantics problem that Abstract is trying to address. The problem they face is the daunting task of manually building out their semantic nuggets. Not to mention the specter over every decision's shoulder: opinion. With research telling us that sentences follow Zipf's Law, paraphrases become the sentence equivalent of words' synonyms. It also means our graph is completely human-readable, as we retain context for each sentence member of the paraphrase node. You could actually read a Wikipedia article directly from the graph. This way, we can verify the machine learning algorithms. Just like a thesaurus lists all of the words with the same meaning, our graph would illustrate all of the concepts with the same meaning, and the historic pathways used to get to related concepts. Further, by mapping Wikidata meta to the paraphrase nodes, we would have a basic knowledge representation. The graph is directed, which will also assist function logic for translating content. Extending this curation to the larger Internet brings new capabilities into play. I came at this from a decision-support-for-work-management perspective. My goal was to help identify risks and opportunities in our work. I realized that I needed to use communication information to get to predictive models like Polya's Urn with Innovation Triggering. I realized along the way that this data curation is foundational, and must be under the purview of a collective rather than some for-profit enterprise. So here I am. I look forward to discussing this concept while learning more about Abstract. Please excuse my ignorance as I get up to speed on this wonderfully unique culture.--DougClark55 (talk) 21:33, 11 January 2021 (UTC)

Parsing Word2Vec models and generally[edit]

Moved to Talk:Abstract Wikipedia/Architecture#Constructors. James Salsman (talk) 20:36, 16 September 2020 (UTC)

Naming the wiki of functions[edit]

We've started the next steps of the process for selecting the name for the "wiki of functions" (currently known as Wikilambda), at Abstract Wikipedia/Wiki of functions naming contest. We have more than 130 proposals already in, which is far more than we expected. Thank you for that!

On the talk page some of you have raised the issue that this doesn't allow for effective voting, because most voters will not go through 130+ proposals. We've adjusted the process based on the discussion. We're trying an early voting stage, to hopefully help future voters by emphasizing the best candidates.

If you'd like to help highlight the best options, please start (manually) adding your Support votes to the specific proposals, within the "Voting" sub-sections, using: * {{support}} ~~~~

Next week we'll split the list into two, emphasizing the top ~20 or so, and continue wider announcements for participation, along with hopefully enabling the voting-button gadget for better accessibility. Cheers, Quiddity (WMF) (talk) 00:18, 23 September 2020 (UTC)

Merging the Wikipedia Kids project with this[edit]

I also proposed a Wikipedia Kids project, and somebody said that I should merge it with Abstract Wikipedia. Is this possible? Eshaan011 (talk) 16:50, 1 October 2020 (UTC)

Hi, @Eshaan011: Briefly: Not at this time. In more detail: That's an even bigger goal than our current epic goal, and whilst it would theoretically become feasible to have multiple levels of reading difficulty (albeit still very complicated, both technically and socially) once the primary goal is fully implemented and working successfully, it's not something we can commit to, so we cannot merge your proposal here. However, I have updated the list of related proposals at Childrens' Wikipedia to include your proposal and some other older proposals that were missing, so you may wish to read those, and potentially merge your proposal into one of the others. I hope that helps. Quiddity (WMF) (talk) 20:06, 1 October 2020 (UTC)
@Eshaan011 and Quiddity (WMF): Ultimately, I agree with Quiddity but, in the real world, not so much. We have already discussed elsewhere the possibility of different levels of content and respecting the editorial policies of different communities. Goals such as these imply that language-neutral content will be filtered or concealed by default in some contexts. Furthermore, the expressive power of different languages will not always be equally implemented, so we will need to be able to filter out (or gracefully fail to render) certain content in certain languages. Add to this the fact that language-neutral content will be evolving over a very long time, so we might expect to begin with (more) basic facts (more) simply expressed in a limited number of languages. To what extent the community will be focusing on extending the provision of less basic facts in currently supported languages rather than basic facts in more languages is an open question. It does not seem to me to be unreasonable, however, to expect that we might begin to deliver some of the suggested levelled content as we go along, rather than after we have substantially completed any part of our primary goal. (Quite why we would take the trouble to find a less simple way of expressing existing language-neutral content is unclear; perhaps it would be a result of adopting vocabulary specific to the subject area (jargon). In any event, I would expect some editorial guidelines to emerge here.)--GrounderUK (talk) 15:31, 4 November 2020 (UTC)

Why define input or output parameters for pure "functions"?[edit]

Some functions may also take as input *references* to other functions that will be called internally (as callbacks that may be used to return data, or for debugging, tracing... or for hinting the process that will compute the final result: imagine a "sort()" function taking a custom "comparator" function as one of its inputs).
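As a purely illustrative sketch (not project code; the names sort_with and by_length are invented here), a function taking a comparator callback as one of its inputs might look like this in Python:

```python
from functools import cmp_to_key

# Hypothetical sketch of a sort() that takes a custom "comparator"
# function as one of its inputs, as described above.
def sort_with(items, comparator):
    """Sort items using a caller-supplied comparison callback."""
    return sorted(items, key=cmp_to_key(comparator))

# Example comparator: order strings by length, then alphabetically.
def by_length(a, b):
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)

print(sort_with(["pear", "fig", "apple"], by_length))  # ['fig', 'pear', 'apple']
```

The comparator here is itself a value passed to the function, which is the "reference to another function" pattern described above.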

Note as well that, formally, functions are not imperative about the roles assigned to input and output parameters: imagine the case of invertible inferences, like "sum(2,3,:x)" returning "{x=5}" and "sum(:x,3,5)" returning "{x=2}", as in Smalltalk, Prolog and other AI languages: no need to define multiple functions if we develop the concept of "pure" functions *without* side effects.
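A rough sketch of that invertible-sum idea in Python (purely illustrative; the UNBOUND marker stands in for the ":x" placeholder above, and sum_rel is an invented name):

```python
# Illustrative sketch: one "pure" sum relation a + b = c, where any
# single argument may be left unbound and is then solved for.
UNBOUND = object()  # marker playing the role of ":x"

def sum_rel(a, b, c):
    """Relation a + b = c; exactly one argument must be UNBOUND.
    Returns the solved value of the unbound argument."""
    unknowns = [v for v in (a, b, c) if v is UNBOUND]
    if len(unknowns) != 1:
        raise ValueError("exactly one argument must be unbound")
    if c is UNBOUND:
        return a + b   # sum(2, 3, :x) -> {x = 5}
    if a is UNBOUND:
        return c - b   # sum(:x, 3, 5) -> {x = 2}
    return c - a       # sum(2, :x, 5) -> {x = 3}

print(sum_rel(2, 3, UNBOUND))   # 5
print(sum_rel(UNBOUND, 3, 5))   # 2
```

One function definition thus covers what would otherwise be three separate functions (add, and two subtractions), which is the point being made about binding determining the input role.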

So what is important is the type of all input and output parameters together, without the need to restrict any one of them to being an output: binding parameters to values is what assigns them the role of input. The function will return one or more solutions, or could return another function representing the set of solutions.

And to handle errors/exceptions, we need an additional parameter (usually defined as an input, but we can make inferences on errors as well). Errors/exceptions are just another datatype.

What will be significant is just the type signature of all parameters (input or output, including error types) and a way to perform type inference to select a suitable implementation that can reduce the set of solutions.
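One minimal way to picture "selecting a suitable implementation from the type signature" in Python (the registry, register, and call names are invented for this sketch; real type inference would be far richer):

```python
# Hypothetical sketch: a registry keyed by the tuple of parameter types;
# errors are returned as ordinary values ("just another datatype").
IMPLEMENTATIONS = {}

def register(*sig):
    """Associate an implementation with a parameter-type signature."""
    def deco(fn):
        IMPLEMENTATIONS[sig] = fn
        return fn
    return deco

@register(int, int)
def add_ints(a, b):
    return a + b

@register(str, str)
def concat(a, b):
    return a + b

def call(a, b):
    # Select an implementation from the combined type signature.
    impl = IMPLEMENTATIONS.get((type(a), type(b)))
    if impl is None:
        return TypeError("no implementation for this signature")
    return impl(a, b)

print(call(2, 3))        # 5
print(call("ab", "cd"))  # abcd
```

Note that the unmatched-signature case yields an error value rather than raising, in the spirit of treating errors as data.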

-- verdy_p (talk) 09:30, 29 October 2020 (UTC)

A question about the logo voting[edit]

Question: I have a question about the logo voting. Is the page Abstract Wikipedia/Logo about Abstract Wikipedia or the wiki of functions? (Or about both?) --Atmark-chan <T/C> 06:56, 3 December 2020 (UTC)

The page is a draft, for early propositions; for now there will be no specific logo for Abstract Wikipedia itself, which will not start for at least a year (and even then it will not be a new wiki, but an overall project across Wikimedia projects, to integrate Wikifunctions, Wikidata and other tools into existing Wikipedia projects).
The content of that page will be updated in the coming discussions about the final organization of the vote. The draft is just there to allow people to prepare and reference their proposals (I just made the first proposal; others are welcome, and the structure for submitting multiple proposals and organizing them is not really decided). For this reason, that page is a draft to be completed. Note also that there's no urgency for the logo: even the early wiki for Wikifunctions will remain a draft for many months and is likely to change a lot in the coming year, so we have plenty of time to decide on a logo. Note too that the current alpha test wikis use an early custom logo, seen on the Facebook page of the project, but a very poor one; it was based on some older presentations of the project before its approval by the WMF.
Wikifunctions, however, is not just for Wikipedia and will certainly have uses on all other non-Wikipedia projects, or even projects outside Wikimedia, including non-wiki sites.
The proposals to submit soon will be for the new wiki being built very soon for Wikifunctions only (note: the name was voted on, but is still not official; it will be announced in a couple of weeks, as we are waiting for a formal decision by the WMF after legal review; this is independent of the logo). I don't think we need to include the project name in the logo. During the evaluation of names, it was decided that the name should be easily translatable, so it will likely be translated: translating the project name inside the logo can be done separately and more easily if the logo does not include the name, which can be composed later if we want it, or could be displayed dynamically on the wiki, in plain HTML, without modifying the logo. A tool could also automatically precompose several variants of the logo with different rendered names, and then display the composed logo image according to the user's language, if MediaWiki supports that.
How all the other projects will be coordinated is still undecided; only Wikifunctions has been approved as a new wiki, plus the long-term project for Abstract Wikipedia (which will require coordination with each Wikipedia edition, and further development for the integration of Wikifunctions and Wikidata).
In one year or so, it will be time to discuss Abstract Wikipedia, and its logo (if one is needed) should then focus on Wikipedia.
Note also that Wikifunctions will work with a separate extension for MediaWiki, which will have its own name too (WikiLambda), but it will be generic and not necessarily tied to Wikifunctions: this extension may later be integrable into other wikis, including outside Wikimedia; it will be a generic plugin for MediaWiki. But we are far from this goal, and it will require some interest from other external wikis (they already use several other extensions, notably SemanticMediawiki, which may or may not also be used later along with Wikifunctions, possibly for building Abstract Wikipedia as well). verdy_p (talk) 10:40, 3 December 2020 (UTC)
@Verdy p: Oh, I see. Thank you! Atmark-chan <T/C> 08:05, 4 December 2020 (UTC)
I'm not convinced that there will be no new wiki for our language-neutral Wikipedia but I agree that none is yet planned! --GrounderUK (talk) 12:25, 4 December 2020 (UTC)

Confusion on Wikipedia[edit]

Apparently, there's a big confusion throughout multiple languages of Wikipedia about the real scope of Wikifunctions and Abstract Wikipedia. I tried to fix the Wikifunctions Wikipedia article and separate the items into Abstract Wikipedia (Q96807071) and Wikifunctions (Q104587954), but the articles in other languages need to be updated to better reflect how both the projects work. Luk3 (talk) 19:19, 30 December 2020 (UTC)

@Luk3: Well done, I've updated the page in Italian --Sinucep (talk) 19:49, 2 January 2021 (UTC)
Having this project running under the name Abstract Wikipedia is a needless way to engage in conflict with Wikipedians. Why not rename the project here into Wikifunctions now that we have a name? ChristianKl❫ 01:25, 16 January 2021 (UTC)
@ChristianKl: This topic explains exactly that the two projects are different in nature. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 12:22, 16 January 2021 (UTC)

Logo for Wikifunctions wiki[edit]

Wikifunctions needs a logo. Please help us to discuss the overall goals of the logo, to propose logo design ideas, and to give feedback on other designs. More info on the sub-page, plus details and ideas on the talkpage. Thank you! Quiddity (WMF) (talk) 23:01, 14 January 2021 (UTC)

  • The logo should be inviting to a broad public of users. It shouldn't produce a "this project is not for me" reaction in users who don't see themselves as technical.
Maybe a cute mascot could provide for a good logo. ChristianKl❫ 00:46, 16 January 2021 (UTC)

There's no consensus in Wikidata that Abstract Wikipedia is an extension of it[edit]

@DVrandecic (WMF) and Verdy p: The project description currently says: Abstract Wikipedia is an extension of Wikidata. As far as I'm concerned, what is an extension of Wikidata and what isn't is up to the Wikidata community. I thus removed the sentence. It was reverted. Do we need an explicit RfC on Wikidata to condemn Abstract Wikipedia for overstepping boundaries to get that removed? ChristianKl❫ 13:00, 16 January 2021 (UTC)

  • @ChristianKl: Your edit was reverted because you seem to have used a weeks-old version of the page for your edit, and thus have yourself reverted all the recent edits for no reason at all. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 13:07, 16 January 2021 (UTC)
Actually, this edit was based on a much older version than just the previous one: it used the old version from 16 October, 3 months ago, when the Wikifunctions name had still not been voted on! Other details have changed since then: there have been new announcements, changes in the working team, and changes to some links (including translated pages). What you did also dropped all edits made by the WMF team over the last 3 months, up to yesterday. verdy_p (talk) 17:45, 16 January 2021 (UTC)

@ChristianKl: I agree. I have qualified the original statement, quoting the relevant section of the Abstract Wikipedia plan. Hope it's okay now.--GrounderUK (talk) 15:02, 16 January 2021 (UTC)

I revisited your addition: note that it would break an existing translated paragraph, so to ease the work I separated the reference, which also included a link that did not work properly with translations. Beware notably of anchors (avoid using section headings directly; they vary across languages!): I used the stable (untranslated) anchor and located the target of the link more precisely. Your statement was a bit too elusive, as there's no intent for now to modify Wikidata before 2022 to add content there. It's clear we'll have new special pages, but not necessarily new content pages in Wikidata.
The "abstract content" needed for Abstract Wikipedia is also different from what will be stored initially in Wikifunctions (which will be independent of any reference to Wikidata elements, but will contain the implementations of functions needed for transforming the "abstract content" into translated content integrable into any Wikipedia, or other wikis, multilingual or not). The first step will be to integrate Wikifunctions (still useful for reusable modules and templates), but this is independent of Abstract Wikipedia, which will not be developed this year: Wikifunctions will just be one of the tools usable to create the "Abstract Wikipedia" LATER and integrate it. Don't expect an integration into Wikidata and Wikipedias before the end of 2022; I think it will be in 2023 or even later, after a lot of experiments in just a very few Wikipedias. The integration into large Wikipedias is very unlikely before about 5 years, probably not before 2030 for the English Wikipedia: the first goal will be to support existing small Wikipedias, including those in Philippine languages, which seem to be growing large and fast, but with a lot of bot-generated articles. verdy_p (talk) 18:49, 16 January 2021 (UTC)
Thanks. I'm not sure what you found "elusive"; my statement was a direct quotation from the linked source, which asserts that Wikidata community agreement is necessary for any eventual changes to Wikidata, and that some alternative will be adopted if the community does not agree. Perhaps the plan itself is elusive, but additional speculation about timescales and the nature of the alternatives seems to obscure the point; it is also absent from the referenced source (and makes additional work for translators). You make a couple of points above that I don't agree with. It is not clear to me that new special pages will be required in Wikidata but, if they are, they will require agreement from the Wikidata community. I also believe that early examples of functions in Wikifunctions will (and should) be dependent on "reference to Wikidata elements", in the same way that some Wikipedia infoboxes are dependent on such elements.--GrounderUK (talk) 20:39, 16 January 2021 (UTC)
There's no "speculation" about timescale in what is to be translated in the article; all of this is already in the development plan. I only speculate a bit at the end of my reply just before your reaction here (it seems clear that about 5-10 years will pass before we see a change in the English Wikipedia to integrate the Abstract Wikipedia, simply because it does not really need it; on the contrary, Wikifunctions will make its way into the English Wikipedia quite soon, just because it will also facilitate exchanges with Commons, which also needs a lot of the same functions). I clearly make a distinction between Wikifunctions (short term, with a limited goal) and Abstract Wikipedia (no clear term, but in fact a long-term umbrella project needing much more than just Wikifunctions). verdy_p (talk) 03:25, 17 January 2021 (UTC)