Talk:Abstract Wikipedia

From Meta, a Wikimedia project coordination wiki

When will the Code of Conduct be drafted?[edit]

I noticed that there is no Code of Conduct yet for Wikifunctions. However, the beta is already out, and editors are starting to come in. Could somebody give details on what a Code of Conduct could look like, and when it would be released? 2601:647:5800:1A1F:CCA8:DCA6:63BA:A30A 01:00, 2 September 2022 (UTC)[reply]

There will be discussion about this before launch. We are already planning for it. -- DVrandecic (WMF) (talk) 19:42, 24 October 2022 (UTC)[reply]
Please see Abstract Wikipedia/Updates/2022-11-17 for a newsletter post on this topic, and request for input and ideas. Thanks! Quiddity (WMF) (talk) 02:35, 18 November 2022 (UTC)[reply]

Translation accuracy?[edit]

"In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way. A particular language Wikipedia can translate this language-independent article into its language. Code does the translation" -> this sounds like machine translation to me. How do we make sure that the translation is 100% accurate? It's impossible for the machine translation to be always correct. X -> machine translation -> Y. X & Y are 2 different languages. Depending on which languages they are, the accuracy could be as low as 50%. Nguyentrongphu (talk) 00:59, 12 November 2022 (UTC)[reply]

Hi @Nguyentrongphu. There are many slow ongoing discussions about how the system could/should work. In a nutshell, it will not be using a plain machine-translation system; instead, there will be some kind of system(s) for editors to write "abstract sentences", that use (rely upon) the structured data in Wikidata's Lexemes and Items, to create properly localized sentences. A recent overview of a few aspects, including comparisons to some existing Wikimedia tools, is in Abstract Wikipedia/Updates/2022-06-07. Following the links from that page will lead to many more details and discussions. I hope that helps! Quiddity (WMF) (talk) 19:46, 14 November 2022 (UTC)[reply]
The first approach is basically automatic translation using Wikidata items. The end results are almost identical to typical machine translation.
The second approach looks to me like a version of machine translation with some tweaks: machine translation + some tweaking by humans + a lot of sentence simplification. Even then, it's still flawed in some ways. If translation could be done automatically and correctly, the world wouldn't need translators or interpreters anymore. The human tweaking process is labor-intensive, though. Based on what I read, it's done manually sentence by sentence. Instead of tweaking the function, one can just use that time to translate the article manually (probably faster), and the article would sound more natural (grammatically) and be more correct (quality of translation). Sadly, I don't see any utility in this approach unless AI (artificial intelligence) becomes much more advanced in the future (20 more years, perhaps?).
If understanding the gist of an article is all one needs, Google translation is doing just fine (for that purpose), with 6.5 million articles available on English Wikipedia to "read" in any language. If articles were 100% machine translated to another Wikipedia language -> they would be deleted. Nguyentrongphu (talk) 23:52, 14 November 2022 (UTC)[reply]
Re: Google/machine translation - Unfortunately, those systems only work for some languages (whichever ones a company decides are important enough, versus the 300+ that are supported in Wikidata/Wikimedia), and as you noted above, the results are very inconsistent, with some results being incomprehensible. We can't/won't/don't want to use machine translation, for all the reasons you've described, and more.
Re: "one can just use that time to just translate the article manually" - One benefit of the human-tweaked template-style abstract-sentences, and one reason why it is best if they are simple sentences, is that they can then potentially be re-used in many articles/ways.
E.g. Instead of having a bot that creates thousands of stub articles about [species / villages / asteroids / etc], as occurred at some wikis in years past (e.g. one example of many) (and some of which have since been mass-deleted, partially because they were falling badly out of date), we can instead have basic info automatically available and updated from a coordinated place (like some projects do with Wikidata-powered Infoboxes). And instead of having to constantly check whether a new fact exists for each of those thousands of articles in hundreds of languages (such as a newer population count for a village, or an endangered-classification for a species), it could be "shown if available".
As an over-simplified example: An article-stub about a species of animal could start with just the common-name and scientific-name (if that is all that is available). But then it could automatically add a (human-tweaked/maintained) sentence about "parent taxon", or "distribution", or "wingspan" or "average lifespan" when that info is added to Wikidata for that species. Or even automatically add a "distribution map" to the article, if that information becomes available (e.g. d:Q2636280#P8485) and if the community decides to set it up that way.
I.e. the system can multiply the usefulness of a single-sentence (to potentially be used within many articles in a language), and also multiply the usefulness of individual facts in Wikidata (to many languages).
It also provides a starting point for a manually-made local article, and so helps to overcome the "fear of a blank page" that many new and semi-experienced editors have (similarly to the way that ArticlePlaceholder is intended to work, e.g. nn:Special:AboutTopic/Q845189). I.e. Abstract Wikipedia content is not intended as the final perfect state for an article, but rather to help fill in the massive gaps of "no article at all" until some people decide to write detailed custom information in their own language.
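To make the "shown if available" idea concrete, here is a minimal sketch in Python. The field names, values, and rendering function are hypothetical stand-ins for illustration only; the real system would draw facts from Wikidata and render sentences via Wikifunctions rather than hard-coded strings.
 # Hypothetical sketch only: a stub grows automatically as facts become available.
 # Field names and numbers are placeholders, not real Wikidata statements.
 facts = {
     "common_name": "peacock butterfly",
     "scientific_name": "Aglais io",
     "wingspan_mm": (50, 55),   # optional fact; remove it and its sentence disappears
 }
 def render_stub(facts):
     sentences = [f"The {facts['common_name']} ({facts['scientific_name']}) is a species of animal."]
     if "wingspan_mm" in facts:                     # "shown if available"
         low, high = facts["wingspan_mm"]
         sentences.append(f"It has a wingspan of {low} to {high} mm.")
     return " ".join(sentences)
 print(render_stub(facts))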
If you're interested in more technical details (linguistic and programming), you might like to see Abstract Wikipedia/Updates/2021-09-03 and Abstract Wikipedia/Updates/2022-08-19.
I hope that helps, and apologize for the length! (It's always difficult to balance/guess at everyone's different desires for conciseness vs detail). Quiddity (WMF) (talk) 03:51, 15 November 2022 (UTC)[reply]
Thank you! I like your very detailed answer. I think I understand everything now. Abstract Wikipedia is basically an enhanced version of machine translation (plus human tweaking) with the ultimate goal of creating millions of stubs in less developed Wikipedias. While it certainly has its own merits, I'm not so sure the benefits outweigh the cost (a lot of money + years of effort invested into it). First, good quality articles can't be composed of just simple sentences. Second, creating millions of stubs is a good seeding event, but bots can do the job just fine (admittedly, one has to check for new information once in a while; once every 5 years is fine). Plus, machine translation can also be fine-tuned to focus on creating comprehensible stubs, and that has been done already. Third, it's true that Google translation does not include all languages, but it covers enough to serve 99.99% of the world population. Fourth, any information one can gain from a stub, one can also get from reading a Google translation of English Wikipedia. Stubs are not useful except as a seeding event. Again, that job has been done by bots for many Wikipedias for more than 10 years. Sadly, with the current utility of Abstract Wikipedia, one can't help but feel that this is a wasteful venture. Money and effort could be better spent elsewhere to get us closer to "the sum of all human knowledge". I don't know the solution myself, but this is unlikely to be the solution we've been looking for. Nguyentrongphu (talk) 22:30, 17 November 2022 (UTC)[reply]
@Nguyentrongphu Thanks, I'm glad the details were appreciated! A few responses/clarifications:
Re: stubs - The abstract articles will be able to go far beyond stubs. Long and highly detailed articles could be created, with enough sentences. And then when someone adds a new abstract-sentence to an abstract-article, it will immediately be available in all the languages if/when they have localized the elements in that sentence's structure. -- I.e. Following on from my example above: Most species stubs start off with absolutely minimal info (e.g. w:Abablemma bilineata), but if there was an abstracted sentence for "The [animal] has a wingspan of [x] to [y] mm." (taken from w:Aglais_io#Characteristics), they could then add it to the Abstract Wikipedia article for "Abablemma bilineata", and the numerical facts into Wikidata (via d:Property:P2050), and suddenly the articles in hundreds of languages are improved at once!
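As a rough sketch of how a single abstract sentence could fan out into many languages, consider the following Python toy. The per-language templates and function names are invented for illustration; real rendering would go through Wikidata lexemes and grammar functions on Wikifunctions, not string templates.
 # Toy illustration: one abstract "wingspan" sentence, one fact, many languages.
 templates = {                      # hand-written stand-ins for real NLG renderers
     "en": "The {name} has a wingspan of {low} to {high} mm.",
     "de": "Die Flügelspannweite von {name} beträgt {low} bis {high} mm.",
 }
 def render_wingspan(lang, name, low, high):
     return templates[lang].format(name=name, low=low, high=high)
 # One fact added to Wikidata (cf. d:Property:P2050) improves every language at once.
 # The numbers here are placeholders, not real measurements.
 for lang in templates:
     print(render_wingspan(lang, "Abablemma bilineata", 10, 12))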
Re: bots - Bots are good at simple page creations, or adding content to rigid structures, but not so good at updating existing pages with new details in specific ways, because us messy and inconsistent humans have often edited the pages to change things around.
Re: machine-translation and Enwiki - The problems with that include that they don't help spread the local knowledge that is hidden away in the other Wikipedias, which don't have machine-translation support. It also excludes monolingual speakers from contributing to a shared resource. And they have to know the English (etc) name for a thing in order to even find the English (etc) article. -- E.g. the article on a village, or on a cultural tradition, or locally notable person, might be very detailed in the local language version, but still remain a stub or non-existent at most/all other Wikipedias for many more decades, with our current system. See for example this image.
Re: "good quality articles can't be composed of just simple sentences" - I agree it probably won't be "brilliant prose" (as Enwiki used to refer to the Featured Article system (w:WP:BrilliantProse)), but simple sentences can still contain any information, and that is vastly better than nothing.
I hope that helps to expand how you see it all, and resolve at least some of your concerns. :) Quiddity (WMF) (talk) 00:00, 18 November 2022 (UTC)[reply]
"It also provides a starting point for a manually-made local article, and so helps to overcome the 'fear of a blank page'" + "far beyond stubs" -> you're contradicting yourself. It can't be that far beyond stubs.
Abstract sentences can only work if all articles involved share a similar basic structure, for example species, villages, asteroids, etc. All species share some basic information structure, but things quickly diverge afterward (after the introduction). With this constraint in mind, it's impossible to go far beyond stubs (introduction level at best). It sounds like wishful thinking to me, which is not practical.
"Because us messy and inconsistent humans have often edited the pages to change things around" -> Abstract Wikipedia will face this problem too. "Adding a new abstract-sentence to an abstract-article" -> what if someone has already added that manually or changed the article in some way beforehand? It's impossible for a machine to detect whether or not an abstract sentence (or a sentence with similar meaning) has already been added, since there are infinitely many ways that someone else may have already changed an article. Plus, where to add it is also a concern. If an article has been changed in some way beforehand, how does the machine know where to add the sentence? Adding it in at random will make the article incoherent.
Far beyond stubs + the fact that Abstract Wikipedia is only possible with simple sentences -> it sounds like Abstract Wikipedia is trying to create a Simple French Wikipedia, Simple German Wikipedia, Simple Chinese Wikipedia, etc. (similar to Simple English Wikipedia). This is a bad idea. Nobody cares about Simple English Wikipedia; non-native English speakers don't even bother with it. This is an encyclopedia, not Dr Seuss books.
"And they have to know the English (etc) name for a thing in order to even find the English (etc) article" -> Google translation does come in handy in these situations (helping them find the English name). Again, Google translation supports enough languages to serve 99.99% (estimation) of the world population.
"The problems with that include that they don't help spread the local knowledge that is hidden away in the other Wikipedias" -> we need more manpower for this huge goal and task. Abstract Wikipedia is unlikely to solve this problem. Local knowledge is unlikely to fit the criteria for utilizing abstract sentences. Local knowledge is not simply a species, a village, an asteroid, etc.
"The article on a village, or on a cultural tradition, or locally notable person, might be very detailed in the local language version, but still remain a stub or non-existent at most/all other Wikipedias for many more decades" -> Google translation works so far. Local language version -> Google translation -> translate to a reader's native language. That's good enough to get the gist of an article.
I'm not talking about Featured Article system. I'm talking about this. It's impossible to even reach this level with Abstract Wikipedia. We need a human to do the work to actually achieve Good Article level.
"It also excludes monolingual speakers from contributing to a shared resource" -> this shared resource is heavily constrained by abstract sentences. The criteria to utilize abstract sentences is also quite limited. Also, each Wikipedia language needs someone (or some people) to maintain abstract sentences. Plus, building and maintaining abstract sentences requires a very intensive process (manually translating it is easier, much more efficient and sound better instead of just simple sentences). It won't make a big impact as one would hope for, not any more impact than articles created by bots. Even today, many Wikipedias still retain millions of articles created by bots as seeding events.
Abstract Wikipedia is useful only for the languages that are not supported by Google translation. Spending too much money, time + efforts to serve the 0.01% of the world population is not a good idea. I'm not saying to ignore them all together, but this is not a good, efficient solution. It ultimately comes down to benefit vs cost analysis that I mentioned earlier. There is no easy solution, but we (humanity) need to discuss a lot more and thoroughly to move forward.
P/S: this is a good scholarly debate, which is very stimulating and interesting! I like it! Nguyentrongphu (talk) 23:45, 18 November 2022 (UTC)[reply]

Link to the beta instance doesn't work[edit]

I've tried to connect to https://wikifunctions.beta.wmflabs.org/wiki/Wikifunctions:Main_Page but it doesn't work. Is there a new url? PAC2 (talk) 04:08, 25 July 2023 (UTC)[reply]

Hi @PAC2. It should be working fine, and currently loads for me. Perhaps it was a temporary problem? Please could you check again, and if you still see a problem, then let me know what specific error message it provides (or what the browser does and how long it takes). Please also share any other relevant details about your connection (e.g. if you use a VPN, or many browser-extensions that affect connections). Also, check one of the other beta-cluster sites, such as https://wikidata.beta.wmflabs.org/, to see if it's all of them or just one. Thanks! Quiddity (WMF) (talk) 19:59, 25 July 2023 (UTC)[reply]
It works again. Thanks for the feedback PAC2 (talk) 05:18, 26 July 2023 (UTC)[reply]

Questions/thoughts about components of WikiLambda[edit]

Hi all, first of all, really excited for this project; I think it has a lot of potential. I have some questions and thoughts that came to me while implementing an executor for Rust, and I wanted to share them here. Here they are:

  • What is the role of the function orchestrator? What inputs does it take, and what does it output? I assume an API definition exists somewhere, but it would be better if it were explained in the README or somewhere on-wiki. (I'm also interested in efforts to rewrite the function orchestrator)
  • What is the input format and output format of an evaluator? This is for the evaluator components that read stdin and print to stdout, not the function evaluator API. Am I right in assuming the input should be like this?
  • What does the API of the function evaluator look like? Does it accept a Z7 object and need to evaluate some values before proceeding, or is it just responsible for dispatching evaluation based on the programming language requested?
  • Since the function evaluator runs on Node.js, it would be nice if evaluator impls for languages were allowed to compile to WASM and return the value returned by the WASM module (I don't think WASM has introspection abilities, so it would have to be a self-contained binary that handles serialization of return values itself)

0xDeadbeef (talk) 06:18, 29 July 2023 (UTC)[reply]

"What is the role of the function orchestrator?" You can find an overview of the WikiLambda architecture described here, with more detail on the architecture page. The main goal of separating the orchestrator from the evaluator is that the evaluator runs user-provided code and will run in a very restricted setting, whereas the orchestrator is under developer control and can do things like calling out to websites such as Wikidata. If you are implementing your own version, there is no need to follow that architecture; you can also have just one system that doesn't split (in fact, my previous implementation of a similar system was monolithic like that). -- DVrandecic (WMF) (talk) 20:28, 8 August 2023 (UTC)[reply]
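For readers who want to see the shape of that split, here is a toy sketch in Python. The interfaces and field names are invented for illustration and are not the actual WikiLambda APIs; the linked architecture pages describe the real components.
 # Toy sketch of an orchestrator/evaluator split. NOT the real WikiLambda API.
 def evaluator(code, args):
     # Runs untrusted, user-provided code. A real evaluator would use a proper
     # sandbox and resource limits; an empty namespace is only a placeholder.
     scope = {}
     exec(code, {"__builtins__": {}}, scope)
     return scope["run"](**args)
 def orchestrator(function_call):
     # Trusted side: could resolve references (e.g. fetch data from Wikidata),
     # choose an implementation, then hand code and arguments to the evaluator.
     impl = function_call["implementation"]          # hypothetical field names
     return evaluator(impl["code"], function_call["arguments"])
 result = orchestrator({
     "implementation": {"code": "def run(x, y):\n    return x + y"},
     "arguments": {"x": 2, "y": 3},
 })
 print(result)   # 5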
"WASM instead of Node" - Yes, I agree, that is a very interesting path, and it is one of our potential ways forward. We are experimenting right now. If you have thoughts or experiences, they would be very welcome. Some early thoughts are on T308250; I want to write up my own experiments about this there too. -- DVrandecic (WMF) (talk) 20:31, 8 August 2023 (UTC)[reply]

Questions/thoughts on the function model[edit]

  • It looks like Z881/typed lists are currently implemented as nested tuples (or in RustSpeak, type List<T> = Option<(T, List<T>)>), according to this test. There seems to be only a built-in impl. Are there plans to make impls based on sum-types and composition of type constructors? Specifically, an Optional/Maybe monad seems helpful.
  • What is the situation for optional fields? It looks like Z14/implementations can have omitted fields. Are there any plans to formalize the model so that fields are required? (those fields can use the optional monad)
  • Is it okay for my executor to reject Z881/typed lists where elements are not of the same type? I saw test cases for the JavaScript and Python executors that pass parameters like this. It seems beneficial to enforce that elements are of the same type for a list like this, for better correspondence with more strongly typed programming languages.
  • From that point, since types are first-class values, we could have type constructors that take a list of types as input. This could help build n-ary tuples, for example.
  • Why is Z883/typed map implemented as an inline Z4/type when passed to the function evaluator? Both Z881/typed list, Z882/typed pair are implemented as Z7/function calls. It would be cleaner to parse if Z883/typed map is also a Z7/function call.

0xDeadbeef (talk) 06:18, 29 July 2023 (UTC)[reply]

"Are there plans to make impls based on sum-types and composition of type constructors?"
Yes, eventually; the plan is to have something like Option or Maybe or Either. It might already partially work. You can try things out on the Beta. Agreed that this would be helpful in several places; just for now, we don't promise yet that it works. See also phab:T287608. Thoughts are appreciated.
"Are there any plans to formalize the model so that fields are required?"
Yes. Currently, on user-defined types, all keys are required (for the internal types, we cheat a bit). We want to somehow allow this declaratively. There are a number of possibilities to make it so, e.g. by having an Either, or a Maybe, or something like that. See also phab:T282062. We need to fix this before we can start using types with optional keys (or functions with optional arguments). The same here: we have not yet really landed on a solution, and we are looking for a good solution that ideally vibes well with our current implementation. --DVrandecic (WMF) (talk) 21:56, 29 September 2023 (UTC)[reply]
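As a purely illustrative sketch of the "key stays required, but its type is a Maybe" idea (not an agreed design; names are invented):
 # Hypothetical sketch: instead of allowing a key to be missing, the key is
 # always present but its type is a Maybe-like wrapper. Not the real Z14 model.
 from dataclasses import dataclass
 from typing import Generic, Optional, TypeVar
 T = TypeVar("T")
 @dataclass
 class Maybe(Generic[T]):
     value: Optional[T] = None        # "nothing" is Maybe(None), never an absent key
 @dataclass
 class ImplementationLike:            # loose stand-in for an implementation record
     composition: Maybe[str]
     code: Maybe[str]
 impl = ImplementationLike(composition=Maybe(None), code=Maybe("def run(): ..."))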
"is it okay for my executor to reject Z881/typed lists where elements are not of the same type?"
Almost. If the typed list says "typed list of Z1/Object", then this means that the elements can be anything, i.e. it is not typed at all. So ["hello", "this", "is", Z41/true] is OK if it is a typed list of Z1. For a typed list of any other type, e.g. a typed list of Z6, yes, in that case all elements have to be instances of the given type, e.g. Z6. I don't think typed lists of Z1 will be easily serializable and deserializable into native code.
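To illustrate the distinction in executor terms, here is a small Python sketch; the dict shape and helper function are invented for illustration and are not the real wire format.
 # Illustrative only: a typed list of Z6/String can be checked element-by-element,
 # while a typed list of Z1/Object accepts anything. The JSON shape is made up.
 def check_typed_list(element_type, elements):
     if element_type == "Z1":                   # "anything": nothing to enforce
         return elements
     for e in elements:
         if e["type"] != element_type:
             raise TypeError(f"expected {element_type}, got {e['type']}")
     return elements
 check_typed_list("Z6", [{"type": "Z6", "value": "hello"}, {"type": "Z6", "value": "world"}])   # ok
 check_typed_list("Z1", [{"type": "Z6", "value": "hello"}, {"type": "Z40", "value": "Z41"}])    # also ok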
"from that point, since types are first class values, we could have type constructors that take a list of types as input. This could help building n-ary tuples for example."
Yes, that should eventually be possible, even though I feel squeamish about it. Currently, generic types are not well supported in the front end, though, so it will not work right now. But eventually it should.
"Why is Z883/typed map implemented as an inline Z4/type when passed to the function evaluator? Both Z881/typed list, Z882/typed pair are implemented as Z7/function calls. It would be cleaner to parse if Z883/typed map is also a Z7/function call."
Good question; I need to play around with that part myself a bit. For now, I will refer to @CMassaro (WMF). --DVrandecic (WMF) (talk) 21:56, 29 September 2023 (UTC)[reply]
Hmm, can you expand on the question? The evaluator accepts either Z4 or Z7 for inputs of any of these types. If it produces one of these types as output, it always makes the type a Z7. Have you found a case where this isn't true? Or is this related to what the orchestrator passes to the evaluator? CMassaro (WMF) (talk) 18:42, 2 October 2023 (UTC)[reply]

Is this still a thing? It's been tagged as a draft since January 2021. * Pppery * it has begun 03:54, 31 July 2023 (UTC)[reply]

Ditto with Abstract Wikipedia/Smoke tests - are these drafts going to be finished some day? * Pppery * it has begun 03:55, 31 July 2023 (UTC)[reply]
I'm looking into this. Sorry for the delay in replying, and clarifying those pages. Quiddity (WMF) (talk) 21:56, 9 August 2023 (UTC)[reply]

Login[edit]

Hi, may I suggest supporting Wikipedia login in wikifunctions.org, please. Fgnievinski (talk) 16:50, 31 July 2023 (UTC)[reply]

Wikipedia login worked for me there (and I have wikifunctions in Special:CentralAuth/Lockal now). Lockal (talk) 17:51, 31 July 2023 (UTC)[reply]
You're right, it worked for me, too. I just didn't realize the login credentials would be the same. I've opened a ticket at phabricator suggesting mentioning "global login" in the login page: https://phabricator.wikimedia.org/T343153 Fgnievinski (talk) 18:40, 31 July 2023 (UTC)[reply]

Clarification required: what is the meaning of "abstract"[edit]

What is the intended meaning of "abstract" in "Abstract Wikipedia"? Is it the "abstract" as in "abstract art", or does it mean "summary" like in scientific publications? MilkyDefer 08:02, 9 August 2023 (UTC)[reply]

You can read the definition in the glossary. --Ameisenigel (talk) 16:23, 10 August 2023 (UTC)[reply]

wiktionary forms[edit]

Hi, does Wikifunctions have plans to create Wiktionary forms for each Wiktionary, for example English Wiktionary? Amirh123 (talk) 18:10, 11 September 2023 (UTC)[reply]

Hi. The plan is for functions to be able to be used within all other Wikimedia wikis (see the related FAQ entry and the related documentation). It will be up to the related communities to determine where and how functions are used. Quiddity (WMF) (talk) 20:44, 14 September 2023 (UTC)[reply]

Outdated?[edit]

It seems to me that a content page entitled "Abstract Wikipedia" should include the current state of the project. The page describes the project schedule, but does not describe what has been done and what remains to be done. For example, did the "Abstract Wikipedia part of the project ... start in roughly 2022"? Did the Wikifunctions Beta launch in 2022? Finell (talk) 04:17, 3 October 2023 (UTC)[reply]

@Finell Thanks for reporting this, and sorry for taking this long to answer. Yes, the main page needs to be updated, and we will do it soon. We have just been caught up in a lot of work behind the scenes since the launch of the project. Sannita (WMF) (talk) 15:17, 8 February 2024 (UTC)[reply]

Suggestions[edit]

I have some criticisms. In this article, these examples are given:

subclassification_string_from_n_n_language(n_wikipedia, n_encyclopedia, English)

English : Wikipedias are encyclopedias.

subclassification_string_from_n_n_language(n_wikipedia, n_encyclopedia, German)

German : Wikipedien sind Enzyklopädien.

Subclassification(Wikipedia, Encyclopedia)

English : Wikipedias are encyclopedias.

German : Wikipedien sind Enzyklopädien.

However, what bothers me is the claim that this is language-neutral, when it's so obviously Anglocentric. If it were German-centred instead, for example, the code would look like this instead:

Zeichenfolge_für_die_Unterklassifizierung_von_n_n_Sprache(n_Wikipedia, n_Enzyklopädie, Englisch)

Zeichenfolge_für_die_Unterklassifizierung_von_n_n_Sprache(n_Wikipedia, n_Enzyklopädie, Deutsch)

Unterklassifikation(Wikipedia, Enzyklopädie) (thanks, google translate!)

So, what's the point of lying? Also, I think that the functions should be stratified (to avoid nasty things like self-referential paradoxes), like predicate logic or set theory. A predicate is a special case of a function whose output is T or F. Logical operators are similar in this regard, except that the input is also T or F.

If we're mirroring predicate logic, then we could instead have:

Encyclopedia(Wikipedia) or Enzyklopädie(Wikipedia)

Similarly, the set {x|Encyclopedia(x)} (or {x|Enzyklopädie(x)}) can be defined.
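For reference, in standard predicate-logic and set-builder notation these would be written as:

\[ \mathrm{Encyclopedia}(\mathrm{Wikipedia}), \qquad \{\, x \mid \mathrm{Encyclopedia}(x) \,\} \]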

Quantifiers should also be included, so that one could state, for example, that all humans are mortal. Now, I have an idea for stratification (if you dislike it, feel free to modify it; this is purely illustrative. If I'm unclear, feel free to ask for clarification; I'm $h!t at explaining things :))

0th order: Objects

1st order: Functions (incl. predicates) between objects (we'll state that these functions quantify over objects, in analogy to predicates in predicate logic), statements that only quantify over objects, and sets defined by said functions. All of these collectively will be referred to as first-order objects.

2nd order: Functions between first-order objects (but no higher), statements quantifying over first order objects (but no higher), and sets defined by said functions. All of these collectively will be referred to as second-order objects.

3rd order: Functions between second-order objects (but no higher), statements quantifying over second-order objects (but no higher), and sets defined by said functions. All of these collectively will be referred to as third-order objects.

etc.

Transfinite induction is obvious:

ω-order: Functions between nth-order objects (for n<ω), statements quantifying over nth-order objects (for n<ω), and sets defined by said functions. All of these collectively will be referred to as ω-order objects. (this can be applied to other limit ordinals too)

All functions, statements, and sets will be assigned the lowest ordinals consistent with the definitions provided. Username142857 (talk) 17:34, 2 January 2024 (UTC)[reply]
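One compact way to state the "lowest ordinal consistent with the definitions" rule above (an illustrative formalization, not part of the original proposal):

\[ \mathrm{ord}(x) = 0 \ \text{for plain objects}, \qquad \mathrm{ord}(F) = \sup \{\, \mathrm{ord}(y) + 1 : y \text{ is an input, output, element, or quantified-over object of } F \,\} \]

This gives each function, statement, or set the least ordinal strictly greater than the orders of everything it refers to, and yields ω (and other limit ordinals) at the limit stages described above.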