Talk:Abstract Wikipedia

From Meta, a Wikimedia project coordination wiki


Spreadsheets functions

Hello,

I often use spreadsheets and I like Wikifunctions. From my point of view, spreadsheet function syntax is language-independent: most function names, such as IF or MID, are translated into many languages, so users can edit functions in their own language. I see this as a chance for Wikifunctions to attract more contributions, because I think many more people can edit spreadsheet functions than can handle more complex programming tasks. Once I understand the function model and am able to write functions in Wikifunctions, I can help create something that brings spreadsheet functions into Wikifunctions. I say this after having tried to convert a spreadsheet function into a program in R. Since that worked, I think it can also work in other programming languages. I am not good at programming, so with my current skills I think I can only create a tool for converting a spreadsheet function, not a gadget or something similar that is more integrated into MediaWiki. Please tell me what you think about that. --Hogü-456 (talk) 21:09, 1 February 2021 (UTC)

@Hogü-456: Thanks for your comment. I am also rather excited about the idea of bringing spreadsheets and Wikifunctions closer together. But I am not sure which of these you are suggesting:
  1. it should be possible for someone who has the skills to write a formula into a spreadsheet to contribute functions to Wikifunctions
  2. it should be possible for a spreadsheet user to use functions from Wikifunctions
I agree with both, but I wanted to make sure I read your comment right. --DVrandecic (WMF) (talk) 23:20, 1 February 2021 (UTC)
I am suggesting the first, and the second is also interesting. I don't know whether spreadsheets have an equivalent for all the arguments in Wikifunctions, so I think it is possible for most things but not for all. --Hogü-456 (talk) 18:53, 2 February 2021 (UTC)
I agree both are interesting. There is also a third possibility, which is having a spreadsheet implementation of a Wikifunctions function. In spreadsheets, arguments are ranges, arrays, formulas or constants, and different spreadsheet functions have different expectations of the types of argument they work with. For example, in Google Sheets, you can SUM a 2-dimensional range but you can't CONCATENATE it. In Wikifunctions, any such limitations would be a result of the argument's type. In principle, a Wikifunctions function could concatenate a 2D or 3D array, or one with as many dimensions as we care to define support for. GrounderUK (talk) 20:01, 2 February 2021 (UTC)
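As a rough illustration of that last point, here is a minimal Python sketch of a concatenation that accepts an array of any depth. The names `flatten` and `concatenate` are hypothetical and not part of Wikifunctions or any spreadsheet product; nested lists stand in for ranges.

```python
from typing import Any, Iterator


def flatten(value: Any) -> Iterator[Any]:
    """Yield the scalar cells of a nested list in row-major order."""
    if isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def concatenate(value: Any, separator: str = "") -> str:
    """Join every cell of an array of any depth into one string."""
    return separator.join(str(cell) for cell in flatten(value))


# Nested lists stand in for a 2D range and a ragged 3D array.
grid_2d = [["a", "b"], ["c", "d"]]
grid_3d = [[["a", "b"], ["c"]], [["d"]]]

print(concatenate(grid_2d))
print(concatenate(grid_3d, "-"))
```

In this sketch the only limitation is the argument's shape: anything that is a list is descended into, and anything else is treated as a cell.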
Interesting points. We will definitely take a very close look at spreadsheets once the UX person has joined us, in order to learn from how spreadsheets allow so many people to program. I think that the mix of guidance by the data and autocomplete is extremely helpful, and we can probably do something akin to that: if we know what the function contributor is trying to get to, and know what they have available, we should have pretty good constraints on what could be done. --DVrandecic (WMF) (talk) 00:38, 4 February 2021 (UTC)
Any idea when the UX person will be joining?--GrounderUK (talk) 08:08, 3 March 2021 (UTC)
The process is ongoing, we'll announce it through the weeklies when we know it. --DVrandecic (WMF) (talk) 01:30, 4 March 2021 (UTC)
I like the concept of using Wikifunctions with an extension for spreadsheets, but not only there: many programming languages could integrate such "pure functions" while executing them off-site, in the cloud, and not just on Wikifunctions. Wikifunctions would simply be the repository where functions are found and described, and then located, with the evaluator running on Wikimedia servers or elsewhere, such as Amazon Lambda, Azure Functions, IBM Cloud, other public clouds, or private clouds like Nextcloud. For this to work, I think Wikifunctions should use a common web API that allows choosing the location where evaluators will run, including the possibility of setting preferences for the application using these functions, the OS running that application, or the user profile using it; a user could keep a set of preferred evaluator locations in their profile. In summary, what we need is: (1) a repository with a lookup facility; that repository should describe the functions in the same computer language via its interface, and its API should be able to collect sets of locations where these functions can be evaluated; (2) resolvers (based on URNs) for locating a function description and locating an evaluator for it; (3) function search that works more or less like a classic search engine, except that it searches structured metadata, i.e. special documents containing sets of properties, or a database, just as Wikidata already does. The same is true of common software repositories on Linux (APT, DNF, YUM or EMERGE), of programming environments such as CPAN, Node.js or LuaRocks, and of GitHub with part of its services or with various connectors (Wikifunctions could also provide a connector for GitHub).
So we should separate the repository of function metadata (i.e. Wikifunctions, containing a lot of information and other services like talk pages and presentations, logos, Wikimedia user groups via subscribed WikiProjects...) from the repository of locations/sites where a function can be executed (either because it is installed locally, or because the site accepts a task to be uploaded there and registered in a cache for later reuse), so that a client (Wikifunctions itself or any application using Wikifunctions) can have jobs executed on that site to produce a result. Of course this requires security: connecting a new evaluation site has constraints, including legal ones for privacy, since calling a function with parameters easily allows the evaluator to collect private data. We must also avoid denial of service, so the repository of evaluators should detect sites that do not respond to evaluation requests, or that return corrupted or fake answers. This requires evaluating trust and reliability, as in any distributed computing environment. (This is less of an issue if evaluation sites cannot be added freely but only by admins of the locator server for all possible evaluators.)
In summary, separate the description of functions (in its metadata language) from their evaluation (except for a few built-in functions that are resolved locally, possibly running on the same host or only on a specific small set of hosts with secure connections, and that accept delegating execution only to the final client of the function, if that client wants to run the function himself with his own resources, for example for debugging inside his web browser with JavaScript or WebAssembly). verdy_p (talk) 01:13, 5 March 2021 (UTC)
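The separation proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Repository`, `Locator`, `Evaluator`, `call` and the URN `urn:example:add` are all hypothetical names, nothing here reflects an actual Wikifunctions API, and trust is reduced to a single boolean flag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class FunctionDescription:
    urn: str      # hypothetical identifier for the function
    label: str
    arity: int


@dataclass
class Evaluator:
    location: str                  # e.g. "local" or "wikimedia" (made-up labels)
    run: Callable[..., object]
    trusted: bool = True           # stand-in for a real trust/reliability score


class Repository:
    """Holds function descriptions only; it never executes anything."""

    def __init__(self) -> None:
        self._functions: Dict[str, FunctionDescription] = {}

    def publish(self, description: FunctionDescription) -> None:
        self._functions[description.urn] = description

    def lookup(self, urn: str) -> FunctionDescription:
        return self._functions[urn]


class Locator:
    """Maps a function URN to the sites that are able to evaluate it."""

    def __init__(self) -> None:
        self._evaluators: Dict[str, List[Evaluator]] = {}

    def register(self, urn: str, evaluator: Evaluator) -> None:
        self._evaluators.setdefault(urn, []).append(evaluator)

    def resolve(self, urn: str, preferred: Sequence[str] = ()) -> Evaluator:
        candidates = [e for e in self._evaluators.get(urn, []) if e.trusted]
        # Honour the caller's preferred locations first, in order.
        for location in preferred:
            for evaluator in candidates:
                if evaluator.location == location:
                    return evaluator
        if candidates:
            return candidates[0]
        raise LookupError(f"no trusted evaluator for {urn}")


def call(repository: Repository, locator: Locator, urn: str,
         args: Sequence[object], preferred: Sequence[str] = ()) -> object:
    description = repository.lookup(urn)
    if len(args) != description.arity:
        raise TypeError(f"{description.label} expects {description.arity} arguments")
    return locator.resolve(urn, preferred).run(*args)


repository = Repository()
locator = Locator()
repository.publish(FunctionDescription("urn:example:add", "add", 2))
locator.register("urn:example:add", Evaluator("wikimedia", lambda a, b: a + b))
locator.register("urn:example:add", Evaluator("local", lambda a, b: a + b))

result = call(repository, locator, "urn:example:add", [2, 3], preferred=["local"])
print(result)
```

The point of the split is that the repository can be searched and described without any evaluator being reachable, while the locator can drop or distrust a site without touching the function's description.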

There are a few ideas like this Wiki Spreadsheet design, which depend on functions being defined in some shared namespace on a project, or globally. As a result, spreadsheet functions may be a good class to expand to, in addition to those useful on Wikipedia pages. –SJ talk  21:09, 20 April 2021 (UTC)

@Sj: I've merged your comment up into this existing thread, I hope that's ok. :)
I'm also reminded of Wikipedia and Wikidata Tools (see TLDR in github), which seems closely related. Quiddity (WMF) (talk) 03:52, 21 April 2021 (UTC)
@Quiddity: more than ok! Thanks for catching that. Spreadsheets and wikicalc remain the future... and a necessary building block for even more interesting transparent dynamic refactoring. –SJ talk  18:10, 24 April 2021 (UTC)

License

Hello, I don't see much communication about the essential point of license. Since it's a direct continuation of the Wikidata agenda, is it correct that it will impose CC-0 everywhere it will be integrated within the Wikimedia environment? A clear yes/no would be appreciated. --Psychoslave (talk) 22:57, 5 March 2021 (UTC)

My apologies, I didn't see that this talk page had archives and that this point had already been discussed in Talk:Abstract_Wikipedia/Archive_1#License_for_the_code_and_for_the_output and Talk:Abstract_Wikipedia/Archive_2#Google's_involvement. So it looks like I have some homework to do before possibly commenting further. At first glance the answer seems to be "it's not decided yet". --Psychoslave (talk) 23:06, 5 March 2021 (UTC)


Follow up on the earlier topic « License for the code and for the output »

Given my understanding of this, which I'm not sure is correct at all, then the generated text can't be copyrighted, but the data and code to generate the text can be protected. — @Jeblad:

I think this is analogous to the compilation problem: the binary usually does not inherit the compiler's license; the result depends on the source code's license. See the FSF FAQ on the GPL: https://www.gnu.org/licenses/gpl-faq.en.html#GPLOutput "More generally, when a program translates its input into some other form, the copyright status of the output inherits that of the input it was generated from." … So it seems the license of the result depends on the license of the source data, not the license of the source code of the functions.

That source data could be the lexicographical data and the abstract form of the articles, I guess, but at some point the distinction between code and data can become somewhat blurry …

Also @Denny, Deryck Chan, Psychoslave, DVrandecic (WMF), and Amire80: TomT0m (talk) 10:53, 7 March 2021 (UTC)

I agree with the principles laid out above and don't have any specific preference to give. Deryck C. 16:56, 7 March 2021 (UTC)
Thank you for pinging me TomT0m. Do you have any question on the topic, or specific point on which you would like some feedback?
For what it's worth, "I am not a lawyer but…", your presentation of the subject seems correct to me. With a bit more granularity, I would simply add that:
  • it depends on the terms of use of your toolchain. If you use some SaaS that basically says "all your data are belong to us" and, between two cabalistic attorney incantations, says that the end output will be placed "under exclusive ownership of Evil-Corp.™ and we may ask you to pay big money at any time for privileges like accessing it in some undefined future", the terms might well be valid. But to the best of my knowledge, there's no such thing in the Wikimedia compiler toolchains, hooray,
  • you can't legally take some work on which you don't have personal ownership, throw it through some large automaton chain, and claim ownership of the result: you have to make a deal with the owner of each piece of work you took as input. All of them. Or none, if none of them pay attention or care about what you do. Until they do.
  • the world is always more complicated (even when you take into account that the world is always more complicated): there is no such thing as "the copyright law". Each jurisdiction has its myriad of laws and exceptions, which may apply to a more or less wide area, with a lot of other local rules which may or may not take precedence over the former, etc.
But once again, those are just more "focused details"; the general presentation you made seems correct to me. --Psychoslave (talk) 16:24, 8 March 2021 (UTC)

Thanks for asking the question! Yes, that's still something we need to figure out. We are working together with Wikimedia's legal department on the options. A decision must be made before the launch, but it will still take us a while to get to something that's ready to be shown to the communities. Thank you for your patience! --DVrandecic (WMF) (talk) 17:54, 8 March 2021 (UTC)

Example Article?

There's a lot of talk about mathematical functions here; there's less talk about functions that produce sentences. Would there be interest in having an "example" article to show what Abstract Wikipedia might look like? Specifically:

  • Write a pseudo-code article on a high-profile topic. For example, for simple:Chicago, (Q1297 IS Q515 LOCATED IN Q1204. Q1297 IS Q131079 LIST POSITION [three])
  • Evaluate the functions manually to translate it to several languages (ideally including languages such as ht:Chicago with bad coverage, if we have sufficient language materials to do so).

Does this seem like a reasonable thing to do? power~enwiki (talk) 20:08, 11 March 2021 (UTC)

@Power~enwiki: Hi. Yes that sounds like a good idea. We can create a detailed example of at least one complex paragraph sometime in the next few weeks, which should be enough to extrapolate from. And then anyone could build out further examples for discussion and potential problem-solving.
For now, we do have a couple of example sentences from an early research paper in Abstract Wikipedia/Examples, and 2 very early mockups at Abstract Wikipedia/Early mockups#Succession (2 sub-sections there) which might help a little. Thanks for asking. Quiddity (WMF) (talk) 23:29, 11 March 2021 (UTC)

A few more questions for the crowd.

  1. Are we going to try to have each sentence be 1 line, or will these be multiple lines?
  2. Wikidata is great for nouns. For verbs, it doesn't seem as good. Many concepts ("to destroy", "to fix") don't have encyclopedia articles at all. Are these available in Wikidata somehow? power~enwiki (talk) 22:49, 14 March 2021 (UTC)
  3. When writing an article, should pronouns be used at all? It seems better to refer to a Q-item (or an ARTICLETOPIC macro), and to have the natural-language generation determine when to replace an explicit subject with a pronoun.

Thoughts? power~enwiki (talk) 22:49, 14 March 2021 (UTC)

@Power~enwiki I think d:Wikidata:Lexicographical data might help. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 10:34, 15 March 2021 (UTC)

@Power~enwiki: I’m not sure I understand your first question, but...

  1. I don’t think we should have limits on the length or complexity of sentences. For more complex sentences (like the previous one), a version using simpler sentences should be considered. (Long sentences are fine. Complex sentences are fine.)
  2. Further to what @1234qwer1234qwer4: said, Wikidata lexemes (may) have a specific property (P5137) linking a sense to a particular Wikidata item. Currently, destroy doesn’t, but be links to both existence (Q468777) and being (Q203872), for example.
  3. I agree that it would generally be better to use unambiguous references. In the translations (natural language renderings), I suggest we might use piped links for this. For example, “Some landmarks in the city are...”. We might also use, for example, <ref>[[:d:Q1297| of Chicago]]</ref> where there is ellipsis, as in “As of 2018, the population[1] is 2,705,994.”


--GrounderUK (talk) 14:55, 15 March 2021 (UTC)

Regarding "1-line" v "multi-line": My question is whether the syntax will look more like SQL or more like JSON (or possibly like LISP). I suppose any can be one-line or multi-line. Regarding verbs: Lexemes are language-specific; it's currently annoying to search Wikidata for them but I don't think they should be in the language-agnostic Abstract Wikipedia text anyhow. power~enwiki (talk) 03:26, 16 March 2021 (UTC)

@Power~enwiki: Thanks for clarifying. There’s a chicken-and-egg problem here. I don’t think @Quiddity (WMF): is intending to include me in his “we” and I have an open mind on the question of “pseudo-code”. I suggest we [inclusive and impersonal] should avoid something that looks like “labelized Wikifunctions”. This is because “we” should try to “translate” the pseudo-code into that format, as well as into natural languages (and/or Wikitext that looks very like the Wikitext one might find in the Wikipedia in that language). So the pseudo-code should be more like “our” language of thought (labelled “English”, for example) but using an unambiguous identifier for each semantic concept. This means we can have many “pseudo-codes”, one for each natural language we select. Such a pseudo-code should not look too much like a translation into that target language or its Wikitext, of course. Without endorsing its content, I refer interested parties to Metalingo (2003). --GrounderUK (talk) 12:22, 16 March 2021 (UTC)
  • I'd like to second the suggestion of a relatively fleshed out example. One paragraph would be fine, but I think it's important to also try rendering the same abstraction into a couple languages other than English (including at least one non-Indo-European language). As I said at Talk:Abstract Wikipedia/Examples, I think it's easy to fall into a trap of coming up with functions that accidentally have baked into them features that are peculiar to the English language, or which at least fail to generalize to some languages.
Another worthwhile exercise would be taking a couple of the functions used in the example, and coming up with several test cases for those functions, and then seeing whether, in principle, a single renderer for language X could produce cogent results for all of those test cases. For example, maybe there's a function possible(X). And so we come up with test cases like:
  1. possible(infinitive_action_at_location(see(Russia), Alaska))
  2. possible(exists(thing_at_location(Alien life, Mars)))
  3. possible(action(US President, declare(state_of_war_against(China))))
And it's straightforward (modulo some fine details) to imagine how a renderer for this function could be implemented in English. But there are languages that distinguish grammatically between the possibility that something is true (epistemic modality: either there's life on Mars or there isn't - we don't know which is true, but we have some reason to think there might be) and someone having the ability to do something (the President having the ability/power to declare war against another country, an arbitrary person being able to see Russia from Alaska). So in fact this is a bad abstraction, since a renderer in, say, Turkish needs to distinguish between these senses, and can't do so with the information given. Colin M (talk) 17:09, 18 April 2021 (UTC)
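One way to picture the problem is the following Python sketch, in which the abstraction carries the modality explicitly. This is an assumption made for illustration, not a proposed Wikifunctions design: `Modality`, `Possible` and both renderers are hypothetical, and the second renderer emits placeholder markers rather than real Turkish.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    EPISTEMIC = "epistemic"   # it may be true that ...
    ABILITY = "ability"       # someone has the ability to ...


@dataclass(frozen=True)
class Possible:
    proposition: str
    modality: Modality


def render_english(node: Possible) -> str:
    # A crude English renderer can use one frame and ignore the modality.
    return f"It is possible that {node.proposition}."


def render_marked(node: Possible) -> str:
    # Stand-in for a language that must mark the two senses differently.
    # The markers are placeholders, not real grammar.
    markers = {Modality.EPISTEMIC: "MAYBE-TRUE", Modality.ABILITY: "IS-ABLE"}
    return f"[{markers[node.modality]}] {node.proposition}"


life = Possible("there is alien life on Mars", Modality.EPISTEMIC)
war = Possible("the US President declares war against China", Modality.ABILITY)

for node in (life, war):
    print(render_english(node))
    print(render_marked(node))
```

With a bare possible(X), the second renderer would have nothing to branch on, which is the gap described above.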

I started fleshing out the start of an article here: Jupiter. Happy if we continue to work on this and try to figure out if we can find common ground. It would be great to do some Wizard of Oz style test and try to render that by hand for some languages. The only non-PIE language I have a tiny grip on is Uzbek - I am sure we can find more competent speakers than me to try this out. --DVrandecic (WMF) (talk) 00:53, 20 April 2021 (UTC)

This is great. I would suggest linking to it from the main examples page for more visibility (though I understand if you want to do more work on it first). One suggestion I would have for a human-as-renderer test might be to do some replacement of lexemes with dummies in such a way that the renderer won't be influenced by their domain knowledge. e.g. instead of Superlative(subject=Jupiter, quality=large, class=planet, location constraint=Solar System), something like... Superlative(subject=Barack Obama, quality=large, class=tree, location constraint=France).
In this case, for the renderers to have a hope of getting the right answers, we would probably need to also give them some documentation of the intended semantics of some of these functions and their parameters. But I see that as a feature rather than a bug. Colin M (talk) 17:35, 20 April 2021 (UTC)
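The dummy substitution could be prepared mechanically, for instance along these lines. This is a minimal sketch: the function `blind` and the dummy pools are made up for illustration, and the parameter names are taken from the Superlative example above.

```python
import random
from typing import Dict, List, Optional


def blind(call: Dict[str, str],
          dummies: Dict[str, List[str]],
          rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Return a copy of the call with selected arguments swapped for dummies."""
    rng = rng or random.Random()
    return {
        param: rng.choice(dummies[param]) if param in dummies else value
        for param, value in call.items()
    }


# Arguments of the Superlative(...) constructor from the Jupiter example.
original = {
    "subject": "Jupiter",
    "quality": "large",
    "class": "planet",
    "location constraint": "Solar System",
}

# Hypothetical pools of replacement lexemes; "quality" is left untouched.
dummy_pool = {
    "subject": ["Barack Obama", "Chicago"],
    "class": ["tree", "river"],
    "location constraint": ["France", "Alaska"],
}

blinded = blind(original, dummy_pool, random.Random(0))
print(blinded)
```

A human renderer would then be shown only the blinded call, together with the documentation of the function's intended semantics.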
Yeah, I probably should link to it. I didn't because it is not complete yet, but then again, who knows when I will get to completing it? So, yeah, I will just link to it. Thanks for the suggestion. --DVrandecic (WMF) (talk) 00:10, 21 April 2021 (UTC)

Schemas

I am interested in Wikifunctions and I like the idea of Abstract Wikipedia. For some time I have created simply structured sentences with variable parts in spreadsheets. I think one opportunity of Abstract Wikipedia is that it could improve data quality in Wikidata. If it is known what is important about a topic, and someone notices that something is missing and understands how the text is generated, then maybe people will enter the information into Wikidata. For that it is helpful to have clear structures, and forms can help with entering information. At the moment this is not so easy. Have you thought about how the content that is important about a topic can be defined? As far as I know there are shape expressions in Wikidata. --Hogü-456 (talk) 20:35, 22 March 2021 (UTC)

@Hogü-456: Yes, shape expressions are the way to go, as far as I understand the Wikidata plans. Here's the Wikidata project page on Schemas. My understanding is that Shex can be used both for creating forms as well as for checking data - I would really like to see more work on that. This could then also be used to make sure that certain conditions on an item are fulfilled, and thus we know that we can create a certain text. Fully agreed! --DVrandecic (WMF) (talk) 23:15, 19 April 2021 (UTC)

Info-Box (German)

Who takes care of the entries in the Info-Box? At least in the German Info-Box the items differ from the real translations. —The preceding unsigned comment was added by Wolfdietmann (talk) 09:32, 15 April 2021 (UTC)

@Wolfdietmann: Hi. In the navigation-box, there's an icon in the bottom corner that leads to the translation-interface (or here's a direct link). However, that page doesn't list any entries as being "untranslated" or "outdated". Perhaps you mean some of the entries are mis-translated, in which case please do help to correct them! (Past contributions are most easily seen via the history page for Template:Abstract_Wikipedia_navbox/de). I hope that helps. Quiddity (WMF) (talk) 18:33, 15 April 2021 (UTC)
@Quiddity (WMF): Thanks a lot. Your hint was exactly what I needed.--Wolfdietmann (talk) 09:29, 16 April 2021 (UTC)

Effects of Abstract Text to the Job Market

Hello,

I have been thinking about what could happen if Abstract Text works well and is used in many contexts, including outside the Wikimedia projects. What does that mean for jobs? I asked myself whether Abstract Text is an innovation that reduces the need for personnel, for example for writing technical instructions. Changes happen, and they are not bad in themselves. I think it is important that employees affected by that change are supported and that they have another job afterwards. From my point of view, Wikifunctions is an important part here, because it can help people learn skills that are useful for doing other jobs. I find programming interesting, and it can help in many areas. I suggest creating a page with recommendations for potential users of Abstract Text, so that they are aware of the changes it can bring for their employees and so that those employees get the knowledge they need. What do you think about that? And what is my responsibility, as a volunteer who is interested in the project and plans to participate once it is live, to make sure that I do not support increasing unemployment through optimization? So far I have read through most of the pages here and tried to understand how it works. --Hogü-456 (talk) 19:55, 17 April 2021 (UTC)

@Hogü-456: Thank you for this thoughtful comment. In a scenario where Abstract Wikipedia and abstract content are so successful as to make a noticeable dent in the market for technical and other translators, it would also create enormous amounts of value for many people by expanding potential markets and by making more knowledge available to more people. In such a world, creating and maintaining (or translating natural language texts to) abstract content will become a new, very valuable skill that will create new job opportunities. Imagine people who create a chemistry or economics textbook in abstract content! How awesome would that be, and the potential that this could unlock!
I actually had an interview with the Tool Box Journal: A Computer Journal For Translation Professionals earlier this year (it is in issue 322, but I cannot find a link to that issue). The interesting part was that, unlike with machine learning tools for translation, the translator that interviewed me really found it interesting, because they, as the translator, could really control the content and the presentation, unlike with machine learned systems. He sounded rather eager and interested in what we are building, and not so much worried about it. I think he also thought that the opportunities - as outlined above - are so big, if all of this works at all. --DVrandecic (WMF) (talk) 21:06, 21 April 2021 (UTC)

Recent spam on https://annotation.wmcloud.org/

A few suspicious accounts have been created today, and started creating spam pages on the wiki (see the recent changes log). TomT0m (talk) 18:20, 18 April 2021 (UTC)

Thanks for the heads-up. Quiddity (WMF) (talk) 03:53, 21 April 2021 (UTC)

Regular inflection

Hello,

In the last few weeks I have tried to add the inflections (Beugungen) of German nouns that consist of more than one noun as Lexemes in Wikidata. In the overview of the phases I saw that making it possible to create regular inflections automatically is part of the second phase. I created a template as a CSV file that helps me extract the possible words from a longer word. In the template I extract all possible combinations with a length between 3 and 10 characters. I then check which combinations match a download of the existing German noun lexemes with their forms, and whether the match runs to the end of the word. For these words I extract the part before the last word, together with the first character of the last word, and match the existing forms from the Lexemes to it. For that I have a script in R and a spreadsheet. At the Wikimedia Hackathon I want to work on a script for creating the forms of a noun, starting with German. Has someone created something similar, or have you thought about this for other languages and what the rules are in those languages?
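The procedure described above might look roughly like this in Python. This is a simplified sketch and not the R script mentioned: the function names are hypothetical, the lexeme download is replaced by a tiny hand-written dictionary, and capitalization is handled by lowercasing instead of carrying over the first character of the last word.

```python
from typing import Dict, List, Set


def substrings(word: str, min_len: int = 3, max_len: int = 10) -> Set[str]:
    """All substrings of the lowercased word with min_len..max_len characters."""
    lowered = word.lower()
    return {
        lowered[start:start + size]
        for size in range(min_len, max_len + 1)
        for start in range(len(lowered) - size + 1)
    }


def compound_forms(word: str, known_forms: Dict[str, List[str]]) -> List[str]:
    """Derive forms of a compound noun from the known forms of its last noun."""
    lowered = word.lower()
    # Keep only known nouns that run to the end of the word.
    heads = [
        candidate for candidate in substrings(word)
        if candidate in known_forms
        and lowered.endswith(candidate)
        and candidate != lowered
    ]
    if not heads:
        return []
    head = max(heads, key=len)              # prefer the longest matching noun
    prefix = word[:len(word) - len(head)]   # part before the last noun
    return [prefix + form for form in known_forms[head]]


# Toy stand-in for the download of existing German noun lexemes and their forms.
known = {"tür": ["tür", "türen"]}
print(compound_forms("Haustür", known))
```

Real German compounds would need more care than this, for example linking elements between the parts, so the sketch only covers the simplest case.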

For the Wikimedia Remote Hackathon 2021 I suggested a session: a conversation about how to enable more people to learn coding, with a focus on the Wikimedia projects. In the Phabricator ticket for it I added a link to the mission statement of Wikifunctions, because I think that project's goal is to make functions accessible to more people. I have several ideas for functions that I think would be helpful, and I plan to publish them in Wikifunctions if I am able to write them. If you are interested, you can attend the conversation.--Hogü-456 (talk) 21:26, 12 May 2021 (UTC)

@Hogü-456: Thanks! This is exactly the kind of functions I hope to see in Wikifunctions. Alas, it is still a bit too early for this year's Hackathon for us to participate properly, i.e. for Wikifunctions to be a target where such functions land. But any such work will be great preparation for what we want to be able to represent in Wikifunctions. We hope to get there later this year, definitely by next year's Hackathon.
If you were to write your code in Python or JavaScript, then Wikifunctions will soon be ready to accept and execute such functions. I also hope that we will cover R at some point, but currently there is no timeline for that.
If you are looking for another resource that has already implemented inflections for German, there seem to be some code packages that do so. The one that I look to for inspiration is usually Grammatical Framework. The German dictionary is here: http://www.grammaticalframework.org/~john/rgl-browser/#!german/DictGer - you can find more through their Resource Grammar Library: http://www.grammaticalframework.org/~john/rgl-browser/
One way or the other, that's exciting, and I hope that we will be able to incorporate your work once Wikifunctions has reached the point where we can host these functions. Thank you! --DVrandecic (WMF) (talk) 00:43, 14 May 2021 (UTC)

Idea: Fact-id for each fact?

Can each fact statement get its own id? i.e., make a Wiki of Facts along with abstract Wikipedia.

  • Each fact should get its own fact-id, so that people can share the id to support claims made in discussions elsewhere, similar to the Z-ids or the Q-ids of Wikidata. This proposal requests a fact-id for each statement or fact.
  • It will create a structured facts wiki, in which each page about a topic presents a bulleted list of facts, with each fact having its own id. References are added to it to support the claims.
  • It presents Facts directly without hiding it in verbal prose. Cut to the chase!
  • Example 1: Each fact statement of a list like Abstract Wikipedia/Examples/Jupiter will get its own id. Example 2: A page will list facts like w:List of common misconceptions with each fact in it getting its own id.
  • This wiki will become the go-to site to learn, find, link, support and verify facts/statements/claims. And over time, Wiki of Facts will have more reliability and credibility than any other format of knowledge. (elaborated at WikiFacts) -Vis M (talk) 04:54, 27 May 2021 (UTC)