Abstract Wikipedia/Updates/Office hours 2021-12-20

From Meta, a Wikimedia project coordination wiki


(IRC or Telegram usernames)

  • Hogü-456
  • James_F
  • Jan Ainali
  • Jon Harald Søby
  • Lucas Werkmeister
  • Nikki
  • nox
  • quiddity
  • Sannita
  • vrandecic


<quiddity> Hello and welcome to the next Wikifunctions and Abstract Wikipedia office hour!
<quiddity> We are here to let you know what we are doing, where we are at, to answer any questions, and to start or have discussions on various topics.
<Sannita> o/ hi all
<vrandecic> Hi, let me hear who's there for the office hour!
<vrandecic> Hi Sannita!
<vrandecic> Start saying hi, as I send out our summary since the last office hour
<Jon Harald Søby> 👋 (half-way, kids are about)
<Jan Ainali> o/
<vrandecic> In the last few months, we have developed all the different evaluation modes, as planned: running built-ins, running 'native' user-written code, and running compositions of other functions. The prototype was fun to play with and explore, and provided a glimpse into what Wikifunctions could become, some day soon.
<vrandecic> Our development has not been on the timeline we originally hoped for - but that timeline was assuming that we would have more personpower from the beginning. We only recently achieved that planned size. Julia and David joined recently as engineers, and Mariya as a technical program manager. We hope to increase our development velocity accordingly.
<vrandecic> Phase Eta is taking longer than I hoped, which is mostly due to my optimistic assumptions. We are changing the whole data model on the fly, and need to do the dance between changing the front end and back end, and most of the time nothing seems to work - but it slowly dances towards the function model being in place everywhere.
<quiddity> (Phases: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases )
<vrandecic> Aishwarya has worked very hard on the designs and is now handing over the first designs to be implemented for launch. The designs look beautiful and the tests of them with experienced and new users were very encouraging.
<vrandecic> We still won’t make a prediction when we will launch. We really hope sooner rather than later, but we need to finish what we need to finish.
<vrandecic> Licensing: the last month saw the licensing discussion. The majority of voices seems to be reflected by the following decision:
<vrandecic> - All contributions to Wikifunctions and the wider Abstract Wikipedia projects will be published under free licenses.
<vrandecic> - Textual content on Wikifunctions will be published under CC BY-SA 3.0.
<vrandecic> - Function signatures and other structured content on Wikifunctions will be published under CC 0.
<vrandecic> - Code implementations in Wikifunctions will be published under the Apache 2 license.
<vrandecic> - Abstract Content for Abstract Wikipedia will be published under CC BY-SA 3.0.
<vrandecic> I have seen in your reactions over the last few days that this decision makes some of you unhappy. If you have suggestions on how to improve the decision, I am happy to listen now.
<vrandecic> That's for my summary of where we are.
<vrandecic> If there are questions, points you would like to discuss, things we can clarify, or ideas, now is the time! :)
<Hogü-456> I read in the last few days that if a function library licensed under the GPL is used, the result must also be licensed under that license. So do you plan to support more programming languages in the future? I am interested in R, which is licensed under the GPL, and I am now not sure whether that would be possible to implement. Maybe I also understand it wrong.
<Nikki> I'm not sure what we would be able to suggest that wouldn't involve changing the decision
<nox> Hello, I am JS, just someone curious about the project. So the core team is 4 people? How many people are working full time on the project?
<vrandecic> Hogü-456: is R itself under the GPL license, or is an open source library to be included under GPL?
<vrandecic> Because GCC, e.g., is GPLed code, but that doesn't mean that code compiled with GCC must be GPL.
<Hogü-456> R itself is under the GPL license.
<quiddity> @nox the Foundation team is listed at https://www.mediawiki.org/wiki/Abstract_Wikipedia_team - but we also have a number of volunteer developers contributing in many areas, from tools to the extension to discussions.
<vrandecic> Right, the GPL license of R should have no effect on written R code.
<Jan Ainali> I am mostly confused about why the argument that output cannot get copyright was left out from the summary
<vrandecic> Nikki: good question. I am not sure myself - the discussion was on the talk page, and the summary summarizes parts of that. I understand that folks are unhappy with it, but I am not sure what we could do differently at this point. I am looking for ideas that might bridge the gap here.
<quiddity> Nikki, I keep thinking about how these overall topics encompass *all* aspects (and more!) of this infographic I made a few years ago: https://commons.wikimedia.org/wiki/File:The_source_of_many_disagreements.png - hence there are well-reasoned points about all options.
<vrandecic> Jan: I think it's because there is disagreement whether that is the case or not
<vrandecic> Nox: We have five full time developers, plus a designer, part-time contractors, and other support.
<Jan Ainali> Well if it's invalid, I would be thrilled to see some arguments around that from someone that understands Abstract Wikipedia
<Jan Ainali> Clearly, my examples in the discussion were not understood by everyone I discussed with
<vrandecic> The third point in the pro section for CC0 in the summary, doesn't it talk about that? https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-12-16
<vrandecic> "it was said that a lot of the Abstract Content probably would not pass the threshold required to be copyrightable..."
<Jan Ainali> Yeah, I think abstract content might be copyrightable, but not the output.
<Jan Ainali> Am I confusing the terminology?
<vrandecic> No, not at all
<vrandecic> But that ties with a different point: if the input is copyrighted, and the transformation is comparably lossless, then the output of the transformation needs to follow that copyright too
<Sannita> +1
<Jan Ainali> Well, if it falls below the threshold of originality through the transformation, then that statement is not true
<vrandecic> E.g. if you take a picture with copyright, and run a filter on it, that is not too lossy, the output would have some copyright from the original picture
<vrandecic> Yes, if the transformation is too lossy that would be the case. And I think an individual sentence does not meet the threshold for copyright that frequently.
<vrandecic> But the composition of sentences can have copyright again, even if the individual sentences are likely not copyrighted
<Jan Ainali> Sure, but if the original is not raw content but an algorithm, then what is rendered is not the form of the algorithm but a transformation of the idea of it. And ideas cannot be copyrighted (https://copyrightservice.co.uk/services/knowledge-base/kb_idea), only an expression of it
<vrandecic> The selection, ordering, and weighting of ideas expressed in a text though can be copyrighted
<vrandecic> I agree that this is not legally firm ground
<Sannita> I think this is one of those cases that usually involves a court, several lawyers, and some years to be solved
<vrandecic> There are not many cases establishing precedent
<Jan Ainali> In some special cases it might be true. But in many cases it will not be. And for those cases it will be copyfraud to claim copyright.
<Hogü-456> I am interested in using the functions from Wikifunctions and also, later, in the possibility to generate Abstract Content locally on my computer. Do you plan to support that at a later point? I myself try to be careful before entering data on an external page, and I think there are also other people who are not interested in using external services when it is possible to have it locally.
<vrandecic> Jan: That's not my understanding of what copyfraud is
<James_F> Hogü-456: I'll try to answer if I can.
<Jan Ainali> It's the first point here: https://en.wikipedia.org/wiki/Copyfraud#Definition
<James_F> Hogü-456: Yes, it'll be theoretically possible for the system to support federated content and querying, but we've not built or planned out any of that yet and I don't think it'd be easy to get to it very soon.
<James_F> Hogü-456: We're going to launch with the wiki which will let you call the functions, so you could 'locally' call the API to get the results, and if you copied the content locally and ran the service and wiki locally you could run the entire thing internally.
<vrandecic> Jan: I was typing: "We have plenty of content in Wikipedia that comes from PD sources. And is published under CC BY-SA. Such as the Gettysburg address."
<vrandecic> Jan: but I see your link and need to rethink
<Hogü-456> That is good to know. I have time, and hopefully it will be possible at some point.
<Jan Ainali> I think the important part is volume. Do we think that most content will be generating output through clever functions or will it mostly be "hand written" articles?
<James_F> Hogü-456: For situations where external access is impossible (e.g. you're a robot on Mars or a secure air-gapped system) that would be possible but a lot of work. Where it's just not wanted (e.g. trying to reduce external contact in your 'DMZ' servers) we could probably provide a batching/proxying system that could off-load the queries for common functions and run locally (a local orchestrator/evaluator pair) with much less work.
<vrandecic> Jan: I am hoping for "hand written" articles
<Jan Ainali> I was assuming the bulk (like 99,99% of all species) will be generated through clever functions
<vrandecic> If it's just a function call or two, that's not copyrightable
<vrandecic> I think we are getting to the core
<Jan Ainali> And for the first ten years, I assume that will also go for >95% of the humans and geographical places
<vrandecic> If the function implementation is in Apache (which doesn't matter in this case actually), CC 0 on the data used coming from Wikidata, then a single function call for a species, that's not the output I am worried about
<vrandecic> but that's exactly what you are worried about!
<vrandecic> So, the result of that, indeed, we shouldn't make a claim of copyright on that.
<vrandecic> And we won't make any claim of copyright on the result of calling a function in general.
<vrandecic> (I am thinking out loud here, and I hope legal is not reading along, but what I am saying might be wrong)
<vrandecic> We are incorporating uncopyrighted content in Wikipedia already
<vrandecic> we might have the Gettysburg Address, or we might have text from an out-of-copyright Encyclopedia Britannica
<vrandecic> maybe all we need to do is just not make a claim on the license of the output of abstract content
<vrandecic> we say, abstract content itself is CC BY-SA
<vrandecic> ok
<vrandecic> we have these functions here
<vrandecic> you can run them, or we can run them for you
<Jan Ainali> I think this might be the right approach
<Sannita> I have a question about Phase Eta, mostly curious: you said you're revisiting the data model on the fly, was that planned or... ?
<Jan Ainali> Perfectly fine!
<Hogü-456> I think if you want to make it easier for people to use functions, and through that enable more people to create programs, then from my point of view it is important to also offer knowledge about user interfaces and how to create them. In the Wikidata Query Service there are code snippets available showing how to embed a query. Do you think it is possible to create something like that for Wikifunctions, or a collection of snippets of parts of user interfaces in several programming languages?
<vrandecic> OK, Jan - here, let's drop from the decision the point of licensing the output. We will have next year the discussion of the location for the Abstract Content, and then we will need to make this decision, but we can keep it out now.
<vrandecic> Sannita: no, we always planned to change the data model at this phase. What is happening on the fly is the developmental work to switch from one data model to the other, and which - expectedly - breaks stuff constantly.
<Jan Ainali> Good, then we also have some more time for research and philosophical thoughts
<James_F> Hogü-456: Yes, we'll definitely have "embed this answer" buttons at some point.
<vrandecic> Nikki: I think Jan demonstrated what can be done in the office hour and how things can still change
<James_F> Hogü-456: Initially our main use case will be for Wikipedia articles and other Wikimedia wikis, where you'll be able to call `{{#function:Z123456|Hello|world!}}` or similar.
<James_F> Hogü-456: But yes, adding snippets for different languages to embed calls into someone's WordPress blog or Android app or Windows module would definitely be a good idea.
<James_F> (We'd need to warn about API terms and conditions and load and so on, but that's a common issue with our other APIs too.)
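(Editor's note: the embed syntax James_F mentions above is given only as "or similar", so the final form may differ. As a minimal sketch, assuming exactly the `{{#function:ZID|arg|arg}}` shape from his example, such a call could be assembled as below. The helper name `build_function_call` and the ZID pattern are hypothetical and not part of any Wikifunctions API.)

```python
import re

# Assumption: a function ID (ZID) is "Z" followed by a positive integer,
# as in the Z123456 example from the discussion.
ZID_PATTERN = re.compile(r"Z[1-9]\d*")


def build_function_call(zid: str, *args: str) -> str:
    """Return wikitext for a hypothetical {{#function:...}} call.

    Hypothetical helper for illustration only; the real parser
    function syntax had not been finalised at the time of this chat.
    """
    if not ZID_PATTERN.fullmatch(zid):
        raise ValueError(f"not a valid ZID: {zid!r}")
    for arg in args:
        # A pipe or closing braces inside an argument would change
        # how the call is split, so this sketch simply rejects them.
        if "|" in arg or "}}" in arg:
            raise ValueError(f"argument needs escaping: {arg!r}")
    return "{{#function:" + "|".join((zid, *args)) + "}}"


print(build_function_call("Z123456", "Hello", "world!"))
```

(Rejecting arguments that contain `|` or `}}` keeps the sketch simple; a real snippet generator would more likely escape them.)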
<vrandecic> Nikki, since I have you here - I am surprised that you'd be unhappy with Apache. My understanding of the license space is that it actually gives you, as a reuser, more guarantees and peace of mind about using code under Apache than about using code under CC0
<vrandecic> My understanding is that Apache implies the right to use any patents that the publisher of the code might have regarding that particular code, whereas CC0 does no such thing
<vrandecic> (Again, I might be completely off)
<quiddity> (note: 5 mins left in the hour. Then some of us will need to attend other meetings. But as always, please continue these discussions!)
<vrandecic> And that you can use Apache licensed code snippets anywhere in your code just as you could with CC0 licensed snippets
<vrandecic> Thanks to everyone!
<vrandecic> Thanks for the questions, and the discussion!
<Nikki> my problem with licensing is that I hate having to deal with making sure I'm meeting whatever requirements a license has. the apache one seems to have requirements about including a copy of the license and adding prominent notices about having changed the files, and there's some long paragraph about "NOTICE" files, and that's not even everything
<vrandecic> Nikki: what about this idea: we write some text that describes in as simple as possible terms what needs to be done to follow the license when reusing code
<vrandecic> because I am totally with you, I want to make it as simple as possible for the code to be used
<vrandecic> As painless as possible
<quiddity> We are at time for the hour. Please do keep discussing though! The team just won't be as immediately responsive. We will do another one of these in the new year. Thanks to everyone for participating, as always.

<vrandecic> One of the points of a function repository is that it should be easy and painless for a developer to take a function and reuse it
<Nikki> that would help, yes, but I presume it would still stop me from being able to reuse the code without changing the license of my own code
<vrandecic> No, I don't think that is the case
<vrandecic> Apache is not viral, i.e. you can use code under Apache in a code base published under a different license
<vrandecic> (Unlike say the GPL, which indeed would require that under certain circumstances)
<vrandecic> There might be corner cases where a license does not allow Apache (it was raised that GPL2 I think was incompatible with Apache2, but GPL3 is not), but that's due to the GPL, to the best of my understanding. But again, cases are rare in this space.
<vrandecic> (And most of us here are not lawyers)
<vrandecic> So, I'll get some text that our actual lawyers are happy with and that we will publish about "what do I need to do in order to reuse code from Wikifunctions" that is short and snappy
<vrandecic> That should help everyone
<Nikki> my understanding was that you can't include apache code in a cc0 (or unlicense, 0bsd, mit-0 or similar license that doesn't require attribution) project because that would remove the restrictions that the apache license originally applied
<vrandecic> IANAL, but I think you can take code from an Apache2 codebase and add it to your MIT or CC0 codebase. That individual file, or that individual code snippet, would not be relicensed under MIT or CC0, true, but neither you nor anyone else would have restrictions in shipping, compiling, or using the resulting codebase, and the license of the snippet would have no effect on the license of the surrounding code base and vice versa.
<Nikki> if that's true then licensing is even more complicated than I thought 😶
<Nikki> anyway, I guess I'll have to wait and see what the lawyer-approved instructions say (and if actual lawyers could clarify whether what you said is true or not, that would be good, since I'm surely not the only person who'll want to know)
<Nikki> (but I still think it's silly to place any sort of requirements on code that's designed to be reused, regardless of how easy the instructions are)
<vrandecic> Thanks. I'll do that. It will take a while to get the answer, but I will take up this task.
<vrandecic> (As said above, the main advantage of Apache over CC0 is that Apache gives you patent security, whereas CC0 does not, so Apache effectively gives you more rights. That's my limited understanding. I, too, am unhappy that all of this is so complicated)
<Nikki> I don't even understand what the patent stuff is about 😕 and yeah, how I feel about copyright in general can be summarised as "*runs away screaming*" 😅
<vrandecic> if you want I can try to summarize my own minimal understanding of the patent stuff, but only if you want. I don't want to take up even more of your time.
<Nikki> oh, please do!
<vrandecic> patent rights are a neighboring right to copyright (similar to say trademarks, database rights, etc) to protect intellectual property. They can cover an idea (which copyright cannot) for a limited amount of time.
<vrandecic> so a specific code text is covered by copyright anyway, but in addition to it the idea implemented in the code can also be covered by a patent.
<vrandecic> most licenses only deal with copyright, and simply ignore patents
<vrandecic> apache takes the patent rights into account explicitly, and says "well, if i hold patents, but publish this under apache, then the reuser of my code is also licensed to use my patents that are relevant to this code"
<Nikki> I see... so it's not any better for me for anything I write myself since I don't have any patents, but when reusing someone else's code, there's a slim chance that it could be written by some evil person who will let me use the code but then go after me for infringing on their patents by using it?
<vrandecic> yes. it is more interesting for code coming from companies
<vrandecic> because they often have large patent portfolios
<Lucas Werkmeister> well, as I understand it, you could still have patent rights on any ideas that you incorporated into your code, even if you haven’t registered patents yet?
<vrandecic> maybe. i am unsure whether you can apply for a patent retroactively once it is public.
<Nikki> but since, even if I did, I would have no interest in using them, it would still not be any better for me in any useful way

(Arbitrary cut-off, 2 hours after beginning. Discussion continues.)