Abstract Wikipedia/Google.org Fellows evaluation - Answer
Authors: Cai Blanton, Cory Massaro, David Martin, Denny Vrandečić, Genoveva Galarza Heredero, James Forrester, Julia Kieserman, Stef Dunlap
Introduction
We are thankful for the thorough critique and evaluation of Wikifunctions and Abstract Wikipedia by the Google.org fellows. We understand that the document was written from a genuine desire to see the project of Abstract Wikipedia succeed and thrive, and from a strong belief in the importance of the goal of Abstract Wikipedia.
Wikidata went through a number of very public iterations, and faced literally years of criticism from Wikimedia communities and from academic researchers. The plan for Abstract Wikipedia had not faced the same level of public development and discussion. This is likely partly due to the success of Wikidata, but also because Abstract Wikipedia is considerably more complex than Wikidata.
The proposal for Abstract Wikipedia and Wikifunctions took years to be developed, and was considerably changed due to numerous critiques and suggestions from experts in their respective fields, researchers, and Wikimedia community members, and also based on prototype implementations and issues that were uncovered that way. But the process was often not as public as with Wikidata, due to much of the work happening in very different circumstances than it did with Wikidata.
Because of that, it is particularly welcome to have this thorough evaluation from highly skilled technical experts of the proposal. Barely anyone outside of the development team itself has dived into the Abstract Wikipedia and Wikifunctions proposal as deeply as the authors of this evaluation.
This also allows us to use this opportunity to improve the shared understanding of Wikifunctions and Abstract Wikipedia, and to explain and make explicit some design choices which have been implicit so far. We are thankful for that opportunity.
The reader should read the evaluation first before proceeding with our answer. We will follow the outline of the evaluation and answer the individual points.
Part I: Abstract Wikipedia
What is Wikifunctions, and what is it for?
This text about the purpose of the project is historically inaccurate, and leads to significant confusion.
The mandate that the Foundation’s Board issued to us in May 2020 was to build the new Wikifunctions wiki platform (then provisionally called Wikilambda) and the Abstract Wikipedia project. This was based on the presentation given to them at that meeting (and the pre-reading), and publicly documented on Meta. That documentation at the time very explicitly called it out as “a new Wikimedia project that allows to create and maintain code” whose contents would be “a catalog of all kind[s] of functions”, on top of which there would “also” (our emphasis) be code for supporting Abstract Wikipedia. This was also part of the project as presented to the community, and it is what gathered support for the project.
The evaluation document starts out from this claim – that Wikifunctions is incidental to Abstract Wikipedia, and a mere implementation detail. The idea that Wikifunctions will operate as a general platform was always part of the plan by the Abstract Wikipedia team.
This key point of divergence sets up much of the rest of this document for fallacies and false comparisons, as they are firmly rooted in, and indeed make a lot of sense within, the reality posed by this initial framing misconception.
Is Wikifunctions necessary for Abstract Wikipedia?
We would generally agree that a separable project for the platform on which to run code is not strictly necessary for an initial attempt at building Abstract Wikipedia; instead, Abstract Wikipedia could have been built as a functional extension of an existing project, like Wikidata.
At the current time, the Abstract Wikipedia team is mostly focused on the creation and launch of Wikifunctions (besides one of our engineers supporting volunteers in the NLG workstream), which is a goal in and of itself. When we are nearing the launch of Wikifunctions, we will begin allocating more dedicated time and resources to the design of Abstract Wikipedia. At that time, we will make further decisions regarding the right technological base for our architecture, e.g. where the content of Abstract Wikipedia will be stored. Since that time is not yet upon us, we have not yet compared in detail the feasibility of being based on Wikidata versus Wikifunctions versus another option, because we have neither completed Wikifunctions nor can we predict what state other technologies may be in at that point.
It would be quite natural to host constructors in Wikidata, rather than a separate platform
It seems like that would be another project on its own; a constructor would need its own data type in Wikidata. The Scribunto prototype doesn’t do this; it has an ad hoc format for constructors, based on JSON – which, by the way, is much closer to the Wikifunctions data model than to that of Wikidata.
The evaluation appears to significantly under-estimate the complexity of the existing Wikibase code within which a “fork” would operate, and the production maintenance concerns around adding such novel and unpredictable functionality to the already over-burdened Wikidata production stack. Extending Wikidata itself is being considered, and though no decision has been made, one possibility is indeed that the “Abstract Wikipedia” content, i.e. the user-written calls to functions, might happen on Wikidata.org as an extension of that community’s existing creation, editorial, and curation processes. This is a conversation we still have to have with the wider Wikimedia and especially the Wikidata communities.
The evaluation seems to assume that setting up a fork of the Wikidata software and adapting it to include the representation of other data types, such as constructors, is 1) a faster approach and/or 2) a simpler approach. This doesn’t seem to consider the size and complexity of that platform or the adequacy of its architecture to represent this kind of knowledge. Such a claim, made without an actual analysis of Wikidata/Wikibase features and how they could be used for Abstract Wikipedia’s benefit, doesn’t stand.
For example, just consider how difficult it has been to introduce entity schemas in Wikidata and make them widely used. Abstract Wikipedia’s constructors are a very different data model than the Wikidata data model that is based on individual, independent statements.
Some renderers are most naturally implemented as functions, rather than direct representations of user-contributed data. Also, when getting morphological data, constructors must be able to call functions, since the most natural way to express some morphological data is in code. As a simple example, take a Sanskrit verb: in its finite forms, Sanskrit has present, imperfect, perfect, aorist, and future tenses (and multiple forms of each); and optative, conditional, and imperative moods beyond the indicative. It has three persons and three numbers, and most of the above can take both active and either middle or passive voices. Each of these forms may be infixed in various ways to produce causatives and desideratives. Then participles: there are eight participial forms that inflect like nouns, meaning each one has three numbers and eight cases, and these can also be infixed. This produces over a thousand forms, and that’s before phonotactic changes. It’s not clear how a constructor hosted in Wikidata would call such a function.
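As a purely illustrative sketch (in Python, with invented names and a drastically simplified, linguistically incomplete rule set), code like the following generates forms combinatorially from a handful of rules, which is far more natural than enumerating every form as data:

# Illustrative only: the rules below are not linguistically accurate;
# the point is the shape of the computation, not its content.
PERSONS = ("1", "2", "3")
NUMBERS = ("sg", "du", "pl")
TENSES = ("present", "imperfect", "perfect", "aorist", "future")
VOICES = ("active", "middle")

def toy_ending(tense, person, number, voice):
    # A real renderer would derive endings from contributor-maintained rules.
    return "-" + tense[:3] + person + number + voice[0]

def conjugate(stem, tense, person, number, voice):
    # Compose one finite form from a stem and a rule-derived ending.
    return stem + toy_ending(tense, person, number, voice)

def all_finite_forms(stem):
    # A handful of rules yields many forms, instead of a hand-typed table.
    return {
        (t, p, n, v): conjugate(stem, t, p, n, v)
        for t in TENSES for p in PERSONS for n in NUMBERS for v in VOICES
    }

print(len(all_finite_forms("bhav")))  # 5 * 3 * 3 * 2 = 90 forms from a few lines of code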
For Abstract Wikipedia, it would suffice to create a platform which allows creating NLG rendering functions
There are sufficiently general NLG systems which could cover all (written) human languages (for example: Grammatical Framework, templatic system)
Some of our own colleagues, like Maria Keet, have noted the inadequacy of those systems for many of the languages Abstract Wikipedia is intended to serve.
In particular, according to Keet, Grammatical Framework notably lacks the flexibility to handle certain aspects of Niger-Congo B languages’ morphology. The traditional answer to this kind of critique would be to say, “Let’s just organize an effort to work on Grammatical Framework.” The design philosophy behind Abstract Wikipedia is to make as few assumptions about a contributor’s facility with English and programming experience as possible. Organized efforts around Grammatical Framework (or otherwise) are not a bad idea at all. They may well solve some problems for some languages. However, the contributor pools for under-resourced languages are already small. Demanding expertise with specific natural languages like English and also specific programming paradigms contracts those pools still further.
Grammatical Framework and other tools may well become part of the Abstract Wikipedia ecosystem, but, in an ideal situation, community members would be able to contribute without knowing anything about those tools.
Furthermore, we should treat Abstract Wikipedia’s conception as a community-driven, crowd-sourced effort as an asset, not an obstacle. While existing general NLG systems “cover” many use cases for natural languages, they are usually built on a set of assumptions which don’t always apply in the Wikidata ecosystem: clean, well-curated, predictable data.
This is also why it doesn’t suffice just to build NLG renderers. A core hypothesis of Abstract Wikipedia is that the languages which it is most important to serve benefit least from a black-box approach, i.e. a specific framework encoded separately from the encoded grammatical and linguistic knowledge, restricting and defining them. Again, existing renderers and NLG systems can provide some of the tools on which Abstract Wikipedia is built, but Abstract Wikipedia itself needs to do something different and broader in order best to meet its goals, adapt to its unique disadvantages, and avail itself of its unique advantages.
As the evaluation correctly points out, there are many extant NLG systems, and some, such as Grammatical Framework, already have a community and have demonstrated their effectiveness across numerous languages. Ariel Gutman suggested another solution, as did Mahir Morshed, both specifically for Abstract Wikipedia.
The problem is: which one to choose? Grammatical Framework has a great track record, but, as mentioned above, Maria Keet pointed out problems with Grammatical Framework for Niger-Congo B languages. SimpleNLG would be a great choice, but here, too, similar problems surfaced. Indeed, Ariel’s template proposal also benefited from Maria’s input, and became feasible for Niger-Congo B languages. But does this mean we know that the proposal is solid enough for all other languages? How do we know that we don’t have to make significant changes to the selected system to support Japanese honorifics, to support evidentials, or to support some other features we haven’t thought about, maybe in a Polynesian or an Algonquian language?
We don’t think it is a coincidence that even after working for half a year with a group of experts on the topic, the evaluation still refrains from saying *which* system to choose.
By building the solution in a wiki, we can pivot. We can figure out we made a mistake and switch, without having to fork an existing NLG solution. The community can try out multiple directions, all built on a single common framework, i.e. Wikifunctions. This allows for (1) familiarity with the framework to carry over and (2) substantial portions of work to be reused, as directions evolve.
Also, by reducing Abstract Wikipedia to a framework that is strictly aimed at natural language generation, we would not be able to produce other kinds of output: we wouldn’t be able to share functions such as calculating the age of a person based on the date of birth, or geoshapes to list all animals that live in an administrative area, or even to create something as simple as tables or charts, which are all very useful for an encyclopedia and form part of the Wikimedia movement’s vision for Wikifunctions, distinct from the plans around Abstract Wikipedia.
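As a trivial illustration of the kind of shared, general-purpose function meant here, consider a sketch of an age calculation (the name and signature are ours, not an actual Wikifunctions function):

from datetime import date

def age_in_years(date_of_birth, today):
    # Completed years between date_of_birth and today. Note that 'today' is
    # passed in explicitly rather than read from the system clock, which keeps
    # the function pure (see the discussion of non-determinism below).
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

print(age_in_years(date(1879, 3, 14), date(1955, 4, 18)))  # 76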
In short, this section argues that we should just choose a specific, existing or new NLG framework instead of allowing for the flexibility of Wikifunctions. We argue that the discussions during the fellowship have indicated that none of the existing frameworks are sufficient, and that we simply don’t know yet what the right framework would be. Whereas the evaluation argues that the decision to set up Abstract Wikipedia on top of Wikifunctions puts Abstract Wikipedia at an unnecessary risk, we argue that locking in a solution early is akin to playing the lottery, and is thus an unacceptable risk. Especially since the risk would very likely be borne by the languages that are already underrepresented the most.
Would Wikifunctions expedite Abstract Wikipedia?
Indeed, as mentioned above, one such system, Grammatical Framework, which has been mentioned before as a possible candidate, currently supports (to a certain degree) about 45 languages of various linguistic families.
A solution designed by a small group of Westerners is likely to produce a system that replicates the trends of an imperialist English-focused Western-thinking industry. Existing tools tell a one-voice story; they are built by the same people (in a socio-cultural sense) and produce the same outcome, which is crafted to the needs of their creators (or by the limits of their understanding). Tools are then used to build tools, which will be again used to build more tools; step by step, every level of architectural decision-making limits more and more the space that can be benefitted by these efforts.
Grammatical Framework would probably give a quick start to creating the basis for an NLG system perfectly suited to “about 45 languages of various linguistic families”–less than 10% of all existing written languages. However, when designing a system that is explicitly focused on closing the knowledge gap for languages that are underrepresented on the Internet, basing the architecture on a framework that supports fewer than 10% of languages–and those among the most represented–defies, by design and from the very start, the bigger purpose of the project.
This would situate the most important building block of Abstract Wikipedia–language technologies and multilingual support–under the sole responsibility of a separate free software community, and hence under a whole different set of goals and values. This seems too big a risk not only for the success of Abstract Wikipedia but also for the success of the Wikimedia Movement Strategy, by dropping support precisely for the languages that have been historically left behind.
The challenges that Grammatical Framework faces in covering language diversity are known and publicly documented. As is widespread in the current software industry–of course mostly determined by the software market, and not at all absent in most open source projects–the decision of which new languages to include next is purely strategic. The factors that condition the decision-making are, among others, the commercial potential of the new language or the lack of technical challenge. This is, again, not a surprise, and anyone who understands the heavy dependency that software has on funding can empathize with these criteria. However, it steps away from the values lined up behind every decision that we must make at the Wikimedia Foundation to keep walking towards our Movement goals.
We have seen exactly these issues raised regarding Niger-Congo B languages.
This is not meant to say that Grammatical Framework will not be very valuable to Abstract Wikipedia. However, we don’t think we should tie ourselves too tightly to an existing system such as Grammatical Framework. Rather, we should allow space for other possible solutions to develop.
It is also notable that, even though NLG systems already exist, as mentioned by the evaluation, the NLG workstream spent considerable time on developing yet another NLG system instead of using an existing one. This indicates that the solution space for NLG is not sufficiently explored yet.
Indeed, enabling community contributors to create their own NLG systems would inevitably lead to reduplication of efforts and waste of resources.
The argument of duplicated effort is not new to Wikimedia projects, and we see a lot of duplicated efforts across the projects. But the projects also have mechanisms to rein these in, and to bring work together. Particularly the idea that we would have different constructors for different languages, and not just different renderers, would be very troublesome. But that problem can happen with a “One Ring” solution as well, since the problem would be in the creation of the constructors.
The sound design of an NLG system should start with a design of the data on which it operates, the specification of the Abstract Content
The evaluation suggests that we should start with the specification of the Abstract Content. This claim is not only debatable, but also has broad political implications, to which we must be sensitive. Indeed, we find the suggestion entirely backwards.
It assumes the existence of a single authority in whom is vested the power to design “the data.” Unfortunately, as every existing NLG system demonstrates, no single authority possesses sufficient emotional intelligence, breadth of linguistic knowledge, or subtlety of cultural awareness to create a system whose functioning is equally sound for all humans who interact with it. Abstract Wikipedia has both the burden and the opportunity to challenge this assumption.
Abstract Wikipedia’s data cannot be “designed” in the conventional way. Its data is an emergent and changing entity, responsive to the consensual participation of many volunteers from many different backgrounds. To presume well-designed data a priori would in fact presume that Abstract Wikipedia has no data to work with at all. So this NLG system must start with an appreciation for diversity in both its data and its contributorship.
This is not to say that Abstract Wikipedia cannot or should not influence the treatment of data by existing communities; indeed, it can, should, and likely will. But this influence cannot and should not arise from the presumed authority of an authoritarian cadre. It should instead arise consensually and pluralistically from many interconnected interactions among diverse contributors.
A pervasive and skewed assumption underlies much of the modern software space. Tech has enshrined as a term of art the word “user,” and with it presumptions about subjectivity. Under this assumption, a small number of technologists occupy subject position; they produce software which is to be passively consumed by users. Users’ ability to participate in the production or refinement of technology is nonexistent; they can neither see into nor consent to the technology on offer for their consumption; their attitudes and desires are proxied by market forces. The most users can do to influence production is to proffer or withhold their consent, and even this is often wrested from them by various coercive or addictive mechanisms.
Abstract Wikipedia must eschew this assumption and adopt a radically inclusive model. This means that every step of the process of natural language generation should be visible, defeasible, and subject to participation by the broader community. As much as possible, the line between producer/consumer or technologist/user must be made to evaporate.
Again, this is not to say that Abstract Wikipedia cannot or should not be modeled on the great body of existing expertise in the domain of NLG; indeed, it can, should, and likely will. But this influence can extend only to minor questions like algorithms and architecture, not to foundational assumptions about relationships of production.
For Wikidata, there was a similar argument floating around before the project started: should we have a single specified ontology, decided on by a small number of experts working for the Foundation? Or should we let the community own the ontology entirely?
We think that decision is similar to the one we are facing here. Should we have a single natural language generation system pre-defined, or shall we allow the community to own and grow their own solution?
As the contribution base will grow, it is likely that contributors would signal missing features (or bugs) in the selected NLG system. Depending on the platform on which the base code of the system has been developed (e.g. in Gerrit, or in Scribunto) either the Abstract Wikipedia team, or volunteer contributors could step in and implement those missing features. Since the base should mostly be natural-language agnostic, this should be manageable even by a small team of software engineers.
This argument is fundamentally flawed. It assumes that starting from a fairly constrained system and expanding it with significantly complex features is simple (“manageable even by a small team of software engineers”), or even possible at all. However, these “missing features” would be caused by the constrained nature of the proposed architecture, and the way to “add these features” would involve either major architecture changes or parallel systems being added into the flow–and, as a result, lacking any architectural integrity. Looking at the history of language-related software technologies and the reasons they do not cover a wider range of languages that are generally underrepresented on the Internet, a common challenge is that different languages have such deep foundational differences that they may be impossible to represent in an architecture conceived from the familiar perspective of Indo-European languages. This challenge often means creating parallel systems and architectures, or rebuilding the whole foundations of the product.
For Abstract Wikipedia, this is neither desirable nor realistic. A small team of software engineers should be able to manage features and bugs in the engine on which the system runs, be that Wikifunctions or otherwise, up to and including specialized functionality that engine offers to support NLG. Features and bugs in abstract content, underlying data, etc. should as much as possible be managed by community members, as they are the ones with linguistic and cultural expertise.
In the case of Wikifunctions, this statement is true. In short, this section argues that by using an existing system we would be much faster in reaching the goals of Abstract Wikipedia. We agree that it would seem so, but the cost of that would be too high, both in terms of risks, particularly for smaller languages, as well in terms of lack of community ownership of the solution.
Is Wikifunctions adequate to developing Abstract Wikipedia?
It is clear that Wikifunctions does not provide a good enough UI and debugging capabilities for maintaining such a large software system.
It is worth noting that the current UI for Wikifunctions, like the UI of many technical products, is not static. We have articulated as a team that we wouldn’t consider the current version “finished” by any metric. However, even if we did, we are operating under the assumption that there will be feature requests, bug reports, and insights revealed as the system is used and continues to grow, and that we will respond to them. Right now there are ways we know it isn’t good enough, and we are working on closing that gap. But there are also gaps we can’t anticipate and will only learn about through user feedback, and we as a team have long been committed to keeping people on staff as maintainers of Wikifunctions until we feel it can stand on its own.
Thus, we intend to remain agile and will plan future feature development based on feedback and analytics post-launch.
There needs to be a clear distinction between what the finished product of Wikifunctions will be–since it currently does not exist–and what the current state of the Wikifunctions project is. The architectural and design decisions of a system that is going through its development phase should not be criticized solely on the basis of its current unfinished state. Wikifunctions’ UI is currently going through its design/development phase–at different stages for different components; the current status of the project is that of an experimental proof of concept more than that of a finished wiki.
One of the concerns voiced by the team–staff and fellows–while discussing the possibility of doing a “soft launch” like the one that was finally done for the Beta cluster was exactly this: the risk of the project being observed and heavily criticized in its embryonic state. The user interface of Wikifunctions that launched on beta had very little, if anything, to do with what we will launch as the finished Wikifunctions.
Wikifunctions does not provide a good enough UI and debugging capabilities
We agree with this statement at this time, but as noted above the product is unfinished. Progressing on our roadmap, which aims to improve this precise aspect, is the current priority of the team. This is also why we were happy to offer our Google Fellows the chance to contribute to the improvement of this UI. There are still three ongoing fellowships that are very actively, honestly and passionately contributing their UI/UX and design skills toward this goal.
While technically possible, writing an NLG system as a combination of pure-functional functions is less intuitive for the average programmer
Sure. But we dispute that this is anywhere near relevant. The “average programmer” should not be considered the main contributor of Wikifunctions or Abstract Wikipedia. What we need to be thinking about is whether a non-technical contributor, a person who is not so familiar with programming, could easily learn or adapt to a functional model.
Functional programming is a different way of thinking, and it generally poses a challenge for a mind that has already absorbed the imperative programming paradigm. However, functional programming is closer to other areas of study, such as mathematics and any discipline that requires probabilistic or scientific computation. For people who have no prior relationship with programming, a functional paradigm might be as intuitive as an imperative one, if not more so. Many engineering schools teach functional programming as the first language, before other, more industry-oriented imperative languages are taught and end up re-shaping the mind. One example of this is MIT’s introductory Electrical Engineering and Computer Science curriculum, which taught functional programming as a first language to new students and only then introduced–and specialized in–imperative paradigms. This educational model set a standard that was copied by multiple international engineering schools (including those of some of the team members).
It is also interesting to note that Grammatical Framework, which has been mentioned above as a strong contender for the One Ring solution, is in fact a fully functional system.
A typical development environment for an NLG system should allow the different components of the system to access a global state (shared memory)
Please see the discussion about global state below.
Randomness is required to allow some variation in the output of the NLG system, while non-determinism is inherent when relying on external data sources, such as the Wikidata lexicographical data, or the system’s time (which can be useful, for example, to calculate and verbalize the age of a person)
Please see the discussion on non-determinism below.
It should be noted that from our point of view a system that randomly changes parts of the text would probably not be regarded favorably as a source of Wikipedia article text.
Part II: Technical design of Wikifunctions
Support for multiple programming languages
This extra composition layer constitutes an entirely distinct language on its own
This is correct. Here we briefly point out that an alternative means of creating compositions is available; we shall have recourse to this example on several occasions, as it can remedy many of the pain points and deficiencies of the current implementation.
Within Python code (and, soon, within JavaScript) in our system, it’s possible to write compositions like this:
def Z10001(K1, K2):
    # Z802 is the builtin "If" function: it returns 'ing' if K1 is true, 'ed' otherwise.
    suffix = W.Call('Z802', K1, 'ing', 'ed')
    return K2 + suffix
W.Call makes a subsequent call through the orchestrator (in this case, to Z802, the “If” function) and returns its result. In this way, W.Call in native code can serve as an alternative implementation of compositions.
The point of the composition layer is to support writing implementations in different natural languages.
Security
Let’s first accept that, to accomplish Abstract Wikipedia’s goal, we must execute code not written by staff members or contractors of the Wikimedia Foundation on Wikimedia Foundation hardware. This is true whether the team adopts the Scribunto approach, Grammatical Framework, Wikifunctions, or any other reasonable approach. We must therefore assume the risk that at some point in Abstract Wikipedia’s life-cycle there will be a CVE (Common Vulnerabilities and Exposures) entry that affects a part of Abstract Wikipedia’s technology stack, and that some malicious agent will exploit it. We therefore will take security precautions that any responsible production system with sensitive data should take.
First, executed code will be run in Linux containers. Containers are a combination of technologies, primarily at the Linux kernel level, that limit what resources its processes can access on the host system. Paradoxically, early container runtimes needed to be run as root, so if a process inside the container could “break out” of the container, it could then assume root powers on the system. With the modern development of user namespaces, containers no longer need to be executed as privileged users, thus making the risk of container breakout identical to other non-privileged user remote code execution. Extra layers of security can be added by running the containers with a separate kernel that proxies a “safe” subset of system calls to the host kernel. Indeed, this has already been implemented in Wikifunctions through gVisor by the fellows, for which we are thankful.
Second, if we assume a malicious process will gain non-privileged access to a machine, we should ensure that this machine has limited access to sensitive information. Unlike code run under the Scribunto framework, user-written code on Wikifunctions never runs on a box with access to any sensitive information. Indeed part of the mitigation is to use a container orchestrator that separates resources automatically based on auditable configuration. The Wikimedia Foundation is in the process of migrating most production workloads to Kubernetes, an open source container orchestrator platform, which is becoming an industry standard for container orchestration. The release engineering, site reliability engineering, and security teams are building up expertise in how to configure Kubernetes for best practices in production with sensitive data. An anticipated, extra-cautious approach is to use node segmentation, thereby only having Wikifunctions user code execution occur on machines that are not running any other processes, let alone sensitive ones (e.g. with production data or financial information), and ensure that the network is segmented in such a way that malicious code could not use the node as a jumpbox to more privileged machines.
While what we have outlined are only some potential mitigations, the point is that strategies exist to ensure that, even in the event of an inevitable container breakout, the risk of privileged access to sensitive data is minimal. We must consider these mitigations regardless of the underlying natural language generator Abstract Wikipedia employs, and we therefore do not see the work of mitigating the risk of executing functions in multiple languages as significantly greater than mitigating that risk for only one language. Moreover, having multiple implementations of the same function adds some amount of robustness to the system in the event that Wikifunctions needs to temporarily disable one type of executor due to a published CVE.
Non-determinism
In order to allow for variation, randomness is not required. Pseudo-randomness is more than enough and can be accomplished in a pure-functional system. For the rest, this is a bit of a dishonest claim. Any fully-featured programming language needs to be able to read input, whether that be from files on disk, via REST requests, etc. The output of these operations is inherently unpredictable, and every programming language can perform them.
But in any case, the point is moot. Wikifunctions does not currently enforce pure functionality. It’s entirely possible to write a function such as
def Z10004():
    # Nothing technically prevents importing the standard library and
    # returning a non-deterministic result.
    import random
    return random.choice(['randomly', 'chosen', 'word'])
It would be nearly impossible to reliably detect and prevent situations like this. While there are ways to sort-of permit randomness in pure-functional languages, it’s more likely that Wikifunctions will, in a controlled way, at some point rescind the claim that it is a pure-functional language.
As with all Wikimedia projects, we don’t expect to “guarantee” anything, and certainly not NP-hard introspection of user-written code for non-determinism, which we agree would be almost impossible. Nor do we expect Wikifunctions to always be slavishly committed to a pure functional model. We will rely on our editing community to determine what content is allowed, to spot non-deterministic code patterns and avoid them (such as by taking in a function’s sources of non-determinism as an input), to report when functions don’t seem to work or cache the way they would expect, and to work with us to build support for non-deterministic functionality where appropriate. This ad hoc accuracy is something we consider a feature, not a bug.
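As an illustrative sketch of the pattern of taking a function’s source of non-determinism in as an input (the names here are ours):

# Instead of calling random.choice() inside the function (non-deterministic
# and uncacheable), the source of variation is passed in as an argument.
def pick_variant(variants, seed):
    # Deterministically pick one phrasing variant for a given seed.
    return variants[seed % len(variants)]

# Same inputs always give the same output, so the result can be cached;
# the caller decides where the seed comes from (e.g. a hash of the page title).
print(pick_variant(["is located in", "lies in", "can be found in"], seed=7))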
More bug-prone; harder to debug; and harder to optimize
One hope is that programming language communities will be interested in maintaining their programming language environment. This would spread out the implementation and maintenance burden.
debugging an error requires competence in all the different programming languages that comprise the functionality
Debugging in Wikifunctions should always be confined to the individual function: there should never be the need to debug across language borders, as the functions should define what they return. This is one of the main advantages of functional languages without global state.
Support for multiple languages also makes it harder to optimize performance.
The argument about making it harder to optimize performance is correct. It is hard – but potentially very interesting and very fruitful.
In fact, the system as a whole – marshaling function calls through the web, potentially between different languages – already puts us in a very different place than traditional programming systems, performance-wise.
We will have different optimization strategies. And they can in fact benefit from having several implementations, as the system will be able to choose the best performing implementation. The system will eventually even be able to synthesize implementations (composition is highly synthesizable), and be able to heavily rely on caching (due to functional transparency).
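As a minimal illustration of the caching that this functional transparency enables (the function below is a stand-in, not actual Wikifunctions code):

from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_render(constructor_id, language):
    # Stand-in for an expensive, pure rendering call. Because the function is
    # referentially transparent (same inputs, same output, no side effects),
    # its result can be cached and recomputed only when an input changes.
    return "rendered(" + constructor_id + ", " + language + ")"

expensive_render("Z10001", "sw")  # computed once
expensive_render("Z10001", "sw")  # served from the cache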
Wikifunctions also allows us to constantly monitor the overall performance. If we find a function where all implementations are particularly resource intensive, we can bubble that up to the community and ask them for alternative implementations of that specific function. We can have a fully data-driven approach to surface necessary improvements, and then answer these with highly optimized implementations – which at the same time will have the benefit of being testable against existing implementations and with existing testers. This is a strong incentive for individual community members, who might regard it as an interesting challenge while at the same time measurably decreasing our runtime costs.
All the problems described in this section assume that debugging needs to cross between functions; the whole system, however, is based on the assumption that this is not necessary and that we can debug locally.
A smarter evaluation system is expected to be able to deal with all the performance problems mentioned there.
Managing change is onerous and time-consuming
This is true, as it is for any system that allows users to write code and make use of critical libraries. It doesn’t fundamentally block anything, because for now we can sidestep this issue by only allowing a subset of libraries and pinning all code to specific versions. Although some people will likely complain and find this annoying, it isn’t going to prevent function creation in the vast majority of cases. Moreover, there is nothing in the current design that precludes us from adding this feature later. There are plenty of examples of tools that allow for version management out there, so we know for a fact this should be doable.
Also, as mentioned above, we hope to be able to share this work with a wider community of enthusiasts for specific programming languages.
This, however, presupposes a massive duplication of effort. To have the necessary level of redundancy every piece of the system will have to be implemented multiple times.
Yes, but this is also something that some contributors might genuinely find enjoyable.
Fragmentation
It’s worthwhile here to establish some of the explicit goals and non-goals of both Wikifunctions and Abstract Wikipedia.
It is a goal of Wikifunctions to serve as a library of functions. Therefore, it is desirable to implement many functions in multiple programming languages.
It is a non-goal of Abstract Wikipedia that all of its functionality be available in every programming language. This means that common functionality for NLG does not need to be implemented in all languages; it suffices to implement it in a single implementation.
Following that point: we propose here a simple technique to avoid the network boundary when calling functions implemented in the same programming language. Because it’s a non-goal that NLG helper methods be implemented in more than one programming language, any subset of contributors who wish to use helper functions written in a single programming language can avoid the most expensive part of Wikifunctions’s composition model. Note that this doesn’t preclude calling functions written in other programming languages – it just means that calling functions from the same language will be more efficient. In this way, Wikifunctions can achieve the best of both worlds: the flexibility to call functions written in any programming language but also the efficiency of calling functions without incurring the huge cost of network I/O.
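As an illustrative sketch of the idea (with invented helper names): when both functions are implemented in the same programming language, the call can be an ordinary in-process call instead of a round trip through the orchestrator.

# Hypothetical NLG helpers, both implemented in Python.
def pluralize(noun):
    # Simplified English-only rule, for illustration.
    return noun + "es" if noun.endswith("s") else noun + "s"

def noun_phrase(determiner, noun, plural):
    # Same-language call: an ordinary, in-process function call,
    # with no network round trip through the orchestrator.
    head = pluralize(noun) if plural else noun
    return determiner + " " + head

# A call to a function implemented in a different programming language would
# still go through the orchestrator (e.g. via W.Call, as shown earlier).
print(noun_phrase("the", "lexeme", plural=True))  # "the lexemes"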
It is a non-goal to insist a priori on a single standard. It is a goal to provide the tools for contributors to find standards that work. It may be that subsets of languages will settle on some tool, but it’s likely that a set of tools will emerge that is so useful as to be shared among all or most languages’ NLG efforts. It’s really not the Abstract Wikipedia team’s job to decide that. It is the team’s job to ensure that the engine which drives Abstract Wikipedia’s NLG maximally enables communities to find these ideal solutions.
While it is, in a broad sense, a goal to foment a developer community around Wikifunctions, it is a non-goal to do so in the traditional way. Let’s first observe that Wikifunctions (meaning the code library hosted on the platform, and hopefully not the software itself) rigorously transgresses many best practices. A developer can’t use their chosen programming environment but must (for now) use the on-Wiki UI. A developer has to give objects unreadable names, and then go through the time-consuming process of separately labeling them. A developer doesn’t have full control of the bits of code they write, and the history of a piece of code will follow a Wiki-like process rather than a version control-like process. These are all intentional trade-offs to make Wikifunctions more approachable for non-programmers.
Common idioms emerge, get librarized.
Exactly. Supporting librarization is a major goal of Wikifunctions.
Questionable benefits
The case for the value added by supporting multiple languages is weak
If we were to limit the goals of Wikifunctions in this way, it would be plausible that the need to support multiple programming languages could go away. But this support is an important part of Wikifunctions’s other purposes, such as serving as an algorithmic reference tool.
One advantage of allowing several implementations in potentially different programming languages is not mentioned here: it increases the stability of the system and reduces the potential for edit warring. If we allow contributors to each add their preferred algorithm for a specific function, and then have the system decide which one to take, there won’t be conflicts between the contributors about what the one true implementation for a specific function should be.
Also by having several functions that all must return the same results, we can check the functions against each other. This will make it more difficult to inject vandalism, as it would need to be injected in several different programming languages at once.
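As a minimal sketch of this cross-checking idea (the function, its implementations, and the testers below are illustrative only):

def gcd_iterative(a, b):
    while b:
        a, b = b, a % b
    return a

def gcd_recursive(a, b):
    return a if b == 0 else gcd_recursive(b, a % b)

implementations = {"iterative": gcd_iterative, "recursive": gcd_recursive}
testers = [((12, 18), 6), ((7, 13), 1), ((0, 5), 5)]

# An implementation that disagrees with the agreed-upon testers (or with the
# other implementations) can be flagged for review instead of being served.
for name, implementation in implementations.items():
    assert all(implementation(*args) == expected for args, expected in testers), name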
Several implementations also provide us with a rich potential to increase speed and efficiency.
It also allows for a larger pool of potential contributors. One of our growth strategies will be to explicitly aim for enthusiasts of individual programming languages, which might constitute one of the pillars of the Wikifunctions community.
Let’s turn to focus on the subset of Wikifunctions’s functionality that will support Abstract Wikipedia. Support for multiple programming languages has no bearing on Wikifunctions’s suitability to support Abstract Wikipedia. To be clear: as envisioned, people may implement NLG functions in any supported coding language, but they need not do so, and it may even be the case that the community vastly prefers a single language anyway. Programming language diversity is at worst a neutral feature from the perspective of NLG.
The current design does not fulfill the goal of supporting multiple languages to provide users a programming environment that feels familiar and allows them to leverage their existing programming skills
This is not a high priority goal. Wikifunctions is trying to include a different demographic: people who are not necessarily expert coders but would like to do things with code. A contributor who is already a strong coder should have no trouble adapting to a slightly different paradigm; a contributor who is not yet a strong coder will presumably not have a “familiar” environment.
The purely-functional model means that parts of the standard library or common third-party libraries cannot be exposed
This isn’t true. The only requirements for native code are that 1) a snippet expose an object with a specific name, and 2) that that object be callable like a function. But within a code block, it’s entirely possible to define other functions or classes; import/require standard libraries (or third-party libraries that are explicitly installed in the executors), etc.
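As an illustrative sketch (the Z-numbered function name follows the convention of the examples elsewhere in this document, but is otherwise invented):

import unicodedata  # standard-library imports inside a code block are fine

def _strip_accents(text):
    # Private helper defined alongside the exposed function.
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(c)
    )

# The only hard requirement: expose one callable under the expected name.
def Z10005(K1):
    return _strip_accents(K1).lower()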
Arbitrary third-party libraries can’t be used, but that’s a result of the security model, not the function model. We’ve considered it a desirable property that third-party code be installed in the execution environment on an ad hoc basis, after a process of deliberation. It may well be the case that community members want nltk installed in the Python executor to support NLG; that decision should be made after careful assessment of security concerns, and it should be implemented by the engineering team.
If a class from a given library in a given programming language is particularly useful, instances of that class can be used as inputs/outputs to functions, and therefore shared. At present, it’s possible but cumbersome to do this; it will become much easier when contributors can write custom serialization/deserialization code. That’s already slated to happen, if not before launch, then soon after.
Custom serialization/deserialization is also part of a larger discussion about global state, executor runtimes, and code reuse, q.v.
Users cannot use global state and cannot make network calls
Within a single implementation, global state can indeed be defined and used as normal for each programming language. That global state cannot be shared among function calls (without serializing and passing the global state as an argument), but that restriction is shared with alternative solutions like Scribunto.
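As an illustrative sketch of this distinction (the names are ours):

# Module-level state is fine *within* one implementation...
_irregular_plurals = {"child": "children", "person": "people"}

def plural(noun):
    return _irregular_plurals.get(noun, noun + "s")

# ...but state that must survive across separate function calls has to be made
# explicit, i.e. serialized and passed along as an argument.
def plural_with_overrides(noun, overrides):
    return overrides.get(noun, plural(noun))

print(plural("child"))                              # "children"
print(plural_with_overrides("ox", {"ox": "oxen"}))  # "oxen"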
Regarding network calls, the position of the Abstract Wikipedia team so far has been that the inability to make network calls is a desirable property of the system.
Selective access to certain resources, like Wikidata itself, can be wrapped in builtin function calls, of which native code can avail itself using the alternate composition model.
Eventually we will want to be able to support network calls, but this will be well after launch and will require a larger discussion to be had with security. Alternative proposed solutions like Scribunto suffer from the same problem.
Programming for Wikifunctions means having to work with the Z-Object system. Code that deals with Z-Objects looks unnatural and unidiomatic in every implementation language.
This is a good point. Let’s first bound the discussion, then investigate the cases where this is a problem and try to propose solutions.
The ZObject language will not be exposed to the casual contributor. In many (perhaps most) cases, contributors will see labelized versions of ZObjects.
The ZObject language will be mildly important when implementing things like custom validators, a task which will likely be undertaken by more expert users; even there, the presence of the ZObject language is relatively minimal, but the situation can definitely improve!
The ZObject language is most visible in native code, and it is especially visible when native code interacts with the mechanics of Wikifunctions, e.g. when implementing a validator. This is, indeed, a big problem, and the AW team should prioritize i18n in native code. We will commit to improving this situation.
The goal for the ZObject syntax is for it to remain mostly an internal representation. Yes, the expert contributor will be able to use the APIs, and will see the function implementation code in its ZObject format. Yes, there’s always the risk of abstraction leak. But for the majority of the contributor’s experience, the proposed serializers and ongoing design work should allow them to create function implementations as simple as:
def integer_division(dividend, divisor):
    return dividend.base_10_value // divisor.base_10_value
Or even simpler:
def integer_division(dividend, divisor):
    return dividend // divisor
Consider the function offered in the evaluation as an example of extraordinary complexity:
def Z10001(Z10001K1):
    return ZObject(
        Z10001K1.Z1K1,
        Z10000K1=str(int(Z10001K1.Z10000K1) + 1))
We are moving towards a UI that can allow the contributor to implement that function like this:
def increment(number):
    return Integer(number.base_10_value + 1)
Or even:
def increment(number):
    return number + 1
Accomplishing this would require the following features:
- Already existing serialization/deserialization for a collection of built-in types, or user-contributed serialization/deserialization for custom types, which has already been proposed and discussed. This can make sure that the input values are transformed into the correct builtin representations in each programming language, and also that whatever is output by the user-contributed code is passed through a user-contributed type constructor to generate a valid ZObject of the correct type specified in the function definition (see the sketch after this list).
- Labelization and de-labelization in the Vue components that help the contributor write functions. This is already implemented as a global front-end feature; we would just need to integrate this labelization/de-labelization process into the code-editor component (load the labelized value in the editor component and transform the value into its de-labelized format when sending it to the backend).
- For more complex ZObjects, this would probably need a more advanced front-end code editor component that, as the contributor types the name of an input parameter, displays all the possible keys/properties that the input type has (basically an ad hoc autocompletion feature embedded in the Vue code editor component).
These features are either already contemplated for our backend and front-end design, or technically feasible and appropriate for external collaborators–such as Google Fellows–to create and/or contribute to.
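As an illustrative sketch of the serialization/deserialization idea from the first point above, assuming a simplified ZObject shape for integers (the real representation differs):

# Assumed, simplified ZObject for a natural number; illustrative only.
example_zobject = {"Z1K1": "Z10000", "Z10000K1": "41"}

def deserialize_integer(zobject):
    # ZObject -> native Python int, so user code can just write arithmetic.
    return int(zobject["Z10000K1"])

def serialize_integer(value):
    # Native Python int -> ZObject of the declared return type.
    return {"Z1K1": "Z10000", "Z10000K1": str(value)}

# With these in place, the contributor-facing implementation reduces to:
def increment(number):
    return number + 1

print(serialize_integer(increment(deserialize_integer(example_zobject))))
# {'Z1K1': 'Z10000', 'Z10000K1': '42'}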
If the focus of Wikifunctions is on the usability of the system for non-programmers who will interact with the system via graphical interfaces, that is all the more reason to choose an efficient, standardized language and type system for the backend, rather than inventing a new one
The closest thing to an efficient, standardized open source language for graphical interfaces to programming is the Scratch-family.
Coincidentally, just a few weeks ago Denny met and talked with Jack Rusher at the Foundation office. He was particularly excited about Wikifunctions’ ability to have implementations via composition, and about the openness on top of this approach (i.e. composition is just an abstract syntax tree). For him it felt like an invitation to people in the visual coding community to create interfaces on top of Wikifunctions – not just Scratch, but also tools such as Vlosure or Lisperanto. The hope is that we can give this whole field a push, and provide maybe a number of different interfaces to understand and write compositions.
Function model
As previously stated, we do not really think of function composition as a new programming language, but rather as an abstract syntax tree to compose function calls together, but this difference may be purely terminological.
Besides that, this section has a lot of good points, and it is our belief that the fellowship has improved many of the issues the system had, and has made the outstanding issues more tangible.
In the end, the function model in the ZObject system merely allows nesting function calls and expressing data. How hard can that be? ;)
It is certainly not our intention to ignore previous work in this area. On this topic we remain very open to further help, and welcome researchers and practitioners for further support. It is our hope that the formalization created during the fellowship will become an important stepping stone to let more people from the programming languages community join our effort, as it went a long way toward translating our previous documents into something more accessible to that particular community.
The language confuses syntactic and semantic notions, which defeats the purpose of types, makes the definition of the semantics harder, and causes problems of confluence in the language where the result of an evaluation depends on the order of evaluation
This is very, very true. Most of the problems due to that arise in compositions, which the alternative implementation above addresses. However, there are some other flaws in the function model.
The semantics of the language are not clear and are hard to define
Very true! There’s not much to demur at here. We are thankful for the fellows’ valiant efforts to improve this situation.
The notion of validation is not well-defined due to the evaluation strategy and the recursive definition of objects and types
Indeed. For this reason, after the critically helpful work of fellows such as Ali and Mary, the system is largely moving away from validation, except in cases that meet a more conventional definition of validation, e.g. writing a function to check that a string is a valid integer literal.
The language mixes types and objects in an unrestricted way and allows arbitrary computations in types
Yes! This is bad.
Currently, the situation is kind of like C++ templates, except worse because the mixing is totally unrestricted and type resolution happens dynamically (in C++, Turing-complete template calculations are set off between <angle brackets>, and they happen in a separate preprocessing step, so at least most of the really mind-blowing stuff happens before compilation).
Ideally, we could work toward a model more similar to TypeScript, where type-space and object-space are totally separate and there is a separate class of function for operating on type-space. Ali’s proposal to make functions themselves a generic type, along with some attendant restrictions, is something that could be applied to all generic functions (e.g., so that they do not support arbitrary computation but a strictly-controlled type algebra). That would go a long way!
Javascript is poorly suited for writing an interpreter due to its dynamic nature and its inexpressive data type representations, making extensibility, refactoring, and maintainability hard
Absolutely. We have written a task to consider switching to TypeScript later on. That would solve many of the mentioned problems but still allow us to use the service-template-node framework sanctioned by the WMF.
A more radical but possibly better solution would be to switch to a more performant, compiled language, like C++, or Go, or even Rust since it’s so fetch, at least for the component that implements a programming language. This would be a great project for a group of motivated volunteers, or future fellowship!
Using text JSON to exchange data is inefficient. Basic objects already have huge representations compared to using serialized protocol buffers
Great point. We’ve already moved away from raw JSON in the interface between the orchestrator and the evaluator, which is where this was causing the gravest problems; we now serialize objects with Avro. We’ve also vastly reduced the size of objects by allowing certain representations to remain compact, rather than being expanded (especially types, which only need to be expanded ephemerally for type-checking purposes).
We could improve the situation still further by adopting Avro serialization (or Thrift, Protocol Buffers, or similar) between the PHP layer and the orchestrator, at the cost of removing the plans for a public API end-point for the orchestrator de-coupled from the MediaWiki ecosystem.
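As a minimal sketch of the size difference between raw JSON text and a schema-aware binary encoding, assuming the third-party fastavro library and a simplified, hypothetical schema (real ZObject schemas are richer):

import io
import json
from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "type": "record",
    "name": "TypedString",
    "fields": [
        {"name": "type", "type": "string"},
        {"name": "value", "type": "string"},
    ],
})

record = {"type": "Z6", "value": "example"}
json_bytes = json.dumps({"Z1K1": "Z6", "Z6K1": "example"}).encode("utf-8")

buffer = io.BytesIO()
schemaless_writer(buffer, schema, record)
avro_bytes = buffer.getvalue()

# The binary encoding avoids repeating key names in every single object.
print(len(json_bytes), len(avro_bytes))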
Speed
This section is mostly correct. We hope that the system will be fast enough.
As mentioned above, the introduction of several implementations for every function is in fact also a step towards an overall faster system, as the backend can choose which implementation to use, and can collect data about slow implementations and request support from the community to help improve overall system performance by writing faster implementations.
One misunderstanding about the performance goals is present in this section though: we are comparing the potential performance with the current system, not with a potentially different system. As is easy to see, the current system can take months or years, or even fail entirely, to propagate changes in the world: an update to the Romanian Wikipedia documenting a change in the world can take a long time to propagate to the Japanese Wikipedia, since it is a process that involves people reading and editing articles. If the Abstract Wikipedia solution leads to a process that takes even a day or two, it would still vastly outperform the current system.
Scribunto
We agree that an initial attempt at building Abstract Wikipedia could conceivably have been built as a new wiki using Scribunto, with some investment to modernise the code, to run (some) calls to Scribunto on its own k8s cluster isolated from the entire MediaWiki stack, and with no database access for security, and also some effort to provide external and cross-wiki calling (all work that is also needed for the current WikiLambda approach!).
We were particularly concerned that its monolithic approach to code and functionality offers no engagement pattern in the user experience for contributors in non-dominant languages at the same level. The human language to which any given programming language most closely aligns is going to be English for all current major programming languages, and that is not the particular concern here.
This approach would certainly address this document’s focus on the “zero to one” feasibility of the project, which we appreciate and recognise. However, it sets aside or disregards the “one to infinity” feasibility of building a sustainable community that isn’t wholly dominated by the main post-colonial European languages, and indeed would stack the deck against participation from non-dominant-language wikis by extending the existing comingling of code, functionality, bug reports, and documentation.
The more separable approach we have taken, splitting the definition and use of functions, is indeed moderately more complex and divergent from existing paradigms; code supporting Abstract Wikipedia could have been built in a simpler, more familiar system like Scribunto, but at a long-term scaling cost. Certainly, it would be possible to extend the Scribunto platform to support multiple programming languages, some of them visual, but this would not approach the objectives around separation of concerns. Had we taken this approach for providing the Wikifunctions platform, there would also have been significant concerns about development stability, security, resilience, and performance in adding major functionality changes to such a critical part of Wikimedia’s existing production stack, even behind feature flags.
One major consideration is that the Scribunto solution relies on Lua as a single language, whereas Wikifunctions intentionally aims for a multitude of languages (a major point of critique in the evaluation). The assumption is that by not limiting ourselves to a single language such as Lua (which is not a particularly popular language to start with), we will more easily find the people needed to contribute grammatical knowledge to the natural language generation libraries of Wikifunctions that Abstract Wikipedia requires. To put it differently: a Scribunto solution would require finding enough Lua developers who have a strong enough understanding of their natural language’s grammar to write all the necessary renderers, and who have enough time to contribute to an Open Knowledge project. Wikifunctions, on the other hand, expands that potential pool of contributors considerably.
The evaluation assigns little to no value to the effort that Wikifunctions is investing in being multilingual in terms of natural languages, even for implementations. It is unclear how a solution built on top of Scribunto would succeed in being multilingual, rather than being an English project which the other languages may be invited to use, if they understand it, but would be mostly blocked from contributing to.
As an aside, we gently disagree that large numbers of Lua scripts being manually copied to other wikis is necessarily a sign of Scribunto having “been successfully adopted”, so much as generally being an example of the existing anti-pattern of smaller communities often needing to manually copy large sections of other wikis’ meta-content just to replicate a single citation template, infobox, or other moderately complex content presentation. A shared resource for all of this work could indeed be a great boon, but is outside the scope of the Abstract Wikipedia team.
Recommendations
[edit]Abstract Wikipedia should be decoupled from Wikifunctions
We agree that the two projects should have two development teams. It would be great to have two funded teams working on Wikifunctions and Abstract Wikipedia, instead of both projects being worked on by the same team.
However, a complete decoupling of the two projects, as suggested, would lead to substantial duplication of engineering effort, as the decoupled projects would inevitably have to solve many of the same problems.
More generally, there is also a case to be made for synergy between the two coupled projects. The needs of Abstract Wikipedia will drive and validate the evolution of Wikifunctions, and the unique features of Wikifunctions (i.e., support for non-expert contributors, non-English-centric contribution framework, ability to integrate contributions in multiple programming languages) can help to accelerate the growth and participation of the Abstract Wikipedia community and provide solutions that might not be readily available in a single-language framework.
Overall, our biggest worry is the following: Abstract Wikipedia will require, for each language, a large number of renderers to be written. If we were to follow the recommendations for Wikifunctions, the contributors who write these renderers would need to fulfill all of the following conditions:
- they need to be programmers, specifically in Lua
- they need to read and write English
- they need to be well versed in the grammar of their language
- they need to have the time and willingness to contribute to an Open Knowledge project
People who fulfill the first two requirements often do not fulfill the last one, as their skills are in high demand. We hope that with Wikifunctions as it is currently planned we can reduce the conditions considerably:
- they need to understand computational thinking well enough to either program or use composition, OR they need to know a language well enough to write example sentences for the testers and check the outputs of the renderers
- they need to have the time and willingness to contribute to an Open Knowledge project
By separating testers from implementations, we allow the decoupling of the skill sets needed for writing tests and for writing implementations. Having both skill sets would still be of huge benefit, which is why it is so important to allow as many people as possible to write renderers and implementations. We know how empowering it can be to be able to use your own native language instead of having to translate everything to English. We think that is crucial.
This is only possible with Wikifunctions, which is why a full decoupling strikes us as infeasible.
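As a rough illustration of that decoupling (all names below are hypothetical and not the real model): a tester only needs to supply an example and the sentence they expect, while the renderer itself is a separate contribution, possibly by a different person.

```typescript
// Hypothetical illustration of the decoupling; none of this is the real model.

// A tester contributes only data: an example input and the sentence they
// expect their language to produce for it.
const testCase = {
  input: { person: "Marie Curie", years: 66 },
  expected: "Marie Curie was 66 years old.",
};

// An implementer, possibly a different person, contributes the renderer.
function renderAge(c: { person: string; years: number }): string {
  return `${c.person} was ${c.years} years old.`;
}

// The platform can then check the implementation against the test automatically.
console.assert(renderAge(testCase.input) === testCase.expected);
```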
Most of the other recommendations boil down to “decide on the answer and implement it”. The recommendations suggest deciding on a single programming language, on the set of constructors, on a single approach to NLG, etc. Instead, we have opted to plan for and develop an environment where we can, together with the community,
- embrace a diversity of programming languages (which also provides us with more robustness in the system overall)
- allow for implementations and contributions in a diversity of natural languages, and not only in a single natural language
- experiment with different approaches to natural language generation, with the flexibility to pivot and fall back to a powerful, unconstrained environment
- organically grow the set of constructors, not only in the beginning of the project, but also throughout, and keep extending it without being constrained by an early, possibly preliminary set of selected constructors
We understand that this freedom, flexibility, and openness come with a higher initial development cost, but they ultimately allow for substantial co-creation with the community.
Conclusion
[edit]We have implemented, or plan to implement, many of the recommendations the fellows have made regarding security, the function model, the system’s usability, and more. Many of those did not make it into either the evaluation or this answer, as both documents focus on the remaining differences rather than on the agreements.
After consideration, we chose not to implement the recommendations which would considerably curtail the ability of the community to co-create the necessary solutions for Abstract Wikipedia. In particular, we will continue to be a multilingual project, to support an open and extendable set of constructors, and to not pre-determine a single NLG solution.
We also chose not to implement a decoupling of Wikifunctions and Abstract Wikipedia.
We are thankful to the fellows for this opportunity to reach and express a better shared understanding of the goals and values of the project. We understand that their recommendations come from their desire to see Abstract Wikipedia succeed, and we share that desire wholeheartedly.