Abstract Wikipedia/WebAssembly notes

This is a collection of notes about WebAssembly (hereinafter WASM), the WebAssembly System Interface (hereinafter WASI), the current integration of these technologies into Wikifunctions, and a series of thoughts on alternative designs we might explore in future.

WASM, WASI, What and Why

WASM is an assembly-like language. It is supported as a compilation target for several programming languages, including C and Rust. WASM code can be run in most modern browsers. It can also be run as a standalone application via a WASM runtime. There are several WASM runtimes out there; we've tried wasmtime but are now exclusively using wasmedge.

WASI is what allows WASM to interact with an operating system. WASM itself has no built-in access to files, network connections, or other OS facilities; a module can only import functions for such operations, and the host must explicitly hook those imports to the OS's corresponding syscalls. WASI standardizes this by offering selective access to certain syscalls. wasmedge, like most WASM runtimes, has built-in WASI support.
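As a rough illustration of what "selective access" means in practice, here is a minimal Python sketch that builds a wasmedge command line. It assumes the wasmedge CLI's `--dir guest_path:host_path` and `--env NAME=VALUE` options; the `WasiGrant` class, the `wasmedge_argv` helper, and all paths are hypothetical names used only for this example, not part of the Wikifunctions code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WasiGrant:
    """Host resources the guest module is explicitly allowed to use (hypothetical helper)."""
    # Maps a guest-visible path to a host path; directories not listed are not preopened.
    preopened_dirs: Dict[str, str] = field(default_factory=dict)
    # Environment variables to forward to the guest.
    env: Dict[str, str] = field(default_factory=dict)


def wasmedge_argv(module: str, guest_args: List[str], grant: WasiGrant) -> List[str]:
    """Build the argv for running `module` under wasmedge with only the granted access."""
    argv = ["wasmedge"]
    for guest_path, host_path in grant.preopened_dirs.items():
        # Assumed CLI option: --dir guest_path:host_path
        argv += ["--dir", f"{guest_path}:{host_path}"]
    for name, value in grant.env.items():
        # Assumed CLI option: --env NAME=VALUE
        argv += ["--env", f"{name}={value}"]
    argv.append(module)
    argv += guest_args
    return argv
```

With an empty `WasiGrant()`, the resulting command contains no `--dir` options at all, so no host directories are handed to the guest. That is the property that makes this model attractive for running arbitrary code.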

The Wikifunctions executors currently run in WASM runtimes. After discussions with Security and SRE, the Abstract Wikipedia team agreed to use WASM runtimes as a quick and relatively unobtrusive way to address some common attack vectors associated with the running of arbitrary code (which is what the executors are built to do). This is not a perfect solution, but other solutions would have required either potentially unsafe changes from other teams (e.g., changes to Blubber to allow the AW evaluator service to run as root) or large and problematic changes to the evaluator service itself (e.g., making the executor a single long-running service instead of a cheap and ephemeral one, plus attendant changes to coordinate asynchronous calls, implement retry logic, restart the executor process upon failure, etc.). Thus, WASM was agreed upon as a straightforward fix that would comply with security and infrastructural constraints.

Current Status

For both Python and JavaScript, we have settled on a similar pattern. We build the interpreter for each programming language from source. The interpreter is a .wasm binary that can be run using a WASM runtime (in this case, wasmedge). For performance reasons, we compile that binary ahead of time (also using wasmedge) when building the evaluator service image.
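The two stages of this pattern (compile once at image build time, then start a short-lived interpreter process per call) can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes a wasmedge release that provides the `wasmedge compile` subcommand (older releases shipped a separate `wasmedgec` tool), and the helper names and file names are hypothetical rather than taken from the evaluator service.

```python
import subprocess
from typing import List, Sequence


def aot_compile_argv(interpreter_wasm: str, compiled_out: str) -> List[str]:
    """Build-time step: ahead-of-time compile the interpreter once per image.

    Assumes the `wasmedge compile <in> <out>` subcommand is available.
    """
    return ["wasmedge", "compile", interpreter_wasm, compiled_out]


def interpreter_argv(compiled_interpreter: str, script_path: str) -> List[str]:
    """Run-time step: start the compiled interpreter on a single script."""
    # "--dir .:." preopens only the current directory for the guest.
    return ["wasmedge", "--dir", ".:.", compiled_interpreter, script_path]


def run_once(argv: Sequence[str], timeout_s: float = 10.0) -> str:
    """Run one short-lived process and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if the process exceeds the timeout.
    """
    completed = subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return completed.stdout
```

A call such as `run_once(interpreter_argv("rustpython_aot.wasm", "fn.py"))` would then start a fresh process for each evaluation, which matches the cheap and ephemeral executor model described above.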

Interpreters that can be built as WASM binaries are more easily found among the less mainstream implementations of these languages. Thus, for Python, we use RustPython (the standard interpreter is CPython). For JavaScript, we use wasmedge-quickjs, which implements JavaScript using the QuickJS engine (rather than the more common V8).

Alternatives and Ideas for the Future (JavaScript)

quickjs-emscripten

quickjs-emscripten is specifically built for the execution of untrusted code. Its execution model is superficially similar to TensorFlow's computation graph. The QuickJS interpreter runs in an isolated context; data is passed between the main JS process and this context via forward-declaration; then a safe eval() function runs arbitrary computation over that data within the isolated context; then returned values are "unwrapped" from the context and surfaced in the main process.

You can see a past attempt to use this approach here. Unfortunately, this is a relatively obscure library with little support, and its WASM runtime does not have the security profile we'd like (at least not out of the box), so we abandoned this approach.

Javy

Javy works by compiling JavaScript code to a WASM module, which can then be run via wasmedge or any other WASM runtime. This approach could be faster because it would let us compile the executor code directly to WASM, leaving only the community-contributed code to run "slowly" in an eval() statement.

This approach suffers a bit from the perspective of modularity. It presumes that each module will be packaged as a .wit file, but there are heavy restrictions on what can be exported. Most notably, modules cannot export functions that have arguments or return values. However, if speed becomes a concern, we may want to revisit Javy. If .wit imports have not matured by then, we can work around the modularity issues by bundling the modules with rollupjs.
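Because exported functions cannot take arguments or return values, data would have to reach such a module through some other channel. One conceivable option, shown here purely as an assumption and not as something the team has adopted, is to pass a JSON payload on standard input and read a JSON result from standard output. The helper name `call_via_stdio` is hypothetical.

```python
import json
import subprocess
from typing import Any, Sequence


def call_via_stdio(argv: Sequence[str], payload: Any, timeout_s: float = 10.0) -> Any:
    """Send `payload` as JSON on stdin and parse the JSON the process writes to stdout.

    `argv` would be a WASM runtime invocation (for example, wasmedge plus a module);
    any command that reads JSON from stdin and writes JSON to stdout works the same way.
    """
    completed = subprocess.run(
        list(argv),
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return json.loads(completed.stdout)
```

The guest side would need matching code that reads its input and writes its output as JSON, so this only moves the argument-passing problem out of the export signature and into a serialization convention.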

cloudflare/workerd

Cloudflare's workerd was built to solve the same problem that has motivated our adoption of WASM in Wikifunctions: it is a server-side runtime for JavaScript and WASM, which is essentially what Wikifunctions is now implementing. One potential advantage here is that workerd uses the more familiar V8 engine (the other solutions mentioned here use QuickJS). However, workerd is extremely heavyweight, so it seemed too cumbersome to integrate as a first attempt at running JS on WASM.

Additional Resources

This summary of the QuickJS spec and C APIs has been very helpful.