JavaScript engine internals: NaN-boxing

In researching code plugin sandboxing with WebAssembly, my thoughts naturally turned to how to let people actually program for that environment without forcing use of low-level C++ or such… so naturally, now I’m writing a small JavaScript-like runtime engine and compiler that targets WebAssembly. ;)

My explorations so far are themselves written in C++, with tiny JS samples hand-translated into C++ code that calls the runtime’s classes. If I continue the project, I may switch to directly emitting Wasm source from the translator, with an eye towards future Wasm improvements like native garbage-collected reference types. (Right now I have to implement garbage collection myself, which will be the subject of another blog post!)

The first step of implementing a JavaScript engine is to implement a representation for values, which is tricky because JS values can be any of several distinct types:

  • undefined
  • null
  • boolean
  • number (float64)
  • object reference (string, Symbol, Object, etc)

Many other dynamic languages, like PHP and Python, similarly allow a single value to hold any of several types. There are two main ways to implement such a value type.

Tagged unions

The first is to use a “tagged union” struct, something like this:

enum TypeTag {
  TAG_UNDEFINED,
  TAG_INT32,
  TAG_DOUBLE,
  ...
  TAG_OBJECT
};

struct Value {
  TypeTag tag;
  union {
    int32_t int32_val;
    double double_val;
    Object* object_val;
  };
};

Then add accessor/mutator functions which check the tag and perform appropriate accesses or conversions. This is simple, but has the downside of requiring 16 bytes per value (to maintain 8-byte alignment on the double-precision float or a 64-bit pointer or int64). That means most reads require two loads, writes require two stores, and you can’t pass a value in a register reliably. (WebAssembly’s local variables and function arguments are sort of like CPU registers in how they’re addressed, so would be the equivalent.)

To get around these, a common trick in high-performance dynamic language runtimes like LuaJIT and all major JavaScript engines is known as NaN boxing.

NaN boxing the SpiderMonkey way

Floating point numbers are a complex beast! A 64-bit double-precision float can be broken down into several sets of bits:

  • 1 bit for sign
  • 11 bits for exponent part
  • 52 bits for binary fractional part

If the exponent bits are all 1, then you have a special value:

  • if the fractional bits are all 0, then it represents infinity (positive or negative, depending on the sign bit)
  • if any fractional bit is 1, then it represents “not a number” (NaN), used as a special signal that an operation was invalid or a non-value is being represented.
  • and oh by the way, regular operations only ever produce a canonical NaN value with just the topmost fractional bit set.

The interesting part is that means you have an entire number space within the range of representable NaN values that will never be produced by legit math operations on legit numbers!

  • 1 bit for sign (set to 1 for a signalling NaN that won’t be produced by math ops)
  • 11 bits for exponent (all set to 1)
  • 1 bit to force it to NaN rather than Infinity if rest is 0s
  • 51 bits for tag and payload
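To make the layout concrete, here’s a minimal JavaScript sketch (not taken from any engine) that pulls a double apart into the sign, exponent, and fraction fields described above. The helper names doubleToBits and fields are made up for illustration, and it assumes a runtime with BigInt and DataView.getBigUint64.

```javascript
// Reinterpret the 64 bits of a double as an unsigned BigInt.
function doubleToBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);        // big-endian by default...
  return view.getBigUint64(0);  // ...and read back the same way
}

// Split the bit pattern into the three IEEE 754 fields.
function fields(x) {
  const bits = doubleToBits(x);
  return {
    sign: Number(bits >> 63n),                 // 1 bit
    exponent: Number((bits >> 52n) & 0x7ffn),  // 11 bits
    fraction: bits & 0xfffffffffffffn,         // 52 bits
  };
}

// Exponent all 1s + zero fraction is an infinity;
// exponent all 1s + any non-zero fraction is a NaN.
function classify(x) {
  const f = fields(x);
  if (f.exponent !== 0x7ff) return 'number';
  return f.fraction === 0n ? 'infinity' : 'nan';
}
```

For example, classify(-Infinity) reports 'infinity' and classify(0 / 0) reports 'nan', while every ordinary number falls through to 'number'.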

Firefox’s SpiderMonkey engine divides this up into 4 bits of tag (16 types) and 47 bits of address, which is enough for x86_64 (where bit 48 will always be 0 in user space and bits 49-64 aren’t needed … yet) but apparently causes problems on AArch64 and Sparc64. (They work around this by deliberately mapping memory in the lower 47 bits of address space with mmap.)

I’ve chosen to experiment with a similar technique, but only using 3 bits of tag (8 types) to leave a clean 48 bits of address. Though when building to WebAssembly, you only need 32 bits of address so it doesn’t make much difference. ;)

When reading a value of as-yet-unknown type, you mask off the top X bits and compare against some constant values:

  • each type other than double-precision has its own tag value, so you just == for it
  • actual double-precision values can be detected quickly by forcing the sign bit on and then doing a uint64 compare against a cutoff value: any legit double including canonical NaNs will be <= 0b1111111111111000’0000000000000000’0000000000000000’0000000000000000 once sign/signalling bit is set.
  • if there are multiple type tags for pointer types, you can put them at the high end of the tag range and do a uint64 compare against a cutoff value

One downside is that the pointer has to be masked out of the stored value on every object dereference, and typical JavaScript code dereferences a lot of objects. It’s a bit-op so it’s cheap, though… and in WebAssembly you can actually wrap the 64-bit value down to 32-bits without masking, since pointers are 32 bits. :)
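Here’s a rough sketch of the 3-bit-tag, 48-bit-payload variant, using BigInt to stand in for a uint64. The tag numbers and function names are hypothetical, not SpiderMonkey’s or my runtime’s actual values. Tag 0 is deliberately left unused so that a NaN with only the sign bit and top fraction bit set still compares as a double.

```javascript
const SIGN_BIT = 1n << 63n;
// Sign bit + all 11 exponent bits + top fraction bit: the double/boxed cutoff.
const BOX_BASE = 0xfff8000000000000n;
const TAG_SHIFT = 48n;
const PAYLOAD_MASK = (1n << 48n) - 1n;

// Hypothetical tag assignments (1..7 are usable with a 3-bit tag).
const TAG_INT32 = 1n;
const TAG_BOOLEAN = 2n;
const TAG_OBJECT = 7n;

// Pack a tag and a payload of up to 48 bits into a boxed value.
function box(tag, payload) {
  return BOX_BASE | (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK);
}

// Force the sign bit on, then do a single unsigned compare against the cutoff.
function isDouble(bits) {
  return (bits | SIGN_BIT) <= BOX_BASE;
}

function tagOf(bits) {
  return (bits >> TAG_SHIFT) & 7n;
}

// Native 64-bit build: mask off the top 16 bits to recover the pointer.
function payloadOf(bits) {
  return bits & PAYLOAD_MASK;
}

// wasm32 build: truncating to the low 32 bits (like i32.wrap_i64) is enough.
function pointer32Of(bits) {
  return Number(BigInt.asUintN(32, bits));
}
```

With this layout, box(TAG_OBJECT, 0x1234n) fails the isDouble check and hands back 0x1234 from either payloadOf or pointer32Of.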

JavaScriptCore style

WebKit’s JavaScriptCore uses a different technique on 64-bit native architecture.

When storing a double value, the bit pattern is treated as a uint64 and has a constant added to it to “rotate” the NaN patterns such that the top 1 bits in a range of the signaling NaNs turn into 0s.

  • object references have 0x0000 in the top 16 bits, meaning a 48-bit pointer can be read directly with a 64-bit read — no bit-masking penalty on dereference!
  • int32 values have 0xffff in the top 16 bits, making common integer operations easy with a mask
  • true/false/undefined/null are stored as special fake pointer values in the object pointer space
  • most double-precision float values sit in the 0x0001-0xfffe space, and have to subtract the constant to get back to their original bit pattern.
  • -Infinity has to be specially stored in the object space because the shift puts it in the wrong place

Sounds clever!

I haven’t checked what Chrome’s V8 and Edge’s ChakraCore do, but I believe they are similar to one or the other of the above.

Security considerations

The downside of all this magic bit-fiddling is that you have to be careful at API boundaries between the JS world and the external world it’s connected to: all incoming NaN values in doubles must be canonicalized, or it would be possible to create arbitrary object pointer references from within the scripting language, which would be a huge security hole.

So, you know, be careful with that. ;)
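As a sketch of what “being careful” could look like, this hypothetical canonicalizeBits helper takes the raw 64-bit pattern of an incoming double (as a BigInt) and collapses every NaN pattern into one canonical NaN before it’s allowed into the value representation. The constant names are mine; a real engine would do this on native uint64 values.

```javascript
const EXPONENT_MASK = 0x7ff0000000000000n;
const FRACTION_MASK = 0x000fffffffffffffn;
const CANONICAL_NAN = 0x7ff8000000000000n; // only the top fraction bit set

// Any pattern with exponent all 1s and a non-zero fraction is a NaN.
// Replace it with the canonical one so no payload bits sneak through.
function canonicalizeBits(bits) {
  const isNaNPattern =
    (bits & EXPONENT_MASK) === EXPONENT_MASK && (bits & FRACTION_MASK) !== 0n;
  return isNaNPattern ? CANONICAL_NAN : bits;
}
```

An attacker-supplied pattern like 0xfffa000000001234 (which would otherwise read as a tagged pointer) comes out as the plain canonical NaN, while infinities and ordinary numbers pass through untouched.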

 

Thoughts on WebAssembly as a plugin sandbox

I’ve been thinking about ways to run user-supplied untrusted code in the browser, with an eye towards things like interactive demo programs in Wikipedia articles and user interface & editor extensions. Just running JavaScript provided by the user is wildly unsafe — it can dig into your web page’s UI and submit server requests on your user’s behalf without permission, for instance — and sandboxing things into iframes can be a bit funky and hard to fully lock down.

Current web browsers have another language they can run, though, which is WebAssembly — and WebAssembly makes much stricter sandboxing guarantees:

  • Sandboxed code has literally no way to access memory or objects not provided to it
    • no DOM access
    • no network access
    • no eval()
  • Imported objects can only be numbers (read-only) or functions (call-only)
  • Only numbers can be passed as arguments or returned to/from foreign functions
  • Maximum memory usage can be set at compile time
  • Compiled modules can be cached offline in indexedDB and reloaded safely

This means there’s no way an arbitrary wasm binary can access your document.cookie or submit a form, unless you pass in a function that allows that.

This also means that unless you provide an API to the wasm code, it can’t actually do anything other than calculate numbers. It also means that whatever API you do provide becomes a security boundary — you must make sure that you don’t introduce a function that can be exploited contrary to the security guarantees you want to make!
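To illustrate, here’s a tiny hand-assembled module whose only window to the outside is a single imported function. The import name env.log and the export name run are made up for this example; the point is that the module can do nothing with the outside world except call what the host passes in.

```javascript
// Roughly equivalent to this text-format module:
//   (module
//     (import "env" "log" (func $log (param i32)))
//     (func (export "run") (call $log (i32.const 42))))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic, version 1
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76,                   // import module "env"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,                         //   field "log", func, type 0
  0x03, 0x02, 0x01, 0x01,                                     // one local function, type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,       // export "run" = func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);

// The host decides exactly what the sandboxed code may call.
const received = [];
const hostImports = {
  env: {
    log: (n) => { received.push(n); },
  },
};

const wasmModule = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(wasmModule, hostImports);
instance.exports.run(); // the module calls back into env.log with 42
```

If the host leaves env.log out of the imports object, instantiation fails outright; the module has no other way to reach it.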

There are also a couple things that wasm by itself doesn’t solve:

  • The halting problem — like JS code, wasm code can loop forever if it wants and there’s no way for the caller to interrupt it.

But wait — you can run a wasm binary in a Web Worker thread. And you can terminate Web Worker threads from the main thread! Depending on your API, if asynchronous messaging around the calls is workable this might be a good way to avoid hanging the main thread in the face of a misbehaving plugin.
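A minimal sketch of that idea, assuming a worker that answers each request with exactly one message. The callWithTimeout name and the one-reply protocol are assumptions of this example, not an existing API; only postMessage, onmessage, and terminate come from the Worker interface.

```javascript
// Send one request to a worker and wait for its reply, but give up and
// kill the worker if it takes longer than timeoutMs.
function callWithTimeout(worker, message, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      worker.terminate(); // stops the thread even mid-loop
      reject(new Error('plugin timed out'));
    }, timeoutMs);

    worker.onmessage = (event) => {
      clearTimeout(timer);
      resolve(event.data);
    };

    worker.postMessage(message);
  });
}

// In a browser this might be used like so ('plugin-host.js' is hypothetical):
//   const worker = new Worker('plugin-host.js');
//   callWithTimeout(worker, { op: 'render' }, 2000).then(show, reportFailure);
```

After a timeout the worker is gone, so the caller would need to spin up a fresh one (and re-instantiate the wasm module) before trying again.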

So what’s the downside? Well, a few things to consider:

  • wasm APIs must be wrapped to idiomatically transfer strings or data buffers with your main JS.
  • Getting the emscripten compiler to produce “bare wasm” from C/C++ without assuming some JS imports are available seems tricky.
  • C and C++ may not be the friendliest languages for plugin writers on text/GUI-heavy systems!
  • Note rustc also has wasm compilation support, but the same concern applies.
  • Any other language runtimes/libraries you use have to be built into the wasm binary… unless you rig up some kind of cross-module linking and ship multiple modules which link together, which seems super hard right now.

Something to consider for later. :D

 

emscripten fun: porting libffi to WebAssembly part 1

I have a silly dream of seeing graphical Linux/FOSS programs running portably on any browser-based system outside of app store constraints. Two of my weekend side projects are working in this direction: an x86 emulator core to load and run ELF binaries, and better emscripten cross-compilation support for the GTK+ stack.

Emulation ideas

An x86 emulator written in WebAssembly could run pre-built Linux binaries, meaning in theory you could make an automated packager for anything in a popular software repository.

But even if all the hard work of making a process-level emulator work, and of hooking up the Linux-level libraries to emulated devices for i/o and rendering, gets done, there are some big performance implications, and you’re probably also bundling lots of library code you don’t need at runtime.

Instruction decoding and dispatch will be slow, much slower than native. And it looks pretty complex to do JIT-ing of traces. While I think it could be made to work in principle, I don’t think it’ll ever give a satisfactory user experience.

Cross-compilation

Since we’ve got the source of Linux/FOSS programs by definition, cross-compiling them directly to WebAssembly will give far better performance!

In theory even something from the GNOME stack would work, given an emscripten-specific gdk backend rendering to a WebGL canvas just like games use SDL2 or similar to wrap i/o.

But long before we can get to that, there are low-level library dependencies.

Let’s start with glib, which implements a bunch of runtime functions and the GObject type system, used throughout the stack.

Glib needs libffi, a library for calling functions with at-runtime call signatures and creating closure functions which enclose a state variable around a callback.

In other words, libffi needs to do things that you cannot do in standard C, because it needs system-specific information about how function arguments and return values are transferred (part of the ABI, application binary interface). And to top it off, in many cases (emscripten included) you still can’t do it in C, because asm.js and WebAssembly provide no way to make a call with an arbitrary argument list. So, like a new binary platform, libffi must be ported…

It seems to be doable by bumping up to JavaScript, where you can construct an array of mixed-type arguments and use Function.prototype.apply to call the target function. Using an EM_ASM_ block in my shiny new wasm32/ffi.c I was able to write a JavaScript implementation of the guts of ffi_call which works for int, float, and double parameters (still have to implement 64-bit ints and structs).
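The core trick, stripped of all the memory-reading details, looks something like this. It’s a simplified stand-in rather than the actual code in my ffi.c: the ffiCall name and the string type codes are invented here, and the real thing has to pull argument values out of the emscripten heap instead of receiving them in a JS array.

```javascript
// Coerce each raw argument according to its declared C type, then make the
// call with a dynamically built argument list via Function.prototype.apply.
function ffiCall(fn, argTypes, rawArgs) {
  const args = rawArgs.map((value, i) => {
    switch (argTypes[i]) {
      case 'int':    return value | 0;           // truncate to int32
      case 'float':  return Math.fround(value);  // round to float32
      case 'double': return +value;
      default:
        throw new Error('unsupported argument type: ' + argTypes[i]);
    }
  });
  return fn.apply(null, args);
}
```

So ffiCall(target, ['int', 'float', 'double'], [3.9, 0.5, 0.25]) calls target(3, 0.5, 0.25), and anything not yet handled (like a 64-bit int) fails loudly instead of being passed through wrong.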

The second part of libffi is the closure creation API, which I think requires creating a closure function on the JavaScript side, inserting it into the module’s function tables, and then returning the appropriate index as its address. This should be doable, but I haven’t started yet.

Emscripten function pointers

There are two output targets for the emscripten compiler: asm.js JavaScript and WebAssembly. They have similar capabilities and the wrapper JS is much the same in both, but there are some differences in implementation and internals as well as the code format.

One is in function tables for indirect calls. In both cases, the low-level asm.js/WASM code can’t store the actual pointer address of a function, so they use an index into a table of functions. Any function whose address is taken at compile time is added to the table, and its index used as the address. Then, when an indirect call through a “function pointer” is made, the pointer is used as the index into the function table, and an actual function call is made on it. Voila!

In asm.js, there are lots of Weird Things done about type consistency to make the JavaScript compilers fast. One is that the JS compiler gets confused if you have an array of functions that _contain different signatures_, making indirect calls run through slower paths in final compiled code. So for each distinct function signature (“returns void” or “returns int, called with float32” etc) there is a separate array. This also means that function pointers have meaning only with their exact function signature: if you take a pointer to a function with a parameter and call it without one, it could end up calling an entirely different function at runtime because that index has a different meaning in that signature!

In WebAssembly, this is handled differently. Signatures are encoded at the call site in the call_indirect opcode, so no type inference needs to be done.

But.

At least currently, the asm.js-style table separation is still being used, with the multiple tables encoded into the single WebAssembly table with a compile-time-known constant offset.

In both cases, the JS side can do indirect calls by calculating the signature string (“vf” for “void return, float32 arg” etc) and calling the appropriate “dynCall_vf” etc method, passing first the pointer and then the rest of the argument list. On asm.js this will look up in the tables directly; on WASM it’ll apply the index.
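Here’s a toy simulation of the per-signature tables and a dynCall-style dispatcher. It’s not emscripten’s generated code; the tables object and the single generic dynCall function are simplifications invented for illustration (emscripten exposes one dynCall_ method per signature).

```javascript
const callLog = [];

// One table per signature string; index 0 is left empty as a null pointer.
const tables = {
  vf: [null, (x) => { callLog.push('vf got ' + x); }], // void f(float)
  ii: [null, (n) => n * 2],                            // int f(int)
};

// Look up "function pointer" ptr in the table for this signature and call it.
function dynCall(sig, ptr, ...args) {
  const table = tables[sig];
  if (!table || typeof table[ptr] !== 'function') {
    throw new Error('bad function pointer ' + ptr + ' for signature ' + sig);
  }
  return table[ptr](...args);
}
```

Note that pointer value 1 names a different function in each table, which is exactly why calling through the wrong signature can land you somewhere unexpected.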

(etc)

It’s possible that emscripten will change the WASM mode to use a single array without the constant offset indirection. This will simplify lookups, and I think make it easier to add more functions at runtime.

Because you see, if you want to add a callback at runtime, like libffi’s closure API wants to, then you need to add another entry to that table. And in asm.js the table sizes are fixed by asm.js validation rules, and in the current WASM mode the sub-tables are definitely fixed at compile time, since those constant offsets are used throughout.

So currently there’s an option you can use at build time to reserve room for runtime function pointers. I think I’ll have to use it, but that only reserves a *fixed space* for a given number of pointers.
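The constraint can be modeled like this: a table whose size is decided up front, with a few empty slots at the end that can be filled in at runtime. This is only a model of the idea; createFunctionTable and its methods are hypothetical names, not emscripten’s API.

```javascript
// Build a fixed-size table: compile-time functions first, then a fixed
// number of empty slots reserved for functions added at runtime.
function createFunctionTable(compiledFunctions, reservedSlots) {
  const slots = compiledFunctions.concat(new Array(reservedSlots).fill(null));
  const firstReserved = compiledFunctions.length;
  return {
    // Returns the new "function pointer", or throws once the space runs out.
    addFunction(fn) {
      const index = slots.indexOf(null, firstReserved);
      if (index === -1) {
        throw new Error('no reserved function pointer slots left');
      }
      slots[index] = fn;
      return index;
    },
    call(ptr, ...args) {
      return slots[ptr](...args);
    },
  };
}
```

A closure API built on this can only ever hand out as many live closures as slots were reserved at build time, so slots would have to be recycled when closures are freed.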

Next

Coming up next time: int64 and struct in the emscripten ABI, and does the closure API work as expected?

String concatenation garbage collection madness!

We got a report of a bug with the new 3D model viewing extension on Wikimedia Commons, where a particular file wasn’t rendering a thumbnail due to an out-of-memory condition. The file was kind-of big (73 MiB) but not super huge, and should have been well within the memory limits of the renderer process.

On investigation, it turned out to be a problem with how three.js’s STLLoader class was parsing the ASCII variant of the file format:

  • First, the file is loaded as a binary ArrayBuffer
  • Then, the buffer is checked to see whether it contains binary or text-format data
  • If it’s text, the entire buffer is converted to a string for further processing

That conversion step had code that looked roughly like this:

function naive_append(arr) {
    var str = '';
    for (var i = 0; i < arr.length; i++) {
        // each += allocates a brand new, slightly longer string
        str += String.fromCharCode(arr[i]);
    }
    return str;
}

Pretty straightforward code, right? Appends one character to the string until the input binary array is out, then returns it.

Well, JavaScript strings are actually immutable; the “+=” operator is just shorthand for “str = str + …”. This means that on every step through the loop, we create two new strings: one for the character, and a second for the concatenation of the previous string with the new character.

The JavaScript virtual machine’s automatic garbage collection is supposed to magically de-allocate the intermediate strings once they’re no longer referenced (at some point after the next run through the loop) but for some reason this isn’t happening in Node.js. So when we run through this loop 70-some million times, we get a LOT of intermediate strings still in memory and eventually the VM just dies.

Remember this is before any of the 3d processing — we’re just copying bytes from a binary array to a string, and killing the VM with that. What!?

Newer versions of the STLLoader use a more efficient path through the browser’s TextDecoder API, which we can polyfill in node using Buffer, making it blazing fast and memory-efficient… this seems to fix the thumbnailing for this file in my local testing.
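That path boils down to something like the following sketch (not three.js’s exact code). The bytesToString name is mine, and it assumes the input is a Uint8Array view of the loaded file.

```javascript
// Convert a Uint8Array of text bytes to a string in one native call.
function bytesToString(bytes) {
  if (typeof TextDecoder !== 'undefined') {
    return new TextDecoder('utf-8').decode(bytes);
  }
  // Fallback for Node versions without a global TextDecoder:
  // wrap the same memory in a Buffer (no copy) and decode it.
  return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength)
    .toString('utf8');
}
```

Either branch builds the result string once instead of creating millions of intermediate strings.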

Just for fun though I thought, what would it take to get it working in Node or Chrome without the fancy native API helpers? Turns out you can significantly reduce the memory usage of this conversion just by switching the order of operations….

The original append code results in operations like: (((a + b) + c) + d) which increases the size of the left operand linearly as we go along.

If we instead do it like ((a + b) + (c + d)) we’ll increase _both_ sides more slowly, leading to much smaller intermediate strings clogging up the heap.

Something like this, with a sort of binary bisection thingy:

function do_clever(arr, start, end) {
    if (start === end) {
        return '';
    } else if (start + 1 === end) {
        return String.fromCharCode(arr[start]);
    } else {
        var mid = start + Math.floor((end - start) / 2);
        return do_clever(arr, start, mid) +
               do_clever(arr, mid, end);
    }
}

function clever_append(arr) {
    return do_clever(arr, 0, arr.length);
}

Compared to the naive linear append, I’m able to run through the 73 MiB file in Node, and it’s a bit faster too.

But it turns out there’s not much reason to use that code — most browsers have native TextDecoder (even faster) and Node can fake it with another native API, and those that don’t are Edge and IE, which have a special optimization for appending to strings.

Yes that’s right, Edge 16 and IE 11 actually handle the linear append case significantly faster than the clever version! It’s still not _fast_, with a noticeable delay of a couple seconds on IE especially, but it works.

So once the thumbnail fix goes live, that file should work both in the Node thumbnailer service *and* in browsers with native TextDecoder *and* in Edge and IE 11. Yay!

ogv.js 1.5.7 released

I’ve released ogv.js 1.5.7 with performance boosts for VP8/VP9 video (especially on IE 11) and Opus audio decoding, and a fix for audio/webm seeking.

npm: https://www.npmjs.com/package/ogv
zip: https://github.com/brion/ogv.js/releases/tag/1.5.7

emscripten versus IE 11: arithmetic optimization for ogv.js

ogv.js is a web video & audio playback engine for supporting the free & open Ogg and WebM formats in browsers that don’t support them natively, such as Safari, Edge, and Internet Explorer. We use it at Wikipedia and Wikimedia Commons, where we don’t currently allow the more common MP4 family of file formats due to patent concerns.

IE 11, that old nemesis, still isn’t quite gone, and it’s definitely the hardest to support. It’s several years old now, with all new improvements going only into Edge on Windows 10… so no WebAssembly, no asm.js optimizations, and in general it’s just kind of ….. vveerryy ssllooww compared to any more current browser.

But for ogv.js I still want to support it as much as possible. I found that for WebM videos using the VP8 or VP9 codecs, there was a *huge* slowdown in IE compared to other browsers, and wanted to see if I could pick off some low-hanging fruit to at least reduce the gap a bit and improve playback for devices right on the edge of running smoothly at low resolutions…

Profiling in IE is a bit tough since the dev tools often skew JS performance in weird directions… but it consistently showed that a large bottleneck was the Math.imul polyfill.

Math.imul, on supporting browsers, is a native function that implements 32-bit integer multiplication correctly and very very quickly, including all the weird overflow conditions that can result from multiplying large numbers — this is used in the asm.js code produced by the emscripten compiler to make sure that multiplication is both fast and correct.

But on IE 11 it’s not present, so a replacement function (“polyfill”) is used by emscripten instead. This does several bit shifts, a couple multiplications, blah blah details, anyway even when the JIT compiler inlines the function, it’s slower than necessary.
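For reference, the widely circulated polyfill has roughly this shape (I’m assuming emscripten’s was similar; this is the commonly published version, not a copy of emscripten’s source). The imulPolyfill name is just for this example.

```javascript
// Split each operand into 16-bit halves so no partial product exceeds
// what a double can hold exactly, then recombine with int32 wraparound.
function imulPolyfill(a, b) {
  var ah = (a >>> 16) & 0xffff;
  var al = a & 0xffff;
  var bh = (b >>> 16) & 0xffff;
  var bl = b & 0xffff;
  // The high*high term is dropped: it only affects bits above bit 31.
  return ((al * bl) + (((ah * bl + al * bh) << 16) >>> 0)) | 0;
}

// The "direct multiplication" alternative: one multiply, one coercion.
function directMultiply(a, b) {
  return (a * b) | 0;
}
```

The direct form only matches when the exact product fits in a double’s 53 bits of precision. For big operands like 0x7fffffff times 0x7fffffff the product is rounded before truncation and the low bits come out different, so it’s a trade of correctness on large values for speed.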

I hacked together a quick test to search the generated asm.js code for calls to the minified reference to Math.imul, and replace them with direct multiplication… and found significant performance improvements!

I also found it broke some of the multiplications by using the wrong order of operations, though. So I replaced it with a corrected transformation that, instead of running a regex over the code, uses a proper JS parser, walks the tree for call sites, and replaces them with direct multiplication… After some more confusion with my benchmarking, I confirmed that the updated code was still faster.

This is about a 15-20% improvement, plus or minus, which seems a pretty significant bump!

Of course more modern browsers like current versions of Safari and Edge will use the WebAssembly version of ogv.js anyway, and are several times faster…

 

2009 Workstation recovery – libvpx benchmark

I’m slowly upgrading my old 2009-era workstation into the modern world. It’s kind of a fun project! Although the CPUs are a bit long in the tooth, with 8 cores total it still can run some tasks faster than my more modern MacBook Pro with just 2 cores.

Enjoy some notes from benchmarking VP9 video encoding with libvpx and ffmpeg on the workstation versus the laptop…

(end)

Light field rendering: a quick diagram

Been reading about light field rendering for free movement around 3d scenes captured as images. Made some diagrams to help it make sense to me, as one does…
 
There’s some work going on to add viewing of 3d models to Wikipedia & Wikimedia Commons — which is pretty rad! — but geometric meshes are hard to home-scan, and don’t represent texture and lighting well, whereas light fields capture this stuff fantastically. Might be interesting some day to play with a light field scene viewer that provides parallax as you rotate your phone, or provides a 3d see-through window in VR views. The real question is whether the _scanning_ of scenes can be ‘democratized’ using a phone or tablet as a moving camera, combined with the spatial tracking that’s used for AR games to position the captures in 3d space…
 
Someday! No time for more than idle research right now. ;)
Here, enjoy the diagrams.

Brain dump: JavaScript sandboxing

Another thing I’ve been researching is safe, sandboxed embedding of user-created JavaScript widgets… my last attempt in this direction was the EmbedScript extension (examples currently down, but code is still around).

User-level problems to solve:

  • “Content”
    • Diagrams, graphs, and maps would be more fun and educational if you could manipulate them more
    • What if you could graph those equations on all those math & physics articles?
  • Interactive programming sandboxes
  • Customizations to editor & reading UI features
    • Gadgets, site JS, shared user JS are potentially dangerous right now, requiring either admin review or review-it-yourself
    • Narrower interfaces and APIs could allow for easier sharing of tools that don’t require full script access to the root UI
  • Make scriptable extensions safer
    • Use same techniques to isolate scripts used for existing video, graphs/maps, etc?
    • Frame-based tool embedding + data injection could make export of rich interactive stuff as easy as InstantCommons…

Low-level problems to solve

  • Isolating user-provided script from main web context
  • Isolating user-provided script from outside world
    • loading off-site resources is a security issue
    • want to ensure that wiki resources are self-contained and won’t break if off-site dependencies change or are unavailable
  • Providing a consistent execution environment
    • browsers shift and change over time…
  • Communicating between safe and sandboxed environments
    • injecting parameters in safely?
    • two-way comms for allowing privileged operations like navigating page?
    • two-way comms for gadget/extension-like behavior?
    • how to arrange things like fullscreen zoom?
  • Potential offline issues
    • offline cacheability in browser?
    • how to use in Wikipedia mobile apps?
  • Third-party site issues
    • making our scripts usable on third-party wikis like InstantCommons
    • making it easy for third-party wikis to use these techniques internally

Meta-level problems to solve

  • How & how much to review code before letting it loose?
  • What new problems do we create in misuse/abuse vectors?

Isolating user-provided scripts

One way to isolate user-provided scripts is to run them in an interpreter! This is potentially very slow, but allows for all kinds of extra tricks.

JS-Interpreter

I stumbled on JS-Interpreter, used sometimes with the Blockly project to step through code generated from visual blocks. JS-Interpreter implements a rough ES5 interpreter in native JS; it’s quite a bit slower than native (though some speedups are possible; the author and I have made some recent tweaks improving the interpreter loop) but is interesting because it allows single-stepping the interpreter, which opens up to a potential for an in-browser debugger. The project is under active development and could use a good regression test suite, if anyone wants to send some PRs. :)

The interpreter is also fairly small, weighing in around 24kb minified and gzipped.

The single-stepping interpreter design protects against infinite loops, as you can implement your own time limit around the step loop.
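A time-limited step loop could look like this sketch. It works with any object that has a step() method returning true while there’s more work to do, which is how I’m assuming the interpreter is driven here; runWithTimeLimit and the result shape are invented for the example.

```javascript
// Drive a single-stepping interpreter until it finishes or the time
// budget runs out, whichever comes first.
function runWithTimeLimit(stepper, limitMs) {
  const deadline = Date.now() + limitMs;
  let steps = 0;
  while (stepper.step()) {
    steps++;
    // Checking the clock on every step would dominate the cost,
    // so only look once every 1024 steps.
    if ((steps & 1023) === 0 && Date.now() > deadline) {
      return { finished: false, steps: steps };
    }
  }
  return { finished: true, steps: steps };
}
```

On the main thread you’d probably want to run short slices like this from a timer callback and resume later, rather than abandoning the program at the first deadline.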

For pure-computation exercises and interactive prompts this might be really awesome, but the limited performance and lack of any built-in graphical display means it’s probably not great for hooking it up to an SVG to make it interactive. (Any APIs you add are your own responsibility, and security might be a concern for API design that does anything sensitive.)

Caja

An old project that’s still around is Google Caja, a heavyweight solution for embedding foreign HTML+JS using a server-side Java-based transpiler for the JS and JavaScript-side proxy objects that let you manipulate a subset of the DOM safely.

There are a number of security advisories in Caja’s history; some of them are transpiler failures which allow sandboxed code to directly access the raw DOM, others are failures in injected APIs that allow sandboxed code to directly access the raw DOM. Either way, it’s not something I’d want to inject directly into my main environment.

There’s no protection against loops or simple resource usage like exhausting memory.

Iframe isolation and CSP

I’ve looked at using cross-origin <iframe>s to isolate user code for some time, but was never quite happy with the results. Yes, the “same-origin policy” of HTML/JS means your code running in a cross-origin frame can’t touch your main site’s code or data, but that code is still able to load images, scripts, and other resources from other sites. That creates problems ranging from easy spamming to user information disclosure to simply breaking if required offsite resources change or disappear.

Content-Security-Policy to the rescue! Modern browsers can lock down things like network access using CSP directives on the iframe page.

CSP’s restrictions on loading resources still leave an information disclosure hole in navigation: links or document.location can be used to navigate the frame to a URL on a third domain. This can be locked down with CSP’s child-src directive on the parent document (or on an intermediate “double” iframe) to only allow the desired target domain (say, “*.wikipedia-embed.org” or even “item12345678.wikipedia-embed.org”). Then attempts to navigate the frame to a different domain from the inside are blocked.
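As a sketch, the two policies might be assembled like this. The directive names are real CSP directives, but the specific source lists are illustrative guesses rather than a vetted policy, and buildCsp is just a helper made up here. Browsers following the newer CSP spec consult frame-src for frames and fall back to child-src, so a real deployment would likely set both.

```javascript
// Turn { directive: [sources...] } into a Content-Security-Policy value.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => name + ' ' + sources.join(' '))
    .join('; ');
}

// Served with the sandboxed frame's own page: nothing loads from off-site.
const framePolicy = buildCsp({
  'default-src': ["'none'"],
  'script-src': ["'self'"],
  'style-src': ["'self'"],
  'img-src': ["'self'", 'data:'],
});

// Served with the parent (or intermediate) document: frames may only be
// navigated within the embed domain.
const parentPolicy = buildCsp({
  'child-src': ['https://*.wikipedia-embed.org'],
});
```

Each string would go out as the value of a Content-Security-Policy response header on the corresponding page.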

So in principle we can have a rectangular region of the page with its own isolated HTML or SVG user interface, with its own isolated JavaScript running its own private DOM, with only the ability to access data and resources granted to it by being hosted on its private domain.

Further interactivity with the host page can be created by building on the postMessage API, including injecting additional resources or data sets. Note that postMessage is asynchronous, so you’re limited in simulating function calls to the host environment.
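One way to fake function calls over an asynchronous channel is to tag each request with an id and match replies to callbacks. Here’s a transport-agnostic sketch; createCaller, the message shape, and the method names are all assumptions of this example rather than an existing protocol.

```javascript
// send(msg) is whatever delivers a message to the other side.
function createCaller(send) {
  let nextId = 1;
  const pending = new Map();
  return {
    // Fire off a request; onResult runs later, when the reply arrives.
    call(method, args, onResult) {
      const id = nextId++;
      pending.set(id, onResult);
      send({ id: id, method: method, args: args });
    },
    // Feed incoming replies here. Returns false for unknown ids.
    handleReply(data) {
      const onResult = pending.get(data.id);
      if (!onResult) return false;
      pending.delete(data.id);
      onResult(data.result);
      return true;
    },
  };
}

// In a browser, wiring it to an iframe might look like this
// (hostOrigin and frame are hypothetical):
//   const caller = createCaller((m) => frame.contentWindow.postMessage(m, hostOrigin));
//   window.addEventListener('message', (e) => {
//     if (e.origin === hostOrigin) caller.handleReply(e.data);
//   });
```

Checking the event origin before handling a reply matters here, since any window holding a reference to yours can post messages at it.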

There is one big remaining security issue, which is that JS in an iframe can still block the UI for the whole page (or consume memory and other resources), either accidentally with an infinite loop or on purpose. The browser will eventually time out from a long loop and give you the chance to kill it, but it’s not pleasant (and might just be followed by another super-long loop!)

This means denial of service attacks against readers and editors are possible. “Autoplay” of unreviewed embedded widgets is still a bad idea for this reason.

Additionally, older browser versions don’t always support CSP (IE is a mess, for instance). So defenses against cross-origin loads either need to somehow prevent loading in older browsers (poorer compatibility) or risk the information exposure (poorer security). However, the most popular browsers do enforce it, so widget authors are unlikely to build applications that rely on off-site materials just to function, and preventing that dependence is one of our goals.

Worker isolation

There’s one more trick, just for fun, which is to run the isolated code in a Web Worker background thread. This would still allow resource consumption but would prevent infinite loops from blocking the parent page.

However you’re back to the interpreter’s problem of having no DOM or user interface, and must build a UI proxy of some kind.

Additionally, there are complications with running Workers in iframes, which is that if you apply sandbox=allow-scripts you may not be able to load JS into a Worker at all.

Non-JavaScript languages

Note that if you can run JavaScript, you can run just about anything thanks to emscripten. ;) A cross-compiled Lua interpreter weighs in around 150-180kb gzipped (depending on library inclusion).

Big chart

Here, have a big chart I made for reference:

Offline considerations

In principle the embedding sites can be offline-cached… bears consideration.

App considerations

The iframes could be loaded in a webview in apps, though consider the offline + app issues!

Data model

A widget (or whatever you call it) would have one or more sub resources, like a Gadget does today plus more:

  • HTML or SVG backing document
  • JS/CSS module(s), probably with a dependency-loading system
  • possibly registration for images and other resources?
    • depending on implementation it may be necessary to inject images as blobs or some weird thing
  • for non-content stuff, some kind of registry for menu/tab setup, trigger events, etc

Widgets likely should be instantiable with input parameters like templates and Lua modules are; this would be useful for things like reusing common code with different input data, like showing a physics demo with different constant values.

There should be a human-manageable UI for editing and testing these things. :) See jsfiddle etc for prior art.

How to build the iframe target site

Possibilities:

  • Subdomain per instance
    • actually serve out the target resources on a second domain, each ‘widget instance’ living in a separate random subdomain ideally for best isolation
    • base HTML or SVG can load even if no JS. Is that good or bad, if interactivity was the goal?
    • If browser has no CSP support, the base HTML/CSS/JS might violate constraints.
    • can right-click and open frame in new window
    • …but now you have another out-of-context view of data, with new URLs. Consider legal, copyright, fair use, blah blah
    • have to maintain and run that second domain and hook it up to your main wiki
    • how to deal with per-instance data input? Pre-publish? postMessage just that in?
      • injecting data over postMessage maybe best for the InstantCommons-style scenario, since sites can use our scripts but inject data
    • probably easier debugging based on URLs
  • Subdomain per service provider, inject resources and instance data
    • Inject all HTML/SVG/JS/CSS at runtime via postMessage (trusting the parent site origin). Images/media could either be injected as blobs or whitelisted by URL.
    • The service provider could potentially be just a static HTML file served with certain strict CSP headers.
    • If injecting all resources, then could use a common provider for third-party wikis.
      • third-party wikis could host their own scripts with this technique, going through our frame broker. Not sure if this is a good idea or not!
    • No separate content files to host, nothing to take down in case of legal issues.
    • Downside: right-clicking a frame to open it in new window won’t give useful resources. Possible workarounds with providing a link-back in a location hash.
    • Script can check against a user-agent blacklist before offering to load stuff.
    • Downside: CSP header may need to be ‘loose’ to allow script injection, so could open you back up to XSS on parameters. But you’re not able to access outside the frame so pssssh!
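For the inject-everything option, the static provider page needs a message handler that only trusts the parent site. Here’s a minimal sketch, assuming a made-up message shape ({ type: "inject-resources", html, css, js }) and a placeholder origin https://wiki.example; the only real API relied on is the origin and data fields of the message event.

```javascript
// Hypothetical handler factory for the static "service provider" frame.
// `trustedOrigin` and the message shape are assumptions for illustration.
function makeInjectionHandler(trustedOrigin, applyResources) {
  return function onMessage(event) {
    // Ignore anything that was not sent by the embedding wiki.
    if (event.origin !== trustedOrigin) return false;
    const msg = event.data;
    if (!msg || msg.type !== "inject-resources") return false;
    // Coerce to strings so the renderer never sees unexpected types.
    applyResources({
      html: String(msg.html ?? ""),
      css: String(msg.css ?? ""),
      js: String(msg.js ?? ""),
    });
    return true;
  };
}

// In the frame, registration would look roughly like:
//   window.addEventListener("message",
//     makeInjectionHandler("https://wiki.example", renderWidget));
```

The origin check is what makes “trusting the parent site origin” mean something; without it, any page that manages to embed the frame could inject its own script.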

Abuse and evil possibilities

Even with the security guarantees of origin restrictions and CSP, there are new and exciting threat models…

Simple denial of service is easy: looping scripts in an iframe can lock up the main UI thread for the tab (or the whole browser, depending on the browser) until the script eventually times out with an error, at which point it can potentially go right back into a loop. Or you can allocate tons of memory, slowing down and eventually perhaps crashing the browser. Even tiny programs can have a huge performance impact, and it’s hard to predict what will be problematic. Thus a script on a page could make it hard for other editors and admins to get back in to fix the page… For this reason I would recommend against autoplay of arbitrary unreviewed code in Wikipedia articles.

There are also possible trolling patterns: hide a shock image in a data set or inside a seemingly safe image file, then display it in a scriptable widget, bypassing existing image review.

Advanced widgets could do all kinds of fun and educational things like run emulators for old computer and game systems. That brings with it the potential for copyright issues with the software being run, or for newer systems patent issues with the system being emulated.

For that matter you could run programs that are covered under software patents, such as decoding or encoding certain video file formats. I guess you could try that in Lua modules too, but JS would allow you to play or save result files to disk directly from the browser.

WP:BEANS may apply to further thoughts on this road, beware. ;)

Ideas from Jupyter: frontend/backend separation

Going back to Jupyter/IPython as an inspiration source: Jupyter has a separation between a frontend that takes interactive input and displays output, and a backend kernel which runs the actual computation server-side. To make for fancier interactive displays, the output can have a widget which runs some sort of JavaScript component in the frontend notebook page’s environment, and can interact with the user (via HTML controls), with other widgets (via declared linkages) and with the kernel code (via events).

We could use a model like this, which distinguishes trusted (or semi-trusted) frontend widget code from untrusted backend code. Frontend widget code can do anything its iframe allows, but must be either pre-reviewed or opted into. Frontend widgets that pass review should have well-understood behavior, good documentation, stable interfaces for injecting data, etc.

The frontend widget can and should still be origin-isolated & CSP-restricted for policy enforcement even if code is reviewed — defense in depth is important!

Such widgets could either be invoked from a template or Lua module with a fixed data set, or could be connected to untrusted backend code running in an even more restricted sandbox.

The two main ‘more restricted sandbox’ possibilities are to run an interpreter that handles loops safely and applies resource limits, or to run in a worker thread that doesn’t block the main UI and can be terminated after a timeout… but even that may be able to exhaust system resources via memory allocation.
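To make the interpreter option concrete, here’s a toy sketch of a loop that “handles loops safely” by counting steps. The instruction set (push, add, jump, halt) and the runWithBudget name are invented for illustration; a real Lua or JS interpreter would hook its own dispatch loop the same way.

```javascript
// Toy stack machine (hypothetical instruction set):
//   ["push", n]  ["add"]  ["jump", target]  ["halt"]
function runWithBudget(program, maxSteps) {
  const stack = [];
  let pc = 0;     // index of the next instruction
  let steps = 0;  // instructions executed so far
  while (pc < program.length) {
    // The budget check runs before every instruction, so no loop can escape it.
    if (++steps > maxSteps) {
      return { ok: false, reason: "step budget exceeded", steps: maxSteps };
    }
    const [op, arg] = program[pc];
    switch (op) {
      case "push": stack.push(arg); pc++; break;
      case "add": stack.push(stack.pop() + stack.pop()); pc++; break;
      case "jump": pc = arg; break;
      case "halt": return { ok: true, result: stack.pop(), steps };
      default: throw new Error(`unknown op: ${op}`);
    }
  }
  return { ok: true, result: stack.pop(), steps };
}

const sum = runWithBudget([["push", 2], ["push", 3], ["add"], ["halt"]], 100);
// sum.ok is true and sum.result is 5
const spin = runWithBudget([["jump", 0]], 1000);
// spin.ok is false: the infinite loop is cut off after 1000 steps
```

A cap on stack depth or allocated bytes could be enforced in the same place, which is the part a plain Worker timeout can’t give you.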

I think it would be very interesting to extend Jupyter in two specific ways:

  • iframe-sandboxing the widget implementations to make loading foreign-defined widgets safer
  • implementing a client-side kernel that runs JS or Lua code in an interpreter, or JS in a sandboxed Worker, instead of maintaining a server connection to a Python/etc kernel

It might actually be interesting to adopt, or at least learn from, the communication & linkage model for the Jupyter widgets (which is backbone.js-based, I believe) and consider the possibilities for declarative linkage of widgets to create controllable diagrams/visualizations from common parts.

An interpreter-based Jupyter/IPython kernel that works with the notebooks model could be interesting for code examples on Wikipedia, Wikibooks etc. Math potential as well.

Short-term takeaways

  • Interpreters look useful in niche areas, but native JS in iframe+CSP is probably the main target for interactive things.
  • “Content widgets” imply new abuse vectors & thus review mechanisms. Consider short-term concentration on other areas of use:
    • sandboxing big JS libraries already used in things like Maps/Graphs/TimedMediaHandler that have to handle user-provided input
    • opt-in Gadget/user-script tools that can adapt to a “plugin”-like model
    • making those things invocable cross-wiki, including to third-party sites
  • Start a conversation about content widgets.
    • Consider starting with strict-review-required.
    • Get someone to make the next generation ‘Graphs’ or whatever cool tool as one of these instead of a raw MW extension…?
    • …slowly plan world domination.