JavaScript engine internals part 5: conclusions

Thoughts from the last couple weeks of prototyping a JavaScript-like scripting language engine core…

Scripting inside WebAssembly

A JavaScript-like garbage-collected dynamically-typed scripting engine runtime can indeed be implemented in WebAssembly, with relatively sensible performance for ahead-of-time compiled code. (C++/emscripten-based prototype showing within 3-4x of native JS in node for a floating-point math test without too much micro-optimization yet.)

Garbage collection keepalive requires explicit handling of a GC-scanable stack. There’s no way to scan the native stack, as not everything spills from locals to the emscripten-managed stack in linear memory.
Tricks like NaN boxing work for squishing variant values into a 64-bit word, and can perform reasonably well.
C++/emscripten prototype engine works well enough, but the build is large using standard library (for hashmaps, vectors, strings, and some string formatting at present).
It may be worth investigating a dynamic linking system where the engine and the compiled code are separate .wasm files, but this may have performance implications on calls into the engine.

Sandboxing

Memory usage can be strictly limited at compile or instantiation time, but moderate DoS vectors exist in long loops and deep recursion (can mitigate with Worker thread isolation)
Sandboxing the whole runtime so you can swap out just the .wasm file with a user-provided engine would be a lot more work (avoiding emscripten’s JS-side wrappers and functions or reducing them to a bare safe interface implemented for all engines).
Using separate .wasm modules for the engine and the program would expose engine internals in linear memory to the program’s module. Be aware that this is not a secure boundary unless the compiler is trusted.

Areas for further work

Adding an actual ahead-of-time compiler instead of hand-translating JS to C++ source would be a lot more work.

Would have to decide what tools to use to parse JS source
What kind of lifecycle and toolchain to use (does this get compiled server-side? client-side?)
What to emit (C++ as input to emscripten, or direct to LLVM, or super direct to WebAssembly?)

An in-engine interpreter for a REPL loop would be a different sort of work, and might or might not be able to share code with compiler for parsing etc.

How would compiled scripts be debugged, other than by slipping in console.log() invocations and watching error messages?

Is it possible to emit source maps for use in browser developer tools so can step through watching the original source instead of Wasm disassembly? How workable is this with Wasm at this point?

Need to imagine some example plugin APIs that could be used for scenarios such as:

SVG diagram with some event handlers and animations attached to it
SVG diagram or Canvas implementing Logo-like turtle graphics with simple input commands
Text editor plugin that describes a simple search-and-replace dialog and implementation

Immediate plans

I’ll probably keep tinkering with this because it’s fun as heck to research how other scripting engines work and see how I can make it myself. ;) But it’s a looooong way from being usable, and I wouldn’t likely be able to push it all the way there any time soon myself.

Will probably prototype the “other side” of the plugin scenarios above and see if I can get more folks at Wikimedia excited about sandboxed plugins and widgets with secure, narrow, versionable interfaces. I need to think of a clever name for it! ;)

In the meantime …. other tasks beckon! TimedMediaHandler needs its fixes finished…