JavaScript engine internals part 2: tagged pointers

In my last post I explored NaN-boxing in JavaScript engines, showing how Firefox and Webkit manage to squish multiple types, including pointers and floating point, into a single 64-bit word.

In my WebAssembly-based JavaScript-like engine experiment I initially implemented NaN boxing similarly to how Firefox/SpiderMonkey do it. It works nicely, but has some downsides:

  • Extra operations needed to unmask a pointer when dealing with lots of objects
  • In-memory values which are pointers have to be manually masked when peeking in the debugger. Kind of a pain.
  • There’s no equivalent aliasing system available in the proposal for native garbage-collected types for WebAssembly.

Inspired by the GC proposal, I’ve tried switching from the 64-bit NaN boxing to native pointer-size (32-bit for Wasm32) words with a minimal tagging system to store integers, which is equivalent to what is available there.

Tagged pointers

So what’s a pointer anyway? It’s a number that points (ha!) to a location in memory where you store stuff. This has a very important consequence:

Because stuff has to be laid out in memory in a way it can be efficiently accessed, structs and classes containing pointers are themselves always located on a pointer-size-aligned boundary. That is, if pointers are 32 bits wide (4 bytes), the structs have to be at a memory location that’s evenly divisible by 4, or it won’t be efficiently accessible.

This means that every pointer to every garbage-collected object is a number divisible by 4… which means its bottom 2 bits are always 0 in binary representation.

Thus, if your value has the bottom bit set, it’s definitely not a pointer to an object. Knowing that, we can divide the 32-bit number space into 32-bit object pointers on the one hand, and on the other hand either 1 tag bit with a 31-bit payload, or 2 tag bits with a 30-bit payload.

The Wasm GC proposal includes an “int31ref” type which can be stored in an “anyref” slot alongside other GC’d reference types, which is meant to use such a scheme.

Implementing tags

In the Val wrapper class in my engine, I’ve adopted this system now and it’s got some trade-offs, but I kinda like it.

The tag check is now a single bit-mask against 1 rather than a giant constant like 0xffff000000000000, which feels cleaner. Integers can be promoted back to 32 bits by right-shifting one bit, while pointers can be consumed as-is once the test is confirmed (or if you know for sure they’re safe).

The biggest downside is that there’s simply not room in 31 bits for very large integers (abs value > 1 billion-ish) or for floating point numbers (64 bits, ay ay!) so they must be boxed on the heap.

Boxing

“Boxing” is the fun task of taking a nice atomic value type, and wrapping it in a giant heap-allocated object. Horribly inefficient if you use lot of them!

But, it can be automatically done in the accessors, and seems to work so far.

I would prefer to avoid using boxing for floats, but there’s just no room in the int space for them unless I do something crazy like using two tag bits and then storing a 24-bit or 16-bit float when a value can be represented that way… yikes! And you’d still have to box values that are out of range or would lose precision.

For the other JS types — undefined, null, boolean, etc — I’m using boxed types as well, but using a single well-known instance value. So all “undefined” values refer to the same Box<Undefined>* and all “booleans” refer to one of two Box<bool>*s, one for true and one for false. This avoids runtime allocation for those special types, while allowing them to be treated as objects for dynamic dispatch or making the type checks cheap pointer equality comparisons for static dispatch.