Rust error handling with Result and Option (WebAssembly ABI)

In our last adventure we looked at C++ exceptions in WebAssembly with the emscripten compiler. Now we’re taking a look at the main error handling system for another language targeting WebAssembly, Rust.

Rust has a “panic”/“unwind” system similar to C++ exceptions, but catching panics is generally discouraged. Unwinding also doesn’t currently work in WebAssembly output: Rust doesn’t use emscripten’s trick of calling out to JavaScript, and is waiting for proper exception handling support to arrive in Wasm.

Instead, most user-level Rust code expresses fallible operations using the Result or Option enum types. An Option<T> holds either some value of type T, written Some(T), or no value, written None. A Result<T, E> is fancier and holds either a data payload, Ok(T), or an Err(E) carrying some information about the error condition. The E type can be a string or an enum describing the error, or it can just be the empty () unit type if you’re not picky.
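To make the three shapes concrete, here is a small sketch. The function names and the ParseError enum are hypothetical, chosen only to illustrate Option<T>, Result<T, E> with an enum error, and Result<T, ()> with the unit type as the error.

```rust
// A small fieldless error enum, standing in for the "E" type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParseError {
    Empty,
    NotADigit,
}

// Option<T>: either Some(value) or None.
fn first_even(values: &[u32]) -> Option<u32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

// Result<T, E>: either Ok(value) or Err(error).
// Parses only the first character of the string.
fn parse_leading_digit(s: &str) -> Result<u8, ParseError> {
    match s.chars().next() {
        None => Err(ParseError::Empty),
        Some(c) => match c.to_digit(10) {
            Some(d) => Ok(d as u8),
            None => Err(ParseError::NotADigit),
        },
    }
}

// Result<T, ()>: the unit type as the error when no detail is needed.
fn checked_halve(n: u32) -> Result<u32, ()> {
    if n % 2 == 0 {
        Ok(n / 2)
    } else {
        Err(())
    }
}

fn main() {
    assert_eq!(first_even(&[3, 5, 8, 10]), Some(8));
    assert_eq!(first_even(&[1, 3]), None);
    assert_eq!(parse_leading_digit("7"), Ok(7));
    assert_eq!(parse_leading_digit(""), Err(ParseError::Empty));
    assert_eq!(parse_leading_digit("x"), Err(ParseError::NotADigit));
    assert_eq!(checked_halve(10), Ok(5));
    assert_eq!(checked_halve(7), Err(()));
}
```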

Rust’s “?” operator is used as a shortcut for checking and propagating these error values.
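For a Result, the Rust book describes `?` as roughly equivalent to a match that unwraps Ok and early-returns Err, passing the error through From::from. The real expansion goes through the Try trait, and `?` works on Option too, but this sketch (with hypothetical function names) shows the effective behavior side by side.

```rust
use std::num::ParseIntError;

// Using `?`: if parse() returns Err, sum_two returns it immediately.
fn sum_two(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.parse::<i32>()?;
    let y = b.parse::<i32>()?;
    Ok(x + y)
}

// Roughly what `?` does for Result: unwrap Ok, or early-return Err
// after converting the error with From::from (the identity here).
fn sum_two_explicit(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = match a.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    let y = match b.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    Ok(x + y)
}

fn main() {
    assert_eq!(sum_two("2", "40"), Ok(42));
    assert_eq!(sum_two("2", "40"), sum_two_explicit("2", "40"));
    assert!(sum_two("2", "forty").is_err());
    assert_eq!(sum_two("2", "forty"), sum_two_explicit("2", "forty"));
}
```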

For instance, for a function that stores to memory (which might fail if the address is invalid or non-writable), you could write:

    // Looking up a memory page for writing could fail,
    // if the page is unmapped or read-only.
    pub fn page_for_write(&mut self, addr: GuestPtr)
    -> Result<&mut PageData, Exception> {
        // ... returns either Ok(something)
        // or Err(Exception::PageFault)
    }

    pub fn store_8(&mut self, addr: GuestPtr, val: u8)
    -> Result<(), Exception> {

        // First some address calculations which cannot fail.
        let index = addr_to_index(addr);
        let page = addr_to_page(addr);

        let mem = self.page_for_write(page)?;

        // The "?" operator checked for an Err(_) condition
        // and if so, propagated it to the caller. If it was
        // Ok(_) then we continue on with that value.
        mem.store_8(index, val);

        Ok(())
    }
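The snippet above is a fragment of a larger emulator, so here is a self-contained sketch that fills in the missing pieces so it can actually run. Everything outside store_8's body is hypothetical: GuestPtr as u32, 4 KiB pages, a Memory struct holding optional pages, and a single PageFault variant.

```rust
// Hypothetical guest address type and page geometry, for illustration only.
type GuestPtr = u32;
const PAGE_SHIFT: u32 = 12;
const PAGE_SIZE: usize = 1 << PAGE_SHIFT; // 4096 bytes

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Exception {
    PageFault,
}

struct PageData {
    bytes: Vec<u8>,
    writable: bool,
}

impl PageData {
    fn store_8(&mut self, index: usize, val: u8) {
        self.bytes[index] = val;
    }
}

// Offset within the page, and the page number.
fn addr_to_index(addr: GuestPtr) -> usize {
    (addr as usize) & (PAGE_SIZE - 1)
}

fn addr_to_page(addr: GuestPtr) -> u32 {
    addr >> PAGE_SHIFT
}

struct Memory {
    // None means the page is unmapped.
    pages: Vec<Option<PageData>>,
}

impl Memory {
    // Fails if the page is out of range, unmapped, or read-only.
    fn page_for_write(&mut self, page: u32) -> Result<&mut PageData, Exception> {
        self.pages
            .get_mut(page as usize)
            .and_then(|slot| slot.as_mut())
            .filter(|p| p.writable)
            .ok_or(Exception::PageFault)
    }

    fn store_8(&mut self, addr: GuestPtr, val: u8) -> Result<(), Exception> {
        let index = addr_to_index(addr);
        let page = addr_to_page(addr);
        let mem = self.page_for_write(page)?; // early-returns Err(PageFault)
        mem.store_8(index, val);
        Ok(())
    }
}

fn main() {
    let mut memory = Memory {
        pages: vec![
            Some(PageData { bytes: vec![0; PAGE_SIZE], writable: true }),
            Some(PageData { bytes: vec![0; PAGE_SIZE], writable: false }),
            None,
        ],
    };
    assert_eq!(memory.store_8(0x0010, 0xAB), Ok(()));
    assert_eq!(memory.pages[0].as_ref().unwrap().bytes[0x10], 0xAB);
    assert_eq!(memory.store_8(0x1000, 1), Err(Exception::PageFault)); // read-only
    assert_eq!(memory.store_8(0x2000, 1), Err(Exception::PageFault)); // unmapped
    assert_eq!(memory.store_8(0x9000, 1), Err(Exception::PageFault)); // out of range
}
```

Using Option::filter and ok_or keeps page_for_write to a single expression; an explicit match would work just as well.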

This looks reasonably clean in the source: you have some ?s sprinkled about, but each one signals that a branch may occur, which helps when reasoning about performance. That makes things more predictable when you look at it. In C++, adding a destructor might suddenly make a fast path slow with emscripten’s C++/JavaScript hybrid exceptions, while the worst that happens here is a visible flag check, which is not so bad, right?

Well, mostly. The beautiful yet horrible thing about Rust is that automatic optimization lets you build wonderfully clean conceptual APIs that compile down to fast code. The horrible part is that you sometimes don’t know how or why it’s doing what it does, and there can still be surprises!

The good news is that there are never any JavaScript call-outs in your hot paths, unlike emscripten’s C++ exception catching. Yay! But the bad news is that in WebAssembly, returning a structure like a Result<T,E> or an Option<T> that doesn’t fit into a single word can be complicated. And knowing which types fit is itself complicated. And sometimes things get packed into integers and I don’t even know what’s doing that.

Crash course on enum layout

Rust “enums” can be much fancier than C/C++ “enums”: not only do they carry a discriminant recording which variant they hold, but each variant can also carry a structure-like data payload.
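As a quick illustration, here is a C-style enum that is only a discriminant next to a Rust enum whose variants carry payloads. The type and variant names are made up for the example.

```rust
// A C-style enum: just a discriminant, no payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FaultKind {
    PageFault,
    Misaligned,
}

// A Rust enum whose variants carry struct-like payloads.
#[derive(Debug, PartialEq)]
enum Event {
    Tick,                         // no payload
    Store { addr: u32, val: u8 }, // named fields
    Fault(FaultKind),             // tuple-style payload
}

// Matching reads the discriminant, then binds the payload fields.
fn describe(e: &Event) -> String {
    match e {
        Event::Tick => "tick".to_string(),
        Event::Store { addr, val } => format!("store {:#x} <- {}", addr, val),
        Event::Fault(kind) => format!("fault {:?}", kind),
    }
}

fn main() {
    assert_eq!(describe(&Event::Tick), "tick");
    assert_eq!(describe(&Event::Store { addr: 0x10, val: 7 }), "store 0x10 <- 7");
    assert_eq!(describe(&Event::Fault(FaultKind::PageFault)), "fault PageFault");
    // A fieldless enum can still be cast to its integer discriminant.
    assert_eq!(FaultKind::Misaligned as u8, 1);
}
```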

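For Option<T>, this means you have a discriminant distinguishing Some(_) from None (logically one bit, though in memory it occupies at least a byte and is usually padded out to T’s alignment), plus the storage for the T payload. If T is itself a non-zero or non-nullable type, or an enum with spare discriminant values (a “niche”), then None can be encoded in one of T’s otherwise-invalid bit patterns, and Option<T> takes no more space than T itself. For instance, an Option<&u8> takes only a single pointer word of space because null references are not possible, while Option<*const u8> would take two words because a raw pointer may legitimately be null. When we use Result<(), E> with a small E (such as a fieldless enum) as the return type of a fallible operation that doesn’t need to return data, like the store_8 function above, we get that same benefit.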
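The size claims can be checked with std::mem::size_of. In this sketch, the Option<&u8> and Option<NonZeroU32> results are documented guarantees; the raw-pointer case and the one-byte Result<(), Exception> are what current rustc produces, not promises. Exception is a hypothetical stand-in for a small fieldless error enum.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical small fieldless error enum (only two discriminant values used).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Exception {
    PageFault,
    ProtectionFault,
}

fn main() {
    let word = size_of::<usize>();

    // Guaranteed: references are never null, so None reuses the null pattern.
    assert_eq!(size_of::<Option<&u8>>(), word);

    // Raw pointers may be null, so there is no niche: a separate tag,
    // padded out to pointer alignment (in practice, two words).
    assert_eq!(size_of::<Option<*const u8>>(), 2 * word);

    // Guaranteed: NonZeroU32 has a niche at 0.
    assert_eq!(size_of::<Option<NonZeroU32>>(), 4);

    // Not guaranteed: current rustc encodes Ok(()) in an unused
    // discriminant value of Exception, so the whole Result is one byte.
    println!("Result<(), Exception>: {} byte(s)", size_of::<Result<(), Exception>>());
}
```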

When you can return a single word, life is simple. In the engine’s JIT-compiled code it’s probably passed in a register, and everything is awesome and fast.

However, data-bearing Result<T,E>s take two words, and a WebAssembly MVP function can return at most one value. Instead, most of the time, the function is transformed so that it doesn’t use the return value at all: the caller passes in the address of a stack slot, and the callee writes the structure to memory there. That means two memory stores, plus at least one memory load to check the discriminant. You also have to manipulate the stack pointer, which some functions might not have needed otherwise.

This is not great, but is probably better than calling out to JavaScript and back. I haven’t yet benchmarked tight loops with this.
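A hand-written Rust approximation of that transformation may help. This is not what rustc literally emits: the ResultSlot layout, the tag values, and the error encoding are all hypothetical, chosen to mirror the "two stores, then a load and a branch" pattern described above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Exception {
    PageFault,
}

// Source-level version: returns a data-bearing Result by value.
fn load_32(addr: u32) -> Result<u32, Exception> {
    if addr % 4 == 0 {
        Ok(addr / 4)
    } else {
        Err(Exception::PageFault)
    }
}

// Hypothetical two-word layout for the returned value: a tag and a payload.
#[repr(C)]
struct ResultSlot {
    tag: u32,
    payload: u32,
}

const TAG_OK: u32 = 0;
const TAG_ERR: u32 = 1;

// Hand-lowered version: nothing is returned; the callee writes into a
// slot whose address the caller passes in.
fn load_32_lowered(out: &mut ResultSlot, addr: u32) {
    match load_32(addr) {
        Ok(v) => {
            out.tag = TAG_OK; // store #1: discriminant
            out.payload = v; // store #2: payload
        }
        Err(Exception::PageFault) => {
            out.tag = TAG_ERR;
            out.payload = 0; // error code for PageFault (arbitrary choice)
        }
    }
}

fn caller(addr: u32) -> Option<u32> {
    // The caller reserves the slot. If not optimized away, an address-taken
    // local like this lives on Rust's shadow stack in Wasm linear memory,
    // which is why the stack pointer has to move.
    let mut slot = ResultSlot { tag: TAG_ERR, payload: 0 };
    load_32_lowered(&mut slot, addr);
    // Load the discriminant back and branch on it.
    if slot.tag == TAG_OK {
        Some(slot.payload)
    } else {
        None
    }
}

fn main() {
    assert_eq!(caller(8), Some(2));
    assert_eq!(caller(9), None);
    assert_eq!(caller(8), load_32(8).ok());
}
```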

Packed enums in single words

I’ve also sometimes seen Result enums get packed into a single word, and I don’t yet understand the circumstances under which this happens.

On a WebAssembly build, I’ve had it happen with Result<u16, E> where E is a small fieldless enum: it’s packed into a single 32-bit integer. But Result<u8, E> is sent through the stack, and Result<u32, E> is too, though it could have been packed into a single 64-bit integer.

On native x86-64 builds on macOS, I see the word packing for both Result<u16, E> and Result<u32, E>. Meanwhile, Result<u64, E> uses the stack, but Result<u8, E> is returned as two words in separate registers.
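One way to start looking for a pattern is to print the in-memory size and alignment of each of these types on a given target. This sketch uses a hypothetical two-variant fieldless E; the numbers come from current rustc and are not a stable guarantee.

```rust
use std::mem::{align_of, size_of};

// Hypothetical small fieldless error enum, like the "E" above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum E {
    A,
    B,
}

fn report<T>(name: &str) {
    println!("{:<16} size {:>2}  align {}", name, size_of::<T>(), align_of::<T>());
}

fn main() {
    report::<Result<u8, E>>("Result<u8, E>");
    report::<Result<u16, E>>("Result<u16, E>");
    report::<Result<u32, E>>("Result<u32, E>");
    report::<Result<u64, E>>("Result<u64, E>");
}
```

With current compilers, Result<u16, E> is 4 bytes and Result<u32, E> is 8, which lines up with the single-integer packing described above: one 32-bit word on Wasm, one 64-bit word on x86-64. Result<u8, E> is only 2 bytes yet isn’t packed, so size alone doesn’t explain the choice.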

I know the internals of Rust layout and calling conventions are officially undocumented and can change, but it’d be nice to know roughly what some of them are. ;)

Further information on Rust layout and ABI