Rust error handling with Result and Option (WebAssembly ABI)

In our last adventure we looked at C++ exceptions in WebAssembly with the emscripten compiler. Now we’re taking a look at the main error handling system for another language targeting WebAssembly, Rust.

Rust has a “panic”/”unwind” system similar to C++ exceptions, but it’s generally recommended against catching panics. It also currently doesn’t work in WebAssembly output, where Rust doesn’t use emscripten’s tricks of calling out to JavaScript and is waiting for proper exception integration to arrive in Wasm.

Instead, most user-level Rust code expresses fallible operations using the Result or Option enum types. An Option<T> can hold either some value of type T — Some(T) — or no value — None. A Result<T, E> is fancier and can hold either a data payload Ok(T) or a Err(E) holding some information about the error condition. The E type can be a string or an enum carrying a description of the error, or it can just be an empty () unit type if you’re not picky.

Rust’s “?” operator is used as a shortcut for checking and propagating these error values.

For instance a function to store to memory (which might fail if the address is invalid or non-writable) you could write:

    // Looking up a memory page for writing could fail,
    // if the page is unmapped or read-only.
    pub fn page_for_write(&mut self, addr: GuestPtr)
    -> Result<&mut PageData, Exception> {
       // ... returns either Ok(something)
       // or Err(Exception::PageFault)
    }

    pub fn store_8(&mut self, addr: GuestPtr, val: u8)
    -> Result<(), Exception> {

        // First some address calculations which cannot fail.
        let index = addr_to_index(addr);
        let page = addr_to_page(addr);

        let mem = self.page_for_write(page)?;

        // The "?" operator checked for an Err(_) condition
        // and if so, propagated it to the caller. If it was
        // Ok(_) then we continue on with that value.
        mem.store_8(index, val);

        Ok(())
    }

This looks reasonably clean in the source — you have some ?s sprinkled about but they indicate that a branch may occur, which is good to help reason about the performance. This makes things more predictable when you look at it — addition of a destructor in C++ might make a fast path suddenly slow with emscripten’s C++/JavaScript hybrid exceptions, while the worst that happens here is a visible flag check which is not so bad, right?

Well, mostly. The beautiful, yet horrible thing about Rust is that automatic optimization lets you create these wonderful clean conceptual APIs that compile down to fast code. The horrible part is that you sometimes don’t know how or why it’s doing what, and there can still be surprises!

The good news is that there’s never any JavaScript call-outs in your hot paths, unlike emscripten’s C++ exception catching. Yay! But the bad news is that in WebAssembly, returning a structure like a Result<T,E> or an Option<T> that doesn’t resolve into a single word can be complicated. And knowing what resolves is complicated. And sometimes things get packed into integers and I don’t even know what’s doing that.

Crash course on enum layout

Rust “enums” can be much more fancy than C/C++ “enums”, with the ability not only to carry a discriminator of which enumerated value they carry, but to carry structure-like data payloads within them.

For Option<T>, this means you have a 1-bit discriminant between Some(_) or None and then the storage payload of the T type. If T is itself a non-zero or non-nullable type, or an enum with free space (a “niche”) then it can be optimized into a single word — for instance an Option<&u8> takes only a single pointer word of space because null references are not possible, while Option<*const u8> would take two words. When we use Result<(), E> for a return value on fallible operations that don’t need to return data, like the store_8 function above, we get that same benefit.

When you can return a single word, life is simple. At the low-level JIT implementation it’s probably passed in a register, and everything is awesome and fast.

However we have two words on data-bearing Result<T,E>s, and WebAssembly can’t return two words at once from a function. Instead, most of the time, the function is transformed to not use the return value, and instead send the address of a stack location to write the structure to memory, which means two memory stores and at least one memory read to check the discriminant value. Plus you have to manipulate the stack pointer, which you might not have needed otherwise on some functions.

This is not great, but is probably better than calling out to JavaScript and back. I haven’t yet benchmarked tight loops with this.

Packed enums in single words

I’ve also sometimes seen Result enums get packed into a single word, and I don’t yet understand the circumstances that this happens.

On a WebAssembly build, I’ve had it happen with Result<u16, E> where E is a small fieldless enum: it’s packed into a single 32-bit integer. But Result<u8, E> is sent through the stack, and Result<u32, E> is too, though it could have been packed into a single 64-bit integer.

On native x86-64 builds on macOS, I see the word packing for both Result<u16, E> and Result<u32, E>… meanwhile Result<u64, E> uses the stack, but Result<u8, E> uses two words returned in different registers.

I know the internals of Rust layout and calling conventions are officially undocumented and can change, but it’d be nice to know roughly what some of them are. ;)

Further information on Rust layout and ABI

Exception handling in emscripten: how it works and why it’s disabled by default

One of my “fun” projects I’m poking at on the side is an x86-64 emulator for the web using WebAssembly (not yet online, nowhere near working). I’ve done partial spike implementations in C++ and Rust with a handful of interpreted opcodes and emulation of a couple Linux syscalls; this is just enough for a hand-written “Hello, world!” executable. ;)

One of the key questions is how to handle exceptional conditions within the emulator, such as an illegal instruction, page fault, etc… There are multiple potentially fallible operations within the execution of a single opcode, so the method chosen to communicate those exceptions could make a big impact on performance.

For C++, one method which looks nice in the source code and can be good on native platforms is to use C++’s normal exception facility. This turns out to be somewhat problematic in emscripten builds for WebAssembly, which is explained by looking at its implementation.

In fact it’s considered enough of a performance problem that catching exceptions is disabled by default in emscripten — throwing will abort the program with no way to catch it except at the surrounding JavaScript host page level. To get exceptions to actually work, you must pass -s DISABLE_EXCEPTION_CATCHING=0 to the compiler.

Throwing is easy

Let’s write ourselves a potentially-throwing function:

static volatile int should_explode = 0;

void kaboom() {
    if (should_explode) {
        throw "kaboom";
    }
}

Pretty exciting! The volatile int is just to make sure the optimizer doesn’t do anything clever for us, as real world functions don’t _always_ or _never_ throw, they _sometimes_ throw. At -O1 this compiles to WebAssembly something like this:

 (func $kaboom\28\29 (; 26 ;) (type $11)
  (local $0 i32)
  (if
   (i32.load
    (i32.const 13152)
   )
   (block
    (i32.store
     (local.tee $0
      (call $__cxa_allocate_exception
       (i32.const 4)
      )
     )
     (i32.const 1024)
    )
    (call $__cxa_throw
     (local.get $0)
     (i32.const 12432)
     (i32.const 0)
    )
    (unreachable)
   )
  )
 )

The function $__cxa_throw calls into emscripten’s JavaScript-side runtime to throw a native JS exception.

That’s right, WebAssembly doesn’t have a way to throw — or catch — an exception itself. Yet. They’re working on it, but it’s not done yet.

Calling best case

So now we want to call that function. In the best case scenario, there’s a function that doesn’t have any cleanup work to do after a potentially throwing call: no try/catch clauses, no destructors, no stack allocations to clean up.

void dostuff_nocatch() {
    printf("Doing nocatch stuff\n");
    kaboom();
    printf("Possibly unreachable stuff\n");
}

This compiles to nice, simple, fast code with a direct call to kaboom:

 (func $dostuff_nocatch\28\29 (; 32 ;) (type $11)
  (drop
   (call $puts
    (i32.const 1106)
   )
  )
  (call $kaboom\28\29)
  (drop
   (call $puts
    (i32.const 1126)
   )
  )
 )

If kaboom throws, the JavaScript + WebAssembly runtime unwinds the native call stack up through dostuff_nocatch until whatever surrounding code (if any) catches it. The second printf/puts call never gets made, as expected.

Catching gets clever

But if you have variables with destructors, as with the RAII pattern, or a try/catch block, things get weird fast.

void dostuff_catch() {
    try {
        printf("Doing ex stuff\n");
        kaboom();
    } catch (const char *e) {
        printf("Caught stuff\n");
    }
}

compiles to the much longer:

 (func $dostuff_catch\28\29 (; 27 ;) (type $11)
  (local $0 i32)
  (drop
   (call $puts
    (i32.const 1078)
   )
  )
  (i32.store
   (i32.const 14204)
   (i32.const 0)
  )
  (call $invoke_v
   (i32.const 1)
  )
  (local.set $0
   (i32.load
    (i32.const 14204)
   )
  )
  (i32.store
   (i32.const 14204)
   (i32.const 0)
  )
  (block $label$1
   (if
    (i32.eq
     (local.get $0)
     (i32.const 1)
    )
    (block
     (local.set $0
      (call $__cxa_find_matching_catch_3
       (i32.const 12432)
      )
     )
     (br_if $label$1
      (i32.ne
       (call $getTempRet0)
       (call $llvm_eh_typeid_for
        (i32.const 12432)
       )
      )
     )
     (drop
      (call $__cxa_begin_catch
       (local.get $0)
      )
     )
     (drop
      (call $puts
       (i32.const 1093)
      )
     )
     (call $__cxa_end_catch)
    )
   )
   (return)
  )
  (call $__resumeException
   (local.get $0)
  )
  (unreachable)
 )

A few things to note.

Here the direct call to kaboom is replaced with a call to the indirected invoke_v function:

  (call $invoke_v
   (i32.const 1)
  )

which is implemented in the JS runtime to wrap a try/catch around the call:

function invoke_v(index) {
  var sp = stackSave();
  try {
    dynCall_v(index);
  } catch(e) {
    stackRestore(sp);
    if (e !== e+0 && e !== 'longjmp') throw e;
    _setThrew(1, 0);
  }
}

Note this saves the stack position (in Wasm linear memory, for stack-allocated data — separate from the native/JS/Wasm call stack) and restores it, which allows functions that allocate stack data to participate in the faster call mode by not having to clean up their own stack pointer modifications.

If an exception is caught, a global flag is set which is checked and re-cleared after the call:

  ;; Load the threw flag into a local variable
  (local.set $0
   (i32.load
    (i32.const 14204)
   )
  )
  ;; Clear the threw flag for the next call
  (i32.store
   (i32.const 14204)
   (i32.const 0)
  )
  ;; Branch based on the stored state
  (block $label$1
   (if
    (i32.eq
     (local.get $0)
     (i32.const 1)
    )
    ....
   )
  )

If the flag wasn’t set after all, then the function continues on its merry way. If it was set, then it does its cleanup and either eats the exception (for catch) or re-throws it (for automatic cleanup like destructors).

Summary and surprises

So there’s several things that’ll slow down your code using C++ exception catching in emscripten, even if you never throw in the hot paths:

  • when catching, potentially throwing calls are indirected through JavaScript
  • destructors create implicit catch blocks, so you may be catching more than you think

There’s also some surprises:

  • in C++, every extern “C” function is potentially throwing!
  • “noexcept” functions that call potentially throwing functions add implicit catch blocks in order to call abort(), so they can guarantee that they do not throw exceptions themselves

So don’t just add “noexcept” to code willy-nilly or it may get worse.

It would be faster to use my own state flag and check it after every call (which could be hidden behind a template or a macro or something, probably, at least in part), since this would avoid calling out through JavaScript and indirecting all the calls.

Next I’ll look at another way in the Rust version, using Result<T, E> return types and the error propagation “?” shortcut in Rust…