October 2019 – brionv

// Looking up a memory page for writing could fail, // if the page is unmapped or read-only. pub fn page_for_write(&mut self, addr: GuestPtr) -> Result<&mut PageData, Exception> { // ... returns either Ok(something) // or Err(Exception::PageFault) } pub fn store_8(&mut self, addr: GuestPtr, val: u8) -> Result<(), Exception> { // First some address calculations which cannot fail. let index = addr_to_index(addr); let page = addr_to_page(addr); let mem = self.page_for_write(page)?; // The "?" operator checked for an Err(_) condition // and if so, propagated it to the caller. If it was // Ok(_) then we continue on with that value. mem.store_8(index, val); Ok(()) }

Crash course on enum layout

Rust “enums” can be much more fancy than C/C++ “enums”, with the ability not only to carry a discriminator of which enumerated value they carry, but to carry structure-like data payloads within them.

For Option<T>, this means you have a 1-bit discriminant between Some(_) or None and then the storage payload of the T type. If T is itself a non-zero or non-nullable type, or an enum with free space (a “niche”) then it can be optimized into a single word — for instance an Option<&u8> takes only a single pointer word of space because null references are not possible, while Option<*const u8> would take two words. When we use Result<(), E> for a return value on fallible operations that don’t need to return data, like the store_8 function above, we get that same benefit.

When you can return a single word, life is simple. At the low-level JIT implementation it’s probably passed in a register, and everything is awesome and fast.

However we have two words on data-bearing Result<T,E>s, and WebAssembly can’t return two words at once from a function. Instead, most of the time, the function is transformed to not use the return value, and instead send the address of a stack location to write the structure to memory, which means two memory stores and at least one memory read to check the discriminant value. Plus you have to manipulate the stack pointer, which you might not have needed otherwise on some functions.

This is not great, but is probably better than calling out to JavaScript and back. I haven’t yet benchmarked tight loops with this.

Packed enums in single words

I’ve also sometimes seen Result enums get packed into a single word, and I don’t yet understand the circumstances that this happens.

On a WebAssembly build, I’ve had it happen with Result<u16, E> where E is a small fieldless enum: it’s packed into a single 32-bit integer. But Result<u8, E> is sent through the stack, and Result<u32, E> is too, though it could have been packed into a single 64-bit integer.

On native x86-64 builds on macOS, I see the word packing for both Result<u16, E> and Result<u32, E>… meanwhile Result<u64, E> uses the stack, but Result<u8, E> uses two words returned in different registers.

I know the internals of Rust layout and calling conventions are officially undocumented and can change, but it’d be nice to know roughly what some of them are. ;)

One of my “fun” projects I’m poking at on the side is an x86-64 emulator for the web using WebAssembly (not yet online, nowhere near working). I’ve done partial spike implementations in C++ and Rust with a handful of interpreted opcodes and emulation of a couple Linux syscalls; this is just enough for a hand-written “Hello, world!” executable. ;)

One of the key questions is how to handle exceptional conditions within the emulator, such as an illegal instruction, page fault, etc… There are multiple potentially fallible operations within the execution of a single opcode, so the method chosen to communicate those exceptions could make a big impact on performance.

For C++, one method which looks nice in the source code and can be good on native platforms is to use C++’s normal exception facility. This turns out to be somewhat problematic in emscripten builds for WebAssembly, which is explained by looking at its implementation.

In fact it’s considered enough of a performance problem that catching exceptions is disabled by default in emscripten — throwing will abort the program with no way to catch it except at the surrounding JavaScript host page level. To get exceptions to actually work, you must pass -s DISABLE_EXCEPTION_CATCHING=0 to the compiler.

Throwing is easy

Let’s write ourselves a potentially-throwing function:

static volatile int should_explode = 0;

void kaboom() {
    if (should_explode) {
        throw "kaboom";
    }
}

Pretty exciting! The volatile int is just to make sure the optimizer doesn’t do anything clever for us, as real world functions don’t _always_ or _never_ throw, they _sometimes_ throw. At -O1 this compiles to WebAssembly something like this:

 (func $kaboom\28\29 (; 26 ;) (type $11)
  (local $0 i32)
  (if
   (i32.load
    (i32.const 13152)
   )
   (block
    (i32.store
     (local.tee $0
      (call $__cxa_allocate_exception
       (i32.const 4)
      )
     )
     (i32.const 1024)
    )
    (call $__cxa_throw
     (local.get $0)
     (i32.const 12432)
     (i32.const 0)
    )
    (unreachable)
   )
  )
 )

The function $__cxa_throw calls into emscripten’s JavaScript-side runtime to throw a native JS exception.

That’s right, WebAssembly doesn’t have a way to throw — or catch — an exception itself. Yet. They’re working on it, but it’s not done yet.

Calling best case

So now we want to call that function. In the best case scenario, there’s a function that doesn’t have any cleanup work to do after a potentially throwing call: no try/catch clauses, no destructors, no stack allocations to clean up.

void dostuff_nocatch() {
    printf("Doing nocatch stuff\n");
    kaboom();
    printf("Possibly unreachable stuff\n");
}

This compiles to nice, simple, fast code with a direct call to kaboom:

 (func $dostuff_nocatch\28\29 (; 32 ;) (type $11)
  (drop
   (call $puts
    (i32.const 1106)
   )
  )
  (call $kaboom\28\29)
  (drop
   (call $puts
    (i32.const 1126)
   )
  )
 )

If kaboom throws, the JavaScript + WebAssembly runtime unwinds the native call stack up through dostuff_nocatch until whatever surrounding code (if any) catches it. The second printf/puts call never gets made, as expected.

Catching gets clever

But if you have variables with destructors, as with the RAII pattern, or a try/catch block, things get weird fast.

void dostuff_catch() {
    try {
        printf("Doing ex stuff\n");
        kaboom();
    } catch (const char *e) {
        printf("Caught stuff\n");
    }
}

compiles to the much longer:

 (func $dostuff_catch\28\29 (; 27 ;) (type $11)
  (local $0 i32)
  (drop
   (call $puts
    (i32.const 1078)
   )
  )
  (i32.store
   (i32.const 14204)
   (i32.const 0)
  )
  (call $invoke_v
   (i32.const 1)
  )
  (local.set $0
   (i32.load
    (i32.const 14204)
   )
  )
  (i32.store
   (i32.const 14204)
   (i32.const 0)
  )
  (block $label$1
   (if
    (i32.eq
     (local.get $0)
     (i32.const 1)
    )
    (block
     (local.set $0
      (call $__cxa_find_matching_catch_3
       (i32.const 12432)
      )
     )
     (br_if $label$1
      (i32.ne
       (call $getTempRet0)
       (call $llvm_eh_typeid_for
        (i32.const 12432)
       )
      )
     )
     (drop
      (call $__cxa_begin_catch
       (local.get $0)
      )
     )
     (drop
      (call $puts
       (i32.const 1093)
      )
     )
     (call $__cxa_end_catch)
    )
   )
   (return)
  )
  (call $__resumeException
   (local.get $0)
  )
  (unreachable)
 )

A few things to note.

Here the direct call to kaboom is replaced with a call to the indirected invoke_v function:

  (call $invoke_v
   (i32.const 1)
  )

which is implemented in the JS runtime to wrap a try/catch around the call:

function invoke_v(index) {
  var sp = stackSave();
  try {
    dynCall_v(index);
  } catch(e) {
    stackRestore(sp);
    if (e !== e+0 && e !== 'longjmp') throw e;
    _setThrew(1, 0);
  }
}

Note this saves the stack position (in Wasm linear memory, for stack-allocated data — separate from the native/JS/Wasm call stack) and restores it, which allows functions that allocate stack data to participate in the faster call mode by not having to clean up their own stack pointer modifications.

If an exception is caught, a global flag is set which is checked and re-cleared after the call:

  ;; Load the threw flag into a local variable
  (local.set $0
   (i32.load
    (i32.const 14204)
   )
  )
  ;; Clear the threw flag for the next call
  (i32.store
   (i32.const 14204)
   (i32.const 0)
  )
  ;; Branch based on the stored state
  (block $label$1
   (if
    (i32.eq
     (local.get $0)
     (i32.const 1)
    )
    ....
   )
  )

If the flag wasn’t set after all, then the function continues on its merry way. If it was set, then it does its cleanup and either eats the exception (for catch) or re-throws it (for automatic cleanup like destructors).

Summary and surprises

So there’s several things that’ll slow down your code using C++ exception catching in emscripten, even if you never throw in the hot paths:

when catching, potentially throwing calls are indirected through JavaScript
destructors create implicit catch blocks, so you may be catching more than you think

There’s also some surprises:

in C++, every extern “C” function is potentially throwing!
“noexcept” functions that call potentially throwing functions add implicit catch blocks in order to call abort(), so they can guarantee that they do not throw exceptions themselves

So don’t just add “noexcept” to code willy-nilly or it may get worse.

It would be faster to use my own state flag and check it after every call (which could be hidden behind a template or a macro or something, probably, at least in part), since this would avoid calling out through JavaScript and indirecting all the calls.

Next I’ll look at another way in the Rust version, using Result<T, E> return types and the error propagation “?” shortcut in Rust…

Month: October 2019

Rust error handling with Result and Option (WebAssembly ABI)