The C looks like:
It looks a little oversimplified, with tight loops and function calls, but the compiler inlines everything aggressively and unrolls the inner loop. The asm.js looks something like:
(Note some long lines are cut off in the unrolled loop.)
The emscripten compiler can provide an emulated replacement for Math.imul when using the LEGACY_VM_SUPPORT option, but it’s very slow — a function call, two multiplications, some bit-shifts, and addition.
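To give a feel for why the fallback is slow, here is a minimal sketch of a 32-bit integer multiply emulated with ordinary double-precision arithmetic, in the style of the commonly circulated Math.imul polyfill. The function name `emulatedImul` is made up for illustration, and the code emscripten actually emits may be organized differently. This particular variant splits each operand into 16-bit halves and does three partial multiplications.

```javascript
// Hypothetical helper: a sketch of a Math.imul fallback,
// not necessarily the exact code emscripten generates.
function emulatedImul(a, b) {
  // Split each operand into unsigned 16-bit halves so that every
  // partial product stays exactly representable as a double.
  var ah = a >>> 16;
  var al = a & 0xffff;
  var bh = b >>> 16;
  var bl = b & 0xffff;
  // Low product plus the cross terms shifted into the high half;
  // the final |0 truncates the sum to a signed 32-bit integer.
  return (al * bl + ((ah * bl + al * bh) << 16)) | 0;
}
```

Compared with a native Math.imul, which a JIT can compile down to a single integer multiply instruction, every call here pays for the call itself plus the shifts, masks, multiplications, and additions above.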
Unfortunately, optimizing it further looks tricky without SIMD. The native builds of these libraries make aggressive use of SIMD (single instruction, multiple data) to apply these filtering steps to several pixels at once, and it makes a huge difference to throughput.
SIMD support for asm.js has been in experimentation for some time, but it now seems to be getting dropped in favor of adding SIMD directly to WebAssembly. If and when that eventually arrives in Safari, it’ll be a big improvement there, but IE 11 is frozen in time and will never get it.