Video decoding in the JavaScript platform: “ogv.js, how U work??”

We’ve started deploying my ogv.js JavaScript video/audio playback engine to Wikipedia and Wikimedia Commons for better media compatibility in Safari, Internet Explorer and the new Microsoft Edge browser.

“It’s an older codec, but it checks out. I was about to let them through.”

This first generation uses the Ogg Theora video codec, which we started using on Wikipedia “back in the day” before WebM and MP4/H.264 started fighting it out for dominance of HTML5 video. In fact, Ogg Theora/Vorbis were originally proposed as the baseline standard codecs for HTML5 video and audio elements, but Apple and Microsoft refused to implement it and the standard ended up dropping a baseline requirement altogether.

Ah, standards. There’s so many to choose from!

I’ve got preliminary support for WebM in ogv.js; it needs more work but the real blocker is performance. WebM’s VP8 and newer VP9 video codecs provide much better quality/compression ratios, but require more CPU horsepower to decode than Theora… On a fast MacBook Pro, Safari can play back ‘Llama Drama’ in 1080p Theora but only hits 480p in VP8.

Llama drama in Theora 1080p

That’s about a 5x performance gap in terms of how many pixels we can push… For now, the performance boost from using an older codec is worth it, as it gets older computers and 64-bit mobile devices into the game.

But it also means that to match quality, we have to double the bitrate — and thus bandwidth – of Theora output versus VP8 at the same resolution. So in the longer term, it’d be nice to get VP8 — or the newer VP9 which halves bitrate again — working well enough on ogv.js.

emscripten: making ur C into JS

ogv.js’s player logic is handwritten JavaScript, but the guts of the demuxer and decoders are cross-compiled from well-supported, battle-tested C libraries.

Emscripten is a magical tool developed at Mozilla to help port large C/C++ codebases like games to the web platform. In short, it runs your C/C++ code through the well-known clang compiler, but instead of producing native code it uses a custom LLVM backend that produces JavaScript code that can run in any modern browser or node.js.

Awesome town. But what are the limitations and pain points?

Integer math

Readers with suitably arcane knowledge may be aware that JavaScript has only one numeric type: 64-bit double-precision floating-point.

This is “convenient” for classic scripting in that you don’t have to worry about picking the right numeric type, but it has several horrible, horrible consequences:

  1. When you really wanted 32-bit integers, floating-point math is going to be much slower
  2. When you really wanted 64-bit integers, floating-point math is going to lose precision if your numbers are too big… so you have to emulate with 32-bit integers
  3. If you relied on the specific behavior of 32-bit integer multiplication, you may have to use a slow polyfill of Math.imul

Luckily, because of #1 above, JavaScript JIT compilers have gone to some trouble to optimize common integer math operations. That is, JavaScript engines do support integer types and integer math, just you don’t know for sure when you have an integer at the source level.

Did I say “luckily”? :P

So this leads to one more ugly consequence:

  1. In order to force the JIT compiler to run integer math, emscripten output coerces types constantly — “(x|0)” to force to 32-bit int, or “+x” to force to 64-bit float.

This actually performs well once it’s through the JIT compiler, but it bloats the .js code that we have to ship to the browser.

The heap is an island

Emscripten provides a C-like memory model by using Typed Arrays: a single ArrayBuffer provides a heap that can be read/written directly as various integer and floating point types.

However…

Because all pointers are indexes into the heap, there’s no way for C code to reference data in an external ArrayBuffer or other structure. This is obviously an issue when your video codec needs to decode a data packet that’s been passed to it from JavaScript!

Currently I’m simply copying the input packets into emscripten’s heap in a wrapper function, then calling the decoder on the copy. This works, but the extra copy makes me sad. It’s also relatively slow in Internet Explorer, where the copy implementation using Uint8Array.set() seems to be pretty inefficient.

Getting data out can be done “zero-copy” if you’re careful, by creating a typed-array subview of the emscripten heap; this can be used for instance to upload a WebGL texture directly from the decoder. Neat!

But, that trick doesn’t work when you need to pass data between worker threads.

Workers of the JavaScript world, unite!

Parallel computing is now: these days just about everything from your high-end desktop to your low-end smartphone has at least two CPU cores and can perform multiple tasks in parallel.

Unfortunately, despite half a century of computer science research and a good decade of marketplace factors, doing parallel programming well is still a Hard Problem.

Regular JavaScript provides direct access to only a single thread of execution, which keeps things simple but can be a performance bottleneck. Browser makers introduced Web Workers to fill this gap without introducing the full complexities of shared-memory multithreading…

Essentially, each Worker is its own little JavaScript universe: the main thread context can’t access data in a Worker directly, and the Worker can’t access data from the main context. Neither can one thread cause the other to block… So to communicate between threads, you have to send asynchronous messages.

This is actually a really nice model that reduces the number of ways you can shoot yourself in the foot with multithreading!

But, it maps very poorly to C/C++ threads, where you start with shared memory and foot-shooting and try to build better abstractions on top of that.

So, we’re not yet able to make use of any multithreaded capabilities in the actual decoders. :(

But, we can run the decoders themselves in Worker threads, as long as they’re factored into separate emscripten subprograms. This keeps the main thread humming smoothly even when video decoding is a significant portion of wall-clock time, and can provide a little bit of actual parallelism by running video and audio decoding at the same time.

The Theora and VP8 decoders currently have no inherent multithreading available, but VP9 can so that’s worth looking out for in the future…

Some browser makers are working on providing an “opt-in” shared-memory threading model through an extended ‘SharedArrayBuffer’ that emscripten can make use of, but this is not yet available in any of my target browsers (Safari, IE, Edge).

Waiting for SIMD

Modern CPUs provide SIMD instructions (“Single Instruction Multiple Data”) which can really optimize multimedia operations where you need to do the same thing a lot of times on parallel data.

Codec libraries like libtheora and libvpx use these optimized instructions explicitly in key performance hotspots when compiling to native code… but how do you deal with this when compiling via JavaScript?

There is ongoing work in emscripten and by at least some browser vendors to expose common SIMD operations to JavaScript; I should be able to write suitable patches to libtheora and libvpx to use the appropriate C intrinsics and see if this helps.

But, my main targets (Safari, IE, Edge) don’t support SIMD in JS yet so I haven’t started…

GPU Madness

The obvious next thing to ask is “Hey what about the GPU?” Modern computers come with amazing high-throughput parallel-processing graphics units, and it’s become quite the rage to GPU accelerate everything from graphics to spreadsheets.

The good news is that current versions of all main browsers support WebGL, and ogv.js uses it if available to accelerate drawing and YCbCr-RGB colorspace conversion.

The bad news is that’s all we use it for so far — the actual video decoding is all on the CPU.

It should be possible to use the GPU for at least parts of the video decoding steps. But, it’s going to require jumping through some hoops…

  • WebGL doesn’t provide general-purpose compute shaders, so would have to shovel data into textures and squish computation into fragment shaders meant for processing pixels.
  • WebGL is only available on the main thread, so if decoding is done in a worker there’ll be additional overhead shipping data between threads
  • If we have to read data back from the GPU, that can be slow and block the CPU, dropping efficiency again
  • The codec libraries aren’t really set up with good GPU offloading points in them, so this may be Hard To Do.

libvpx at least has a fork with some OpenCL and RenderScript support — it’s worth investigating. But no idea if this is really feasible in WebGL.

 

In the meantime, I’ve got lots of other things to fix in Wikipedia’s video support so will be concentrating on that stuff, but will keep on improving this as the JS platform evolves!

ogv.js soft launch on Wikipedia and Wikimedia Commons

Soft launch of ogv.js on Wikipedia and Wikimedia Commons has begun! This initial deployment covers the desktop view only, so iPhones and iPads won’t get the media player yet in mobile view.

ogv.js provides a JavaScript compatibility shim for Ogg audio and video playback in Safari 6.1 and higher, IE 10/11, and Microsoft Edge browsers, which gets Wikipedia’s media files working in those browsers. (Due to patent licensing concerns, we don’t provide files in the common MP3 or MP4 H.264/AAC formats, and this has made it difficult to use media files reliably across browsers as Apple and Microsoft have not adopted the free Ogg or WebM formats.)

See list of pending fixes for additional improvements that should go out next week, after which I’ll make wider announcements.

Here, have some samples! In Firefox, Chrome, or Opera these will “just work” with native WebM playback, while in Safari/IE/Edge they will “just work” with JavaScript Ogg playback:

Curiosity’s Seven Minutes of Terror

 

Caminandes – Gran Dillama

 

Using Web Worker threading in ogv.js for smoother playback

I’ve been cleaning up MediaWiki’s “TimedMediaHandler” extension in preparation for merging integration with my ogv.js JavaScript media player for Safari and IE/Edge when no native WebM or Ogg playback is available. It’s coming along well, but one thing continued to worry me about performance: when things worked it was great, but if the video decode was too slow, it could make your browser very sluggish.

In addition to simply selecting too high a resolution for your CPU, this can strike if you accidentally open a debug console during playback (Safari, IE) or more seriously if you’re on an obscure platform that doesn’t have a JavaScript JIT compiler… or using an app’s embedded browser on your iPhone.

Or simply if the integration code’s automatic benchmark overestimated your browser speed, running too much decoding just made everything crap.

Luckily, the HTML5 web platform has a solution — Web Workers.

Workers provide a limited ability to do multithreading in JavaScript, which means the decoder thread can take as long as it wants — the main browser thread can keep responding to user input in the meantime.

The limitation is that scripts running in a Worker have no direct access to your code or data running in the web page’s main thread — you can communicate only by sending messages with ‘raw’ data types. Folks working with lots of DOM browser nodes thus can’t get much benefit, but for buffer-driven compute tasks like media decoding it’s perfect!

Threading comms overhead

My first attempt was to take the existing decoder class (an emscripten module containing the Ogg demuxer, Theora video decoder, and Vorbis audio decoder, etc) and run it in the worker thread, with a proxy object sending data and updated object properties back and forth.

This required a little refactoring to make the decoder interfaces asynchronous, taking callbacks instead of returning results immediately.

It worked pretty well, but there was a lot of overhead due to the demuxer requiring frequent back-and-forth calls — after every processing churn, we had to wait for the demuxer to return its updated status to us on the main thread.

This only took a fraction of a millisecond each time, but a bunch of those add up when your per-frame budget is 1/30 (or even 1/60) second!

 

I had been intending a bigger refactor of the code anyway to use separate emscripten modules for the demuxer and audio/video decoders — this means you don’t have to load code you won’t need, like the Opus audio decoder when you’re only playing Vorbis files.

It also means I could change the coupling, keeping the demuxer on the main thread and moving just the audio/video decoders to workers.

This gives me full speed going back-and-forth on the demuxer, while the decoders can switch to a more “streaming” behavior, sending packets down to be decoded and then displaying the frames or queueing the audio whenever it comes back, without having to wait on it for the next processing iteration.

The result is pretty awesome — in particular on older Windows machines, IE 11 has to use the Flash plugin to do audio and I was previously seeing a lot of “stuttery” behavior when the video decode blocked the Flash audio queueing or vice versa… now it’s much smoother.

The main bug left in the worker mode is that my audio/video sync handling code doesn’t properly handle the case where video decoding is consistently too slow — when we were on the main thread, this caused the audio to halt due to the main thread being blocked; now the audio just keeps on going and the video keeps playing as fast as it can and never catches up. :)

However this should be easy to fix, and having it be wrong but NOT FREEZING YOUR BROWSER is an improvement over having sync but FREEZING YOUR BROWSER. :)

WebGL performance tricks on MS IE and Edge

One of my pet projects is ogv.js, a video/audio decoder and player in JavaScript powered by codec libraries ported from C with Mozilla’s emscripten transpiler. I’m getting pretty close to a 1.0 release and deploying it to Wikimedia Commons to provide plugin-free Ogg (and experimentally WebM) playback on Apple’s Safari and Microsoft’s Internet Explorer and Edge browsers (the only major browsers lacking built-in WebM video support).

In cleaning it up for release, I’ve noticed some performance regressions on IE and Edge due to cleaning out old code I thought was no longer needed.

In particular, I found that drawing and YUV-RGB colorspace conversion using WebGL, which works fantastically fast in Safari, Chrome, and Firefox, was about as slow as on-CPU JavaScript color conversion in IE 11 and Edge — luckily I had a hack in store that works around the bottleneck.

It turns out that uploading single-channel textures as LUMINANCE or ALPHA formats is vveerryy ssllooww in IE 11 update 1 and Edge, compared to uploading the exact same data blob as an RGBA texture…

ogvjs-win10-faster

As it turns out, I had had some older code to stuff things into RGBA textures and then unpack them in a shader for IE 10 and the original release of IE 11, which did not support LUMINANCE or ALPHA texture uploads! I had removed this code to simplify my WebGL code paths since LUMINANCE got added in IE 11 update 1, but hadn’t originally noticed the performance regression.

Unfortunately this adds a user-agent sniff to my ogv.js code… I prefer to use the LUMINANCE textures directly on other browsers where they perform well, because the textures can be scaled more cleanly in the case of source files with non-square pixels.

Wikimedia video community editing tools & infrastructure status

There were a fair number of folks interested in video chatting at Wikimania! A few quick updates:

An experimental ‘Schnittserver’ (‘Clip server’) project has been in the works for a while with some funding from ze Germans; currently sitting at http://wikimedia.meltvideo.com/ (uses OAuth, has a temporary SSL cert, UI is very primitive!) It is currently usable already for converting MP4 etc source footage to WebM!

The Schnittserver can also do server-side rendering of projects using the ‘melt’ format such as those created with Kdenlive and Shotcut — this allows uploading your original footage (usually in some sort of MP4/H.264 flavor) and sharing the editing project via WebM proxy clips, without generational loss on the final rendering.

Once rendered, your final WebM output can be published up to Commons.

I would love to see some more support for this project, including adding a better web front-end for managing projects/clips and even editing…

Mozilla has an in-browser media editor thing called Popcorn.js; they’re unfortunately reducing investment in the project, but there’s some talk among people working on it and on our end that Wikimedia might be interested in helping adapt it to work with the Schnittserver or some future replacement for it.

Unfortunately I missed the session with the person working on Popcorn.js, will have to catch up later on it!

I’m very close to what I consider a 1.0 release of ogv.js, my JavaScript shim to play Ogg (and experimentally WebM) video and audio in Safari and MS IE/Edge without plugins.

Recently fixed some major sound sync bugs on slower devices, and am finishing up controls which will be used in the mobile view (when not using the full TimedMediaHandler / MwEmbedPlayer interface which we still have on the desktop).

Demo of playback at https://brionv.com/misc/ogv.js/demo2/

A slightly older version of ogv.js is also running on https://ogvjs-testing.wmflabs.org/ with integration into TimedMediaHandler; I’ll update those patches with my 1.0 release next week or so.

Infrastructure issues:

I had a talk with Faidon about video requirements on the low-level infrastructure layer; there are some things we need to work on before we really push video:

– seeking/streaming a file with Range subsets causes requests to bypass the Varnish cache layer, potentially causing huge performance problems if there’s a usage spike!

– very large files can’t be sharded cleanly over multiple servers, which makes for further performance bottlenecks on popular files again

– VERY large files (>4G or so) can’t be stored at all; which is a problem for high-quality uploads of things like long Wikimania talks!

For derivative transcodes, we can bypass some of these problems by chunking the output into multiple files of limited length and rigging up ‘gapless playback’, as can be done for HLS or MPEG-DASH-style live streaming. I’m pretty sure I can work out how to do this in the ogv.js player (for Safari and IE) as well as in the native <video> element playback for Chrome and Firefox via Media Source Extensions. Assuming it works with the standard DASH profile for WebM, this is something we can easily make work on Android as well using Google’s ExoPlayer.

DASH playback will also make it easier to use adaptive source switching to handle limited bandwidth or CPU resources.

However we still need to be able to deal with source files which may be potentially quite large…

List and phab projects!

As a reminder there’s a wikivideo-l list: https://lists.wikimedia.org/mailman/listinfo/wikivideo-l

and a Wikimedia-Video project tag in phabricator: https://phabricator.wikimedia.org/tag/wikimedia-video/

Folks who are interested in pushing further work on video, please feel free to join up. There’s a lot of potential awesomeness!

MediaWiki audio/video support updates

Mirror of mailing list post on wikitech-l and related lists

I’ve been passing the last few days feverishly working on audio/video stuff, cause it’s been driving me nuts that it’s not quite in working shape.

TL;DR: Major fixes in the works for Android, Safari (iOS and Mac), and IE/Edge (Windows). Need testers and patch reviewers.

ogv.js for Safari/IE/Edge

In recent versions of Safari, Internet Explorer, and Microsoft’s upcoming Edge browser, there’s still no default Ogg or WebM support but JavaScript has gotten fast enough to run an Ogg Theora/Vorbis decoder with CPU to spare for drawing and outputting sound in real time.

The ogv.js decoder/player has been one of my fun projects for some time, and I think I’m finally happy with my TimedMediaHandler/MwEmbedPlayer integration patch for the desktop MediaWiki interface.

I’ll want to update it to work with Video.js later, but I’d love to get this version reviewed and deployed in the meantime.

Please head over to https://ogvjs-testing.wmflabs.org/ in Safari 6.1+ or IE 10+ (or ‘Project Spartan’ on Windows 10 preview) and try it out! Particularly interested in cases where it doesn’t work or messes up.

Non-JavaScript fallback for iOS

I’ve found that Safari on iOS supports QuickTime movies with Motion-JPEG video and mu-law PCM audio. JPEG and PCM are, as it happens, old and not so much patented. \o/

As such this should work as a fallback for basic audio and video on older iPhones and iPads that can’t run ogv.js well, or in web views in apps that use Apple’s older web embedding APIs where JavaScript is slow (for example, Chrome for iOS).

However these get really bad compression ratios, so to keep bandwidth down similar to the 360p Ogg and WebM versions I had to reduce quality and resolution significantly. Hold an iPhone at arm’s length and it’s maybe ok, but zoom full-screen on your iPad and you’ll hate the giant blurry pixels!

This should also provide a working basic audio/video experience in our Wikipedia iOS app, until such time as we integrate Ogg or WebM decoding natively into the app.

Note that it seems tricky to bulk-run new transcodes on old files with TimedMediaHandler. I assume there’s a convenient way to do it that I just haven’t found in the extension maint scripts…

In progress: mobile video fixes

Audio has worked on Android for a while — the .ogg files show up in native <audio> elements and Just Work.

But video has been often broken, with TimedMediaHandler’s “popup transforms” reducing most video embeds into a thumbnail and a link to the original file — which might play if WebM (not if Ogg Theora) but it might also be a 1080p original which you don’t want to pull down on 3G! And neither audio nor video has worked on iOS.

This patch adds a simple mobile target for TMH, which fixes the popup transforms to look better and actually work by loading up an embedded-size player with the appropriately playable transcodes (WebM, Ogg, and the MJPEG last-ditch fallback).

ogv.js is used if available and necessary, for instance in iOS Safari when the CPU is fast enough. (Known to work only on 64-bit models.)

Future: codec.js and WebM and OGVKit

For the future, I’m also working on extending ogv.js to support WebM for better quality (especially in high-motion scenes) — once that stabilizes I’ll rename the combined package codec.js. Performance of WebM is not yet good enough to deploy, and some features like seeking are still missing, but breaking out the codec modules means I can develop the codecs in parallel and keep the high-level player logic in common.

Browser infrastructure improvements like SIMD, threading, and more GPU access should continue to make WebM decoding faster in the future as well.

I’d also like to finish up my OGVKit package for iOS, so we can embed a basic audio/video player at full quality into the Wikipedia iOS app. This needs some more cleanup work still.

Phew! Ok that’s about it.

im in ur javascript, decoding ur webm

Been a while since I posted; working on various bits and bobs.

Most exciting at the moment: initial work on pure-JavaScript WebM playback support for ogv.js (soon to become codec.js!)

This is currently very incomplete; video doesn’t come out right and audio’s not hooked up yet, and most notably you only get a few frames before it craps out. :)

Update: Got audio working after some sleep, and playback is more thorough. Yay! Still no seeking yet, and playback may not always reach completion.

I’ve seen a couple other attempts to build WebM decoders in JS using emscripten (eg Route9), but nobody’s gotten them hooked up to a full-blown player with audio yet. With a little more work on the decoder side, I should be able to leverage all of my investment in ogv.js’s player logic.

webm-in-progresslive preview of current experimental build

How’s it work?

I’m using nestegg for the WebM container parsing; currently this is giving me some impedence mismatch with the first-generation ogv.js code due to different i/o models.

ogv.js’s C-side Ogg container parsing was set up with entirely asynchronous i/o due to the needs of using async XHR to fetch data over the network. Nestegg, however, uses synchronous i/o callbacks in some places, and I really don’t want to jump through all the hoops to do WebM cue seeking without using the library. (Already did that with Ogg and it was horrid!)

I may be able to adapt the ogv.js C-side code to using synchronous i/o callbacks and let them get mapped to async i/o on the JS side, thanks to some fancy features in emscripten — either Asyncify or the newer emterpreter mode for parts of the high-level code while keeping the low-level decoder in pure asm.js. Need to do a little more research… :)

Possibilities

Performance: speed on WebM/VP8 decoding won’t be as good as the Ogg/Theora decoder for now so may not be suitable for mobile devices, but should work fine on many laptop/desktop machines.

VP9: untested so far, but should work… though again, likely to be still slower than VP8.

Threading: currently threading is disabled in the build. There’s work ongoing at Mozilla to support pthreads for emscripten programs, using a new SharedArrayBuffer interface to share memory between Web Worker threads. Might experiment with that later as libvpx has some threading options (may only be useful for VP9 multi-slice decoding for now) but it’ll be a while before most browsers support it. However this might be a good use case for asking Microsoft and Apple to add support. :)

SIMD: there’s also ongoing work on SIMD instruction abstraction in JavaScript, with Mozilla and Microsoft partnering. Will have to test with autovectorization in the compiler enabled and see if it makes any difference to decode speed, but don’t know how complete the support in Firefox and Edge is etc.

ogv.js MediaWiki integration updates

Over the last few weekends I’ve continued to poke at ogv.js, both the core library and the experimental MediaWiki integration. It’s getting pretty close to merge-ready!

Recent improvements to ogv.js player (gerrit changeset):

  • Audio no longer super-choppy in background tabs
  • ‘ended’ is no longer unreasonably delayed
  • various code cleanup
  • ogvjs-version.js with build timestamp available for use as a cache-buster helper

Fixes to the MediaWiki TimedMediaHandler desktop player integration (gerrit changeset):

  • Post-playback behavior is now the same as when using native playback
  • Various code cleanup

Fixes to the MediaWiki MobileFrontend mobile player integration (gerrit changeset):

  • Autoplay now working with native playback in Chrome and Firefox
  • Updated to work with current MobileFrontend (internal API changes)
  • Mobile media overlay now directly inherits from the MobileFrontend photo overlay class instead of duplicating it
  • Slow-CPU check is now applied on mobile player — this gets ogv.js video at 160p working on an old iPhone 4S running iOS 7! Fast A7-based iPhones/iPads still get 360p.

While we’re at it, Microsoft is opening up a public ‘suggestion box’ for Internet Explorer — folks might want to put in their votes for native Ogg Vorbis/Theora and WebM playback.

Testing ogv.js in MediaWiki

After many weekends of development on the ogv.js JavaScript Ogg Vorbis/Theora player I’ve started work on embedding it as a player into MediaWiki’s TimedMediaHandler extension.

The JavaScript version is functional in Safari and IE 10/11, though there’s some work yet to be done… See a live wiki at ogvjs-testing.wmflabs.org or the in-progress patch set.

Screen Shot 2014-07-13 at 8.43.34 PM

 

Meanwhile, stay tuned during the week for some demos of the soon-to-be-majorly-updated Wikipedia iOS app!