Two major updates to the ogv.js Theora/Vorbis video/audio player in the last few weekends: an all-Flash decoder for older IE versions, and WebGL and Stage3D GPU acceleration for color conversion and drawing.
Try the demo and select ‘JS + WebGL’, ‘Flash’, or even ‘Flash + GPU’ from the drop-down! (You can also now try playback in the native video element or the old Cortado Java applet for comparison, though Cortado requires adding security exceptions, if your browser works with Java at all these days. :P)
Flash / Crossbridge
The JS code output by emscripten requires some modern features like typed arrays, which aren’t available on IE 9 and older… but similar features exist in Flash’s ActionScript 3 virtual machine, and reasonably recent Flash plugins are often available on older IE installations.
A few interesting notes:
- The Crossbridge compiler runs much slower than emscripten does, perhaps due to JVM startup costs in some backend tool. Running the configure scripts on the libraries is painfully slowwwwwwwwww! Luckily, once they’re built they don’t have to be rebuilt often.
- There are only Mac and Windows builds of Crossbridge available; it may or may not be possible to build on Linux from source. I’ve only tested on Mac so far.
- Flash decoder performance is roughly on par with Safari’s JS, and usually a bit better than Internet Explorer’s JS.
- JS’s typed array & ArrayBuffer system doesn’t quite map to the Flash interfaces. There’s a ByteArray which is kind of like a Uint8Array plus a data stream interface, with methods to read/write values of other types into the current position and advance it. There are also Vector types, which have an interface more like the traditional untyped Array but can only contain items of the given type and are more efficiently packed in memory.
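To make the mismatch concrete, here is a rough TypeScript sketch of a ByteArray-style cursor layered on top of JS's ArrayBuffer and DataView. The class name `ByteCursor` is hypothetical and its method names only loosely echo AS3's ByteArray; the endianness flag is an assumption you'd set to match your data.

```typescript
// Hypothetical ByteArray-style cursor: a read/write position that advances
// as typed values are read or written, built on a plain ArrayBuffer.
class ByteCursor {
  position = 0;
  readonly buffer: ArrayBuffer;
  private view: DataView;
  private littleEndian: boolean;

  constructor(buffer: ArrayBuffer, littleEndian = true) {
    this.buffer = buffer;
    this.view = new DataView(buffer);
    this.littleEndian = littleEndian; // assumption: pick per data source
  }

  get bytesAvailable(): number {
    return this.buffer.byteLength - this.position;
  }

  readUnsignedByte(): number {
    const v = this.view.getUint8(this.position);
    this.position += 1;
    return v;
  }

  readInt(): number {
    const v = this.view.getInt32(this.position, this.littleEndian);
    this.position += 4;
    return v;
  }

  writeInt(value: number): void {
    this.view.setInt32(this.position, value, this.littleEndian);
    this.position += 4;
  }

  readFloat(): number {
    const v = this.view.getFloat32(this.position, this.littleEndian);
    this.position += 4;
    return v;
  }

  writeFloat(value: number): void {
    this.view.setFloat32(this.position, value, this.littleEndian);
    this.position += 4;
  }
}
```

A plain Uint8Array gives you the indexed bytes; the cursor and the typed read/write helpers are the part you have to add yourself on the JS side.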
GPU acceleration: WebGL
Safari unfortunately doesn’t enable WebGL by default; it can be enabled as a developer option on Mac OS X but requires a device jailbreak on iOS.
However, for IE 11 (and for the general fun of it), I suspected adding GPU acceleration might make sense! YCbCr to RGB conversion and drawing bytes to the canvas with putImageData() are both expensive, especially in IE. The GPU can perform the two operations together, and more importantly can massively parallelize the colorspace conversion.
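For reference, this is roughly the per-pixel work a shader can do in parallel, written as a plain CPU loop in TypeScript. It assumes BT.601 "video range" coefficients and 4:2:0 chroma subsampling (one Cb/Cr sample per 2×2 block of luma); neither detail comes from the post, and `YCbCrFrame` / `ycbcrToRgba` are hypothetical names.

```typescript
// Assumptions: BT.601 video-range coefficients, 4:2:0 chroma subsampling,
// planes packed with no row padding.
interface YCbCrFrame {
  width: number;
  height: number;
  y: Uint8Array;  // width * height luma samples
  cb: Uint8Array; // ceil(width / 2) * ceil(height / 2) chroma samples
  cr: Uint8Array;
}

function ycbcrToRgba(frame: YCbCrFrame): Uint8ClampedArray {
  const { width, height } = frame;
  const chromaWidth = Math.ceil(width / 2);
  // Uint8ClampedArray rounds and clamps to 0..255 on assignment.
  const out = new Uint8ClampedArray(width * height * 4);
  for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
      const yy = 1.164 * (frame.y[row * width + col] - 16);
      // Each chroma sample covers a 2x2 block of luma pixels.
      const ci = (row >> 1) * chromaWidth + (col >> 1);
      const cb = frame.cb[ci] - 128;
      const cr = frame.cr[ci] - 128;
      const o = (row * width + col) * 4;
      out[o] = yy + 1.596 * cr;
      out[o + 1] = yy - 0.813 * cr - 0.391 * cb;
      out[o + 2] = yy + 2.018 * cb;
      out[o + 3] = 255; // opaque alpha
    }
  }
  return out;
}
```

On the CPU this loop runs once per pixel and is still followed by a putImageData() call; a fragment shader evaluates the same three formulas for every pixel at once and draws the result directly.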
I snagged the fragment shader from the Broadway H.264 player project’s WebGL accelerated canvas, and with a few tutorials, a little documentation, and a lot of trial and error I got accelerated drawing working in Firefox, Chrome, and Safari with WebGL manually enabled.
Then came time to run it on IE 11… unfortunately it turns out that single-channel luminance or alpha textures aren’t supported on IE 11’s WebGL — you must upload all textures as RGB or RGBA.
Copying data from 1-byte-per-pixel arrays to a 3- or 4-byte-per-pixel array and then uploading that turned out to be horribly slow, especially as IE’s typed array ‘set’ method and copy constructor seem to be hideously slow. It was clear this was not going to work.
I devised a method of uploading the 1-byte-per-pixel arrays as pseudo-RGBA textures of 1/4 their actual width, then unpacking the subpixels from the channels in the fragment shader.
The unpacking is done by adding two more textures: “stripe” textures at the luma and chroma resolutions, which for each pixel have 100% brightness in the channel matching the packed subpixel. For each output pixel, we sample both the packed Y, Cb, or Cr texture and the matching-size stripe texture, then multiply the vectors and sum the components to fold down only the relevant channel into a scalar value.
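The trick can be checked on the CPU. This TypeScript sketch (hypothetical helper names, assuming plane widths are multiples of 4) treats a 1-byte-per-pixel plane's bytes as an RGBA texture of 1/4 the width, builds a stripe texture, and reproduces what the fragment shader does for one output pixel: two nearest-neighbor samples, a component-wise multiply, and a sum.

```typescript
type Vec4 = [number, number, number, number];

// Stripe texture: full intensity only in channel (x % 4) for each pixel.
function makeStripeTexture(width: number, height: number): Uint8Array {
  const stripe = new Uint8Array(width * height * 4);
  for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
      stripe[(row * width + col) * 4 + (col % 4)] = 255;
    }
  }
  return stripe;
}

// Nearest-neighbor texel fetch, normalized to 0..1 like a GLSL sampler.
function texel(tex: Uint8Array, texWidth: number, col: number, row: number): Vec4 {
  const i = (row * texWidth + col) * 4;
  return [tex[i] / 255, tex[i + 1] / 255, tex[i + 2] / 255, tex[i + 3] / 255];
}

function dot(a: Vec4, b: Vec4): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// What the shader computes for output pixel (col, row) of a packed plane.
function unpackSample(
  packed: Uint8Array, stripe: Uint8Array, width: number, col: number, row: number,
): number {
  const packedTexel = texel(packed, width / 4, col >> 2, row); // 4 pixels per texel
  const mask = texel(stripe, width, col, row);                 // one-hot channel mask
  return Math.round(dot(packedTexel, mask) * 255);
}
```

Note that the packed texture needs no copying at all: a W-pixel row of 1-byte samples is byte-for-byte the same as a W/4-texel row of RGBA, which sidesteps the slow expand-to-RGBA copy. It also only works with nearest-neighbor sampling, since linear filtering would blend channels from neighboring texels.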
It feels kinda icky, but it seems to run just as fast at least on my good hardware, and works in IE 11 as well as the other browsers.
On Firefox with asm.js-optimized Theora decoding and WebGL-optimized drawing, I can actually watch 720p and some 1080p videos at full speed. Nice! (Of course, Firefox can already use its native decoder which is faster still…)
A few gotchas with WebGL:
- There’s a lot of boilerplate you have to do to get anything running; spend lots of time reading those tutorials that start with a 2d triangle or a rectangle until it makes sense!
- Once you’ve used a canvas element for 2d rendering, you can’t switch it to a WebGL canvas. It’ll just fail silently…
- Creating new textures is a lot slower than uploading new data to an existing texture of the same size.
- Error checking can be expensive because it halts the GPU pipeline; this significantly slows down the rendering in Chrome. Turn it off once code is working…
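Here's one way to act on the last two tips, sketched in TypeScript against a minimal hand-written subset of the WebGL interface (`GLSubset` and `PlaneTexture` are hypothetical; the method names and constants are standard WebGL). It keeps one texture per plane, calls texImage2D only when the frame size changes, uses texSubImage2D otherwise, and only calls getError() when a debug flag is set.

```typescript
// Structural subset of WebGLRenderingContext used below.
interface GLSubset {
  readonly TEXTURE_2D: number;
  readonly RGBA: number;
  readonly UNSIGNED_BYTE: number;
  readonly NO_ERROR: number;
  bindTexture(target: number, texture: unknown): void;
  texImage2D(
    target: number, level: number, internalformat: number,
    width: number, height: number, border: number,
    format: number, type: number, pixels: ArrayBufferView | null,
  ): void;
  texSubImage2D(
    target: number, level: number, xoffset: number, yoffset: number,
    width: number, height: number,
    format: number, type: number, pixels: ArrayBufferView | null,
  ): void;
  getError(): number;
}

class PlaneTexture {
  private gl: GLSubset;
  private texture: unknown;
  private debug: boolean;
  private width = -1;
  private height = -1;

  constructor(gl: GLSubset, texture: unknown, debug = false) {
    this.gl = gl;
    this.texture = texture;
    this.debug = debug;
  }

  upload(width: number, height: number, bytes: Uint8Array): void {
    const gl = this.gl;
    gl.bindTexture(gl.TEXTURE_2D, this.texture);
    if (width !== this.width || height !== this.height) {
      // First frame or size change: (re)allocate storage.
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
        gl.RGBA, gl.UNSIGNED_BYTE, bytes);
      this.width = width;
      this.height = height;
    } else {
      // Same size: overwrite the existing storage in place.
      gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, width, height,
        gl.RGBA, gl.UNSIGNED_BYTE, bytes);
    }
    if (this.debug) {
      // getError() can stall the GPU pipeline, so only check while debugging.
      const err = gl.getError();
      if (err !== gl.NO_ERROR) {
        throw new Error(`WebGL error 0x${err.toString(16)}`);
      }
    }
  }
}
```

In a browser you'd pass the real WebGL context and a texture from gl.createTexture(); in a test, a mock object can simply record which upload path was taken.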
GPU acceleration: Flash / Stage3D
Once I had things working so nicely with WebGL, the Flash version started to feel left out — couldn’t it get some GPU love too?
Well luckily, Flash has an OpenGL ES-like API as well: Stage3D.
Unluckily, Stage3D and WebGL are gratuitously different in API signatures. Meh, I can work with that.
Really unluckily, Stage3D doesn’t include a runtime shader compiler.
You’re apparently expected to write shaders in a low-level assembly language, AGAL… and manually grab an “AGALMiniAssembler” class and stick it in your code to compile that into AGAL bytecode. What?
Luckily there’s also a glsl2agal converter, so I was able to avoid rewriting the shaders from the WebGL version in AGAL manually. Yay! This required some additional manual hoop-jumping to make sure variables mapped to the right registers, but the glsl2agal compiler makes the mappings available in its output so that wasn’t too bad.
Some gotchas with Stage3D:
- The AS3 documentation pages for Stage3D classes don’t show all the class members in Firefox. No, really, I had to read those pages in Chrome. WTF?
- Texture dimensions must be powers of 2, so I have to copy rows of bytes from the Crossbridge C heap into a temporary array of the right size before uploading to the GPU. Luckily copying byte arrays is much faster in Flash than in IE’s JS!
- As with the 2d BitmapData interface, Flash prefers BGR over RGB. Just had to flip the order of bytes in the stripe texture generation.
- Certain classes of errors will crash Flash and IE together. Nice!
- The glsl2agal compiler forced my texture sampling to linear interpolation with wrapping; I had to do a string replace on the generated AGAL source to set it to nearest-neighbor & clamped.
- There doesn’t appear to be a simple way to fix the Stage3D backing buffer size and scale it up to fit the window, at least in the modes I’m using it. I’m instead handling resizing by setting the backing buffer to the size of the stage, and just letting the texture render larger. This unfortunately uses nearest-neighbor for the scaling because I had to disable linear sampling to do the channel-packing trick.
- On my old Atom-based tablet, Stage3D drawing is actually slower than software conversion and rendering. D’oh! Other machines seem faster, and about on par with WebGL.
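The power-of-two padding step might look something like this TypeScript sketch. It assumes the decoder's plane has a row `stride` that may be wider than the visible width (common for decoder buffers, but an assumption here); `padToPow2`, `nextPow2`, and the returned fields are hypothetical names. The `uScale`/`vScale` values are what you'd feed to the shader so it only samples the filled top-left region.

```typescript
function nextPow2(n: number): number {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

interface PaddedPlane {
  texWidth: number;
  texHeight: number;
  uScale: number; // fraction of the texture width holding real data
  vScale: number; // fraction of the texture height holding real data
  bytes: Uint8Array;
}

// Copy each visible row into a zero-filled power-of-two buffer.
function padToPow2(
  src: Uint8Array, width: number, height: number, stride: number, bytesPerTexel = 4,
): PaddedPlane {
  const texWidth = nextPow2(width);
  const texHeight = nextPow2(height);
  const rowBytes = width * bytesPerTexel;
  const bytes = new Uint8Array(texWidth * texHeight * bytesPerTexel);
  for (let row = 0; row < height; row++) {
    const start = row * stride;
    // subarray() is a view (no copy); set() does the actual row copy.
    bytes.set(src.subarray(start, start + rowBytes), row * texWidth * bytesPerTexel);
  }
  return { texWidth, texHeight, uScale: width / texWidth, vScale: height / texHeight, bytes };
}
```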
I’ll do a little more performance tweaking, but it’s starting to look like it’s time to clean it up and try integrating with our media playback tools on MediaWiki… Wheeeee!
This works both with Web Audio in Firefox, Chrome, and Safari and with the Flash audio shim for IE; basically we have to keep track of the audio playback position and match up decoding frames with that. Took a little poking in the ActionScript code but it’s now working!
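A minimal sketch of "audio position as the master clock" in TypeScript, under stated assumptions: decoded frames carry presentation timestamps in seconds, and the caller supplies the current audio playback position (from Web Audio, or as reported back by the Flash shim). `pickFrame` and the frame types are hypothetical, not ogv.js's actual code.

```typescript
interface DecodedFrame {
  timestamp: number; // presentation time in seconds
  id: number;
}

interface SyncDecision {
  show: DecodedFrame | null; // frame to draw now, if any
  dropped: number;           // frames skipped because a newer one is already due
}

function pickFrame(queue: DecodedFrame[], audioPosition: number): SyncDecision {
  let show: DecodedFrame | null = null;
  let dropped = 0;
  // Consume every frame whose time has come; only the newest one is drawn.
  while (queue.length > 0 && queue[0].timestamp <= audioPosition) {
    if (show !== null) dropped++;
    show = queue.shift()!;
  }
  return { show, dropped };
}
```

Frames that were due before the newest eligible one are counted as dropped rather than drawn late, which keeps the picture from drifting behind the sound on slow machines.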
This brings us much closer to being able to integrate ogv.js as a fallback video player for Wikimedia on IE 10/11 and Safari 6/7. Thanks to the guys hanging out in the #xiph channel who encouraged me to keep poking at this! 😀
Additionally, there’s now an override selector for the video size, so you can try decoding larger than 360p versions, or switch a slow machine down to the little 160p versions.
I’ve also started investigating an all-Flash version using Adobe’s Crossbridge, which if it works would be a suitable replacement for the Cortado Java applet on old browsers (think of all those IE 6/7/8/9 systems out there!). I seem to be able to build the ogg libraries but haven’t gone beyond that yet… will be interesting to poke at.
In addition to Internet Explorer 10/11 (via Maik’s Flash shim), I now have audio working on iOS — and smaller video sizes actually play pretty decently on a current iPhone 5s as well!
Try it out on your computer or phone!
Older iOS 7 devices and the last generation of iPod Touch are just too slow to play video reliably but still play audio just fine. The latest 64-bit CPU is pretty powerful though, and could probably handle slightly larger transcodes than 160p too.
Latest demo screencast of the new Wikipedia mobile apps for Android and iOS, still in development — now with login and basic editing! I think folks are really going to like what we’ve done once we get these polished up and out in the stores, replacing our old mobile app.
If the on-wiki embedded video player doesn’t work for you, try a mirror on YouTube.
Thanks to Android 4.4’s built-in screen recording feature this didn’t require any custom hardware kit to record, but I had to jump through some hoops to edit it in PiTiVi… I’ll write up some notes on-wiki.
Mostly this means Internet Explorer and Safari — Chrome and Firefox handle the files natively. However, Internet Explorer was limited by its lack of support for the Web Audio API, so it could not play any sound. I’d hypothesized that a Flash shim could be used — Windows 8 ships with the Flash plugin by default and it’s widely installed on Windows 7 — but had no idea where to start.
Open source to the rescue!
One of the old maintainers of the Cortado applet, maikmerten, took an interest. After some brief fixes to get the build scripts working on Ubuntu, he scrounged up a simple ActionScript audio shim, with source and .swf output, and rigged up the ogv.js player to output audio through that if there was no native Web Audio API.
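The "use the shim only if there's no native Web Audio API" decision can be sketched in TypeScript as below. `chooseAudioBackend` and `AudioEnvironment` are hypothetical; checking for an `AudioContext` constructor (or the older prefixed `webkitAudioContext`) is the usual feature test, while Flash plugin detection is assumed to happen elsewhere on the page.

```typescript
type AudioBackend = "webaudio" | "flash" | "none";

interface AudioEnvironment {
  AudioContext?: unknown;       // e.g. window.AudioContext in a browser
  webkitAudioContext?: unknown; // prefixed constructor in older WebKit
  hasFlashPlugin: boolean;      // assumption: detected separately
}

function chooseAudioBackend(env: AudioEnvironment): AudioBackend {
  // Prefer native Web Audio whenever a constructor is exposed.
  if (typeof env.AudioContext === "function" || typeof env.webkitAudioContext === "function") {
    return "webaudio";
  }
  // Otherwise fall back to the Flash shim, if the plugin is there at all.
  return env.hasFlashPlugin ? "flash" : "none";
}
```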
The ActionScript of the Flash shim is pretty straightforward, and it compiles into a nice, approx 1kb .swf file. Luckily, you can rebuild the .swf with the open-source Apache Flex SDK, so it doesn’t even rely on the proprietary Flash Builder or anything. We could do with some further cleanup (for instance I don’t think we’re disposing of the Flash plugin when shutting down audio, but that’s easy to fix in a bit…) but the basics are in place. And of course getting proper audio/video sync will be complicated by the shim layer — the current code drives the clock based on the video and has choppy audio anyway, so there’s some way to go before we reach that problem. 😉
It even works on Windows RT, the limited ARM version of Windows 8 — though the video decoding is much too slow on a first-gen Surface tablet’s Tegra 3 CPU, audio-only files play nicely.
Well, I did some more hacking on it this weekend:
- Color output? Check.
- Streaming input? Check. (But no seeking yet, and buffering is aggressive on some browsers.)
- Sound? Check. (But it’s not synced, choppy, and usually the wrong sample rate. And no IE or iOS support yet.)
- Pretty interface to select any file from Wikimedia Commons’ Media of the Day archive? Check.
- Laid some groundwork for separating the codec to a background Worker thread (not yet implemented)
- See the readme for more info.
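One likely cause of the "wrong sample rate" problem is feeding decoded samples to an output running at a different rate (say, a 44.1 kHz Vorbis stream into a 48 kHz output). A crude fix is linear-interpolation resampling, sketched here in TypeScript for one channel; `resampleLinear` is a hypothetical helper, and a real player might want a better filter.

```typescript
// Resample one channel of float samples from fromRate to toRate using
// linear interpolation between neighboring input samples.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

Running it per channel and per decoded packet will introduce small seams at packet boundaries; carrying the fractional position over between calls would be the next refinement.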
Feel free to try building or hacking the source for fun.
Larger files run… veerrryyyy sslllooww on my test iPod Touches, but this certainly seems fast enough on desktop to one day replace our old Java fallback for Safari and newer IE…
How it works
Only a couple of tiny tweaks to the libraries are needed to make them build; I started with build scripts for just the audio codecs from this project, added in libtheora, and started adapting parts of one of the Theora data dump examples.
Finish up the YCbCr->RGB conversion, add audio decoding & output, and some kind of sync and seeking, and … this could replace the old Java Cortado app we use as a fallback player on Wikipedia for browsers that don’t run WebM or Theora.
Web Workers could be used to push decoding to a background thread, depending on whether overhead is problematic.
Crazy idea: provide an HTML5 <video>-style DOM interface, integrate into TimedMediaHandler as a drop-in replacement
- Audio sync may be difficult to achieve.
- Audio output APIs — need to confirm what’s consistently available.
- Performance is surprisingly good on a desktop; I have no doubt this will be sufficient for playback in Safari and IE if audio & sync can be managed.
- Performance is not so good on my test iPod Touches; it might be fun to tune and optimize but I would expect to get much better results from a native app on iOS.
- There doesn’t seem to be a good universal way to do progressive data reads from an XMLHttpRequest; it may be necessary to buffer portions of the file by running multiple requests for subranges of the file, which is NOT pretty.
- IE 10 doesn’t support ArrayBuffer.slice(). Currently this prevents the demo from running, but it’s not actually needed.
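If subrange buffering does turn out to be necessary, the bookkeeping is simple enough. This TypeScript sketch (hypothetical helpers) builds HTTP Range header values, whose end offsets are inclusive, and includes a copy-based stand-in for ArrayBuffer.slice() for IE 10. It assumes the server honors Range requests (answering 206 Partial Content) and that offsets passed to the slice helper are non-negative and in range.

```typescript
// One "bytes=start-end" value per chunk; the end offset is inclusive.
function rangeHeaders(totalSize: number, chunkSize: number): string[] {
  const headers: string[] = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalSize) - 1;
    headers.push(`bytes=${start}-${end}`);
  }
  return headers;
}

// Copy-based replacement for buffer.slice(begin, end), for engines lacking it.
function sliceBuffer(buffer: ArrayBuffer, begin: number, end: number = buffer.byteLength): ArrayBuffer {
  const len = Math.max(0, end - begin);
  const copy = new Uint8Array(len);
  copy.set(new Uint8Array(buffer, begin, len));
  return copy.buffer as ArrayBuffer;
}
```

Each header value would go to `xhr.setRequestHeader("Range", value)` on its own request, with the chunks appended to the decoder's input as they arrive.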
The Ada Initiative is raising money for their programs supporting women in open source, open culture, and geekdom in general. They’ve reached about 70% of their fundraising goal… can you help them reach $100k by Saturday?
Like it or not, there are widespread issues with poor behavior, outright harassment, and creepy misogynistic tendencies in our beloved nerd communities:
- free/open source software
- science-fiction fandom
Having grown up, worked, or dabbled in all of those communities, I’m often saddened by the negative experiences that many women have had and continue to have. The issues that the Ada Initiative deals with affect many men and other social subgroups of all sorts, too — LGBTQ folks, people with depression or other mental illness, etc. — which matters to me because so many of my coworkers and friends fall into some of those categories.
I’d like to see us move away from glorifying douchebaggery in all forms, and towards respectful participation for all! Care to help?
Dear Lazyweb: anyone know a device that can decimate an HDMI 1080p60 signal to 1080p30 or 1080i60 – OR record HDMI directly at 1080p60 for a reasonable price?
I’ve got a Thunderbolt-connected HDMI recorder I use for demos and screencasts of Wikipedia on smartphones and tablets… but some of the newer devices output at 1080p60, which my low-end capture box doesn’t grok.
The devices I’m seeing that can record 1080p60 are ~$1k and/or are PCIe cards, which aren’t useful in a laptop-dominated workplace.
Update 2013-08-30: I’m getting the impression that there’s actually a combination of two problems: things wanting to default to 1080p60, and HDCP encryption — which apparently is turned on by default for the HDMI outputs on the Nexus 4 and 10. I found a widget that allegedly strips the HDCP, but I’m still not sure whether it’s trying to pump 1080p through there, in which case it’s still not working. I’m picking up an EDID sniffer/emulator which should be able to detect the EDID from the capture box (which should say ‘no 1080p60’) and put that in front of the HDCP stripper… we’ll see if that works. Sigh.