AudioFeeder updates for ogv.js

I’ve taken a break from the blog for too long! Time for an update on current work. We’re doing a final push on the video.js-based frontend media player for MediaWiki’s TimedMediaHandler, with some new user interface bits, better mobile support, and groundwork for smoother streaming in the future.

Among other things, I’m doing some cleanup on the AudioFeeder component in the ogv.js codec shim, which is still used in Safari on iOS devices and older Macs.

AudioFeeder abstracts a digital sound output channel with an append-only buffer: playback can be stopped and started, the volume changed, and the current playback position queried.
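For context, here’s roughly the shape of that interface; the method names below are illustrative rather than the exact ogv.js API:

    // Rough sketch of the AudioFeeder surface (illustrative names,
    // not necessarily the exact ogv.js API).
    class AudioFeeder {
      init(numChannels, sampleRate) { /* set up the output backend */ }
      bufferData(channelData) { /* append an array of Float32Arrays, one per channel */ }
      start() { /* begin or resume playback */ }
      stop() { /* halt output, keeping queued data */ }
      set volume(val) { /* 0.0 (muted) through 1.0 (full) */ }
      getPlaybackState() { /* e.g. { playbackPosition, samplesQueued } */ }
    }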

When I started on this work in 2014 or so, Internet Explorer 11 was still supported, so I needed a Flash backend for IE and a Web Audio backend for Safari. At the time, the only way to create the endless virtual buffer in Web Audio was a ScriptProcessorNode, which ran its data-manipulation callback on the main thread. That required a fairly large buffer for each callback to ensure the browser’s audio thread had data available even if the main thread was hung up drawing a video frame or something.
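For reference, the old path looked roughly like this (a simplified sketch, not the actual ogv.js code):

    // Legacy path: the ScriptProcessorNode callback fires on the main
    // thread, so a big buffer (8192 samples) rides out main-thread stalls.
    const ctx = new AudioContext();
    const node = ctx.createScriptProcessor(8192, 0, 2); // bufferSize, inputs, outputs
    node.onaudioprocess = (event) => {
      for (let c = 0; c < event.outputBuffer.numberOfChannels; c++) {
        const samples = event.outputBuffer.getChannelData(c);
        // Copy the next 8192 queued samples for this channel into
        // `samples`, zero-filling on underrun.
      }
    };
    node.connect(ctx.destination);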

Fast forward to 2022: IE 11 and Flash are EOL and I’ve been able to drop them from our support matrix. Safari and other browsers still support ScriptProcessorNode, but it’s been officially deprecated for a while in favor of AudioWorklets.

I’d been meaning to look into an AudioWorklet backend but hadn’t had a pressing need; however, I’m now seeing some performance problems with the current code in Safari Technology Preview, especially on my slower 2015 MacBook Pro, which doesn’t grok VP9 in hardware and so needs the shim. :) Figured it’s worth taking a day or two to see if I can avoid a perf regression on older Macs when the next Safari update comes out.

So first: what’s a worklet? It’s an interface being adopted by a few web specs (I think some CSS animation and paint features use them too) as a structured way of loading small scripts into a dedicated worker thread (the worklet) to do performance-critical work off the main thread (audio, layout, animation).

An AudioWorkletNode hooks into the Web Audio graph, giving you something similar to a ScriptProcessorNode, but with the processing callback running in the worklet on an AudioWorkletProcessor subclass. The processor object has audio-specific conveniences like the media time at the start of the buffer, and is handed input/output channels to read and write.
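A minimal processor for this kind of buffer-queue playback might look like the following sketch; the class name, message shape, and registration name are my own, not the actual ogv.js code:

    // processor.js: runs in the AudioWorkletGlobalScope on the audio thread.
    class FeederProcessor extends AudioWorkletProcessor {
      constructor(options) {
        super();
        this.queue = []; // chunks of Float32Array channel data from the main thread
        this.samplesOutput = 0;
        this.port.onmessage = (event) => {
          this.queue.push(event.data.samples);
        };
      }

      process(inputs, outputs, parameters) {
        const output = outputs[0]; // zero inputs, one output
        // Copy the next ~128 samples per channel from this.queue into
        // output[channel], zero-filling if the queue runs dry.
        return true; // keep the processor alive
      }
    }
    registerProcessor('feeder-processor', FeederProcessor);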

For an ever-growing output we use 0 inputs and 1 output; ideally I can support multichannel audio as well, which I never bothered to do in the old code (for simplicity it downmixed everything to stereo). Because worklet processors run on a dedicated thread, the data comes in small chunks (by default something like 128 samples), whereas I’d been using 8192-sample buffers on the main thread! That allows for much lower latency, if you prefer it over a comfy buffer.
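On the main thread, wiring that up might look something like this (same assumed names as the sketch above):

    // main.js: load the worklet module, then create a node with zero
    // inputs and one stereo output (multichannel would raise the count).
    const ctx = new AudioContext();
    await ctx.audioWorklet.addModule('processor.js');
    const node = new AudioWorkletNode(ctx, 'feeder-processor', {
      numberOfInputs: 0,
      numberOfOutputs: 1,
      outputChannelCount: [2],
    });
    node.connect(ctx.destination);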

Communicating with worker threads in JavaScript has traditionally been painful: unless you opt into the cross-origin isolation requirements for shared memory buffers, you have to send asynchronous messages. However, those messages can include structured data, so it’s easy to send Float32Arrays full of samples around.

The AudioWorkletNode on the main thread gets its own MessagePort, which connects to a matching MessagePort on the AudioWorkletProcessor in the audio thread, and you can post JS objects back and forth, serialized with the standard “structured clone” algorithm (which copies plain data but strips out functions and local state).
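Shipping samples down to the audio thread can then be as simple as this sketch; the transfer list at the end is optional, but moves the underlying buffers instead of copying them:

    // Main thread: send one chunk of per-channel sample data to the worklet.
    const samples = [new Float32Array(4096), new Float32Array(4096)];
    node.port.postMessage(
      { type: 'buffer', samples },
      samples.map((a) => a.buffer) // transfer instead of structured-clone copy
    );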

I haven’t quite got it running yet, but I think I’m close. ;) On node creation, an initial set of queued buffers is sent in with the setup parameters. When audio starts playing, once the first callback has copied out its data, it posts some state info back to the main thread, with the audio chunk’s timestamp and the number of samples output so far.
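Concretely, the initial buffers can ride along in the node’s processorOptions (which gets structured-cloned into the processor’s constructor), and the report back might look like this sketch from inside process(); the field names are mine:

    // Audio thread, inside process(), after copying a chunk to the output
    // (this.samplesOutput starts at 0 in the constructor):
    this.samplesOutput += output[0].length; // typically 128 frames per callback
    this.port.postMessage({
      type: 'state',
      timestamp: currentTime, // the context time for this chunk
      samplesOutput: this.samplesOutput,
    });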

The main thread’s AudioFeeder abstraction can then pair those up to report which timestamp within the data feed is playing right now, compensating for any surprises (like the audio thread missing a callback, or a buffer underrun on the main thread causing a delay).
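On the main side, a position estimate falls out of pairing the last report with the context clock; this sketch is just the happy path, with the surprise cases left out:

    // Main thread: estimate which timestamp in the feed is audible now.
    let lastState = { timestamp: 0, samplesOutput: 0 };
    node.port.onmessage = (event) => {
      if (event.data.type === 'state') {
        lastState = event.data;
      }
    };

    function playbackPosition() {
      // Samples written out so far, plus time elapsed since that report.
      return lastState.samplesOutput / ctx.sampleRate
           + (ctx.currentTime - lastState.timestamp);
    }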

When stopping, instead of just removing the node from the audio graph, I’ve got the main thread sending down a message notifying the worklet code that it can safely stop, and asking for any remaining data back. This matters if we’ve maintained a healthy buffer of decoded audio: if we resume playback from the same position, we can pass those buffers straight into the new worklet node.
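The shutdown handshake might look something like this sketch (the message names are mine, and in practice this shares the onmessage handler shown earlier):

    // Main thread: ask the worklet to stop and hand back unplayed samples.
    node.port.postMessage({ type: 'drain' });
    node.port.onmessage = (event) => {
      if (event.data.type === 'drained') {
        const remaining = event.data.buffers; // re-queue these on the next node
        node.disconnect();
      }
    };

    // Worklet side, in the message handler:
    // case 'drain':
    //   this.port.postMessage({ type: 'drained', buffers: this.queue });
    //   this.queue = [];
    //   this.running = false; // process() returns false next call and shuts down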

I kinda like the interface now that I’m digging into it. Should work… I hope to sort it out either tonight or tomorrow and get ogv.js updated in-tree again in TMH.

Canvas, Web Audio, MediaStream oh my!

I’ve often wished that for ogv.js I could send my raw video and audio output directly to a “real” <video> element for rendering instead of drawing on a <canvas> and playing sound separately to a Web Audio context.

In particular, things I want:

  • Not having to convert YUV to RGB myself
  • Not having to replicate the behavior of a <video> element’s sizing!
  • The warm fuzzy feeling of semantic correctness
  • Making use of built-in browser features like media control buttons for an active video element
  • Being able to use browser integrations like sending output to ChromeCast or AirPlay
  • Disabling screen dimming/lock during playback

This last is particularly important for videos of non-trivial length, especially on phones, which often have very aggressive screen-dimming timeouts.

Well, in some browsers (Chrome and Firefox) you can now do at least some of this. :)

I’ve done a quick experiment using the <canvas> element’s captureStream() method to capture the video output, plus a MediaStreamAudioDestinationNode to capture audio from the Web Audio graph, combining the two separate streams into a single MediaStream, and then piping that into a <video> for playback. I still have to do the YUV to RGB conversion myself, but the final output goes into an honest-to-gosh <video> element.
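The plumbing is pleasantly small. Here’s roughly what the experiment looks like; the element lookups and the player’s existing Web Audio output node are assumptions on my part:

    // Capture video frames from the canvas we're already drawing into.
    const canvas = document.querySelector('canvas');
    const videoStream = canvas.captureStream(30); // frame rate hint

    // Capture audio by routing the player's output into a stream destination.
    const audioCtx = new AudioContext();
    const capture = audioCtx.createMediaStreamDestination();
    playerOutputNode.connect(capture); // assumed: the player's output gain node

    // Merge the two into one MediaStream and feed it to a real <video>.
    const combined = new MediaStream([
      ...videoStream.getVideoTracks(),
      ...capture.stream.getAudioTracks(),
    ]);
    const video = document.querySelector('video');
    video.srcObject = combined;
    video.play();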

To my great pleasure, it works! Though in Firefox I see some flickering that may be a bug; I’ll have to track it down.

Some issues:

  • Flickering on Firefox. Might just be my GPU, might be something else.
  • The <video> has no insight into things like duration or seeking, so you can’t rely on the native controls or the <video> API alone to behave like a <video> with a file source.
  • Pretty sure there are inefficiencies. I haven’t tested performance or checked whether a double YUV->RGB->YUV->RGB conversion is going on.

Of course, Chrome and Firefox are the browsers I don’t need ogv.js for in Wikipedia’s current usage, since they already play WebM and Ogg natively. But if Safari and Edge adopt the necessary interfaces and WebRTC-related infrastructure for MediaStreams, it might become possible to use Safari’s full-screen view, AirPlay mirroring, and picture-in-picture with ogv.js-driven playback of Ogg, WebM, and potentially other custom, legacy, or niche formats.

Unfortunately I can’t test whether casting to a ChromeCast works in Chrome as I’m traveling and don’t have one handy just now. Hoping to find out soon! :D

HDTV and the video look

I spent some time last night playing with my parents’ shiny new HDTV, which puts my 2005-vintage 26″ set to shame.

Pretty nice set; 40-something inch, 1080p, 120 Hz whatchamahooie, and you can plug in a USB stick full of JPEGs and force your family to watch your vacation photos. Nice!

It seems to be all the rage on new sets to have motion interpolation, which can take 24-frame-sourced content (feature films and most US drama and sitcom TV shows) and smooth out the frame-to-frame motion, making it look more like 60-field video. Lots of higher-end sets advertise 120 Hz or even 240 Hz, which honestly seems excessive to me; the human eye can’t distinguish much more than 60 frames per second. :)

I’m a bit torn; on the one hand, the faster frame rate makes motion look much more vivid and realistic from any objective point of view. On the other hand, audiences have been trained over the last few decades to associate the video look with “cheesier” programming (soaps, reality shows, etc.), while “serious” programs are shot on film at 24fps, making them feel more like a big-budget feature film… even to the point that lots of money was spent developing HD video cameras that could shoot at the slower, less realistic 24fps instead of HD video’s native 60!

We stumbled onto Harold & Kumar Escape from Guantanamo Bay, of all things, on HBO, and ran it for a while just to get a feel for the set. At first it drove me nuts seeing a movie I’d already seen on film looking distinctly like HD video, but after half an hour I got quite used to it and rather grew to like it. Of course, as a former cinema-television student I’m extra-sensitized to this stuff; my wife immediately took to the more vivid display and commented on how much better it looked than when we’d seen it in the theater!

Looks like mass audiences are happy to embrace high-motion video… I wonder if the long-standing holdover of the “film look” over the last decade was driven more by oversensitized film geeks in the industry than by any actual audience comparison…

Let’s take a lesson from this for our software development as well: those of us who’ve been nose-deep in web sites and software UI for years aren’t necessarily the most qualified to tell what our actual users will be most comfortable with.