ogv.js 1.4.0 released

ogv.js 1.4.0 is now released, with a .zip build or via npm. Will try to push it to Wikimedia next week or so.

Live demo available as always.

New A/V sync

Main improvement is much smoother performance on slower machines, mainly from changing the A/V sync method to prioritize audio smoothness, based on recommendations I’d received from video engineers at conferences that choppy audio is noticed by users much more strongly than choppy or out of sync video.

Previously, when ogv.js playback detected that video was getting behind audio, it would halt audio until the video caught up. This played all audio, and showed all frames, but could be very choppy if performance wasn’t good (such as in Internet Explorer 11 on an old PC!)

The new sync method instead keeps audio rock-solid, and allows video to get behind a little… if the video catches back up within a few frames, chances are the user won’t even notice. If it stays behind, we look ahead for the next keyframe… when the audio reaches that point, any remaining late frames are dropped. Suddenly we find ourselves back in sync, usually with not a single discontinuity in the audio track.


The HTMLMediaElement API supports a fastSeek() method which is supposed to seek to the nearest keyframe before the request time, thus getting back to playback faster than a precise seek via setting the currentTime property.

Previously this was stubbed out with a slow precise seek; now it is actually fast. This enables a much better “scrubbing” experience given a suitable control widget, as can be seen in the demo by grabbing the progress thumb and moving it around the bar.

VP9 playback

WebM videos using the newer, more advanced VP9 codec can use a lot less bandwidth than VP8 or Theora videos, making it attractive for streaming uses. A VP9 decoder is now included for WebM, initially supporting profile 0 only (other profiles may or may not explode) — that means 8-bit, 4:2:0 subsampling.

Other subsampling formats will be supported in future, can probably eventually figure out something to do with 10-bit, but don’t expect those to perform well. 🙂

The VP9 decoder is moderately slower than the VP8 decoder for equivalent files.

Note that WebM is still slightly experimental; the next version of ogv.js will make further improvements and enable it by default.


Firefox and Chrome have recently shipped support for code modules in the WebAssembly format, which provides a more efficient binary encoding for cross-compiled code than JavaScript. Experimental wasm versions are now included, but not yet used by default.

Multithreaded video decoding

Safari 10.1 has shipped support for the SharedArrayBuffer and Atomics APIs which allows for fully multithreaded code to be produced from the emscripten cross-compiler.

Experimental multithreaded versions of the VP8 and VP9 decoders are included, which can use up to 4 CPU cores to significantly increase speed on suitably encoded files (using the -slices option in ffmpeg for VP8, or -tile_columns for VP9). This works reasonably well in Safari and Chrome on Mac or Windows desktops; there are performance problems in Firefox due to deoptimization of the multithreaded code.

This actually works in iOS 10.3 as well — however Safari on iOS seems to aggressively limit how much code can be compiled in a single web page, and the multithreading means there’s more code and it’s copied across multiple threads, leading to often much worse behavior as the code can end up running without optimization.

Future versions of WebAssembly should bring multithreading there as well, and likely with better performance characteristics regarding code compilation.

Note that existing WebM transcodes on Wikimedia Commons do not include the suitable options for multithreading, but this will be enabled on future builds.

Misc fixes

Various bits. Check out the readme and stuff. 🙂

What’s next for ogv.js?

Plans for future include:

  • replace the emscripten’d nestegg demuxer with Brian Parra’s jswebm
  • fix the scaling of non-exact display dimensions on Windows w/ WebGL
  • enable WebM by default
  • use wasm by default when available
  • clean up internal interfaces to…
  • …create official plugin API for demuxers & decoders
  • split the demo harness & control bar to separate packages
  • split the decoder modules out to separate packages
  • Media Source Extensions-alike API for DASH support…

Those’ll take some time to get all done and I’ve got plenty else on my plate, so it’ll probably come in several smaller versions over the next months. 🙂

I really want to get a plugin interface so people who want/need them and worry less about the licensing than me can make plugins for other codecs! And to make it easier to test Brian Parra’s jsvpx hand-ported VP8 decoder.

An MSE API will be the final ‘holy grail’ piece of the puzzle toward moving Wikimedia Commons’ video playback to adaptive streaming using WebM VP8  and/or VP9, with full native support in most browsers but still working with ogv.js in Safari, IE, and Edge.

Testing in-browser video transcoding with MediaRecorder

A few months ago I made a quick test transcoding video from MP4 (or whatever else the browser can play) into WebM using the in-browser MediaRecorder API.

I’ve updated it to work in Chrome, using a <canvas> element as an intermediary recording surface as captureStream() isn’t available on <video> elements yet there.

Live demo: https://brionv.com/misc/browser-transcode-test/capture.html

There are a couple advantages of re-encoding a file this way versus trying to do all the encoding in JavaScript, but also some disadvantages…


  • actual encoding should use much less CPU than JavaScript cross-compile
  • less code to maintain!
  • don’t have to jump through hoops to get at raw video or audio data


  • MediaRecorder is realtime-oriented:
    • will never decode or encode faster than realtime
    • if encoding is slower than realtime, lots of frames are dropped
    • on my MacBook Pro, realtime encoding tops out around 720p30, but eg phone camera videos will often be 1080p30 these days.
  • browser must actually support WebM encoding or it won’t work (eg, won’t work in Edge unless they add it in future, and no support at all in Safari)
  • Firefox and Chrome both seem to be missing Vorbis audio recording needed for base-level WebM (but do let you mix Opus with VP8, which works…)

So to get frame-rate-accurate transcoding, and to support higher resolutions, it may be necessary to jump through further hoops and try JS encoding.

I know this can be done — there are some projects compiling the entire ffmpeg  package in emscripten and wrapping it in a converter tool — but we’d have to avoid shipping an H.264 or AAC decoder for patent reasons.

So we’d have to draw the source <video> to a <canvas>, pull the RGB bits out, convert to YUV, and run through lower-level encoding and muxing… oh did I forget to mention audio? Audio data can be pulled via Web Audio, but only in realtime.

So it may be necessary to do separate audio (realtime) and video (non-realtime) capture/encode passes, then combine into a muxed stream.

Canvas, Web Audio, MediaStream oh my!

I’ve often wished that for ogv.js I could send my raw video and audio output directly to a “real” <video> element for rendering instead of drawing on a <canvas> and playing sound separately to a Web Audio context.

In particular, things I want:

  • Not having to convert YUV to RGB myself
  • Not having to replicate the behavior of a <video> element’s sizing!
  • The warm fuzzy feeling of semantic correctness
  • Making use of browser extensions like control buttons for an active video element
  • Being able to use browser extensions like sending output to ChromeCast or AirPlay
  • Disabling screen dimming/lock during playback

This last is especially important for videos of non-trivial length, especially on phones which often have very aggressive screen dimming timeouts.

Well, in some browsers (Chrome and Firefox) now you can do at least some of this. 🙂

I’ve done a quick experiment using the <canvas> element’s captureStream() method to capture the video output — plus a capture node on the Web Audio graph — combining the two separate streams into a single MediaStream, and then piping that into a <video> for playback. Still have to do YUV to RGB conversion myself, but final output goes into an honest-to-gosh <video> element.

To my great pleasure it works! Though in Firefox I have some flickering that may be a bug, I’ll have to track it down.

Some issues:

  • Flickering on Firefox. Might just be my GPU, might be something else.
  • The <video> doesn’t have insight to things like duration, seeking, etc, so can’t rely on native controls or API of the <video> alone acting like a native <video> with a file source.
  • Pretty sure there are inefficiencies. Have not tested performance or checked if there’s double YUV->RGB->YUV->RGB going on.

Of course, Chrome and Firefox are the browsers I don’t need ogv.js for for Wikipedia’s current usage, since they play WebM and Ogg natively already. But if Safari and Edge adopt the necessary interfaces and WebRTC-related infrastructure for MediaStreams, it might become possible to use Safari’s full screen view, AirPlay mirroring, and picture-in-picture with ogv.js-driven playback of Ogg, WebM, and potentially other custom or legacy or niche formats.

Unfortunately I can’t test whether casting to a ChromeCast works in Chrome as I’m traveling and don’t have one handy just now. Hoping to find out soon! 😀

JavaScript async/await fiddling

I’ve been fiddling with using ECMAScript 2015 (“ES6”) in rewriting some internals for ogv.js, both in order to make use of the Promise pattern for asynchronous code (to reduce “callback hell”) and to get cleaner-looking code with the newer class definitions, arrow functions, etc.

To do that, I’ll need to use babel to convert the code to the older ES5 version to run in older browsers like Internet Explorer and old Safari releases… so why not go one step farther and use new language features like asynchronous functions that are pretty solidly specced but still being implemented natively?

Not yet 100% sure; I like the slightly cleaner code I can get, but we’ll see how it functions once translated…

Here’s an example of an in-progress function from my buffering HTTP streaming abstraction, currently being rewritten to use Promises and support a more flexible internal API that’ll be friendlier to the demuxers and seek operations.

I have three versions of the function: one using provisional ES2017 async/await, one using ES2015 Promises directly, and one written in ES5 assuming a polyfill of ES2015’s Promise class. See the full files or the highlights of ES2017 vs ES2015:

The first big difference is that we don’t have to start with the “new Promise((resolve,reject) => {…})” wrapper. Declaring the function as async is enough.

Then we do some synchronous setup which is the same:

Now things get different, as we perform one or two asynchronous sub-operations:

In my opinion the async/await code is cleaner:

First it doesn’t have as much extra “line noise” from parentheses and arrows.

Second, I can use a try/finally block to do the final state change only once instead of on both .then() and .catch(). Many promise libraries will provide an .always() or something but it’s not standard.

Third, I don’t have to mentally think about what the “intermittent return” means in the .then() handler after the triggerDownload call:

Here, returning a promise means that that function gets executed before moving on to the next .then() handler and resolving the outer promise, whereas not returning anything means immediate resolution of the outer promise. It ain’t clear to me without thinking about it every time I see it…

Whereas the async/await version:

makes it clear with the “await” keyword what’s going on.

Updated: I managed to get babel up and running; here’s a github gist with expanded versions after translation to ES5. The ES5 original is unchanged; the ES2015 promise version is very slightly more verbose, and the ES2017 version becomes a state machine monstrosity. 😉 Not sure if this is ideal, but it should preserve the semantics desired.

Scaling video playback on slow and fast CPUs in ogv.js

Video playback has different performance challenges at different scales, and mobile devices are a great place to see that in action. Nowhere is this more evident than in the iPhone/iPad lineup, where the same iOS 9.3 runs across several years worth of models with a huge variance in CPU speeds…

In ogv.js 1.1.2 I’ve got the threading using up to 3 threads at maximum utilization (iOS devices so far have only 2 cores): main thread, video decode thread, and audio decode thread. Handling of the decoded frames or audio packets is serialized through the main thread, where the player logic drives the demuxer, audio output, and frame blitting.

On the latest iPad Pro 9.7″, advertising “desktop-class performance”, I can play back the Blender sci-fi short Tears of Steel comfortably at 1080p24 in Ogg Theora:

The performance graph shows frames consistently on time (blue line is near the red target line) and a fair amount of headroom on the video decode thread (cyan) with a tiny amount of time spent on the audio thread (green) and main thread (black).

At this and higher resolutions, everything is dominated by video decode time — if we can keep up with it we’re golden, but if we get behind everything would ssllooww ddoownn badly.

On an iPad Air, two models behind, we get similar performance on the 720p24 version, at about half the pixels:

We can see the blue bars jumping up once a second, indicating sensitivity to the timing report and graph being updated once a second on the main thread, but overall still good. Audio in green is slightly higher but still ignorable.

On a much older iPad 3, another two models behind, we see a very different graph as we play back a mere 240p24 quarter-SD resolution file:

The iPad 3 has an older generation, 32-bit processor, and is in general pretty sluggish. Even at a low resolution, we have less headroom for the cyan bars of the video decode thread. Blue bars dipping below the red target line show we’re slipping on A/V sync sometimes. The green bars are much higher, indicating the audio decode thread is churning a lot harder to keep our buffers filled. Last but not least the gray bars at the bottom indicate more time spent in demuxing, drawing, etc on the main thread.

On this much slower processor, pushing audio decoding to another core makes a significant impact, saving an average of several milliseconds per frame by letting it overlap with video decoding.

The gray spikes from the main thread are from the demuxer, and after investigation turn out to be inflated by per-packet overhead on the tiny Vorbis audio packets… Such as adding timestamps to many of the packets. Ogg packs multiple small packets together into a single “page”, with only the final packet at the end of the page actually carrying a timestamp. Currently I’m using liboggz to encapsulate the demuxing, using its option to automatically calculate the missing timestamp deltas from header data in the packets… But this means every few frames the demuxer suddenly releases a burst of tiny packets with a 15-50ms delay on the main thread as it walks through them. On the slow end this can push a nearly late frame into late territory.

I may have further optimizations to make in keeping the main thread clear on slower CPUs, such as more efficient handling of download progress events, but overlapping the video and audio decode threads helps a lot.

On other machines like slow Windows boxes with blacklisted graphics drivers, we also benefit from firing off the next video decode before drawing the current frame — if WebGL is unexpectedly slow, or we fall back to CPU drawing, it may take a significant portion of our frame budget just to paint. Sending data down to the decode thread first means it’s more likely that the drawing won’t actually slow us down as much. This works wonders on a slow ARM-based Windows RT 8.1 Surface tablet. 🙂


Thoughts on Ogg adaptive streaming

So I’d like to use adaptive streaming for video playback on Wikipedia and Wikimedia Commons, automatically selecting the appropriate source format and resolution at runtime based on bandwidth and CPU availability.

For Safari, Edge, and IE users, that means figuring out how to rig a Media Source Extensions-like interface into ogv.js to let the streaming handler inject its buffered data into the demuxer and codecs instead of letting the player handle its own buffering.

It also means I have to figure out how to do adaptive stream switching for Ogg streams and Theora video, since WebM VP8 still decodes too slowly in ogv.js to rely on for deployment…

Theory vs Theora

At its base, adaptive streaming relies on the ability to feed the decoders with data from another stream without them freaking out and demanding a pause or reset. We can either read packets from a subset of a monolithic file for each source, or from a bunch of tiny segmented files.

In order to do this, generally you need to switch on video keyframe boundaries: each keyframe represents a point in the data stream where the video decoder can reset its state.

For WebM with VP8 and VP9 codecs, the decoders are pretty good at this. As long as you came in on a keyframe boundary you can just start feeding it packets at a new resolution and it’ll happily output frames at the new resolution.

For Ogg Theora, there are a few major impediments.

Ogg stream serial numbers

At the Ogg stream level: each Ogg logical bitstream gets a random serial number; those serial numbers will not match across separate encodings at different resolutions.

Ogg explicitly allows for “chaining” of complete bitstreams, where one ends and you just tack another on, but we’re not quite doing that here… We want to be able to switch partway through with minimal interruption.

For Vorbis audio, this might require some work if pulling audio+video together from combined .ogv files, but it gets simpler if there’s one .oga audio stream and separate video-only .ogv streams — we’d essentially have separate demuxer contexts for audio and video, and would not need to meddle with the audio.

For the Theora video stream this is probably ok too, since when we reach a switch boundary we also need to feed the decoder with…

Header packets

Every Theora video stream sets up start codes at the beginning of the stream in its three header packets. This means that encodings of the same video at different resolutions will have different header setup.

So, when we switch sources we’ll need to reinitialize the Theora decoder with the header packets from the target stream; then it should be safe to feed new packets into it from our arbitrary start position.

This isn’t a super exotic requirement; I’ve seen some provision for ‘start codes’ for MP4 adaptive streaming too.

Keyframe timing

More worrisome is that keyframe timing is not predictable in a Theora stream. This is actually due to the libtheora encoder internals — it allows you to specify a maximum keyframe interval, but it may decide at any time to insert a keyframe on its own if it thinks it’s more efficient to store a frame that way, at which point the interval starts counting from there instead of the last scheduled keyframe.

Since this heuristic is determined based on actual frame data, the early keyframes will appear in different times and places for renderings at different resolutions… And so will every keyframe following them.

This means you don’t have switch points that are consistent between sources, breaking the whole model!

It looks like a keyframe can be forced by changing the keyframe interval to 1 right before a desire keyframe, then changing it back to the desired value after. This would result in still getting some early keyframes at unpredictable times, but then also getting predictable ones. As long as the switchover points aren’t too often, that’s probably fine — just keep decoding over the extra keyframes, but only switch/segment on the predictable ones.

Streams vs split files

Another note: it’s possible to either store data as one long file per source, or to split it up into small chunk files at each keyframe boundary.

Chunk files are nice because they can be streamed easily without use of the HTTP ‘Range’ header and they’re friendly to cache layers. Long files can be easier to manage on the server, but Wikimedia ops folks have told me that the way large files are stored doesn’t always interact ideally with the caching layer and they’d be much happier with split chunk files!

A downside of chunks is that it’s harder to download a complete copy of a file at a given resolution for offline playback. But, if we split audio and video tracks we’re in a world where that’s hard anyway… Can either just say “download the full resolution source then!” Or provide a remuxer to produce combined files for download on the fly from the chunks… 🙂
The keyframe timing seems the ugliest issue to deal with; may need to patch ffmpeg2theora or ffmpeg to work around it, but at least shouldn’t have to mess with libtheora itself…

WebM seeking coming in ogv.js 1.1.2

Seeking in WebM playback will finally be supported in ogv.js 1.1.2. Yay! Try it out!

I seeked in this WebM file! Yeah really!

This took some fancy footwork, as the demuxer library I’m using (nestegg) only provides a synchronous i/o-using interface for seeking: on the first seek, it needed to be able to first do a seek to the location of the cues in the file (they’re usually at the end in WebM, not at the beginning!), then read in the cue data, and then run a second seek to the start of a cluster.

On examining the innards of the library I found that it’s fairly safe to ‘restart’ the operation directly after the seek attempts, which saved me the trouble of patching the library code; though I may come up with a patch to more cleanly & reliably do this.

For the initial hack, I have my i/o callbacks detect that an attempt was made to seek outside the buffer range, and then when the nestegg library function fails out, the demuxer sees the seek attempt and passes it back to the caller (OGVPlayer) which is able to trigger a seek on the actual, asynchronous i/o layer. Once the new data starts arriving, we call back into the demuxer to read the cues or confirm that we’ve seeked to the right place and can continue decoding. As an additional protection against the library freaking out, I make sure that the entire cues element has been buffered before attempting to read it.

I’ve also fixed a bug that caused WebM playback to occasionally die with an out of memory error. This one was caused by not freeing packet data in the demuxer. *headdesk* At least that bug was easier. 😉

This gets WebM VP8 playback working about as well as Ogg Theora playback, except for the decoding being about 5x slower! Well, I’ve got plans for that. 😀

VP8 parallelization continued

Following up on earlier musings… Found an interesting analysis by someone who did some work in this area a few years ago.


Based on the way data flows from  macroblocks adjacent above or to the left, it is safe to parallelize multiple super blocks in diagonal lines running from top-right towards the bottom-left. This means you start with a batch of one macroblock, then a batch of two, then three…. Up to the maximum diagonal dimension (30 blocks at 480p or 68 at 1080p) then back down again to one at the far bottom right corner. Hmm, not exactly diagonal, actually half diagonal?


(Based on my reading the filter stage doesn’t need to access the above-right block which simplifies things, but I could be wrong… Intraframe prediction looks scarier though and may need the different angle cut. Anyway the half diagonal order should work for the entire set of all operations…).

This breakdown should be suitable both for CPU worker-based threading (breaking the batches into smaller sub-batches per core) and for WebGL shader based GPU work (where each large batch would issue several draw calls, each processing up to the full batch size of macroblocks).

For CPU work there could also be pipelining between the data stages, though if doing full batching that may not be necessary.

Data locality and latency

On modern computers, accessing memory is often the slowest part of a naively written calculation. If the memory you’re working with isn’t in cache, fetching it can be verrrry slow. And if you need to transfer data between the CPU and GPU things get even nastier.

Unpacking the entropy-coded data structures for an entire frames worth of macroblocks then processing them in a different order sounds like it might be expensive. But it shouldn’t be insanely so; most processing will be local to each block.

For GPU usage, data also flows mostly one way from the CPU into textures and arrays uploaded into the GPU, where random texel access should be fast. We shouldn’t need to read any of that data back to the CPU until the end of the loop filter, when we read the YUV buffers back to feed into the next frame’s predictions and output to the player.

What we do need for the GPU is to read the results of each batch’s computations back into the next batch. As long as I can render into a texture and use that texture as a source for the next call I think that all works as I need it.

Loop filter madness

The loop filter stage reduces artifacting at the edges of macroblocks and sub-blocks from motion prediction and DCT fun. Because it’s very precisely specced, and the output feeds back into the next frame’s predictions, it’s important to get this right.

per spec, each macroblock may apply up to four filter passes:

left edge

subblock left edges

top edge

subblock top edges

Along each edge, for each pixel along the edge the boundaries are tested for a threshold and a filter may or may not be applied, variously to one, two, or three pixels deep from the edge. For the macroblock edges, when the filter applies it also modifies the adjacent block data!

It’s all pretty funky, but it looks parallelizable within limits.

Ah, now to find time to research this further while still getting other stuff done. 😉

Peeking into VP8 video decoding performance

The ogv.js distribution includes Ogg Theora video decoding, which we use on Wikipedia to play back our media files in Safari, IE, and Edge, but also has an experimental mode for WebM files with VP8 video.

WebM/VP8 is better supported by many tools, and provides higher quality video at lower bandwidth usage than Ogg Theora… There are two major reasons I haven’t taken WebM out of “experimental” status:

  1. The demuxer library (nestegg) was hacked in quickly and I need to add support for seeking, “not crashing”, etc
  2. Decoding VP8 video is several times slower than decoding Theora at the same resolution.

The first is a simple engineering problem; I can either hack up nestegg or replace it with something that fits the i/o model I’m working with better.

The second is an intrinsic complexity problem: although the two formats are broadly similar technologies, VP8 requires more computation to decode a frame  than Theora does.

Add to this the fact that we have some environmental limitations on common CPU optimizations for parallelizable code in the C libraries:

  • JavaScript “Worker” threads are different from the low-level pthreads threading model used by C code, and the interfaces required to more closely emulate it are not yet available in Safari or Edge.
  • SIMD (“Same Instruction Multiple Data”) processing is not available in Safari, and not yet production-enabled in Edge.

So, if we can’t use SIMD instructions to parallelize tiny bits of processing, and we can’t simply crank up multithreading to use a now-ubiquitous second CPU core, how can we split up the work?

The first step, I think, is to determine some data boundaries where we might hope to be able to export data from the libvpx library before it’s fully done processing.

What’s what

The VP8 decoder & bitstream format is defined in RFC 6386, from which we can roughly divide decoding into four stages:

  1. …decode input stream entropy encoding…
  2. reconstruct base signal by applying motion vectors against a reference frame
  3. reconstruct residual signal by applying inverse DCT
  4. apply loop filter

The entropy encoding isn’t really a separate stage, but happens as other data gets read in and then fed into each other stage. It’s not really parallelizable itself either.

Aiming high

So before I go researching more on trying to optimize some of these steps, what’s actually the biggest, slowest stuff to work on?

I did a quick profiling run playing back some 480p WebM in Chrome (pretty good compiler, and the profiling tools support profiling just one worker thread which isolates the VP8 decoder nicely). Broke down the resulting function self-time list by rough category and found:

Filter: 54.40%
Motion: 22.75%
IDCT: 10.55%
Other: 12.31%

(“Other” included any function I didn’t tag, as well as general overhead.)

Ouch! Filtering looks like a good first application of a separate step.

Possible directions – staying on CPU

If we stick with the CPU, we can create further Worker threads and send blocks of data to be filtered. In many cases, even when processing of one macroblock to another is hard to parallelize because of data dependencies, there is natural parallelism in the color system — the Y (“luma”, or brightness) plane can be processed by one thread while the U and V (“chroma”, or color) planes can be processed independently by another.

However splitting processing between luma and chroma has a limited benefit. There’s twice as much luma data as chroma, so you save at most 33% of the runtime here.

Macroblocks and subblocks

VP8 divides most processing & data into “macroblocks” of 16×16 pixels, composed of 24 “subblocks” of 4×4 pixels (that’s 16 subblocks of luma data, plus 4 each from the two lower-resolution chroma planes).

In many cases we can parallelize at the subblock level, which divides 24 evenly into 2 cores, or even 4 or 8 cores! But the most performance-sensitive devices will usually only have 2 CPU cores available, giving us only a 50% potential speedup.

Going crazy – GPU time?

Graphics processors are much more aggressively multithreaded, with multiple cores and ability to handle many more simultaneous operations.

If we run more operations on the GPU, we might actually have a chance to run 24 subblocks all in parallel!

However there’s a lot more fuzziness involved in getting there:

  • may have to rewrite code from C into GLSL
  • have to figure out how to operate as fragment or vertex shaders
  • have to figure out how to efficiently get data in/out
  • oh and you can’t use WebGL from a Worker thread, so have to bounce data up to the main thread and back down to the codec
  • once all that’s done, what’s the actual performance of the GPU vs the CPU per-operation? no idea 😀

So, there would seem to be great scaling potential on the GPU side, but there are a lot of unknowns.

Worth investigating but we’ll see what’s really feasible…


1000fps no more

Different media file formats encode things like time and frame rates differently… or not at all.
WebM doesn’t list a frame rate; each frame is simply given a position in time. Meanwhile the older Ogg Theora codec defines a consistent, pre-defined frame rate for a stream, but allows frames to be declared as duplicates of the previous frame as an optimization.
At the intersection of these two, some files auto-converted from WebM to Ogg on Wikimedia’s servers end up claiming to encode a “1000 fps” video stream, where nearly all the frames are dupes and there’s actually ~25-30 or at most 60 actual frames per second.
I had to put a hack into my ogv.js player to handle these, because actually trying to draw 1000 frames per second was kind of slow. 😉