Limitations of AVSampleBufferDisplayLayer on iOS

In my last post I described using AVSampleBufferDisplayLayer to output manually-uncompressed YUV video frames in an iOS app, for playing WebM and Ogg files from Wikimedia Commons. After further experimentation I’ve decided to instead stick with using OpenGL ES directly, and here’s why…

640×360 output regularly displays with a weird horizontal offset corruption on iPad Pro 9.7″. Bug filed as rdar://29810344
Can’t get any pixel format with 4:4:4 subsampling to display. Theora and VP9 both support 4:4:4 subsampling, so that made some files unplayable.
Core Video pixel buffers for 4:2:2 and 4:4:4 are packed formats, and it prefers 4:2:0 to be a weird biplanar semi-packed format. This requires conversion from the planar output I already have, which may be cheap with Neon instructions but isn’t free.
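
To give a feel for the conversion that last point refers to, here is a minimal sketch in plain C of turning separate Cb and Cr planes into one interleaved CbCr plane, which is my understanding of the biplanar 4:2:0 layout. The function name and parameters are made up for illustration, and this is a scalar loop, not the Neon version a real player would want.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave separate Cb and Cr planes into a single
 * CbCr plane (Cb first, then Cr, for each chroma sample).
 * Strides are in bytes and may be larger than the row width. */
static void interleave_chroma(const uint8_t *cb, size_t cb_stride,
                              const uint8_t *cr, size_t cr_stride,
                              uint8_t *dst, size_t dst_stride,
                              size_t chroma_width, size_t chroma_height)
{
    for (size_t y = 0; y < chroma_height; y++) {
        const uint8_t *cb_row = cb + y * cb_stride;
        const uint8_t *cr_row = cr + y * cr_stride;
        uint8_t *dst_row = dst + y * dst_stride;
        for (size_t x = 0; x < chroma_width; x++) {
            dst_row[2 * x] = cb_row[x];     /* even bytes: Cb */
            dst_row[2 * x + 1] = cr_row[x]; /* odd bytes: Cr */
        }
    }
}
```

Every chroma byte gets touched individually here, which is the per-frame cost I'd rather not pay when the decoder already hands me planar data.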

Instead, I’m treating each plane as a separate one-channel grayscale image, which works for any chroma subsampling ratios. I’m using some Core Video bits (CVPixelBufferPool and CVOpenGLESTextureCache) to do texture setup instead of manually calling glTexImage2D with a raw source blob, which improves a few things:

Can do CPU->GPU memory copy off main thread easily, without worrying about locking my GL context.
No pixel format conversions, so straight memcpy for each line…
Buffer pools are tied to the video buffer’s format object, and get swapped out automatically when the format changes (new file, or file changes resolution).
Don’t have to manually account for stride != width in the texture setup!
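
The per-plane approach above can be sketched roughly like this in plain C. The helper names and the shift-based description of the subsampling are my own illustration here, not anything from Core Video or the decoders. In the real app the destination pointer and stride would come from the locked pixel buffer (CVPixelBufferGetBaseAddress and CVPixelBufferGetBytesPerRow), and the source from the decoder's frame.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: size of a chroma plane along one axis, given the luma
 * size and a subsampling shift for that axis.
 * 4:2:0 -> shift 1 horizontally and vertically
 * 4:2:2 -> shift 1 horizontally, 0 vertically
 * 4:4:4 -> shift 0 both ways
 * Rounds up so odd luma sizes are still fully covered. */
static size_t chroma_dim(size_t luma_dim, unsigned shift)
{
    return (luma_dim + ((size_t)1 << shift) - 1) >> shift;
}

/* Hypothetical helper: copy one 8-bit plane line by line. Source and
 * destination strides are in bytes and can each be larger than width,
 * so only the visible bytes of every row are copied. */
static void copy_plane(const uint8_t *src, size_t src_stride,
                       uint8_t *dst, size_t dst_stride,
                       size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        memcpy(dst + y * dst_stride, src + y * src_stride, width);
    }
}
```

For a 640×360 4:2:0 frame this means one 640×360 copy for Y and two 320×180 copies for Cb and Cr; for 4:4:4 all three planes are simply 640×360, with no special casing.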

It could be more efficient still if I could pre-allocate CVPixelBuffers with on-GPU memory and hand them to libvpx and libtheora to decode into… but they currently lack sufficient interfaces to accept frame buffers with GPU-allocated sizes.

A few other oddities I noticed:

The clean aperture rectangle setting doesn’t seem to be preserved when creating a CVPixelBuffer via CVPixelBufferPool; I have to re-set it when creating new buffers.
For grayscale buffers, the clean aperture doesn’t seem to be picked up by CVOpenGLESTextureGetCleanTexCoords. Not sure if this is only supposed to work with Y’CbCr buffer types or what… however I already have all these numbers in my format object and just pull from there. :)
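
Working the texture coordinates out by hand from the format numbers is just a few divisions. Here is a rough sketch in plain C; the struct, the function name, and the convention of a clean rectangle measured from the top-left corner of the encoded frame are assumptions for illustration (Core Video's own clean aperture attachment is expressed differently), and any vertical flip needed for GL is left out.

```c
#include <stddef.h>

/* Hypothetical result type: normalized texture coordinates in 0..1. */
typedef struct {
    float left, top, right, bottom;
} TexCoords;

/* Hypothetical helper: map a clean rectangle, given in pixels from the
 * top-left of the encoded frame, to normalized texture coordinates.
 * buf_width and buf_height are the full encoded plane dimensions. */
static TexCoords clean_tex_coords(size_t buf_width, size_t buf_height,
                                  size_t clean_x, size_t clean_y,
                                  size_t clean_width, size_t clean_height)
{
    TexCoords tc;
    tc.left = (float)clean_x / (float)buf_width;
    tc.top = (float)clean_y / (float)buf_height;
    tc.right = (float)(clean_x + clean_width) / (float)buf_width;
    tc.bottom = (float)(clean_y + clean_height) / (float)buf_height;
    return tc;
}
```

Since the result is normalized, the same coordinates should work for the chroma planes too, at least as long as the clean rectangle lines up with the subsampling grid.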

I also fell down a rabbit hole researching color space issues after noticing that some of the video formats support multiple colorspace variants that may imply different RGB conversion matrices… and maybe gamma… and what do R, G, and B mean anyway? :) Deserves another post sometime.