January 2023 – brionv

Flight sim screen recording helper script

I’ve been taking short clips from my Flight Simulator adventures and making small (sub-4 megabyte) .mp4 clips that I can upload to the forum or social media. The recordings are 3440×1440 up to 60fps, in HDR, and are both large and long.

I usually trim out clips using the trim tool in QuickTime Player on a Mac (yay fast networks), then save a shorter file I can work with in ffmpeg.

Ended up writing a script to automate this; it’ll pick a bitrate to fit within 3,500,000 bytes (optionally including audio), scale down and crop or pad to 16:9, and tone-map the HDR down to SDR for common-denominator playback.
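The bitrate-picking step is simple arithmetic, so here's a minimal sketch of the idea in Python. This is not the actual pack-vid script: the function name, the `overhead` allowance for container overhead, and the default values are all assumptions made up for illustration.

```python
def pick_video_bitrate(duration_s, target_bytes=3_500_000, audio_bps=0, overhead=0.02):
    """Return a video bitrate (bits per second) that should fit the size budget.

    overhead is a guessed fraction of the file reserved for container
    overhead; it is an assumption, not something the original script defines.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Total bit budget for the file, minus the reserved overhead.
    total_bits = target_bytes * 8 * (1.0 - overhead)
    # Whatever the audio track does not use is left for video.
    video_bps = total_bits / duration_s - audio_bps
    if video_bps <= 0:
        raise ValueError("audio alone exceeds the size budget")
    return int(video_bps)
```

For a 10-second clip with no audio and no overhead allowance this works out to 2.8 Mbit/s; adding a 96 kbit/s audio track takes that much off the video budget.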

Because there has to be a cursed element, the script is in command-line PHP. :D

https://brionv.com/git/brion/pack-vid/src/branch/main/pack-vid

A few examples:

Tanarg 912 trike

Grand Canyon in Kodiak 100

Chicago in Aermacchi MB-339

Chicago in Monster XCub

Atari Mandelbrot fractal: imul16

Another nostalgia+practice project I’m poking at on the Atari 800 XL is a Mandelbrot fractal generator, which I’m still in early stages of work on. This is mostly an exercise in building a 16-bit integer multiplier for the MOS 6502 processor, which has only addition, subtraction, and bit-shift operations.

The Mandelbrot set consists of those complex-plane points c for which iterating z_(i+1) = z_i^2 + c (where c is the input coordinate and z_0 = 0) never escapes beyond |z_i| > 2. Surprisingly this creates a really cool shape, which has been the subject of fascination for decades:

https://en.wikipedia.org/wiki/Mandelbrot_set#/media/File:Mandel.png

Implementing this requires three multiplications per iteration, to calculate zx^2, zy^2, and zy*zx. The famous PC fractal program Fractint used a 16-bit integer size for low zooms, which is good because anything bigger gets real slow!

For higher zooms Fractint used a 32-bit integer with 29 fractional bits for Mandelbrot and Julia sets, which leaves a range from -4..3.9, plenty big enough. For the smaller 16-bit size that means a 3.13 layout, which should be plenty for a few zooms in on a 160×192 screen. :D Multiplication creates a 32-bit integer with twice the integer bits, so 6.26, with a larger range which covers the addition results for the new zx and zy values.

These then need to be shifted back up and multiplied to get zx^2, zy^2, and zy*zx for the next iteration; the boundary condition is zx^2 + zy^2 >= 4.
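To make the fixed-point bookkeeping concrete, here's a rough Python sketch of one way the iteration could look with 3.13 inputs and 6.26 products. It's a model of the arithmetic only, not the 6502 code: Python integers never overflow, so 16-bit wraparound isn't emulated, and the function names are made up for this example.

```python
FRAC_BITS = 13          # 3.13 layout: 13 fractional bits
ONE = 1 << FRAC_BITS    # 1.0 in 3.13 fixed point

def to_fixed(x):
    """Convert a float to 3.13 fixed point (no range check in this sketch)."""
    return int(round(x * ONE))

def mandelbrot_iters(cx, cy, max_iters=64):
    """Count iterations until zx^2 + zy^2 >= 4 for the 3.13 point (cx, cy)."""
    zx = zy = 0
    four = 4 << (2 * FRAC_BITS)      # 4.0 expressed in 6.26
    for i in range(max_iters):
        # Each product of two 3.13 values is a 6.26 value.
        zx2 = zx * zx
        zy2 = zy * zy
        zxzy = zx * zy
        if zx2 + zy2 >= four:
            return i
        # Do the additions in 6.26, then drop 13 fractional bits to get back to x.13.
        zx = ((zx2 - zy2) >> FRAC_BITS) + cx
        zy = ((2 * zxzy) >> FRAC_BITS) + cy
    return max_iters
```

A point inside the set such as c = -1 runs to `max_iters`, while something far outside like c = 2 + 2i escapes on the first check after z is set to c.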

imul16

Integer multiplication when you have binary shifts and addition only is kinda slow and super annoying. Because you have to do several operations for each bit, every cycle adds up — a single 16-bit add is just 18 cycles while a multiply can run several *hundred* cycles, and varies based on input.

Note that a 650-cycle function means a runtime of about half a millisecond on average (1.79 MHz processor, with about 30% of cycles taken by the display DMA). The whole shebang could easily take 2-3 ms per iteration with three multiplications and a number of additions and shifts.
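The back-of-envelope math behind that estimate, as a tiny Python sketch. The 1.79 MHz clock and the roughly 30% DMA share are the figures quoted above; the function name is just for illustration.

```python
CPU_HZ = 1_790_000      # approximate clock rate quoted above
DMA_FRACTION = 0.30     # rough share of cycles taken by display DMA

def runtime_ms(cycles, cpu_hz=CPU_HZ, dma_fraction=DMA_FRACTION):
    """Estimate wall-clock time in milliseconds for a routine of `cycles` CPU cycles."""
    effective_hz = cpu_hz * (1.0 - dma_fraction)   # cycles per second left for the CPU
    return cycles / effective_hz * 1000.0
```

With these numbers `runtime_ms(650)` comes out a little over 0.5 ms, so three multiplies alone are already around 1.5 ms before any additions and shifts.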

Basically, for each bit in one operand, you either add, or don’t add, the other operand with the corresponding bitshift to the result. If you’re dealing with signed integers you need to either sign-extend the operands to 32 bits or negate the inputs and keep track of whether you need to negate the output; not extending can be faster because you can assume the top 16 bits are 0 and shortcut some operations. ;)
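Here's the shift-and-add idea as a short Python sketch, using the negate-and-track-the-sign approach rather than sign extension. It models the algorithm only; the real routine is 6502 assembly, and this is not a transcription of it.

```python
def imul16(a, b):
    """Signed 16x16 -> 32-bit multiply using only shifts, adds, and negation."""
    if not (-0x8000 <= a <= 0x7FFF and -0x8000 <= b <= 0x7FFF):
        raise ValueError("operands must fit in signed 16 bits")
    # Work on magnitudes and remember whether the result must be negated.
    negate = (a < 0) != (b < 0)
    multiplicand = -a if a < 0 else a
    multiplier = -b if b < 0 else b
    result = 0
    # Walk the multiplier one bit at a time, lowest bit first.
    while multiplier:
        if multiplier & 1:
            result += multiplicand     # add the shifted operand for a 1 bit
        multiplicand <<= 1             # next bit is worth twice as much
        multiplier >>= 1
    return -result if negate else result
```

The loop stops as soon as no set bits remain in the multiplier, which is one reason the running time depends on the input.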

Status and next steps

imul16 seems to be working, though it could maybe use more tuning. I’ve sketched out the Mandelbrot iteration function but haven’t written it yet.

Another trick Fractint used was to avoid going to max iterations within the “Mandelbrot lake” by checking for periodic repetition. Apparently, when working with finite precision, the iteration often converges on a repeating sequence of zx and zy values that yield themselves again after one or a few iterations. These will never escape the boundary condition, so it’s safe to cut off without going to max iterations. I’ll have to write something up with a little buffer of saved values, perhaps only activated by an adjacent max-iters pixel.
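A rough sketch of what that periodicity check might look like, in Python with 3.13 fixed-point values. The buffer size, the function name, and the return convention are all assumptions for illustration; this is not how Fractint does it, and not the eventual 6502 routine.

```python
from collections import deque

FRAC_BITS = 13   # 3.13 fixed point

def iterate_with_period_check(cx, cy, max_iters=256, history=8):
    """Return (iterations_run, periodic) for the 3.13 point (cx, cy)."""
    zx = zy = 0
    four = 4 << (2 * FRAC_BITS)        # 4.0 in 6.26
    saved = deque(maxlen=history)      # small ring buffer of recent (zx, zy) pairs
    for i in range(max_iters):
        zx2, zy2, zxzy = zx * zx, zy * zy, zx * zy
        if zx2 + zy2 >= four:
            return i, False            # escaped normally
        zx = ((zx2 - zy2) >> FRAC_BITS) + cx
        zy = ((2 * zxzy) >> FRAC_BITS) + cy
        if (zx, zy) in saved:
            return i + 1, True         # exact repeat: the orbit can never escape
        saved.append((zx, zy))
    return max_iters, False
```

Because the values are integers, a repeat is an exact match, so any pixel flagged as periodic can be colored as "inside" without running out the iteration count.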

Once the internals are working I’ll wrap a front-end on it: 4-color graphics display, and allow a point-n-zoom with arrow keys or maybe joystick. :D

Atari photo/video viewer project

I recently picked up a vintage Atari 800 XL computer like one I had as a kid in the 1980s, and have been amusing myself learning more about low-level programming in that constrained environment.

The 8-bit Atari graphics are good for 1979 but pretty primitive; some sprite-like overlays (“player/missile graphics”) and a background that can either be character-mapped or a bitmap, trading off resolution for colors: 320×192 at 2 colors, 160×192 at 4 colors, or 80×192 at 9 colors (limited by the number of palette registers handy when they implemented the extended modes).

This not only means you have relatively few colors available for photorealistic images, but a 40 byte * 192 line framebuffer is 7680 bytes, a large amount for a computer with a 64KB address space.

However you have a lot of flexibility too: any scanline can switch modes in the display list so you can mix high-res text or instruments with many-colored playfields, and you can change palette registers between scanlines if you get the timing right.

I wondered whether video would be possible — if you go for the high res mode, and *do* touch every pixel, how long would it take to process a frame? Well, I did the numbers and it’s a *terrible* frame rate. BUT — if you had uncompressed frames ready to go in RAM or ROM, you can easily cycle between frames at 60 Hz, letting the display processor see each fresh frame.

With enough bank-switched ROM to back it, you could stream 60 Hz video at 480 KiB per second. A huge amount of data for its day, but now you could put a processed GIF animation onto a cartridge. ;)
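The data-rate arithmetic as a quick Python sketch. Note the assumption: the raw 40×192 framebuffer alone gives a bit less than 480 KiB/s, and the 480 figure matches if each frame occupies a full 8 KiB bank. That bank size is my guess for illustration, not something stated above.

```python
FRAME_BYTES = 40 * 192     # raw framebuffer: 7680 bytes
BANK_BYTES = 8 * 1024      # assumed: one 8 KiB cartridge bank per frame

def kib_per_second(bytes_per_frame, fps=60):
    """Streaming rate in KiB/s for a given per-frame size."""
    return bytes_per_frame * fps / 1024
```

`kib_per_second(FRAME_BYTES)` is 450.0 and `kib_per_second(BANK_BYTES)` is 480.0, so a 1 MiB cartridge at one bank per frame would hold 128 frames, a bit over two seconds at 60 Hz.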

So I’ve got a few things I want to explore on this fun project:

dithering to 4 colors, with per-scanline palettes (working as of December 2022)
can you also embed audio? 4-bit PCM at 15.8 or 7.9 kHz (working at 7.9; 15.8 may require a tweak)
try adding a temporal component to the error-diffusion dithering
add a couple lines of text mode for subtitles/captions

Dithering and palette selection

I’ve got a dither implementation hacked together in JS which reads in an image, sizes it, and then walks through the scanlines doing an error-propagation dither combined with a palette reduction.

To start with, the complete Atari palette is 7 bits (3 bits luminance, 4 bits hue, where 0 is grayscale and 1-15 are various points around the NTSC QI hue wheel). I took an RGB list of the colors from the net and, after gamma adjustment to linear space, perform an error-diffusion dither: for each pixel it looks for the closest color in the available palette, then divides up the difference from the original color among neighboring pixels. At the end of the scanline, we count how many colors were used, including black, which cannot be changed. If more than 3 colors besides black were used, they’re ranked by usage and closeness, the lowest-scoring color is removed, and the scanline is dithered again. This continues until the dither selects only colors that fit.
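Here's a stripped-down Python sketch of that dither-then-prune loop, to show the shape of the idea. It is heavily simplified compared to the JS implementation described above: error is pushed only to the next pixel on the same line, colors are ranked by usage alone (no closeness term), and all names are invented for this example.

```python
from collections import Counter

BLACK = (0.0, 0.0, 0.0)   # fixed background color, always available

def nearest(color, palette):
    """Closest palette entry by squared distance in linear RGB."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))

def dither_line(pixels, palette):
    """Dither one scanline, carrying all quantization error to the next pixel."""
    out, err = [], (0.0, 0.0, 0.0)
    for px in pixels:
        want = tuple(c + e for c, e in zip(px, err))
        got = nearest(want, palette)
        err = tuple(w - g for w, g in zip(want, got))
        out.append(got)
    return out

def dither_line_reduced(pixels, full_palette, max_extra=3):
    """Re-dither, dropping the least-used color, until at most max_extra non-black colors remain."""
    palette = list(dict.fromkeys([BLACK, *full_palette]))   # de-duplicated, black first
    while True:
        out = dither_line(pixels, palette)
        used = Counter(out)
        extras = [c for c in used if c != BLACK]
        if len(extras) <= max_extra:
            return out, extras
        # Simplified ranking: remove the color used by the fewest pixels.
        palette.remove(min(extras, key=lambda c: used[c]))
```

Each pass removes one color from the candidate palette, so the loop always terminates with a scanline that fits black plus three palette registers.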

Formatting and playback

Due to a quirk of the Atari’s display processor, a frame buffer can’t cross a 4096-byte boundary, so with a 40-byte screen width you have to divide it into two non-contiguous sections. Selecting a widescreen aspect ratio (also to leave room for captions later) means there’s room enough to fit in arrays for the palettes as well (3 bytes per scanline) and to fit audio (131 or 262 bytes depending on sample rate).
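To illustrate the 4096-byte boundary problem, here's a small Python sketch that works out how a frame buffer would have to be split. The start address in the usage note is a made-up example, and the function is only a planning aid, not part of the actual player.

```python
def split_framebuffer(start, lines, bytes_per_line=40, boundary=4096):
    """Return a list of (address, line_count) segments that never cross a boundary."""
    segments = []
    addr, remaining = start, lines
    while remaining > 0:
        room = boundary - (addr % boundary)   # bytes left before the next boundary
        fit = room // bytes_per_line          # whole scanlines that fit in that room
        if fit == 0:
            addr += room                      # skip the unusable tail of this block
            continue
        n = min(fit, remaining)
        segments.append((addr, n))
        addr += n * bytes_per_line
        remaining -= n
    return segments
```

For a full 192-line screen starting at a hypothetical $4000, this gives 102 lines at $4000 and the remaining 90 at $5000, with 16 bytes left over at the end of the first 4 KiB block since 4096 isn't a multiple of 40.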

Note that for extra fun, the hardware register that gives you the current scanline number gives you the count *divided by two*. This is because the whole signal has 262 scanlines per frame, which is bigger than 256 and doesn’t fit in a byte! :D

So it makes sense to handle these by waiting until we’re synced up on line 0 and then doing an explicit timing loop with horizontal blanking waits (STA WSYNC). This way we know if we’re on the 0 or the 1 subline, and can use the VCOUNT register (0..130) as an index into arrays of palette or audio entries.

For testing without simulating bank-switching, I’m squishing two frames into RAM and switching between the two by making a complex display list: basically just the same thing twice, but pointing at different frame buffers and looping back around.

It seems to work pretty nice! But the timing is tight and I have to disable interrupts.

Audio

The Atari doesn’t have DMA-based PCM audio where you just slap in some bytes and it plays the audio… you either use the square-wave generators, or you manually set the volume level of the voices for each sample *at the right time*.

Using the scan line frequency is handy since we’re already in there changing palette entries during horizontal blanking. Every line is about 15.8 kHz, every other line is 7.9 kHz, slightly below the 8 kHz sample rate of telephone audio.

It seems to work at 7.9 at least, and I might be able to do 15.8 with ROM backing (bank-switching every frame makes things easier vs a long buffer in RAM). Note that you only get 4 bits of precision, and unpacking two samples from one byte is annoyingly expensive. ;)
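For reference, packing two 4-bit samples into each byte looks like this in Python. Which sample goes in the high nibble is an arbitrary choice made for this sketch, and the function names are hypothetical.

```python
def pack_samples(samples):
    """Pack 4-bit samples (0..15) two per byte, first sample in the high nibble."""
    samples = list(samples)
    if any(not 0 <= s <= 15 for s in samples):
        raise ValueError("samples must be 4-bit values")
    if len(samples) % 2:
        samples.append(0)                    # pad to an even count
    return bytes((hi << 4) | lo for hi, lo in zip(samples[0::2], samples[1::2]))

def unpack_samples(data):
    """Unpack bytes into 4-bit samples, high nibble first."""
    out = []
    for b in data:
        out.append(b >> 4)                   # high nibble: a 4-bit shift
        out.append(b & 0x0F)                 # low nibble: just a mask
    return out
```

The trade-off is storage versus cycles: packing halves the audio buffer, but on the 6502 getting at the high nibble takes four single-bit shifts, which is painful inside a tightly timed scanline loop.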

Next steps

The next thing I’ll try is a tweak to the dither algorithm to try to drive a more direct dither pattern between temporally adjacent frames; at least on an LCD, the 60 Hz flip looks great and it should “blend” even better on a classic CRT with longer phosphor retention times.

Then I’ll see if I can make a 1 MiB bank-switched cartridge image from the assembler that I can load in the emulator (and eventually flash onto a cartridge I can get for the physical device) so I can try running some longer animations/videos.

No rush though; I gotta get the flashable cartridge. ;)

Blog blog blog 2023

I resolve this year to publish more long-form blog posts with whatever cool stuff I’m working on, for work or for fun.

I’m trying to treat social media as more ephemeral. I quit Twitter entirely last year, deleting the account; my mastodon.technology account has vanished with the server shutting down, and I’ve even set my new account to delete non-bookmarked posts after two weeks.

It’s fun to talk about my projects a couple hundred characters at a time, but it’s also really nice to put together a bigger post that can take you through something over time and collect all the pieces together.

A long-form blog, with updateable pages, allows for this, and I think makes for a better experience when you really *do* mean to publish something informative or interesting. Let’s bring embloggeration back!