SSHKeychain

For a long time I’ve found SSHKeychain an invaluable little app on my Macs, making remote logins with keys relatively painless.

When I upgraded to an Intel-based MacBook last month, I was saddened to find that there wasn’t a Universal release; the last PPC release didn’t run on Intel; and development seems to have stopped altogether. 🙁

The good news is that it’s open source, and, better yet, one of the last code check-ins, early in 2006, added Universal binary build support. So I went ahead and built the thing for my own use.

I did find, however, that it crashed intermittently when waking from sleep. After a little debugging I tracked down the problem: some variables were badly initialized, so if all your keychains were locked it would crash. Fun! Easy to fix, though.

I mailed the patch to the dormant developers’ mailing list and to the author; hopefully it’ll get rolled in and other people will get to use it…

There is no Brion, only SUL

In the run-up to Wikimania, I plan to get the single-user-login code base sufficiently up to snuff to at least demo it at the conference, and to be able to snap it into action probably shortly after.

Milestones:

1) logging in on local DBs
2) new account creation on local DBs
3) migration-on-first-login of matching local accounts on local DBs
4) migration-on-first-login of non-matching local accounts on local DBs
5) renaming-on-first-login of non-matching local accounts on local DBs
6) provision for forced rename on local DBs
7) basic login for remote DBs
8) new account for remote DBs
9) migration for remote DBs
10) profit!

additional goodies:
11) secure login form
12) multiple-domain cookies to allow site-hopping

dbzip2 production testing

The English Wikipedia full-history data dump (my arch-nemesis) died again while building due to a database disconnection. I’ve taken the opportunity to clean up dbzip2 a little more and restart the dump build using it.

The client now handles server connections dropping out, and can even reconnect when they come back, so it should be relatively safe for a long-running process. The remote daemon also daemonizes properly, instead of leaving zombies and breaking your terminal.
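
For the curious, the daemonizing is nothing exotic, just the classic Unix double-fork dance. Here’s a minimal sketch of that general pattern (not the actual dbzip2d code):

```python
import os, sys

def daemonize():
    # First fork: the original process exits, so the launching shell gets its
    # prompt back and the child is reparented to init (no zombie left behind).
    if os.fork() > 0:
        sys.exit(0)
    os.setsid()                      # new session, no controlling terminal
    # Second fork: the session leader exits, so the daemon can never
    # reacquire a controlling terminal.
    if os.fork() > 0:
        sys.exit(0)
    os.chdir("/")
    # Point the standard streams at /dev/null so stray output doesn't
    # scribble over whatever terminal launched us.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
```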

Using six remote dbzip2d threads, and the faster 7zip decompression for the data prefetch, I’m averaging about 6.5 megabytes per second of (pre-compression XML) throughput, peaking around 11 MB/sec. That’s an improvement by a factor of 5 or so over what I was measuring with local threads alone. If this holds up, it should actually complete in “just” two or three days…

Of course that’s assuming the database connection doesn’t drop again! Another thing to improve…

Video crap

A while ago I picked up Motion 2 on a lark to replace the ancient copy of After Effects I occasionally used for little animation bits. I finally escaped the wiki for a couple of hours and got a chance to play with it some more:

Oh my goodness!
1.2MB Ogg Theora (640×360, no sound) (download)

The particle effects are yummy… 720p also fits nicely on my screen while editing. Did a bit in Blender as well; there’s a nice tutorial on WikiBooks (damn, that brings me right back to wiki!)

Stumbled on this while searching for Theora transcoding recommendations.

dbzip2 vincit

I’ve managed to bang my dbzip2 prototype into a pretty decent state now, rewriting some of the lower-level bitstream code as a C module while keeping the high-level bits in Python.

It divides up the input into proper-sized blocks and combines the output blocks into a single output stream, achieving bit-for-bit compatibility with single-threaded standard bzip2. While still slower than bzip2smp for local threads, I was quite pleased to find that it scales to multiple remote threads well enough to really look worthwhile.

This was using Wikimedia’s database servers: beefy Opteron boxes with gigabit ethernet and usually a lot of idle CPU cycles while they wait on local disk I/O.

The peak throughput on my initial multiple-server tests was about 24 megabytes per second with 10 remote threads, and I was able to get over 19 MB/sec on my full gigabyte test file, compressing it in under a minute. With some further work and better stability, this could be really helpful in getting the big data dumps going faster.
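
The shape of the thing is pretty simple at the top level: read fixed-size input blocks, hand them out to workers, and write the compressed results back in order. Roughly like this, as a local-threads-only sketch using just the standard library (the real client ships blocks to remote dbzip2d daemons over the network, and the bit-level splicing into one stream happens in the C module, neither of which is shown here):

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 900 * 1000   # illustrative: roughly one bzip2 -9 block of input

def compress_parallel(reader, writer, workers=6):
    """Read fixed-size blocks, compress them on a pool of workers, and
    write the results out in their original order."""
    def blocks():
        while True:
            chunk = reader.read(BLOCK_SIZE)
            if not chunk:
                return
            yield chunk

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so output comes back in sequence even
        # when blocks finish compressing out of order.  (bz2 releases the GIL
        # while compressing, so threads really do run in parallel.  A real
        # implementation would also bound how many blocks are in flight;
        # map() here happily queues everything.)
        for compressed in pool.map(bz2.compress, blocks()):
            writer.write(compressed)
```

Naively joining bz2.compress() output like that gives you the multiple-concatenated-streams form I grumbled about in the “dbzip2 continues” entry below; getting one well-formed stream out of it is exactly what the bit-level splicing is for.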

Next step: parallel decompression…?

dbzip2 continues

Still fiddling with distributed bzip2 compression in the hopes of speeding up data dump generation.

My first proof of concept worked similarly to the pbzip2 I came across: input was broken into blocks, each block sent out to local (and remote) threads to be separately compressed as its own little bzip2 stream. The streams were then concatenated in order.
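
In terms of today’s Python standard library, the proof of concept’s output looked essentially like this (sketch only; the real script farmed the per-block compression out to threads and remote helpers rather than running a plain loop):

```python
import bz2

BLOCK_SIZE = 900 * 1000   # illustrative; roughly bzip2's -9 block size

def naive_parallel_style_output(data):
    # Each chunk is compressed as a complete, self-contained bzip2 stream
    # (its own header, blocks, end-of-stream marker and CRC), and the
    # streams are simply glued together byte-for-byte.
    chunks = (data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE))
    return b"".join(bz2.compress(chunk) for chunk in chunks)
```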

This works surprisingly well, but has serious compatibility problems. While the standard bzip2 utility happily decompresses all the streams you care to place into one input file, other users of the library functions don’t expect this: the tools I needed to work with the dumps would stop after the first stream.
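
You can see the same failure mode at the library level with Python’s own bz2 module: a decompressor object considers itself finished as soon as it hits the first end-of-stream marker, and everything after that is set aside. (The tools that actually bit me were using libbz2 directly, but it’s the same idea.)

```python
import bz2

two_streams = bz2.compress(b"first stream ") + bz2.compress(b"second stream")

d = bz2.BZ2Decompressor()
text = d.decompress(two_streams)

print(text)                 # b'first stream '  -- only the first stream
print(d.eof)                # True: the decompressor thinks it's done
print(len(d.unused_data))   # the entire second stream, untouched
```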

A more compatible implementation should produce a single stream with multiple data blocks in the expected sequence, but the file format doesn’t seem to be well documented.

In my research I came across another parallel bzip2 implementation, bzip2smp. Its author had also found pbzip2 and rejected it, preferring instead to hack the core bzip2 library enough to parallelize the slowest part of it, the Burrows-Wheeler transform. The author claims the output is bit-for-bit identical to the output of regular bzip2, which is obviously attractive.
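
For reference, the transform itself is easy to write down naively; this toy version has nothing to do with the heavily optimized suffix sorting bzip2 actually uses (that sort is precisely the slow, CPU-hungry part being parallelized), it’s just to show what operation we’re talking about:

```python
def bwt(data: bytes) -> bytes:
    """Toy Burrows-Wheeler transform: sort every rotation of the input and
    take the last column.  O(n^2 log n) time and quadratic memory, so it's
    strictly illustrative; bzip2 does this with clever suffix sorting over
    ~900 KB blocks, and that sorting dominates compression time."""
    assert b"\0" not in data        # use a sentinel byte not found in the input
    data = data + b"\0"
    rotations = sorted(data[i:] + data[:i] for i in range(len(data)))
    return bytes(rotation[-1] for rotation in rotations)
```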

I’m not 100% sure how easy or practical it would be to extend that to distributed use; the library code is scary and lots of parts are mixed together. Before diving in, I decided to start poking around at the file format and the compression steps to get a better idea of what happened where.

I’ve been putting together some notes on the format. The basic stream itself is slightly complicated by being a true bitstream — blocks in output are not aligned to byte boundaries!
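
Concretely, that means any code splicing blocks together has to be able to append runs of bits at arbitrary, non-byte offsets. Something along these lines (simplified from, not copied out of, the prototype):

```python
class BitWriter:
    """Accumulate an output stream bit by bit, most significant bit first;
    interesting boundaries are not guaranteed to land on whole bytes."""

    def __init__(self):
        self.out = bytearray()
        self.bitbuf = 0     # pending bits, not yet flushed to a whole byte
        self.nbits = 0      # how many pending bits are in bitbuf

    def write_bits(self, value, count):
        # Append the low `count` bits of `value`, MSB first.
        self.bitbuf = (self.bitbuf << count) | (value & ((1 << count) - 1))
        self.nbits += count
        while self.nbits >= 8:
            self.nbits -= 8
            self.out.append((self.bitbuf >> self.nbits) & 0xFF)
        self.bitbuf &= (1 << self.nbits) - 1    # drop the bits already emitted

    def flush(self):
        # Pad the final partial byte out to a byte boundary.
        if self.nbits:
            self.out.append((self.bitbuf << (8 - self.nbits)) & 0xFF)
            self.bitbuf = self.nbits = 0
        return bytes(self.out)
```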

As a further proof of concept I’ve hacked up dbzip2 to combine the separately compressed blocks into a single stream, complete with a combined CRC at the end so bzip2 doesn’t barf and cut off the end of the data. This should be more compatible, but it’s not suitable for use right now: the bit-shifting is done in Python and is way too slow. Additionally, the input block cutting is probably not totally reliable, and it certainly doesn’t produce the same size blocks as real bzip2.
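
The combined CRC is the one piece of bookkeeping worth spelling out: the value at the end of the stream is folded together from the per-block CRCs, so when you splice in blocks that were compressed separately you have to recompute it yourself. As far as I can tell from the library source, the folding is just a one-bit rotate and an XOR per block:

```python
def combine_block_crcs(block_crcs):
    """Fold per-block CRCs into the stream-level CRC the way libbz2 appears
    to: rotate the running value left by one bit, then XOR in the block CRC."""
    combined = 0
    for block_crc in block_crcs:
        combined = (((combined << 1) | (combined >> 31)) & 0xFFFFFFFF) ^ block_crc
    return combined
```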

Replicating the RLE performed on input could get the blocks the same size, but at that point it might make more sense to start using the actual library (or a serious hack of it) instead of the scary Python wrapper I’ve got now.
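
For reference, the input-side RLE itself is simple; as I understand it from the format notes and the library source, a run of 4 to 255 identical bytes becomes four literal copies followed by a count byte holding the run length minus 4, and longer runs just get split. A sketch:

```python
def rle1(data: bytes) -> bytes:
    """bzip2-style initial run-length encoding (sketch): runs of 4-255
    identical bytes become four copies of the byte plus a count byte of
    (run length - 4); shorter runs pass through untouched."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while run < 255 and i + run < len(data) and data[i + run] == data[i]:
            run += 1
        if run < 4:
            out += data[i:i + run]
        else:
            out += data[i:i + 1] * 4
            out.append(run - 4)
        i += run
    return bytes(out)
```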