bloggy blog

I tossed together a silly WordPress plugin to special-case links into [[leuksman|my wiki pages]] such as my [[gimp mac helper]] tools, as well as to MediaWiki's bug tracker (e.g., bug 1) and SVN repository (r12345 was nice).
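
Under the hood it's nothing more than a pile of regular-expression substitutions run over the post text. Here's a rough sketch of the same trick in Python; the real plugin lives on the WordPress/PHP side, and the base URLs below are illustrative placeholders rather than the exact targets it uses:

    import re

    # Illustrative base URLs -- placeholders, not necessarily the real targets.
    WIKI_BASE = 'http://leuksman.com/view/'
    BUG_BASE = 'https://bugzilla.wikimedia.org/show_bug.cgi?id='
    REV_BASE = 'http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision='

    def linkify(text):
        """Turn [[page]] / [[page|label]], 'bug N', and 'rNNNNN' into HTML links."""
        def wiki_link(m):
            page, label = m.group(1), m.group(2) or m.group(1)
            return '<a href="%s%s">%s</a>' % (WIKI_BASE, page.replace(' ', '_'), label)

        text = re.sub(r'\[\[([^]|]+)\|?([^]]*)\]\]', wiki_link, text)
        text = re.sub(r'\bbug (\d+)', r'<a href="%s\1">bug \1</a>' % BUG_BASE, text)
        text = re.sub(r'\br(\d+)\b', r'<a href="%s\1">r\1</a>' % REV_BASE, text)
        return text

    print(linkify('See [[gimp mac helper]]; bug 1 and r12345 get linked too.'))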

SVN anonymous checkout: http://svn.wikimedia.org/svnroot/mediawiki/trunk/silly-regex-linker/

Only useful if you’re me at the moment, but maybe I’ll generalize it for fun.

SSHKeychain

For a long time I’ve found SSHKeychain an invaluable little app on my Macs, making remote logins with keys relatively painless.

When I upgraded to an Intel-based MacBook last month, I was saddened to find that there wasn’t a Universal release; the last PPC release didn’t run on Intel; and development seems to have stopped altogether. :(

The good news is that it's open source, and furthermore one of the last code check-ins, early in 2006, added Universal binary build support. So I went ahead and built the thing for my own use.

I did find, though, that it crashed intermittently when waking from sleep. After a little debugging I found the problem: some variables were initialized badly, so if all your keychains were locked it would crash. Fun! Easy to fix, though.

I mailed the patch to the dormant developers' mailing list and to the author; hopefully it'll get rolled in and other people will get to use it…

There is no Brion, only SUL

In the run-up to Wikimania, I plan to get the single-user-login code base sufficiently up to snuff to at least demo it at the conference, and to be able to snap it into action probably shortly after. (A rough sketch of the first-login flow follows the milestone list below.)

Milestones:

1) logging in on local DBs
2) new account creation on local DBs
3) migration-on-first-login of matching local accounts on local DBs
4) migration-on-first-login of non-matching local accounts on local DBs
5) renaming-on-first-login of non-matching local accounts on local DBs
6) provision for forced rename on local DBs
7) basic login for remote DBs
8) new account for remote DBs
9) migration for remote DBs
10) profit!

additional goodies:
11) secure login form
12) multiple-domain cookies to allow site-hopping
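
To make milestones 3 through 5 a little more concrete, here's a toy sketch of the decision that has to happen on a user's first unified login. Every name and data structure below is invented for illustration; it's just the shape of the logic, not the actual code or schema:

    # unified (global) accounts vs. this wiki's local accounts
    global_accounts = {'Brion': 'sekrit'}
    local_accounts = {'Brion': 'sekrit', 'Zaphod': 'towel'}
    attached = set()    # local accounts already tied to their global account

    def first_unified_login(username, password):
        """Log in against the global account, migrating or renaming locals as needed."""
        if global_accounts.get(username) != password:
            return False                             # no unified account, or bad password

        local_pw = local_accounts.get(username)
        if local_pw is None:
            local_accounts[username] = password      # no local account here yet: create one
        elif local_pw == password:
            pass                                     # milestone 3: matching account, migrate silently
        else:
            # milestones 4-6: the local account looks like someone else's;
            # rename it out of the way so the unified account can own the name
            local_accounts[username + ' (local)'] = local_pw
            local_accounts[username] = password
        attached.add(username)
        return True

    print(first_unified_login('Brion', 'sekrit'))    # True; local 'Brion' gets migrated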

dbzip2 production testing

The English Wikipedia full-history data dump (my arch-nemesis) died again partway through the build when a database connection dropped. I've taken the opportunity to clean up dbzip2 a little more and restart the dump build using it.

The client now handles server connections dropping out, and can even reconnect when they come back, so it should be relatively safe for a long-running process. The remote daemon also daemonizes properly, instead of leaving zombies and breaking your terminal.
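
The recovery logic isn't anything clever; in spirit it's just "try each remote server, bench the ones that fail for a while, and fall back to compressing locally if nobody answers." A simplified Python sketch of that shape follows; the host list, port, and wire format are made-up placeholders, not dbzip2's actual protocol:

    import bz2, socket, time

    SERVERS = [('db1', 12345), ('db2', 12345)]   # placeholder host list
    RETRY_AFTER = 60                             # seconds before retrying a dead server
    dead = {}                                    # addr -> when it last failed

    def recv_exact(sock, n):
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError('connection dropped mid-read')
            buf += chunk
        return buf

    def compress_block(block):
        for addr in SERVERS:
            if addr in dead and time.time() - dead[addr] < RETRY_AFTER:
                continue                         # still in the penalty box
            try:
                with socket.create_connection(addr, timeout=10) as sock:
                    sock.sendall(len(block).to_bytes(4, 'big') + block)
                    size = int.from_bytes(recv_exact(sock, 4), 'big')
                    out = recv_exact(sock, size)
                dead.pop(addr, None)             # it came back; forget the grudge
                return out
            except OSError:
                dead[addr] = time.time()         # mark it dead and move on
        return bz2.compress(block)               # no servers available: do it locally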

Using six remote dbzip2d threads, and the faster 7zip decompression for the data prefetch, I'm averaging about 6.5 megabytes per second of (pre-compression XML) throughput, peaking around 11 MB/sec. That's a big improvement over what I was measuring with local threads alone, by a factor of 5 or so. If this holds up, it should actually complete in “just” two or three days…

Of course that’s assuming the database connection doesn’t drop again! Another thing to improve…

Video crap

A while ago I picked up Motion 2 on a lark to replace the ancient copy of After Effects I occasionally used for little animation bits. I finally escaped the wiki for a couple of hours and got a chance to play with it some more:

Oh my goodness!
1.2MB Ogg Theora (640×360, no sound) (download)

The particle effects are yummy… 720p also fits nicely on my screen while editing. Did a bit in Blender as well; there’s a nice tutorial on WikiBooks (damn, that brings me right back to wiki!)

Stumbled on this while searching for Theora transcoding recommendations.

dbzip2 vincit

I’ve managed to bang my dbzip2 prototype into a pretty decent state now, rewriting some of the lower-level bitstream code as a C module while keeping the high-level bits in Python.

It divides the input into properly sized blocks and combines the output blocks into a single stream, achieving bit-for-bit compatibility with single-threaded standard bzip2. While it's still slower than bzip2smp for local threads, I was quite pleased to find that it scales to multiple remote threads well enough to really look worth it.
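
The core trick is easy to sketch: chop the input into bzip2-sized blocks, compress the blocks in parallel, and splice the results back together. The toy version below uses Python's bz2 and multiprocessing and simply concatenates whole bzip2 streams (which bunzip2 will still happily decompress) rather than doing the bit-level block stitching that makes dbzip2's output byte-identical to plain bzip2's; that stitching is what the C bitstream module is for.

    import bz2
    import sys
    from multiprocessing import Pool

    BLOCK_SIZE = 900 * 1000    # roughly bzip2's -9 block size

    def read_blocks(stream):
        while True:
            block = stream.read(BLOCK_SIZE)
            if not block:
                break
            yield block

    if __name__ == '__main__':
        # Usage: python parbzip2.py < input.xml > output.xml.bz2
        with Pool() as pool:
            for compressed in pool.imap(bz2.compress, read_blocks(sys.stdin.buffer)):
                sys.stdout.buffer.write(compressed)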

The tests used Wikimedia's database servers: beefy Opteron boxes with gigabit Ethernet and usually a lot of idle CPU cycles while they wait on local disk I/O.

The peak throughput on my initial multiple-server tests was about 24 megabytes per second with 10 remote threads, and I was able to get over 19 MB/sec on my full gigabyte test file, compressing it in under a minute. With some further work and better stability, this could be really helpful in getting the big data dumps going faster.

Next step: parallel decompression…?