Wikipedia on Leopard


The Dictionary.app included in Mac OS X 10.5 has support for making lookups to Wikipedia, optionally in various languages.

The actual display of articles seems to be done by loading the page out of the live Wikipedia and doing some custom filtering of it. This isn’t documented to us, so I hope we don’t break it by mistake!

Searching uses a relatively simple REST protocol that does title-prefix searches to provide type-ahead suggestions.

Some Apple engineers whipped up a little index search using the DARTS C++ library, with a PHP wrapper extension around it for web output. The results are wrapped in some simple HTML, pretty straightforward to handle.

Once production finally rolled out, though, we encountered some problems:

  1. The number of page titles in the system has increased to the point where a complete index for all languages barely fits in memory on a 32-bit box. I had to break the index in two (English and non-English) just to get it to generate.
  2. Performance was spotty, sometimes mysteriously hanging up for several long seconds. I suspect this is due to the huge indexes loaded in memory; every once in a while something decides to swap.

I finally got my hands on a copy of Leopard to confirm I wasn’t breaking the client, so it’s time to see what I can do…

Rather than investing more resources into the DARTS indexer, I figured I’d see if we can roll this back in with our existing tools to make it easier to maintain.

We already have a type-ahead suggestion backend, which is used for our [[OpenSearch]] interface. If you’re running Firefox 2.0 or later you can pull up the ‘Wikipedia’ search and try it out.

I did some quick testing and confirmed that it was pretty easy to make a translator that would query the OpenSearch suggestion API and format results for the Apple widget; I just had to add a limit option, then do a simple re-query and wrap the results.
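To give a feel for how small such a translator can be, here is a minimal Python sketch. It assumes the backend returns the standard OpenSearch suggestions JSON array (query first, list of completions second). The function name and the `<ul>` wrapper are made up for illustration; the markup the Dictionary widget really expects isn’t described here.

```python
import json
from html import escape


def suggestions_to_html(opensearch_json, limit=10):
    """Wrap an OpenSearch suggestions response in a small HTML list.

    The <ul>/<li> markup is a hypothetical stand-in for whatever
    format the client actually wants.
    """
    # OpenSearch suggestions are a JSON array: [query, [completions], ...]
    payload = json.loads(opensearch_json)
    query, titles = payload[0], payload[1]

    # Apply the limit, then escape each title for safe HTML output.
    items = "\n".join(f"  <li>{escape(title)}</li>" for title in titles[:limit])
    return f'<ul title="{escape(query)}">\n{items}\n</ul>'
```

Calling `suggestions_to_html(response_text, limit=5)` on a raw response body would return at most five wrapped titles.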

In my quick benchmarking, performance is at least no worse and so far seems more consistent, and the results are up to date, with no waiting for the next index generation.

The one big problem right now is that our suggestion search is case-sensitive, since it pulls directly from the binary-collated page title columns in our core database. That’s a minor annoyance except that the Dictionary app sends us queries which have been forced to lowercase — so you can’t easily reach titles with caps past the first letter.

Guess it’s time to bring back the title key field and get that working properly so I can switch in the new version…
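The idea behind a title key can be sketched in a few lines of Python: store a case-folded key next to each real title, sort on the key, and do the prefix match against the folded query. This is only an illustration under my own assumptions; the function names are invented, and in the real database the key would be an indexed column, not an in-memory list.

```python
import bisect


def build_index(titles):
    """Return (folded_key, original_title) pairs sorted by the folded key."""
    return sorted((title.casefold(), title) for title in titles)


def prefix_search(index, query, limit=10):
    """Return up to `limit` original titles whose folded key starts with the folded query."""
    key = query.casefold()
    # (key,) sorts before every (key..., title) pair, so this finds the first candidate.
    start = bisect.bisect_left(index, (key,))
    results = []
    for i in range(start, len(index)):
        folded, title = index[i]
        if len(results) >= limit or not folded.startswith(key):
            break
        results.append(title)
    return results
```

With this in place, a lowercased query such as `macb` still finds `MacBook`, which is exactly the case the Dictionary app trips over.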

Mac v Linux

I first switched to the Mac in ’03 after a few years of being a mostly Linux/BSD guy. Aside from the ability to test Wikipedia in Mac browsers, I was drawn by the oh-so-cute factor of the 12″ aluminum PowerBook and more importantly the way it actually was able to detect its included hardware and attached monitors. ;)

Four years later, desktop Linux is better than ever but still tends to fall down and wet itself when doing things like setting up multiple monitors or installing Flash and Java plugins in 64-bit mode. I’d be afraid to even try it on a laptop without knowing that sleep/wake and external monitor hookup work properly on that exact model.

But when I switched I promised myself I would retain my freedom to switch back. Today I’m using a Mac laptop and a Linux desktop together in the office; if I wanted to switch 100% to Linux, what would I need to change?…

Mac app | Linux app | Notes
Firefox | Firefox | Ahh, open source. :)
Thunderbird | Thunderbird |
GIMP | GIMP |
NeoOffice | OpenOffice |
TextMate / BBEdit | gedit? jEdit? Eclipse? | I haven’t really been happy with *nix GUI editors. Emacs is not an acceptable option. ;) I need a good project-wide regex search/replace, good charset support, ability to open & save files over SFTP, and syntax highlighting/sensitivity that doesn’t interfere with my indenting. Being easy to load files from a terminal and not sucking are pluses.
Yojimbo | Tomboy? | I use Yojimbo constantly for notes, scratch space, web receipts, chat snippets, todo lists, reference cheat sheets, anything and everything. Simple as it is, I love this app! The closest thing I’ve used on *nix is Tomboy, but it doesn’t feel as smooth to me. I’ll just have to fiddle with it more… figuring out how to import all my existing data would be another issue.
Quicksilver | GNOME desktop launcher? | I’ve found Quicksilver invaluable for launching various apps… I used to switch to Terminal and run ‘open -a Firefox’ and such. ;) I think the new launcher which will be included with Ubuntu Gutsy will serve okay on this, though I haven’t tried it.
Keynote | OpenOffice Impress | Wonder if it’s got the nice preview-on-second-screen that Keynote does.
Parallels | VMware Workstation | Already use this on my office Linux box.
iChat | Pidgin | Been using Pidgin a bit on my Linux box in the office; it’s pretty decent these days.
Colloquy | XChat-GNOME | Kind of awkward, but I haven’t found an IRC client I’m happier with on *nix.
Google Earth | Google Maps | I haven’t had any luck getting the Linux version of Google Earth to run on my office box, but the web version is usually fine.
iTunes | Rhythmbox | I’d have to strip DRM from my iTMS tracks, but that’s certainly doable. Don’t know whether it’ll be able to sync with an iPhone, though. ;)
iPhoto | F-Spot | I took a quick peek at the F-Spot web site and was surprised to find nothing about importing from iPhoto. Should be doable; the photos are all just JPEGs and the metadata’s in some kind of XML last I looked.
NetNewsWire | ? | I haven’t found a good RSS reader on *nix yet.
iCal | Evolution calendar? Sunbird? | I guess I could use Google Calendar, but it’s kind of nice to have something that works locally.

The biggest lapse if I switch at home would be in the video editing / multimedia end of things, which I dabble in sometimes and keep meaning to get back into more. I’m pretty happy with the Apple pro apps (Final Cut, Motion, etc), and there’s not really much touching that in Linux-land.

iProduct vs Veronica Mars

So I finally gave in and picked up an Apple TV unit; that frees up my Mac Mini from TV duty to be my main home computer, while letting the Apple TV concentrate on being a media player.

The good: unit is very compact, setup is pretty straightforward, and picture looks good once I adjust the ungodly color saturation my TV defaults to on the component input.

The bad: at least for the shows I tested (Veronica Mars season 3), video playback is totally broken at HD resolutions!

At 720p playback stutters very badly, with jerky motion and sound about a second out of sync with the picture.

At 1080i I don’t even *get* picture during playback, just sound. (Menus display fine.)

At 480p everything looks great, though, and the currently available content doesn’t need more than that, so I’m leaving it there for now.

A quick Google scan doesn’t show any other obvious complaints of this problem, so I’m not sure if I’ve got a bogus unit or if it’s something funky with the Veronica Mars encoding that might not be a problem with other shows…

Update: At some point it started working fine. *shrug*

Virtualization on Mac

We’ve got some new machines in for the Wikimedia staff, and among them a shiny Core 2 Duo iMac has found its way to my desk as my in-office development workstation. Yum!

Doing web development, I need to have access to a number of operating systems for testing purposes: Linux servers, Windows clients, Windows servers, Linux clients, Mac clients, and the occasional other oddity.

In theory, at least, an Intel-based Mac should be the ideal environment for this: test the Mac clients on the main OS, and run everything else in virtualization at full speed. The new Core 2 Duo boxes can also run both i386 and x86_64 guest OSs, for full coverage.

With this in mind I’ve fiddled around for a while with the main desktop-level virtualization packages on the Mac to get a feel for what’s available… unfortunately the field isn’t very thick.

Basically there’s Parallels and the beta of VMware Fusion. There are also some QEMU-based packages, but the last time I tried them they were very unsatisfactory, both slow and unstable.

Parallels

The good:

  • A real shipping product!
  • Relatively inexpensive
  • Good Windows integration (drivers, keyboard and mouse handling, filesystem integration)

The bad:

  • No 64-bit guest support — 32-bit only
  • Guests can only use one CPU core
  • No guest tools support for Linux; GUI desktops are slow and awkward to use
  • No snapshots

Snapshotting is the ability to save the state of the virtual machine, run it further, then return to the saved state. You can use this to roll back installation of experimental software, for instance. *Very* useful when developing and testing software, for obvious reasons.

Since this has been part of VMware Workstation for some time, I had hoped to find it also in…

VMware Fusion

The good:

  • Based on the mature VMware engine
  • Portability of VMs to and from VMware Workstation and Player on other platforms
  • 64-bit guest support
  • Dual-processor guest support
  • Guest tools & drivers for Linux and some other Unix clients as well as Windows
  • Limited support for snapshots

The bad:

  • Still in beta; there’s no shipping product and you can expect problems.
  • Last I checked, networking was horribly broken, but that may be better in beta 2 (need to try it more)
  • As of beta 2, only a single snapshot per VM is allowed

That single snapshot limitation is *horrible* from my perspective; it’s totally arbitrary and wrecks much of the feature’s usefulness.

One of my prime uses for snapshots on VMware Workstation was maintaining a single copy of Windows XP in both IE 6 and IE 7 states; I could switch back and forth between them at will, while still using snapshots for more local changes. That’s something I couldn’t do with only a single snapshot available; I’d have to install two separate copies, which would imply a second license. And then I’d still be stuck with only a single snapshot for all my debugging uses!
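To make the IE 6 / IE 7 workflow concrete, here is a toy Python model of named snapshots. It is purely illustrative: `ToyVM` is a made-up class whose "machine state" is just a dictionary, and it has nothing to do with how VMware actually stores snapshots.

```python
import copy


class ToyVM:
    """Toy stand-in for a virtual machine; its state is just a dict."""

    def __init__(self, state):
        self.state = dict(state)
        self.snapshots = {}

    def snapshot(self, name):
        # Save a full copy of the current state under a name.
        self.snapshots[name] = copy.deepcopy(self.state)

    def revert(self, name):
        # Throw away the current state and restore the named copy.
        self.state = copy.deepcopy(self.snapshots[name])


vm = ToyVM({"os": "Windows XP", "browser": "IE 6"})
vm.snapshot("ie6")                            # clean IE 6 baseline
vm.state["browser"] = "IE 7"
vm.snapshot("ie7")                            # clean IE 7 baseline
vm.state["scratch"] = "experimental plugin"   # some throwaway change
vm.revert("ie6")                              # back to IE 6, scratch work gone
```

With only one snapshot slot, saving the IE 7 state would overwrite the IE 6 baseline, which is the limitation described above.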

The quick fix

For now I’ve wiped the disk and installed Ubuntu Linux, so I can run VMware Workstation for Linux. I’ve got the full range of snapshotting features available, and can still use my laptop for Mac client testing and all the other happy shiny Mac OS X goodness.

Of course there were some installation issues… ;)

iMac vs Ubuntu

  • Distorted screen at native resolution with VESA video driver (proprietary ATI driver works once fiddled with a bit)
  • Installation fails on setup of GRUB bootloader with a Boot Camp dual-boot configuration; you have to wipe the disk and install a DOS partition map
  • Sound doesn’t work
  • Doesn’t seem to wake from Suspend
  • … and probably others ;)

Hopefully Parallels will catch up or they’ll get proper snapshotting into Fusion and I can someday reinstall Tiger (or perhaps Leopard by then), but in the meantime it looks pretty rockin’ on my desk and VMware actually works!