Some installer tasks

Been poking about at MediaWiki, but not sure what to do? Here are a few tasks that would help with some common problems for third-party users:

  • 1379: Install can’t find config/index.php
    Some hosting services put a “control panel” of some sort at the “/config” URL, making it difficult to get at the MediaWiki installer. Renaming this to something more unique, and providing a compatibility link for convenience, would not be very hard but would help people stuck on this sort of host.
  • 9954: Detect “extra whitespace” / BOM conditions
    PHP is very picky about extra whitespace at the start and end of source files. Unfortunately it’s not uncommon for people to end up with extra blank lines or a hidden Unicode BOM sequence at the start or end of files they’ve customized. This leads to weird, hard to diagnose problems like cookies not getting set or RSS feeds that break with little explanation. Some software support to detect this and report which file is broken (and how to fix it!) would be very helpful.
  • 10387: Detect and handle ‘.php5’ extension environments
    More and more hosting services are providing PHP 5.x, but some are putting it alongside existing PHP 4.x services, requiring that files be named with a .php5 extension. With a little care, the installer could detect this out of the box and set things up to work on such systems with few problems.
    Update 2007-06-28: Edward Z. Yang whipped up a good patch for this, which I’ve committed to trunk.
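
For the whitespace/BOM task (9954), the check itself is simple enough to sketch. This is a rough Python illustration of the idea, not MediaWiki code: the function names `check_php_source` and `scan_tree` are made up for the example, and the rules (a UTF-8 BOM at the start, whitespace before the opening tag, more than one newline after the last closing tag) are my assumptions about what is worth flagging.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def check_php_source(data: bytes) -> list[str]:
    """Return human-readable problems found in one PHP file's raw bytes."""
    problems = []

    # A BOM is invisible in most editors but is sent to the browser as output.
    if data.startswith(UTF8_BOM):
        problems.append("starts with a UTF-8 BOM")
        data = data[len(UTF8_BOM):]

    # Blank lines or spaces before the opening tag are also sent as output.
    stripped = data.lstrip()
    if stripped != data and stripped.startswith(b"<?"):
        problems.append("has whitespace before the opening <? tag")

    # Heuristic: look only at what follows the last "?>" in the file.
    # PHP ignores a single newline right after the closing tag, so allow that.
    close = data.rfind(b"?>")
    if close != -1:
        tail = data[close + 2:]
        if tail not in (b"", b"\n", b"\r\n") and tail.strip() == b"":
            problems.append("has extra whitespace after the closing ?> tag")

    return problems


def scan_tree(root: str) -> dict[str, list[str]]:
    """Check every .php file under root; return only the files with problems."""
    report = {}
    for path in sorted(Path(root).rglob("*.php")):
        problems = check_php_source(path.read_bytes())
        if problems:
            report[str(path)] = problems
    return report
```

Something like `scan_tree("/path/to/wiki")` would then give a per-file list of complaints, which is the "report which file is broken" half of the task. The "how to fix it" half is mostly a matter of wording the messages well.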

Synergy vs gnome-screensaver

I’ve been using Synergy to share my mouse & keyboard between my Linux desktop and Mac laptop in the office.

One of the features of Synergy that hasn’t been working so well for me is the screen-saver synchronization. I’m not too picky, but I do want to be able to quickly lock both screens at once so I can leave the room without leaving a bunch of server terminals open to anyone who walks in!

After a little research, I found that Synergy’s X11 server code looks explicitly for Xscreensaver, but Ubuntu ships with gnome-screensaver, which has a different interprocess control API based on DBUS. This is apparently an issue of much contention, as a lot of video players and other apps haven’t updated to speak the new protocol, and you end up with screen savers activating during long-playing files and such.

One possibility is to manually reconfigure Ubuntu to use Xscreensaver, but it would probably be cuter to add support for the DBUS interface to Synergy.
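
In the meantime, a stopgap for the "lock this screen now" half of the problem is a tiny wrapper that calls whichever screen saver's command-line tool is installed. This is only a sketch and has nothing to do with Synergy's own code. It assumes that `gnome-screensaver-command --lock` and `xscreensaver-command -lock` are the right invocations on your system, and the helper names are made up for the example.

```python
import shutil
import subprocess

# Candidate lock commands, in order of preference (an assumption; reorder to taste).
LOCK_COMMANDS = [
    ["gnome-screensaver-command", "--lock"],
    ["xscreensaver-command", "-lock"],
]


def pick_lock_command(is_available=shutil.which):
    """Return the first lock command whose executable is available, or None.

    is_available is injectable so the choice can be tested without
    either screen saver being installed.
    """
    for argv in LOCK_COMMANDS:
        if is_available(argv[0]):
            return argv
    return None


def lock_screen() -> bool:
    """Try to lock the local screen; return True if the tool reported success."""
    argv = pick_lock_command()
    if argv is None:
        return False
    return subprocess.call(argv) == 0
```

It doesn’t solve the synchronization itself, since something still has to trigger it on both machines, but it shows the kind of "try the new interface, fall back to the old one" logic that Synergy would need.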

Leverage your synergy

On Rob’s advice, I set up Synergy to share my keyboard and mouse between my Linux and Mac boxes at the office.

Pretty straightforward to set up (if you’re a *nix geek); I had just one nasty surprise. If you’re sharing a keyboard from a PC server to a Mac OS X client, it switches the alt and command keys for you.

That might be a cute option if you’re using a PC keyboard, where Alt and Windows keys appear in the opposite order from the Mac’s Alt/Option and Command keys. Not so cute if you’re using a Mac keyboard and want things to remain sensible.

Luckily, it’s pretty easy to switch them back. In the screen section of the config file for the Mac client, add these options:

		super = alt
		alt = super

It seems to consider ‘super’ and ‘meta’ to be almost the same, but if you say ‘meta’ here it gets confused — you get two option keys and no command key.

Wikimedia in Google Summer of Code

Wikimedia’s been accepted as a mentoring organization for the 2007 Google Summer of Code program.

Here’s our organization page, and I put up an initial project list on meta.

The list is semi-protected so it won’t be too vandalized ;) but additional suggestions are welcome. I’d like to ask that people who aren’t directly involved in development not add too much to the main page directly, though; last year we ended up with lots of project submissions for things that weren’t really considered high priority, so I’d like to keep the list a little more ordered this time.

We don’t know for sure how many projects we’ll get assigned, so we’ll see. :) At least Tim and I will serve as mentors for the student projects; if a couple more experienced developers would like to help out with that too that would be super.

Last year’s projects went really well up to the public demo stage but never quite got integrated into the mainline; I’m hoping that this year we can stick with projects that will be easier to slip in and take live much earlier in the process, which should help keep the students interested and the projects active.

I hate^H^H^H^Hlove you, Subclipse

Inspired by river’s addition of PostgreSQL support, I was gonna make a few quick changes to mwdumper. I figured I’d get Eclipse and the Subclipse SVN plugin set up on my 64-bit Linux workstation so I’d have a decent Java IDE to work on it in.

Well… no.

Neither with the version of Eclipse that ships with Ubuntu Feisty, nor with a fresh copy of it from eclipse.org… when I try to check out from SVN, and get to the final stage, it just… stops. No error message, no explanation. Just the wizard’s done and I’ve got no project.

I… hate… computers.

Update: Mark Phippard explained the secret in a comment — you can check out a project from the SVN Repository browse view, and it works! Thanks, Mark!

Fun with mb_strlen

I noticed the fallback implementation for mb_strlen() that we had in GlobalFunctions.php sucked:

	function mb_strlen( $str, $enc = "" ) {
		preg_match_all( '/./us', $str, $matches );
		return count($matches);
	}

There are two things to note about this code:

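  1. It doesn’t actually work: preg_match_all() fills $matches with one sub-array per pattern group, so count($matches) counts the groups rather than the characters and always returns 1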
  2. Even if you fix it to return the matches, it’s extremely slow and will eat lots of memory by creating a giant array of every character in the (potentially quite long) string

I’m replacing this with a new version which uses PHP’s count_chars() function to count up the ASCII-compatible bytes and multibyte sequence head bytes. It’s still a smidge slower than mb_strlen but it’s… much better than the old one.

	/**
	 * Fallback implementation of mb_strlen, hardcoded to UTF-8.
	 * @param string $str
	 * @param string $enc optional encoding; ignored
	 * @return int
	 */
	function new_mb_strlen( $str, $enc="" ) {
		$counts = count_chars( $str );
		$total = 0;

		// Count ASCII bytes
		for( $i = 0; $i < 0x80; $i++ ) {
			$total += $counts[$i];
		}

		// Count multibyte sequence heads
		for( $i = 0xc0; $i < 0xff; $i++ ) {
			$total += $counts[$i];
		}
		return $total;
	}
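
The trick works because every UTF-8 character has exactly one byte that is not a continuation byte (0x80 to 0xBF), so counting the ASCII bytes plus the sequence head bytes gives the character count. Here is the same idea as a small Python sketch for anyone who wants to play with it. The name `utf8_strlen` is just for illustration, and like the PHP version it assumes the input is valid UTF-8.

```python
def utf8_strlen(data: bytes) -> int:
    """Count characters in UTF-8 encoded bytes without decoding them."""
    # Histogram of byte values, like PHP's count_chars().
    counts = [0] * 256
    for byte in data:
        counts[byte] += 1

    # ASCII bytes (0x00-0x7F) are one character each.
    ascii_bytes = sum(counts[0x00:0x80])

    # Head bytes of multibyte sequences; same range as the PHP loop (0xC0-0xFE).
    head_bytes = sum(counts[0xC0:0xFF])

    # Continuation bytes (0x80-0xBF) are deliberately not counted.
    return ascii_bytes + head_bytes
```

For example, `utf8_strlen("héllo".encode("utf-8"))` gives 5 even though the encoded string is 6 bytes long.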

Some quick benchmarks using the UTF-8 normalization benchmark pages (code):

Testing washington.txt:
              strlen      31526 chars    0.007ms
           mb_strlen      31526 chars    0.114ms
       old_mb_strlen      31526 chars 4813.686ms
       new_mb_strlen      31526 chars    0.132ms

Testing berlin.txt:
              strlen      36320 chars    0.001ms
           mb_strlen      35899 chars    0.129ms
       old_mb_strlen      35899 chars 6328.748ms
       new_mb_strlen      35899 chars    0.127ms

Testing bulgakov.txt:
              strlen      36849 chars    0.001ms
           mb_strlen      20418 chars    0.076ms
       old_mb_strlen      20418 chars 3003.042ms
       new_mb_strlen      20418 chars    0.133ms

Testing tokyo.txt:
              strlen      36244 chars    0.001ms
           mb_strlen      19936 chars    0.071ms
       old_mb_strlen      19936 chars 2623.109ms
       new_mb_strlen      19936 chars    0.131ms

Testing young.txt:
              strlen      36694 chars    0.001ms
           mb_strlen      16676 chars    0.063ms
       old_mb_strlen      16676 chars 2246.179ms
       new_mb_strlen      16676 chars    0.125ms

Font fun in Gimp on Mac

While whipping up a theme for the nascent Planet Wikimedia, I needed to use the standard font for Wikimedia logos, Gill Sans.

Gill Sans seems to come conveniently preinstalled on Macs, so I opened up my MacBook, symlinked the Mac fonts from /Library/Fonts to /usr/lib/x11/fonts/TTF, and whipped out Gimp… Unfortunately I ran into an old problem I’d forgotten about:

Fonts o doom.png

For some fonts, only the bold-italic version seems to come up in the font list. This time I decided to get to the bottom of it. Poking at the font files, I found that Gill Sans comes as a single .dfont file instead of the more traditional bundle of .ttfs. While Gimp/Freetype was happy to read the file, it appears to only pick up one of the style variants — in this case bold italic.

Some googling turned up this page which included a hint that you can extract .ttf files out of a .dfont using the utility fondu. Conveniently this is available as a fink package, so in a couple minutes I was able to replace the .dfont symlinks in /usr/lib/x11/fonts/TTF with separated .ttf files. Restart Gimp, and presto!

Fonts o fun.png

Virtualization on Mac

We’ve got some new machines in for the Wikimedia staff, and among them a shiny Core 2 Duo iMac has found its way to my desk as my in-office development workstation. Yum!

Doing web development, I need to have access to a number of operating systems for testing purposes: Linux servers, Windows clients, Windows servers, Linux clients, Mac clients, and the occasional other oddity.

In theory, at least, an Intel-based Mac should be the ideal environment to run this: test the Mac clients on the main OS, and everything else running in virtualization at full speed. The new Core 2 Duo boxes are further capable of running both i386 and x86_64 guest OSs, for full coverage.

With this in mind I’ve fiddled around for a while with the main desktop-level virtualization packages on the Mac to get a feel for what’s available… unfortunately the field isn’t very thick.

Basically there’s Parallels and the beta of VMware Fusion. There are also some QEMU-based packages, but the last time I tried them they were very unsatisfactory, both slow and unstable.

Parallels

The good:

  • A real shipping product!
  • Relatively inexpensive
  • Good Windows integration (drivers, keyboard and mouse handling, filesystem integration)

The bad:

  • No 64-bit guest support — 32-bit only
  • Guests can only use one CPU core
  • No guest tools support for Linux; GUI desktops are slow and awkward to use
  • No snapshots

Snapshotting is the ability to save the state of the virtual machine, run it further, then return to the saved state. You can use this to roll back installation of experimental software, for instance. *Very* useful when developing and testing software, for obvious reasons.

Since this has been part of VMware Workstation for some time, I had hoped to find it also in…

VMware Fusion

The good:

  • Based on the mature VMware engine
  • Portability of VMs to and from VMware Workstation and Player on other platforms
  • 64-bit guest support
  • Dual-processor guest support
  • Guest tools & drivers for Linux and some other Unix clients as well as Windows
  • Limited support for snapshots

The bad:

  • Still in beta; there’s no shipping product and you can expect problems.
  • Last I checked, networking was horribly broken, but that may be better on beta 2 (need to try it more)
  • As of beta 2 only allows a single snapshot per VM

That single snapshot limitation is *horrible* from my perspective; it’s totally arbitrary and wrecks much of the feature’s usefulness.

An example of one of my prime uses for snapshots on VMware Workstation was maintaining a single copy of Windows XP in both IE 6 and IE 7 states; I could switch back and forth between them at will, while still using the snapshotting for more local changes. That’s something I couldn’t do with only a single snapshot available: I’d have to install two separate copies, which would imply a second license. And then I’d still be stuck with only a single snapshot for all my debugging uses!

The quick fix

For now I’ve wiped the disk and installed Ubuntu Linux, so I can run VMware Workstation for Linux. I’ve got the full range of snapshotting features available, and can still use my laptop for Mac client testing and all the other happy shiny Mac OS X goodness.

Of course there were some installation issues… ;)

iMac vs Ubuntu

  • Distorted screen at native resolution with VESA video driver (proprietary ATI driver works once fiddled with a bit)
  • Installation fails on setup of GRUB bootloader with a Boot Camp dual-boot configuration; you have to wipe the disk and install a DOS partition map
  • Sound doesn’t work
  • Doesn’t seem to wake from Suspend
  • … and probably others ;)

Hopefully Parallels will catch up or they’ll get proper snapshotting into Fusion and I can someday reinstall Tiger (or perhaps Leopard by then), but in the meantime it looks pretty rockin’ on my desk and VMware actually works!