Fun with mb_strlen

I noticed the fallback implementation for mb_strlen() that we had in GlobalSettings.php sucked:

	function mb_strlen( $str, $enc = "" ) {
		preg_match_all( '/./us', $str, $matches );
		return count($matches);
	}

There are two things to note about this code:

  1. It doesn’t actually work: preg_match_all() fills $matches with one array per pattern group rather than one entry per match, so with no capture groups count($matches) always returns 1
  2. Even if you fix it to count the matches, it’s extremely slow and eats lots of memory by creating a giant array of every character in the (potentially quite long) string
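For reference, here's roughly what "fixing it to return the matches" looks like. This is a sketch, and regex_mb_strlen() is a hypothetical name chosen so it can't collide with the real mb_strlen() from the mbstring extension. It counts correctly but keeps both of the old version's performance problems.

```php
<?php
// Hypothetical name; the old regex approach with its counting bug fixed.
// Still slow and memory-hungry: it builds one array entry per character.
function regex_mb_strlen( $str, $enc = "" ) {
	preg_match_all( '/./us', $str, $matches );
	// Default PREG_PATTERN_ORDER: $matches[0] lists every full-pattern match,
	// i.e. one entry per character here. count($matches) would only count
	// the pattern groups, which is always 1 for this regex.
	return count( $matches[0] );
}

echo regex_mb_strlen( "h\xC3\xA9llo" ), "\n"; // prints 5 ("héllo")
```

preg_match_all()'s return value is already the number of full matches, so returning that directly would work too; either way a regex match per character is the expensive part.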

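I’m replacing this with a new version that uses PHP’s count_chars() function to tally how often each byte value occurs, then adds up the ASCII-compatible bytes (0x00-0x7F) and the multibyte sequence head bytes (0xC0-0xFE). In UTF-8 every character starts with exactly one such byte, and the continuation bytes (0x80-0xBF) are skipped, so the sum is the character count. It’s still a smidge slower than mb_strlen, but it’s… much better than the old one.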

	/**
	 * Fallback implementation of mb_strlen, hardcoded to UTF-8.
	 * @param string $str
	 * @param string $enc optional encoding; ignored
	 * @return int
	 */
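	function new_mb_strlen( $str, $enc="" ) {
		// count_chars() mode 0: 256 entries, byte value => number of occurrences
		$counts = count_chars( $str );
		$total = 0;

		// Count ASCII bytes (0x00-0x7F); each one is a complete character
		for( $i = 0; $i < 0x80; $i++ ) {
			$total += $counts[$i];
		}

		// Skip continuation bytes (0x80-0xBF) and count multibyte sequence heads.
		// 0xFF never appears in valid UTF-8, so stopping before it is harmless.
		for( $i = 0xc0; $i < 0xff; $i++ ) {
			$total += $counts[$i];
		}
		return $total;
	}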
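The same byte-classification idea can be written the other way around. This hypothetical variant (not the committed code; the function name is made up for illustration) subtracts the continuation bytes 0x80-0xBF from the total byte length instead of adding up the ASCII and head bytes. For valid UTF-8 it gives the same answer as new_mb_strlen().

```php
<?php
// Hypothetical variant: total bytes minus continuation bytes (0x80-0xBF).
function utf8_length_by_subtraction( $str ) {
	$counts = count_chars( $str ); // byte value => number of occurrences
	// array_slice( $counts, 0x80, 0x40 ) picks out the counts for bytes 0x80..0xBF
	$continuation = array_sum( array_slice( $counts, 0x80, 0x40 ) );
	return strlen( $str ) - $continuation;
}

echo utf8_length_by_subtraction( "h\xC3\xA9llo" ), "\n"; // prints 5
```

The two versions only differ on invalid input: this one also counts stray 0xFF bytes, which the loop in new_mb_strlen() skips.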

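Some quick benchmarks using the UTF-8 normalization benchmark pages (code). Note that strlen() counts bytes rather than characters, so its count only matches the others on the pure-ASCII washington.txt, and that old_mb_strlen here is the regex version with its count bug fixed (the original would just return 1):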

Testing washington.txt:
              strlen      31526 chars    0.007ms
           mb_strlen      31526 chars    0.114ms
       old_mb_strlen      31526 chars 4813.686ms
       new_mb_strlen      31526 chars    0.132ms

Testing berlin.txt:
              strlen      36320 chars    0.001ms
           mb_strlen      35899 chars    0.129ms
       old_mb_strlen      35899 chars 6328.748ms
       new_mb_strlen      35899 chars    0.127ms

Testing bulgakov.txt:
              strlen      36849 chars    0.001ms
           mb_strlen      20418 chars    0.076ms
       old_mb_strlen      20418 chars 3003.042ms
       new_mb_strlen      20418 chars    0.133ms

Testing tokyo.txt:
              strlen      36244 chars    0.001ms
           mb_strlen      19936 chars    0.071ms
       old_mb_strlen      19936 chars 2623.109ms
       new_mb_strlen      19936 chars    0.131ms

Testing young.txt:
              strlen      36694 chars    0.001ms
           mb_strlen      16676 chars    0.063ms
       old_mb_strlen      16676 chars 2246.179ms
       new_mb_strlen      16676 chars    0.125ms
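The benchmark code isn't reproduced in this post. A minimal timing loop along these lines could produce a similar table; bench_length() and the sample string are hypothetical, and absolute timings will vary with machine and PHP version.

```php
<?php
// Hypothetical harness, not the original benchmark code: call a length
// function repeatedly on one string and report the result plus the
// average wall-clock time per call in milliseconds.
function bench_length( callable $fn, $str, $iterations = 100 ) {
	$iterations = max( 1, (int)$iterations ); // avoid dividing by zero
	$result = null;
	$start = microtime( true );
	for ( $i = 0; $i < $iterations; $i++ ) {
		$result = $fn( $str );
	}
	$ms = ( microtime( true ) - $start ) * 1000 / $iterations;
	return array( $result, $ms );
}

// Sample input: "Grüße " (8 bytes, 6 characters) repeated 1000 times.
$text = str_repeat( "Gr\xC3\xBC\xC3\x9Fe ", 1000 );
list( $n, $ms ) = bench_length( 'strlen', $text );
printf( "%20s %10d chars %10.3fms\n", 'strlen', $n, $ms );
```

Adding rows is a matter of passing other callables, such as 'mb_strlen' when the mbstring extension is loaded.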

Font fun in Gimp on Mac

While whipping up a theme for the nascent Planet Wikimedia, I needed to use the standard font for Wikimedia logos, Gill Sans.

Gill Sans seems to come conveniently preinstalled on Macs, so I opened up my MacBook, symlinked the Mac fonts from /Library/Fonts to /usr/lib/x11/fonts/TTF, and whipped out Gimp… Unfortunately I ran into an old problem I’d forgotten about:

[Screenshot: Fonts o doom.png]

For some fonts, only the bold-italic version seems to come up in the font list. This time I decided to get to the bottom of it. Poking at the font files, I found that Gill Sans comes as a single .dfont file instead of the more traditional bundle of .ttfs. While Gimp/Freetype was happy to read the file, it appears to only pick up one of the style variants — in this case bold italic.

Some googling turned up this page, which included a hint that you can extract .ttf files from a .dfont using the utility fondu. Conveniently, this is available as a Fink package, so within a couple of minutes I was able to replace the .dfont symlinks in /usr/lib/x11/fonts/TTF with separate .ttf files. Restart Gimp, and presto!

[Screenshot: Fonts o fun.png]

Virtualization on Mac

We’ve got some new machines in for the Wikimedia staff, and among them a shiny Core 2 Duo iMac has found its way to my desk as my in-office development workstation. Yum!

Doing web development, I need to have access to a number of operating systems for testing purposes: Linux servers, Windows clients, Windows servers, Linux clients, Mac clients, and the occasional other oddity.

In theory, at least, an Intel-based Mac should be the ideal environment for this: test the Mac clients on the main OS, and run everything else in virtualization at full speed. The new Core 2 Duo boxes can also run both i386 and x86_64 guest OSes, for full coverage.

With this in mind I’ve fiddled around for a while with the main desktop-level virtualization packages on the Mac to get a feel for what’s available… unfortunately the field is pretty thin.

Basically there’s Parallels and the beta of VMware Fusion. There are also some QEMU-based packages, but the last time I tried one it was very unsatisfactory: both slow and unstable.

Parallels

The good:

  • A real shipping product!
  • Relatively inexpensive
  • Good Windows integration (drivers, keyboard and mouse handling, filesystem integration)

The bad:

  • No 64-bit guest support — 32-bit only
  • Guests can only use one CPU core
  • No guest tools support for Linux; GUI desktops are slow and awkward to use
  • No snapshots

Snapshotting is the ability to save the state of a virtual machine, keep running it, and later return to the saved state. You can use this to roll back the installation of experimental software, for instance. *Very* useful when developing and testing software, for obvious reasons.

Since this has been part of VMware Workstation for some time, I had hoped to find it also in…

VMware Fusion

The good:

  • Based on the mature VMware engine
  • Portability of VMs to and from VMware Workstation and Player on other platforms
  • 64-bit guest support
  • Dual-processor guest support
  • Guest tools & drivers for Linux and some other Unix clients as well as Windows
  • Limited support for snapshots

The bad:

  • Still in beta; there’s no shipping product and you can expect problems.
  • Last I checked, networking was horribly broken, but that may be better in beta 2 (I need to try it more)
  • As of beta 2, only a single snapshot per VM is allowed

That single-snapshot limitation is *horrible* from my perspective; it’s totally arbitrary and wrecks much of the feature’s usefulness.

An example of one of my prime uses for snapshots on VMware Workstation was maintaining a single copy of Windows XP in both IE 6 and IE 7 states; I could switch back and forth between them at will, while still using snapshots for more local changes. That’s something I couldn’t do with only a single snapshot available: I’d have to install two separate copies, which would mean a second license. And then I’d still be stuck with only a single snapshot for all my debugging uses!

The quick fix

For now I’ve wiped the disk and installed Ubuntu Linux, so I can run VMware Workstation for Linux. I’ve got the full range of snapshotting features available, and can still use my laptop for Mac client testing and all the other happy shiny Mac OS X goodness.

Of course there were some installation issues… ;)

iMac vs Ubuntu

  • Distorted screen at native resolution with the VESA video driver (the proprietary ATI driver works once fiddled with a bit)
  • Installation fails while setting up the GRUB bootloader in a Boot Camp dual-boot configuration; you have to wipe the disk and install a DOS partition map
  • Sound doesn’t work
  • Doesn’t seem to wake from Suspend
  • … and probably others ;)

Hopefully Parallels will catch up, or VMware will get proper snapshotting into Fusion, and I can someday reinstall Tiger (or perhaps Leopard by then). In the meantime, it looks pretty rockin’ on my desk and VMware actually works!

Cardee’s Jr

I grew up in California and have lived pretty much all my life there. The Carl’s Jr. burger chain, itself born in Southern California, has always been a cultural fixture for me in the western United States. Knowing that they didn’t have Carl’s back east was one of those sad little things about moving to Florida that I was going to have to get used to.

Driving down I-10 in the middle of the South somewhere, though, I started seeing a familiar logo on the Gas-Food-Lodging signs. Perhaps fatigue from the long drive had made me hallucinate? But no, I got a good up-close look at one next to a gas station in northern Florida:

Oh look, a Carl'... wha?

Even the web sites are the same… Carl’s vs Hardee’s

Well, a quick peek through the company history pages indicates that Carl Karcher Enterprises bought out the Hardee’s chain in the 1990s and started rebranding it in the 2000s with the “new look” (apparently the Carl’s logo) and new menu items. It’s kind of… creepy nonetheless. I guess they didn’t want to lose local brand recognition by changing the name, but keeping the Hardee’s name while adopting the Carl’s logo seems kind of weird.

Combining the maps from the store locators, it seems there’s no territorial overlap between the Carl’s and Hardee’s chains:

Carls and Hardees maps combined

Only Oklahoma has both; there’s one Hardee’s way out East, but it’s miles away from the nearest Carl’s. Both chains are missing in the Northeast, which is probably why I haven’t stumbled on any Hardee’s back east before…

Looks like there is a Hardee’s in town here in St. Pete; I’ll have to track it down and see if the inside is as eerily familiar as the outside.

Yay!

My Final Cut Studio crossgrade finally arrived… now I can get back to not making cute animation videos because I’m busy doing something else instead of because Motion doesn’t run on my MacBook.

For the record, this is the most awful software upgrade procedure I’ve experienced.

  1. The previous versions of various Apple pro media apps such as Motion don’t run at all on newer, Intel-based machines. BOO!
  2. The various individual products have been discontinued in favor of the Final Cut Studio bundle which includes a bunch of them. You can no longer get just one. So to run your old copy of Motion on your new MacBook, you have to upgrade to the entire bundle… BOO!
  3. …which they offer a huge cross-grade discount on! YAY!
  4. To get your new installation media, you have to mail in your original installation DVD… which will naturally get lost in the mail. BOO!

The good news is, if you have enough documentation you can talk them into replacing your lost media so you can send in for the upgrade again.

fruit ratings

Apricots: YUMMMMMMM! Easy to slice in half and remove the pit, and verrrry delicious. Dried apricots are also nice and last longer in the cupboard.

Peaches: Similar to their smaller cousin, the apricot, but IMHO they’re harder to work with. The pitting-to-deliciousness ratio is unfavorable.

Blueberries: Pretty awesome when you pick them yourself in the forest. Prepackaged, though, I find them kinda… boring and tasteless. Maybe I just got boring mass-produced Chilean blueberries, though.

Blackberries: Upside: yum! Downside: full of little crunchy seeds.