Wikipedia WAP portal updated

We’ve got a semi-experimental mobile portal for Wikipedia, based on the Hawpedia code using Hawhaw, that’s been up for a while.

I’ve updated it to the current version of the code, which seems to handle some templates better, as well as producing proper output for iPhone viewing. :)

Today’s fancy phones with their fancy browsers (the iPhone, Opera Mini, etc.) can do a pretty good job handling the “real web” in addition to the stripped-down, limited “mobile web” of yesteryear, but mobile devices still come with different pressures that one should take into account when targeting them.

Screens are small and bandwidth is low. Wikipedia articles tend to be very long and thorough, but often all you need for an off-the-cuff lookup is the first couple of paragraphs. The WAP gateway splits pages into shorter chunks, so you don’t have to wait to download the entire rest of the page (or wait for the slow phone CPU to lay it out).
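To give a rough idea of what that chunking might look like, here’s a minimal sketch in Python. This is not the Hawpedia code; the function name split_into_chunks and the max_chars budget are made up for illustration, and a real gateway would work on wiki markup rather than plain paragraphs.

```python
def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars.

    Hypothetical helper for illustration only. A single paragraph longer
    than max_chars is kept whole in its own chunk rather than being cut.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        # Count the blank-line separator only when the chunk already has content.
        added = len(para) + (2 if current else 0)
        if current and current_len + added > max_chars:
            # This paragraph would overflow the budget: close the current chunk.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(para)
        current.append(para)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The phone then only has to fetch and lay out the first chunk, with the later ones available behind a “next” link.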

Even on an iPhone capable of rendering the whole article and the MonoBook skin in all its glory, I find there are some strong benefits to a shorter, cleaner page for doing quick lookups on the go. (Especially if I’m not on Wifi!)

The biggest problem with the Hawpedia gateway today is that it tries to do its own hacky little wiki text parser, which dies horribly at times. Many pages look fine, but others end up with massive template breakage and become unreadable.

Long-term it may be better to do this translation at a higher level, working from the output XHTML… or else in an intermediate stage of MediaWiki’s own parser, with more semantic information still available.

Case-insensitive OpenSearch

I did some refactoring yesterday on the title prefix search suggestion backend, and added case-insensitive support as an extension.

The prefix search suggestions are currently used in a couple of less-visible places: the OpenSearch API interface, and the (disabled) AJAX search option.

The OpenSearch API can be used by various third-party tools, including the search bar in Firefox — in fact Wikipedia will be included by default as a search engine option in Firefox 3.0.
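For the curious, a third-party tool’s request might be built something like this. It’s only a sketch: the helper name opensearch_url is made up, the endpoint URL in the usage note is an assumed example, and the parameter names (action=opensearch, search, limit, format) are what I’d expect api.php to accept, so check your own wiki’s API help before relying on them.

```python
from urllib.parse import urlencode


def opensearch_url(api_base: str, query: str, limit: int = 10) -> str:
    """Build a prefix-suggestion request URL for a MediaWiki api.php endpoint.

    Hypothetical helper; it only builds the URL and makes no network call.
    """
    params = {
        "action": "opensearch",  # the OpenSearch suggestion module
        "search": query,         # the prefix the user has typed so far
        "limit": limit,          # maximum number of suggestions wanted
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"
```

Calling opensearch_url("https://en.wikipedia.org/w/api.php", "mother ther", 5) gives a URL you can paste into a browser to see the suggestions.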

I’m also now using it to power the Wikipedia search backend for Apple’s Dictionary application in Mac OS X 10.5.

We currently have the built-in AJAX search disabled on Wikimedia sites in part because the UI is a bit unusual, but it’d be great to have it more nicely integrated as a drop-down in the various places where you might be inputting page titles.

The new default backend code is in the PrefixIndex class, which is now shared between the OpenSearch and AJAX search front-ends. This, like the previous code, is case-sensitive, using the existing title indexes. I’ve also got both of them now handling the Special: namespace (which only AJAX search did previously) and returning results from the start of a namespace once you’ve typed as far as “User:” or “Image:” etc.

More excitingly, it’s now easy to swap out this backend with an extension by handling the PrefixSearchBackend hook.

I’ve made an implementation of this in the TitleKey extension, which maintains a table with a case-folded index to allow case-insensitive lookups. This lets you type in, for instance, “mother ther” and get results for “Mother Theresa”.
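The idea behind a case-folded index is simple enough to sketch in a few lines of Python. This is an in-memory illustration only, not the extension’s actual code or table schema; the class name TitleKeyIndex and its methods are invented for the example, and a sorted list stands in for the indexed database column.

```python
import bisect


class TitleKeyIndex:
    """Toy case-insensitive prefix index (hypothetical, for illustration)."""

    def __init__(self, titles):
        # Store (folded key, original title) pairs, sorted by the folded key,
        # much as an indexed key column would be sorted in a database table.
        self._entries = sorted((title.casefold(), title) for title in titles)
        self._keys = [key for key, _ in self._entries]

    def prefix_search(self, prefix, limit=10):
        key = prefix.casefold()
        # Binary-search to the first folded key that is >= the folded prefix.
        start = bisect.bisect_left(self._keys, key)
        results = []
        for folded, title in self._entries[start:]:
            # Matches are contiguous in sorted order, so stop at the first miss.
            if not folded.startswith(key) or len(results) >= limit:
                break
            results.append(title)
        return results
```

With titles such as “Mother Theresa”, “Mother Goose” and “Motherboard” loaded, prefix_search("mother ther") finds “Mother Theresa” no matter how the prefix is capitalized, while the original casing is what gets returned.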

In the future we’ll probably want to power this backend at Wikimedia sites from the Lucene search server, which I believe is getting prefix support re-added in enhanced form.

We might also consider merging the case-insensitive key field directly into the page table, but the separate table was quicker to deploy, and will be easier to scrap if/when we change it. :)

MediaWiki security bump

Did a security release yesterday: MediaWiki 1.11.1, 1.10.3, and 1.9.5. I noticed that some of the output formats for api.php are susceptible to HTML injection through a longstanding problem with Internet Explorer’s content-type autodetection.

We’ve had protection against this in action=raw mode for years, but it didn’t make it into the API as nobody had quite noticed that some output formats were vulnerable. JSON and XML-based formats aren’t, but PHP serialization and YAML don’t escape strings as much as we might like.

If the format lets you pass some raw HTML tags, and you can stick an additional fake path after the script name in the URL (as allowed by most configurations), MSIE opens up a big XSS hole on your site.

Path components in URLs are supposed to be opaque; the HTTP content-type header is the only thing that’s supposed to specify what kind of resource you’re loading. Microsoft thinks it knows better, though — if it recognizes one of several pre-defined “extension”s at the end of the “filename” on the URL, it sniffs the file’s actual content to try to determine the file type. If it sees certain HTML tags, it’ll interpret it as HTML — even for valid GIF and PNG files!

(Rumor is that last hole has finally been fixed in recent Windows security updates; GIF and PNG headers will override the HTML detection. I haven’t tested to confirm this though.)

For “raw” and “API” type stuff where we have to pass through user data, we can protect against the autodetection by ensuring the URL hasn’t been tampered with. Having both an unrecognized URL and an unrecognized content-type keeps the content sniffy away… That’s why you currently get a ‘403 – Forbidden’ if you just toss ?action=raw on the end of a page URL.
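In very rough terms the check amounts to something like the sketch below. It’s a simplification, not MediaWiki’s actual code: the function names and the /w/index.php script path are assumptions for the example, and the real check has more cases to worry about.

```python
from urllib.parse import urlsplit


def is_untampered_raw_request(request_url: str, script_path: str = "/w/index.php") -> bool:
    """Return True only if the URL path is exactly the script path.

    Hypothetical helper. Anything else (a pretty page URL, or a fake
    "filename" tacked on after the script name) counts as tampered.
    """
    return urlsplit(request_url).path == script_path


def status_for_raw(request_url: str) -> int:
    # Serve raw content only on the canonical script URL; refuse otherwise.
    return 200 if is_untampered_raw_request(request_url) else 403
```

So /w/index.php?title=Foo&action=raw is served, while /w/index.php/evil.html?title=Foo&action=raw and /wiki/Foo?action=raw both get the 403.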

User-to-user mail SPF and privacy borkage

Per bug 12655

On our newer, Ubuntu-based Apache configuration we’ve been using sSMTP as a minimal local SMTP sending agent. This emulates the ‘sendmail’ binary and simply passes messages off to a hub server with no local queuing… but it’s not without its problems.

sSMTP forces the message’s ‘From’ header and the SMTP envelope sender address to be the same, which causes some problems for us when that ‘From’ address is a user’s offsite e-mail:

  • Servers which validate SPF records may reject the messages outright
  • In case of delivery problems, bounce messages will be sent back to the user, possibly including the recipient’s address which is supposed to be kept private.

As a workaround for such configurations I’ve introduced a config var $wgUserEmailUseReplyTo. When set, a wiki-specific address is used as ‘From’, and the user’s address is put in ‘Reply-To’.
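The effect on the outgoing headers looks roughly like this. It’s a Python sketch rather than the real MediaWiki mailer; build_user_email, its parameters and the wiki@example.org address are all made up for illustration, with the use_reply_to flag standing in for the config var.

```python
from email.message import EmailMessage


def build_user_email(sender_addr, recipient_addr, subject, body,
                     wiki_addr="wiki@example.org", use_reply_to=True):
    """Build a user-to-user mail (hypothetical helper, illustration only)."""
    msg = EmailMessage()
    if use_reply_to:
        # The wiki's own address goes in From, so the envelope sender that
        # sSMTP copies from it stays on our domain; replies still reach the user.
        msg["From"] = wiki_addr
        msg["Reply-To"] = sender_addr
    else:
        # Old behavior: the user's offsite address is used directly as From.
        msg["From"] = sender_addr
    msg["To"] = recipient_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

With the flag on, any bounce goes back to the wiki’s address instead of to the sending user.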

This is uglier — you don’t see a clean ‘Sender’ column in your mail client — but mails will get through and private data won’t get tossed around inappropriately.

In the long term I’d like to see us either dump sSMTP (a local-only postfix or something would work fine) or patch it to let the envelope sender be set separately.

Mobile MediaWiki

I’d like to revive some interest in improving support for mobile browsers.

Extremely limited WAP-based browsers are at least sort of served by the experimental WAP gateway, but there are a lot of smartphones and other handheld devices that get on the “real” web with greater or lesser degrees of success, and I’d like to see us improve the default look & feel of MediaWiki on them.

At the moment I think we can roughly divide the mobile browsers into two categories:

  • Those that render much like a full desktop browser and let you zoom as necessary (iPhone/Mobile Safari, Opera Mini, …?)
  • Those that have very limited CSS and JavaScript or strip a lot of stuff down (Opera Mini in “mobile view” mode, most others?)

At the moment, all I’ve got access to are an iPhone and the Opera Mini simulator applet, so that’s what I’ll be putting the occasional bit of time into. These already pretty much “just work”, but the UI can be very awkward due to the desktop-size layout; I’d like a cleaner handheld stylesheet that lets most pages be legible when you get to them.

If you’ve got another device and you’d like to help with testing and developing for it, please stake your claim.

Alternatively if you’ve got a spare device you can donate to us, that’d be great too! (Especially if it doesn’t need a service subscription to get on the net…)

Google Transit WTF?

One of the great joys of my life is Google Maps. It’s attractive, fast, easy to use, usually gives workable directions, and lets you do cute things like customize your route with a simple drag-and-drop (ooooh).

Plus it’s built in to iPhones now. :D

But… When you’ve gone carless in the big city, the driving directions aren’t always helpful — you want to find the best metro or bus route to take.

There’s a fairly complete online trip planner for Bay Area transit at 511.org, but it fails in every way that Google Maps succeeds — it’s ugly, slow, confusing, and if you want to adjust to an alternate route it’s almost impossible to figure out how.

Google keeps taunting me with a little link on Maps to “take public transit” which never ever is able to find any directions from anywhere to anywhere.

Eventually I discovered that it does give you directions… but only if you put in exact train stations as your start and end points. If I already know which stations I need, I hardly need a transit planner, now do I?

Even if it did work, here in San Francisco it includes BART but not MUNI, which has more in-town rail lines and about a hojillion buses and thus is far more likely to be what I’d take.

Update 2008-05-08: It actually works now!

Mandatory Apple reactions

Like every other Apple fanboy, of *course* I have to post my reactions to the Macworld announcements…

iPhone updates: Firmware updates are welcome, but nothing earth-shattering.

AppleTV movie rentals: Potentially very cool… The price point is about even with going to a Blockbuster store, without having to get off your ass. I will say though I’ve gotten pretty comfortable with Netflix’s flat monthly fee and huge selection.

Limited selection for TV is the reason I haven’t used my AppleTV much in months except for playing music in the living room… My shows are available on iTunes, but half of my lady’s aren’t, so we ended up getting cable.

If the selection’s decent and downloads actually *do* start “in seconds”, the new AppleTV software should be perfect for spur-of-the-moment rentals, but if we’re already paying a flat fee for Netflix I don’t have much incentive to use it frequently… unless I have to see something *now*, I can just put it on my queue and wait.

I’m assuming of course that the software update will come for older AppleTV units…

MacBook Air: A year or two ago this would have been the answer to my prayers: the fairly compact form factor of the MacBook, while being thinner and lighter than my first love, the PowerBook G4 12″. I’m a bit leery of the lack of an optical drive and losing some of the wired ports, but it’d make a great travel / conference / meeting machine, and everything but the FireWire can be replaced with USB extras for “what do you mean, there’s no WiFi?” emergencies.

I have the suspicion though that the iPhone’s going to eat up a lot of my computer-on-the-go requirements; it’s already got mail and web, and an official SDK should let us see extra apps come in (chat, organizers, games ;) that lessen the need to lug a laptop around town.

Seeing Apple lurch towards solid-state drives is verrrrrrrrry exciting, but the cost is still high and the capacities too small for a primary-use computer (my iPhoto or iTunes libraries *each* would fill the optional 64GB SSD, and they’re only going to get bigger).

Now if we can just get the pervasive connectivity that the iPhone delivers built in to the laptops…

Sweeeet

Hadn’t noticed this before… on Leopard, when you do a window screenshot (command-shift-4, space) it now captures the window’s drop shadow over a transparent background.

Shadow! Shit yeah

That’s pretty cool for demo screenshots; I used to use temporary white backgrounds and capture an area around the window manually, but this is way prettier. :D