Case-insensitive OpenSearch

I did some refactoring yesterday on the title prefix search suggestion backend, and added case-insensitive support as an extension.

The prefix search suggestions are currently used in a couple of less-visible places: the OpenSearch API, and the (disabled) AJAX search option.

The OpenSearch API can be used by various third-party tools, including the search bar in Firefox. In fact, Wikipedia will be included by default as a search engine option in Firefox 3.0.
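For a sense of what a search bar gets back, here is a minimal Python sketch of an OpenSearch Suggestions style response: a JSON array holding the query string followed by a list of completions. The function name and sample titles are made up for illustration, and this is not the MediaWiki code itself (which is PHP).

```python
import json

def opensearch_suggestions(query, completions):
    """Build a suggestions payload: [query string, [completion, ...]].

    The OpenSearch Suggestions format also allows optional description
    and URL arrays after these two; this sketch leaves them out.
    """
    return json.dumps([query, list(completions)])

# Hypothetical sample data, not real API output.
payload = opensearch_suggestions("Mother", ["Mother Goose", "Motherboard"])
decoded = json.loads(payload)
```

A client such as a browser search box would read the second element and show it as its drop-down list.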

I’m also now using it to power the Wikipedia search backend for Apple’s Dictionary application in Mac OS X 10.5.

We currently have the built-in AJAX search disabled on Wikimedia sites, in part because the UI is a bit unusual, but it’d be great to have it more nicely integrated as a drop-down in the various places where you might be entering page titles.

The new default backend code is in the PrefixSearch class, which is now shared between the OpenSearch and AJAX search front-ends. This, like the previous code, is case-sensitive, using the existing title indexes. Both front-ends now also handle the Special: namespace (which only AJAX search did previously) and return results from the start of a namespace once you’ve typed as far as “User:” or “Image:” etc.
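To illustrate the idea (not the actual PHP code), here is a small Python sketch of a case-sensitive prefix lookup over a sorted title list, with a namespace prefix handled by searching that namespace's own list. The data layout, function names and sample titles are all assumptions made for the example.

```python
import bisect

def prefix_search(sorted_titles, prefix, limit=10):
    """Return up to `limit` titles starting with `prefix` (case-sensitive).

    Relies on the list being sorted, the way a title index would be.
    """
    start = bisect.bisect_left(sorted_titles, prefix)
    results = []
    for title in sorted_titles[start:start + limit]:
        if not title.startswith(prefix):
            break  # past the end of the matching range
        results.append(title)
    return results

def search(index, text, limit=10):
    """Split off a known namespace prefix, then search inside that namespace."""
    ns, sep, rest = text.partition(":")
    if sep and ns in index:
        # An empty `rest` (e.g. "User:") lists titles from the start of the namespace.
        return [f"{ns}:{t}" for t in prefix_search(index[ns], rest, limit)]
    return prefix_search(index[""], text, limit)

# Hypothetical index: namespace name -> sorted titles ("" is the main namespace).
index = {
    "": sorted(["Moth", "Mother Goose", "Mother Theresa", "Motherboard"]),
    "User": sorted(["Alice", "Bob"]),
}

main_hits = search(index, "Mother T")
user_hits = search(index, "User:")
lower_hits = search(index, "mother t")  # no match: the lookup is case-sensitive
```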

More excitingly, it’s now easy to swap out this backend with an extension by handling the PrefixSearchBackend hook.
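The general shape of a swappable backend can be sketched like this in Python. The hook registry, function names and handler arguments here are invented for the example and should not be read as MediaWiki's real hook signature; the only thing borrowed is the hook name and the idea that a handler can take over from the default lookup.

```python
# Hypothetical hook registry; an analogue for illustration only.
_hooks = {}

def register_hook(name, handler):
    _hooks.setdefault(name, []).append(handler)

def run_hooks(name, *args):
    """Call handlers in order. A handler returning False claims the request."""
    for handler in _hooks.get(name, []):
        if handler(*args) is False:
            return False
    return True

TITLES = ["Mother Goose", "Mother Theresa", "Motherboard"]  # sample data

def default_backend(search, limit):
    """Case-sensitive fallback used when no hook takes over."""
    return [t for t in TITLES if t.startswith(search)][:limit]

def title_prefix_search(search, limit=10):
    results = []
    if run_hooks("PrefixSearchBackend", search, limit, results):
        return default_backend(search, limit)  # nobody claimed it
    return results  # filled in by the handler that returned False

def case_insensitive_backend(search, limit, results):
    """Replacement backend: compare case-folded forms instead."""
    folded = search.casefold()
    matches = [t for t in TITLES if t.casefold().startswith(folded)]
    results.extend(matches[:limit])
    return False  # stop here and skip the default backend

before = title_prefix_search("mother ther")   # default, case-sensitive
register_hook("PrefixSearchBackend", case_insensitive_backend)
after = title_prefix_search("mother ther")    # now served by the handler
```

The front-end code calling title_prefix_search does not change at all when the handler is registered, which is the point of routing the lookup through a hook.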

I’ve made an implementation of this in the TitleKey extension, which maintains a table with a case-folded index to allow case-insensitive lookups. This lets you type in, for instance, “mother ther” and get results for “Mother Theresa”.
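A case-folded key table of this kind can be sketched with Python and an in-memory SQLite database. The table and column names below are hypothetical (they are not the extension's real schema), and Python's str.casefold() stands in for whatever folding the extension applies.

```python
import sqlite3

def make_key(title):
    """Case-fold a title so that lookups ignore case."""
    return title.casefold()

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the folded key is indexed, the original title is kept for display.
conn.execute("CREATE TABLE title_key (folded TEXT NOT NULL, title TEXT NOT NULL)")
conn.execute("CREATE INDEX title_key_folded ON title_key (folded)")

for title in ["Mother Goose", "Mother Theresa", "Motherboard", "Moth"]:
    conn.execute("INSERT INTO title_key VALUES (?, ?)", (make_key(title), title))

def search(conn, text, limit=10):
    """Prefix search on the folded key, written as an index-friendly range scan."""
    key = make_key(text)
    upper = key + "\U0010ffff"  # sorts after any ordinary continuation of `key`
    rows = conn.execute(
        "SELECT title FROM title_key "
        "WHERE folded >= ? AND folded < ? ORDER BY folded LIMIT ?",
        (key, upper, limit),
    )
    return [row[0] for row in rows]

hits = search(conn, "mother ther")
shouting = search(conn, "MOTHER")
```

Storing the original title next to the folded key means suggestions still come back in their proper capitalization, whatever the user typed.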

In the future we’ll probably want to power this backend on Wikimedia sites from the Lucene search server, which I believe is getting prefix support re-added in enhanced form.

We might also consider merging the case-insensitive key field directly into the page table, but the separate table was quicker to deploy, and will be easier to scrap if/when we change it. :)