TODO: MediaWiki’s MySQL search backend

Some problems and solutions…

Problem 0: Wildcard searches don’t work

  • This was fixed in MediaWiki 1.12! A search for “releas*” now matches “release”, “releasing”, and so on.
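
For illustration, this kind of wildcard corresponds to MySQL's boolean full-text mode, where a trailing * is the truncation operator. A rough Python sketch, not MediaWiki's own code: the helper name build_prefix_query is hypothetical, and the %s placeholder assumes a PyMySQL-style driver.

```python
def build_prefix_query(term):
    """Return (sql, params) for a boolean-mode full-text search.

    A trailing '*' in the term is MySQL's truncation operator, so
    'releas*' matches 'release', 'releasing', and so on.
    """
    sql = (
        "SELECT si_page FROM searchindex "
        "WHERE MATCH(si_text) AGAINST (%s IN BOOLEAN MODE)"
    )
    # Pass the term as a bound parameter; never interpolate user input.
    return sql, (term,)


sql, params = build_prefix_query("releas*")
```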

Problem 1: Minimum length and stopwords

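  • People don’t like it when their searches can’t turn up their favorite acronyms and such: MySQL’s full-text index skips stopwords and any word shorter than its minimum word length (ft_min_word_len, 4 by default).
    You can tweak the MySQL configuration, but only server-wide, and only if you have enough permissions on the server.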

  • We can hack in a transformation like the one we do for Unicode: append x00 or such to short words so that they are long enough to be indexed.
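
A minimal sketch of the padding idea, in Python for brevity. The function name, the default length of 4 (assumed to mirror ft_min_word_len), and the literal "x00" suffix are all placeholders for illustration. The same transformation would have to be applied both when indexing text and when processing search terms.

```python
import re

MIN_WORD_LEN = 4  # assumption: mirrors the server's ft_min_word_len
PAD = "x00"       # placeholder suffix taken from the note above


def pad_short_words(text, min_len=MIN_WORD_LEN, pad=PAD):
    """Append a suffix to every word shorter than min_len."""
    def fix(match):
        word = match.group(0)
        return word + pad if len(word) < min_len else word

    return re.sub(r"\w+", fix, text)


padded = pad_short_words("the GNU project")
```

A side effect worth noting: a padded stopword such as “the” no longer matches the stopword list, so it gets indexed too.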

Problem 2: The table crashes sometimes

  • People often get mystified when the searchindex table is marked crashed.
  • Catch the error: try a REPAIR TABLE transparently, and display a friendlier error if that fails.
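
The catch-and-repair flow could look roughly like this. It is a Python sketch against a generic DB-API cursor; the error numbers (145, 1194, 1195) are my assumption of the codes MySQL reports for a crashed table, and SearchUnavailable and search_with_repair are hypothetical names.

```python
CRASH_ERRNOS = {145, 1194, 1195}  # assumed "table is crashed" codes


class SearchUnavailable(Exception):
    """Friendly error shown to the user instead of a raw SQL error."""


def search_with_repair(cursor, sql, params=()):
    """Run a search; on a crashed-table error, repair once and retry."""
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    except Exception as err:
        # Assumes the driver puts the MySQL errno in err.args[0].
        errno = err.args[0] if err.args else None
        if errno not in CRASH_ERRNOS:
            raise
    try:
        cursor.execute("REPAIR TABLE searchindex")
        cursor.fetchall()  # REPAIR TABLE returns a status result set
        cursor.execute(sql, params)
        return cursor.fetchall()
    except Exception as err:
        raise SearchUnavailable(
            "Search is temporarily unavailable; please try again later."
        ) from err
```

Errors unrelated to a crashed table are re-raised untouched, so the repair path never hides ordinary query bugs.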

Problem 3: Separate title and text search results are ugly and hard to manage

  • People are used to Google-style searches, where you get a single set of results that takes both title and body text into account.
  • Merge the title into the text index and return one set of results only.
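
One simple way to picture the merge, as a Python sketch: prepend the title to the body before it goes into the index. The function name is hypothetical, and whether the title should get any extra weight is left open here.

```python
def build_index_text(title, body):
    """Combine title and body into the single string that gets indexed."""
    # Stored titles use underscores; turn them into spaces so the
    # individual title words are tokenized.
    title_words = title.replace("_", " ")
    return f"{title_words}\n{body}"


indexed = build_index_text("Main_Page", "Welcome to the wiki.")
```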

Problem 4: Needs to join to ‘page’ table

  • The search joins to the ‘page’ table to filter by namespace and redirect status and to return the original page title for result display. These joins can cause ugly, slow lock waits because they mix the InnoDB world (‘page’) with the MyISAM world (‘searchindex’).
  • Denormalize: add fields for namespace, original title, and redirect status to the ‘searchindex’ table, so the search query no longer needs the join.
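
A sketch of what the denormalized table might look like, using Python's built-in sqlite3 as a stand-in for MySQL and a plain LIKE as a stand-in for the full-text MATCH. The three new column names (si_namespace, si_orig_title, si_is_redirect) are made up for illustration; only si_page, si_title and si_text exist today.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the table shape matters.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE searchindex (
        si_page        INTEGER PRIMARY KEY,
        si_title       TEXT,     -- normalized title (existing)
        si_text        TEXT,     -- normalized text (existing)
        si_namespace   INTEGER,  -- hypothetical: copied from page
        si_orig_title  TEXT,     -- hypothetical: title for display
        si_is_redirect INTEGER   -- hypothetical: 0 or 1
    )
    """
)
conn.executemany(
    "INSERT INTO searchindex VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "main page", "welcome to the wiki", 0, "Main_Page", 0),
        (2, "home", "welcome redirect", 0, "Home", 1),
        (3, "main page", "welcome talk", 1, "Main_Page", 0),
    ],
)


def search(conn, word, namespace=0):
    """Filter and fetch display titles from searchindex alone."""
    rows = conn.execute(
        "SELECT si_page, si_orig_title FROM searchindex "
        "WHERE si_text LIKE ? AND si_namespace = ? AND si_is_redirect = 0",
        (f"%{word}%", namespace),
    )
    return rows.fetchall()


results = search(conn, "welcome")
```

The query touches only ‘searchindex’, so namespace filtering, redirect filtering and title display all happen without reading ‘page’. The cost is that the copied fields have to be kept in sync when a page is moved or turned into a redirect.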