Looks like we had a problem this morning with our domain redirection configuration, which broke access to the site for at least some people for a while.
Wikimedia has a lot of non-default domains registered, which we set up as redirects to the various primary domains — for instance www.wikipedia.com redirects to www.wikipedia.org, the standard location for Wikipedia’s multilingual entry portal.
We handle this with a dedicated Apache virtual host which accepts connections for all the domains we don’t actually host wikis on. That virtual host has a bunch of mod_rewrite rules which decide which primary domain each request should be sent on to. It returns an HTTP redirect response to the browser, which then goes on to the correct site.
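As a rough illustration of that lookup (a Python sketch, not our actual mod_rewrite rules): the table below holds only the one alias mentioned above, and the function name, the `http://` scheme, and the choice to keep the request path are all assumptions made for the example.

```python
from typing import Optional, Tuple

# Illustrative alias table. The single entry is the example from this post;
# the real configuration covers many more domains.
ALIAS_TO_PRIMARY = {
    "www.wikipedia.com": "www.wikipedia.org",
}


def redirect_for(host: str, path: str = "/") -> Optional[Tuple[int, str]]:
    """Return (status, location) for an alias domain, or None for other hosts."""
    target = ALIAS_TO_PRIMARY.get(host.lower())
    if target is None:
        # Not an alias domain: no redirect should be issued at all.
        return None
    # Assumption: the path is carried over unchanged to the primary domain.
    return 301, f"http://{target}{path}"
```

The important property is the `None` branch: a host that isn't in the alias table should never receive a redirect.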
For efficiency, many of these responses are declared to be cacheable (“301 Moved Permanently”), since they always point to the same destination. This means that repeated hits to the same redirected URL are served from our Squid proxy caching layer, reducing traffic to our backend servers.
The unfortunate thing is that if the configuration gets messed up and people are sent to the *wrong* URL, that’s also cached. This morning, a maintenance edit to the redirect config file accidentally broke it, creating redirect loops for some URLs which weren’t supposed to redirect in the first place.
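To make "redirect loop" concrete, here is a small Python sketch that follows a redirect table from a starting host and reports the cycle if it ever comes back to a host it has already visited. The `rules` dictionary and the function name are hypothetical stand-ins for the real config, not something we actually run.

```python
from typing import Dict, List, Optional


def find_redirect_loop(rules: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow redirects from `start`; return the looping chain, or None."""
    seen: List[str] = []
    host = start
    while host in rules:
        if host in seen:
            # We are back at a host we already visited: report the cycle.
            return seen[seen.index(host):] + [host]
        seen.append(host)
        host = rules[host]
    # Reached a host with no redirect rule, so the chain terminates.
    return None
```

With a sane table (`www.wikipedia.com` pointing at `www.wikipedia.org`) this returns `None`. If a bad edit also makes `www.wikipedia.org` redirect back to `www.wikipedia.com`, it returns the chain that a browser would be stuck in.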
To fix it, we’ve been restarting the Squid proxies and clearing their caches to ensure that all bad redirects are flushed out of the system.
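Why fixing the config alone wasn't enough can be shown with a toy cache in Python. This is a deliberately simplified model and not how Squid works internally (it ignores expiry, headers, and everything else a real proxy does); the class and method names are made up for the example.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status, redirect target)


class RedirectCache:
    """Toy caching layer that remembers 301 responses until flushed."""

    def __init__(self, backend: Callable[[str], Response]) -> None:
        self.backend = backend
        self.store: Dict[str, Response] = {}

    def get(self, url: str) -> Response:
        if url in self.store:
            # Cache hit: the backend is never consulted, right or wrong.
            return self.store[url]
        response = self.backend(url)
        if response[0] == 301:
            # Permanent redirects are treated as cacheable.
            self.store[url] = response
        return response

    def flush(self) -> None:
        # Drop everything, forcing fresh answers from the backend.
        self.store.clear()
```

In this model, once a bad 301 has been stored, correcting the backend changes nothing for that URL until `flush()` is called, which is the same reason we had to clear the proxy caches.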
As part of our ongoing mission to create permanent fixes to known site maintenance problems, we’re moving up some improvements that were already on our list but that we hadn’t gotten to yet:
- Proper version control for the relevant config files
- Staging server for web server configuration changes — something we can test against in the live environment but which doesn’t pollute the primary web caches if it breaks while we’re testing it