Was out sick for a couple days, catching up now before the FOSDEM rush.
Got a report that en.planet.wikimedia.org hadn’t been updating since January 30. It turned out to be a UTF-8 BOM problem: one of the maintainers saved the Planet config.ini file in an editor that added a Unicode byte order mark (BOM) at the start of the file.
In theory, the BOM is a great idea. It’s a particular character (U+FEFF) which can act as a signature to tell the computer what variety of Unicode encoding the file is using — UTF-16LE, UTF-16BE, UTF-32, or UTF-8.
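For the curious, here is a rough sketch of what that signature check looks like in Python, using the BOM constants from the standard `codecs` module. The `sniff_bom` helper and the `BOMS` table are names made up for this illustration, not anything Planet or MediaWiki actually ships.

```python
import codecs

# Check the longer signatures first: the UTF-32LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# U+FEFF encoded in different ways yields different leading bytes.
print(sniff_bom("\ufeffhello".encode("utf-8")))
print(sniff_bom("\ufeffhello".encode("utf-16-be")))
print(sniff_bom(b"no signature here"))
```

This is only a guess at the encoding: a file without a BOM tells you nothing, which is exactly the situation most UTF-8 tools assume.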
In practice, in the Unix-y web world, UTF-8 is king… and lots of things that eat UTF-8 don’t know what the hell a BOM is. We can’t use BOMs in our PHP code because PHP sends them to output ahead of everything else in the file, corrupting our headers or binary data. We can’t use them in our Planet configuration files because the Python config file class gets confused by them.
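If you want to guard against this sort of thing, a minimal sketch in Python follows. It assumes the file is supposed to be UTF-8; the `strip_utf8_bom` helper and the sample config contents are invented for the example and are not what Planet itself does.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical config contents, as an editor might save them with a BOM.
raw = codecs.BOM_UTF8 + b"[Planet]\nname = Example\n"

# Option 1: strip the bytes before handing them to a BOM-unaware tool.
clean = strip_utf8_bom(raw)

# Option 2: decode with "utf-8-sig", which skips a leading BOM if there is one.
text = raw.decode("utf-8-sig")

# Plain "utf-8" keeps the BOM as an invisible first character (U+FEFF),
# which is what trips up parsers expecting "[" at the start of the file.
naive = raw.decode("utf-8")
print(repr(naive[0]))
```

The `utf-8-sig` codec is the lazier fix on the reading side, since it handles files with and without a BOM the same way.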
Back in the 1990s they told us Unicode was going to be so easy… *wistful sigh*