Top 10 Wikimedia DB errors

Last night I took a quick look through our database error logs for the past week or so, breaking them down by function and error type. Here are the top ten function/error combinations:

Hits  Function                       errno  Error
 620  Article::updateCategoryCounts  1213   Deadlock found when trying to get lock; Try restarting transaction
 240  Article::insertOn              1062   Duplicate entry ‘N-XXX’ for key 2
  41  Article::doDeleteArticle       1213   Deadlock found when trying to get lock; Try restarting transaction
  26  LinksUpdate::incrTableUpdate   1213   Deadlock found when trying to get lock; Try restarting transaction
  19  TitleKey::prefixSearch         1030   Got error 28 from table handler
   9  Title::invalidateCache         1213   Deadlock found when trying to get lock; Try restarting transaction
   9  (none)                         2013   Lost connection to MySQL server during query
   8  User::saveSettings             1205   Lock wait timeout exceeded; Try restarting transaction
   8  TitleKey::prefixSearch         2003   Can’t connect to MySQL server on ‘XXX’
   7  Job::pop                       1213   Deadlock found when trying to get lock; Try restarting transaction
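The breakdown above boils down to grouping log entries by (function, errno) and counting hits. Here is a minimal Python sketch of that kind of tally, assuming a hypothetical log format with one error per line and tab-separated function, errno, and message fields; the real Wikimedia error log format isn't shown here, and `top_error_loci` is an illustrative name, not an existing tool.

```python
from collections import Counter

def top_error_loci(lines, n=10):
    """Return the n most common (function, errno) pairs with a sample message.

    Assumes a hypothetical log format: one error per line, with the
    function name, errno, and error message separated by tabs.
    An empty function field is kept as "" (like the unattributed row above).
    """
    counts = Counter()
    messages = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip malformed lines rather than guessing
        function, errno, message = parts
        key = (function, int(errno))
        counts[key] += 1
        messages.setdefault(key, message)  # keep the first message seen
    return [
        (hits, function, errno, messages[(function, errno)])
        for (function, errno), hits in counts.most_common(n)
    ]

sample = [
    "Article::updateCategoryCounts\t1213\tDeadlock found when trying to get lock",
    "Article::insertOn\t1062\tDuplicate entry for key 2",
    "Article::updateCategoryCounts\t1213\tDeadlock found when trying to get lock",
]
for hits, function, errno, message in top_error_loci(sample):
    print(hits, function, errno, message)
```

Grouping on errno rather than the full message text keeps entries such as the duplicate-key errors, whose messages embed the specific key value, in a single bucket.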

A large chunk of our DB errors come from conflicting transactions: in the top ten alone, deadlocks (errno 1213) account for 703 of the 987 hits. The number one spot is currently taken by updates to category counts, which are often part of an expensive page deletion transaction.

When a transaction is rolled back, we’re often pretty lazy about rerunning it; instead we throw an error and make the end user resubmit the change, even though the deadlock message itself says to try restarting the transaction. This is kind of lame, but at least the rollback should in theory keep the database consistent.
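Rerunning automatically instead of bouncing the error to the user could look roughly like the following. This is a minimal Python sketch, not MediaWiki code: `DeadlockError` is a hypothetical stand-in for whatever exception a driver raises on errno 1213, and `run_with_retry`, `max_attempts`, and `base_delay` are illustrative names and defaults.

```python
import random
import time

class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL errno 1213."""

def run_with_retry(txn, max_attempts=3, base_delay=0.05):
    """Run txn(); if it is rolled back by a deadlock, rerun it from scratch.

    txn must be safe to rerun: it should do all of its reads and writes
    inside the transaction, so a rollback leaves nothing half-applied.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the user, as today
            # Sleep a random, growing interval so the two conflicting
            # transactions are less likely to collide again immediately.
            time.sleep(base_delay * attempt * random.random())

# Demo: a fake transaction that deadlocks twice, then succeeds.
calls = {"n": 0}

def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "committed"

print(run_with_retry(flaky_txn))  # prints "committed" after two retries
```

The important design constraint is that the whole transaction is rerun from the top, since InnoDB rolls back the entire transaction on a deadlock; retrying only the failed statement would act on reads that no longer hold.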

The number two spot (errno 1062, a duplicate key in Article::insertOn) seems to be conflicting page creations, possibly due to automatic resubmissions after a slow save operation.
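One way to keep a duplicate creation from surfacing as a raw database error is to catch the unique-key failure and look up the row that won. The sketch below uses Python’s built-in sqlite3 as a stand-in for MySQL, where `sqlite3.IntegrityError` plays the role of errno 1062; the simplified `page` table with a unique (namespace, title) key and the `create_page` helper are hypothetical illustrations, not MediaWiki’s actual schema or code.

```python
import sqlite3

def create_page(conn, namespace, title):
    """Insert a page row, or report the existing one on a duplicate key.

    Returns (page_id, created). created is False when another request
    (possibly a resubmission of this one) already created the page; the
    caller decides whether that counts as success or as an edit conflict.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute(
                "INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
                (namespace, title),
            )
            return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The unique (namespace, title) key rejected the insert, analogous
        # to MySQL's "Duplicate entry ... for key" (errno 1062).
        row = conn.execute(
            "SELECT page_id FROM page WHERE page_namespace = ? AND page_title = ?",
            (namespace, title),
        ).fetchone()
        return row[0], False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    " page_id INTEGER PRIMARY KEY,"
    " page_namespace INTEGER NOT NULL,"
    " page_title TEXT NOT NULL,"
    " UNIQUE (page_namespace, page_title))"
)
print(create_page(conn, 0, "Example"))  # (1, True)
print(create_page(conn, 0, "Example"))  # (1, False)
```

Returning a flag rather than silently succeeding matters: a duplicate could be the same user’s resubmission or a genuinely concurrent creation by someone else, and only the caller can tell the two apart.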

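There are a few “disk full” errors from TitleKey::prefixSearch (“Got error 28 from table handler”; OS error 28 is ENOSPC, “No space left on device”), which were probably due to a transitory problem on one DB box.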
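The number in “Got error 28 from table handler” is, in this case, an operating-system errno passed up from the storage layer, which is why it reads as “disk full”. MySQL’s `perror` utility decodes such codes; the same lookup can be sketched with Python’s standard `errno` and `os` modules, assuming a Linux host (errno values are platform-specific).

```python
import errno
import os

# Decode the OS-level error number embedded in the MySQL message.
code = 28
print(errno.errorcode[code])  # ENOSPC on Linux
print(os.strerror(code))      # No space left on device on Linux
```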