I took a quick look last night through our database error logs for the last week or so, breaking them down by function and error type. Here are the top ten function-err loci:
Hits | Function | errno | Error |
---|---|---|---|
620 | Article::updateCategoryCounts | 1213 | Deadlock found when trying to get lock; Try restarting transaction |
240 | Article::insertOn | 1062 | Duplicate entry ‘N-XXX’ for key 2 |
41 | Article::doDeleteArticle | 1213 | Deadlock found when trying to get lock; Try restarting transaction |
26 | LinksUpdate::incrTableUpdate | 1213 | Deadlock found when trying to get lock; Try restarting transaction |
19 | TitleKey::prefixSearch | 1030 | Got error 28 from table handler |
9 | Title::invalidateCache | 1213 | Deadlock found when trying to get lock; Try restarting transaction |
9 | | 2013 | Lost connection to MySQL server during query |
8 | User::saveSettings | 1205 | Lock wait timeout exceeded; Try restarting transaction |
8 | TitleKey::prefixSearch | 2003 | Can’t connect to MySQL server on ‘XXX’ |
7 | Job::pop | 1213 | Deadlock found when trying to get lock; Try restarting transaction |
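
For what it's worth, a tally like the one above takes very little code once the log entries are parsed. Here's a minimal sketch, assuming each entry has already been reduced to a (function, errno) pair; the `top_error_loci` name and the sample data are made up for illustration and say nothing about the real log format.

```python
from collections import Counter


def top_error_loci(entries, limit=10):
    """Count (function, errno) pairs and return the most frequent ones."""
    counts = Counter((function, errno) for function, errno in entries)
    return counts.most_common(limit)


# Hypothetical, already-parsed log entries.
sample = [
    ("Article::updateCategoryCounts", 1213),
    ("Article::insertOn", 1062),
    ("Article::updateCategoryCounts", 1213),
]

for (function, errno), hits in top_error_loci(sample):
    print(hits, function, errno)
```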
A large chunk of our DB errors come from conflicting transactions; the number one spot is currently taken up by updates to category counts, which are often part of an expensive page deletion transaction.
We’re often pretty lazy about rerunning database transactions when they’re rolled back, throwing an error and making the end-user resubmit the change. This is kind of lame, but at least the transaction rollback theoretically keeps the database consistent.
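
Being less lazy about this could look roughly like the sketch below: rerun the whole transaction a few times when it fails with a deadlock (1213) or lock wait timeout (1205), and only bother the user if it keeps failing. This is an illustration, not our actual code. `DBError` and `run_with_retry` are hypothetical names, and it assumes the `transaction` callable does its own begin/commit and is safe to run again from scratch.

```python
import random
import time

# MySQL error numbers that mean "try restarting transaction".
RETRYABLE_ERRNOS = {1205, 1213}


class DBError(Exception):
    """Hypothetical stand-in for a database driver's exception type."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


def run_with_retry(transaction, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call transaction() and rerun it if it fails with a retryable errno."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DBError as err:
            # Give up on non-retryable errors or once attempts are used up.
            if err.errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so colliding writers spread out.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
```

Errors like 1062 (duplicate entry) are deliberately not retried here, since running the same insert again would just fail the same way.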
The number two spot goes to conflicting page creations (the duplicate-key errors in Article::insertOn), possibly due to automatic resubmissions after a slow save operation.
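There are a few “disk full” errors (the “Got error 28 from table handler” hits; error 28 is the OS reporting no space left on the device), which were probably due to a transitory problem on one DB box.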