We’ve been battling Cyrus database corruption on OS X Panther for longer than I care to admit.
We don’t have an exceptional number of users (something a little north of 500), but some of their mailboxes are quite large (i.e., greater than 2 GB). Just restarting the Panther mail services can take over an hour while Cyrus runs its integrity check and brings IMAP up. Reconstructing a corrupted Cyrus DB takes even longer (several hours, in fact).
To compound that pain, every time we reconstruct the DB, the read status of all of the users’ mail gets reset. So not only is mail down for the duration of the repair, but even if the corruption happens off-hours, our users will know about it because all of their mail is marked as unread the next time they log in.
Generally, this was a lousy place to be, especially with a service as essential to our users as mail. Clearly something had to change. We’d even begun to investigate alternatives to Cyrus and/or the OS X mail service (Courier, CommuniGate Pro, etc.).
So anyway, the other day I was doing some searching and came across a number of posts that identified stability issues with Movable Type and Berkeley DB on Panther. These may have been specific to MT rather than BDB but it got me thinking.
I started looking through the info-cyrus mailing list archives and came across several posts indicating that using BDB was no longer recommended. Instead, they recommended using skiplist.
Now this was getting good! A quick search turned up this post on AFP548 demonstrating how to modify a script from Apple to change your Panther Cyrus install to use skiplist instead of BDB. The interesting thing was that the original script is used by the Tiger installer to switch your Cyrus DB if you are upgrading from Panther.
Cool! We were planning on upgrading to Tiger anyway, so this gave us that extra needed push. We scheduled some downtime, took a snapshot of the server boot volume, and ran the Tiger upgrade and the subsequent updates (to Tiger 10.4.4). Everything went swimmingly.
We started the mail servers and mail came up: not in hours, not in minutes, but in seconds. I was blown away. Other than having to replace our shared iCal directories from backup, all of the services worked without a hitch.
It’s too soon to tell if this has returned long-term stability to the box, but so far the results are very encouraging. If you have been battling the Cyrus DB stability issues on Panther, I highly recommend looking at upgrading to Tiger or, at the very least, changing from BDB to skiplist using the script above.
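For reference, here is a minimal sketch of what selecting skiplist looks like in a generic Cyrus 2.2-or-later configuration. This is an assumption about a stock install, not the contents of the Apple script: the config file location, the database paths, and which databases you choose to switch may all differ on your server. Existing databases also have to be converted before the new setting is usable (Cyrus ships a cvt_cyrusdb utility for that), so treat this as an illustration rather than a procedure.

```
# imapd.conf (excerpt) -- assumes Cyrus 2.2 or later, where the
# database backend is chosen at runtime rather than at compile time.
# Which of these you change is a judgment call; shown here are the
# mailbox list and the per-user seen (read/unread) state.
mboxlist_db: skiplist
seenstate_db: skiplist

# Converting an existing database (paths are hypothetical; stop the
# mail services and back up the file first):
#   cvt_cyrusdb /var/imap/mailboxes.db berkeley /var/imap/mailboxes.db.new skiplist
```

The converted file then replaces the original before the mail services are started again.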