Web Performance Cache I Hates You!

Apache in OS X Server has a feature¹ called the web performance cache, which is the bane of my existence. It has limited use (mostly for high-volume static sites), but it is enabled by default for every new site you create, and enabling it for just one site can affect the behavior of every site on the box.

Invariably someone creates a new site on one of our servers and fails to disable the perf cache. Bad things ensue. I get grumpy.

Server Admin makes it difficult to discover which of the 100 or so sites on our primary web servers have the perf cache enabled (click-click, options, close, click-click, options, close… sigh). Thankfully there is another way. Fire up the terminal and type:

cat /etc/webperfcache/webperfcache.conf

This command will list the contents of the webperfcache.conf file. Any sites listed will have the perf cache enabled. Turn it off, I beg you.
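If the file is long, a small filter can make the list easier to scan. This is only a sketch: the function name list_cache_entries is made up for illustration, and it assumes that comment lines in webperfcache.conf start with a # character, which I have not verified against the actual file format.

```shell
#!/bin/sh
# Print the non-blank, non-comment lines of the perf cache config.
# Assumption: comment lines begin with '#' (optionally indented).
# list_cache_entries is a hypothetical helper name, not an Apple tool.
list_cache_entries() {
    conf_file=${1:-/etc/webperfcache/webperfcache.conf}
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$conf_file"
}
```

Calling list_cache_entries with no argument reads the default path used above; pass a different path to check a copy of the file.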

¹ This “feature” has cost me more time than a zillion years’ worth of performance benefit.

One step forward…

As I mentioned here, we upgraded our mail server to Tiger. All looked good, until we started to see this in our logs:

Potential VM growth in DirectoryService since client PID: 0, 
  has 550 open references when the warning limit is 500.

According to posts in Apple’s OS X Server forum (here, here, and the tail end of here) it looks like this nasty problem involves servermgrd and/or DirectoryService. The number of open references grows continuously until it renders your server unusable – usually not considered a good thing.

I don’t have a fix for the problem.

I do, however, have a band-aid.

I created a Perl script in /usr/local/bin/ called plugleak.pl. The script checks system.log for the telltale error message above. If it finds it, it restarts both servermgrd and DirectoryService.
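The logic is simple enough to sketch in a few lines of shell. To be clear, this is not the plugleak.pl script itself (that one is Perl and is not reproduced here); the function names and the 200-line window are made up for illustration, and restart_services is a placeholder because the right restart commands depend on your setup.

```shell
#!/bin/sh
# Sketch of the check-and-restart idea; NOT the original plugleak.pl.
PATTERN='Potential VM growth in DirectoryService'

# Succeed (exit 0) if the warning shows up in the last N lines of a log.
# The default window of 200 lines is an arbitrary assumption.
leak_detected() {
    log_file=$1
    lines=${2:-200}
    tail -n "$lines" "$log_file" 2>/dev/null | grep -q "$PATTERN"
}

# Placeholder: swap in whatever actually restarts servermgrd and
# DirectoryService on your server. Here it only reports the intent.
restart_services() {
    echo "would restart servermgrd and DirectoryService"
}

# Restart the services only when the warning is present.
plug_leak() {
    if leak_detected "$1"; then
        restart_services
    fi
}

# Example call (commented out so sourcing this file has no side effects):
# plug_leak /var/log/system.log
```

One thing to watch with this approach: the warning stays inside the tail window for a while, so a periodic run may restart the services more than once for the same message. Remembering the timestamp of the last match would avoid that.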

Using Lingon (a very nice little GUI for launchd items) I created a periodic task (/Library/LaunchDaemons/com.pimedia.plugleak.plist) to run the plugleak.pl script every 10 minutes.
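For anyone who would rather skip the GUI, a job like this boils down to a small property list. The sketch below prints roughly what such a plist might contain; it is a reconstruction from the label, script path, and interval mentioned above, not a copy of the file Lingon generated, and the function name print_plugleak_plist is invented for the example. Label, ProgramArguments, and StartInterval are standard launchd keys, with StartInterval given in seconds.

```shell
#!/bin/sh
# Print a minimal launchd job definition that runs the script every
# 600 seconds (10 minutes). Reconstructed sketch, not Lingon's output.
print_plugleak_plist() {
    cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.pimedia.plugleak</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/plugleak.pl</string>
    </array>
    <key>StartInterval</key>
    <integer>600</integer>
</dict>
</plist>
EOF
}
```

Redirecting the output to a file under /Library/LaunchDaemons/ and loading it with launchctl load should have the same effect as creating the task in Lingon.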

Not necessarily an elegant solution. But, well, ya do what ya gotta do.

Update

I don’t do a lot of admin support anymore, but I do scan the Mac OS X Server mailing list, and I came across a post that led me to this thread on the Darwin Kernel list that goes a long way to explaining what is going on here.

Very interesting and worth a read.

OS X Cyrus DB corruption fix?

We’ve been battling Cyrus database corruption on OS X Panther for longer than I care to admit.

We don’t have an exceptional number of users (something a little north of 500) but some of their mailboxes are quite large (i.e., greater than 2GB). Just restarting the Panther mail services can take over an hour while Cyrus does its integrity check and brings IMAP up. Reconstructing a corrupted Cyrus DB takes even longer (several hours in fact).

To compound that pain, every time we reconstruct the DB the read status of all of the users’ mail gets reset. So not only is mail down for the duration of the repair, but even if the corruption happens off hours our users will know about it because all of their mail is marked as unread the next time they log in.

Generally this was a lousy place to be, especially with a service as essential to our users as mail. Clearly something had to change. We’d even begun to investigate alternatives to Cyrus and/or the OS X mail service (Courier, CommunigatePro, etc).

So anyway, the other day I was doing some searching and came across a number of posts that identified stability issues with Movable Type and Berkeley DB on Panther. These may have been specific to MT rather than BDB but it got me thinking.

I started looking through the info-cyrus mail list archives and I came across several posts that seemed to indicate that using BDB was not recommended any more. Instead they recommended using skiplist.

Now this was getting good! A quick search turned up this post on AFP548 demonstrating how to modify a script from Apple to change your Panther Cyrus install to use skiplist instead of BDB. The interesting thing was that the original script is used by the Tiger installer to switch your Cyrus DB if you are upgrading from Panther.

Cool! We were planning on upgrading to Tiger anyway, so this gave us that extra needed push. We scheduled some downtime, took a snapshot of the Server boot volume, and ran the Tiger upgrade and the subsequent updates (to Tiger 10.4.4). Everything went swimmingly.

We started the mail servers and mail came up – not in hours – not in minutes – in seconds. I was blown away. Other than having to replace our shared iCal directories from backup, all of the services worked without a hitch.

It’s too soon to tell if this has returned long-term stability to the box, but so far the results are very encouraging. If you have been battling the Cyrus DB stability issues on Panther, I highly recommend looking at upgrading to Tiger or at the very least changing from BDB to skiplist using the script above.