Performance tuning a server in less than three minutes while being slashdotted
So you wrote a blog post about something that seemed fairly innocuous, but for whatever reason, it caught the attention of one of the major sites and now your server load is at 110 and climbing, the ssh command line session is taking thirty seconds to respond to anything at all, and given that your post is on the front page of slashdot at primetime, this doesn’t look like it’s a temporary blip. What do you do?
Okay, first things first. You don’t have time to do a proper fine-tuning session. You need a quick & dirty tuneup here. Proper fine tuning you leave till after the traffic spike, and you can then come back at it with a plan and decent tools like siege and so on – but this is the “fix it in three minutes” version. So if you see stuff here that looks crude to you, despair not, everything looks crude when you’re trying to do it in that kind of timeframe.
First, you have to be able to actually use the server command line. The load’s at 110 – that’s not coming down anytime soon and until it does you can’t do the rest of the work you need to since it takes so much time for even a shell to respond. The load is being caused by Apache and MySQL taking on more work than the server can handle, causing the server to swap excessively; and you’ve got to dump that work or shut off the incoming new work to recover. You can try sudo killall -9 apache2 if you can get an open ssh terminal in to do the job (and it’s the first thing you should try), but the odds are that that server has to be reset. Whether that means a phone call to the data centre, or a walk down the hall to the server room, or just clicking a button in a web interface, that’s the first thing to do. Don’t hold off, because unless everyone stops reading your page right now, that load’s not coming down.
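If you can get a shell at all, that whole triage step is only a couple of commands – a sketch, assuming a Debian-style box where the Apache processes are named apache2:
[cc lang=”bash”]uptime                    # confirm it really is load and not a network problem
sudo killall -9 apache2   # dump the work Apache has taken on
uptime                    # the load should start falling once the swapping stops[/cc]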
Once the box has rebooted (and I mean immediately – sit there watching ping until it comes back), ssh in and shut down apache. MySQL is okay for now, but the work is coming from Apache, so that has to stay shut down until everything’s ready to go again. You’re going to lose a few minutes of service, yes, but a blog can recover from that (and if this is something more serious than a blog, you’re going to be in trouble for not properly spec’ing the hardware to begin with anyway, so limit the damage now and take your licks later).
At this point — whether you’ve just logged in or whether you managed to run killall successfully — if it’s a debian server you should run sudo /etc/init.d/apache2 stop (did I mention I love debian for having scripts like that? No man-paging apachectl, just an easy to remember standard interface for every service on the box. Wonderful). It’ll tidy up from killall, or it’ll shut down apache cleanly, depending on how you got here.
I’m going to use my server as the example here, by the way, since it’s what got burned last night and it’s what prompted me to write this refresher — it’s been two or three years since I last had to maintain a server that was near its capacity, so the experience was a bit of a flashback 🙂 So, some background on my server – I moved my blog from wordpress.com to here, on a Hetzner server. It’s their entry-level dedicated server offering (2Gb of RAM, a 64-bit Athlon, 160Gb of hard drive in a hardware RAID-1 array and a 1Gbit NIC) — all running Debian Lenny (and no, I’ve no relationship with Hetzner, they were just the cheapest of the places various friends recommended). WordPress is up to date on my server (2.9.2 at the time of writing), as is the Lenny install — if you don’t have the latest security fixes and such in place, or your WordPress is outdated, then that’s probably adding to your problem, but for a quick fix like this, that’s too big a job. Get through the traffic spike and deal with it later.
And yes, that server spec is overkill for my needs really – but I had a bunch of side projects like RangeClerk (don’t bother, not much is up yet) and the blog for Herself Indoors and her book and some other things I wanted to run as well that would be using weird php and python modules and libraries and the like; and I just hate cpanel and not being able to install anything I wanted. Plus, it was cheap 😀
Right, back to it.
The first thing we need to do is to sort out MySQL’s configuration. Open up the my.cnf file, wherever you’ve put it (it’ll be /etc/mysql/my.cnf for a stock Debian install). We need to tweak just a few settings. First off, key_buffer. This is probably the most critical variable here because by default all the tables will be MyISAM tables (if they’re not, then this isn’t so critical). It’s set to about 16Mb by default; we’re going to turn that up quite a bit. On a dedicated database box, this would be set very high – anything up to 50% of the total available memory. In this case, with a full stack on the one box, we set it a bit lower since Apache’s going to want a lot of RAM too – 256Mb will do for a starting value.
Next we’re going to disable the InnoDB engine completely to cut down on MySQL’s footprint. Again, WordPress by default isn’t using it. Just ensure skip-innodb is either uncommented or inserted into my.cnf.
Lastly, we’re going to enable the query cache. The thing is, MySQL’s query cache is a fairly blunt instrument. If a query is precisely the same the second time it comes in, it’ll hit the cache – but any change at all, no matter how small, and it misses the cache. So it’s not as enormously useful as you’d first imagine. However, it does help, so we’ll increase its size modestly (48Mb of RAM is sufficient here). So our changes to my.cnf look like so:
[cc lang=”ini”]
key_buffer = 256M
query_cache_limit = 16M
query_cache_size = 48M
skip-innodb[/cc]
Once those changes are made, sudo /etc/init.d/mysql restart will get the MySQL server up and running with the new setup.
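It’s worth a ten-second check that the new values actually took – a quick look with the stock mysql client (assuming you have the root password to hand) will do:
[cc lang=”bash”]mysql -u root -p -e "SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'query_cache%';
SHOW VARIABLES LIKE 'have_innodb';"[/cc]
key_buffer_size should come back as 268435456 (256M in bytes), and have_innodb should read DISABLED.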
Once that’s done, let’s look to the next level in the stack – Apache. Under debian the config files are arranged differently than normal; the configuration changes we’ll make will be in /etc/apache2/apache2.conf, but in other installations they would be in httpd.conf or elsewhere. The default Apache install uses the prefork MPM – one single-threaded process per request. It’s older, slower and less efficient than the threaded worker MPM, but it’s the safe choice because mod_php and plenty of PHP libraries aren’t thread-safe. So find the prefork MPM config settings in apache2.conf. They should look like this in a default install:
[cc escaped=”true” lang=”apache”]
<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0
</IfModule>[/cc]
We’re going to cut down a lot on how much work Apache takes on at once here. Yes, some users will have to wait a few seconds to see your page – but right now, with the load at 110 and climbing, they could wait until their browser timed out and never see anything. So we reduce slightly the number of server processes Apache starts off with to handle requests, from 5 to 4; we increase the number of spare servers it keeps around to hand requests to (we want to reduce the overhead of starting and stopping those processes), from 10 to 12. We also set an upper limit on how many processes it can run in total, and we keep that to just under 100. This works on my system, which is an entry-level box; you might get away with more, but for now use these settings to get up and running, and you can increase them a bit and check again as you go (this guide really isn’t aimed at big sites anyway, just small ones like mine which were caught on the hop). Finally, we make sure no Apache process hangs around too long by limiting how many requests each process serves before it’s recycled – we’ll keep it low for now (3), but it can be increased later. So our changed config settings now look like this:
[cc escaped=”true” lang=”apache”]<IfModule mpm_prefork_module>
StartServers 4
MinSpareServers 4
MaxSpareServers 12
ServerLimit 96
MaxClients 96
MaxRequestsPerChild 3
</IfModule>
[/cc]
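Before starting Apache back up, it’s worth a two-second syntax check so a typo in apache2.conf doesn’t cost you another round of downtime (using the stock Debian apache2ctl):
[cc lang=”bash”]sudo apache2ctl configtest   # should print "Syntax OK"[/cc]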
Okay. At this stage, you have two options. The first is to start Apache up again and get back to work. Odds are, this will hold up pretty well – but you want to keep a window open with htop running in the background to keep an eye on things (mainly you’re watching the swap space usage and the load. The former’s critical, the latter indicative that a problem’s arising – if either goes sideways, kill apache and edit apache2.conf, setting even lower values for ServerLimit, MaxClients and MaxRequestsPerChild, before restarting apache). If that’s your preferred option, skip to the end of this post.
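If htop isn’t already on the box, it’s one aptitude away – and a plain watch loop over free and uptime does much the same job:
[cc lang=”bash”]sudo aptitude install htop
watch -n 5 'free -m; uptime'   # swap usage and load average, refreshed every five seconds[/cc]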
However, if you want to take that extra step, we could install memcached quickly here. It’s a very effective load reducer and under debian, it’s far easier than you’d expect:
[cc lang=”bash”]sudo aptitude install build-essential php5-dev php-pear memcached[/cc]
And let that haul in whatever other libraries it needs, then:
[cc lang=”bash”]sudo pecl install memcache[/cc]
And once that’s done, edit the php.ini file (in Debian, that’ll be /etc/php5/apache2/php.ini ) and insert this (anywhere in the file will do, but the extensions section is the tidiest):
[cc lang=”ini”]extension=memcache.so[/cc]
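It’s also worth a quick check that the memcached daemon itself is up and listening before leaning on it – assuming the Debian default of listening on port 11211:
[cc lang=”bash”]sudo /etc/init.d/memcached restart
sudo netstat -ltnp | grep 11211   # should show memcached listening[/cc]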
That should be memcached installed and running in a default configuration (we can fine-tune later). We now need to drop in the backend that WordPress uses to take advantage of memcached. Download object-cache.php into the wp-content directory of your website and fix the permissions and ownership of the file:
[cc lang=”bash”]cd [insert your www/wp-content directory here]
sudo wget http://plugins.trac.wordpress.org/export/215933/memcached/trunk/object-cache.php
sudo chown www-data:www-data object-cache.php
sudo chmod 644 object-cache.php[/cc]
And that’s it done. Quick, dirty, and everything at default, but that’s a three-minute setup for you (well, maybe five if you do the memcached setup as well, and I am assuming you have a fast net connection for the aptitude step, but still).
Now, restart apache and everything should fire up with memcached caching a lot of requests and keeping the server load to a manageable level.
[cc lang=”bash”]sudo /etc/init.d/apache2 force-reload
sudo /etc/init.d/apache2 restart[/cc]
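And a last sanity check that the site is actually answering before you walk away from the terminal (assuming curl is installed – wget will do the same job):
[cc lang=”bash”]curl -sI http://localhost/ | head -n 1   # expect an HTTP 200, or a 301 pointing at your blog's URL[/cc]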
And once that traffic spike is past… take the time to tune it properly!
Just in case, you were using APC right?
The WP SuperCache plugin is another quick win for WordPress users.
Hi,
I don’t really get the MaxRequestsPerChild 3 setting here. This setting defines[1] how often the child is killed and respawned (in your case after every three requests). Doesn’t killing / spawning children so often actually hurt performance here?
[1] http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxrequestsperchild
Another idea for that kind of temporary hit on one unique page – one that I think would save you more than memcache, and that would work on any site whether or not you have a memcache plugin ready to be installed – is to just save the html of the page into a static html file and force-serve that for the particular url that gets all the hits, with a RewriteRule /url/blah /path/to.html [L]
Cheers,
Jordi
MaxRequestsPerChild 3 …. what??
I would set that to 1000. If you only let a child serve 3 requests before dying and reforking, you’re going to spend a lot of time in system calls making and destroying processes.
But to tune Apache properly, you need to install a modern web server like Nginx… hah.
Hi : ) nice article
Depending on the website and which article is causing the problem, you could consider an even dirtier and more efficient tuning hack : )
To push performance to the limit you could add a temporary rewrite rule to the htaccess and place a static html copy of that page somewhere in the web root heheehhe ultimate caching performance 😉
Sure, it’s a useless hack for dynamic websites, but still… if it’s just one article that is killing the server and it will only last for a day, you might get away with some banners/tags/menus etc. being cached statically.
In the meantime you can do all the above and consider real caching + ‘useless calls reduction’, but hey… your page is showing already so no pressure : )
On another note, setting MaxRequestsPerChild = 3 kind of kills the purpose here, I guess. What it does is not “a process can’t handle more than 3 concurrent requests” but “a process can handle 3 requests and then dies”. When it dies a new process will be spawned, so effectively you get a new apache spawned every 3 hits : / ouch. The default of 0 means a process will not die and can run forever. Or did I get something wrong here?
Art
We’ve seen quite a few of these spikes at several of our sites, and the first times around on a new vm there was one thing that killed us each time: swapping. As soon as Apache gets tired and hogs up a bit more memory than you initially planned and the traffic comes in, things will start swapping. And when things start swapping, you’re out of luck.
I’d also suggest taking a look at varnish, which you can drop in quite quickly with the standard settings to be able to handle a very high load. The default configuration (depending on the version you install, probably) will however skip any cache handling for requests with cookies, which may come from tracking scripts such as google analytics. There’s pre-made rules to get around that problem for certain cookies, or you can just allow it to cache all content, regardless of any current cookies.
The key to getting varnish up and running is to simply move apache to port 81, set that apache as the backend in varnish and put varnish on port 80.
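(For reference, the swap Mats describes is only a few small edits – a sketch, assuming the Debian varnish package with its stock /etc/default/varnish and /etc/varnish/default.vcl, and that Apache’s ports are set in /etc/apache2/ports.conf; the exact VCL syntax varies a little between varnish versions:)
[cc lang=”bash”]sudo aptitude install varnish
# 1. In /etc/default/varnish, have varnishd listen on port 80, e.g.
#      DAEMON_OPTS="-a :80 -T localhost:6082 -f /etc/varnish/default.vcl -s malloc,256m"
# 2. In /etc/varnish/default.vcl, point the default backend at Apache on 127.0.0.1 port 81
# 3. In /etc/apache2/ports.conf, change "Listen 80" (and any NameVirtualHost *:80) to 81
sudo /etc/init.d/apache2 restart
sudo /etc/init.d/varnish restart[/cc]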
Nope, APC was on the “to-do list”. The problem was, this server’s my side project site, and work has been eating every waking hour for a few months so progress along the TDL has been seriously slow.
WP SuperCache _is_ installed; but since this was a three-minute guide, several good measures weren’t mentioned. Installing WP-SuperCache or WP-TotalCache, changing out from Apache to lighttpd or nginx, installing more RAM (turns out I can’t do that with this box 🙁 ) — all very good approaches which I’ve tried in the past, but you can’t get them done in three minutes 😀 The whole point of this post was to be the server version of first aid, not scheduled surgery, and if you’ve ever seen or received CPR, you’ll know exactly what I mean when I say that first aid’s not exactly clean and tidy 😀
In practice here, increasing MaxRequestsPerChild in the prefork MPM drove load up, not down. It’s not supposed to do that under normal usage, you’re completely correct. I have two theories as to why it acted oddly.
Yup, that works too… until someone tries to comment on that page, at which point it goes a bit sideways.
But yes, for anything that can be made static, that’s how to do it. Here though, WP-SuperCache was doing that and the load was still passing 130 at times (I’m actually surprised the server was still responsive at the shell at all at those points).
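For anyone who wants to try Jordi’s approach, the rule goes something like this in the site’s .htaccess – the URL and filename here are made up, so swap in your own permalink and wherever you saved the snapshot:
[cc lang=”apache”]# Serve a saved static copy for the one URL that's being hammered
RewriteEngine On
RewriteRule ^my-popular-post/?$ /slashdotted.html [L][/cc]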
Yup, like I said to Mikko, it’s very odd. But what I saw here was that when that was increased (I had it set to 2000 at one point), load shot up much faster; turning it down to single digits improved performance. I have theories as to why; I don’t know for certain, and it’s kind of bugging me a bit if I’m blunt about it.
Jordi beat you to it Artur 😀
Doesn’t work so well here, because comments were on, but at the worst point I used WP-SuperCache’s lockdown function which does almost exactly what you’re suggesting, but in an easier-to-manage way.
Yup Mats, it’s the swap that kills you every time! That’s the idea behind ServerLimit – don’t let Apache run too many processes and trigger that first swap, because if it swaps once, odds are that that’s the first stumble and a few minutes later the whole server faceplants at speed.
Varnish is something I’ve been talking about with Conor (the sysadmin for boards.ie) actually, but it’s not really a three-minute first aid thing. Even memcached is a bit of a stretch for the first aid mindset 😀
Next to WP Super Cache I can recommend Semiologic Cache from the SemPro package. It has good memcache support and goes that needed bit further than WP Super Cache. But that’s the real problem: don’t blame Apache / MySQL for the way WordPress misuses them.
Some more information about caching on WP can be found over at Ask Apache.
“You want the processes torn down rapidly instead of hanging about waiting on someone who’s hit the stumble button twenty seconds ago and is now off browsing fark while you wait on them.”
I have to disagree here. The control method for that scenario is KeepAliveTimeout. I would suggest changing the default 15 to much lower, such as 2 seconds. This prevents the server from keeping a process reserved for 15 seconds.
Forking processes can put a serious load on your server. It’s dog slow, and really bogs down your server especially if it has to fork dozens of processes fast, like when you’re slashdotted. You only want to tear down processes when they’re leaking memory, which you mentioned. Therefore, I have found the following pattern works very well.
Set MinSpareServers to around 20 so you have enough processes started and waiting to serve. Set MaxSpareServers a bit higher, such as 30. Set MaxRequestsPerChild to 1000 or something high, so you only kill processes when they’ve done plenty of work. MaxClients should be set to the number of processes your apache can run without swapping, and definitely not too high. You can calculate this number by dividing the amount of RAM available to Apache by the maximum memory used by any Apache process. ServerLimit is set automatically and can be removed from the config.
Also, do not underestimate the effect of a byte-code cache such as APC, or Zend Optimizer+. It will avoid the PHP engine having to compile your PHP application for each and every request and will speed up each request quite a lot, at the expense of some memory. It will help a lot.
The rest of your tips are great, so thanks for the article.
GREAT! Messed up the GD apache2.conf file.
Seems that by following your tutorial on performance tuning, I mistakenly changed a different block instead of the one you suggested. Now whenever I hit restart in debian squeeze for apache2 and mysql, everything seems ok.
Except when I go to my site on the net, there is just a blank screen.
NOW what do I do?