We have sysstat (the sar tool) installed, and we see nothing in it besides a high load average until everything gets stuck. It fails to log the real situation when it matters, since by then the server has no resources left. There is no iowait to speak of; everything is between PHP and MySQL. The server has good disks in RAID 1, and iowait reaches 10% at most in bad times. I also run iostat from time to time to see what is going on. I have thought about changing the I/O scheduler too, but in my opinion the current problem has nothing to do with I/O. dmesg doesn't point to anything like that either. We try to dump the Apache/MySQL status once there is a problem (through mytop, server-status and other things); the trouble is that in most cases it is too late and nothing "cooperates" with us.
---------- Forwarded message ----------
From: Khalid B <kb@2bits.com>
Date: Wed, 11 Oct 2006 07:05:58 -0400
Subject: Re: [development] 2k qps
To: development@drupal.org
I have another client site that shoots up to 3,000 QPS, but I don't have the page access statistics for it. This one has 4 Xeons at 3.2 GHz each and 4 GB of RAM.
Write a simple script to run vmstat every 15 seconds or so and log the output to a file. Keep it running until the "hang" happens, and then see what it tells you about the system. Are you out of CPU? Do you have too many blocked or runnable processes? Do you have a lot of wait on I/O? Are you swapping to the point of thrashing?
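A minimal sketch of such a logging script, in Python. The script name, the log file argument and the `stamp` helper are illustrative assumptions, not something from this thread. Each line is timestamped and pushed to disk right away, so the last samples before a hang are more likely to survive.

```python
#!/usr/bin/env python3
"""Log timestamped vmstat samples so the moments before a hang end up on disk."""
import os
import subprocess
import sys
import time

INTERVAL = 15  # seconds between samples


def stamp(line, now=None):
    """Prefix one line of vmstat output with a local timestamp."""
    when = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return when + " " + line.rstrip("\n")


def log_vmstat(log_path, interval=INTERVAL):
    """Run 'vmstat <interval>' and append every line it prints to log_path."""
    # With a single numeric argument vmstat keeps sampling until it is killed.
    proc = subprocess.Popen(["vmstat", str(interval)],
                            stdout=subprocess.PIPE, text=True)
    with open(log_path, "a") as log:
        for line in proc.stdout:
            log.write(stamp(line) + "\n")
            log.flush()
            os.fsync(log.fileno())  # do not leave samples sitting in a buffer


if __name__ == "__main__":
    if len(sys.argv) == 2:
        log_vmstat(sys.argv[1])
    else:
        print("usage: vmstat_log.py LOGFILE", file=sys.stderr)
```

Start it in the background (for example under nohup), leave it running, and read the tail of the log after the next hang.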
Only after you have done this analysis should you decide whether to upgrade the RAM or just tune things.
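As a rough sketch of that analysis, the snippet below reads one vmstat header line and one data row and reports which of the questions above the sample points to. The column names (r, b, si, so, id, wa) are the standard Linux vmstat ones; the thresholds and the `ncpu` default are illustrative assumptions, so adjust them to your own machine.

```python
def parse_sample(header, row):
    """Map vmstat column names (r, b, si, so, id, wa, ...) to integer values."""
    return dict(zip(header.split(), map(int, row.split())))


def diagnose(sample, ncpu=4):
    """Return rough findings for one sample. Thresholds are guesses, not rules."""
    findings = []
    # Many runnable processes and almost no idle time: out of CPU.
    if sample["r"] > 2 * ncpu and sample["id"] < 10:
        findings.append("cpu-bound")
    # Blocked processes together with high wait time: stuck on I/O.
    if sample["b"] > 0 and sample["wa"] > 20:
        findings.append("io-wait")
    # Pages moving both in and out of swap: possible thrashing.
    if sample["si"] > 0 and sample["so"] > 0:
        findings.append("swapping")
    return findings


header = "r b swpd free buff cache si so bi bo in cs us sy id wa"
row = "12 0 0 100 10 10 0 0 0 0 100 200 90 8 2 0"
print(diagnose(parse_sample(header, row)))  # prints ['cpu-bound']
```

One sample proves little; look for findings that repeat over the minutes leading up to the hang.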
Regarding decreasing the number of Apache processes: yes, that means you can only serve so many users at once, but you prevent the chaos caused by swapping. Think of it as a popular restaurant. If they let in more people than they have tables and seats for, would the customers be happy? Would the staff cope?
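To put a number on "how many seats", here is a back-of-the-envelope sketch. All three figures are hypothetical: measure the real per-process size (for example the resident size of an Apache child in top) and decide how much RAM to keep back for MySQL and the OS on your own server.

```python
def max_apache_processes(total_mb, reserved_mb, per_process_mb):
    """Largest process count whose combined memory fits in the RAM left after the reserve."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return max(0, (total_mb - reserved_mb) // per_process_mb)


# Hypothetical numbers: 4 GB box, 1.5 GB kept for MySQL and the OS,
# 40 MB per Apache process running PHP.
print(max_apache_processes(4096, 1536, 40))  # prints 64
```

The result is a ceiling for the Apache process limit (MaxClients in Apache 1.3/2.x), not a target; staying below it leaves some headroom before the box starts to swap.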