On Sun, 2006-02-26 at 16:34 -0800, Benson Wong wrote:
All that said, we still haven't committed to Drupal long-term because for us there is a big issue that hasn't been fully solved: Drupal scalability. I'm pretty confident that Drupal scales, but optimizing servers for Drupal is still a pretty rare skill. (And, yes, we are diving into the forums on the topic.)
I think I blabbed about this at OSCMS in Vancouver, at the Drupal for the Enterprise discussion. My philosophy: go for the lowest-hanging fruit, and work your way up the tree. You can probably quadruple Drupal's pages per second in under half an hour.
Here's my scaling tree. As you progress up the tree, you will find that time, money, maintenance, and headaches all increase.
1. Use a PHP cache:
I found that using APC speeds up Drupal by a lot: 3 to 5 times the page views per second. This was _literally_ a 5 minute install (on FreeBSD) for 3x to 5x the throughput. I think at that point my dev server's SATA HDDs were the bottleneck. It sits beside me, and when I hit it with ab, I can hear the HDDs go wrrrrr like crazy.
This one works... really works... and is a must for any busy Drupal site.
2. Use mod_gzip (or ob_gzhandler in PHP; I prefer mod_gzip, or mod_deflate in Apache2)
The benefits of this are amazing, considering the minimal effort it takes to implement. It doesn't matter if it takes Drupal 0.002 seconds to generate 40K of HTML if it takes 1 to 2 seconds for a client to download it (more if using a modem). mod_gzip usually gives 10% to 80% compression, depending on the size of the pages. Amazing results for 10 minutes of work.
I've never used mod_gzip for performance, only as a cost-control tool.
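For anyone who wants to see the size of the effect before touching the Apache config, here is a rough Python sketch (not Apache or Drupal code) that gzips a page-sized blob and estimates transfer time. The sample markup, the 56 kbit/s modem rate, and the function names are assumptions for illustration. The sample is far more repetitive than a real page, so it compresses better than real HTML would.

```python
import gzip


def compression_savings(body: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by gzip-compressing body."""
    compressed = gzip.compress(body, compresslevel=level)
    return 1 - len(compressed) / len(body)


def transfer_seconds(num_bytes: int, bits_per_second: int) -> float:
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return num_bytes * 8 / bits_per_second


# Hypothetical ~40K page made of repeated markup (an assumption, not real output).
row = '<div class="node"><h2>Title</h2><p>Some body text here.</p></div>\n'
page = (row * 600).encode()

savings = compression_savings(page)
modem_bps = 56_000  # nominal 56 kbit/s modem, assumed

print(f"page size: {len(page)} bytes, gzip saves {savings:.0%}")
print(f"uncompressed over modem: {transfer_seconds(len(page), modem_bps):.1f} s")
compressed_size = len(gzip.compress(page))
print(f"compressed over modem:   {transfer_seconds(compressed_size, modem_bps):.1f} s")
```

Running the same functions against a saved copy of one of your own pages gives a more honest number than the synthetic sample.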
3. Get a faster DB server. I'm thinking of a 3x15K SCSI (RAID 5), dual Xeon MySQL server from freebsdsystems.com for my next installation. These things rip. Expensive (~$5K to $7K) but fast. An average Drupal dev charges like $100 USD an hour these days, right? A super fast DB server is still more bang for your buck than 50 to 70 hrs of code performance tuning.
This will be most effective for large, consistent throughput with multiple web servers. A local database server is far faster in terms of throughput, as long as it is not tight on memory and has a sizable query cache. That being said, if you are going to implement a separate DB server, it is best done over a private gigabit network.
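The hardware-versus-developer-time comparison in point 3 is simple division. A small Python sketch of it, assuming the $100 per hour rate and the $5K to $7K price range quoted above (the function name is made up for illustration):

```python
def break_even_hours(hardware_cost_usd: float, hourly_rate_usd: float) -> float:
    """Hours of developer time that cost the same as the hardware."""
    return hardware_cost_usd / hourly_rate_usd


# Figures quoted in the thread; substitute your own rate and quote.
for cost in (5000, 7000):
    hours = break_even_hours(cost, 100)
    print(f"${cost} server = {hours:.0f} hours of tuning at $100/hr")
```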
4. Get faster (or more) web servers. Maybe not the same specs as above, but fast anyway. If you already have the PHP cache and a separate database server, you should concentrate your resources on processors and memory. Fast drives are less important at this point in the configuration, since almost everything will be cached...
You will also need to figure out how you are going to handle files. I've played with two approaches: 1) an NFS-shared files directory... 2) an external server for file storage (requires a good bit of coding, but can be done).
5. Get a load balancer, or a reverse HTTP proxy (squid) to distribute the load
I haven't tested Drupal behind a true load balancer, but did have difficulties with sessions and authentication behind a geographically distributed Squid cluster... If anyone has had success with Drupal behind Squid, please share that on drupal.org. Similar file issues apply.
6. Do MySQL replication
At present you can only use MySQL replication as a hot standby. Drupal cannot spread its query load across replicas because it does not differentiate between read and write queries.
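To make the read/write distinction concrete, here is a minimal Python sketch (not Drupal code) of what a database layer would have to do before replicas could serve traffic. The `QueryRouter` class, the server names, and the rule that only SELECT/SHOW/DESCRIBE/EXPLAIN count as reads are all assumptions for illustration.

```python
import itertools
import re

# Statements treated as reads. Assumption: anything else is a write.
_READ_RE = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)


def is_read_query(sql: str) -> bool:
    """Classify a statement by its leading keyword."""
    return _READ_RE.match(sql) is not None


class QueryRouter:
    """Send writes to the master and rotate reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, every query goes to the master.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql: str):
        if self._replicas is not None and is_read_query(sql):
            return next(self._replicas)
        return self.master


router = QueryRouter("master", ["replica1", "replica2"])  # hypothetical names
for statement in ("SELECT nid FROM node", "UPDATE node SET status = 1"):
    print(statement, "->", router.pick(statement))
```

A keyword check like this is only a starting point. Reads that must see a write that was just committed, and statements such as SELECT ... FOR UPDATE, would still need to go to the master, since replicas can lag behind it.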
7. Profile / tune Drupal's code (shudder).
I think the community as a whole works at tuning the code and caching the results of complex functions where possible. Removing/disabling all unused modules may help a teensy bit.
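As a generic illustration of caching the result of an expensive function, here is a short Python sketch using the standard library's `functools.lru_cache`. This is not how Drupal's own cache works; `build_menu` and its return value are hypothetical stand-ins for any costly computation.

```python
import functools


@functools.lru_cache(maxsize=128)
def build_menu(role: str) -> tuple:
    """Pretend this walks many database rows; the result is cached per role."""
    return tuple(f"{role}/item{i}" for i in range(3))


build_menu("admin")   # computed on the first call
build_menu("admin")   # served from the cache on the second call
print(build_menu.cache_info())
```

The usual catch applies: cached results have to be invalidated when the underlying data changes, which `build_menu.cache_clear()` does for this sketch.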
Ben.
memcache and sqlrelay are other possibly helpful tools in various configurations. .darrel.