[drupal-devel] Drupal performance testing
Hello,

We, Trellon and CivicSpace Labs, are doing Drupal performance testing for a client. We will be sharing the results with the community. We identified three areas to focus on for performance testing.

1) First, we would like to reduce the memory requirements of running Drupal so that we can run more Apache processes.
-Please advise if there are particular configurations of Drupal that you recommend to reduce memory usage.
-Are there particular contributed modules that we should avoid because of memory issues? We are looking at flexinode, event, location, and event finder, as well as organic groups.
We are looking at Robert Douglass's and Mike Gifford's work on memory profiling:
http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/robertdouglass/profiler/
http://lists.drupal.org/archives/drupal-devel/2005-08/msg00314.html
http://lists.drupal.org/archives/drupal-devel/2005-08/msg00347.html
http://lists.drupal.org/archives/drupal-devel/2005-08/msg00344.html

2) We would like to implement a better caching strategy. We are aware of several implementations. Once the site becomes an active community of posters, caching becomes less useful.
-We are planning on using Jeremy Andrews' caching patch for testing.
-We would welcome other suggestions.

3) MySQL-specific optimizations. John Paul Ashenfelter is a very experienced performance tuner and is heading up the scalability project.
-We are interested in MySQL cache optimizations.
-In particular, we are interested in MySQL replication topologies as well as using the Memory table type for some of the problematic data.
-If you have experience tuning large or high-performance sites with MySQL, we would appreciate your insight.

We are looking at results from RPMs that have been tuned for scalability and security as defined here: http://www.sourcelabs.com/SourceLabsApacheMySQLPHPTestResults.pdf. This tells us that our limitations are processors for PHP support and RAM for MySQL support in the testing environment.

As always, we need technically savvy people who can help with documentation or are willing to help produce diagrams to prove to potential users that Drupal can scale.

Your help and comments are appreciated.

Kieran
You definitely want to test with a node_access module like organic groups if you intend to use one in production. The presence of this sort of module adds some complexity to some key queries. Any large PHP site that dreams of being 'Google fast' has to use an opcode cache; see http://drupal.org/node/2603. This is the biggest bang-for-the-buck improvement you can make. On the downside, it is rumored to segfault under certain conditions, and I saw that a bit myself when uploading files into Drupal.
2) We would like to implement a better caching strategy. We are aware of several implementations. Once the site becomes an active community of posters, caching becomes less useful. -We are planning on using Jeremy Andrews' caching patch for testing. -We would welcome other suggestions.
I don't think there is a caching patch in the queue. Just test with HEAD Drupal and you have our best effort. The traffic pattern matters a lot here: lots of registered users yields a far heavier load than lots of anonymous users (the latter is the Slashdot scenario).

Drupal's highest performing cache setting is called 'Loose'. Any high-traffic site will want this, so I see no reason to test otherwise. At one time, we thought that there must be a bug in its implementation because drupal.org was not experiencing a good cache hit ratio. In the end, I think we saw that some misbehaved crawlers and hosts were slamming the site and requesting many obscure pages not in the cache. One of them was AskJeeves! There was lots of discussion, starting with this post: http://lists.drupal.org/archives/infrastructure/2005-05/msg00123.html and continuing into the next month. This sort of optimization really requires Drupal knowledge and not just 'optimization experience.'
3)MySQL specific optimizations- John Paul Ashenfelter is a very experienced performance tuner and is heading up the scalability project. -We are interested in MySQL cache optimizations. -In particular MySQL replication topologies as well as using the Memory table type for some of the problematic data. -If you have experience tuning large or high performance sites with MySQL we would appreciate your insight.
I recently enabled the MySQL query cache on some of my sites and saw a big boost. I encourage folks to read the intro at http://dev.mysql.com/tech-resources/articles/mysql-query-cache.html. Once you've enabled this, I think the DB becomes a minor performance concern. Of course, that depends a lot on which modules you use and how much data you have (thousands of terms, thousands of users, etc.).
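To judge whether the query cache is actually paying off on a given site, one rough measure is its hit ratio. The following is a minimal sketch, assuming the counters have been copied by hand from MySQL's SHOW STATUS output (Qcache_hits for cached answers, Com_select for SELECTs that were really executed); the sample numbers are made up for illustration.

```python
def query_cache_hit_ratio(status):
    """Return the fraction of SELECTs answered from the query cache.

    `status` maps MySQL status variable names to integer counters,
    e.g. values copied from SHOW STATUS output.
    """
    hits = status["Qcache_hits"]
    # Com_select counts SELECTs that were actually executed (cache misses).
    misses = status["Com_select"]
    total = hits + misses
    return hits / total if total else 0.0


# Made-up counters, for illustration only.
sample = {"Qcache_hits": 7500, "Com_select": 2500}
print(f"hit ratio: {query_cache_hit_ratio(sample):.2f}")
```

A low ratio on a busy site would suggest that writes are invalidating cached results faster than they can be reused.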
On 19 Aug 2005, at 04:23, Moshe Weitzman wrote:
Drupal's highest performing cache setting is called 'Loose'. Any high-traffic site will want this, so I see no reason to test otherwise. At one time, we thought that there must be a bug in its implementation because drupal.org was not experiencing a good cache hit ratio. In the end, I think we saw that some misbehaved crawlers and hosts were slamming the site and requesting many obscure pages not in the cache. One of them was AskJeeves! There was lots of discussion, starting with this post: http://lists.drupal.org/archives/infrastructure/2005-05/msg00123.html and continuing into the next month. This sort of optimization really requires Drupal knowledge and not just 'optimization experience.'
The current loose caching scheme doesn't work for drupal.org, where lots of nodes/comments get posted. I modified drupal.org's cache_clear_all() function to ignore 19 out of 20 cache flushes. Without that hack, drupal.org would have fallen over already. In other words, loose caching still needs to be looked at.
-- Dries Buytaert :: http://www.buytaert.net/
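The "ignore 19 out of 20 flushes" hack can be illustrated with a toy cache. This is only a sketch in Python, not the drupal.org code: the class and method names are hypothetical, and it assumes a simple counter decides which flush is honoured (the real hack may have chosen differently, for example at random).

```python
class ThrottledCache:
    """Toy page cache that honours only every Nth flush request."""

    def __init__(self, flush_every=20):
        self.flush_every = flush_every  # 20 => ignore 19 out of 20 flushes
        self.flush_requests = 0
        self.pages = {}

    def set(self, path, html):
        self.pages[path] = html

    def get(self, path):
        return self.pages.get(path)

    def clear_all(self):
        """Count the request; empty the cache on every Nth call.

        Returns True when the cache was actually flushed.
        """
        self.flush_requests += 1
        if self.flush_requests % self.flush_every != 0:
            return False
        self.pages.clear()
        return True


cache = ThrottledCache(flush_every=20)
cache.set("node/1", "<html>...</html>")
flushed = [cache.clear_all() for _ in range(40)]
print(flushed.count(True))  # 2
```

The trade-off is visible in the sketch: cached pages survive far longer under heavy posting, but readers may see stale pages until the next honoured flush.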
On 19 Aug 2005, at 04:23, Moshe Weitzman wrote:
You definitely want to test with a node_access module like organic groups if you intend to use one in production. The presence of this sort of module adds some complexity to some key queries.
There are dozens of parameters (MySQL configuration, Apache configuration, opcode cache, MySQL cache, hardware, single-server vs multi-server setup, Drupal version, Drupal modules installed, Drupal settings, etc.). What we really need is a script or tool that allows us to benchmark a particular setup. The script/tool should take an input file with one or more navigation patterns or scenarios to simulate/benchmark. We could use the same script to evaluate the performance impact of certain code changes. Then, with the script/tool, we can benchmark and tune existing configurations and write up what we learn.

In addition to such a tool, we need tools to profile the site while it is being benchmarked. Some profiling can be done by using certain Drupal modules/patches (e.g. the one I made to profile the cache behavior). For other things, you might need other tools.
-- Dries Buytaert :: http://www.buytaert.net/
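To make the idea of a scenario-driven benchmark script concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the "name: path, path, ..." input format, the function names, and the fake_fetch stand-in, which would be replaced by a real HTTP request against the setup under test.

```python
import statistics
import time


def parse_scenarios(text):
    """Parse 'name: path, path, ...' lines into {name: [paths]}."""
    scenarios = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, paths = line.partition(":")
        scenarios[name.strip()] = [p.strip() for p in paths.split(",") if p.strip()]
    return scenarios


def run_scenario(paths, fetch, repeat=3):
    """Request each path in order, `repeat` times; return per-request seconds."""
    timings = []
    for _ in range(repeat):
        for path in paths:
            start = time.perf_counter()
            fetch(path)
            timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to a few headline numbers."""
    return {
        "requests": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }


def fake_fetch(path):
    """Stand-in for an HTTP GET against the site under test."""
    time.sleep(0.001)


# Hypothetical navigation patterns; a real file would come from the site.
SCENARIOS = """
anonymous_browse: /, /node/1, /node/2
registered_post: /user/login, /node/add/story, /node/3
"""

for name, paths in parse_scenarios(SCENARIOS).items():
    print(name, summarize(run_scenario(paths, fake_fetch)))
```

Because the fetch function is passed in, the same scenarios can be replayed before and after a configuration or code change and the summaries compared.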
On Fri, 19 Aug 2005, Dries Buytaert wrote:
On 19 Aug 2005, at 04:23, Moshe Weitzman wrote:
You definitely want to test with a node_access module like organic groups if you intend to use one in production. The presence of this sort of module adds some complexity to some key queries.
There are dozens of parameters (MySQL configuration, Apache configuration, opcode cache, MySQL cache, hardware, single-server vs multi-server setup, Drupal version, Drupal modules installed, Drupal settings, etc.). What we really need is a script or tool that allows us to benchmark a particular setup.
Ack. This way you could change your setup and check how performance changes.
The script/tool should take an input file with one or more navigation patterns or scenarios to simulate/benchmark.
I had come across tools that allow specification of such patterns. Typical patterns should ideally be extracted from the site in question as each site is different. We'd need a script to extract them from the accesslog.
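A pattern extractor along these lines could be sketched as follows. This is an illustration only: it assumes the log has already been read into (visitor, timestamp, path) tuples, which is a hypothetical shape and not the actual accesslog schema, and it assumes a 30-minute gap separates two sessions.

```python
from collections import Counter, defaultdict


def extract_patterns(rows, session_gap=1800, top=3):
    """Return the most common per-session path sequences.

    `rows` is an iterable of (visitor, timestamp, path) tuples.
    A gap of more than `session_gap` seconds starts a new session.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, path in rows:
        by_visitor[visitor].append((timestamp, path))

    patterns = Counter()
    for hits in by_visitor.values():
        hits.sort()  # chronological order per visitor
        session = []
        last_seen = None
        for timestamp, path in hits:
            if last_seen is not None and timestamp - last_seen > session_gap:
                patterns[tuple(session)] += 1
                session = []
            session.append(path)
            last_seen = timestamp
        patterns[tuple(session)] += 1  # close the final session
    return patterns.most_common(top)


# Made-up log rows, for illustration only.
rows = [
    ("a", 100, "node"), ("a", 160, "node/1"),
    ("b", 200, "node"), ("b", 230, "node/1"),
    ("c", 300, "node"), ("c", 320, "forum"),
    ("a", 5000, "forum"),
]
print(extract_patterns(rows, top=1))
```

The most frequent sequences could then be written out as scenarios for a benchmark run, so that the simulated load resembles the real traffic of the site in question.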
We could use the same script to evaluate the performance impact of certain code changes.
Right.
Then, with the script/tool, we can benchmark and tune existing configurations and write up what we learn.
In addition to such a tool, we need tools to profile the site while it is being benchmarked. Some profiling can be done by using certain Drupal modules/patches (e.g. the one I made to profile the cache behavior). For other things, you might need other tools.
While doing some benchmarks on redLED's machines earlier this year I used a script that monitored about 165 system parameters over time. Should be enough info for everybody... Cheers, Gerhard
is rumored to segfault under certain conditions, and I saw that a bit myself when uploading files into Drupal.
Rumored? eAccelerator was tried at one point on Drupal to decrease load and was a total success -- it segfaulted Apache regularly. No Apache, no load :). However, new eAccelerator releases have appeared since then, but as far as I know, Kjartan has not tried it again.
Friday, August 19, 2005, 12:05:08 PM, Karoly Negyesi wrote:
is rumored to segfault under certain conditions, and I saw that a bit myself when uploading files into Drupal.
Rumored? eAccelerator was tried at one point on Drupal to decrease load and was a total success -- it segfaulted Apache regularly. No Apache, no load :).
However, new eAccelerator releases have appeared since then, but as far as I know, Kjartan has not tried it again.
So in the interest of science, I installed the latest version of eAccelerator. From a quick scan of their forums it still has some nasty bugs, but I hope they are rare enough to use it now. -- Kjartan <kjartan@zind.net> :: "A program is a spell cast over a computer, turning input into error messages."
On Thu, 18 Aug 2005, Kieran Lal wrote:
1) First we would like to reduce the memory requirements of running Drupal so that we can run more Apache processes. -Please advise if there are particular configurations of Drupal that you recommend to reduce memory usage. -Are there particular contributed modules that we should avoid because of memory issues? We are looking at flexinode, event, location, and event finder, as well as organic groups. We are looking at Robert Douglass's and Mike Gifford's work on memory profiling: http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/robertdouglass/profiler/, and http://lists.drupal.org/archives/drupal-devel/2005-08/msg00314.html http://lists.drupal.org/archives/drupal-devel/2005-08/msg00347.html http://lists.drupal.org/archives/drupal-devel/2005-08/msg00344.html
All enabled modules are loaded for registered users; therefore, the fewer large modules you have, the better your memory usage. The lowest memory usage is probably attainable through Karoly's split patch. Or no modules. ;)
2) We would like to implement a better caching strategy. We are aware of several implementations. Once the site becomes an active community of posters, caching becomes less useful. -We are planning on using Jeremy Andrews' caching patch for testing.
That patch is in core. It has not really proven useful for drupal.org, but it apparently is for other setups (kerneltrap).
-We would welcome other suggestions.
I had suggested basing the invalidation of the cache on whether the new version of a cached page is actually different from the cached page. The problem is that the cache is now invalidated each time somebody posts a comment or a new node. Such a post does not necessarily make the node/nnn page of another node look any different, unless there is a block that lists new posts or new comments.

Two changes would be needed for such an improved caching system. First, the block configuration would have to return the path patterns where a particular block appears. Second, a cache hook would have to be introduced into modules, because modules know best whether the content (or the views of content) they provide is changed by a particular new post, and can then invalidate the cached pages or not.

Unfortunately, Dries has rejected this proposal as "won't work" without giving detailed reasons. Since I don't run high-performance sites, I do not have a personal itch to scratch and didn't develop the approach into a patch.
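The proposal could be sketched roughly as below. This is a toy model in Python, not a Drupal patch: the class, the hook signature, the event dictionary, and the block's path patterns are all hypothetical, and shell-style wildcards stand in for whatever pattern syntax the block configuration would use.

```python
from fnmatch import fnmatch


class SelectiveCache:
    """Page cache that asks registered hooks which paths an event affects."""

    def __init__(self):
        self.pages = {}
        self.hooks = []  # callables: event -> iterable of path patterns

    def register_hook(self, hook):
        self.hooks.append(hook)

    def invalidate_for(self, event):
        """Drop only cached pages matching a pattern returned by some hook."""
        patterns = [p for hook in self.hooks for p in hook(event)]
        stale = [path for path in self.pages
                 if any(fnmatch(path, p) for p in patterns)]
        for path in stale:
            del self.pages[path]
        return sorted(stale)


def node_hook(event):
    """The commented node's own page always changes."""
    if event["type"] == "comment":
        return [f"node/{event['nid']}"]
    return []


# Hypothetical: paths on which a "recent comments" block is shown.
RECENT_COMMENTS_BLOCK_PATHS = ["forum", "forum/*"]


def recent_comments_block_hook(event):
    """Pages showing the block change whenever a comment is posted."""
    return RECENT_COMMENTS_BLOCK_PATHS if event["type"] == "comment" else []


cache = SelectiveCache()
cache.register_hook(node_hook)
cache.register_hook(recent_comments_block_hook)
for path in ("node/1", "node/2", "forum", "forum/3"):
    cache.pages[path] = "<html>...</html>"

print(cache.invalidate_for({"type": "comment", "nid": 1}))
print(sorted(cache.pages))
```

In this model a comment on node 1 leaves node/2 cached, which is the saving the proposal is after; its cost is that every module and block has to describe accurately which pages it can change.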
3)MySQL specific optimizations-
Moshe's suggestion about MySQL cache looked very interesting. Unfortunately the cache probably will increase MySQL's memory usage even more. Cheers, Gerhard
Kieran Lal wrote:
Hello, we, Trellon and CivicSpace Labs, are doing Drupal performance testing for a client. We will be sharing the results with the community. We identified three areas to focus on for performance testing.
1) First we would like to reduce the memory requirements of running Drupal so that we can run more Apache processes. -Please advise if there are particular configurations of Drupal that you recommend to reduce memory usage. -Are there particular contributed modules that we should avoid because of memory issues? We are looking at flexinode, event, location, and event finder, as well as organic groups.
Hi Kieran.

IMHO, one of the main problems with Drupal performance and scalability is the memory requirements, which just grow as you enable more modules. So my approach to keeping memory low is usually to use as few modules as possible. You can get a general idea of how much load each module adds by looking at how much PHP code it includes, for which plain file size is a good indicator.

I've started some work in this direction, but I don't think it will be ready for version 4.7. However, I've posted a simple proof-of-concept patch: http://drupal.org/node/27901.

Another concern I have about memory usage is templated themes vs plain PHP ones. I'd like to get some real data on this, as for now it is just a guess.

Hope this helps.
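The file-size heuristic is easy to automate. Below is a minimal sketch in Python, assuming that module code lives in files ending in .module or .inc under a single directory; the function name and the temporary files in the demonstration are made up for illustration.

```python
from pathlib import Path
import tempfile


def module_code_sizes(modules_dir, suffixes=(".module", ".inc")):
    """Return [(relative_path, bytes)] for code files, largest first."""
    root = Path(modules_dir)
    sizes = [
        (str(path.relative_to(root)), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file() and path.suffix in suffixes
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Demonstration on throwaway files; point it at a real modules directory instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "small.module").write_text("<?php\n")
    (Path(tmp) / "big.module").write_text("<?php\n" + "// filler\n" * 100)
    for name, size in module_code_sizes(tmp):
        print(f"{size:>6} bytes  {name}")
```

File size is only a proxy for memory use, so the ranking is best treated as a list of candidates to profile, not as a measurement.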
Here is a summary of advice:

1. Memory profiling
-Look at the size of the code in a module
-Look at Robert Douglass's profiler module: http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/robertdouglass/profiler/
-Look at Karoly's split patch:

2. MySQL
-Try MySQL memory cache
-Implement MySQL replication topologies to support more postings

3. Caching
-Get a better heuristic for cache_clear_all() function than ignoring 19 out of 20 calls.
-Get results from Jeremy Andrews' fuzzy logic caching patch
-Get an experienced Drupal admin to look at the traffic patterns. You might have a rogue crawler searching everything and not taking advantage of your caching.

4. PHP tools
-Use the opcode cache: http://drupal.org/node/2603
-Test eaccelerator: http://eaccelerator.net/

5. Drupal specific
-Test node access with Organic Groups under a heavy load
-Consider Gerhard's idea to base the invalidation of caching on whether the new version of a cached page is actually different from the cached page. Get block configuration to return the path patterns where a particular block appears, and introduce a cache hook into modules.

Cheers,
Kieran
On 31 Aug 2005, at 01:23, Kieran Lal wrote:
3. Caching -Get a better heuristic for cache_clear_all() function than ignoring 19 out of 20 calls. -Get results from Jeremy Andrews fuzzy logic caching patch
+ Jeremy's fuzzy cache logic has been refactored. These two points have been dealt with. + I'd add: create a performance testing script/suite. + Some of the points on your list are well understood and described in the literature. -- Dries Buytaert :: http://www.buytaert.net/
Wednesday, August 31, 2005, 1:23:30 AM, Kieran Lal wrote:
-Test eaccelerator: http://eaccelerator.net/
I tried this before we moved drupal.org to the new hardware and it still segfaults Apache now and then. It happens much more rarely than with earlier versions, but still once every 48 hours or so.
-- Kjartan <kjartan@zind.net> :: "C makes it easy to shoot yourself in the foot. C++ makes it harder, but when you do, it blows away your whole leg." - Bjarne Stroustrup
participants (7)
-Dries Buytaert
-Gerhard Killesreiter
-Jose A. Reyero
-Karoly Negyesi
-Kieran Lal
-Kjartan Mannes
-Moshe Weitzman