I don't want to hijack the file cache maintainer thread, so I am starting my own thread about this since I have some questions.
I did talk to Robert about this in January, and I have done extensive testing with his code. His code patches core files all over the place in order to check memcache if there is a cached version available and then load that first. He has stated at http://drupal.org/project/memcache that "A set of patches that can be applied selectively to Drupal to cache various objects for high performance. The patches are necessary and this cannot be avoided."
However, if we are doing object-level caching for nodes, then we can cache a complete loaded node without the need to hit the database until a node_save, until a new module is enabled, or until the max life is hit.
If the code in node_load hit the static variable $nodes first and then hit cache_get, we would not need a set of patches, and this could be an option in the performance settings. You could then allow the pluggable cache.inc, which I just realized has been merged into HEAD (as of January), to handle storage. This would only take a small change to the core modules in a few places.
For example, in node.module,
    if (is_numeric($param)) {
      if ($cachable && isset($nodes[$param])) {
        return is_object($nodes[$param]) ? drupal_clone($nodes[$param]) : $nodes[$param];
      }
      $cond = 'n.nid = %d';
      $arguments[] = $param;
    }
could be changed into something like this:
    if (is_numeric($param)) {
      if ($cachable) {
        if (isset($nodes[$param])) {
          return is_object($nodes[$param]) ? drupal_clone($nodes[$param]) : $nodes[$param];
        }
        // Attempt to load from cache (false on cache miss).
        $node = cache_get($param, 'cache_node');
        if ($node) {
          // Load into static $nodes.
          $nodes[$param] = is_object($node) ? drupal_clone($node) : $node;
          return $node;
        }
      }
      $cond = 'n.nid = %d';
      $arguments[] = $param;
    }
I have time to work on this and offer performance testing.
I am just not sure how far anyone has gone down these roads or whether anyone is currently working on it. I have been doing custom module development for clients for about 7 months now, and I have not really had time to contribute to the Drupal project personally other than by spreading the good word and having clients use Drupal. This is one area that I feel could be substantially improved so that Drupal can scale beyond its current ability.
According to Dries's testing on his website, he has seen Drupal reach the 80+ requests per second range (ref. http://buytaert.net/drupal-webserver-configurations-compared). With some of the testing I have done with caching, I have seen it go as high as 500-700 requests per second.
In my testing, out of the box I have seen Drupal do about 10 requests per second, with normal caching about 100 requests per second, and with aggressive caching as high as 250 requests per second.
Thanks,
Steve Rude
On 3/6/07, Khalid Baheyeldin <kb@2bits.com> wrote:
See this http://drupal.org/project/memcache
Talk to Robert about it before you duplicate it.
On 3/6/07, Steve Rude <steve@achieveinternet.com> wrote:
I have done much testing with memory-based object caching using memcached, and I think it would be nice to be able to choose what type of caching you want (memory/file/db), with cache.inc acting basically as an abstraction layer whose storage mechanism is taken care of by other include files (e.g. cache-memcache.inc, cache-file.inc), the same way that the database abstraction works.
I would be willing to begin some work on this if anyone is interested. We are building a Drupal site now that will need to scale to >30m page views per month and I would like to work within an accepted method to help grow and scale Drupal.
If anyone has any prior work or projects that can be looked at to help get this going down the correct path, please point me in the right direction.
Thanks,
Steve Rude
On 3/6/07, Karoly Negyesi <karoly@negyesi.net> wrote:
Hi,
I might be mistaken but I thought file caching was something much sought after. I found posts relating to it as early as 2002.
And yet, when the possibility arose for it to happen cleanly (pluggable cache.inc), the code for it was dumped into CVS on August 26, 2006, and there has not been a single CVS commit since then.
It's not used much according to the issues, but that probably has something to do with not having a single release.
Is the community truly interested? If yes, then would a maintainer please step up.
And yes, I contacted the current maintainers three months ago, to no avail.
Regards,
NK
--
Steve Rude Lead Web Developer
800-618-8777 phone / 858-225-0479 fax www.achieveinternet.com steve@achieveinternet.com
Achieve Internet is a Division of Web Page Maintenance, Inc.
-- 2bits.com http://2bits.com Drupal development, customization and consulting.
Steve Rude wrote:
I don't want to hijack the file cache maintainer thread, so I am starting my own thread about this since I have some questions.
I did talk to Robert about this in January, and I have done extensive testing with his code. His code patches core files all over the place in order to check memcache if there is a cached version available and then load that first. He has stated at http://drupal.org/project/memcache that "A set of patches that can be applied selectively to Drupal to cache various objects for high performance. The patches are necessary and this cannot be avoided."
However, if we are doing object-level caching for nodes, then we can cache a complete loaded node without the need to hit the database until a node_save, until a new module is enabled, or until the max life is hit.
If the code in node_load hit the static variable $nodes first and then hit cache_get, we would not need a set of patches, and this could be an option in the performance settings. You could then allow the pluggable cache.inc, which I just realized has been merged into HEAD (as of January), to handle storage. This would only take a small change to the core modules in a few places.
When I first suggested that node objects should be cachable I wasn't able to get sufficient support for this and had to accept the static cache as a minor solution. However, now that there are so many very popular Drupal sites, I think that the forces that be might reconsider their position. ;)
The only problem is that some modules (notably poll.module) do user-specific stuff in the nodeapi load hook. I see no problem with moving this to the view hook instead. I have already discussed this with Steven in Brussels, and he agreed.
I'd sure like to have node object caching for drupal.org.
According to Dries's testing on his website, he has seen Drupal reach the 80+ requests per second range (ref. http://buytaert.net/drupal-webserver-configurations-compared). With some of the testing I have done with caching, I have seen it go as high as 500-700 requests per second.
In my testing, out of the box I have seen Drupal do about 10 requests per second, with normal caching about 100 requests per second, and with aggressive caching as high as 250 requests per second.
You should not confuse the page caching we have now (regardless of whether it is aggressive or not) with the node object caching you want to add. The node object caching is rather similar to the menu tree caching, as it caches objects for both anonymous and authenticated users, whereas the page cache is only for anonymous users.
Cheers,
Gerhard
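Gerhard's two points, that a cached node object is shared by every user and that user-specific work therefore belongs in the view step rather than the load step, can be illustrated with a small standalone sketch. Nothing below is Drupal code: example_node_load(), example_node_view(), and the $store array are hypothetical stand-ins for node_load(), a view hook, and a shared object cache.

```php
<?php
// Hypothetical stand-in for a shared object cache (memory, file, or db).
$store = [];
// Counts how often the expensive build step runs.
$builds = 0;

// Build the user-independent part of a node once, then reuse it.
function example_node_load(int $nid, array &$store, int &$builds): array {
  if (isset($store[$nid])) {
    return $store[$nid];
  }
  $builds++;
  $node = ['nid' => $nid, 'title' => "Poll $nid", 'votes' => [7 => 'yes']];
  $store[$nid] = $node;
  return $node;
}

// Add per-user data at view time so it never lands in the shared cache.
function example_node_view(array $node, int $uid): array {
  $node['current_user_voted'] = isset($node['votes'][$uid]);
  return $node;
}

// Two different users view the same node; it is built only once.
$a = example_node_view(example_node_load(1, $store, $builds), 7);
$b = example_node_view(example_node_load(1, $store, $builds), 8);
```

Because the per-user flag is added after loading, the same cached object can serve anonymous and authenticated users alike, which a full page cache cannot do.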
When I first suggested that node objects should be cachable I wasn't able to get sufficient support for this and had to accept the static cache as a minor solution. However, now that there are so many very popular Drupal sites, I think that the forces that be might reconsider their position. ;)
That would be excellent. I believe that as a configurable option this would be a great win for Drupal. I will put together some documentation of the performance gains for authenticated and anonymous users.
I'd sure like to have node object caching for drupal.org.
:)
-- Steve Rude
On 06 Mar 2007, at 23:01, Gerhard Killesreiter wrote:
When I first suggested that node objects should be cachable I wasn't able to get sufficient support for this and had to accept the static cache as a minor solution. However, now that there are so many very popular Drupal sites, I think that the forces that be might reconsider their position. ;)
I'm fairly convinced that memory-based caching, a la memcached, is the way forward rather than file-based caching. But as mentioned in my previous e-mail, some rigorous experiments would help to prove this point.
The challenge might be to create a good API that allows people to take advantage of memory-based caching without having to refactor every single module and without having to embed a ton of special cases all over the map. We want something clean, elegant and transparent that gives us a lot of bang.
Either way, let's work on this. I'd be happy to help test, review and benchmark but can't take a lead in the development of such functionality.
-- Dries Buytaert :: http://www.buytaert.net/
I don't quite understand Dries's remark "I'm fairly convinced that memory-based caching ... is the way forward rather than file-based caching." That remark seems to presuppose knowing every site's and host's situation (impossible) or to say that file-based caching will never provide enough performance benefit over what we have now.
I think the real way forward is object-based caching hooks in core and a pluggable cache interface. Ideally, one would have tiered caching, so that more than one cache mechanism could be plugged in, and various types of caching (object, page, menu, query, etc.) could be assigned to one of the available mechanisms. However, that ideal would be many steps down the road, if ever, due to the complexity involved.
Instead, we should at least allow site owners to choose among memory, file and database caching at a minimum.
As for object-based caching, start with nodes. Eventually, add other common objects, e.g. users, and provide an API(?) to make it easy for contrib modules which create new objects to cache their objects as well.
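The idea of assigning each type of caching to one of several plugged-in mechanisms can be sketched in a few lines. This is only an illustration under stated assumptions: the ExampleCacheBackend interface, both backend classes, and the example_cache_get()/example_cache_set() helpers are hypothetical and are not the API of Drupal's cache.inc.

```php
<?php
// Hypothetical backend contract; not Drupal's actual cache.inc API.
interface ExampleCacheBackend {
  public function get(string $cid);
  public function set(string $cid, $data): void;
}

// An in-process array standing in for a memory cache such as memcached.
class ExampleMemoryBackend implements ExampleCacheBackend {
  private $items = [];
  public function get(string $cid) {
    return $this->items[$cid] ?? false;
  }
  public function set(string $cid, $data): void {
    $this->items[$cid] = $data;
  }
}

// File-based backend: one serialized file per cache ID.
class ExampleFileBackend implements ExampleCacheBackend {
  private $dir;
  public function __construct(string $dir) {
    $this->dir = $dir;
  }
  private function path(string $cid): string {
    return $this->dir . '/' . md5($cid) . '.cache';
  }
  public function get(string $cid) {
    $path = $this->path($cid);
    return is_file($path) ? unserialize(file_get_contents($path)) : false;
  }
  public function set(string $cid, $data): void {
    file_put_contents($this->path($cid), serialize($data));
  }
}

// Site-level configuration: each cache type is assigned to one mechanism.
$backends = [
  'cache_node' => new ExampleMemoryBackend(),
  'cache_page' => new ExampleFileBackend(sys_get_temp_dir()),
];

// Callers name a cache type and never see which mechanism stores the data.
function example_cache_get(array $backends, string $bin, string $cid) {
  return $backends[$bin]->get($cid);
}
function example_cache_set(array $backends, string $bin, string $cid, $data): void {
  $backends[$bin]->set($cid, $data);
}

example_cache_set($backends, 'cache_node', 'node:1', ['nid' => 1]);
example_cache_set($backends, 'cache_page', 'front', '<html>cached</html>');
```

With this shape, moving page caching from files to memory is a one-line change in the $backends map rather than a change in any calling module.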
The API for this is in place. Steve Rude and I will be overhauling the current memcache code to better utilize the built-in cache infrastructure. Basically, it should all start with a cache_{custom table name} table and use cache_get and cache_set. Module developers are getting wise to the benefits of having their own cache tables (Views), and when I've evaluated and committed Steve's code, we'll have a caching system that first looks to memory-based caching and then to db-based caching for any given resource. This way, we can approach all caching issues in a general way, and the merits of node caching, for example, can be discussed in the context of Drupal core without having to worry about compatibility with the caching layer.
-Robert
Chris Johnson wrote:
I don't quite understand Dries's remark "I'm fairly convinced that memory-based caching ... is the way forward rather than file-based caching." That remark seems to presuppose knowing every site's and host's situation (impossible) or to say that file-based caching will never provide enough performance benefit over what we have now.
I think the real way forward is object-based caching hooks in core and a pluggable cache interface. Ideally, one would have tiered caching, so that more than one cache mechanism could be plugged in, and various types of caching (object, page, menu, query, etc.) could be assigned to one of the available mechanisms. However, that ideal would be many steps down the road, if ever, due to the complexity involved.
Instead, we should at least allow site owners to choose among memory, file and database caching at a minimum.
As for object-based caching, start with nodes. Eventually, add other common objects, e.g. users, and provide an API(?) to make it easy for contrib modules which create new objects to cache their objects as well.
-- * * * * * Lullabot's First Ever Advanced Workshops Are Here! Drupal API & Module Building - Advanced Drupal Themeing April 9th-13th - Providence, RI Early Bird Discounts Available Now http://www.lullabot.com/training * * * * *
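Robert's description of a cache that looks to memory first and then to the database for any given resource can be sketched as a two-tier lookup. The function names are hypothetical, and plain arrays stand in for memcached and a cache_* table; this is not the memcache module's code.

```php
<?php
// Hypothetical two-tier lookup: memory first, then a slower persistent tier.
function example_tiered_get(string $cid, array &$memory, array $database) {
  if (array_key_exists($cid, $memory)) {
    // Fast path: memory hit.
    return $memory[$cid];
  }
  if (array_key_exists($cid, $database)) {
    // Promote the entry so the next lookup takes the fast path.
    $memory[$cid] = $database[$cid];
    return $memory[$cid];
  }
  // Miss in both tiers.
  return false;
}

// Writes go to both tiers so either one can answer later.
function example_tiered_set(string $cid, $data, array &$memory, array &$database): void {
  $memory[$cid] = $data;
  $database[$cid] = $data;
}

$memory = [];
$database = ['menu:primary' => ['home', 'about']];
$first = example_tiered_get('menu:primary', $memory, $database);
```

After the first call the entry also lives in $memory, so a restart of the memory tier costs one slow lookup per entry instead of a full rebuild.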
Drupal file caching + lighttpd server + X-LIGHTTPD-send-tempfile! http://blog.lighttpd.net/articles/2006/11/29/faster-fastcgi
Nothing will beat this, unless the system has gigabytes of RAM! But we all know that many hosts out there impose limits of 256 MB or 512 MB, which puts using memcache with more than maybe 64 MB out of the question.
On 3/7/07, Chris Johnson <cxjohnson@gmail.com> wrote:
I don't quite understand Dries's remark "I'm fairly convinced that memory-based caching ... is the way forward rather than file-based caching." That remark seems to presuppose knowing every site's and host's situation (impossible) or to say that file-based caching will never provide enough performance benefit over what we have now.
64 MB will buy you a lot with memcache. All your path lookups, for example, can be memcached. Taxonomy trees, terms and vocabs can be cached in that space, too.
Fernando Silva wrote:
Drupal file caching + lighttpd server + X-LIGHTTPD-send-tempfile! http://blog.lighttpd.net/articles/2006/11/29/faster-fastcgi
Nothing will beat this, unless the system has gigabytes of RAM! But we all know that many hosts out there impose limits of 256 MB or 512 MB, which puts using memcache with more than maybe 64 MB out of the question.
But not a site with 10k nodes (pages), all of them cached for anonymous use. Assuming 16 KB per generated page, this would result in 156 MB of cache. By the way, putting this cache into the filesystem makes the database smaller... and that means faster backups.
On 3/7/07, Robert Douglass <rob@robshouse.net> wrote:
64 MB will buy you a lot with memcache. All your path lookups, for example, can be memcached. Taxonomy trees, terms and vocabs can be cached in that space, too.
On 3/7/07, Fernando Silva <fsilva.pt@gmail.com> wrote:
But not a site with 10k nodes (pages), all of them cached for anonymous use. Assuming 16 KB per generated page, this would result in 156 MB of cache. By the way, putting this cache into the filesystem makes the database smaller... and that means faster backups.
I don't think this is really a question of what to do for anonymous users. The problem is that authenticated users who have pages built for them based on permissions or anything else will never benefit from full page caching.
-- Steve Rude
On 3/7/07, Fernando Silva <fsilva.pt@gmail.com> wrote:
But not a site with 10k nodes (pages), all of them cached for anonymous use. Assuming 16 KB per generated page, this would result in 156 MB of cache. By the way, putting this cache into the filesystem makes the database smaller... and that means faster backups.
Erm, then why are you backing up your cache tables? There are techniques to selectively back up tables.
--Andy
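The 156 MB estimate quoted above follows directly from the two figures given in the thread. A quick check, assuming 1 MB is counted as 1024 KB (the thread does not say which convention was used):

```php
<?php
// Figures from the thread: 10,000 cached pages at roughly 16 KB each.
$pages = 10000;
$kb_per_page = 16;

// Total cache size in MB, with 1 MB taken as 1024 KB (an assumption).
$total_mb = $pages * $kb_per_page / 1024;

// For comparison: how many such pages fit in the 64 MB mentioned earlier.
$pages_in_64mb = intdiv(64 * 1024, $kb_per_page);

printf("%.2f MB for %d pages\n", $total_mb, $pages);
printf("%d pages fit in 64 MB\n", $pages_in_64mb);
```

This gives 156.25 MB for the full set of pages and 4096 pages in 64 MB, so at this page size a 64 MB cache could hold a bit under half of such a site's rendered pages.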
On 07 Mar 2007, at 19:23, Steve Rude wrote:
I don't think this is really a question of what to do for anonymous users. The problem is that authenticated users who have pages built for them based on permissions or anything else will never benefit from full page caching.
Correct. Serving cached pages to anonymous users is blazing fast. If we want to improve Drupal's scalability we need to focus on the page generation time of non-cached pages. We want to parse and execute less code in the slow path.
-- Dries Buytaert :: http://www.buytaert.net/
Chris Johnson wrote:
I don't quite understand Dries's remark "I'm fairly convinced that memory-based caching ... is the way forward rather than file-based caching."
Neither do I. BTW, what sucky mail client do you use that can't quote? :p
That remark seems to presuppose knowing every site's and host's situation (impossible) or to say that file-based caching will never provide enough performance benefit over what we have now.
The reason I didn't understand it is that it was a reply to my request for more object caching in Drupal. I didn't actually say where it should be cached.
I think the real way forward is object-based caching hooks in core and a pluggable cache interface.
Yes.
Ideally, one would have tiered caching, so that more than one cache mechanism could be plugged in, and various types of caching (object, page, menu, query, etc.) could be assigned to one of the available mechanisms. However, that ideal would be many steps down the road, if ever, due to the complexity involved.
Yeah, might be overly complex.
Instead, we should at least allow site owners to choose among memory, file and database caching at a minimum.
As for object-based caching, start with nodes. Eventually, add other common objects, e.g. users,
Agreed.
and provide an API(?) to make it easy for contrib modules which create new objects to cache their objects as well.
Well, that API is more or less in place.
Cheers,
Gerhard
I know I'm just a contrib developer, but as a seasoned app developer I thought I'd put in my 2 cents.
If you're talking about object caching where in 90% of the cases the objects live in the database, I wonder whether this kind of caching would have a better payoff than just increasing the memory caching done by the DB engine (MySQL, etc.). It doesn't strike me as a big payoff for the code that it would take to achieve.
I'd recommend that general Drupal caching APIs be restricted to "rendered" objects, such as the output of a block, a view, or a menu. Then you save the expensive rendering process. If you're just talking about object data, let the database engines do the caching.
Of course it does make sense to cache XMLRPC call results and other expensive data gets.
It seems really odd to talk about caching nodes in a database, when nodes and users are in a database to begin with?
Dave
Rendered nodes and rendered users would be the examples here. The rendered, threaded comments on a node are another logical target.
Metzler, David wrote:
It seems really odd to talk about caching nodes in a database, when nodes and users are in a database to begin with?
Dave
If you're talking about object caching where in 90% of the cases the objects live in the database, I wonder whether this kind of caching would have a better payoff than just increasing the memory caching done by the DB engine (MySQL, etc.). It doesn't strike me as a big payoff for the code that it would take to achieve.
90% may well live in the database, but in a disparate way. Nodes are "constructed". If you have, say, 10 modules all adding their own unique goodness to the $node object via hook_load() or hook_nodeapi(), then there are a lot of database queries going on during that construction process. Add to that the PHP overhead of calling all those hooks during the construction phase, and the db layer caching starts to become minor, even though still beneficial.
If you can cache the resulting constructed object into a memcache (or file, whatever) you save yourself all that construction overhead. You can just quickly recover the ready assembled object from cache. Expiring the cache can be done in node_save() and the next node_load() reconstructs/caches the node again. Same for user and comment objects. Cache expiry when adding a new module (or removing one) would be needed also.
But getting a cache set-up like this can have great benefits. I think that's what this thread is about, isn't it?
--Andy
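The cycle Andy describes (assemble the node through every module's load hook once, serve the assembled object from cache afterwards, expire it on save, and drop everything when the set of modules changes) can be sketched as follows. The ExampleNodeCache class and its closures are hypothetical; they only mirror the roles of node_load(), node_save(), and the load hooks.

```php
<?php
// Hypothetical miniature of the build-once, expire-on-save cycle.
class ExampleNodeCache {
  private $cache = [];
  private $hooks;
  // Counts hook invocations so the saved work is visible.
  public $hookCalls = 0;

  public function __construct(array $hooks) {
    $this->hooks = $hooks;
  }

  // Return the assembled node, running every load hook only on a miss.
  public function load(int $nid): array {
    if (isset($this->cache[$nid])) {
      return $this->cache[$nid];
    }
    $node = ['nid' => $nid];
    foreach ($this->hooks as $hook) {
      $this->hookCalls++;
      $node = array_merge($node, $hook($nid));
    }
    $this->cache[$nid] = $node;
    return $node;
  }

  // Saving a node expires only that node's cache entry.
  public function save(int $nid): void {
    unset($this->cache[$nid]);
  }

  // Adding or removing a module changes what every node contains,
  // so the whole cache is dropped.
  public function setHooks(array $hooks): void {
    $this->hooks = $hooks;
    $this->cache = [];
  }
}

$nodes = new ExampleNodeCache([
  function (int $nid): array { return ['title' => "Node $nid"]; },
  function (int $nid): array { return ['comment_count' => 3]; },
]);
$nodes->load(1);
$nodes->load(1);                       // Served from cache; no hooks run.
$callsAfterTwoLoads = $nodes->hookCalls;
$nodes->save(1);
$nodes->load(1);                       // Rebuilt after the save.
$callsAfterSave = $nodes->hookCalls;
```

With two hooks, the second load costs nothing and the load after the save costs two hook calls again, which is the saving the database's own query cache cannot provide because the PHP-side assembly still runs on every request.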
On 3/7/07, AjK <andy@pingv.com> wrote:
If you're talking about object caching where in 90% of the cases the objects live in the database, I wonder whether this kind of caching would have a better payoff than just increasing the memory caching done by the DB engine (MySQL, etc.). It doesn't strike me as a big payoff for the code that it would take to achieve.
90% may well live in the database, but in a disparate way. Nodes are "constructed". If you have, say, 10 modules all adding their own unique goodness to the $node object via hook_load() or hook_nodeapi(), then there are a lot of database queries going on during that construction process. Add to that the PHP overhead of calling all those hooks during the construction phase, and the db layer caching starts to become minor, even though still beneficial.
If you can cache the resulting constructed object into a memcache (or file, whatever) you save yourself all that construction overhead. You can just quickly recover the ready assembled object from cache. Expiring the cache can be done in node_save() and the next node_load() reconstructs/caches the node again. Same for user and comment objects. Cache expiry when adding a new module (or removing one) would be needed also.
But getting a cache set-up like this can have great benefits. I think that's what this thread is about, isn't it?
--Andy
Exactly!
-- Steve Rude
<snip>
If you can cache the resulting constructed object into a memcache (or file, whatever) you save yourself all that construction overhead.
Exactly!
There is just one obstacle to overcome: the node_access table and permissions in general. But nothing's impossible, and I'm sure there's an engineering solution out there. This is something I do intend to work on when I get some more time.
--Andy
On 3/7/07, Metzler, David <metzlerd@evergreen.edu> wrote:
If you're talking about object caching where in 90% of the cases the objects live in the database, I wonder whether this kind of caching would have a better payoff than just increasing the memory caching done by the DB engine (MySQL, etc.). It doesn't strike me as a big payoff for the code that it would take to achieve.
The main issue here is that building the node object by invoking a lot of expensive selects and joins is a real slowdown in my testing. Not to mention that real-world query caching does not get you very far on a heavily trafficked website.
The other part that is missing is that in a multi-server environment, using memcached as a shared cache means you only need to build the node objects once per farm of servers before the cache is cleared or rebuilt based on max life. With file-based caching you have to do it on every server. (Obviously you could do this on an NFS'd file system, but that would negate a lot of the speed.)
I'd recommend that general Drupal caching APIs be restricted to "rendered" objects, such as the output of a block, a view, or a menu. Then you save the expensive rendering process. If you're just talking about object data, let the database engines do the caching.
I would agree that these are good places to cache as well, but using eAccelerator or APC you can significantly speed up the time it takes PHP to build the rendered output from the object. I haven't seen the rendering to be all that significant. I still need to do some more testing, however.
Of course it does make sense to cache XMLRPC call results and other expensive data gets.
It seems really odd to talk about caching nodes in a database, when nodes and users are in a database to begin with?
As Andy pointed out already (sorry I was late to jump on this), nodes don't live in one table. The biggest problem, in addition to the other tables that get queried to build the nodes, is the node > node_revisions join that happens. In our experience with an active site with user-contributed data, the node_revisions table can get big very fast.
-- Steve Rude
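Steve's point about a server farm, that a shared cache needs each node built once per farm while a per-server file cache needs it built once per server, can be made concrete with a small counting sketch. The function and server names are hypothetical, and a nested array stands in for both kinds of cache.

```php
<?php
// Hypothetical comparison: how many times nodes are built across a farm
// when each server keeps its own cache versus one shared cache.
function example_count_builds(array $requests, bool $shared): int {
  $caches = [];
  $builds = 0;
  foreach ($requests as [$server, $nid]) {
    // A shared cache uses a single bucket for every server.
    $bucket = $shared ? 'farm' : $server;
    if (!isset($caches[$bucket][$nid])) {
      $builds++;
      $caches[$bucket][$nid] = true;
    }
  }
  return $builds;
}

// Three web servers each receive a request for the same two nodes.
$requests = [];
foreach (['web1', 'web2', 'web3'] as $server) {
  foreach ([1, 2] as $nid) {
    $requests[] = [$server, $nid];
  }
}

$perServer = example_count_builds($requests, false);
$sharedFarm = example_count_builds($requests, true);
```

In this toy run the per-server caches build six times and the shared cache builds twice; the gap grows with the number of servers, while the number of nodes affects both equally.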
On 07 Mar 2007, at 12:25, Chris Johnson wrote:
I don't quite understand Dries's remark "I'm fairly convinced that memory-based caching ... is the way forward rather than file-based caching." That remark seems to presuppose knowing every site's and host's situation (impossible) or to say that file-based caching will never provide enough performance benefit over what we have now.
I think the real way forward is object-based caching hooks in core and a pluggable cache interface.
When I wrote "memory-based caching, a la memcached", I meant "memory-based _object_ caching, a la memcached". When I wrote "the way forward", I meant that object-based caching is likely to buy us a lot more than file-based page caching will buy us over database-based page caching. I certainly didn't mean to presuppose every site's situation and specifically said that this won't hold in all cases, though I'm convinced that it will hold in the majority of cases. Plus, I stressed that we'll want to do a rigorous analysis first so we have a good understanding of the problem space.
Instead, we should at least allow site owners to choose among memory, file and database caching at a minimum.
Yes, see above. :)
-- Dries Buytaert :: http://www.buytaert.net/
participants (8)
- AjK
- Chris Johnson
- Dries Buytaert
- Fernando Silva
- Gerhard Killesreiter
- Metzler, David
- Robert Douglass
- Steve Rude