Refresh rather than re-create D6 cache
Hi,
In D6, after all caches are cleared, or after many of them expire and get emptied by cron, the server load spikes seriously because all those caches need to be re-populated.
Since this happens more and more on sites I work on, I have been thinking about using another approach in my modules: caches would be *refreshed* rather than cleared and re-populated. Each cache refresh would run depending on e.g. a simple variable storing the timestamp of the last refresh of any other cache.
This would ensure that a) all cached values are available at all times, and b) caches are never re-calculated at (nearly) the same time.
I am about to write logic for this, but wanted to check with others on the list first -- perhaps some of you know of, or can point to, an elegant solution that already exists.
Thanks!
vacilando
On Mon, Oct 18, 2010 at 12:25, Chris Skene <chris@xtfer.com> wrote:
What would be the practical difference between emptying and recreating a cache, and refreshing it?
On 18/10/2010, at 9:41 PM, Tomáš Fülöpp (vacilando.org) wrote:
Not all caches would *have* to be re-populated at the same time.
Currently, if many or all caches are empty, they need to be re-populated on request, which extends script execution. On a busier site, other requests wait because the first request has not finished re-populating the caches. Eventually, Apache's maximum number of connections may be reached.
In the scenario I am considering, expired caches would not be emptied, so they would continue to serve (slightly older) data fast, and they would be refreshed gradually (e.g. one cache per request).
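A minimal sketch of the scenario described above, in which expired entries keep being served and at most one of them is rebuilt per request. It is written in Python rather than as Drupal code, and `GradualCache`, `begin_request`, and the `rebuild` callbacks are hypothetical names made up for this illustration; none of them is an existing Drupal or Pressflow API.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Entry:
    value: Any
    created: float


class GradualCache:
    """Serves stale entries and refreshes at most one expired entry per request."""

    def __init__(self, lifetime: float, clock: Callable[[], float] = time.time):
        self.lifetime = lifetime
        self.clock = clock
        self.entries: Dict[str, Entry] = {}
        self.refreshed_this_request = False

    def begin_request(self) -> None:
        # Reset the one-refresh budget at the start of every request.
        self.refreshed_this_request = False

    def get(self, cid: str, rebuild: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self.entries.get(cid)
        if entry is None:
            # Cold miss: there is nothing stale to serve, so build it now.
            entry = Entry(rebuild(), now)
            self.entries[cid] = entry
            return entry.value
        expired = (now - entry.created) > self.lifetime
        if expired and not self.refreshed_this_request:
            # Spend this request's budget on refreshing this entry.
            self.refreshed_this_request = True
            entry = Entry(rebuild(), now)
            self.entries[cid] = entry
        # Otherwise the (possibly stale) value is returned unchanged.
        return entry.value
```

With this shape, a request that touches several expired entries pays for only one rebuild; the remaining entries are picked up by later requests.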
On Mon, Oct 18, 2010 at 12:59, Chris Skene <chris@xtfer.com> wrote:
Would something like Elysia Cron solve the problem?
http://drupal.org/project/elysia_cron
It has some advanced cron control which can manage long requests, as well as disable or change the timing of different cron hooks.
On 18 Oct 2010 12h12 WEST, tomi@vacilando.org wrote:
Elysia Cron is a perfect module, and indeed it would be possible to rewrite all cache refreshes as cron hooks and then configure them to run at different times.
Still, I am looking for a more generic solution (consecutive refreshes of any number of caches rather than having to set a time for each), possibly one that uses Drupal's cache (hm, perhaps using cache_set with CACHE_PERMANENT, then considering the entry expired x seconds after it was created, but refreshing at most one cache per request?)
António P. P. Almeida (appa) wrote:
Perhaps Cache Actions (using Rules) would do the trick:
http://drupal.org/project/cache_actions
--- appa
On Mon, Oct 18, 2010 at 14:34, larry@garfieldtech.com wrote:
Just to make sure, have you tried using the minimum cache lifetime on the performance page? It essentially says that a cache record will always last at least that long, even if a clear is requested for it. That's your first step if you're finding some caches clearing too frequently (especially the expensive filter and page caches).
--Larry Garfield
On 10/18/10 8:31 AM, Tomáš Fülöpp (vacilando.org) wrote:
Yes, Larry, I did experiment with that approach as well. It allows caches to expire at different times, but a) you need to keep an approximate overview of the various expiration delays you are setting (so that they are unlikely to coincide), and b) whenever such a cache expires, it still gets deleted at cron and then has to be re-calculated during the precious time of a page request.
Currently I am thinking about the following approach - crude pseudo code:
// Make sure no other cache refresh has started within the defined period
set buffer period between cache refreshes: $cbuffer = 15 (seconds)
if ( (time() - $lastcacherun) > $cbuffer )
    // Check if the cache at hand has expired
    set some cache lifetime $clife
    SELECT `created` FROM `cache` WHERE `cid` = 'cachedobjectname'
    if ( (time() - $created) > $clife )
        // Refresh the expired cache
        $lastcacherun = time()
        recreate cachedobjectname
        DELETE FROM `cache` WHERE `cid` = 'cachedobjectname'
        cache cachedobjectname using cache_set with CACHE_PERMANENT
What do you think?
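The pseudocode above can be tried out as a small runnable model. The sketch below is Python, not Drupal code: `cache_table` stands in for the `cache` table, `state["lastcacherun"]` for a persistent variable, and `cached()` and `rebuild` are hypothetical names. The value of `CLIFE` is an assumption, since the pseudocode leaves `$clife` open. Two deliberate deviations: a cold miss is handled (the pseudocode does not cover it), and the separate DELETE is dropped because overwriting the entry is enough in this model.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CBUFFER = 15   # seconds that must separate any two cache refreshes ($cbuffer)
CLIFE = 3600   # seconds before an entry counts as expired ($clife, assumed value)

cache_table: Dict[str, Tuple[Any, float]] = {}   # cid -> (data, created)
state = {"lastcacherun": 0.0}   # stand-in for a persistent variable ($lastcacherun)


def cached(cid: str, rebuild: Callable[[], Any], now: Optional[float] = None) -> Any:
    """Return the cached value for cid, refreshing it only when allowed."""
    now = time.time() if now is None else now
    if cid not in cache_table:
        # Cold miss (not covered by the pseudocode): nothing stale to serve.
        cache_table[cid] = (rebuild(), now)
        return cache_table[cid][0]
    data, created = cache_table[cid]
    buffer_elapsed = (now - state["lastcacherun"]) > CBUFFER
    expired = (now - created) > CLIFE
    if buffer_elapsed and expired:
        state["lastcacherun"] = now      # recorded first, as in the pseudocode
        data = rebuild()
        cache_table[cid] = (data, now)   # overwrite in place, no separate delete
    return data
```

An expired entry that is requested inside the buffer window is simply served stale; it gets refreshed by the first request that arrives after the window has passed.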
On Mon, Oct 18, 2010 at 15:41, larry@garfieldtech.com wrote:
I... think I follow? It sounds like you want an approach like the one the search index uses: when you tell it to rebuild the search index, it doesn't truncate the index but marks all records as "dirty", so they get reindexed over time by cron.
You may also find this of use:
http://drupal.org/project/views_content_cache
--Larry Garfield
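The "mark as dirty, reindex over time" idea mentioned above can be sketched roughly as follows. This is a simplified Python model, not the search module's actual code; `DirtyIndex`, `mark_all_dirty`, and the per-run `limit` are hypothetical names chosen for the illustration.

```python
from typing import Callable, Dict, List


class DirtyIndex:
    """Keeps old records readable while a rebuild is spread over cron runs."""

    def __init__(self, build: Callable[[str], str]):
        self.build = build
        self.records: Dict[str, str] = {}
        self.dirty: List[str] = []   # keys still waiting to be rebuilt

    def add(self, key: str) -> None:
        self.records[key] = self.build(key)

    def mark_all_dirty(self) -> None:
        # Nothing is deleted here, so reads keep returning the old values.
        self.dirty = list(self.records)

    def cron(self, limit: int) -> int:
        """Rebuild at most `limit` dirty records; return how many were rebuilt."""
        batch, self.dirty = self.dirty[:limit], self.dirty[limit:]
        for key in batch:
            self.records[key] = self.build(key)
        return len(batch)
```

The cost of a full rebuild is then bounded by `limit` per cron run instead of falling on whichever page request happens to arrive first.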
Tomáš Fülöpp (vacilando.org) wrote:
Larry, thanks for the tips; I am looking into how search indexing works, and I will definitely use Views Content Cache (I don't know how I could have missed it!)
On Monday, 18 October 2010 at 08:04 -0700, Earl Miles wrote:
This is the Pressflow caching model. When David Strauss first got into Drupal, he did a lot of work with this kind of thing, where caching wasn't done when requested, but on cron.
On 10/18/2010 8:14 AM, Pierre Rineau wrote:
Drupal's caching model is not wrong, but a lot of module developers did not understand how to use hook_flush_caches() and implemented massive cache clears or data rebuilds within this function, which is absolutely wrong because core runs this hook at each cron run, thus making those modules' caches virtually useless.
This is not gratuitous criticism; there was a time when I made the same mistake myself (I am fixing a lot of code right now for site performance). I am currently reporting these issues to some D6 contrib modules.
Pierre.
On Monday, 18 October 2010 at 08:22 -0700, Earl Miles wrote:
I disagree. Drupal's behavior is actually bad here, because hook_flush_caches() is what we use when the 'clear cache' button is pressed. The cache clear that happens during cron runs is actually a misuse of the feature from its original intention, I think.
In any case, that wasn't really relevant to my point, which is that what you're asking about is the Pressflow model of caching, and you should look into it.
On 10/18/2010 8:25 AM, Pierre Rineau wrote:
> I disagree. Drupal's behavior is actually bad here, because hook_flush_caches() is what we use when the 'clear cache' button is pressed. The cache clear that happens during cron runs is actually a misuse of the feature from its original intention, I think.
The description of hook_flush_caches() on api.d.o is quite clear: it specifies that this hook should just return cache table names. During cron execution, core drops expired entries from these tables, which is quite logical. At 'cache clear' time, it fully wipes the data from these tables, which still makes the hook signature logical to me. What is wrong here may be the lack of a hook_drupal_is_actually_flushing_the_cache_here_do_whatever_you_have_to().
> In any case, that wasn't really relevant to my point, which is that what you're asking about is the Pressflow model of caching, and you should look into it.
That's true, but it is worth mentioning.
Pierre.
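The distinction drawn above (the hook only names cache tables; cron drops expired entries, while a full 'clear cache' wipes the tables) can be modelled in a few lines. This is a Python sketch of that description, not Drupal's implementation: `mymodule_flush_caches`, `flush_hooks`, `cron_cleanup`, and `clear_all` are hypothetical names, and `CACHE_PERMANENT` is used here only as a "never expires" marker.

```python
from typing import Any, Callable, Dict, List, Tuple

CACHE_PERMANENT = 0   # marker meaning "never expires" in this sketch

# table name -> {cid: (data, expire timestamp or CACHE_PERMANENT)}
tables: Dict[str, Dict[str, Tuple[Any, int]]] = {}


def mymodule_flush_caches() -> List[str]:
    """Hypothetical hook implementation: it only names the module's cache table."""
    return ["cache_mymodule"]


flush_hooks: List[Callable[[], List[str]]] = [mymodule_flush_caches]


def cache_table_names() -> List[str]:
    """Collect the core table plus every table named by a hook."""
    names = ["cache"]
    for hook in flush_hooks:
        names.extend(hook())
    return names


def cron_cleanup(now: int) -> None:
    """Cron path: drop only entries whose expiry time has passed."""
    for name in cache_table_names():
        table = tables.setdefault(name, {})
        for cid in [c for c, (_, expire) in table.items()
                    if expire != CACHE_PERMANENT and expire < now]:
            del table[cid]


def clear_all() -> None:
    """'Clear cache' path: wipe every listed table, permanent entries included."""
    for name in cache_table_names():
        tables.setdefault(name, {}).clear()
```

In this model a hook that merely returns table names is harmless on cron; the problem described in the thread arises when a hook implementation also rebuilds or wipes data itself, because that work then runs on every cron pass.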
On 10/18/10 11:44 AM, Earl Miles wrote:
> hook_flush_caches() function description is quite clear on api.d.o, it specifies that this hook should just return cache table names.
It is, but the current documentation does not match the name of the hook. My belief is that the evolution of this hook happened late in one of the cycles and it got changed to suit what was needed at that very moment, rather than being done properly.
> By the way, I just read CTools code, and the test with the cron semaphore is ugly, but this is well done! :D
I completely agree. Ugly, but the only solution we could come up with. It took months to really understand how badly we were getting screwed with that.
larry@garfieldtech.com wrote:
hook_flush_caches() was added post-D6 freeze to make good on a promise Dries made at a DrupalCon somewhere. I forget if it was in the alpha or beta stage.
--Larry Garfield
Pierre Rineau wrote:
By the way, I just read the CTools code, and the test with the cron semaphore is ugly, but it is well done! :D
Pierre.
Tomáš Fülöpp (vacilando.org) wrote:
Earl, that sounds interesting - but are you sure Pressflow currently does that? All I have located so far is this ancient module (by David Strauss), http://drupal.org/project/pressflow_preempt, which mentions such functionality ("Based on the performance data Preempt collects, it may choose to rebuild cached items during cron operations.").
I'm looking further, of course, but if you have a URL for more info on that I'd much appreciate it.
Pierre Rineau wrote:
If we look at the core functions, we can see Drupal as a framework: it provides an API more than a caching model itself. Even if a caching model based on massive cache wipeouts is implemented in some core modules, it is up to module developers to make good use of the API. I would rather implement the refresh caching approach too, of course, and Drupal core does not prevent me from doing so.
Pierre.
participants (6)
- António P. P. Almeida
- Chris Skene
- Earl Miles
- larry@garfieldtech.com
- Pierre Rineau
- Tomáš Fülöpp (vacilando.org)