[drupal-devel] [feature] Alternative caching for high traffic websites

Jeremy drupal-devel at drupal.org
Wed Mar 23 12:53:10 UTC 2005


Issue status update for http://drupal.org/node/19298

 Project:      Drupal
 Version:      cvs
 Component:    base system
 Category:     feature requests
 Priority:     normal
 Assigned to:  Jeremy at kerneltrap.org
 Reported by:  Jeremy at kerneltrap.org
 Updated by:   Jeremy at kerneltrap.org
 Status:       patch

When time permits, I will combine the patches into one file, and update
the help text.  I included them as separate files for now to simplify
code review.  I also see a logical error I need to fix when performing
garbage collection that could cause the cache to flush uneccessarily.
I did not make the functionality automatic for two reasons:  it makes
the code slightly more complex, and it makes the functionality more
difficult to benchmark.  While I think it would be a good idea, I will
not modify the patch to implement automatic cache configuration unless
it is requested by Dries or other core developers.
To implement automatic cache configuration, it could simply be tied to
the throttle.  If the throttle is off (ie the throttle module is
disabled, or the site is not busy), use the strict cache.  If the
throttle is on (ie the throttle module is enabled and the site is
busy), use the loose cache.  
Alternatively, we could leave the configuration as it currently is in
this patch (with disabled, strict and loose), and the loose caching
functionality itself could be tied to the throttle.  If the throttle is
off, use a low cache_flush_delay (ie 15 seconds).  If the throttle is
on, use a high cache_flush_delay (ie 300 seconds).  The theory being
that the cache_flush_delay enforces a minimum cache life, and the
busier a site the more it will benefit from a longer minimum cache
life.  I believe that this latter implementation would be the best for
the spam floods I've experienced on my website.


Jeremy at kerneltrap.org



Previous comments:
------------------------------------------------------------------------

March 22, 2005 - 23:32 : Jeremy at kerneltrap.org

Attachment: http://drupal.org/files/issues/database.mysql_1.patch (382 bytes)

Drupal's existing caching mechanism doesn't perform well on highly
dynamic websites in which the cache is flushed frequently.  One example
is a site that is under attack by a spambot that is posting spam
comments every few seconds, causing all cached pages to be flushed
every few seconds.
The attached patches provide the following cache methods:
 1) CACHE_DISABLED:  no caching.  This is unchanged from before.
 2) CACHE_ENABLED_STRICT: This is Drupal's existing caching method, in
which the cache is flushed immediately whenever it contains invalid
data.
 3) CACHE_ENABLED_LOOSE: This is the new caching method (feel free to
suggest a better name, this is the best I could come up with tonight). 
Essentially, it immediately flushes the cache only for specific users
who have modified cached data (whether or not they are logged in),
delaying the flushing of data for other users by several minutes.
This feature requires a new 'cache' field in the sessions table to
track the cache status for each user, whether or not they are logged
in.
A one line change to the system.module updates the cache configuration
settings to include an additional option for loose caching.
The rest of the changes are in bootstrap.inc, as follows:
 - cache_clear_all: if strict caching is enabled, nothing is changed. 
If loose caching is enabled, we delay the flushing of the entire cache
for 5 minutes.  Instead, we update this user's cache field in the
sessions table so we know any cache pages created before the current
time are no longer valid for this user.  We only set the global
"cache_flush" variable if it's not already set.  If the global
"cache_flush" variable was set more than 5 minutes ago, we flush all
cache pages that were expired when the "cache_flush" variable was set.
 - cache_get:  if strict caching is enabled, nothing is changed.  If
loose caching is enabled, we first check the global "cache_flush"
variable and if older than 5 minutes we do a garbage collection call to
cache_clear_all.  Next, we see if there is a valid cache entry for us. 
If the cache entry is older than our session table's "cache" field, we
don't use it.  This will cause the cache to be regenerated by the
current user, and ultimately updated for all users.
Attached is the patch to database.mysql.  To add manually, execute the
following:
ALTER TABLE sessions ADD cache int(11) NOT NULL default '0' AFTER
timestamp;


------------------------------------------------------------------------

March 22, 2005 - 23:33 : Jeremy at kerneltrap.org

Attachment: http://drupal.org/files/issues/system.module_5.patch (1.49 KB)

Attached is the patch for the system.module, adding the new cache
configuration option.


------------------------------------------------------------------------

March 22, 2005 - 23:36 : Jeremy at kerneltrap.org

Attachment: http://drupal.org/files/issues/bootstrap.inc.patch (3.62 KB)

And finally, the patch for bootstrap.inc.
BTW:  Do we want to make the 'cache_flush_delay' configurable?  Or
perhaps just stick with a sane default?  Perhaps 60 seconds is more
sane than 300 seconds?


------------------------------------------------------------------------

March 22, 2005 - 23:44 : Jeremy at kerneltrap.org

Attachment: http://drupal.org/files/issues/bootstrap.inc_0.patch (3.61 KB)

Updated version of the bootstrap.inc.patch, removing an erroneous "else"
that prevented the cache from being properly updated for a specific user
after a delayed flush.


------------------------------------------------------------------------

March 23, 2005 - 02:12 : Anonymous

Looks very good. Please provide patches in one file, though.
Gerhard


------------------------------------------------------------------------

March 23, 2005 - 03:35 : stefan nagtegaal

First of all, I do not have huge sites under my juristriction, so maybe
my points here aren't valid.. If so, tell me and I'll shut up..
I took a look at your patches, and I truly do not understand why you
set more Caching methods in 'admin/system'. I think that it would be
better if we stay with the cuurent options (Enabled, Disabled) and let
Drupal suggest the way of the caching methods (CACHE_ENABLED_LOOSE and
CACHE_ENABLED_SCTRICT). This would make your patch to system.module a
little smaller and easier to understand for newbies..
Also, as you said yourself, I don't see why we should make the
'cache_flush_delay' configurable. I mean, how would you know what a
good time would be? Couldn't drupal find out the right value for that,
so the administrator doesn't have to?
Again, I'm not speaking as someone who is in need of something like
this, but I can understand some people are.. What I'm trying to tell,
is that the idea is probably pretty good only with less options and
more automagical cache settings it would be more userfriendly ad
doesn't scare off newbies (or me)...


------------------------------------------------------------------------

March 23, 2005 - 05:21 : Dries

Looks like this patch can go in soon after the DRUPAL-4-6 has been
created.  You'll want to update the cache related documentation in
system_help() (then again it might not be necessary).  
I wouldn't bother making this more configurable until we have
evaluated/tested this some more.   I'm very interested in testing this
out or in getting some performance numbers.





More information about the drupal-devel mailing list