Issue status update for http://drupal.org/node/19298 Project: Drupal Version: cvs Component: base system Category: feature requests Priority: normal Assigned to: Jeremy@kerneltrap.org Reported by: Jeremy@kerneltrap.org Updated by: moshe weitzman Status: patch changing the cache mode depending on throttle status is a really good idea IMO. This patch is a great step forward for speeding up Drupal for anon users. The biggest innovation in this area since bootstrap, I think. moshe weitzman Previous comments: ------------------------------------------------------------------------ March 22, 2005 - 23:32 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/database.mysql_1.patch (382 bytes) Drupal's existing caching mechanism doesn't perform well on highly dynamic websites in which the cache is flushed frequently. One example is a site that is under attack by a spambot that is posting spam comments every few seconds, causing all cached pages to be flushed every few seconds. The attached patches provide the following cache methods: 1) CACHE_DISABLED: no caching. This is unchanged from before. 2) CACHE_ENABLED_STRICT: This is Drupal's existing caching method, in which the cache is flushed immediately whenever it contains invalid data. 3) CACHE_ENABLED_LOOSE: This is the new caching method (feel free to suggest a better name, this is the best I could come up with tonight). Essentially, it immediately flushes the cache only for specific users who have modified cached data (whether or not they are logged in), delaying the flushing of data for other users by several minutes. This feature requires a new 'cache' field in the sessions table to track the cache status for each user, whether or not they are logged in. A one line change to the system.module updates the cache configuration settings to include an additional option for loose caching. The rest of the changes are in bootstrap.inc, as follows: - cache_clear_all: if strict caching is enabled, nothing is changed. If loose caching is enabled, we delay the flushing of the entire cache for 5 minutes. Instead, we update this user's cache field in the sessions table so we know any cache pages created before the current time are no longer valid for this user. We only set the global "cache_flush" variable if it's not already set. If the global "cache_flush" variable was set more than 5 minutes ago, we flush all cache pages that were expired when the "cache_flush" variable was set. - cache_get: if strict caching is enabled, nothing is changed. If loose caching is enabled, we first check the global "cache_flush" variable and if older than 5 minutes we do a garbage collection call to cache_clear_all. Next, we see if there is a valid cache entry for us. If the cache entry is older than our session table's "cache" field, we don't use it. This will cause the cache to be regenerated by the current user, and ultimately updated for all users. Attached is the patch to database.mysql. To add manually, execute the following: ALTER TABLE sessions ADD cache int(11) NOT NULL default '0' AFTER timestamp; ------------------------------------------------------------------------ March 22, 2005 - 23:33 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/system.module_5.patch (1.49 KB) Attached is the patch for the system.module, adding the new cache configuration option. ------------------------------------------------------------------------ March 22, 2005 - 23:36 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/bootstrap.inc.patch (3.62 KB) And finally, the patch for bootstrap.inc. BTW: Do we want to make the 'cache_flush_delay' configurable? Or perhaps just stick with a sane default? Perhaps 60 seconds is more sane than 300 seconds? ------------------------------------------------------------------------ March 22, 2005 - 23:44 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/bootstrap.inc_0.patch (3.61 KB) Updated version of the bootstrap.inc.patch, removing an erroneous "else" that prevented the cache from being properly updated for a specific user after a delayed flush. ------------------------------------------------------------------------ March 23, 2005 - 02:12 : Anonymous Looks very good. Please provide patches in one file, though. Gerhard ------------------------------------------------------------------------ March 23, 2005 - 03:35 : stefan nagtegaal First of all, I do not have huge sites under my juristriction, so maybe my points here aren't valid.. If so, tell me and I'll shut up.. I took a look at your patches, and I truly do not understand why you set more Caching methods in 'admin/system'. I think that it would be better if we stay with the cuurent options (Enabled, Disabled) and let Drupal suggest the way of the caching methods (CACHE_ENABLED_LOOSE and CACHE_ENABLED_SCTRICT). This would make your patch to system.module a little smaller and easier to understand for newbies.. Also, as you said yourself, I don't see why we should make the 'cache_flush_delay' configurable. I mean, how would you know what a good time would be? Couldn't drupal find out the right value for that, so the administrator doesn't have to? Again, I'm not speaking as someone who is in need of something like this, but I can understand some people are.. What I'm trying to tell, is that the idea is probably pretty good only with less options and more automagical cache settings it would be more userfriendly ad doesn't scare off newbies (or me)... ------------------------------------------------------------------------ March 23, 2005 - 05:21 : Dries Looks like this patch can go in soon after the DRUPAL-4-6 has been created. You'll want to update the cache related documentation in system_help() (then again it might not be necessary). I wouldn't bother making this more configurable until we have evaluated/tested this some more. I'm very interested in testing this out or in getting some performance numbers. ------------------------------------------------------------------------ March 23, 2005 - 07:53 : Jeremy@kerneltrap.org When time permits, I will combine the patches into one file, and update the help text. I included them as separate files for now to simplify code review. I also see a logical error I need to fix when performing garbage collection that could cause the cache to flush uneccessarily. I did not make the functionality automatic for two reasons: it makes the code slightly more complex, and it makes the functionality more difficult to benchmark. While I think it would be a good idea, I will not modify the patch to implement automatic cache configuration unless it is requested by Dries or other core developers. To implement automatic cache configuration, it could simply be tied to the throttle. If the throttle is off (ie the throttle module is disabled, or the site is not busy), use the strict cache. If the throttle is on (ie the throttle module is enabled and the site is busy), use the loose cache. Alternatively, we could leave the configuration as it currently is in this patch (with disabled, strict and loose), and the loose caching functionality itself could be tied to the throttle. If the throttle is off, use a low cache_flush_delay (ie 15 seconds). If the throttle is on, use a high cache_flush_delay (ie 300 seconds). The theory being that the cache_flush_delay enforces a minimum cache life, and the busier a site the more it will benefit from a longer minimum cache life. I believe that this latter implementation would be the best for the spam floods I've experienced on my website.