[drupal-devel] caching issues

Jeremy Andrews jeremy at kerneltrap.org
Wed Mar 9 12:28:32 UTC 2005


On Wed, 09 Mar 2005 05:16:22 +0100
Steven Wittens <steven at acko.net> wrote:

[...]

> - Enforcing a minimum cache lifetime for pages is pretty easy with the 
> timestamp/expiration parameter for cache.

This is my proposal #1?  Simple time-based caching.
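
A minimal sketch of time-based caching with a minimum lifetime, written in Python for illustration only (Drupal itself is PHP). The class, its methods, and the explicit `now` argument are hypothetical and are not Drupal's cache API. The idea is that a flush triggered by new content only removes entries that have already lived for the minimum lifetime:

```python
class MinLifetimeCache:
    """Page cache whose entries survive flushes until they reach a minimum age."""

    def __init__(self, min_lifetime):
        self.min_lifetime = min_lifetime  # seconds an entry is guaranteed to live
        self._store = {}                  # path -> (created_at, page)

    def set(self, path, page, now):
        # Record the creation time alongside the rendered page.
        self._store[path] = (now, page)

    def get(self, path):
        entry = self._store.get(path)
        return entry[1] if entry else None

    def flush(self, now):
        """Called when content changes; keep entries younger than min_lifetime."""
        self._store = {
            path: (created, page)
            for path, (created, page) in self._store.items()
            if now - created < self.min_lifetime
        }
```

With a 60-second minimum, a page cached at t=0 is dropped by a flush at t=70, while a page cached at t=50 stays cached.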

> - It is important that a user sees his/her changes reflected 
> immediately, otherwise they might think an error occurred and post twice.
> 
> Possible solution: disable caching for a user's session as soon as 
> they have posted something. For your case this would mean the spambots 
> get fresh pages all the time, but the rest don't.

I disagree that this is important.  There are many sites on the internet
where once you post something, you get a message that says something like
"your comment will be visible within n minutes".

However, invalidating the cache for specific users is an interesting idea.
This would be per-IP...  I will look into this.
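
One way the per-IP idea could look, again as a hedged Python sketch rather than Drupal code. The class name, the `window` parameter, and the explicit `now` argument are all assumptions made for the example. An address that has just posted is served uncached pages for a short window, while everyone else keeps getting the cached copy:

```python
class PostBypass:
    """Tracks which IP addresses should skip the page cache after posting."""

    def __init__(self, window):
        self.window = window   # seconds to bypass the cache after a post
        self._last_post = {}   # ip -> time of most recent post

    def record_post(self, ip, now):
        # Call this when a comment or node submission is accepted.
        self._last_post[ip] = now

    def should_bypass(self, ip, now):
        # True only if this address posted within the last `window` seconds.
        last = self._last_post.get(ip)
        return last is not None and now - last < self.window
```

A request handler would check `should_bypass()` before consulting the page cache and render the page fresh when it returns true.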

> - Clearing the cache selectively is difficult because sidebar blocks 
> like "active forum topics" change easily. Still, clearing out the "main"
> page for a certain item (e.g. a node view) is doable.

I don't plan to pursue selective caching.  It introduces more complexity
than I am interested in maintaining as a patch against core.

> - The cache has a much higher miss rate than expected on drupal.org at 
> least because the site is constantly being crawled by spiders. Pages 
> that are visited often get re-cached quickly after a wipe, but this 
> doesn't happen for the random access pattern that is common for spiders 
> and also for posts reached through searching.

Yes, this is true.  And I remember the data you and Dries came up with. 
However I still get a large boost from the cache.  I believe this is
because much of the anonymous traffic is due to links from other news
sites, and thus the same page is loaded many times in rapid succession. 
Thus, only a percentage of page loads benefit from the cache, but it's a
significant enough percentage to have a noticeable effect on performance.
Disabling the cache (or flushing it every second) has a negative effect on
performance.
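
The reasoning above can be illustrated with a toy calculation in Python. The request lists are made up for the example and are not measurements from any real site. A burst of loads for one linked story hits the cache almost every time, a spider-like crawl of distinct pages never does, and a mix of the two lands in between:

```python
def hit_rate(requests):
    """Fraction of requests served from a cache that is never flushed."""
    cached = set()
    hits = 0
    for path in requests:
        if path in cached:
            hits += 1
        else:
            cached.add(path)  # miss: render the page, then cache it
    return hits / len(requests)


# Ten loads of one linked story arriving in rapid succession.
burst = ["/node/1"] * 10
# Ten loads of distinct pages, as a spider might request them.
crawl = [f"/node/{n}" for n in range(100, 110)]
# Both traffic patterns combined.
mixed = burst + crawl
```

Here `hit_rate(burst)` is 0.9, `hit_rate(crawl)` is 0.0, and `hit_rate(mixed)` is 0.45, so the cache still absorbs a large share of the load even though the crawl never benefits.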

> - Any aggressive caching should be implemented as an optional feature as
> it is useless for small sites. Perhaps we could change the cache option 
> into three states: "No caching" "Mild caching" "Aggressive caching".

Yes, I agree.  But is there interest in merging an aggressive caching
mechanism?

> Still, it sounds to me like your problem could be fixed by imposing a 
> throttle on submissions (we used to have this, but it got lost in one of
> the node system rewrites) or by trying to detect spammy behaviour and 
> imposing a (temporary) ban.

I neglected to mention that the spam bots use an obscenely large number of
proxies.  Each comment submission is made from a different IP address. 
Thus, as far as Drupal is concerned they are each a different user.

> If you dig around the mailing list archive some more, you might find some
> more things.

I have followed such discussions with much interest in the past.  If
anyone else has practical ideas, please speak up.

I am adding "temporarily disable cache for specific users" to my list of
potential improvements.  That should be simple enough, and may be all I
need.

Thanks,
 -Jeremy



