On Wed, 09 Mar 2005 05:16:22 +0100 Steven Wittens <steven@acko.net> wrote: [...]
- Enforcing a minimum cache lifetime for pages is pretty easy with the timestamp/expiration parameter for cache.
This is my proposal #1? Simple time-based caching.
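A minimal sketch of what a minimum cache lifetime could look like, in Python rather than Drupal's PHP. The `PageCache` class, its method names, and the injectable `clock` are all hypothetical and are not Drupal's cache API; the only idea taken from the discussion is that a cache wipe spares entries younger than the minimum lifetime.

```python
import time


class PageCache:
    """Toy page cache with a minimum lifetime (hypothetical, not Drupal's API)."""

    def __init__(self, min_lifetime, clock=time.time):
        self.min_lifetime = min_lifetime  # seconds an entry is guaranteed to live
        self.clock = clock                # injectable so the logic can be tested
        self.entries = {}                 # path -> (created_at, html)

    def set(self, path, html):
        self.entries[path] = (self.clock(), html)

    def get(self, path):
        entry = self.entries.get(path)
        return entry[1] if entry else None

    def clear(self):
        """Wipe only entries at least min_lifetime old; younger ones survive."""
        now = self.clock()
        self.entries = {
            path: (created, html)
            for path, (created, html) in self.entries.items()
            if now - created < self.min_lifetime
        }
```

With this shape, a content change can still call `clear()` as often as it likes, but a freshly cached page keeps being served until it reaches the minimum age.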
- It is important that a user sees his/her changes reflected immediately, otherwise they might think an error occurred and post twice.
Possible solution: disable caching for a user's session as soon as they have posted something. For your case this would mean the spambots get fresh pages all the time, but the rest don't.
I disagree that this is important. There are many sites on the internet where once you post something, you get a message that says something like "your comment will be visible within n minutes". However, invalidating the cache for specific users is an interesting idea. This would be per-IP... I will look into this.
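The per-IP idea could be sketched roughly as follows, again in Python for illustration. `CacheBypass`, `record_post`, `should_bypass`, and the `window` parameter are hypothetical names, and keying on the IP address is the assumption stated above: anonymous users have no other identity to key on.

```python
import time


class CacheBypass:
    """Remember which IPs posted recently and serve them uncached pages."""

    def __init__(self, window, clock=time.time):
        self.window = window   # seconds to bypass the cache after a post
        self.clock = clock     # injectable so the logic can be tested
        self.posted_at = {}    # ip -> time of that IP's most recent post

    def record_post(self, ip):
        self.posted_at[ip] = self.clock()

    def should_bypass(self, ip):
        posted = self.posted_at.get(ip)
        if posted is None:
            return False
        if self.clock() - posted >= self.window:
            # Window elapsed: forget the IP so the table does not grow forever.
            del self.posted_at[ip]
            return False
        return True
```

A page handler would call `should_bypass(ip)` before consulting the cache and `record_post(ip)` after accepting a comment. Users behind a shared proxy would all bypass the cache together, which is a known cost of keying on IP.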
- Clearing the cache selectively is difficult because sidebar blocks like "active forum topics" change easily. Still, clearing out the "main" page for a certain item (e.g. a node view) is doable.
I don't plan to pursue selective caching. It introduces more complexity than I am interested in maintaining as a patch against core.
- The cache has a much higher miss rate than expected, on drupal.org at least, because the site is constantly being crawled by spiders. Pages that are visited often get re-cached quickly after a wipe, but this doesn't happen for the random access pattern that is common for spiders and also for posts reached through searching.
Yes, this is true. And I remember the data you and Dries came up with. However, I still get a large boost from the cache. I believe this is because much of the anonymous traffic is due to links from other news sites, and thus the same page is loaded many times in rapid succession. Thus, only a percentage of page loads benefit from the cache, but it's a significant enough percentage to have a noticeable effect on performance. Disabling the cache (or flushing it every second) has a negative effect on performance.
- Any aggressive caching should be implemented as an optional feature as it is useless for small sites. Perhaps we could change the cache option into three states: "No caching" "Mild caching" "Aggressive caching".
Yes, I agree. But is there interest in merging an aggressive caching mechanism?
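One possible reading of the three-state option, sketched in Python. The semantics here are an assumption, not something settled in this thread: "mild" is taken to mean a cached page is dropped as soon as content changes, and "aggressive" to mean a cached page is served until it reaches a minimum age even if content has changed. `CacheMode`, `can_serve_cached`, and the 300-second default are hypothetical.

```python
from enum import Enum


class CacheMode(Enum):
    NONE = "none"              # "No caching"
    MILD = "mild"              # "Mild caching"
    AGGRESSIVE = "aggressive"  # "Aggressive caching"


def can_serve_cached(mode, age, content_changed, min_lifetime=300):
    """Decide whether a cached page may be served.

    age: seconds since the page was cached.
    content_changed: True if site content changed after the page was cached.
    min_lifetime: assumed minimum lifetime (seconds) in aggressive mode.
    """
    if mode is CacheMode.NONE:
        return False
    if mode is CacheMode.AGGRESSIVE and age < min_lifetime:
        # Young enough: serve the stale page regardless of changes.
        return True
    # Mild mode, or an aggressive entry past its minimum lifetime.
    return not content_changed
```

Small sites would stay on `NONE` or `MILD`; only sites under heavy anonymous load would opt in to `AGGRESSIVE`.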
Still, it sounds to me like your problem could be fixed by imposing a throttle on submissions (we used to have this, but it got lost in one of the node system rewrites) or by trying to detect spammy behaviour and imposing a (temporary) ban.
I neglected to mention that the spam bots use an obscenely large number of proxies. Each comment submission is made from a different IP address. Thus, as far as Drupal is concerned they are each a different user.
If you dig around the mailing list archive some more, you might find some more things.
I have followed such discussions with much interest in the past. If anyone else has practical ideas, please speak up. I am adding "temporarily disable cache for specific users" to my list of potential improvements. That should be simple enough, and may be all I need.

Thanks,
-Jeremy