Hello,

I ran into a problem with the way Drupal caches data today. A spam bot has been crawling my site for the past 36 hours or so, posting dozens of comments every minute (frequently several a second). Combined with normal traffic, my site was serving 300-400 pages a minute. Because the spam comments were being posted so quickly, the cache was being flushed too often to do any good; I may as well have disabled it. The site became sluggish. (It has handled that large a load before, just not with comments being posted so quickly.)

I am planning to patch my site to modify how the cache is flushed, and perhaps to work again on file-based caching. No matter how optimized a CMS is, there will come a time when the limitations of the hardware prevent a site from updating in real time. Large websites (e.g. Slashdot) have to rebuild their caches every n minutes instead of every time a new comment is posted. Like it or not, this is significantly more efficient.

I would like to target this effort for inclusion in core for 4.7, so I would like to brainstorm now and work out an acceptable design. If nobody is interested and this has no chance of getting into 4.7, I'll do it on my own anyway, but I'd much prefer to get something into core so I don't have to redo it with every release.

Here are some proposals. I personally would like to see one or more of these available _in addition_ to the current method, i.e. most sites would leave the cache as we're all used to, and busier sites would enable one of these alternative caching mechanisms (listed in order of coding complexity):

0) Current Drupal caching. What is in the cache is always valid. If new content is posted, the cache is potentially invalid, so it is flushed and everything is rebuilt. (Some entries stick around, and my proposals would not change how those are handled.)

1) Time-based caching. Simply flush the cache every n minutes.
When new content is posted, a message such as "your comment will become visible in n minutes" would need to be displayed. (This would have saved me most recently with the spambot problem I had today...)

2) Fuzzy time-based caching. Patches against 4.2 exist in CVS [1] if you want to see what I'm referring to (the patches apply in order: 1, then 2, then 3...). It's similar to idea #1, but slightly more complicated. The cache becomes "dirty" every n minutes. When a "dirty" cache page is requested, it may or may not be rebuilt by the requester (a call to random makes the determination). If after n+x minutes the cache entry still hasn't been rebuilt, it is flushed (forcing a rebuild). The idea is to soften the effect of flushing the cache: with idea #1 there will be a CPU spike every n minutes, while with idea #2 the CPU load is distributed randomly. In other words, there's a soft timeout and a hard timeout. After the soft timeout, the cache entry may be rebuilt. After the hard timeout, the cache entry has to be rebuilt.

3) File-based caching. Patches against 4.0 exist in CVS [2] if you want to see what I'm referring to. The simplest mechanism would be like #1 above, but with the cached pages stored in the filesystem instead of in the database. When I used this in 4.0, the performance boost was phenomenal. Additionally, the site could continue to serve pages with the database stopped.

4) Fuzzy file-based caching. This is actually how I implemented file-based caching against 4.0 long ago. If you got this far and understood ideas 1, 2, and 3, then no further explanation is needed here.

Thoughts? Feedback? Suggestions? Dries?

I'll work up patches. But if someone has better design ideas, now is a good time to suggest them.

Thanks,
-Jeremy

[1] http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/jeremy/4.2.0/cach...
[2] http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/jeremy/4.0.0/file...
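
To make the soft/hard timeout from idea #2 concrete, here is a minimal sketch. It is written in Python purely for illustration; it is not Drupal's PHP and not the code from the patches in [1]. The class name FuzzyCache, the rebuild_chance parameter, and the injectable clock and rng arguments are all assumptions made up for this example.

```python
import random
import time


class FuzzyCache:
    """Illustrative cache with a soft timeout (may rebuild) and a
    hard timeout (must rebuild). All names here are hypothetical."""

    def __init__(self, soft_ttl, hard_ttl, rebuild_chance=0.1,
                 clock=time.time, rng=random.random):
        if hard_ttl < soft_ttl:
            raise ValueError("hard_ttl must be >= soft_ttl")
        self.soft_ttl = soft_ttl              # "n" in the text, in seconds
        self.hard_ttl = hard_ttl              # "n+x" in the text, in seconds
        self.rebuild_chance = rebuild_chance  # assumed per-request probability
        self.clock = clock
        self.rng = rng
        self.entries = {}                     # key -> (value, time it was built)

    def get(self, key, build):
        """Return the cached value for key, calling build() when needed."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None:
            value, built_at = entry
            age = now - built_at
            if age < self.soft_ttl:
                return value                  # still fresh
            if age < self.hard_ttl and self.rng() >= self.rebuild_chance:
                return value                  # dirty, but this requester lost the draw
        # Missing entry, a dirty entry that won the draw, or past the hard timeout.
        value = build()
        self.entries[key] = (value, now)
        return value
```

Something like `cache.get("node/1", render_page)` would then serve a stale page to most requesters between the two timeouts and let only an occasional request pay for the rebuild. Setting soft_ttl equal to hard_ttl gives a per-entry variant of idea #1, where every entry is rebuilt as soon as it expires.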