Caching idea #4: Use feedback to control the maximum number of non-cached pages per minute by adjusting the cache delay from idea #2 below. Site administrators could define the maximum; when the site load passes that threshold, the cache increases its delay until the number of pages rendered per minute is no more than the maximum. Of course, when the load is low, act normally: don't try to increase the load. ;-)

Nic

On May 26, 2005, at 6:06 PM, Nicholas Ivy wrote:
I haven't heard any new caching ideas for a while, so I sat down and brainstormed a bit. I began by rephrasing Dries's statistics:
Assuming, for the sake of approximation, that caching applies only to anonymous visitors, these numbers tell me that 35% of all anonymous requests (30 / 85) are identical to one of the last 170 anonymous requests (85% of 200). The entire cache is reset every 2.5 minutes or so (200 pages / 93,000 pages * 20 hours). The remaining 65% or so of anonymous requests (55 / 85) are unique within a 2.5-minute span, which means that at least 65% of drupal.org's pages wait more than 2.5 minutes between anonymous page requests.
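The arithmetic in the paragraph above can be re-checked with a few lines of Python. The inputs are the numbers quoted in that paragraph; reading 200 as the number of pages served between cache flushes and 93,000 pages as the traffic over 20 hours is an assumption inferred from the formula, since the underlying statistics are not reproduced here.

```python
"""Re-derive the cache figures quoted above (inputs copied from the text)."""

# Ratios quoted in the paragraph; the raw statistics behind them are not shown here.
REPEATED = 30                  # anonymous requests that repeat a recent request
ANONYMOUS = 85                 # anonymous requests (share out of 100)
UNIQUE = ANONYMOUS - REPEATED  # 55 requests with no recent duplicate

PAGES_PER_FLUSH = 200   # assumed: pages served between full cache flushes
PAGES_SERVED = 93_000   # assumed: pages served during the sample period
PERIOD_HOURS = 20       # assumed: length of the sample period


def summary() -> dict[str, float]:
    """Return the four derived figures used in the argument."""
    return {
        "hit_ratio": REPEATED / ANONYMOUS,
        "anonymous_window": PAGES_PER_FLUSH * ANONYMOUS / 100,
        "flush_interval_min": PAGES_PER_FLUSH / PAGES_SERVED * PERIOD_HOURS * 60,
        "miss_ratio": UNIQUE / ANONYMOUS,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.2f}")
```

Run as a script, it prints 0.35, 170.00, 2.58 and 0.65 for the four figures, in line with the 35%, 170 requests, roughly 2.5 minutes and 65% quoted above.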
So the goal is obviously to extend the lifetime of the cache for anonymous visitors. Wiping the entire cache seems like overkill, but what else can be done when every page may contain dynamic information? Other people have already said this.
So what options do we have to improve caching performance?
1) Perhaps we could use statistics, such as a Poisson distribution, to predict how much time is likely to pass between requests for a certain page, given the activity we recently recorded [1]. If the predicted wait is longer than 2.5 minutes, don't clear the cache for that page immediately. Instead, wait until there's at least a 50% chance someone will look again. After all, it's likely that no one is looking at it, so why keep the page up to date?
2) Wait more than 2.5 minutes before clearing the cache for anonymous users. If most visitors are authenticated, this scheme won't help much.
3) Cache page elements, as other people have suggested. Assuming most of the effort of creating a page is spent rendering its elements, this scheme should work too.
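Ideas #2 and #4 fit together as a small feedback loop: idea #2 supplies the delay before the anonymous cache is cleared, and idea #4 tunes that delay against an administrator-defined ceiling on rendered pages per minute. The Python sketch below is one possible shape for that loop, not existing Drupal code; the class name, its parameters, and the double-on-overload / halve-on-underload rule are hypothetical choices, and applying the delay to the actual cache is left out.

```python
from dataclasses import dataclass, field


@dataclass
class CacheDelayController:
    """Hypothetical feedback loop (idea #4) that tunes the idea #2 flush delay.

    Call update() once a minute with the number of pages rendered
    (i.e. not served from the cache) during that minute.
    """

    max_rendered_per_minute: int   # administrator-defined ceiling
    base_delay: float = 0.0        # seconds; normal behaviour under low load
    max_delay: float = 3600.0      # seconds; upper bound on staleness
    first_step: float = 60.0       # seconds added on the first overload
    step_up: float = 2.0           # growth factor while over the ceiling
    step_down: float = 0.5         # decay factor for the extra delay below it
    delay: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.delay = self.base_delay

    def update(self, rendered_last_minute: int) -> float:
        """Adjust and return the delay (seconds) to use for the next minute."""
        if rendered_last_minute > self.max_rendered_per_minute:
            # Over the ceiling: keep cached pages around longer.
            grown = max(self.delay * self.step_up, self.base_delay + self.first_step)
            self.delay = min(self.max_delay, grown)
        else:
            # At or under the ceiling: shrink the extra delay back toward normal.
            extra = (self.delay - self.base_delay) * self.step_down
            self.delay = self.base_delay if extra < 1.0 else self.base_delay + extra
        return self.delay


if __name__ == "__main__":
    controller = CacheDelayController(max_rendered_per_minute=100)
    for rendered in [80, 150, 160, 120, 90, 70, 60]:
        print(rendered, controller.update(rendered))
```

With `max_rendered_per_minute=100` and per-minute counts of 80, 150, 160, 120, 90, 70 and 60 rendered pages, `update()` returns delays of 0, 60, 120, 240, 120, 60 and 30 seconds: the delay grows only while the site is over the limit and drifts back to normal once it isn't. The `max_delay` cap bounds how stale an anonymous page can get.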
Feel free to point out mistakes!
Nic
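----
[1] Using a Poisson distribution (so the time between requests is exponentially distributed), there is a 50% chance of waiting less than t seconds between requests, where t = -1 * number_of_seconds_in_sample_period / number_of_requests_in_sample_period * ln(0.5), with ln the natural logarithm. That works out to about 0.69 times the average gap between requests.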
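Footnote [1] and idea #1 can be sketched together in Python. The model assumes requests to a page arrive as a Poisson process, so the gaps between them are exponential with rate N / T; `median_wait` is the footnote's formula with the log taken as the natural logarithm. The `flush_delay` rule (postpone clearing a quiet page until the median wait has passed, capped at an hour) is a hypothetical reading of idea #1, and the function names, the 150-second window and the cap are illustrative.

```python
import math


def median_wait(sample_seconds: float, requests: int) -> float:
    """Footnote [1]: time t with a 50% chance that the next request comes sooner.

    Assumes a Poisson process with rate requests / sample_seconds, so the gap
    between requests is exponential: P(gap < t) = 1 - exp(-rate * t).
    Solving for 0.5 gives t = -(sample_seconds / requests) * ln(0.5).
    """
    if requests <= 0:
        return math.inf  # no recent requests, so no finite estimate
    return -sample_seconds / requests * math.log(0.5)


def chance_of_request(sample_seconds: float, requests: int, within: float) -> float:
    """Probability of at least one request in the next `within` seconds."""
    if requests <= 0:
        return 0.0
    return 1.0 - math.exp(-requests / sample_seconds * within)


def flush_delay(sample_seconds: float, requests: int,
                window: float = 150.0, max_delay: float = 3600.0) -> float:
    """Hypothetical idea #1 rule: seconds to postpone clearing one page.

    Busy pages (median wait within `window`) are cleared at once. For quieter
    pages, clearing waits until the model gives even odds that someone has
    requested the page again, capped at `max_delay`.
    """
    t = median_wait(sample_seconds, requests)
    return 0.0 if t <= window else min(t, max_delay)
```

For example, a page requested 4 times in the last hour has a median wait of about 624 seconds, so its flush would be postponed by roughly 10 minutes, while a page requested 60 times in the last hour (median wait of about 42 seconds) would be cleared immediately.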