[drupal-devel] Drupal.org cache statistics

Nicholas Ivy nji at njivy.org
Thu May 26 23:05:30 UTC 2005

I haven't heard any new caching ideas for a while, so I sat down and 
brainstormed a bit.  I began by re-phrasing Dries's statistics:

Assuming that caching applies only to anonymous visitors for the sake 
of approximation, these numbers tell me that 35% of all anonymous 
requests (30 / 85) are identical to one of the last 170 anonymous 
requests (85% of 200).  Then, every 2.5 minutes or so (200 pages / 
93,000 pages * 20 hours), the entire cache is reset.  But roughly 65% 
of the anonymous requests (55 / 85) are unique within a 2.5-minute 
span, which means that at least 65% of drupal.org's pages wait more 
than 2.5 minutes between anonymous page requests.
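
The arithmetic above can be re-checked with a few lines of Python. This is only a sketch: the inputs are the rounded approximations used in this message (85% anonymous, 30% cache hits, a flush every 200 pages, 93,000 pages in 20 hours), and the variable names are made up for illustration.

```python
# Rounded figures from Dries's statistics, as approximated in this message.
anonymous_share = 0.85      # share of page views from anonymous visitors
cache_hit_share = 0.30      # share of all page views served from the cache
pages_per_flush = 200       # page requests between two cache flushes
total_pages = 93_000        # page requests logged
sample_hours = 20           # length of the logging period

# Share of anonymous requests that hit (or miss) the cache.
anon_hit_rate = cache_hit_share / anonymous_share
anon_miss_rate = 1 - anon_hit_rate

# Anonymous requests seen between two flushes.
anon_pages_per_flush = anonymous_share * pages_per_flush

# Minutes between flushes at the average request rate.
minutes_per_flush = pages_per_flush / total_pages * sample_hours * 60

print(f"anonymous hit rate:        {anon_hit_rate:.1%}")
print(f"anonymous miss rate:       {anon_miss_rate:.1%}")
print(f"anonymous pages per flush: {anon_pages_per_flush:.0f}")
print(f"minutes between flushes:   {minutes_per_flush:.2f}")
```

The flush interval is an average over the whole 20 hours; during busy periods the cache would be reset more often than that.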

So the goal is obviously to extend the lifetime of the cache for 
anonymous visitors.  Wiping the entire cache seems like overkill, but 
what else can be done when every page may contain dynamic information?  
Other people have said this already.

So what options do we have to improve caching performance?

1)  Perhaps we could use statistics, like a Poisson distribution, to 
predict how much time is likely to elapse between requests to a certain 
page given the activity we recently recorded [1].  If the predicted 
wait is greater than 2.5 minutes, don't immediately clear the cache for 
that page.  Instead, wait until there's at least a 50% chance someone 
will look again.  After all, it's likely that no one is looking at it, 
so why keep the page up-to-date?

2)  Wait more than 2.5 minutes before clearing the cache for anonymous 
users.  If most of the visitors have authenticated, this scheme won't 
help much.

3)  Cache page elements as other people have suggested.  Assuming most 
of the effort to create a page is spent rendering page elements, this 
scheme should work too.
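
Option 3 could look roughly like the Python sketch below. It is not how Drupal does or would implement it: the class, its method names and the 300-second lifetime are all hypothetical. It only shows each page element being cached and invalidated on its own instead of wiping everything at once.

```python
import time

class ElementCache:
    """Caches rendered page elements separately, each with its own lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, rendered_html)

    def get_or_render(self, key, render, lifetime_seconds):
        """Return the cached element, rendering it only if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        html = render()
        self._entries[key] = (now + lifetime_seconds, html)
        return html

    def invalidate(self, key):
        """Drop one element without touching the rest of the cache."""
        self._entries.pop(key, None)

render_calls = []

def render_sidebar():
    # Hypothetical expensive element; records each real render.
    render_calls.append("sidebar")
    return "<div>recent comments</div>"

cache = ElementCache()
cache.get_or_render("sidebar", render_sidebar, lifetime_seconds=300)
cache.get_or_render("sidebar", render_sidebar, lifetime_seconds=300)  # served from cache
cache.invalidate("sidebar")  # e.g. after a new comment is posted
cache.get_or_render("sidebar", render_sidebar, lifetime_seconds=300)  # rendered again
print(len(render_calls))
```

Whether this helps depends on the assumption stated in option 3, that rendering the elements dominates the cost of building a page.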

Feel free to point out mistakes!


[1] Assuming requests arrive as a Poisson process, there is a 50% chance 
of waiting less than t seconds between requests, where t = -1 * 
number_of_seconds_in_sample_period / 
number_of_requests_in_sample_period * log(.5), with log being the 
natural logarithm.
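
As a sketch of the formula in [1] and the rule in option 1, the Python below computes that median wait and decides whether clearing a page's cache entry could be put off. It assumes requests to a page arrive as a Poisson process, so the gaps between them are exponentially distributed and log means the natural logarithm. The function names, the 150-second cache lifetime and the sample counts are hypothetical.

```python
import math

def median_wait_seconds(sample_seconds, requests_in_sample):
    """Time t such that the next request arrives within t with 50% probability."""
    if requests_in_sample <= 0:
        return math.inf  # no traffic observed, so no request is expected
    mean_gap = sample_seconds / requests_in_sample
    return -mean_gap * math.log(0.5)  # same as mean_gap * ln(2)

def can_defer_clear(sample_seconds, requests_in_sample, cache_lifetime=150.0):
    """True if the predicted wait is longer than the cache lifetime (2.5 min)."""
    return median_wait_seconds(sample_seconds, requests_in_sample) > cache_lifetime

# Hypothetical pages observed over one hour.
busy = median_wait_seconds(3600, 60)   # one request per minute on average
quiet = median_wait_seconds(3600, 6)   # one request per ten minutes on average
print(f"busy page:  {busy:.1f} s, defer: {can_defer_clear(3600, 60)}")
print(f"quiet page: {quiet:.1f} s, defer: {can_defer_clear(3600, 6)}")
```

With these made-up counts the busy page has a median wait of about 42 seconds and is cleared as usual, while the quiet page has a median wait of about 416 seconds and could be left alone.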

On May 26, 2005, at 1:16 PM, Dries Buytaert wrote:

> Hello world,
> November last year, I profiled drupal.org's cache observations.  
> Yesterday, Moshe asked to profile it again so we could evaluate the 
> usefulness of Jeremy's "loose caching" mechanism.
> Over the past 20 hours, I logged 93,000 unique page requests using the 
> patch at http://buytaert.net/temporary/cache-statistics.patch.  Loose 
> caching was enabled.
> Results:
> 1. Last year we found that authenticated users were responsible for 
> 15.8% of all page views.  A year later, we see that authenticated 
> users are responsible for 14.9% of all page views.
> 2. Last year we found that only 27.9% of the page requests actually 
> benefited from the cache system.  That is, for more than two thirds of 
> the page requests, we had to generate a page dynamically.  A year later, 
> using "loose caching" rather than "strict caching", we see that 30.7% 
> of the page requests benefit from the cache system.  Read: we still 
> have a lot of page cache misses! :(
> 3. Last year, we found that the cache got flushed once every 207 page 
> requests.  A year later, we observe that the cache got flushed once 
> every 190 page requests.
> We conclude that:
> 1. Loose caching does not significantly -- or not necessarily -- 
> improve the behavior of drupal.org's page cache (though I'd like to 
> believe that it does when there are sudden traffic spikes/bursts).
> 2. When writing code, we can NOT assume that a page will benefit from 
> being cached.
> --
> Dries Buytaert  ::  http://www.buytaert.net/
