[drupal-devel] Drupal cache statistics
Hello world,

I did some more experiments:

1. Enabled strict caching on drupal.org
2. Enabled loose caching on drupal.org (300 sec. delay, default)
3. Enabled loose caching on drupal.org (600 sec. delay)

Per configuration we measured more than 150.000 page requests.

Results
-------

If we look at the cache miss percentages of anonymous user requests only (excl. authenticated users) we get:

1. Strict : 66%
2. Loose, 300ms : 65%
3. Loose, 600ms : 64%

If we look at the average number of pages served between two flushes (excl. authenticated users) we get:

1. Strict : 280 pages/flush
2. Loose, 300ms : 182 pages/flush
3. Loose, 600ms : 327 pages/flush

Conclusion
----------

1. Loose caching has NO effect on drupal.org's page cache.
2. Loose caching has NO well-defined effect on drupal.org's flush ratio.

Possible solutions
------------------

Unless we can serve 2000+ pages/flush (random estimate), the cache will never be efficient.

1. Rework loose caching to flush less frequently.
2. Flush partially.
3. Maybe there is a bug?
4. Remove 'loose caching'?

--
Dries Buytaert :: http://www.buytaert.net/
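For readers who want to reproduce the two figures reported above, here is a minimal sketch of how a miss percentage and a pages-per-flush average could be derived from a stream of cache events. The post does not say how the numbers were actually collected, so the function name cache_stats() and the event labels 'hit', 'miss' and 'flush' are assumptions made for illustration only.

```php
<?php
// Hypothetical helper, not part of Drupal: summarize cache events recorded
// for anonymous requests. Each event is 'hit', 'miss' or 'flush'.
function cache_stats(array $events): array {
  $counts = array('hit' => 0, 'miss' => 0, 'flush' => 0);
  foreach ($events as $event) {
    if (isset($counts[$event])) {
      $counts[$event]++;
    }
  }
  // Every hit or miss corresponds to one page request.
  $requests = $counts['hit'] + $counts['miss'];
  return array(
    // Share of requests that could not be served from the cache.
    'miss_pct' => $requests > 0 ? 100.0 * $counts['miss'] / $requests : 0.0,
    // Average number of pages served between two flushes (NULL if no flush).
    'pages_per_flush' => $counts['flush'] > 0 ? $requests / $counts['flush'] : NULL,
  );
}
```

Feeding it the events seen during one measurement period gives both figures at once, which makes it easier to check that the two numbers come from the same set of requests.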
On Wed, 1 Jun 2005, Dries Buytaert wrote:
Conclusion
----------

1. Loose caching has NO effect on drupal.org's page cache.
2. Loose caching has NO well-defined effect on drupal.org's flush ratio.

Possible solutions
------------------

Unless we can serve 2000+ pages/flush (random estimate), the cache will never be efficient.

1. Rework loose caching to flush less frequently.
2. Flush partially.
I am willing to continue to work on a partially flushing solution for comments if you agree to make the custom PHP settings for blocks return an array of allowed paths instead of just TRUE or FALSE.
3. Maybe there is a bug?
4. Remove 'loose caching'?
We'll see. Cheers, Gerhard
On 01 Jun 2005, at 13:39, Gerhard Killesreiter wrote:
1. Rework loose caching to flush less frequently. 2. Flush partially.
I am willing to continue to work on a partially flushing solution for comments if you agree to make the custom PHP settings for blocks return an array of allowed paths instead of just TRUE or FALSE.
I'm afraid that isn't a workable solution. -- Dries Buytaert :: http://www.buytaert.net/
On Wed, 1 Jun 2005, Dries Buytaert wrote:
On 01 Jun 2005, at 13:39, Gerhard Killesreiter wrote:
1. Rework loose caching to flush less frequently. 2. Flush partially.
I am willing to continue to work on a partially flushing solution for comments if you agree to make the custom PHP settings for blocks return an array of allowed paths instead of just TRUE or FALSE.
I'm afraid that isn't a workable solution.
Why do you think so? Your current code is something like:

  if (arg(0) == 'forum') return TRUE;
  if (arg(0) == 'node') return TRUE;
  return FALSE;

Why not:

  return array('node/*' => TRUE, 'forum/*' => TRUE);

or

  return array('node/*', 'forum/*');

and assume FALSE otherwise? This would be similar to what is stored in the DB if you don't use PHP, but it is the simpler solution.

Note that I only care about the $user->uid == 0 case, as this is what we cache for. Role-based setups wouldn't be affected (for roles other than rid == 1).

Cheers,
Gerhard
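The proposal above could be consumed roughly as follows. This is only a sketch: path_is_allowed() is a hypothetical helper, not a Drupal function, and the rule that '*' matches any run of characters is an assumption about how the patterns would be interpreted. Presumably the appeal of a declarative list is that the cache could work out which pages show a block without running the snippet for each page.

```php
<?php
// Hypothetical helper, not a Drupal API: does $path match any allowed pattern?
// A '*' in a pattern matches any run of characters, so 'node/*' matches 'node/12'.
function path_is_allowed(array $patterns, string $path): bool {
  foreach ($patterns as $pattern) {
    // Escape regex metacharacters, then turn the escaped '*' back into '.*'.
    $regex = '#^' . str_replace('\*', '.*', preg_quote($pattern, '#')) . '$#';
    if (preg_match($regex, $path) === 1) {
      return TRUE;
    }
  }
  // Anything not listed is treated as FALSE, as proposed above.
  return FALSE;
}

// What a block's custom visibility snippet would return under the proposal.
$allowed = array('node/*', 'forum/*');
```

Note one difference from the arg(0) checks: with this matching rule the bare path 'node' does not match 'node/*', so a real implementation would have to decide whether to list 'node' separately.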
comments if you agree to make the custom PHP settings for blocks return an array of allowed paths instead of just TRUE or FALSE.
I'm afraid that isn't a workable solution.
Why do you think so? Your current code is something like:
if (arg(0) == 'forum') return TRUE; if (arg(0) == 'node') return TRUE;
The custom PHP settings are, well, custom PHP code. You may want to display a block only for new users. On the week of christmas. Under red moon. err... whatever.
On Wed, 1 Jun 2005, Karoly Negyesi wrote:
comments if you agree to make the custom PHP settings for blocks return an array of allowed paths instead of just TRUE or FALSE.
I'm afraid that isn't a workable solution.
Why do you think so? Your current code is something like:
if (arg(0) == 'forum') return TRUE; if (arg(0) == 'node') return TRUE;
The custom PHP settings are, well, custom PHP code. You may want to display a block only for new users. On the week of christmas. Under red moon. err... whatever.
No problem:

  if ($moon == 'red') return array('node/*');

Try again. :)

Cheers,
Gerhard
Regardless of cache granularity, we will need to validate and reset the cache somehow. I see two basic methods:

1. An event triggers the cache to reset. Currently cache_clear_all() accomplishes this when we publish nodes, comments, and do certain other things. If we decide to cache page elements, perhaps we could use event hooks that module authors can use to trigger cache resets. The challenge is to identify every possible event and consider the ramifications of each event for each user, but at least the cache is validated only when needed.

2. Page elements poll the cache for every page request. When a page is loaded, the page element computes a cache key and checks for cached results. For example, in nodelist.module I used an md5-checksum of the sql query as a cache key. If the sql query changes, the cache misses. In this case, the challenge is to efficiently compute accurate cache keys and handle stale results, but at least we don't have to worry about every event in the system.

If we used a functional style of programming, Drupal could intercept the arguments to certain module functions and check for cached results. But we interact with global dynamic variables (including the database), so this is probably not a possibility.

I don't see a way for Drupal to automatically handle caching per page element without cooperation from module authors. So the question is,

1. Do we rely on module authors to hook into every relevant system event, or
2. Do we rely on module authors to manage their own cache keys on every page load?

Nic

On Jun 1, 2005, at 3:04 AM, Dries Buytaert wrote:
Conclusion
----------

1. Loose caching has NO effect on drupal.org's page cache.
2. Loose caching has NO well-defined effect on drupal.org's flush ratio.

Possible solutions
------------------

Unless we can serve 2000+ pages/flush (random estimate), the cache will never be efficient.

1. Rework loose caching to flush less frequently.
2. Flush partially.
3. Maybe there is a bug?
4. Remove 'loose caching'?
-- Dries Buytaert :: http://www.buytaert.net/
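The first method in Nic's message, where an event triggers a cache reset, can be sketched per page element as below. This is an illustration under stated assumptions, not Drupal's cache API: the EventCache class, its in-memory storage, and event names such as 'comment_insert' are all hypothetical.

```php
<?php
// Hypothetical in-memory stand-in for the cache table; not Drupal's cache API.
class EventCache {
  private $entries = array();   // cache id => cached output
  private $listeners = array(); // event name => set of cache ids to drop

  // Store output and record which events should invalidate it.
  public function set(string $cid, string $output, array $events): void {
    // Forget any events registered for an earlier version of this entry.
    foreach (array_keys($this->listeners) as $event) {
      unset($this->listeners[$event][$cid]);
    }
    $this->entries[$cid] = $output;
    foreach ($events as $event) {
      $this->listeners[$event][$cid] = TRUE;
    }
  }

  // Return the cached output, or NULL on a miss.
  public function get(string $cid): ?string {
    return $this->entries[$cid] ?? NULL;
  }

  // Called from the code path that publishes a node, a comment, and so on.
  public function trigger(string $event): void {
    foreach (array_keys($this->listeners[$event] ?? array()) as $cid) {
      unset($this->entries[$cid]);
    }
    unset($this->listeners[$event]);
  }
}
```

Only entries registered for the triggered event are dropped, which is the partial flush discussed earlier in the thread. The cost is the one Nic names: every module has to declare every event that can change its output.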
On 01 Jun 2005, at 18:32, Nicholas Ivy wrote:
2. Page elements poll the cache for every page request. When a page is loaded, the page element computes a cache key and checks for cached results. For example, in nodelist.module I used an md5-checksum of the sql query as a cache key. If the sql query changes, the cache misses. In this case, the challenge is to efficiently compute accurate cache keys and handle stale results, but at least we don't have to worry about every event in the system.
Do you suggest we render the page, compute an md5-sum on the generated HTML, drop the generated page on the floor, and serve the page from the cache? Or do you suggest computing the md5-sum on something else? I don't understand how that could work ...

--
Dries Buytaert :: http://www.buytaert.net/
On Jun 2, 2005, at 1:36 AM, Dries Buytaert wrote:
On 01 Jun 2005, at 18:32, Nicholas Ivy wrote:
2. Page elements poll the cache for every page request. When a page is loaded, the page element computes a cache key and checks for cached results. For example, in nodelist.module I used an md5-checksum of the sql query as a cache key. If the sql query changes, the cache misses. In this case, the challenge is to efficiently compute accurate cache keys and handle stale results, but at least we don't have to worry about every event in the system.
Do you suggest we render the page, compute an md5-sum on the generated HTML, drop the generated page on the floor, and serve the page from the cache? Or do you suggest computing the md5-sum on something else? I don't understand how that could work ...
No, I am not suggesting that we render pages and throw away the results. I am suggesting we compute the cache key using something else, something that will have to be identified per page element.

Near the top of a block of code, we could use input values as cache keys to see if we have done this work already. If so, we skip the rest of the block and return the cache. I see a couple problems with this scheme, however.

1. We need to identify the important inputs that characterize the output.
2. We need to check early enough in the code for a cached result in order for the work to be worthwhile.

As for nodelist.module, I looked again at how db_rewrite_sql() works and realized I got it wrong -- the structure of the query doesn't change per user. I thought I had an input to the page that completely characterized the result, but I really don't. So I'll have to work on that.

Nic
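The scheme described above, checking a key built from input values near the top of a block of code, might look like the following. It is a sketch only: cached_element() is a hypothetical helper, the plain array stands in for the cache table, and hashing the serialized inputs with md5() is one possible way to build the key rather than what nodelist.module does.

```php
<?php
// Hypothetical helper: run $build only when no result is cached for these inputs.
// $cache is a plain array passed by reference, standing in for the cache table.
function cached_element(array &$cache, string $name, array $inputs, callable $build): string {
  // The key must cover every input that influences the output.
  $key = $name . ':' . md5(serialize($inputs));
  if (!isset($cache[$key])) {
    // Cache miss: do the expensive work once and remember the result.
    $cache[$key] = $build($inputs);
  }
  return $cache[$key];
}
```

The sketch makes both listed problems visible. If an input is left out of $inputs, for example the user's roles when the query text is the same for every user, two different outputs share one key. And nothing here notices when the underlying data changes while the inputs stay the same, so stale results still have to be handled separately.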
Hi Dries,
Per configuration we measured more than 150.000 page requests.
Do you know roughly how many pages Drupal generates per minute, on average? Does this fluctuate greatly from minute to minute? How long did it take to get 150,000 page requests?
If we look at the cache miss percentages of anonymous user requests only (excl. authenticated users) we get:
1. Strict : 66%
2. Loose, 300ms : 65%
3. Loose, 600ms : 64%
I have a feeling something is wrong with the loose cache implementation.
If we look at the average number of pages served between two flushes (excl. authenticated users) we get:
1. Strict : 280 pages/flush
2. Loose, 300ms : 182 pages/flush
3. Loose, 600ms : 327 pages/flush
You mean 300 and 600 seconds, not milliseconds, right?

Three reasons you might not see any increase in the number of pages served from cache even with loose caching enabled come to mind:

1) You are mis-measuring.
2) Drupal.org does not have a consistent traffic load, serving pages in bursts spaced further apart than the timer on the loose cache.
3) There is something wrong with the loose caching implementation that was merged into core.
1. Loose caching has NO effect on drupal.org's page cache.
2. Loose caching has NO well-defined effect on drupal.org's flush ratio.
I'm not in a position to review patches and such at the moment, so these may be stupid questions. But, how are you measuring the flush ratio? Is it based on when a certain function is called, or when we actually query the database to mass-delete cache entries? Do you do any measurements of CPU utilization with the various cache configurations?
Unless we can serve 2000+ pages/flush (random estimate), the cache will never be efficient.
1. Rework loose caching to flush less frequently.
2. Flush partially.
3. Maybe there is a bug?
4. Remove 'loose caching'?
I'm leaning toward #3 at the moment. Unfortunately I'm not in a position to look at this for another 7 days. When I return to Florida, I'll review what has been merged against what I'm using on KernelTrap, and retest it all to verify it's working as I originally designed it. I'll also think about ways to further improve the design.

Cheers,
-Jeremy
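The thread never spells out how the loose cache decides when to flush, so the following is only one plausible reading of "loose caching with a delay", written down to make the measurement questions above concrete. It assumes the first flush request starts a timer and that the cache is emptied only by a flush request arriving after the delay has passed. The LooseFlusher class and its method names are hypothetical and are not the code merged into core.

```php
<?php
// Assumed behaviour, not the code merged into core: the first flush request
// starts a timer, and the cache is only emptied by a flush request that
// arrives after $delay seconds have passed.
class LooseFlusher {
  private $delay;
  private $timer_started = NULL; // time of the first pending flush request
  public $flushes = 0;           // number of real flushes performed

  public function __construct(int $delay) {
    $this->delay = $delay;
  }

  // $now is passed in so the sketch can be exercised without sleeping.
  // Returns TRUE when this request actually empties the cache.
  public function request_flush(int $now): bool {
    if ($this->timer_started === NULL) {
      // First request since the last real flush: start the timer only.
      $this->timer_started = $now;
      return FALSE;
    }
    if ($now > $this->timer_started + $this->delay) {
      $this->flushes++;
      $this->timer_started = NULL;
      return TRUE;
    }
    // Still inside the delay window: absorb the request.
    return FALSE;
  }
}
```

Under this reading, flush requests inside the window are absorbed, so the number of real flushes is bounded by the elapsed time divided by the delay, whatever the rate of content changes. Counting calls to the flush function would then give a different flush ratio from counting actual deletions from the cache table, which is the distinction asked about above.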
participants (5)

- Dries Buytaert
- Gerhard Killesreiter
- Jeremy Andrews
- Karoly Negyesi
- Nicholas Ivy