[development] Split up the cache table

Gerhard Killesreiter gerhard at killesreiter.de
Wed Aug 2 22:39:27 UTC 2006


Hi there!

In my quest to improve caching for Drupal, I've devised a scheme that 
lets us have more than one cache table. The idea came from the 
observation that some cached items are quickly invalidated again 
(page cache) while others last a while (menu, filter). This is a 
problem for the MySQL query cache, which then gets invalidated as well. 
Also, operations on smaller tables will be faster.
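
To illustrate the idea (this is only a sketch from me, not the patch 
itself; I've left out the expiry check and unserialization, and the 
actual signature in the patch may differ):

function cache_get($key, $table = 'cache') {
  // Look the item up in the requested bin instead of the single cache
  // table, so flushing one bin (e.g. cache_page) no longer invalidates
  // the query cache for the others.
  return db_fetch_object(db_query("SELECT data, created, headers, expire FROM {" . $table . "} WHERE cid = '%s'", $key));
}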

The patch can be found here:

http://drupal.org/node/72290

I have now run this patch on drupal.org for several hours and here are 
some results.

First, let us look at the ratio of cache hits to cache calls as a 
percentage of the total tries. In our case that is the number of SELECTs 
minus the number of UPDATEs, divided by the number of SELECTs on a 
particular cache table:

(# selects - # updates) / # selects

Explanation: we run a SELECT in every case to determine whether there is 
a cached item, but only the calls which do not trigger a subsequent 
cache update were successful.

filter:    (242318-21780)/242318 = 91%
page:      (13315-7285)/13315 = 45%
menu:      (9989-161)/9989 = 98%
the rest:  (986-488)/986 = 50%
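
In code form, using the SELECT/UPDATE counts listed above:

$counts = array(
  'filter' => array(242318, 21780),
  'page'   => array(13315, 7285),
  'menu'   => array(9989, 161),
  'rest'   => array(986, 488),
);
foreach ($counts as $bin => $c) {
  list($selects, $updates) = $c;
  // Hit ratio per bin, rounded down to whole percent.
  printf("%s: %d%%\n", $bin, floor(100 * ($selects - $updates) / $selects));
}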

So we now have proof that our page cache isn't working well for a busy 
site such as drupal.org, while the filter cache seems to work fine and 
the menu cache is truly excellent. The "rest" consists of a few feeble 
attempts at caching in the cvs and project modules, which apparently 
don't work too well either.

The high share of cache misses for the page cache is probably due in 
part to the fact that several crawlers are doing their business on 
drupal.org at all times. Each of these will generate a cached page, but 
if the page is in fact very uninteresting to humans, then the only 
likely recipient of the cached copy is another crawler, and there aren't 
enough of them for that to happen often. I've suggested making Drupal 
less attractive to crawlers here: http://drupal.org/node/65017 If we 
were successful with this, fewer pages would be generated and thus the 
percentage of successful cache hits would increase.

I have to note that the filter cache on drupal.org usually keeps entries 
for only four hours, not 24 hours as on a stock Drupal. This change was 
made to keep the number of cached items at bay. I did change this to the 
full 24 hours; this was not intentional, and it means I cannot compare 
my data with the older data I had gathered in all cases.
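
For context, the lifetime is just the expiry argument passed to 
cache_set() when filter stores an item. From memory (so take the exact 
call with a grain of salt; $cid and $output are placeholder names), a 
stock install does roughly:

// Stock: entries expire 24 hours after creation.
cache_set($cid, $output, time() + (60 * 60 * 24));
// The drupal.org tweak mentioned above shortens that to four hours:
cache_set($cid, $output, time() + (60 * 60 * 4));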

Number of cached items

These numbers obviously fluctuate a lot:

filter: 132825 (usually about 7000 for the 4h cache)
page: 7345 (a very high value, usually between 1000 and 3000)
menu: 1040
rest: 79328 (I had forgotten to remove the previous items from the 
table; there were only 500 relevant items)


Speed

The individual cached items differ in number and size, and so does the 
time these queries take.

          SELECT     INSERT     UPDATE    DELETE
filter:   0.002      0.005      0.002     0.087
page:     0.007      0.009      0.003     0.188
menu:     0.011      0.014      0.016       n/a
rest:     0.002      0.007      0.004       n/a

The page and menu items are the largest, so it is no surprise that their 
query times are also the longest. The number of entries in a table does 
not seem to matter that much.
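
One quick way to double-check the size difference (a sketch assuming the 
split tables are named cache_filter, cache_page and cache_menu; I haven't 
verified the patch's exact naming):

// Compare the average item size per bin.
foreach (array('cache_filter', 'cache_page', 'cache_menu') as $table) {
  $avg = db_result(db_query('SELECT AVG(LENGTH(data)) FROM {' . $table . '}'));
  print $table . ': ' . format_size($avg) . "\n";
}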

Total weighted average:

          SELECT    INSERT
Now       0.003     0.007
Test 1    0.010     0.011
Test 2    0.007     0.008
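
For the curious, the "Now" SELECT figure follows from the per-table 
numbers above as a count-weighted average, roughly like this (the 
weighting by query count is my reading, not necessarily exactly how the 
totals were produced):

$selects = array(
  'filter' => array(242318, 0.002),
  'page'   => array(13315, 0.007),
  'menu'   => array(9989, 0.011),
  'rest'   => array(986, 0.002),
);
$time = $count = 0;
foreach ($selects as $bin => $s) {
  // Weight each table's SELECT time by its number of SELECTs.
  $time  += $s[0] * $s[1];
  $count += $s[0];
}
printf("weighted SELECT average: %.3f\n", $time / $count);  // prints 0.003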

There is a marked improvement here. Splitting up the cache apparently 
has the benefits I was trying to achieve: a reduction of query times 
through smaller tables, and separating less-often-changing data into its 
own tables so that the query cache can operate more effectively.

Considering that the cache_get query accumulates more total time than 
any other query on drupal.org, I think we should strongly consider this 
patch.

Cheers,
	Gerhard

