[development] Split up the cache table
Gerhard Killesreiter
gerhard at killesreiter.de
Wed Aug 2 22:39:27 UTC 2006
Hi there!
In my quest to improve caching for Drupal, I've devised a patch that
lets us have more than one cache table. The idea came from the
observation that some cached items will quickly be invalidated again
(page cache) while others will last a while (menu, filter). Keeping
them all in one table is a problem for the MySQL query cache, because
every write to that table invalidates all cached queries against it.
Also, operations on smaller tables will be faster.
The patch can be found here:
http://drupal.org/node/72290
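To illustrate the idea (this is a minimal sketch of the concept only,
not the code from the patch; the function names and the SQLite backend
are my own, for illustration): each kind of cached item gets its own
table, so frequent writes to one table no longer invalidate the query
cache entries built on the others.

  import sqlite3, time

  # One table per cache "bin" instead of a single shared `cache` table.
  BINS = ("cache_page", "cache_filter", "cache_menu", "cache")

  db = sqlite3.connect(":memory:")
  for table in BINS:
      db.execute(f"CREATE TABLE {table} "
                 "(cid TEXT PRIMARY KEY, data BLOB, expire INTEGER)")

  def cache_set(cid, data, table="cache", lifetime=4 * 3600):
      """Store an item in the given cache table."""
      db.execute(f"REPLACE INTO {table} (cid, data, expire) VALUES (?, ?, ?)",
                 (cid, data, int(time.time()) + lifetime))

  def cache_get(cid, table="cache"):
      """Return the cached data, or None on a miss or an expired item."""
      row = db.execute(f"SELECT data, expire FROM {table} WHERE cid = ?",
                       (cid,)).fetchone()
      if row is None or row[1] < time.time():
          return None
      return row[0]

  cache_set("front_page", b"<html>...</html>", table="cache_page")
  cache_get("front_page", table="cache_page")    # hit
  cache_get("front_page", table="cache_filter")  # miss: separate table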
I have now run this patch on drupal.org for several hours and here are
some results.
First, let us look at the cache hit ratio as a percentage of the total
lookups. In our case that is the number of SELECTs minus the number of
UPDATEs, divided by the number of SELECTs on a particular cache table:
(# selects - # updates) / # selects
Explanation: We run a SELECT in every case to determine whether there is
a cached item, but only the lookups which do not trigger a subsequent
cache update were successful.
filter: (242318-21780)/242318 = 91%
page: (13315-7285)/13315 = 45%
menu: (9989-161)/9989 = 98%
the rest: (986-488)/986 = 50%
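(For reference, here is a small Python sketch that reproduces these
percentages from the SELECT/UPDATE counts above; the numbers are taken
verbatim from the figures just quoted.)

  # Hit ratio per cache table: (selects - updates) / selects
  counts = {
      "filter": (242318, 21780),
      "page":   (13315, 7285),
      "menu":   (9989, 161),
      "rest":   (986, 488),
  }

  for table, (selects, updates) in counts.items():
      ratio = (selects - updates) / selects
      print(f"{table:>7}: {ratio:.1%}")
  # -> filter 91.0%, page 45.3%, menu 98.4%, rest 50.5%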
So, we now have proof that our page cache isn't working well for a busy
site such as drupal.org, while the filter cache seems to work fine and
the menu cache is truly excellent. The "rest" are a few feeble attempts
at caching in the cvs and project modules, which apparently don't work
too well either.
The high share of cache misses for the page cache is probably due in
part to the fact that several crawlers are doing their business on
drupal.org at all times. These will generate a cached page, but if the
page is in fact very uninteresting to humans, then the only likely
receiver of that cached page is another crawler, and there aren't enough
crawlers for that to happen often. I've suggested making Drupal less
attractive to crawlers here: http://drupal.org/node/65017
If we were successful with this, fewer pages would be generated and thus
the percentage of successful cache hits would increase.
I have to note that the filter cache on drupal.org usually only keeps
entries for four hours, not 24 hours as in a stock Drupal. This change
was made to keep the number of cached items at bay. I did change this
back to the full 24 hours; that was not intentional, and it means I
cannot compare my data with the older data in all cases.
Number of cached items
These numbers obviously fluctuate a lot:
filter: 132825 (usually about 7000 for the 4h cache)
page: 7345 (a very high value, usually between 1000 and 3000)
menu: 1040
rest: 79328 (I had forgotten to remove the previous items from the
table; there were only 500 relevant items)
Speed
The individual cached items differ in number and size, and so does the
time these queries take.
         SELECT  INSERT  UPDATE  DELETE
filter:   0.002   0.005   0.002   0.087
page:     0.007   0.009   0.003   0.188
menu:     0.011   0.014   0.016     n/a
rest:     0.002   0.007   0.004     n/a
The page and menu items are the largest, so it is no surprise that their
times are also the longest. The number of entries in a table does not
seem to matter that much.
Total weighted average:
         SELECT  INSERT
Now       0.003   0.007
Test 1    0.010   0.011
Test 2    0.007   0.008
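(The post does not spell out the weighting, but a rough check, assuming
the SELECT times are weighted by each table's SELECT count and the
INSERT times by its UPDATE count from the figures above, lands close to
the "Now" row: about 0.003 for SELECT and 0.006 for INSERT.)

  # Rough check of the "Now" row. Assumption: the average is weighted by
  # the per-table query counts quoted earlier; the post does not state
  # the exact weighting used.
  select_counts = {"filter": 242318, "page": 13315, "menu": 9989, "rest": 986}
  select_times  = {"filter": 0.002,  "page": 0.007, "menu": 0.011, "rest": 0.002}
  insert_counts = {"filter": 21780,  "page": 7285,  "menu": 161,   "rest": 488}
  insert_times  = {"filter": 0.005,  "page": 0.009, "menu": 0.014, "rest": 0.007}

  def weighted_avg(counts, times):
      total = sum(counts.values())
      return sum(counts[t] * times[t] for t in counts) / total

  print(f"SELECT: {weighted_avg(select_counts, select_times):.3f}")  # 0.003
  print(f"INSERT: {weighted_avg(insert_counts, insert_times):.3f}")  # 0.006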
There is a marked improvement here. Splitting up the cache apparently
has the benefits I tried to achieve: reduction of query times by having
smaller tables, and splitting off less-often-changing data into separate
tables so that the query cache can operate more effectively.
Considering that the cache_get query accumulates more time than any
other query on drupal.org, I think that we should strongly consider this
patch.
Cheers,
Gerhard