[development] Caching, caching, caching...

Gerhard Killesreiter gerhard at killesreiter.de
Tue Jul 11 22:01:28 UTC 2006


Hi there,

you may have noticed my recent interest in better caching. This is due
to me monitoring the drupal.org hardware. Luckily, the problems we've
had (namely the CPU on the db server being saturated) were resolved by
more mundane means (optimze your database on occassion...), but since we
continue to experience a growing number of visitors on drupal.org,
better caching is no waste of time.

I've taken two different approaches:

1) to splt up the cache table. The idea is that parts of the cache table
don't change that often (the cached menu structures for each user) while
others do (page cache, filter cache, ...). The idea is to put the not
often changing parts into their own table to allow the mysql query cache
to effectively cache this table. The cached queries will be invalidated
and removed once one of the underlying tables changes.

This cache has been tried on drupal.org, the results were inconclusive,
more tests need to be done.

2) to cache expensive queries in Drupal's cache table. I've chosen the
forum module as an example and cached the two blocks it provides and the
queries in forum_get_forums (forum_get_topics would be another candidate).

During initial benchmarking, I've found that the results with and
without my patch hardly differed. Why? The mysql query cache cached the
queries. Without the mysql query cache my patch improved the time for
delivering /forum with both blocks enabled by about 20%. But of course
nobody would run mysql without the query cache so the result is hardly
interesting. The question is now: Under which circumstances can my
caching inside Drupal be better than the mysql query cache?

Let's look at the tables involved:

{node}
{node_comment_statistics}
{term_node}
{users}

Any time one of these tables changes, the mysql cache for queries using
the table in question will be invalidated. The approach to caching I've
chosen is more fine grained: Only if the node involved is of type forum,
the cached query results will be invalidated. Changes to the users table
don't matter at all, but we should probably invalidate the cache once a
user change his name.

On a site such as drupal.org where the majority of comments are against
forum nodes, the results of my patch will probably be hardly noticable.
If however we convert the project module to use real comments,
the situation will change.

This discussion seems to make it fairly obvious  that caching results of
queries to tables that hardly ever change (think taxonomy (unless you
use tagging)) is most likely a waste of code. Caching the result of
taxonomy_get_tree does seem to confirm this.

Now the question is if we want to go down this road. To properly
invalidate the cache requires to implement all sorts of hooks (forum
module needs _comment and _user for example) which you would not need
otherwise. This complicates the code quite a bit.

I think that the answer should depend on whether the mysql query cache
is as readily available as I assumed it is. It is possible that it isn't
available on shared hosting, for example. It is also possible that the
example I've chosen was a bad one and caching other structures
(especially those where a fair amount of php processing is done) might
even provide advantages for people who have the query cache at their
disposal.

The tests I've done were all done on my laptop while on battery power
which is hardly a valid test case. Better tests are needed to arrive at
a conclusive result.

Cheers,
    Gerhard



More information about the development mailing list