[development] Caching, caching, caching...

Darrel O'Pry dopry at thing.net
Tue Jul 11 22:21:54 UTC 2006


On Wed, 2006-07-12 at 00:01 +0200, Gerhard Killesreiter wrote:
> Hi there,
> 
> you may have noticed my recent interest in better caching. This is due
> to me monitoring the drupal.org hardware. Luckily, the problems we've
> had (namely the CPU on the db server being saturated) were resolved by
> more mundane means (optimze your database on occassion...), but since we
> continue to experience a growing number of visitors on drupal.org,
> better caching is no waste of time.
> 
> I've taken two different approaches:
> 
> 1) to splt up the cache table. The idea is that parts of the cache table
> don't change that often (the cached menu structures for each user) while
> others do (page cache, filter cache, ...). The idea is to put the not
> often changing parts into their own table to allow the mysql query cache
> to effectively cache this table. The cached queries will be invalidated
> and removed once one of the underlying tables changes.
> 
> This cache has been tried on drupal.org, the results were inconclusive,
> more tests need to be done.
> 
> 2) to cache expensive queries in Drupal's cache table. I've chosen the
> forum module as an example and cached the two blocks it provides and the
> queries in forum_get_forums (forum_get_topics would be another candidate).
> 
> During initial benchmarking, I've found that the results with and
> without my patch hardly differed. Why? The mysql query cache cached the
> queries. Without the mysql query cache my patch improved the time for
> delivering /forum with both blocks enabled by about 20%. But of course
> nobody would run mysql without the query cache so the result is hardly
> interesting. The question is now: Under which circumstances can my
> caching inside Drupal be better than the mysql query cache?
> 
> Let's look at the tables involved:
> 
> {node}
> {node_comment_statistics}
> {term_node}
> {users}
> 
> Any time one of these tables changes, the mysql cache for queries using
> the table in question will be invalidated. The approach to caching I've
> chosen is more fine grained: Only if the node involved is of type forum,
> the cached query results will be invalidated. Changes to the users table
> don't matter at all, but we should probably invalidate the cache once a
> user change his name.
> 
> On a site such as drupal.org where the majority of comments are against
> forum nodes, the results of my patch will probably be hardly noticable.
> If however we convert the project module to use real comments,
> the situation will change.
> 
> This discussion seems to make it fairly obvious  that caching results of
> queries to tables that hardly ever change (think taxonomy (unless you
> use tagging)) is most likely a waste of code. Caching the result of
> taxonomy_get_tree does seem to confirm this.
> 
> Now the question is if we want to go down this road. To properly
> invalidate the cache requires to implement all sorts of hooks (forum
> module needs _comment and _user for example) which you would not need
> otherwise. This complicates the code quite a bit.
> 
> I think that the answer should depend on whether the mysql query cache
> is as readily available as I assumed it is. It is possible that it isn't
> available on shared hosting, for example. It is also possible that the
> example I've chosen was a bad one and caching other structures
> (especially those where a fair amount of php processing is done) might
> even provide advantages for people who have the query cache at their
> disposal.
> 
> The tests I've done were all done on my laptop while on battery power
> which is hardly a valid test case. Better tests are needed to arrive at
> a conclusive result.
> 
> Cheers,
>     Gerhard

I personally am of the opinion to trust my RDMS's query caching
mechanism. That stuff has been being worked on for a long time.

For catering to the LCD shared hosting sites I think caching could be
helpful, after all things can get pushed out of the query cache much
more easily when there are 50 sites all querying the same local db.

I prefer the idea of caching more output items, rendered blocks,
nodes(teaser and full), than trying to beat out the query level caching
provided by the db server. I think with the new file caching code that
could be a good avenue to explore.



More information about the development mailing list