[support] how cache works: on the fly content creation (not node) and caching

Ivan Sergio Borgonovo mail at webthatworks.it
Mon Jan 4 13:58:01 UTC 2010


On Sat, 2 Jan 2010 14:26:02 -0600
Larry Garfield <larry at garfieldtech.com> wrote:

> > Actually for few tens of a second I was thinking to use ISAM just
> > for cache[1]... I wonder if this would become easier for D7. Yeah
> > I'm aware of fastpath... but I still meant *easier*.

> Actually, I'd go the other way.  In Drupal, the cache tables are
> one of the busier tables.  Cache, session, etc. are some of the
> heavier-write tables that you don't want to do table-level locking
> on.  watchdog or search indexes are good candidates for MyISAM,
> but not cache.

Hmm, I just noticed that a
db_lock_table($table);
call disappeared between D5 and D6.
Why was it needed, and what made it possible to remove it?

The code is pretty different in D7. I noticed that data is retrieved
from the DB and then checked for validity in prepareItem.
But it seems that all the conditions could be checked in the DB,
so that stale data is never returned in the first place.
What am I missing?
I couldn't see any place where $user->cache is set to anything other
than 0, except when the cache is cleared...
So yeah, if most users have $user->cache==0 then filtering in the DB
may waste "secondary cache" space (e.g. cached DB queries), but
then... what's the use of $user->cache?
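
To make the question concrete, here is a minimal sketch in Python with
SQLite (not Drupal code). It assumes the validity rule is "reject a
non-permanent entry when a minimum lifetime is set and the user's
timestamp is newer than the entry", which is how I read prepareItem;
the table layout and the function names are invented for the example.

```python
import sqlite3

CACHE_PERMANENT = 0  # assumed sentinel for entries that never expire


def make_cache_table():
    """Create an in-memory stand-in for a cache table (hypothetical layout)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cache ("
        "cid TEXT PRIMARY KEY, data TEXT, expire INTEGER, created INTEGER)"
    )
    return db


def get_then_check(db, cid, user_cache_ts, cache_lifetime):
    """Fetch the row first, then validate it in application code."""
    row = db.execute(
        "SELECT data, expire, created FROM cache WHERE cid = ?", (cid,)
    ).fetchone()
    if row is None:
        return None
    data, expire, created = row
    # Assumed rule: a temporary entry is stale for this user if a minimum
    # lifetime is configured and the user's timestamp is newer than the entry.
    if expire != CACHE_PERMANENT and cache_lifetime and user_cache_ts > created:
        return None
    return data


def get_filtered_in_db(db, cid, user_cache_ts, cache_lifetime):
    """Same rule, pushed into the WHERE clause so stale rows never come back."""
    row = db.execute(
        "SELECT data FROM cache "
        "WHERE cid = ? AND (expire = ? OR ? = 0 OR created >= ?)",
        (cid, CACHE_PERMANENT, cache_lifetime, user_cache_ts),
    ).fetchone()
    return row[0] if row else None
```

The second variant puts a per-user value in the query, so two users with
different timestamps issue different queries; that is the "secondary
cache" cost mentioned above.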

> > Anyway even in pure speed if you compare pg with InnoDB
> > Postgresql became quite competitive in the past years, if you
> > include other factors other than "pure speed" when comparing pg
> > with InnoDB, pg is more than just competitive. So maybe you're
> > right... other factors come to play, and not just performance.
> > Prejudice?

> Exactly my point.  Old prejudices about MySQL *and* Postgres are
> no longer really accurate, and to continue to spread them is

I admit I don't have much experience in tweaking MySQL...
still, since it is so popular, I have had to solve some problems with it.
I have never run into a corrupted DB on a default installation of pg
in Debian, but I regularly experience corrupted DBs on a default
installation of MySQL/ISAM. Considering I'm a "humble programmer"
and not an admin or DBA, this is far from constituting statistical
evidence... but... it builds up prejudices for sure.

Still, as a programmer I appreciate it when stuff has consistent,
predictable behaviour and returns understandable error
messages, and not when it silently trims or casts, etc...
I still feel more comfortable writing and *debugging* code for pg.

> > What is the mechanism governing "next general cache wipe"?

> Any time cache_clear_all() is called, anything that's not marked
> as CACHE_PERMANENT gets cleared.  Unfortunately it gets called a
> little more often than many people realize, because the cache is
> not as fine-grained as it should be.

So... it looks like it gets cleared when:
- feeds are refreshed
- blocks are touched
- comments are touched
- menus are touched
- a single node is touched
... etc... etc... etc...
but most importantly... it seems to be cleared on each cron run.
That's for D6.

When is cache_page going to be cleaned in D5 when cache_lifetime==0?

How am I going to influence when a cached page is wiped, other than
by setting a minimum lifetime?
Even in D7 $expire is not accessible/configurable directly, if at all.
A trick that doesn't always work could be to set
$GLOBALS['conf']['cache_lifetime'] /* didn't check if it is the
right place but I guess you got the idea */
but in many places $expire is hard-coded.

Furthermore, there is no way I can see to "late invalidate" a cached
page.
E.g. let's say I saved something like "from this moment on, don't use
the cache for A, B and C" in the session... since hook_init fires only
after the cached page has already been loaded, I'm obliged to return
the cached version.

What method am I going to use to offer custom content to anonymous
users without invalidating the cache for everyone else?

Suppose I would just like to add a greeting at the top of all the
pages for all (anonymous) users who took the time to fill in
their name (I bet you can find more useful examples)...
I set a cookie/flag in the session... but whatever I write will come
into play too late to decide whether I can serve a cached page or not.

How am I going to handle this?
Even with block and page cache separated, this forces me to turn off
the page cache for everyone, even if the block cache reduces the cost
of completely regenerating content.
There should be a way to disable cache use at least according to
$_SESSION content.
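
What I have in mind is roughly the following, sketched in Python rather
than Drupal code. The cookie name "visitor_name" and both functions are
made up for the example; the only point is that the per-visitor check
runs *before* the page cache is consulted.

```python
def build_page(path, name):
    """Hypothetical page builder: adds a greeting when a name is known."""
    greeting = f"Hello {name}! " if name else ""
    return f"{greeting}content of {path}"


def serve(path, cookies, page_cache):
    """Serve a page, bypassing the shared page cache for named visitors."""
    name = cookies.get("visitor_name")  # assumed cookie set by the name form
    if name:
        # Personalised request: never read from or write to the shared cache.
        return build_page(path, name)
    if path not in page_cache:
        page_cache[path] = build_page(path, None)
    return page_cache[path]
```

Everyone without the cookie keeps sharing the same cached page; only the
visitors who filled in their name pay for a full page build.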

> Perhaps what you could do is just use your own cache table, and
> don't identify that cache table to the cache system.  That way it
> gets ignored by the drupal_flush_all_caches() command, but you can

I went for a functional index... that's not portable... but it is
very, very cheap to implement and test.
Then, when the underlying DB data changes... I can selectively
invalidate the cached pages reasonably quickly.
Another way would be to create a (cid, pk) table and use a join to
delete stale pages, but then when a cached page gets deleted I'd have
to do some clean-up of the (cid, pk) table, and that's not easy or
quick if you can't use triggers, which would make the solution equally
non-portable.
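
A rough sketch of the (cid, pk) alternative, again in Python with SQLite
rather than Drupal code; the table and function names are invented for
illustration. Without triggers the clean-up has to be done by hand after
every delete, which is the cost I mean.

```python
import sqlite3


def make_db():
    """Create a page cache plus a (cid, pk) mapping table (hypothetical names)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE cache_page (cid TEXT PRIMARY KEY, data TEXT);
        CREATE TABLE cache_page_source (
            cid TEXT NOT NULL,
            pk INTEGER NOT NULL,
            PRIMARY KEY (cid, pk)
        );
        """
    )
    return db


def store_page(db, cid, data, source_pks):
    """Cache a page and record which source rows it was built from."""
    db.execute(
        "INSERT OR REPLACE INTO cache_page (cid, data) VALUES (?, ?)", (cid, data)
    )
    db.execute("DELETE FROM cache_page_source WHERE cid = ?", (cid,))
    db.executemany(
        "INSERT INTO cache_page_source (cid, pk) VALUES (?, ?)",
        [(cid, pk) for pk in source_pks],
    )


def invalidate_for_row(db, pk):
    """Delete every cached page built from source row pk, then clean up."""
    db.execute(
        "DELETE FROM cache_page WHERE cid IN "
        "(SELECT cid FROM cache_page_source WHERE pk = ?)",
        (pk,),
    )
    # Manual clean-up in place of a trigger: drop mappings of deleted pages.
    db.execute(
        "DELETE FROM cache_page_source "
        "WHERE cid NOT IN (SELECT cid FROM cache_page)"
    )
```

Note that the clean-up only runs inside invalidate_for_row(); pages removed
by a general cache wipe would still leave orphaned mappings behind.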

One way would be to have a hook_cache_clear(), a poor man's trigger,
for all the content that is passed to Drupal without passing through
its "objects".

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


