[drupal-devel] CCK Node Caching Strategies

Sun Mar 6 04:50:25 UTC 2005

The CCK implementation is going to require a resurrection of node 
caching, but I'm considering some different approaches. This is an 
attempt to document the design process, while at the same time 
soliciting suggestions.

For this analysis, assume that each content type has 10 simple text 
fields, as well as a multiple-entry text field with an average of 10 
entries. We will be looking at the queries necessary to load and 
display a listing of 10 of these nodes.

One approach is to attempt to join the tables together, and load all 
the fields in one query. This can work in general (Flexinode does 
this), but has a few drawbacks. First, the JOIN operation is slow 
(though faster than doing the queries separately). Second, there is a 
hard limit of 31 joins in a query for MySQL, so this prevents complex 
content types from being constructed. Some users have hit this limit 
when using Flexinode. Third, the multiple-entry field would have to be 
serialized for this to work, which has a significant impact on its 
flexibility when performing queries.
     1x query to fetch node IDs for the ten nodes (10 results)
     10x queries to fetch each node, each with 11 joins (1 result each, 
+ deserialize)
The joins can't be performed in the original query, because the content 
types could be different in each case.

A simpler implementation is to load each field through a separate 
query. This is a very flexible approach, and allows simple 
compartmentalization of the query code into the field modules. The 
obvious downside is that you need many queries for a simple node load.
     1x query to fetch node IDs for the ten nodes (10 results)
     100x queries to fetch each normal field (1 result each)
     10x queries to fetch the multi-value fields (10 results each)

Next, we can add caching to this, the way Gerhard proposed. This means 
that in node_load(), a query is performed to the cache table to load 
the serialized node data.
     1x query to fetch node IDs for the ten nodes (10 results)
     10x queries to fetch each node (1 result each, + deserialize)

I think we can do a bit better, though. By placing the node cache into 
a column in the node table, we can grab it at the same time as the node 
IDs. Then node_load() would look to see if the cache is present in the 
array presented to it, and if so use that rather than perform any 
queries at all.
     1x query to fetch node IDs and cache for the ten nodes (10 results 
+ deserialize)
Note that we can use this method with the cache in the cache table as 
well, but that requires a JOIN using a CONCAT() in the ON clause, which 
seems risky, and involves direct knowledge of the cache table structure 
rather than using the cache API anyway.

Any other ideas?
--
Jonathan Chaffer
Applications Developer, structure:interactive
(616) 364-7423    http://www.structureinteractive.com/