[drupal-devel] CCK Node Caching Strategies
The CCK implementation is going to require a resurrection of node caching, but I'm considering some different approaches. This is an attempt to document the design process, while at the same time soliciting suggestions. For this analysis, assume that each content type has 10 simple text fields, as well as a multiple-entry text field with an average of 10 entries. We will be looking at the queries necessary to load and display a listing of 10 of these nodes. One approach is to attempt to join the tables together, and load all the fields in one query. This can work in general (Flexinode does this), but has a few drawbacks. First, the JOIN operation is slow (though faster than doing the queries separately). Second, there is a hard limit of 31 joins in a query for MySQL, so this prevents complex content types from being constructed. Some users have hit this limit when using Flexinode. Third, the multiple-entry field would have to be serialized for this to work, which has a significant impact on its flexibility when performing queries. 1x query to fetch node IDs for the ten nodes (10 results) 10x queries to fetch each node, each with 11 joins (1 result each, + deserialize) The joins can't be performed in the original query, because the content types could be different in each case. A simpler implementation is to load each field through a separate query. This is a very flexible approach, and allows simple compartmentalization of the query code into the field modules. The obvious downside is that you need many queries for a simple node load. 1x query to fetch node IDs for the ten nodes (10 results) 100x queries to fetch each normal field (1 result each) 10x queries to fetch the multi-value fields (10 results each) Next, we can add caching to this, the way Gerhard proposed. This means that in node_load(), a query is performed to the cache table to load the serialized node data. 1x query to fetch node IDs for the ten nodes (10 results) 10x queries to fetch each node (1 result each, + deserialize) I think we can do a bit better, though. By placing the node cache into a column in the node table, we can grab it at the same time as the node IDs. Then node_load() would look to see if the cache is present in the array presented to it, and if so use that rather than perform any queries at all. 1x query to fetch node IDs and cache for the ten nodes (10 results + deserialize) Note that we can use this method with the cache in the cache table as well, but that requires a JOIN using a CONCAT() in the ON clause, which seems risky, and involves direct knowledge of the cache table structure rather than using the cache API anyway. Any other ideas? -- Jonathan Chaffer Applications Developer, structure:interactive (616) 364-7423 http://www.structureinteractive.com/
On Sat, 5 Mar 2005, Jonathan Chaffer wrote:
The CCK implementation is going to require a resurrection of node caching,
Yay! [...]
I think we can do a bit better, though. By placing the node cache into a column in the node table, we can grab it at the same time as the node IDs. Then node_load() would look to see if the cache is present in the array presented to it, and if so use that rather than perform any queries at all. 1x query to fetch node IDs and cache for the ten nodes (10 results + deserialize)
So, what you will do is: $result = db_query("SELECT ...."); while ($node = db_fetch_object($result)) { if($node->cache == '') { $node = node_load(array('nid' => $node->nid), 'no cache'); } else { $node = unserialize($node->cache); } } ?
Note that we can use this method with the cache in the cache table as well, but that requires a JOIN using a CONCAT() in the ON clause, which seems risky, and involves direct knowledge of the cache table structure rather than using the cache API anyway.
Any other ideas?
I've never liked the structure of the cache table much. It is a one size fits all thing. Why not split is up? cache_page key: url cache_filter key: md5 of string cache_node key: nid cache_menu key: uid cache_misc key: as now This way we can keep the cache out of the node table (where it does not really belong, IMHO). I am pretty sure that SELECTs from smaller tables will also be faster. The unknown number of entries of the filter cache makes me feel uncomfortable. Cheers, Gerhard
Jonathan Chaffer wrote:
A simpler implementation is to load each field through a separate query. This is a very flexible approach, and allows simple compartmentalization of the query code into the field modules. The obvious downside is that you need many queries for a simple node load. 1x query to fetch node IDs for the ten nodes (10 results) 100x queries to fetch each normal field (1 result each) 10x queries to fetch the multi-value fields (10 results each)
111 queries is not acceptable. Using a node cache is essential. Where to store the cached nodes? Well, I'd expect the answer to depend on how we are going to deal with node revisions. Plus, no decision is set in stone. We can move the cached copies around if necessary. -- Dries Buytaert :: http://www.buytaert.net/
participants (3)
-
Dries Buytaert -
Gerhard Killesreiter -
Jonathan Chaffer