Greetings,
I'd like to take the community's temperature on developing a global object caching strategy for Drupal core 7.0. In effect, this would mean pulling the functionality of the advcache module into core, though the code would doubtless be a bit different.
This has been discussed previously by Robert Douglass and Steve Rude, and I think both the short-term and strategic value here are clear. As memcached permeates the collective consciousness of the web-development community, and as "cloud" computing becomes more and more prevalent, best practices in architecture increasingly point towards caching full objects whenever possible.
This also fits well with Drupal's existing architecture. There are great functional points (e.g. node_load()) at which to mount this functionality.
I'm certain I'm not the only one thinking along these lines. Anyone want to throw out their ideas?
cheers -josh
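To make the proposal concrete, here is a minimal cache-aside sketch of what "caching full objects" at a load point like node_load() amounts to. It is written in Python purely for illustration and is not Drupal's PHP. `FakeDatabase` and `ObjectCache` are hypothetical stand-ins for the database layer and a memcached-style bin. The Python `node_load` only borrows the name of the Drupal function it would sit behind.

```python
from typing import Any, Dict, Optional


class FakeDatabase:
    """Hypothetical stand-in for the database layer; counts queries."""

    def __init__(self, rows: Dict[int, Dict[str, Any]]) -> None:
        self.rows = rows
        self.queries = 0

    def fetch_node(self, nid: int) -> Optional[Dict[str, Any]]:
        self.queries += 1
        row = self.rows.get(nid)
        return dict(row) if row is not None else None


class ObjectCache:
    """Hypothetical in-process stand-in for a memcached-style cache bin."""

    def __init__(self) -> None:
        self.store: Dict[str, Dict[str, Any]] = {}

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self.store.get(key)

    def set(self, key: str, value: Dict[str, Any]) -> None:
        self.store[key] = value


def node_load(nid: int, db: FakeDatabase, cache: ObjectCache) -> Optional[Dict[str, Any]]:
    """Cache-aside load: return the cached full object, or build and cache it."""
    key = f"node:{nid}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    node = db.fetch_node(nid)  # stands in for the full, expensive object build
    if node is not None:
        cache.set(key, node)
    return node


db = FakeDatabase({1: {"nid": 1, "title": "Hello"}})
cache = ObjectCache()
node_load(1, db, cache)
node_load(1, db, cache)
print(db.queries)  # 1
```

The second call never reaches the database. In a real site the saving would be the whole object build (hooks, fields and so on), not just one query.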
I think this is an awesome idea, especially if we make it optional through settings.php. For example, fully loaded nodes could be cached, but only if a node_cache is specified in settings.php, so it doesn't break existing stuff (or even the content_type overrides like the ones in the advcache module). I know it would be nice not to have to worry about patching files when a new version does come out, but that's the lazy in me talking.
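The opt-in idea can be sketched by choosing the loader once, based on configuration. This is Python for illustration only. `settings` is a hypothetical dict standing in for values read from settings.php. The `node_cache` key follows the suggestion in the thread and is not an existing Drupal setting.

```python
from typing import Any, Callable, Dict, List, Optional

Node = Dict[str, Any]
Loader = Callable[[int], Optional[Node]]


def make_node_loader(settings: Dict[str, Any], fetch_from_db: Loader) -> Loader:
    """Return a node loader that caches only if settings name a node cache.

    `settings` is a hypothetical stand-in for settings.php values; the
    'node_cache' key is the one suggested in the thread, not a real setting.
    """
    cache_bin = settings.get("node_cache")
    if cache_bin is None:
        # Nothing configured: behave exactly as before and always hit the DB.
        return fetch_from_db

    def load(nid: int) -> Optional[Node]:
        if nid in cache_bin:
            return cache_bin[nid]
        node = fetch_from_db(nid)
        if node is not None:
            cache_bin[nid] = node
        return node

    return load


calls: List[int] = []


def fetch(nid: int) -> Optional[Node]:
    calls.append(nid)  # record every trip to the "database"
    return {"nid": nid}


uncached = make_node_loader({}, fetch)
cached = make_node_loader({"node_cache": {}}, fetch)
uncached(5)
uncached(5)
cached(7)
cached(7)
print(calls)  # [5, 5, 7]
```

With no cache configured, the loader is literally the old database path. That is one way to guarantee existing sites see no behavior change.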
There's a patch for node_load() here: http://drupal.org/node/111127 - it has been sitting at RTBC for weeks, though, and has just gone stale again.
Since terms, users (hopefully soon) etc. now have almost exactly the same loading code, it'd be easy to add the same pattern there too, and I was thinking about trying that if node_load() gets in.
The new core field API also allows objects to declare themselves as already cached, so the field cache is skipped for those objects and we avoid filling up memcache bins with the same data twice. So no issues there.
However I'm not sure what you mean by a global object caching strategy - actually the same code? Or just parallel APIs? Nedjo Rogers has a patch for the former at http://drupal.org/node/365899 - adding cache_set/get to it wouldn't be hard at all (at least when loading by IDs, which is all we really need). That said, I'd be tempted to add the caching in separate patches and unify it all later, rather than trying to do both at once.
+1, anyway.
Nat
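Nat's remark that adding cache get/set is easy "at least when loading by IDs" can be illustrated with a multiple-load sketch. Cache hits are served directly, and all misses are fetched in one batch. This is hypothetical Python, not the code in Nedjo's patch or Drupal's real cache_get()/cache_set(). Plain dicts stand in for the cache bin and the table.

```python
from typing import Dict, Iterable, List

Node = Dict[str, object]


def load_multiple(ids: Iterable[int], cache: Dict[int, Node],
                  db_rows: Dict[int, Node], db_log: List[List[int]]) -> Dict[int, Node]:
    """Load objects by ID: serve hits from the cache, fetch all misses at once.

    `cache` stands in for a cache bin, `db_rows` for the table, and `db_log`
    records each simulated query so the batching is visible.
    """
    id_list = list(ids)
    found = {i: cache[i] for i in id_list if i in cache}
    misses = [i for i in dict.fromkeys(id_list) if i not in found]
    if misses:
        db_log.append(misses)  # one query for every miss, e.g. WHERE nid IN (...)
        for i in misses:
            if i in db_rows:
                found[i] = db_rows[i]
                cache[i] = db_rows[i]
    # Keep the caller's order; IDs that don't exist are simply left out.
    return {i: found[i] for i in id_list if i in found}


rows: Dict[int, Node] = {1: {"nid": 1}, 2: {"nid": 2}, 3: {"nid": 3}}
cache: Dict[int, Node] = {}
log: List[List[int]] = []
load_multiple([1, 2], cache, rows, log)
load_multiple([1, 2, 3, 99], cache, rows, log)
print(log)  # [[1, 2], [3, 99]]
```

Missing IDs (99 here) are not cached, so a repeated miss goes back to the database. Whether to cache negative results is a separate design choice.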
On Sun, Feb 22, 2009 at 12:29 PM, Nathaniel Catchpole <catch56@googlemail.com> wrote:
However I'm not sure what you mean by a global object caching strategy - actually the same code? Or just parallel APIs?
Thanks for the pointers on where I need to get up to speed!
In terms of what I mean by a "global object caching strategy," ideally that means the same code, but at a minimum parallel APIs or methods. Most importantly, it should be easy for savvy developers to follow "the Drupal way." Ultimately this is a design-pattern thing: we stop thinking of everything as necessarily coming from a database, and start thinking of the database as a highly efficient way to pull up lists of IDs, then loading the objects that are germane. My initial thoughts are here: http://groups.drupal.org/node/19385
Without object caching this is potentially expensive: think of the difference between a simple view built around specific fields vs. one that calls node_load() for each resulting item. However, once you've got your nodes in a memory cloud, it's actually quite a lot faster. Moreover, this follows emerging best practices for high availability and scalability, e.g.: http://dev.mysql.com/doc/refman/5.1/en/ha-memcached.html
As memcached becomes a commodity service (e.g. it's trivial to roll out with the latest Debian and CentOS releases), the upside of having the community put collective brainpower into the best way to use these tools (and bake support into core) is huge.
cheers -josh
-- Josh Koenig, Partner & CTO http://www.chapterthree.com
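The "database for lists of IDs, then load the objects" pattern can be sketched by counting queries on a cold cache versus a warm one. This is Python for illustration. `CountingDB` and `list_then_load` are hypothetical and do not model Views or the real node_load().

```python
from typing import Dict, List


class CountingDB:
    """Hypothetical database stand-in that counts every query it answers."""

    def __init__(self, rows: Dict[int, dict]) -> None:
        self.rows = rows
        self.queries = 0

    def select_ids(self, **conditions: object) -> List[int]:
        # The cheap "list" query: only the IDs of matching rows.
        self.queries += 1
        return sorted(nid for nid, row in self.rows.items()
                      if all(row.get(k) == v for k, v in conditions.items()))

    def load_full(self, nid: int) -> dict:
        # The expensive per-object load (what node_load() does in Drupal terms).
        self.queries += 1
        return dict(self.rows[nid])


def list_then_load(db: CountingDB, cache: Dict[int, dict], **conditions: object) -> List[dict]:
    """Query for IDs, then load each object, preferring the object cache."""
    result = []
    for nid in db.select_ids(**conditions):
        if nid not in cache:
            cache[nid] = db.load_full(nid)
        result.append(cache[nid])
    return result


db = CountingDB({n: {"nid": n, "type": "story"} for n in range(1, 6)})
cache: Dict[int, dict] = {}
list_then_load(db, cache, type="story")
cold = db.queries           # 1 ID query + 5 object loads
list_then_load(db, cache, type="story")
warm = db.queries - cold    # just the ID query
print(cold, warm)           # 6 1
```

On a cold cache the per-item loads dominate (1 + N queries). Once the objects are cached, only the cheap ID query remains.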
Very good point, Catch. I think this global object caching is a good idea, but for it to be really useful we need that global object handling too (Nedjo's patch, http://drupal.org/node/365899 ), and in general we need some object API.
Once we have that 'object API' in place, adding uniform caching for objects would be much easier. Then node_load(), user_load(), etc. can be just wrappers for the drupal_load() function, and we can have a really functional cache, which also invalidates cached objects when a 'write' operation is done. Add in drupal_load_multiple() and we have a really powerful thing.
So I think we should move forward with that patch first; then adding the global caching feature will be trivial.
I am totally for a global object caching strategy. +100
There is the issue of distinguishing cacheable and non-cacheable fields. Does D7 Fields API flag fields as cacheable to make this simpler?
Cheers,
-- Sammy Spets Synerger http://synerger.com
There is the issue of distinguishing cacheable and non-cacheable fields. Does D7 Fields API flag fields as cacheable to make this simpler?
Off the top of my head, it would somewhat defeat the purpose of object caching to try and handle this on a per-field basis. You'd really need field-level caching then, which is an order of magnitude more complex (if not more).
That said, I'm not sure what a "non-cacheable" field would look like, unless it was some kind of PHP-computed value, which seems like an edge case. Am I mistaking your meaning here? In most cases, the field values should only change when the object itself is updated, at which point the cache would be invalidated anyway, and the next load would be fresh.
This does break down if there's a computed value in a field. The most basic example I can think of is a node with PHP content for the body, maybe something that displays the current time. As it stands, I believe Drupal's page caching system grabs the output of that PHP and stores it for anonymous users, meaning they'll get out-of-date computations.
That's potentially a big deal if you're talking about object-level caching for logged-in users, so perhaps we would treat nodes like this as "uncacheable."
Anyway, let me know if I'm way off here.
I'll also be checking up on Nedjo's patch, as that definitely seems like the right architectural solution for Drupal 7 core.
cheers -j
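The failure mode Josh describes can be reproduced with a toy loader whose object includes a value computed at load time. Stored fields stay correct because a save invalidates the cache, but the computed value is frozen at whatever it was when the object was first cached. This is hypothetical Python with a fake clock so the output is deterministic.

```python
from typing import Callable, Dict

Node = Dict[str, object]


def make_loader(clock: Callable[[], int], cache: Dict[int, Node]) -> Callable[[int], Node]:
    """Hypothetical loader whose object includes a value computed at load time."""

    def load(nid: int) -> Node:
        if nid not in cache:
            # Stored fields only change on save; this computed one changes
            # with the clock, but the cached copy freezes it.
            cache[nid] = {"nid": nid, "body": f"It is now {clock()}"}
        return cache[nid]

    return load


now = [100]                       # fake clock so the output is deterministic
cache: Dict[int, Node] = {}
load = make_loader(lambda: now[0], cache)
first = load(1)["body"]
now[0] = 200                      # time moves on
second = load(1)["body"]          # served from the cache: still the old value
print(first, "|", second)         # It is now 100 | It is now 100
cache.pop(1)                      # what invalidation on save would do
print(load(1)["body"])            # It is now 200
```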
On Wed, Feb 25, 2009 at 8:58 PM, Josh Koenig <josh@chapterthree.com> wrote:
That's potentially a big deal if you're talking about object-level caching for logged-in users, so perhaps we would treat nodes like this as "uncacheable."
In the node_load() caching patch we've added a hook_nodeapi_post_load() which is uncached - so poll implements hook_nodeapi_load() to get the options in there, and hook_nodeapi_post_load() to get user-specific information.
There may well be a use case for a field_attach equivalent which does the same thing. Field already has a cache, though, so if we're caching objects persistently there'd actually be no change on this specific point.
Nat
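The two-phase load Nat describes can be modeled in Python as follows. Load hooks run only on a cache miss and their result is cached. Post-load hooks run on every load against a copy, so per-user data never reaches the cache. The hook names come from the patch as described in the thread. Here they are modeled as lists of Python callables, and `node_load`, `poll_load` and `poll_post_load` are hypothetical simplifications, not the patch's PHP.

```python
from typing import Callable, Dict, List

Node = Dict[str, object]

# Hypothetical stand-ins for modules implementing the two hooks.
load_hooks: List[Callable[[Node], None]] = []
post_load_hooks: List[Callable[[Node, str], None]] = []
cache: Dict[int, Node] = {}
build_count = 0


def node_load(nid: int, user: str) -> Node:
    global build_count
    if nid not in cache:
        build_count += 1
        built: Node = {"nid": nid}
        for hook in load_hooks:        # phase 1: shared data, stored in the cache
            hook(built)
        cache[nid] = built
    node = dict(cache[nid])            # copy, so per-user data never reaches the cache
    for hook in post_load_hooks:       # phase 2: runs on every load, never cached
        hook(node, user)
    return node


# A poll-like module: the options are cacheable, "has voted" is per user.
votes = {"alice"}


def poll_load(node: Node) -> None:
    node["options"] = ["yes", "no"]


def poll_post_load(node: Node, user: str) -> None:
    node["voted"] = user in votes


load_hooks.append(poll_load)
post_load_hooks.append(poll_post_load)

print(node_load(1, "alice")["voted"], node_load(1, "bob")["voted"], build_count)
# True False 1
```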
On Wed, 2009-02-25 at 23:28 -0500, Nathaniel Catchpole wrote:
In the node_load() caching patch we've added a hook_nodeapi_post_load() which is uncached - so poll implements hook_nodeapi_load() to get the options in there, and hook_nodeapi_post_load() to get user-specific information.
This is my first post on this list, so please enlighten the ignorant.
IMO having a hook_nodeapi_load and hook_nodeapi_post_load could get a little confusing and is redundant. I agree with Josh in that there should be a $node->cacheable field.
The question is where does Drupal check this field? For example, I could create a cache-friendly content type and set the nodes of that type to cacheable (say, in hook_load). Another module could come along and alter that node (hook_nodeapi_*) in a way that isn't cache friendly and designate it as non-cacheable; what then?
-- Nabil Alsharif, Bright Tree
On Thu, Feb 26, 2009 at 11:21 AM, Nabil Alsharif wrote:
IMO having a hook_nodeapi_load and hook_nodeapi_post_load could get a little confusing and is redundant. I agree with Josh in that there should be a $node->cacheable field.
This was in some of the earlier iterations of http://drupal.org/node/111127 - the issue is that one rogue contrib module could completely disable node caching on a site by setting this flag. With _load() and _post_load(), the worst that happens if you get them confused is that you end up with some stale data, or a single module not taking advantage of the cache as well as it could.
Either way, it would be really great to have some more reviews of the issue itself, since it's been sitting at RTBC for a while, and discussion on this list doesn't get taken into account in the issue queue.
Nat
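Nat's objection to a single `$node->cacheable` flag can be made concrete with a small simulation. One module that opts every node out is enough to turn every load into a rebuild. This is hypothetical Python: `count_builds`, `add_title` and `rogue_module` are illustrative names, not code from the issue.

```python
from typing import Callable, Dict, List

Node = Dict[str, object]
Hook = Callable[[Node], None]


def count_builds(requests: List[int], hooks: List[Hook]) -> int:
    """Simulate a hypothetical cacheable flag; return how many loads rebuilt the node."""
    cache: Dict[int, Node] = {}
    builds = 0
    for nid in requests:
        if nid in cache:
            continue
        builds += 1
        node: Node = {"nid": nid, "cacheable": True}
        for hook in hooks:
            hook(node)
        if node["cacheable"]:          # a single False from any module wins
            cache[nid] = node
    return builds


def add_title(node: Node) -> None:
    node["title"] = f"Node {node['nid']}"


def rogue_module(node: Node) -> None:
    # Adds something per-request and "plays safe" by opting every node out.
    node["cacheable"] = False


requests = [1, 2, 1, 2, 1, 2]
print(count_builds(requests, [add_title]))                # 2
print(count_builds(requests, [add_title, rogue_module]))  # 6
```

Under the two-phase split, the same module would put its per-request work in the uncached post-load step, and the shared object would stay cacheable for everyone else.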
Nabil Alsharif wrote:
This is my first post on this list, so please enlighten the ignorant.
IMO having a hook_nodeapi_load and hook_nodeapi_post_load could get a little confusing and is redundant. I agree with Josh in that there should be a $node->cacheable field.
The question is where does Drupal check this field? For example, I could create a cache-friendly content type and set the nodes of that type to cacheable (say, in hook_load). Another module could come along and alter that node (hook_nodeapi_*) in a way that isn't cache friendly and designate it as non-cacheable; what then?
You've just answered your own question. :-) Your second paragraph is the very reason why there's a two step build process (cacheable and not) rather than a single kill-switch flag. (At least that's how I interpret it; I wasn't involved in the original patch.) --Larry Garfield
participants (7)
Jamie Holly
Jose A. Reyero
Josh Koenig
larry@garfieldtech.com
Nabil Alsharif
Nathaniel Catchpole
Sammy Spets