This is a fork from the "Caching, caching, caching" thread. I have opened an issue to start testing the gains from using an object cache on commonly loaded objects ($user, $node, $block, $profile, $comment, $term) at http://drupal.org/node/74020. The patches apply against HEAD. (Be gentle, they are my first CVS patches.) I'm currently using this approach in 4.6 to run the (very large) SavannahNow.com site and reduce database overhead. I'm currently testing a stripped-down HEAD install on a Mac PowerBook G4, so the testing numbers listed there may be misleading. I toyed with the idea of creating some new functions for cache.inc to make retrieving objects easier, but for the time being I am just using cache_set, cache_get, and cache_clear_all. One question: has file-based caching made it into HEAD yet? -- Ken Rickard drupal: agentrickard gmail: agentrickard
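A minimal sketch of the read-through object cache described above, written in Python rather than Drupal's PHP so it can be run standalone. The cache_get, cache_set and cache_clear_all names echo the functions mentioned in the message, but their signatures here, the dict-backed store and load_node_from_db are assumptions for illustration, not the actual patch.

```python
# Hypothetical in-memory stand-in for the cache table; not Drupal's cache.inc.
_cache = {}
stats = {"db_loads": 0}  # counts simulated database hits


def cache_get(key):
    return _cache.get(key)


def cache_set(key, value):
    _cache[key] = value


def cache_clear_all(prefix=""):
    # Drop every entry whose key starts with the prefix (all entries if empty).
    for key in [k for k in _cache if k.startswith(prefix)]:
        del _cache[key]


def load_node_from_db(nid):
    # Placeholder for the expensive query that builds a $node object.
    stats["db_loads"] += 1
    return {"nid": nid, "title": f"Node {nid}"}


def node_load(nid):
    # Read-through: try the cache first, fall back to the database on a miss.
    key = f"node:{nid}"
    node = cache_get(key)
    if node is None:
        node = load_node_from_db(nid)
        cache_set(key, node)
    return node
```

Loading the same node twice touches the simulated database once; clearing the "node:" prefix forces the next load to go back to it.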
On Jul 16, 2006, at 3:12 PM, Ken Rickard wrote:
One question: has file-based caching made it into HEAD yet?
There was interest in having it, and then it seems the feeling was that it's complex. The caching API allows modules to implement a cache. However, the real benefits of file-based caching are: 1) lightweight web server threads can serve the files, 2) no need to bootstrap Drupal, 3) no need for an expensive MySQL thread. Keep asking for it and it's more likely to get in, since the code is available to be patched against HEAD. Kieran
There was interest in having it, and then it seems the feeling was that it's complex. The caching API allows modules to implement a cache. However, the real benefits of file-based caching are: 1) lightweight web server threads can serve the files, 2) no need to bootstrap Drupal, 3) no need for an expensive MySQL thread.
Keep asking for it and it's more likely to get in since the code is available to be patched against head.
The consensus was that we would make core flexible enough that this functionality can grow in Contrib for a while. At this point, I think effort is best expended in turning this code from a patch into a Contrib cache.inc.
On Jul 16, 2006, at 4:51 PM, Moshe Weitzman wrote:
There was interest in having it, and then it seems the feeling was that it's complex. The caching API allows modules to implement a cache. However, the real benefits of file-based caching are: 1) lightweight web server threads can serve the files, 2) no need to bootstrap Drupal, 3) no need for an expensive MySQL thread. Keep asking for it and it's more likely to get in, since the code is available to be patched against HEAD.
The consensus was that we would make core flexible enough that this functionality can grow in Contrib for a while. At this point, I think effort is best expended in turning this code from a patch into a Contrib cache.inc.
I don't understand how you can have a contributed alternate bootstrap. If cache.inc allowed avoiding the Drupal bootstrap, I'd be interested. Kieran
I don't understand how you can have a contributed alternate bootstrap. If cache.inc allowed avoiding the Drupal bootstrap, I'd be interested.
Kieran
a) Drupal configuration is loaded from settings.php. If herein we find $conf['page_cache_fastpath'] = TRUE then the following happens: b) cache.inc is loaded c) page_cache_fastpath is called d) Drupal exits. That avoids pretty much all of Drupal. This was added to the cache.inc patch to let you serve the files as fast as possible. I added this with an eye on the file caching patch.
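The steps a) to d) above can be sketched roughly as follows. This is a Python illustration of the control flow only, not Drupal's bootstrap code: handle_request, full_bootstrap and the dict used as a page cache are hypothetical, and falling back to the full bootstrap on a cache miss is an assumption that the message does not spell out.

```python
def page_cache_fastpath(path, page_cache):
    # Hypothetical handler: return the cached page for this path, or None.
    return page_cache.get(path)


def full_bootstrap(path):
    # Placeholder for the expensive route: database, sessions, modules, rendering.
    return f"<html>rendered {path}</html>"


def handle_request(path, conf, page_cache):
    # a) configuration is read first; b) to d) only happen when the flag is set.
    if conf.get("page_cache_fastpath"):
        page = page_cache_fastpath(path, page_cache)
        if page is not None:
            return page, "fastpath"  # exit before anything else is loaded
    return full_bootstrap(path), "full"
```

The point of the design is that the decision is made from configuration alone, so a cached page never pays for the rest of the bootstrap.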
On Jul 16, 2006, at 8:23 PM, Karoly Negyesi wrote:
I don't understand how you can have a contributed alternate bootstrap. If cache.inc allowed avoiding the Drupal bootstrap, I'd be interested.
Kieran
a) Drupal configuration is loaded from settings.php. If herein we find $conf['page_cache_fastpath'] = TRUE then the following happens: b) cache.inc is loaded c) page_cache_fastpath is called d) Drupal exits.
That avoids pretty much all of Drupal. This was added to the cache.inc patch to let you serve the files as fast as possible. I added this with an eye on the file caching patch.
Perfect! Kieran
On 17 Jul 2006, at 01:51, Moshe Weitzman wrote:
There was interest in having it, and then it seems the feeling was that it's complex. The caching API allows modules to implement a cache. However, the real benefits of file-based caching are: 1) lightweight web server threads can serve the files, 2) no need to bootstrap Drupal, 3) no need for an expensive MySQL thread. Keep asking for it and it's more likely to get in, since the code is available to be patched against HEAD.
The consensus was that we would make core flexible enough that this functionality can grow in Contrib for a while. At this point, I think effort is best expended in turning this code from a patch into a Contrib cache.inc.
Plus, the file caching didn't yield a significant performance improvement (at least not on my setup) unless the "fast path" option was enabled. The "fast path" option makes for a more lightweight bootstrap process. It has been committed to CVS HEAD, but is now called "early page cache" (grep for DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE). The thing with the "fast path" or the "early page cache" is that it is not specific to file caching. You probably get the same benefit using a database cache. Admittedly, file caching could be very useful in the scenario where the database is a significant bottleneck. I could not reproduce such a scenario, but I'm confident such configurations exist. -- Dries Buytaert :: http://www.buytaert.net/
I have just started using Drupal but have seen issues where the existing database cache is a bottleneck. My menu has 5,000 items, and the overhead to build it, serialize it, cache it in the db, and unserialize it was very high, making it unusable. The serialized menu was approx 1.9M according to the DB. The workaround was to prevent the built-in caching and store the $_menu in the APC user cache.
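To make the two code paths in that workaround concrete, here is a rough Python sketch. The 5,000-entry dict is an invented stand-in for $_menu, pickle stands in for PHP's serialize/unserialize, and the module-level dict only plays the role that the APC user cache has in the message; APC's actual internals are not modeled.

```python
import pickle

# Hypothetical menu of 5,000 items, standing in for Drupal's $_menu.
menu = {
    f"catalog/item/{i}": {"title": f"Item {i}", "weight": i % 10}
    for i in range(5000)
}

# Database-style cache: the whole structure is serialized once on write
# and has to be unserialized again on every page load.
blob = pickle.dumps(menu)


def menu_from_db_cache():
    return pickle.loads(blob)


# In-memory cache: the built structure is kept and handed back as-is,
# with no round trip through a serialized blob.
_memory_cache = {"menu": menu}


def menu_from_memory_cache():
    return _memory_cache["menu"]
```

Both functions return the same data; the difference is that the first rebuilds the whole structure from a blob on each call, which is the cost the message describes.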
On 20 Jul 2006, at 23:13, George Kappel wrote:
I have just started using Drupal but have seen issues where the existing database cache is a bottleneck.
My menu has 5,000 items, and the overhead to build it, serialize it, cache it in the db, and unserialize it was very high, making it unusable.
That is one hell of a menu. Would be good if we could actually improve the menu system. Have you looked into that? -- Dries Buytaert :: http://www.buytaert.net/
Right now, I'm just trying to make Drupal work reasonably well under very tight time constraints. FYI, the site is basically an online catalog of 50,000 items, using the category module and views module to organize them. I am also new to Drupal and need some time to discover the Zen under the covers; at this point I might just make it worse, although I would like to dig in when it makes sense. For complicated parts of Drupal, are there formal tests that define/validate that the module/code is working as required? It seems like the menu system does a bunch of stuff that might not be immediately obvious.
On 20 Jul 2006, at 11:32 PM, Dries Buytaert wrote:
That is one hell of a menu. Would be good if we could actually improve the menu system. Have you looked into that?
I've been on the verge of bringing this up for a while. The issue we have with the menu is the exact same issue we had with the aliases until 4.7. We load the entire path alias into memory on every load. I didn't mention it because I thought it was unlikely that someone had that many menu items. I was wrong (and that's one hell of a menu). -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
Adrian Rossouw wrote:
On 20 Jul 2006, at 11:32 PM, Dries Buytaert wrote:
That is one hell of a menu. Would be good if we could actually improve the menu system. Have you looked into that?
I've been on the verge of bringing this up for a while.
The issue we have with the menu is the exact same issue we had with the aliases until 4.7. We load the entire path alias into memory on every load.
Past tense. We've replaced the one big load with a plethora of small loads. I am convinced this isn't an improvement in all cases. E.g. on drupal.org, we've got plenty of memory to burn, so the old setup probably wasn't much of a problem.
I didn't mention it because I thought it was unlikely that someone had that many menu items. I was wrong (and that's one hell of a menu).
IMNSHO this is a screwed up setup, nothing more. Cheers, Gerhard
Maybe the menu is the wrong tool for the job, but it works well with the APC caching and the jstools menu enhancement. Not really sure what you mean by screwed up setup? Do you mean you can't have that big of a menu?
George Kappel wrote:
Maybe the menu is the wrong tool for the job, but it works well with the APC caching and the jstools menu enhancement.
That's good.
Not really sure what you mean by screwed up setup? Do you mean you can't have that big of a menu?
Yeah, I am thinking that. Cheers, Gerhard
That is one hell of a menu. Would be good if we could actually improve the menu system. Have you looked into that? I've been on the verge of bringing this up for a while.
The issue we have with the menu is the exact same issue we had with the aliases until 4.7. We load the entire path alias into memory on every load.
Past tense.
We've replaced the one big load with a plethora of small loads. I am convinced this isn't an improvement in all cases. E.g. on drupal.org, we've got plenty of memory to burn, so the old setup probably wasn't much of a problem.
Memory was an issue, but not the main one. The main issue was using PHP to create an associative array for every page load, which bogged down everything. I installed the backport of the 4.7 patch for a client who has a dedicated SMP server with 2GB of RAM and it made a world of difference there. You can see the statistics here: http://baaheyeldin.com/click/593/0 I agree that there are some scenarios (e.g. a sitemap page) where there will be a lot of small queries, but for most pages this is not the case, and a bottleneck has been removed.
Khalid B wrote:
That is one hell of a menu. Would be good if we could actually improve the menu system. Have you looked into that? I've been on the verge of bringing this up for a while.
The issue we have with the menu is the exact same issue we had with the aliases until 4.7. We load the entire path alias into memory on every load.
Past tense.
We've replaced the one big load with a plethora of small loads. I am convinced this isn't an improvement in all cases. E.g. on drupal.org, we've got plenty of memory to burn, so the old setup probably wasn't much of a problem.
Memory was an issue, but not the main one.
The main issue was using PHP to create an associative array for every page load, which bogged down everything. I installed the backport of the 4.7 patch for a client who has a dedicated SMP server with 2GB of RAM and it made a world of a difference there.
You can see the statistics here:
Yeah, if you have many aliases, then this is an improvement, if you only have a few, then it isn't. Cheers, Gerhard
On Fri, 21-07-2006 at 01:34 +0200, Gerhard Killesreiter wrote:
Khalid B wrote:
That is one hell of a menu. Would be good if we could actually improve the menu system. Have you looked into that? I've been on the verge of bringing this up for a while.
The issue we have with the menu is the exact same issue we had with the aliases until 4.7. We load the entire path alias into memory on every load.
Past tense.
We've replaced the one big load with a plethora of small loads. I am convinced this isn't an improvement in all cases. E.g. on drupal.org, we've got plenty of memory to burn, so the old setup probably wasn't much of a problem.
Memory was an issue, but not the main one.
The main issue was using PHP to create an associative array for every page load, which bogged down everything. I installed the backport of the 4.7 patch for a client who has a dedicated SMP server with 2GB of RAM and it made a world of a difference there.
You can see the statistics here:
Yeah, if you have many aliases, then this is an improvement, if you only have a few, then it isn't.
How about loading all aliases at once when the number of aliases is smaller than <insert some magic number here>, and loading them one by one when there are more? This is one of the first things I changed after the last upgrade, because I have only 10 aliases or so, not worth the additional 50 queries it took to generate my front page ;)
Bart Jansens wrote:
On Fri, 21-07-2006 at 01:34 +0200, Gerhard Killesreiter wrote:
Khalid B wrote:
That is one hell of a menu. Would be good if we could actually improve the menu system. Have you looked into that? I've been on the verge of bringing this up for a while.
The issue we have with the menu is the exact same issue we had with the aliases until 4.7. We load the entire path alias into memory on every load. Past tense.
We've replaced the one big load with a plethora of small loads. I am convinced this isn't an improvement in all cases. E.g. on drupal.org, we've got plenty of memory to burn, so the old setup probably wasn't much of a problem. Memory was an issue, but not the main one.
The main issue was using PHP to create an associative array for every page load, which bogged down everything. I installed the backport of the 4.7 patch for a client who has a dedicated SMP server with 2GB of RAM and it made a world of a difference there.
You can see the statistics here:
http://baaheyeldin.com/click/593/0 Yeah, if you have many aliases, then this is an improvement, if you only have a few, then it isn't.
How about loading all aliases at once when the number of aliases is smaller than <insert some magic number here>, and loading them one by one when there are more.
Sounds like a good idea. The magic number should probably be set to some fixed value and have a variable_get without a visible interface. Unless somebody comes up with an easy way to calculate the magic number. :p
This is one of the first things I changed after the last upgrade, because I have only 10 aliases or so, not worth the additional 50 queries it took to generate my front page ;)
Please post a patch :) Cheers, Gerhard
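Until someone posts that patch, the threshold idea can be sketched like this. It is Python rather than PHP, and everything in it is illustrative: AliasLookup, the dict standing in for the alias table, the simulated query counter, and the threshold value of 50 (the "magic number") are assumptions, not anything from Drupal core.

```python
ALIAS_PRELOAD_THRESHOLD = 50  # the "magic number"; this value is an arbitrary assumption


class AliasLookup:
    def __init__(self, alias_table, threshold=ALIAS_PRELOAD_THRESHOLD):
        self.table = alias_table  # stand-in for the alias table: {system path: alias}
        self.cache = {}
        self.queries = 1          # one COUNT-style query decides the strategy
        self.preloaded = len(alias_table) < threshold
        if self.preloaded:
            self.queries += 1     # one query loads every alias at once
            self.cache = dict(alias_table)

    def alias_for(self, src):
        if self.preloaded:
            # Everything is already in memory; unknown paths have no alias.
            return self.cache.get(src, src)
        if src not in self.cache:
            self.queries += 1     # one query per previously unseen path
            self.cache[src] = self.table.get(src, src)
        return self.cache[src]
```

With ten aliases, a front page that looks up fifty paths costs two simulated queries in total instead of fifty; above the threshold the behaviour is the per-path lookup.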
On 22 Jul 2006, at 11:06, Gerhard Killesreiter wrote:
Sounds like a good idea. The magic number should probably be set to some fixed value and have a variable_get without a visible interface.
Unless somebody comes up with an easy way to calculate the magic number. :p
Or a configuration option, rather than a magic number. Or add an LRU flag to the alias table, and pre-load a subset of the aliases that are most recently used. You'll probably end up with a hit rate of at least 80%. You wouldn't need a magic number or a configuration setting. Another improvement might be to stop executing SQL queries at the point where all aliases are known to be in the local alias cache. It wouldn't make a difference for sites with many aliases, but for small sites that only alias their primary links, it might make a big difference. Patches? :) -- Dries Buytaert :: http://www.buytaert.net/
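A rough sketch of those two suggestions combined, in Python for illustration. RecentAliasCache, the (path, alias, last_used) tuples and the preload size are hypothetical; the "all aliases are known" test used here (the preload returned fewer rows than it asked for) is one possible way to detect that case, not something specified in the message.

```python
class RecentAliasCache:
    def __init__(self, rows, preload=100):
        # rows: (system path, alias, last_used) tuples standing in for the alias table
        self.table = {src: alias for src, alias, _ in rows}
        self.queries = 1  # the single preload query
        recent = sorted(rows, key=lambda r: r[2], reverse=True)[:preload]
        self.cache = {src: alias for src, alias, _ in recent}
        # Fewer rows than requested means the preload fetched every alias.
        self.complete = len(recent) < preload

    def alias_for(self, src):
        if src in self.cache:
            return self.cache[src]
        if self.complete:
            return src            # no alias can exist, so skip the query
        self.queries += 1         # fall back to a per-path lookup
        alias = self.table.get(src, src)
        self.cache[src] = alias
        return alias
```

On a small site the cache is complete after the first query and no further queries run; on a large site only the paths outside the recently used subset cost a lookup.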
Dries Buytaert wrote:
Or add an LRU flag to the alias table, and pre-load a subset of the aliases that are most recently used. You'll probably end up with a hit rate of at least 80%. You wouldn't need a magic number or a configuration setting.
Or why not pre-load the most recently used aliases by default, and have a configurable setting to allow more aliases in the same query?
Or maybe a much more general solution would be dividing ALL the lookups into queries of "magic size"? Would that gain performance for both few and many aliases? -- Samuel Lampa SKYPE: samuel_lampa RIL Partner AB www.rilnet.com
On 21 Jul 2006, at 01:19, Gerhard Killesreiter wrote:
We've replaced the one big load with a plethora of small loads. I am convinced this isn't an improvement in all cases. E.g. on drupal.org, we've got plenty of memory to burn, so the old setup probably wasn't much of a problem.
But even on drupal.org, Drupal 4.7's "many small queries" worked better than the "one query" approach we used in Drupal 4.6. I attached a conceptual graph that illustrates the impact of the URL aliasing changes for those who aren't well aware of it. The path aliasing in Drupal 4.7 is better than the one in Drupal 4.6, but that doesn't mean there is no room for further improvements. :) Personally, I find algorithmic improvements much more interesting than caching-based solutions. -- Dries Buytaert :: http://www.buytaert.net/
Personally, I find algorithmic improvements much more interesting than caching-based solutions.
In addition to "algorithmic improvements" there are also "programmatic improvements". I've posted a few comments on this subject but so far they have garnered zero response, possibly because they talk about things which are "out of scope" for (imho) 99% of Drupal sites anyway. A simple example is the MySQL INT type. The MySQL C API returns everything as a string regardless of how the DB actually stores the data. So, you can be more prudent in how you use these types in PHP userspace rather than doing type conversions (the recent ctype_*/is_numeric issue is an example of this). PHP is a great language for "getting things done" but the convenience can come at a performance price. These sorts of issues only pertain to a small minority of cases, but if those few cases are called often they tend to add up in the long run. best regards, --AjK
On Jul 21, 2006, at 4:52 AM, Dries Buytaert wrote:
The path aliasing in Drupal 4.7 is better than the one in Drupal 4.6, but that doesn't mean there is no room for further improvements. :)
Personally, I find algorithmic improvements much more interesting than caching-based solutions.
Other CMS's have been improving their URL alias feature by storing each alias with the data it represents. We could store node aliases in the node table, user aliases in the user table and menu aliases in the menu table. This reduces the amount of queries we need to generate and would avoid an alias-specific caching solution. Of course there are other rogue aliases that couldn't be accounted for with this approach, but I bet we'd reduce the number of alias queries by 80%. Matt Westgate Lullabot
On 22 Jul 2006, at 21:35, Matt Westgate wrote:
Other CMS's have been improving their URL alias feature by storing each alias with the data it represents. We could store node aliases in the node table, user aliases in the user table and menu aliases in the menu table. This reduces the amount of queries we need to generate and would avoid an alias-specific caching solution.
Of course there are other rogue aliases that couldn't be accounted for with this approach, but I bet we'd reduce the number of alias queries by 80%.
That's easy to count. Just add some counter logic to drupal_lookup_path() and we'd be able to estimate the gain. :) Anyway, this brings us back to the de-normalization discussion we've had lately. We could potentially cache a lot of node related properties in the node table (eg. the comment count, last comment timestamp, username, etc). We're running around in circles, me thinks. Time to implement some approaches and to compare their implementation and performance? -- Dries Buytaert :: http://www.buytaert.net/
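The counter logic could look something like the sketch below. It is Python, not a patch to drupal_lookup_path(): counted_lookup, path_kind and avoidable_share are hypothetical names, and classifying paths by their first segment (node, user, taxonomy) is an assumption about which lookups an alias stored on the object's own table would cover.

```python
from collections import Counter

lookup_counts = Counter()


def path_kind(path):
    # Classify a system path by the object type that could hold its alias.
    first = path.split("/", 1)[0]
    return first if first in ("node", "user", "taxonomy") else "other"


def counted_lookup(path, alias_table):
    # Tally the lookup by kind, then do the ordinary alias lookup.
    lookup_counts[path_kind(path)] += 1
    return alias_table.get(path, path)


def avoidable_share():
    # Fraction of lookups that an alias column on the object table would cover.
    total = sum(lookup_counts.values())
    return 0.0 if total == 0 else 1 - lookup_counts["other"] / total
```

Run against real traffic, the resulting share is the number to compare with the 80% guess before committing to either approach.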
On 7/22/06, Dries Buytaert <dries.buytaert@gmail.com> wrote:
On 22 Jul 2006, at 21:35, Matt Westgate wrote:
Other CMS's have been improving their URL alias feature by storing each alias with the data it represents. We could store node aliases in the node table, user aliases in the user table and menu aliases in the menu table. This reduces the amount of queries we need to generate and would avoid an alias-specific caching solution.
Of course there are other rogue aliases that couldn't be accounted for with this approach, but I bet we'd reduce the number of alias queries by 80%.
That's easy to count. Just add some counter logic to drupal_lookup_path() and we'd be able to estimate the gain. :)
Anyway, this brings us back to the de-normalization discussion we've had lately. We could potentially cache a lot of node related properties in the node table (eg. the comment count, last comment timestamp, username, etc).
We're running around in circles, me thinks. Time to implement some approaches and to compare their implementation and performance?
Matt has a point here ... I don't think it is bad denormalization in this case.
What we have now is a one-to-many relationship of node -> alias, and we allow multiple aliases for the same node.
If we go to a 1:1 relationship, then the logical place for the alias is in the object that it represents (term, user, node), and this would speed up things for certain operations (e.g. emitting an alias instead of the native path), but potentially slow others (looking up an incoming alias and converting to the native path?)
What we lose here is the ability for multiple aliases. Or we can keep the other (older/additional/...etc) aliases where they are now, and have the current alias in the object's table.
To clarify: I have node/123 which starts life as this_is_an_alias, based on what pathauto does. This is indexed, linked to, ...etc. Then later, the title changes, and it becomes this_is_a_new_alias. this_is_a_new_alias is stored in the node table, and this_is_an_alias is in the url_alias table. The same goes for a user who calls himself kbahey then decides to change his Drupal name to KhalidB.
This is not bad denormalization, if it can be called that. The current alias represents something and the older one represents something else. If there is an index on the current alias field, it will take care of the slowness part to some extent.
On Saturday 22 July 2006 16:19, Khalid B wrote:
I don't think it is bad denormalization in this case.
What we have now is a one to many relationship of node -> alias, and we allow multiple aliases for the same node.
If we go to a 1:1 relationship, then the logical place for the alias is in the object that it represents (term, user, node), and this would speed up things for certain operations (e.g. emitting an alias instead of the native path), but potentially slow others (looking up an incoming alias and converting to the native path?)
What we lose here is the ability for multiple aliases.
I don't see how we do. It just declares one alias "primary".
That is, if node 5 has an alias of my/fifth/node in the node table, that is the alias that is used for any output. Viz., system-generated links to node/5 are always rewritten to my/fifth/node.
The separate alias table then is for *incoming* paths. You can have as many aliases for a page as you want for incoming requests, but only one for Drupal output. (And really, why would you want to have multiple aliases that get printed by Drupal? That only confuses people.)
As an example, suppose you have a weekly newsletter, with each newsletter being a node. The alias for each newsletter would be its date, say newsletter/2006-07-22. That's the primary alias, and that's in the node table. Then there's also an alias newsletter/latest that points to whichever is most recent (updated however), which you can put into mailings and such. Once the user's there, however, there's no reason to not send them to the dated alias.
So we're not losing multi-alias support at all, at least not really.
-- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
On 7/23/06, Larry Garfield <larry@garfieldtech.com> wrote:
On Saturday 22 July 2006 16:19, Khalid B wrote:
I don't think it is bad denormalization in this case.
What we have now is a one to many relationship of node -> alias, and we allow multiple aliases for the same node.
If we go to a 1:1 relationship, then the logical place for the alias is in the object that it represents (term, user, node), and this would speed up things for certain operations (e.g. emitting an alias instead of the native path), but potentially slow others (looking up an incoming alias and converting to the native path?)
What we lose here is the ability for multiple aliases.
I don't see how we do. It just declares one alias "primary".
That is, if node 5 has an alias of my/fifth/node in the node table, that is the alias that is used for any output. Vis, system-generated links to node/5 are always rewritten to my/fifth/node.
The separate alias table then is for *incoming* paths. You can have as many aliases for a page as you want for incoming requests, but only one for Drupal output. (And really, why would you want to have multiple aliases that get printed by Drupal? That only confuses people.)
As an example, suppose you have a weekly newsletter, with each newsletter being a node. The alias for each newsletter would be its date, say newsletter/2006-07-22. That's the primary alias, and that's in the node table. Then there's also an alias newsletter/latest that points to whichever is most recent (updated however), which you can put into mailings and such. Once the user's there, however, there's no reason to not send them to the dated alias.
So we're not losing multi-alias support at all, at least not really.
You are right. I sort of negated that in my previous message when I said: "Or we can keep the other (older/additional/...etc) aliases where they are now, and have the current alias in the object's table.", and the example says that. So, I am in support of the main/current/primary alias being in the node table and the others where they are today for incoming requests. That should improve page generation for taxonomy, node and users, since there will be zero extra queries.
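As a sketch of that split (in Python, with an invented schema): the alias field on the node record and the url_for_node and resolve_incoming helpers are hypothetical, and the node numbers and paths are borrowed from the examples in this thread or made up.

```python
# Hypothetical schema: each node row carries its own primary alias.
nodes = {
    5: {"title": "Fifth node", "alias": "my/fifth/node"},
    6: {"title": "No alias yet", "alias": None},
}

# The separate alias table maps every incoming alias to a system path.
url_alias = {
    "my/fifth/node": "node/5",
    "old/fifth/node": "node/5",
}


def url_for_node(nid, node):
    # Outgoing: the node is already loaded, so no extra alias query is needed.
    return node["alias"] or f"node/{nid}"


def resolve_incoming(path):
    # Incoming: one lookup in the alias table, falling back to the path itself.
    return url_alias.get(path, path)
```

Output only ever reads the field that came along with the object, which is where the "zero extra queries" comes from; the alias table is consulted once per incoming request.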
Larry Garfield wrote:
On Saturday 22 July 2006 16:19, Khalid B wrote:
I don't think it is bad denormalization in this case.
What we have now is a one to many relationship of node -> alias, and we allow multiple aliases for the same node.
If we go to a 1:1 relationship, then the logical place for the alias is in the object that it represents (term, user, node), and this would speed up things for certain operations (e.g. emitting an alias instead of the native path), but potentially slow others (looking up an incoming alias and converting to the native path?)
What we lose here is the ability for multiple aliases.
I don't see how we do. It just declares one alias "primary".
That is, if node 5 has an alias of my/fifth/node in the node table, that is the alias that is used for any output. Vis, system-generated links to node/5 are always rewritten to my/fifth/node.
The separate alias table then is for *incoming* paths. You can have as many aliases for a page as you want for incoming requests, but only one for Drupal output. (And really, why would you want to have multiple aliases that get printed by Drupal? That only confuses people.)
As an example, suppose you have a weekly newsletter, with each newsletter being a node. The alias for each newsletter would be its date, say newsletter/2006-07-22. That's the primary alias, and that's in the node table. Then there's also an alias newsletter/latest that points to whichever is most recent (updated however), which you can put into mailings and such. Once the user's there, however, there's no reason to not send them to the dated alias.
So we're not losing multi-alias support at all, at least not really.
Multi-aliases are BAD. We should avoid them as much as possible; unfortunately, we can't drop support for them. newsletter/2006-07-22 is good, but newsletter/latest should be a 301 redirect to newsletter/2006-07-22. The path module should support this. -- Jakub Suchý <jakub.suchy@logios.cz> GSM: +420 - 777 817 949 LOGIOS s.r.o, V Podhájí 776/30, 400 01 Ústí nad Labem tel.: +420 - 474 745 159, fax: +420 - 474 745 160 e-mail: info@logios.cz, web: http://www.logios.cz
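The redirect behaviour proposed here might be sketched as below. This is Python and purely illustrative: respond and its two lookup dicts are hypothetical, and the path module does not necessarily work this way.

```python
def respond(path, url_alias, primary_alias):
    # url_alias: incoming alias -> system path
    # primary_alias: system path -> the one alias used for output
    if path in url_alias:
        system_path = url_alias[path]
        canonical = primary_alias.get(system_path, path)
        if canonical != path:
            return 301, canonical   # secondary alias: redirect to the primary one
        return 200, system_path     # primary alias: serve the page
    return 200, path                # not an alias at all
```

With the newsletter example, a request for newsletter/latest gets a 301 to the dated alias, while the dated alias itself is served directly.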
On 22 Jul 2006, at 9:40 PM, Dries Buytaert wrote:
That's easy to count. Just add some counter logic to drupal_lookup_path() and we'd be able to estimate the gain. :)
Anyway, this brings us back to the de-normalization discussion we've had lately. We could potentially cache a lot of node related properties in the node table (eg. the comment count, last comment timestamp, username, etc).
There was a patch for this at http://drupal.org/node/63635, but it needs updating. -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
On Sat, 22 Jul 2006, Matt Westgate wrote:
The path aliasing in Drupal 4.7 is better than the one in Drupal 4.6, but that doesn't mean there is no room for further improvements. :)
Personally, I find algorithmic improvements much more interesting than caching-based solutions.
Other CMS's have been improving their URL alias feature by storing each alias with the data it represents. We could store node aliases in the node table, user aliases in the user table and menu aliases in the menu table. This reduces the amount of queries we need to generate and would avoid an alias-specific caching solution.
Of course there are other rogue aliases that couldn't be accounted for with this approach, but I bet we'd reduce the number of alias queries by 80%.
The problem is with incoming aliases. How would you know where to look if you receive this URL:
/joeblue/testing
Maybe this is a user page subtab? Maybe a node? A menu item? Gabor
On 24 Jul 2006, at 13:14, Gabor Hojtsy wrote:
The path aliasing in Drupal 4.7 is better than the one in Drupal 4.6, but that doesn't mean there is no room for further improvements. :) Personally, I find algorithmic improvements much more interesting than caching-based solutions.
Other CMS's have been improving their URL alias feature by storing each alias with the data it represents. We could store node aliases in the node table, user aliases in the user table and menu aliases in the menu table. This reduces the amount of queries we need to generate and would avoid an alias-specific caching solution.
Of course there are other rogue aliases that couldn't be accounted for with this approach, but I bet we'd reduce the number of alias queries by 80%.
The problem is with incoming aliases. How would you know where to look if you receive this URL:
/joeblue/testing
Maybe this is a user page subtab? Maybe a node? Menu item?
Easy, we'd continue to maintain a URL alias table with all the path aliases. In the node/user/taxonomy object we'd just 'cache' the active path alias (duplication). (I'm not claiming this is the best solution.) -- Dries Buytaert :: http://www.buytaert.net/
Sounds like a legitimate case for de-normalization for performance. The tricky part is making sure that whoever updates one updates the other, so we stay consistent. This had better be a function that is called and does the work behind the scenes, so that no one touches the raw tables.
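A minimal sketch of the "one function updates both" idea, assuming in-memory arrays in place of the URL alias table and the per-node cached alias. The class and method names (ExampleAliasStore, setAlias, lookupIncoming, outgoingUrl) are hypothetical and are not part of path.module:

```php
<?php
// Hypothetical stand-in for the url_alias table plus the alias
// "cached" on each node. Real code would issue queries instead.
class ExampleAliasStore {
  // Plays the role of the alias table: alias => system path.
  private $aliases = array();
  // Plays the role of an alias column on the node: nid => active alias.
  private $active = array();

  // The only write path: both copies are updated together,
  // so callers never touch the raw storage.
  public function setAlias($nid, $alias) {
    $this->aliases[$alias] = 'node/' . $nid;
    $this->active[$nid] = $alias;
  }

  // Incoming request: the alias table still resolves every alias.
  public function lookupIncoming($path) {
    return isset($this->aliases[$path]) ? $this->aliases[$path] : FALSE;
  }

  // Outgoing link: read the cached active alias, no table lookup.
  public function outgoingUrl($nid) {
    return isset($this->active[$nid]) ? $this->active[$nid] : 'node/' . $nid;
  }
}
```

In this sketch every alias for a node stays resolvable on the way in, while outgoing links only ever use the most recently set alias, which is the duplicated value.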
Hello, On Tuesday 25 July 2006 16:53, Khalid B wrote:
Easy, we'd continue to maintain a URL alias table with all the path aliases. In the node/user/taxonomy object we'd just 'cache' the active path alias (duplication).
(I'm not claiming this is the best solution.)
Sounds like a legitimate case for de-normalization for performance. The tricky part is making sure that whoever updates one updates the other, so we stay consistent. This had better be a function that is called and does the work behind the scenes, so that no one touches the raw tables.
Not necessarily. The problem is far simpler, and IMO we are circling around the wrong solution all the time:
fact: a lot of db queries (in a huge table) perform badly.
fact: anything that is created automatically can be de-created automatically.
fact: with very few exceptions, everyone with large alias tables is using pathauto or another automated method.
example: node/123 -> posts/news/man-falls-in-water, which was generated with a "pattern" posts/[term_name]/[title]
Why do you want to store that in the first place? It is just as simple to tear apart as to make. It is not even a heavy piece that you might want to cache.
IMO we should look beyond "the path table is slow, let us make the table faster". A few other concepts to look at are:
* routers: menus not only handle incoming links but the outgoing ones too. (detail: http://lists.drupal.org/archives/development/2006-05/msg00864.html )
* hook_alias (with caching, quit-after-first-hit and other speed optimisations): pathauto can deliver an on-the-fly-crafted link. path.module can deliver aliases from the DB
* callbacks: simple version of routers: each path *pattern* gets a callback function that will handle the (de)composing of that path.
I guess there are a lot more options, options that have more potential in performance, flexibility and simplicity of code than just tweaking the path tables.
Bèr
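A rough sketch of the "callbacks" idea, not code from pathauto or path.module. It rests on an assumption that goes beyond the example above: the pattern embeds the node ID (posts/[term_name]/[nid]-[title]), because a title alone cannot be turned back into node/123 without some lookup. All function names here are hypothetical:

```php
<?php
// Hypothetical helper: reduce free text to a URL-safe slug.
function example_slug($text) {
  $slug = strtolower(trim($text));
  $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
  return trim($slug, '-');
}

// Compose callback: build the outgoing alias from node data alone,
// with no alias table involved.
function example_compose_alias($node) {
  return 'posts/' . example_slug($node['term_name']) . '/'
    . $node['nid'] . '-' . example_slug($node['title']);
}

// Decompose callback: recover the system path from an incoming alias.
// Returns FALSE when the path does not match this pattern, so another
// callback (or the alias table) can try next.
function example_decompose_alias($path) {
  if (preg_match('#^posts/[a-z0-9-]+/(\d+)-[a-z0-9-]*$#', $path, $matches)) {
    return 'node/' . $matches[1];
  }
  return FALSE;
}
```

With the pattern exactly as quoted (no ID in the path), the decompose step would still need a query by term and title, so whether this is lighter than the alias table depends on the pattern chosen.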
On 26 Jul 2006, at 09:57, Bèr Kessels wrote:
A few other concepts to look at are:
* routers: menus not only handle incoming links but the outgoing ones too. (detail: http://lists.drupal.org/archives/development/2006-05/msg00864.html )
* hook_alias (with caching, quit-after-first-hit and other speed optimisations): pathauto can deliver an on-the-fly-crafted link. path.module can deliver aliases from the DB
* callbacks: simple version of routers: each path *pattern* gets a callback function that will handle the (de)composing of that path.
I don't understand how the proposed solutions would work AND be more lightweight than what we have now. Please be more specific or post some pseudo-code. Would love to know more about these. -- Dries Buytaert :: http://www.buytaert.net/
At 4:13 PM -0500 20/7/06, George Kappel wrote:
I have just started using Drupal but have seen issues where the existing database cache is a bottleneck.
My menu has 5,000 items, and the overhead to build it, serialize it, cache it in the db and unserialize it was very high, making it unusable.
The serialized menu was approx 1.9M according to the DB
The workaround was to prevent the builtin caching and store the $_menu in APC user cache.
Do you have any figures showing the improvement from this change? I benchmarked APC a while ago (v3.0.8) and couldn't manage to show a significant performance increase. Data stored in APC is also serialized and unserialized. You're only saving the database call and with a busy site and a well-tuned database this result should be served from RAM. Not to mention, you still need to build the menu when it changes and that's where the real overhead occurs. ...R.
Working on a development server with 1GB RAM and MySQL caching turned on, performance was poor with 1 user. With the following change performance was hugely improved. I guess it is possible that something is wrong with the MySQL setup, but it seems to work very fast except for huge blob data. For this site the menu does not change.

function menu_get_menu()
  /*
  if ($cached = cache_get($cid)) {
    $_menu = unserialize($cached->data);
  }
  else {
    _menu_build();
    // Cache the menu structure for this user, to expire after one day.
    cache_set($cid, serialize($_menu), time() + (60 * 60 * 24));
  }
  */
  if ($_menu = apc_fetch($cid)) {
    // Loaded menu from the APC user cache.
  }
  else {
    _menu_build();
    // Keep the built menu in APC for 36000 seconds (10 hours).
    apc_store($cid, $_menu, 36000);
  }
if ($_menu = apc_fetch($cid)) {
  // Loaded menu from cache.
}
else {
  _menu_build();
  apc_store($cid, $_menu, 36000);
}
Is it worth considering this and other performance tweaks and making them into a .inc that checks for function_exists() and uses APC (or memcached or what have you)?
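One way such a .inc might look, as a sketch only. apc_fetch() and apc_store() are the APC calls used in the snippet above; everything prefixed example_ is hypothetical, and the fallback here is a per-request array chosen so the sketch runs without APC (a real .inc would more likely fall back to cache_get()/cache_set() or to memcached):

```php
<?php
// Hypothetical per-request fallback store, used when APC is absent.
function &example_fallback_store() {
  static $store = array();
  return $store;
}

// Fetch a cached value; returns FALSE on a miss, like apc_fetch().
function example_fast_cache_get($cid) {
  if (function_exists('apc_fetch')) {
    return apc_fetch($cid);
  }
  $store = &example_fallback_store();
  return isset($store[$cid]) ? $store[$cid] : FALSE;
}

// Store a value; the fallback ignores $ttl since it only lives
// for the current request.
function example_fast_cache_set($cid, $data, $ttl = 0) {
  if (function_exists('apc_store')) {
    return apc_store($cid, $data, $ttl);
  }
  $store = &example_fallback_store();
  $store[$cid] = $data;
  return TRUE;
}

// Same shape as the menu snippet above: build only on a cache miss.
function example_get_menu($cid, $build_callback) {
  $menu = example_fast_cache_get($cid);
  if ($menu === FALSE) {
    $menu = call_user_func($build_callback);
    example_fast_cache_set($cid, $menu, 36000);
  }
  return $menu;
}
```

The function_exists() check keeps the calling code identical whether or not the extension is loaded, which is the point of putting the choice in one include file.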
participants (18)
- Adrian Rossouw
- AjK
- Bart Jansens
- Bèr Kessels
- Dries Buytaert
- Gabor Hojtsy
- George Kappel
- Gerhard Killesreiter
- Jakub Suchy
- Karoly Negyesi
- Ken Rickard
- Khalid B
- Kieran Lal
- Larry Garfield
- Matt Westgate
- Moshe Weitzman
- Richard Archer
- Samuel Lampa