Hi there!

In my quest to improve caching for Drupal, I've devised a cache that lets us have more than one cache table. The idea came from the observation that some cached items are quickly invalidated again (page cache) while others last a while (menu, filter). With a single cache table this is a problem for the MySQL query cache, because every write to the table also invalidates the query cache entries for the long-lived items. Also, operations on smaller tables will be faster.

The patch can be found here: http://drupal.org/node/72290

I have now run this patch on drupal.org for several hours, and here are some results.

First, let us look at the cache hit ratio, i.e. cache hits as a percentage of all cache lookups. In our case that is the number of SELECTs minus the number of UPDATEs, divided by the number of SELECTs on a particular cache table:

(# selects - # updates) / # selects

Explanation: We run a SELECT in every case to determine whether there is a cached item, but only the ones which do not trigger a subsequent cache update were successful.

filter:   (242318-21780)/242318 = 91%
page:     (13315-7285)/13315    = 45%
menu:     (9989-161)/9989       = 98%
the rest: (986-488)/986         = 50%

So, we now have proof that our page cache isn't working well for a busy site such as drupal.org, but the filter cache seems to work fine, while the menu cache is truly excellent. The "rest" are a few feeble attempts at caching in the cvs and project modules, which apparently don't work too well either.

The high share of cache misses for the page cache is probably in part due to the fact that several crawlers are doing their business on drupal.org at all times. These will generate a cached page, but if the page is in fact very uninteresting to humans, then the only likely receiver of the cached page is another crawler. And there aren't so many of them that this would happen often. I've suggested making Drupal less attractive for crawlers here: http://drupal.org/node/65017 If we were successful with this, we'd have fewer pages generated and thus the percentage of successful cache hits would increase.
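The hit ratio defined earlier in this message, (# selects - # updates) / # selects, can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the patch; the names COUNTS and hit_ratio are made up for the example, and the counts are the ones reported in the message.

```python
# SELECT and UPDATE counts per cache table, copied from the message.
# COUNTS and hit_ratio are hypothetical names used only for this sketch.
COUNTS = {
    "filter": (242318, 21780),
    "page": (13315, 7285),
    "menu": (9989, 161),
    "rest": (986, 488),
}


def hit_ratio(selects, updates):
    """Return (selects - updates) / selects, the share of lookups that hit."""
    if selects <= 0:
        raise ValueError("selects must be positive")
    return (selects - updates) / selects


if __name__ == "__main__":
    for table, (selects, updates) in COUNTS.items():
        print(f"{table}: {hit_ratio(selects, updates):.1%}")
```

The sketch treats every SELECT that is not followed by an UPDATE as a hit, which is the same simplification the message makes.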
I have to note that the filter cache on drupal.org usually only keeps entries for four hours, not 24 h as in stock Drupal. This change was made to keep the number of cached items at bay. I did change this to the full 24 hours. This was not intentional, and it means I cannot compare my data with the older data I have in all cases.

Number of cached items

These numbers obviously fluctuate a lot:

filter: 132825 (usually about 7000 for the 4h cache)
page:   7345 (a very high value, usually between 1000 and 3000)
menu:   1040
rest:   79328 (I had forgotten to remove the previous items from the table, there were only 500 relevant items)

Speed

The individual cached items differ in number and size. So does the time these queries take.

         SELECT  INSERT  UPDATE  DELETE
filter:  0.002   0.005   0.002   0.087
page:    0.007   0.009   0.003   0.188
menu:    0.011   0.014   0.016   n/a
rest:    0.002   0.007   0.004   n/a

The page and menu items are the largest, so it is no surprise that their times are also the longest. The number of entries in the table does not seem to matter that much.

Total weighted average:

         SELECT  INSERT
Now      0.003   0.007
Test 1   0.010   0.011
Test 2   0.007   0.008

There is a marked improvement here. Splitting up the cache apparently has the benefits I tried to achieve: a reduction of query times by having smaller tables, and splitting off less-often-changing data into separate tables to allow the query cache to operate more effectively.

Considering that the cache_get query is the query which accumulates more time than any other query on drupal.org, I think that we should strongly consider this patch.

Cheers,
Gerhard
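As a rough cross-check of the "Total weighted average" for SELECTs, the per-table times can be weighted by how often each table was queried. The message does not say how the average was weighted, so using the per-table SELECT counts from the hit-ratio figures as weights is an assumption, and SELECT_STATS and weighted_average are hypothetical names.

```python
# (number of SELECTs, average SELECT time) per cache table.
# Counts come from the hit-ratio figures, times from the speed table.
# Weighting by SELECT count is an assumption; the message does not
# state how its weighted average was computed.
SELECT_STATS = {
    "filter": (242318, 0.002),
    "page": (13315, 0.007),
    "menu": (9989, 0.011),
    "rest": (986, 0.002),
}


def weighted_average(stats):
    """Average time per query, weighting each table by its query count."""
    total_queries = sum(count for count, _ in stats.values())
    total_time = sum(count * seconds for count, seconds in stats.values())
    return total_time / total_queries


if __name__ == "__main__":
    print(f"{weighted_average(SELECT_STATS):.4f}")
```

Under that assumption the result comes out near 0.0026, which is consistent with the 0.003 reported in the "Now" row once rounded to three decimals.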
Gerhard Killesreiter wrote:
Total weighted average:
         SELECT  INSERT
Now      0.003   0.007
Test 1   0.010   0.011
Test 2   0.007   0.008
There is a marked improvement here.
Moshe pointed out that it is not clear what is what. "Now" is with the "split cache patch" applied. "Test 1" is stock Drupal-4.7-CVS aka 4.7.3; "Test 2" was with the patch for extra caching of forum blocks and navigation applied. And of course: all times are given in seconds (mea maxima culpa).

In other words, with this patch applied the average cache_get was faster by a factor of three. I think this pretty much kicks ass and we should get this patch applied. I'll bring it up to date and will incorporate the suggestions I got if our Maximo Lider agrees that the patch is a sound idea.

Cheers,
Gerhard
Thanks Gerhard for this, In light of my recent complaint about Drupal's CPU usage, I can only applaud Gerhard's efforts in this area... a question... On Thursday 03 August 2006 06:39 am, Gerhard Killesreiter wrote:
filter:   (242318-21780)/242318 = 91%
page:     (13315-7285)/13315    = 45%
menu:     (9989-161)/9989       = 98%
the rest: (986-488)/986         = 50%
So, we now have proof that our page cache isn't working well for a busy site such as drupal.org, but the filter cache seems to work fine, while the menu cache is truly excellent. The "rest" are a few feeble attempts at caching in the cvs and project modules, which apparently don't work too well either.
I must admit that I am not sure what the 'filter' is. Is that a table with a row for the content of each node body, and each comment body, after all filters have been applied? A. -- http://www.wechange.org/ Because we and the world need to change. http://www.reuniting.info/ Intimate Relationships, peace and harmony in the couple.
On 03 Aug 2006, at 00:39, Gerhard Killesreiter wrote:
So, we now have proof that our page cache isn't working well for a busy site such as drupal.org, but the filter cache seems to work fine, while the menu cache is truly excellent. The "rest" are a few feeble attempts at caching in the cvs and project modules, which apparently don't work too well either.
Can we do a really blunt experiment and do per-user caching of entire pages? The cache will grow big, but it would be interesting to know how it affects performance.
-- Dries Buytaert :: http://www.buytaert.net/
Dries Buytaert wrote:
On 03 Aug 2006, at 00:39, Gerhard Killesreiter wrote:
So, we now have proof that our page cache isn't working well for a busy site such as drupal.org, but the filter cache seems to work fine, while the menu cache is truly excellent. The "rest" are a few feeble attempts at caching in the cvs and project modules, which apparently don't work too well either.
Can we do a really blunt experiment and do per-user caching of entire pages?
Sure :)
The cache will grow big, but it would be interesting to know how it affects performance.
The question is what we would gain, i.e. how the cache hit/miss ratio develops, but I will give it a try. Personally, I think the outcome won't be much different from, or will even be worse than, what I observed with the anonymous page cache: the pages will be invalidated too soon to get hit a second time. But maybe I am wrong.

Cheers,
Gerhard
On 03 Aug 2006, at 00:39, Gerhard Killesreiter wrote:
         SELECT  INSERT  UPDATE  DELETE
filter:  0.002   0.005   0.002   0.087
page:    0.007   0.009   0.003   0.188
menu:    0.011   0.014   0.016   n/a
rest:    0.002   0.007   0.004   n/a
Is that in seconds or ms? -- Dries Buytaert :: http://www.buytaert.net/
Dries Buytaert wrote:
On 03 Aug 2006, at 00:39, Gerhard Killesreiter wrote:
         SELECT  INSERT  UPDATE  DELETE
filter:  0.002   0.005   0.002   0.087
page:    0.007   0.009   0.003   0.188
menu:    0.011   0.014   0.016   n/a
rest:    0.002   0.007   0.004   n/a
Is that in seconds or ms?
That's seconds, ie your sql query to get the menu cache will on average contribute 0.011 seconds to the page build time, if you use my patch. Cheers, Gerhard
Gerhard Killesreiter wrote:
Dries Buytaert wrote:
On 03 Aug 2006, at 00:39, Gerhard Killesreiter wrote:
         SELECT  INSERT  UPDATE  DELETE
filter:  0.002   0.005   0.002   0.087
page:    0.007   0.009   0.003   0.188
menu:    0.011   0.014   0.016   n/a
rest:    0.002   0.007   0.004   n/a
Is that in seconds or ms?
That's seconds,
I admit, it is a bit confusing: The devel module usually multiplies the times it reports by 1000. I mistakenly copied the header to the output function. :p I will change the output function to match the rest of devel.module. Cheers, Gerhard
On 03 Aug 2006, at 18:52, Gerhard Killesreiter wrote:
Dries Buytaert wrote:
On 03 Aug 2006, at 00:39, Gerhard Killesreiter wrote:
         SELECT  INSERT  UPDATE  DELETE
filter:  0.002   0.005   0.002   0.087
page:    0.007   0.009   0.003   0.188
menu:    0.011   0.014   0.016   n/a
rest:    0.002   0.007   0.004   n/a
Is that in seconds or ms?
That's seconds, ie your sql query to get the menu cache will on average contribute 0.011 seconds to the page build time, if you use my patch.
So we improve performance by 3ms? That is rather small-ish, not? -- Dries Buytaert :: http://www.buytaert.net/
Dries Buytaert wrote:
On 03 Aug 2006, at 18:52, Gerhard Killesreiter wrote:
Dries Buytaert wrote:
On 03 Aug 2006, at 00:39, Gerhard Killesreiter wrote:
         SELECT  INSERT  UPDATE  DELETE
filter:  0.002   0.005   0.002   0.087
page:    0.007   0.009   0.003   0.188
menu:    0.011   0.014   0.016   n/a
rest:    0.002   0.007   0.004   n/a
Is that in seconds or ms?
That's seconds, ie your sql query to get the menu cache will on average contribute 0.011 seconds to the page build time, if you use my patch.
So we improve performance by 3ms? That is rather small-ish, not?
No, by 7ms per cache_get. You need to look at the other table, which compares the average times of the SELECTs (3ms vs 10ms). I agree that it is a small contribution if you consider page creation time (the total depends on how many cache_gets happen on a page), but if you consider the time spent on the MySQL server it is quite a lot, because the cache_get queries are run that often.

To get some idea of how much this change affects the page generation time, we look at the number of cache_gets from the cache_menu table, 10997. That means we sampled 10997 page views (remember we only sample 10% of all page views). Over the same time we've had a total of 298891 cache_get function calls. That means that there are on average 298891 / 10997 = 27 cache_get calls per page. From this I infer that we save 7ms * 27 = 189 ms per page view. Not /that/ bad.

Cheers,
Gerhard
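The per-page estimate above can be reproduced with a short Python sketch. It only restates the arithmetic from the message; the function name estimated_saving_ms is hypothetical, and rounding the calls per page to a whole number follows what the message does.

```python
def estimated_saving_ms(total_cache_gets, sampled_page_views, saving_per_get_ms):
    """Return (cache_get calls per page, estimated saving per page in ms).

    Calls per page are rounded to a whole number, as in the message.
    """
    gets_per_page = round(total_cache_gets / sampled_page_views)
    return gets_per_page, gets_per_page * saving_per_get_ms


if __name__ == "__main__":
    # Figures from the message: 298891 cache_get calls over 10997 sampled
    # page views, and a saving of 7 ms per cache_get (10 ms vs 3 ms).
    gets_per_page, saving = estimated_saving_ms(298891, 10997, 7)
    print(gets_per_page, saving)
```

The estimate assumes every cache_get on a page saves the same average 7 ms, so it is an order-of-magnitude figure rather than a measurement.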
I just wanted to ask if there's something I don't know about uploading patches to the issue tracking system. Strange things happened:

I made a patch called "forum.module.patch" and uploaded it in good faith that everything was OK. Then Beginner noticed that the patch wasn't good. I was sure it was, but then I took a look and noticed that the link for the patch opened a file dating to 21 Jan 2005 20:09:30. I tried to upload the patch again -> it was now renamed to "forum.module_0.patch", and it too was linked to an older patch, from 21 Jan 2005 20:12:34. Then I renamed my patch to "forum.module-cannot-edit-forum.patch" and uploaded it -> BUT the link to the patch would not appear. I retried, and now both links appeared: the current one and also a link in the previous post to the correct patch.

Am I supposed to give patches unique filenames?

Tadej
On Aug 4, 2006, at 2:24 AM, Tadej Baša wrote:
I just wanted to ask if there's something I don't know about uploading patches to the issue tracking system. Strange things happened:
http://drupal.org/node/add/project_issue/75232
Am I supposed to give patches unique filenames?
for now, it probably wouldn't hurt. ;) all issue attachments are stuffed into a single directory. the code is supposed to gracefully recover from collisions and automatically rename files, but i haven't looked closely at that code and can't vouch for its correctness. the whole way in which project_issue.module (recently split off from project.module) handles file attachments is old, crusty, and partially broken. it's all going to change once issue followups are real comments (http://drupal.org/node/18920). however, until then, using unique names will probably help ensure there's no confusion... -derek (dww)
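For illustration, the kind of collision handling described above (all attachments in one directory, with automatic renaming on a name clash) might look like the following Python sketch. This is not the project_issue.module code, which is PHP and which the message itself does not vouch for; unique_filename is a hypothetical helper, and the "_0", "_1" suffix scheme is only modelled on the "forum.module_0.patch" rename reported in the thread.

```python
import os


def unique_filename(name, existing):
    """Return a file name that does not collide with any name in `existing`.

    If `name` is free it is returned unchanged; otherwise a numeric suffix
    is inserted before the extension: name_0.ext, name_1.ext, ...
    """
    if name not in existing:
        return name
    stem, ext = os.path.splitext(name)
    counter = 0
    while f"{stem}_{counter}{ext}" in existing:
        counter += 1
    return f"{stem}_{counter}{ext}"


if __name__ == "__main__":
    taken = {"forum.module.patch"}
    print(unique_filename("forum.module.patch", taken))
```

A scheme like this only avoids overwriting files; it does not by itself guarantee that an issue links to the newly stored name rather than the old one, which is the symptom reported above.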
participants (5)
- Augustin (Beginner)
- Derek Wright
- Dries Buytaert
- Gerhard Killesreiter
- Tadej Baša