[drupal-devel] Fwd: history-table
Begin forwarded message:
From: Dries Buytaert <dries@buytaert.net> Date: Sat 7 May 2005 13:45:43 CEST To: drupal-devel@drupal.org Subject: [drupal-devel] history-table Reply-To: drupal-devel@drupal.org
From profiling drupal.org, it seems that updating entries in (or inserting entries into) the history-table can be quite slow. Turns out the table is pretty big, so I'm not sure it is missing an index or two.
mysql> select count(*) from history;
+----------+
| count(*) |
+----------+
|   123805 |
+----------+
1 row in set (0.00 sec)
I spent the past 2 days keeping an eye on this, and it is certainly a performance bottleneck. Querying or updating the history table takes 6 ms whereas other SQL queries take about 0.2 - 0.3 ms on average. That is, we can do 20 to 30 other SQL queries in 6 ms. Something to think about.

-- Dries Buytaert :: http://www.buytaert.net/
On Mon, 9 May 2005, Dries Buytaert wrote:
Begin forwarded message:
From: Dries Buytaert <dries@buytaert.net> Date: Sat 7 May 2005 13:45:43 CEST
From profiling drupal.org, it seems that updating entries in (or inserting entries into) the history-table can be quite slow. Turns out the table is pretty big, so I'm not sure it is missing an index or two.
mysql> select count(*) from history;
+----------+
| count(*) |
+----------+
|   123805 |
+----------+
1 row in set (0.00 sec)
I spent the past 2 days keeping an eye on this, and it is certainly a performance bottleneck. Querying or updating the history table takes 6 ms whereas other SQL queries take about 0.2 - 0.3 ms on average. That is, we can do 20 to 30 other SQL queries in 6 ms. Something to think about.
I think a possible "solution" would be to move the updating of the history table towards the end of a page request. Currently, the table is updated from node_show(). If it were updated (with all necessary node views) at the end of the page request, it would probably not be faster, but the user experience would be better (or do we only start to deliver the page after _everything_ has been processed?).

So, in node_tag_new() we should only collect the nids in a static array, and then later a call to node_tag_new(NULL, 'update') could execute the queries (maybe bunching them together more efficiently; we also do one query per history check in node_last_viewed(). Dries, how bad is that query?). This could be invoked in node_footer().

Cheers,
Gerhard
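The collect-then-flush idea described above can be sketched as follows. This is only an illustration in Python with sqlite3, not Drupal code: the HistoryQueue class, its method names, and the history(uid, nid, timestamp) schema with a (uid, nid) primary key are all assumptions made for the example.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory history table (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (uid INTEGER, nid INTEGER, timestamp INTEGER, "
        "PRIMARY KEY (uid, nid))"
    )
    return conn


class HistoryQueue:
    """Hypothetical helper: remember viewed nids, write them at request end."""

    def __init__(self, conn, uid):
        self.conn = conn
        self.uid = uid
        self.pending = set()  # plays the role of the static array

    def tag_new(self, nid):
        # Called wherever a node is displayed in full; issues no SQL.
        self.pending.add(nid)

    def flush(self, now=None):
        # Called once, late in the request (e.g. from a footer hook).
        now = int(time.time()) if now is None else now
        rows = [(self.uid, nid, now) for nid in sorted(self.pending)]
        with self.conn:  # a single transaction covers all rows
            self.conn.executemany(
                "INSERT OR REPLACE INTO history (uid, nid, timestamp) "
                "VALUES (?, ?, ?)",
                rows,
            )
        self.pending.clear()
        return len(rows)
```

Viewing the same node twice in one request costs nothing extra here, because the pending set deduplicates nids before any SQL runs. Whether the user actually sees the page sooner still depends on when output is sent, which is the open question in the message above.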
On 09 May 2005, at 10:13 AM, Gerhard Killesreiter wrote:
or do we only start to deliver the page after _everything_ has been processed?).
Bingo. There's only one print theme('page'); And at that point everything has already been processed. Except for perhaps the blocks, which get populated within theme_page()

-- Adrian Rossouw
Drupal developer and Bryght Guy
http://drupal.org | http://bryght.com
On Mon, 9 May 2005, Adrian Rossouw wrote:
On 09 May 2005, at 10:13 AM, Gerhard Killesreiter wrote:
or do we only start to deliver the page after _everything_ has been processed?).
Bingo.
*bummer*
There's only one print theme('page'); And at that point everything has already been processed. Except for perhaps the blocks, which get populated within theme_page()
Well, if the blocks were processed separately, that would be quite some gain. I suppose that for some pages the blocks take up more processing time than the main page.

The ideal place for updating the history table would be in hook_exit(), but that would cause node.module to be loaded for cached pages.

Alternatively, we could move the call to hook_footer into drupal_page_footer(). It does not seem to be used anywhere (added: the copyright module uses it).

BTW, I'd like to have a hook_header back to avoid emitting extra CSS and JS in the _menu hook. I think I can rally support for this. :)

Cheers,
Gerhard
On 09 May 2005, at 11:07, Gerhard Killesreiter wrote:
Well, if the blocks were processed separately, that would be quite some gain. I suppose that for some pages the blocks take up more processing time than the main page.
The ideal place for updating the history table would be in hook_exit(), but that would cause node.module to be loaded for cached pages.
I'm not in favor of such a solution. It doesn't address the real problem. The real problem is the fact that the operations on the history-table perform poorly (compared to operations on other tables). One way to solve this is to lower the value of NODE_NEW_LIMIT (a timestamp) so the history gets pruned faster and the size of the table shrinks.

However, similar problems exist with the watchdog, accesslog and sessions tables. They can grow unwieldy and without profiling your Drupal installation, this problem isn't easily identified. One solution might be to prune or rotate these tables based on the number of rows (table size) rather than by the information's lifetime. Like, only maintain the session information of the last 100k sessions, or limit the history table to 25k rows. Whether that behavior is desirable, I do not know, but given sensible defaults, I don't think it would be a problem.

-- Dries Buytaert :: http://www.buytaert.net/
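Pruning by row count rather than by age, as suggested above, could look roughly like this. It is a minimal sketch in Python with sqlite3: prune_history(), make_history() and the table layout are assumptions for illustration, and the LIMIT-inside-IN subquery is SQLite syntax that other databases may not accept.

```python
import sqlite3


def make_history(rows):
    """Create an in-memory history table (assumed schema) and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (uid INTEGER, nid INTEGER, timestamp INTEGER, "
        "PRIMARY KEY (uid, nid))"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def prune_history(conn, max_rows):
    """Keep only the max_rows most recent rows; return how many were deleted."""
    with conn:
        cur = conn.execute(
            "DELETE FROM history WHERE rowid NOT IN ("
            "  SELECT rowid FROM history "
            "  ORDER BY timestamp DESC, rowid DESC LIMIT ?"
            ")",
            (max_rows,),
        )
    return cur.rowcount
```

Called from a periodic job with max_rows set to something like 25000, this bounds the table size regardless of traffic. The trade-off raised above still applies: on a busy site, entries may be dropped sooner than an age-based limit would drop them.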
On Mon, May 09, 2005 at 11:07:12AM +0200, Gerhard Killesreiter wrote:
On Mon, 9 May 2005, Adrian Rossouw wrote:
On 09 May 2005, at 10:13 AM, Gerhard Killesreiter wrote:
or do we only start to deliver the page after _everything_ has been processed?).
Bingo.
*bummer*
There's only one print theme('page'); And at that point everything has already been processed. Except for perhaps the blocks, which get populated within theme_page()
Well, if the blocks were processed separately, that would be quite some gain. I suppose that for some pages the blocks take up more processing time than the main page.
The ideal place for updating the history table would be in hook_exit(), but that would cause node.module to be loaded for cached pages.
Alternatively, we could move the call to hook_footer into drupal_page_footer(). It does not seem to be used anywhere (added: the copyright module uses it). BTW, I'd like to have a hook_header back to avoid emitting extra CSS and JS in the _menu hook. I think I can rally support for this. :)
Remember that hook_exit() is executed before the headers are sent by a call to drupal_goto() so it does still take time which is passed on to the user in some cases. However I don't think that will be an issue in this case since I think the history table is updated on viewing only and drupal_goto() happens only on some posts. -Neil
On Mon, 9 May 2005, Gerhard Killesreiter wrote:
On Mon, 9 May 2005, Dries Buytaert wrote:
Begin forwarded message:
From: Dries Buytaert <dries@buytaert.net> Date: Sat 7 May 2005 13:45:43 CEST
From profiling drupal.org, it seems that updating entries in (or inserting entries into) the history-table can be quite slow. Turns out the table is pretty big, so I'm not sure it is missing an index or two.
mysql> select count(*) from history;
+----------+
| count(*) |
+----------+
|   123805 |
+----------+
1 row in set (0.00 sec)
I spent the past 2 days keeping an eye on this, and it is certainly a performance bottleneck. Querying or updating the history table takes 6 ms whereas other SQL queries take about 0.2 - 0.3 ms on average. That is, we can do 20 to 30 other SQL queries in 6 ms. Something to think about.
We have two different implementations of checking the newness of a node with respect to the current user:

_forum_user_last_visit() selects all history entries and populates a static array when first called,

and

node_last_viewed() executes one query per nid (and caches the result in a static array too).

We should find out which one is faster, use only that one, and remove the other function (the new function should be in node.module, of course).

Cheers,
Gerhard
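To make the difference between the two strategies concrete, here is a rough sketch in Python with sqlite3. The class names, the query counters and the table layout are assumptions for illustration; the real functions are PHP and may differ in detail.

```python
import sqlite3


def make_history(rows):
    """Create an in-memory history table (assumed schema) and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (uid INTEGER, nid INTEGER, timestamp INTEGER, "
        "PRIMARY KEY (uid, nid))"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


class BulkLookup:
    """Load all of the user's history on first use (_forum_user_last_visit style)."""

    def __init__(self, conn, uid):
        self.conn, self.uid = conn, uid
        self.cache = None
        self.queries = 0

    def last_viewed(self, nid):
        if self.cache is None:
            self.queries += 1
            rows = self.conn.execute(
                "SELECT nid, timestamp FROM history WHERE uid = ?", (self.uid,)
            )
            self.cache = dict(rows)  # nid -> timestamp
        return self.cache.get(nid, 0)


class PerNodeLookup:
    """One query per nid, memoised afterwards (node_last_viewed style)."""

    def __init__(self, conn, uid):
        self.conn, self.uid = conn, uid
        self.cache = {}
        self.queries = 0

    def last_viewed(self, nid):
        if nid not in self.cache:
            self.queries += 1
            row = self.conn.execute(
                "SELECT timestamp FROM history WHERE uid = ? AND nid = ?",
                (self.uid, nid),
            ).fetchone()
            self.cache[nid] = row[0] if row else 0
        return self.cache[nid]
```

Both classes return the same answers; they differ in cost. The bulk variant pays once for the user's whole history no matter how many nodes are checked, while the per-node variant pays one small query for each distinct nid on the page.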
On 09 May 2005, at 13:05, Gerhard Killesreiter wrote:
We have two different implementations of checking the new-ness of a node wrt the current user:
_forum_user_last_visit selects all history entries and populates a static array when first called.
and
node_last_viewed will execute a query per nid (and cache the result in a static array too).
We should find out which one is faster, use only that, and remove the other function (the new function should be in node.module of course).
That or _forum_user_last_visit() should only load/cache information about the nodes in the relevant forum (i.e. add a left join). Even so, we risk doing some work twice. We'll need to do some profiling for this ...

-- Dries Buytaert :: http://www.buytaert.net/
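One way to read the left-join suggestion is sketched below in Python with sqlite3. The forum_node(nid, forum_id) table, the forum_last_visits() function and the history layout are hypothetical names chosen for the example, not Drupal's actual schema.

```python
import sqlite3


def make_db(forum_nodes, history_rows):
    """Build in-memory tables; both schemas are assumptions for this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forum_node (nid INTEGER PRIMARY KEY, forum_id INTEGER)")
    conn.execute(
        "CREATE TABLE history (uid INTEGER, nid INTEGER, timestamp INTEGER, "
        "PRIMARY KEY (uid, nid))"
    )
    conn.executemany("INSERT INTO forum_node VALUES (?, ?)", forum_nodes)
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", history_rows)
    conn.commit()
    return conn


def forum_last_visits(conn, uid, forum_id):
    """Return {nid: last view timestamp} for one forum; 0 means never viewed."""
    rows = conn.execute(
        "SELECT f.nid, COALESCE(h.timestamp, 0) "
        "FROM forum_node f "
        "LEFT JOIN history h ON h.nid = f.nid AND h.uid = ? "
        "WHERE f.forum_id = ?",
        (uid, forum_id),
    )
    return dict(rows)
```

The uid condition sits in the ON clause rather than in WHERE so that nodes the user has never viewed still appear in the result with a timestamp of 0. The query only touches history rows for nodes in the requested forum, which is the saving the suggestion is after.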
On 09 May 2005, at 18:55, Dries Buytaert wrote:
On 09 May 2005, at 13:05, Gerhard Killesreiter wrote:
We have two different implementations of checking the new-ness of a node wrt the current user:
_forum_user_last_visit selects all history entries and populates a static array when first called.
and
node_last_viewed will execute a query per nid (and cache the result in a static array too).
We should find out which one is faster, use only that, and remove the other function (the new function should be in node.module of course).
That or _forum_user_last_visit() should only load/cache information about the nodes in the relevant forum (i.e. add a left join). Even so, we risk doing some work twice. We'll need to do some profiling for this ...
I did some quick measurements.

1. If one has a small history (40 rows in the history table), _forum_user_last_visit() is faster than node_last_viewed().
2. If one has a large history (550 rows in the history table), node_last_viewed() is slightly faster than _forum_user_last_visit().

The difference is less than 10 ms, but most of the time _forum_user_last_visit() is the faster of the two. If we change the current behavior to tidy up or unify the code, performance is going to degrade slightly. Unless, of course, we can improve the performance of node_last_viewed().

-- Dries Buytaert :: http://www.buytaert.net/
On Mon, 9 May 2005, Dries Buytaert wrote:
On 09 May 2005, at 18:55, Dries Buytaert wrote:
That or _forum_user_last_visit() should only load/cache information about the nodes in the relevant forum (i.e. add a left join). Even so, we risk doing some work twice. We'll need to do some profiling for this ...
I did some quick measurements.
:) I created an issue for this: http://drupal.org/node/22420
1. If one has a small history (40 rows in the history table), _forum_user_last_visit() is faster than node_last_viewed().
2. If one has a large history (550 rows in the history table), node_last_viewed() is slightly faster than _forum_user_last_visit().
This is 550 rows per user I assume?
The difference is less than 10 ms but most of the time _forum_user_last_visit() is the fastest. If we change the current behavior to tidy up or unify the code, performance is going to degrade slightly. Unless, of course we can improve the performance of node_last_viewed().
No idea how to achieve this.

Cheers,
Gerhard
I think a possible "solution" would be to put the updating of the history table towards the end of a page request. Currently, the table is updated from node_show. If it were updated (with all necessary node views) at the end of the page request, it would probably not be faster, but the user experience would be better (or do we only start to deliver the page after _everything_ has been processed?).
So, in node_tag_new() we should only collect the nids in a static array and then later on node_tag_new(NULL, 'update') we could execute the queries (maybe bunching them together in a more effective way, we also do one query per history check in node_last_viewed(), Dries how bad is that query?). This could be invoked in node_footer().
We can easily do the updates in one query, since the history timestamps are always set to time() for all the nids that were viewed in full. This would tremendously help a printer-friendly book view, where multiple nodes are displayed in full on the same page. Other pages might not gain much from this...

Goba
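Because every viewed nid gets the same timestamp, the write can indeed be a single multi-row statement. The sketch below shows one way to do that in Python with sqlite3; mark_viewed(), make_db() and the table layout are assumptions for illustration, and INSERT OR REPLACE is SQLite syntax standing in for whatever the target database offers.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory history table (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (uid INTEGER, nid INTEGER, timestamp INTEGER, "
        "PRIMARY KEY (uid, nid))"
    )
    return conn


def mark_viewed(conn, uid, nids, now=None):
    """Record all nids as viewed at one shared timestamp, in one statement."""
    nids = sorted(set(nids))
    if not nids:
        return 0  # nothing to write, so issue no SQL at all
    now = int(time.time()) if now is None else now
    # One "(?, ?, ?)" group per nid, all sharing the same uid and timestamp.
    placeholders = ", ".join(["(?, ?, ?)"] * len(nids))
    params = [value for nid in nids for value in (uid, nid, now)]
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO history (uid, nid, timestamp) VALUES "
            + placeholders,
            params,
        )
    return len(nids)
```

For a book page showing a few dozen nodes this replaces a few dozen round trips with one. Very large batches would need chunking, since databases cap the number of bound parameters per statement.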
participants (5)
- Adrian Rossouw
- Dries Buytaert
- Gabor Hojtsy
- Gerhard Killesreiter
- neil@civicspacelabs.org