Development
May 2005
- 81 participants
- 503 discussions
Issue status update for http://drupal.org/node/22035
Project: Drupal
Version: cvs
Component: base system
Category: tasks
Priority: normal
Assigned to: Anonymous
Reported by: mikeryan
Updated by: Dries
Status: patch
Looks like in all cases, performance actually improves. I've run a
couple tests myself using the drupal.org database and measured
performance improvements up to 15%. Of course, this was done on a
machine where the MySQL database wasn't hammered on by 50+ concurrent
requests.
It would be appreciated if you could extend your patch with a couple
lines of documentation to document the behavior (so people understand
why we are doing it this way), the code and the API. Maybe add an
entry to CHANGELOG.txt? Do we need an update for updates.inc?
As for future work, I think we're better off looking at l(). As said,
one call to l() takes about 0.9 ms and we easily do 100+ such calls per
page. Caching at the level of l() is likely to buy us more but I don't
know whether it is feasible. Maybe we can think of other optimizations
too. If anything, it might be a good idea to add a TODO/NOTE to the
PHPdoc of l(). Or, if you feel like messing with l() first, by all
means! Count me in -- it's fun. :-)
Dries
Previous comments:
------------------------------------------------------------------------
May 5, 2005 - 03:10 : mikeryan
See Investigate use of conf_url_rewrite [1] for context...
The current core support for translating incoming path aliases to the
internal Drupal path (drupal_get_path_alias) and substituting aliases
when generating links to internal paths (drupal_get_normal_path) does
not scale well with many aliases. The ease with which pathauto enables
site administrators to generate large numbers of aliases exposes this
issue, but it is inherent in the core implementation, because
drupal_get_path_map reads in the entire url_alias table at bootstrap
time. I'd like to discuss ideas on how to improve the performance...
Most obviously, why not simply query the url_alias table as needed
instead of loading the whole table? In the incoming case, only a single
simple SELECT is necessary, which will always be more efficient than
reading the table. In the outgoing case, there might be some slight
performance advantage to caching the table with a small number of
aliases, but the disadvantage can become huge as the alias table grows.
One note - I've noticed that the src column in url_alias is not
indexed, I think adding an index should significantly help the
performance in the outgoing case if we were to do individual SELECTs.
Any other thoughts?
[1] http://drupal.org/node/21938
------------------------------------------------------------------------
May 5, 2005 - 04:02 : mikeryan
Attachment: http://drupal.org/files/issues/bootstrap.inc_2.patch (1.06 KB)
What the hell, I went ahead and gave it a shot... On my home system,
where I'm testing out 4.6.0, page loads have been taking several
seconds, which I attributed to the fact that it's an old computer and
I'm multi-tasking like crazy. I implemented my own suggestion, and now
pages load in about one second; it makes an incredible difference
(FWIW, my url_alias table has over 4000 rows).
Patches attached, go give it a try...
------------------------------------------------------------------------
May 5, 2005 - 04:02 : mikeryan
Attachment: http://drupal.org/files/issues/common.inc_11.patch (723 bytes)
One attachment per note, that's tedious...
------------------------------------------------------------------------
May 5, 2005 - 04:03 : mikeryan
Attachment: http://drupal.org/files/issues/path.module_0.patch (1.34 KB)
------------------------------------------------------------------------
May 5, 2005 - 04:03 : mikeryan
Attachment: http://drupal.org/files/issues/database.mysql_4.patch (553 bytes)
------------------------------------------------------------------------
May 5, 2005 - 04:04 : mikeryan
Attachment: http://drupal.org/files/issues/database.pgsql_2.patch (502 bytes)
------------------------------------------------------------------------
May 5, 2005 - 04:07 : mikeryan
Note: I just noticed that the database.*sql patches showed an extra diff
(not mine) to the location field in locales_source. I did my diffs
against a freshly-updated DRUPAL-4-6, and had made the edits against a
release from a few days ago... Those locales_source changes should
probably be removed from my patches before applying.
------------------------------------------------------------------------
May 8, 2005 - 01:13 : mikeryan
I upgraded Fenway Views [2] to Drupal 4.6.0 today, incorporating these
patches. Performance is very noticeably improved.
[2] http://fenway-views.com/
------------------------------------------------------------------------
May 8, 2005 - 15:31 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/pathalias.patch (4.98 KB)
I've merged the patches into one. Much more convenient. I also changed it
to cvs, as new features will only be added there.
Upgrade path needs to be added.
Mike, can you run some more tests? We are especially interested in
"hard" data, i.e. numbers, possibly with error bars. It would be
interesting to know how this patch affects sites that have only a few
path aliases vs those with a lot of them. Also, how pages with a lot of
node links (tracker) are affected.
It might be worthwhile to add static caching in drupal_get_normal_path
if it gets called more than once for a link on one page view.
------------------------------------------------------------------------
May 8, 2005 - 19:21 : mikeryan
Thanks for merging the patches, I wasn't aware you could patch multiple
files at once. If this is accepted, of course, adding the index on src
should be incorporated into updates.inc.
In terms of hard data, I've never profiled PHP code - what tool(s) do
you use? I couldn't find anything referenced in the PHP manual...
re: pages with lots of links - on Fenway Views the big test is the
calendar [3], and the performance improvement seems even more dramatic
here. I don't know why - the url_alias table is read once per page, so
I would expect as you do that preloading the table would tend to look
better when you have a hundred internal links on the page. Maybe we're
both underestimating exactly how fast a simple MySQL query on an
indexed key can be (I'm using MySQL 4.0.20, BTW)...
[3] http://fenway-views.com/calendar
------------------------------------------------------------------------
May 8, 2005 - 19:26 : killes(a)www.drop.org
As we speak, Mathias is doing some tests. The tool of choice is usually
apache bench. If you have a lot of url aliases, the generated array
will be huge. Part of the improvement could be due to the fact that you
need less memory now.
------------------------------------------------------------------------
May 8, 2005 - 19:28 : mikeryan
Great, thanks!
------------------------------------------------------------------------
May 8, 2005 - 19:34 : Dries
On drupal.org it takes 800 ms or more to generate a page. Of these 800
ms, only 4 ms is spent building the path alias map (incl. the SQL query
time which takes about 1 ms). That is, building the map takes 0.5% of
the total time, which is negligible. Our url_alias table has only 321
aliases, though. How many aliases do you have?
What is more, when we build drupal.org's main page, we query this map
215 times. I believe your patch would introduce 215 SQL queries ...
I'm afraid that if we'd apply your patch, we'd pay a serious
performance penalty (unless we have many more aliases).
Can you provide some numbers/measurements?
------------------------------------------------------------------------
May 8, 2005 - 19:37 : Dries
Note: my measurements did not take into account the time spent searching
the map. On the main page, we search this array 215 times.
------------------------------------------------------------------------
May 8, 2005 - 20:08 : puregin
I believe that it's felt that adding path aliases would improve the
Drupal documentation. This may well add 400+ paths to the URL aliases
table on Drupal.org
------------------------------------------------------------------------
May 8, 2005 - 20:27 : Dries
I spent some more time investigating this.
The relevant code in bootstrap.inc is this:
function drupal_get_path_alias($path) {
  if (($map = drupal_get_path_map()) && ($newpath = array_search($path, $map))) {
    return $newpath;
  }
  ...
Turns out that the time to build the map is negligible (< 5 ms, see
previous comments); however, the total time spent on 'path aliasing' is
about 70 ms per page! 50 ms of these 70 ms are spent on $map =
drupal_get_path_map(). The remaining 20 ms is spent on $newpath =
array_search($path, $map).
The first call to drupal_get_path_map() takes 3 to 5 ms, each
subsequent call takes about 0.3 ms. Searching the array costs 0.1 ms.
However, if you have to do this 130+ times, this adds up to a whopping
70 ms. Recall that we have 321 aliases in our map.
As drupal_get_path_alias() is typically called from url(), which
in turn is typically called from l() I set out to investigate the time
spent in l(). Looks like l() easily gets called 120 times/page, and
that each call to l() costs about 0.9 ms. That is about 2 or 3 times
the cost of the /average SQL query/. The total time spent in l() is
115 ms!
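For anyone wanting to reproduce this kind of per-call measurement, here is a minimal sketch of a timing harness using microtime(). It is not the code Dries used; build_sample_map() and time_alias_lookups() are hypothetical helpers, and the map is filled with made-up paths. Only the sizes (321 aliases, 130 lookups) are taken from the figures quoted above.

```php
<?php
// Hypothetical stand-in for the alias map (alias => internal path).
// The paths are made up; only the size is taken from the thread.
function build_sample_map($count) {
  $map = array();
  for ($i = 0; $i < $count; $i++) {
    $map["alias/$i"] = "node/$i";
  }
  return $map;
}

// Time $calls reverse lookups with array_search() and return the
// elapsed wall-clock time in milliseconds.
function time_alias_lookups($map, $calls) {
  $size = count($map);
  $start = microtime(TRUE);
  for ($i = 0; $i < $calls; $i++) {
    array_search('node/' . ($i % $size), $map);
  }
  return (microtime(TRUE) - $start) * 1000;
}

$map = build_sample_map(321);
$elapsed_ms = time_alias_lookups($map, 130);
printf("130 lookups over %d aliases: %.3f ms\n", count($map), $elapsed_ms);
```

Absolute numbers from a harness like this depend heavily on hardware and PHP version, so they are only useful for comparing variants on the same machine.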
------------------------------------------------------------------------
May 9, 2005 - 00:25 : adrian
This is a complete aside, and probably off-topic, but I would like to
mention that I was actually looking at rewriting url() to be able to
handle 'external urls', because I want to do aliases using the site
subdomain, like for instance:
http://category.sitename.com/article/title_goes_here -> node/23
Rewriting url to allow this would also allow us to use l() for external
links of any kind, getting rid of a bunch of inline html.
------------------------------------------------------------------------
May 9, 2005 - 00:46 : chx
adrian, I have written a patch for l() to support external links, but Ber
said that a module like weblink should handle those.
------------------------------------------------------------------------
May 9, 2005 - 01:03 : mikeryan
I've got over 4000 aliases on Fenway Views - another pathauto user
reported over 6000.
I don't have a profiling tool to give timing data, but adding a quick
counter shows that for the Fenway Views calendar [4],
drupal_get_path_alias() is called 404 times and drupal_get_normal_path
is called 227 times. And trust me, this page loads MUCH faster with my
patch than it does with the path map.
I think those SQL queries are a lot cheaper than one might expect...
[4] http://fenway-views.com/calendar
------------------------------------------------------------------------
May 9, 2005 - 06:24 : mathias
Attachment: http://drupal.org/files/issues/pathalias-with-caching.patch (5.62 KB)
Some benchmarks, with MySQL and Drupal caching disabled.
2 SETS OF TESTS
=======================================
1) 500 nodes and aliases
2) 20,000 nodes and aliases
3 TYPES OF TESTS PER SET
=======================================
1) Baseline unmodified Drupal
2) The following change inside drupal_get_path_map():
-  if (is_null($map)) {
+  if ($map === NULL) {
     $map = array();  // Make $map non-null in case no aliases are defined.
3) Pull aliases only when needed rather than loading the entire alias
table.
SET 1 - 500 Nodes and 500 Aliases
using: ab -c 10 -n 100 [homepage]
=======================================
Baseline
Time taken for tests: 13.00 seconds
Requests per second: 3.85 [#/sec] (mean)
Transfer rate: 56.35 [Kbytes/sec] received
is_null() VS NULL
Time taken for tests: 12.463 seconds
Requests per second: 4.01 [#/sec] (mean)
Transfer rate: 57.63 [Kbytes/sec] received
Per Instance Alias Lookup (32 additional queries)
Time taken for tests: 11.502 seconds
Requests per second: 4.35 [#/sec] (mean)
Transfer rate: 63.30 [Kbytes/sec] received
SET 2 - 20,000 Nodes and 20,000 Aliases
using: ab -c 5 -n 50 [homepage]
=======================================
Baseline
Time taken for tests: 204.583 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
is_null() VS NULL
Time taken for tests: 127.629 seconds
Requests per second: 0.39 [#/sec] (mean)
Transfer rate: 5.35 [Kbytes/sec] received
Per Instance Alias Lookup (32 additional queries)
Time taken for tests: 28.788 seconds
Requests per second: 1.74 [#/sec] (mean)
Transfer rate: 23.59 [Kbytes/sec] received
ANALYSIS
=======================================
Converting 'is_null()' to '=== NULL' is a no-brainer and results in a
nice performance gain. And while per-instance alias lookups also give a
huge boost for sites with thousands of aliases, their benefits are
entirely dependent on the number of system URLs and menu items visible
per request. As noted above, I tested the standard homepage, which added
32 additional queries. A request for something like the admin interface
would presumably show less benefit. I would've tested this, but I
didn't know how to invoke an authenticated page request with 'ab'. A
positive of this approach is that we would no longer be storing all the
aliases in memory.
What other concerns do we have with this last approach? Am I properly
testing the strain on the database?
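To put the benchmark figures above side by side, the following sketch computes the speedup of each variant relative to its baseline (baseline time divided by variant time). The timings are copied from the "Time taken for tests" lines; the speedup() helper and the array layout are illustrative only. Since the two sets used different ab settings (-c 10 -n 100 versus -c 5 -n 50), ratios should only be compared within a set.

```php
<?php
// "Time taken for tests" in seconds, as reported in the benchmarks above.
$results = array(
  '500 aliases' => array(
    'baseline' => 13.00, 'is_null' => 12.463, 'per_lookup' => 11.502,
  ),
  '20,000 aliases' => array(
    'baseline' => 204.583, 'is_null' => 127.629, 'per_lookup' => 28.788,
  ),
);

// Speedup factor relative to the baseline: baseline time / variant time.
function speedup($baseline, $variant) {
  return $baseline / $variant;
}

foreach ($results as $set => $t) {
  printf("%s: === NULL %.2fx, per-instance lookup %.2fx\n", $set,
    speedup($t['baseline'], $t['is_null']),
    speedup($t['baseline'], $t['per_lookup']));
}
```

With these inputs the per-instance lookup comes out at roughly 1.13x for 500 aliases and roughly 7.11x for 20,000 aliases.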
------------------------------------------------------------------------
May 9, 2005 - 07:08 : Dries
Can you try making the following changes?
1. To common.inc:
 function drupal_rebuild_path_map() {
-  drupal_get_path_map('rebuild');
+  drupal_get_path_map(TRUE);
 }
2. To bootstrap.inc:
-function drupal_get_path_map($action = '') {
+function drupal_get_path_map($rebuild = FALSE) {
   static $map = NULL;
-  if ($action == 'rebuild') {
+  if ($rebuild) {
     $map = NULL;
   }
That is, replace the string comparison with a boolean comparison. Not
sure it is going to make a significant difference but it might be
another micro-improvement.
I'll test your patch shortly.
------------------------------------------------------------------------
May 9, 2005 - 14:21 : mathias
Dries, those changes gave no real performance gain.
20,000 Nodes and 20,000 Aliases
using: ab -c 5 -n 50 [homepage]
=======================================
Baseline
Time taken for tests: 204.583 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
String VS Boolean
Time taken for tests: 204.190 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
------------------------------------------------------------------------
May 9, 2005 - 15:01 : Dries
Before this patch can be committed, we need to do more testing. I'd
like to know how this behaves on sites with few or no path aliases, and
on sites like drupal.org, with a modest number of path aliases.
The reason I'm asking is because MySQL is often the bottleneck and not
PHP/Apache. This is the case on drupal.org, for example. The proposed
patch moves some of the processing costs from PHP to MySQL. On
drupal.org, the number of SQL queries per page would double. Needless to
say, this is somewhat scary. ;)
More numbers would be appreciated.
------------------------------------------------------------------------
May 10, 2005 - 04:56 : mikeryan
I'm curious, why test with MySQL caching disabled? Since much of the
issue is the expense of making (potentially many) more queries, this
wouldn't seem to reflect the performance gain in practice.
The caching (compared to making the query each time) seems like
overkill to me - with MySQL caching enabled, I would think this
complicates the code for little (if any) performance gain. Did you do
any profiling using my original (cacheless) patch?
Thanks.
------------------------------------------------------------------------
May 10, 2005 - 07:23 : mathias
In an ideal world everyone would be running a query cache but I wanted
to see how this would hold up under the worst-case scenario.
I tested the original patch which didn't use static variables and the
number of queries doubled. For example, to load the front page, Drupal
queried the url_alias table 22 times looking for an alias for 'node'.
There's no need to send duplicate queries to the DB, since that seems
to be where most bottlenecks lie.
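The idea of using a static variable to avoid duplicate queries can be sketched roughly as follows. This is not the code from pathalias-with-caching.patch; lookup_alias_in_db() is a hypothetical stand-in for a single-row SELECT against url_alias, get_path_alias_cached() is an illustrative name, and the sample row is made up. The counter only exists to make the effect visible.

```php
<?php
// Hypothetical stand-in for one SELECT against url_alias.
// It counts its calls so the effect of the cache can be observed.
function lookup_alias_in_db($path) {
  global $query_count;
  $query_count++;
  $rows = array('node/1' => 'about');  // made-up sample data
  return isset($rows[$path]) ? $rows[$path] : FALSE;
}

// Return the alias for $path, or $path itself when no alias exists.
// Results, including misses, are remembered for the rest of the request.
function get_path_alias_cached($path) {
  static $cache = array();
  if (!array_key_exists($path, $cache)) {
    $cache[$path] = lookup_alias_in_db($path);
  }
  return $cache[$path] !== FALSE ? $cache[$path] : $path;
}

$query_count = 0;
for ($i = 0; $i < 22; $i++) {
  get_path_alias_cached('node');
}
echo $query_count, "\n";  // 1 query instead of 22
```

Caching misses as well as hits matters here: in the example from the thread, 'node' has no alias, so a cache that stored only successful lookups would still issue all 22 queries.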
------------------------------------------------------------------------
May 11, 2005 - 00:41 : mikeryan
re: worst-case scenario - I understand that.
re: # queries - I would not want to assume adding to the number of
queries is necessarily a significant performance hit (with database
caching on, of course). I understand the suspicion - I've been around
long enough to remember when you did everything you could to avoid
making one single SQL query more than you absolutely had to. But modern
databases (including MySQL) are very well optimized for this kind of
application (lots and lots of small queries). And, I have seen
real-life cases where adding caching of data that's already being
cached somewhere upstream actually degrades performance.
As long as we're examining this particular area of performance under a
microscope, let's make sure we squeeze every millisecond of savings we
can out of it - I'd just like to see the numbers for a cached
database/no Drupal caching combination for comparison.
Thanks.
------------------------------------------------------------------------
May 11, 2005 - 07:12 : mathias
Dries asked me to do some tests using the drupal.org database. Once
again it appears that plucking aliases out of the database as needed
performed better than grabbing the entire alias table at the beginning
of the request lifecycle, even if the alias table is small. Here are the
numbers.
using: ab -c 5 -n 50 [homepage]
Drupal.org Baseline
=============
Time taken for tests: 18.081 seconds
Requests per second: 2.77 [#/sec] (mean)
Transfer rate: 52.69 [Kbytes/sec] received
1 page request was 83 queries in 60ms for a cycle execution time of
255ms
Drupal.org Per Instance Query
==================
Time taken for tests: 16.536 seconds
Requests per second: 3.02 [#/sec] (mean)
Transfer rate: 56.63 [Kbytes/sec] received
1 page request was 125 queries in 65ms for a cycle execution time of
230ms
Notice that the database does work a little harder to retrieve the
queries one by one. And it also doesn't make sense to run 40 extra
queries on a site with only 50 aliases.
I think at this point we have a couple of options to explore:
1. Continue to try and squeeze out any other optimizations we can in
the original path aliasing code. Maybe look at output buffering or a
mechanism that'll allow us to start working the database resultset as
soon as data is fed into the pipe rather than waiting for the request to
finish.
2. Try to identify the point (e.g., the number of aliases) at which it
becomes more efficient to fetch aliases and change patterns on the fly.
As with most optimizations, the end result will probably be a
combination of both.
Issue status update for http://drupal.org/node/22035
Project: Drupal
Version: cvs
Component: base system
Category: tasks
Priority: normal
Assigned to: Anonymous
Reported by: mikeryan
Updated by: mathias
Status: patch
Dries asked me to do some tests using the drupal.org database. Once
again it appears that plucking aliases out of the database as needed
performed better than grabbing the entire alias table at the beginning
of the request lifecycle, even if the alias table is small. Here are the
numbers.
using: ab -c 5 -n 50 [homepage]
Drupal.org Baseline
=============
Time taken for tests: 18.081 seconds
Requests per second: 2.77 [#/sec] (mean)
Transfer rate: 52.69 [Kbytes/sec] received
1 page request was 83 queries in 60ms for a cycle execution time of
255ms
Drupal.org Per Instance Query
==================
Time taken for tests: 16.536 seconds
Requests per second: 3.02 [#/sec] (mean)
Transfer rate: 56.63 [Kbytes/sec] received
1 page request was 125 queries in 65ms for a cycle execution time of
230ms
Notice that the database does work a little harder to retrieve the
queries one by one. And it also doesn't make sense to run 40 extra
queries on a site with only 50 aliases.
I think at this point we have a couple of options to explore:
1. Continue to try and squeeze out any other optimizations we can in
the original path aliasing code. Maybe look at output buffering or a
mechanism that'll allow us to start working the database resultset as
soon as data is fed into the pipe rather than waiting for the request to
finish.
2. Try to identify the point (e.g., the number of aliases) at which it
becomes more efficient to fetch aliases and change patterns on the fly.
As with most optimizations, the end result will probably be a
combination of both.
mathias
Previous comments:
------------------------------------------------------------------------
May 4, 2005 - 19:10 : mikeryan
See Investigate use of conf_url_rewrite [1] for context...
The current core support for translating incoming path aliases to the
internal Drupal path (drupal_get_path_alias) and substituting aliases
when generating links to internal paths (drupal_get_normal_path) does
not scale well with many aliases. The ease with which pathauto enables
site administrators to generate large numbers of aliases exposes this
issue, but it is inherent in the core implementation, because
drupal_get_path_map reads in the entire url_alias table at bootstrap
time. I'd like to discuss ideas on how to improve the performance...
Most obviously, why not simply query the url_alias table as needed
instead of loading the whole table? In the incoming case, only a single
simple SELECT is necessary, which will always be more efficient than
reading the table. In the outgoing case, there might be some slight
performance advantage to caching the table with a small number of
aliases, but the disadvantage can become huge as the alias table grows.
One note - I've noticed that the src column in url_alias is not
indexed, I think adding an index should significantly help the
performance in the outgoing case if we were to do individual SELECTs.
Any other thoughts?
[1] http://drupal.org/node/21938
------------------------------------------------------------------------
May 4, 2005 - 20:02 : mikeryan
Attachment: http://drupal.org/files/issues/bootstrap.inc_2.patch (1.06 KB)
What the hell, I went ahead and gave it a shot... On my home system,
where I'm testing out 4.6.0, page loads have been taking several
seconds, which I attributed to the fact that it's an old computer and
I'm multi-tasking like crazy. I implemented my own suggestion, and now
pages load in about one second; it makes an incredible difference
(FWIW, my url_alias table has over 4000 rows).
Patches attached, go give it a try...
------------------------------------------------------------------------
May 4, 2005 - 20:02 : mikeryan
Attachment: http://drupal.org/files/issues/common.inc_11.patch (723 bytes)
One attachment per note, that's tedious...
------------------------------------------------------------------------
May 4, 2005 - 20:03 : mikeryan
Attachment: http://drupal.org/files/issues/path.module_0.patch (1.34 KB)
------------------------------------------------------------------------
May 4, 2005 - 20:03 : mikeryan
Attachment: http://drupal.org/files/issues/database.mysql_4.patch (553 bytes)
------------------------------------------------------------------------
May 4, 2005 - 20:04 : mikeryan
Attachment: http://drupal.org/files/issues/database.pgsql_2.patch (502 bytes)
------------------------------------------------------------------------
May 4, 2005 - 20:07 : mikeryan
Note: I just noticed that the database.*sql patches showed an extra diff
(not mine) to the location field in locales_source. I did my diffs
against a freshly-updated DRUPAL-4-6, and had made the edits against a
release from a few days ago... Those locales_source changes should
probably be removed from my patches before applying.
------------------------------------------------------------------------
May 7, 2005 - 17:13 : mikeryan
I upgraded Fenway Views [2] to Drupal 4.6.0 today, incorporating these
patches. Performance is very noticeably improved.
[2] http://fenway-views.com/
------------------------------------------------------------------------
May 8, 2005 - 07:31 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/pathalias.patch (4.98 KB)
I've merged the patches into one. Much more convenient. I also changed it
to cvs, as new features will only be added there.
Upgrade path needs to be added.
Mike, can you run some more tests? We are especially interested in
"hard" data, i.e. numbers, possibly with error bars. It would be
interesting to know how this patch affects sites that have only a few
path aliases vs those with a lot of them. Also, how pages with a lot of
node links (tracker) are affected.
It might be worthwhile to add static caching in drupal_get_normal_path
if it gets called more than once for a link on one page view.
------------------------------------------------------------------------
May 8, 2005 - 11:21 : mikeryan
Thanks for merging the patches, I wasn't aware you could patch multiple
files at once. If this is accepted, of course, adding the index on src
should be incorporated into updates.inc.
In terms of hard data, I've never profiled PHP code - what tool(s) do
you use? I couldn't find anything referenced in the PHP manual...
re: pages with lots of links - on Fenway Views the big test is the
calendar [3], and the performance improvement seems even more dramatic
here. I don't know why - the url_alias table is read once per page, so
I would expect as you do that preloading the table would tend to look
better when you have a hundred internal links on the page. Maybe we're
both underestimating exactly how fast a simple MySQL query on an
indexed key can be (I'm using MySQL 4.0.20, BTW)...
[3] http://fenway-views.com/calendar
------------------------------------------------------------------------
May 8, 2005 - 11:26 : killes(a)www.drop.org
As we speak, Mathias is doing some tests. The tool of choice is usually
apache bench. If you have a lot of url aliases, the generated array
will be huge. Part of the improvement could be due to the fact that you
need less memory now.
------------------------------------------------------------------------
May 8, 2005 - 11:28 : mikeryan
Great, thanks!
------------------------------------------------------------------------
May 8, 2005 - 11:34 : Dries
On drupal.org it takes 800 ms or more to generate a page. Of these 800
ms, only 4 ms is spent building the path alias map (incl. the SQL query
time which takes about 1 ms). That is, building the map takes 0.5% of
the total time, which is negligible. Our url_alias table has only 321
aliases, though. How many aliases do you have?
What is more, when we build drupal.org's main page, we query this map
215 times. I believe your patch would introduce 215 SQL queries ...
I'm afraid that if we'd apply your patch, we'd pay a serious
performance penalty (unless we have many more aliases).
Can you provide some numbers/measurements?
------------------------------------------------------------------------
May 8, 2005 - 11:37 : Dries
Note: my measurements did not take into account the time spent searching
the map. On the main page, we search this array 215 times.
------------------------------------------------------------------------
May 8, 2005 - 12:08 : puregin
I believe that it's felt that adding path aliases would improve the
Drupal documentation. This may well add 400+ paths to the URL aliases
table on Drupal.org
------------------------------------------------------------------------
May 8, 2005 - 12:27 : Dries
I spent some more time investigating this.
The relevant code in bootstrap.inc is this:
function drupal_get_path_alias($path) {
  if (($map = drupal_get_path_map()) && ($newpath = array_search($path, $map))) {
    return $newpath;
  }
  ...
Turns out that the time to build the map is negligible (< 5 ms, see
previous comments); however, the total time spent on 'path aliasing' is
about 70 ms per page! 50 ms of these 70 ms are spent on $map =
drupal_get_path_map(). The remaining 20 ms is spent on $newpath =
array_search($path, $map).
The first call to drupal_get_path_map() takes 3 to 5 ms, each
subsequent call takes about 0.3 ms. Searching the array costs 0.1 ms.
However, if you have to do this 130+ times, this adds up to a whopping
70 ms. Recall that we have 321 aliases in our map.
As drupal_get_path_alias() is typically called from url(), which
in turn is typically called from l() I set out to investigate the time
spent in l(). Looks like l() easily gets called 120 times/page, and
that each call to l() costs about 0.9 ms. That is about 2 or 3 times
the cost of the /average SQL query/. The total time spent in l() is
115 ms!
------------------------------------------------------------------------
May 8, 2005 - 16:25 : adrian
This is a complete aside, and probably off-topic, but I would like to
mention that I was actually looking at rewriting url() to be able to
handle 'external urls', because I want to do aliases using the site
subdomain, like for instance:
http://category.sitename.com/article/title_goes_here -> node/23
Rewriting url to allow this would also allow us to use l() for external
links of any kind, getting rid of a bunch of inline html.
------------------------------------------------------------------------
May 8, 2005 - 16:46 : chx
adrian, I have written a patch for l to support external links but Ber
said that a module like weblink shall handle those.
------------------------------------------------------------------------
May 8, 2005 - 17:03 : mikeryan
I've got over 4000 aliases on Fenway Views - another pathauto user
reported over 6000.
I don't have a profiling tool to give timing data, but adding a quick
counter shows that for the Fenway Views calendar [4],
drupal_get_path_alias() is called 404 times and drupal_get_normal_path
is called 227 times. And trust me, this page loads MUCH faster with my
patch than it does with the path map.
I think those SQL queries are a lot cheaper than one might expect...
[4] http://fenway-views.com/calendar
------------------------------------------------------------------------
May 8, 2005 - 22:24 : mathias
Attachment: http://drupal.org/files/issues/pathalias-with-caching.patch (5.62 KB)
Some benchmarks, with MySQL and Drupal caching disabled.
2 SETS OF TESTS
=======================================
1) 500 nodes and aliases
2) 20,000 nodes and aliases
3 TYPES OF TESTS PER SET
=======================================
1) Baseline unmodified Drupal
2) The following change inside drupal_get_path_map():
- if (is_null($map)) {
+ if ($map === NULL) {
    $map = array(); // Make $map non-null in case no aliases are defined.
3) Pull aliases only when needed rather than loading the entire alias
table.
SET 1 - 500 Nodes and 500 Aliases
using: ab -c 10 -n 100 [homepage]
=======================================
Baseline
Time taken for tests: 13.00 seconds
Requests per second: 3.85 [#/sec] (mean)
Transfer rate: 56.35 [Kbytes/sec] received
is_null() VS NULL
Time taken for tests: 12.463 seconds
Requests per second: 4.01 [#/sec] (mean)
Transfer rate: 57.63 [Kbytes/sec] received
Per Instance Alias Lookup (32 additional queries)
Time taken for tests: 11.502 seconds
Requests per second: 4.35 [#/sec] (mean)
Transfer rate: 63.30 [Kbytes/sec] received
SET 2 - 20,000 Nodes and 20,000 Aliases
using: ab -c 5 -n 50 [homepage]
=======================================
Baseline
Time taken for tests: 204.583 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
is_null() VS NULL
Time taken for tests: 127.629 seconds
Requests per second: 0.39 [#/sec] (mean)
Transfer rate: 5.35 [Kbytes/sec] received
Per Instance Alias Lookup (32 additional queries)
Time taken for tests: 28.788 seconds
Requests per second: 1.74 [#/sec] (mean)
Transfer rate: 23.59 [Kbytes/sec] received
ANALYSIS
=======================================
Converting 'is_null()' to '=== NULL' is a no-brainer and results in a
nice performance gain. And while per-instance alias lookups also give a
huge boost for sites with thousands of aliases, their benefits are
entirely dependent on the number of system URLs and menu items visible
per request. As noted above, I tested the standard homepage, which added
32 additional queries. A request for something like the admin interface
would presumably show less benefit. I would've tested this, but I
didn't know how to invoke an authenticated page request with 'ab'. A
positive of this approach is that we would no longer be storing all the
aliases in memory.
What other concerns do we have with this last approach? Am I properly
testing the strain on the database?
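To put the gains above into proportion, here is a small sketch that turns the total times reported by ab into speedup factors relative to the baseline. The numbers are copied from the two result sets above; the helper name speedup() is made up for this example.

```php
<?php
// Total times in seconds, copied from the ab runs above.
$runs = array(
  '500 aliases' => array(
    'baseline' => 13.00,
    'null check' => 12.463,
    'per-instance lookup' => 11.502,
  ),
  '20,000 aliases' => array(
    'baseline' => 204.583,
    'null check' => 127.629,
    'per-instance lookup' => 28.788,
  ),
);

// Speedup factor relative to the baseline, rounded to one decimal.
function speedup($baseline, $time) {
  return round($baseline / $time, 1);
}

foreach ($runs as $set => $times) {
  foreach ($times as $label => $seconds) {
    if ($label != 'baseline') {
      printf("%s, %s: %.1fx\n", $set, $label, speedup($times['baseline'], $seconds));
    }
  }
}
```

With 500 aliases both changes stay close to 1x, while with 20,000 aliases the null check comes out at roughly 1.6x and the per-instance lookup at roughly 7.1x.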
------------------------------------------------------------------------
May 8, 2005 - 23:08 : Dries
Can you try making the following changes?
1. To common.inc:
function drupal_rebuild_path_map() {
- drupal_get_path_map('rebuild');
+ drupal_get_path_map(TRUE);
}
2. To bootstrap.inc:
-function drupal_get_path_map($action = '') {
+function drupal_get_path_map($rebuild = FALSE) {
static $map = NULL;
- if ($action == 'rebuild') {
+ if ($rebuild) {
$map = NULL;
}
That is, replace the string comparison with a boolean comparison. Not
sure it is going to make a significant difference but it might be
another micro-improvement.
I'll test your patch shortly.
------------------------------------------------------------------------
May 9, 2005 - 06:21 : mathias
Dries. Those changes had no real performance gain.
20,000 Nodes and 20,000 Aliases
using: ab -c 5 -n 50 [homepage]
=======================================
Baseline
Time taken for tests: 204.583 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
String VS Boolean
Time taken for tests: 204.190 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
------------------------------------------------------------------------
May 9, 2005 - 07:01 : Dries
Before this patch can be committed, we need to do more testing. I'd
like to know how this behaves on sites with few or no path aliases, and
on sites like drupal.org, with a modest amount of path aliases.
The reason I'm asking is because MySQL is often the bottleneck and not
PHP/Apache. This is the case on drupal.org, for example. The proposed
patch moves some of the processing costs from PHP to MySQL. On
drupal.org, the amount of SQL queries/page would double. Needless to
say, this is somewhat scary. ;)
More numbers would be appreciated.
------------------------------------------------------------------------
May 9, 2005 - 20:56 : mikeryan
I'm curious, why test with MySQL caching disabled? Since much of the
issue is the expense of making (potentially many) more queries, this
wouldn't seem to reflect the performance gain in practice.
The caching (compared to making the query each time) seems like
overkill to me - with MySQL caching enabled, I would think this
complicates the code for little (if any) performance gain. Did you do
any profiling using my original (cacheless) patch?
Thanks.
------------------------------------------------------------------------
May 9, 2005 - 23:23 : mathias
In an ideal world everyone would be running a query cache but I wanted
to see how this would hold up under the worst-case scenario.
I tested the original patch which didn't use static variables and the
amount of queries doubled. For example, to load the front page, Drupal
queried the url_alias table 22 times looking for an alias for 'node'.
There's no need to send duplicate queries to the DB since
that's where most bottlenecks seem to lie.
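The idea of remembering answers in a static variable can be sketched roughly as follows. This is not the attached patch, just a minimal illustration: example_query_alias() is a made-up stand-in for the url_alias query, and the sample data is invented. Misses are cached too, so a path without an alias (like 'node') is only looked up once per request.

```php
<?php
// Hypothetical stand-in for the url_alias query; counts how often the "database" is hit.
$GLOBALS['alias_queries'] = 0;
function example_query_alias($path) {
  $GLOBALS['alias_queries']++;
  $table = array('node/1' => 'about');  // made-up data
  return isset($table[$path]) ? $table[$path] : FALSE;
}

// Look up one alias on demand and remember the answer (including misses)
// for the rest of the request.
function example_get_path_alias($path) {
  static $cache = array();
  if (!array_key_exists($path, $cache)) {
    $cache[$path] = example_query_alias($path);
  }
  return $cache[$path] === FALSE ? $path : $cache[$path];
}

// 22 lookups for 'node', as on the front page described above.
for ($i = 0; $i < 22; $i++) {
  example_get_path_alias('node');
}
echo $GLOBALS['alias_queries'], "\n";
```

In this sketch the 22 calls result in a single query; without the static cache each call would hit the database.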
------------------------------------------------------------------------
May 10, 2005 - 16:41 : mikeryan
re: worst-case scenario - I understand that.
re: # queries - I would not want to assume adding to the number of
queries is necessarily a significant performance hit (with database
caching on, of course). I understand the suspicion - I've been around
long enough to remember when you did everything you could to avoid
making one single SQL query more than you absolutely had to. But modern
databases (including MySQL) are very well optimized for this kind of
application (lots and lots of small queries). And, I have seen
real-life cases where adding caching of data that's already being
cached somewhere upstream actually degrades performance.
As long as we're examining this particular area of performance under a
microscope, let's make sure we squeeze every millisecond of savings we
can out of it - I'd just like to see the numbers for a cached
database/no Drupal caching combination for comparison.
Thanks.
Issue status update for http://drupal.org/node/7582
Project: Drupal
Version: cvs
Component: node system
Category: bug reports
Priority: normal
Assigned to: killes(a)www.drop.org
Reported by: killes(a)www.drop.org
Updated by: killes(a)www.drop.org
Status: patch
Attachment: http://drupal.org/files/issues/revisions_29.patch (54.83 KB)
Ok, here I am again.
What I did:
1) Ask Dries to let me have drupal.org database
2) get 400MB of SQL inserts...
3) take 23 minutes to import said data
4) Remove all image and project nodes (don't want to install their
modules), 11765 nodes left
5) back up data
6) take tests on non-cached /node page (as anonymous user).
ab results:
-c 1 -n 25:
Requests per second: 1.29 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Total:        663   775 179.7     689  1264
7) Do the same for the tracker page:
Requests per second: 0.83 [#/sec] (mean)
Total: 1182 1199 7.4 1199 1217
8) Apply my patch (rev. 28).
9) run db update and hold breath
10) update times out...
11) play back backup from 5)
12) wait
13) getting annoyed and removing cache, watchdog, and accesslog before
playing in backup.
14) wait again. Understand why Dries doesn't try this patch often.
Maybe a smaller DB would do for testing?
15) wait more. get really annoyed.
16) Set time limit to 18000 in update.php
17) try again
18) fails again before the second update is completed.
19) curse.
20) delete search stuff from db. Ooops, sooo much smaller...
21) import again, below 2 minutes...
22) rewrite to use extended insert. Found a bug.
23) still does not complete. Mysql logging to the rescue!
24) tid = 0? Not good.
25) Well, the update works fine till node 10834. 5595 nodes done, 6136
to go.
26) Writing shell based update script. Discovery: 24MB aren't enough.
Hopefully 64 are. Nope.
extended inserts for revisions are apparently not the brightest idea:
Huge memory consumption.
Hmm, no, all updates got through. Selecting the revisions to put them
into old_revisions table screwed it. Learned about CREATE TABLE
old_revisions SELECT syntax.
Yay! finally. 24 MB are just not enough. The update.php script seems to
still break.
27) Benchmarks!
/node
Requests per second: 1.54 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Total:        632   649  40.5     636   791
/tracker
Requests per second: 0.86 [#/sec] (mean)
Total: 1119 1165 65.8 1160 1461
Ok: So we get an improvement for many node_loads, but none for simple
selects from node.
More tests can be done.
28) roll new patch
Ain't Drupal fun?
killes(a)www.drop.org
Previous comments:
------------------------------------------------------------------------
May 5, 2004 - 18:25 : killes(a)www.drop.org
Currently all node revisions are stored in a serialized field in
the node table and retrieved for _each_ page view although they are rarely
needed. However, we have agreed that serializing data is bad and that
we should try to keep the memory footprint of Drupal small.
Therefore I propose to create a separate revisions table which would be
in principle identical to the node table, only that it could have
several old copies of the same node. Extra data added by other modules
could be added in a serialized field unless we find a better solution.
------------------------------------------------------------------------
May 5, 2004 - 19:06 : jhriggs
I too think the serialized approach is less than desirable, but here's
an alternative. This would likely take some considerable rework in
core and contrib, but the following is how we handle similar types of
situations in our databases at work. It is more elegant than a
separate table, and avoids the (almost exact) duplication of a table.
Instead of separate tables, keep all revisions of nodes in the node
table as follows:
* add field: active (0/1 or Y/N)
* add field: revision
* every revision of a node is stored in the node table; however, only
one revision can be active at any given time
* nid can no longer be unique -- primary/unique key becomes (nid,
active)
* any time a node is loaded, updated (without revision), etc., the
active version is used.
Thoughts?
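A rough sketch of how the scheme above could behave, using a plain PHP array in place of the node table. This only illustrates the active flag; the function names and the 'title' column are made up, and nothing here is taken from an actual patch.

```php
<?php
// Each row is one revision; 'active' marks the version that normal loads use.
$node_table = array();

// Store a new revision of a node and make it the active one.
function example_save_revision(&$table, $nid, $title) {
  $revision = 0;
  foreach ($table as $i => $row) {
    if ($row['nid'] == $nid) {
      // Next revision number, and retire whatever was active before.
      $revision = max($revision, $row['revision'] + 1);
      $table[$i]['active'] = 0;
    }
  }
  $table[] = array('nid' => $nid, 'revision' => $revision, 'active' => 1, 'title' => $title);
}

// Load the active revision of a node, or NULL if the node does not exist.
function example_load_active($table, $nid) {
  foreach ($table as $row) {
    if ($row['nid'] == $nid && $row['active']) {
      return $row;
    }
  }
  return NULL;
}

example_save_revision($node_table, 7, 'Draft');
example_save_revision($node_table, 7, 'Fixed typo');
$current = example_load_active($node_table, 7);
echo $current['revision'], ': ', $current['title'], "\n";
```

Note that in this sketch several inactive rows can share the same nid, so rows are told apart by (nid, revision) and the active flag is only used to pick the current one.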
------------------------------------------------------------------------
May 5, 2004 - 19:57 : killes(a)www.drop.org
I am not opposed to your scheme, but I want to stress the following:
* Duplicating a table's structure is not bad (IMHO) as long as the
content is different.
* having two tables will allow us to have a rather small node table.
This is (maybe) a performance gain.
------------------------------------------------------------------------
May 5, 2004 - 20:37 : jhriggs
I don't necessarily think that duplicating a table's structure is _bad_.
It just seems to be wasteful and a pain to maintain. (Every change to
the node table is made twice...easy to do, but also easy to miss
perhaps.)
As for performance, as long as nid and the active indicator are
indexed, there shouldn't be any performance loss. Also, archiving an
old version when making a new revision will be much simpler: just
change the active indicator rather than copying an entire node to
another table (and ensuring everything gets copied...again a potential
maintenance issue).
To be honest, I would just like to see the serialized data go away,
regardless of what approach is taken.
------------------------------------------------------------------------
July 30, 2004 - 21:49 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_07-30-2004.p… (10.47 KB)
I'm interested in using Drupal for a large scale wiki-type project. In
order to do this, I need revisions to be in their own table.
Attached is a patch to do just that. Most of the changes are pretty
self explanatory. Spreading out node data across two tables meant that
I had to add database functions to do locking/transactions. Without
this, race conditions in which the database becomes corrupted are
possible.
------------------------------------------------------------------------
July 30, 2004 - 21:54 : Nick Nassar
Oh yeah... The patch is a diff against Drupal CVS
------------------------------------------------------------------------
July 31, 2004 - 02:00 : Anonymous
Gerhard speaking.
Nick, thanks a lot for your nice patch! It saves me a great deal of
labour. I looked through it and immediately liked it. You not only put
the old revisions into a new table but also the current one. Do you
have an estimate how much more expensive the additional join is?
Besides a few minor coding style issues I found a major one: Just a few
hours before you uploaded your patch JonBob's node access patch hit
core. That means your patch won't apply anymore as all the queries you
change have been changed. Can I bug you to update your patch?
------------------------------------------------------------------------
July 31, 2004 - 03:11 : Anonymous
Also I think that your upgrade path loses existing revisions.
------------------------------------------------------------------------
July 31, 2004 - 04:39 : drumm
I think this is the proper way to do things. No columns are duplicated,
there is no serialized data, and only the fields that are logically
revised are stored. Nothing jumped out at me as a way to have my node
modules be able to keep a table of revisions of additional fields. I'm
guessing this could be done within the confines of _insert and _update.
Assuming the upgrade path works and modules can extend it I give it a
+1.
------------------------------------------------------------------------
July 31, 2004 - 16:40 : Nick Nassar
It figures that just as I finish a big patch, another patch comes along
and breaks it. Oh well, it should be pretty easy to fix. I'll work on
it.
Fixing the upgrade path to keep revisions should be fairly painless.
I found another issue that needs to be fixed before this patch gets
merged. The format of a node needs to be stored for each revision.
Otherwise, for modules that store a format for the nodes, such as page
and book, if you write one revision in PHP and the next in HTML, the
PHP revision will be displayed as HTML. This is part of a larger issue
of how node modules should store revisions of additional fields. I
think each module that wants to do this should create another table
with (nid, revid) as the primary key. Just as when they want to add
fields to a node they create another table with nid as the primary key.
As far as performance goes, for sites that make heavy use of revisions,
an extra join on primary keys is going to be a lot faster than grabbing
all of the revisions from the database every time. We would need to run
benchmarks to determine whether the overall difference in speed for an
average site is a gain or a loss. I'm guessing it's very minor either
way.
------------------------------------------------------------------------
August 23, 2004 - 15:55 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_08-23-2004.p… (10.92 KB)
Here's an updated patch against CVS that puts revisions in their own
table, provides an upgrade path, and fixes the format related bugs in
the last patch.
Hopefully, this can make it into CVS as soon as the freeze is over.
------------------------------------------------------------------------
August 23, 2004 - 16:10 : moshe weitzman
Interesting patch ... drumm's question is still outstanding. How do
modules store revisions of their fields? Are they expected to manage
this on their own? That's not how it works today.
As an aside, i am seeing profile_ fields in my node.revisions column.
One could argue that those need not be saved. They pertain to the node
author, not to the node itself.
------------------------------------------------------------------------
August 23, 2004 - 18:14 : Nick Nassar
Having modules be responsible for storing revisions of their own fields
is a side-effect of storing revision data in tables. There's really no
way around it. However, revisions generally don't make sense for node
types that don't have PHP/HTML content, such as polls. I think it's
going to be a pretty rare scenario for a new node type to want another
field to change per-revision, so it's a pretty good trade-off.
Storing fields that shouldn't be part of revisions, such as the
profile_ fields, is a side-effect of storing revisions as serialized
objects. Applying this patch will free up that wasted space. :)
------------------------------------------------------------------------
August 23, 2004 - 19:20 : Anonymous
There should be a hook that lets the module choose whether it supports
history. This way a module author can prevent the user from doing
something that may break his module or just cause undefined behavior.
If the module doesn't support history then don't let the user/admin
choose to add history to nodes of that type.
Craig
------------------------------------------------------------------------
August 23, 2004 - 21:23 : Nick Nassar
I agree, there should be an API change to make specifying support for
revisions easier. In the interests of keeping patches small and keeping
to one change per patch, I think the API change should be a separate
issue.
A sort of ad-hoc API to decide whether or not a module supports
revisions by default already exists. Instead of having a hook, modules
set the default value of the "Create new revision" field in the edit
form. The admin can change this option in
admin/node/configure/defaults. This patch doesn't change that.
Revisions are broken for node types that have their own database
structure, like polls, even when storing them as serialized objects.
This patch doesn't change that, either.
------------------------------------------------------------------------
October 26, 2004 - 04:35 : moshe weitzman
I'm guessing that someone is going to have to demonstrate that this
patch performs as well as current drupal before it gets committed. i
think this patch is a few benchmarks from being committed.
------------------------------------------------------------------------
October 27, 2004 - 03:04 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.p… (11 KB)
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible effect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
November 15, 2004 - 07:05 : elias1884
please rethink the revision system default workflow as well. don't
look at the revision system as an isolated system but as a part of the
whole workflow system!
if you combine revisions with the moderation queue the most logical
default workflow would be like that:
auth user creates node (revision #0)
admin approves the node (status = 1, moderation = 0)
=> node publicly available
auth user finds typo and changes node (revision #1, status = 0,
moderation = 1)
-------
what happens at that point at the moment is that the node is not
accessible anymore at all until the new revision is approved by admin.
of course the new revision should not go online until reviewed and
approved, this is absolutely correct, but there is no reason to
take the old revision offline, since it was already approved and should
therefore stay online until the new revision is approved. it is not
practical if a node disappears only because the author corrected a
typo.
-------
admin approves the node (status = 1, moderation = 0)
even though I first thought a plain boolean active field would not be
capable of providing that functionality, I finally came to the
conclusion that it can. The only thing to do is to not set that bit
when a new revision is created, but when it is approved (in case
moderation is activated under default workflow). Every revision should
have its own moderation, status and active field and on approval they
are set like this (status=1, moderation=0, active=Y).
When you wanna roll back to an old revision, you can choose between all
revisions that already have the moderation bit set back to 0 again and
the published bit set to 1. There should be an extra permission for rollback!
another concern that I have about the default workflow is that users
can't see the content they have just created when moderation is
enabled. Even though there is a big fat "submission accepted" presented
after submissions, inexperienced users tend to question the information
those stupid tincans give them if they can't find their content
afterwards. Many users are really lazy bastards and they don't even
read the status messages. The best feedback about whether a story was
submitted successfully or not of course is if the user can find the story
somewhere on the site, maybe with a status message on top of it
mentioning that the content is currently not publicly available since
it has not been approved yet. there should be a my content section
under my account, like somebody is trying to do with the workspace
module I guess.
so my suggestion is to make (status=0, moderation=1) still available
for the creator under a my content section somewhere!
------------------------------------------------------------------------
November 24, 2004 - 06:21 : Nick Nassar
I agree. The current workflow for moderation queues and revisions needs
to change, but this patch isn't the place for it. The patch is already
too big, and it only does the backend stuff.
Instead of adding more to this patch and making it take even longer to
get into core, would you mind creating a new issue for your UI
suggestions, so that those changes can be added as a separate patch?
Thanks,
Nick
------------------------------------------------------------------------
December 11, 2004 - 14:26 : Dries
This patch is _much_ needed so I'd love to see someone revive it. In
order for this patch to be accepted, the following needs to be done:
Update this patch to CVS HEAD.
Rename revid to vid.
Rename node_rev to node_revisions.
Rename node_rev.changed to node_revisions.timestamp.
Rename $rnode to $revision.
Fix the coding style to match Drupal's: proper spacing, single quotes
where possible, proper variable names.
Benchmark this patch with a large database with enough revisions. I'd
be happy to benchmark this on my local copy of the drupal.org database.
The book.log field should probably move to the node_revisions table.
This can be done in a separate patch.
Investigate whether transactions are well-supported.
------------------------------------------------------------------------
December 13, 2004 - 02:25 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004-r… (11.02 KB)
I've worked a bit on the patch (coding style issues as mentioned by
Dries). One thing I noticed is that the patch uses REPLACE. IIRC this
needs to be changed to "UPDATE, if fail INSERT" for pgsql
compatibility.
Nick, are you still interested in working on that patch? I'd like to
know how it works on your site and work on getting it into core.
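For reference, the "UPDATE, if fail INSERT" pattern mentioned above could look roughly like this. It is a minimal sketch, not code from the patch: it uses PDO with an in-memory SQLite database as a stand-in (assuming the pdo_sqlite extension is available), and the table, columns and function name are made up.

```php
<?php
// In-memory SQLite stands in for the real database; table and column names are made up.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE example_revisions (nid INTEGER, vid INTEGER, title TEXT, PRIMARY KEY (nid, vid))');

// Try UPDATE first; if no row was touched, INSERT instead.
function example_upsert($db, $nid, $vid, $title) {
  $update = $db->prepare('UPDATE example_revisions SET title = ? WHERE nid = ? AND vid = ?');
  $update->execute(array($title, $nid, $vid));
  if ($update->rowCount() == 0) {
    $insert = $db->prepare('INSERT INTO example_revisions (nid, vid, title) VALUES (?, ?, ?)');
    $insert->execute(array($nid, $vid, $title));
  }
}

example_upsert($db, 1, 1, 'first draft');   // no row yet: falls through to INSERT
example_upsert($db, 1, 1, 'second draft');  // row exists: UPDATE is enough
```

Unlike REPLACE, the two statements are not atomic on their own, so concurrent writers would need a transaction or lock around them. Also, on MySQL the affected-row count can be 0 when the new values equal the old ones, so a real implementation would have to account for that.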
------------------------------------------------------------------------
December 13, 2004 - 14:33 : Dries
Gerhard: your patch does not apply.
------------------------------------------------------------------------
December 13, 2004 - 19:10 : killes(a)www.drop.org
Yes, I know, that was the same version as I mailed to you earlier.
------------------------------------------------------------------------
December 13, 2004 - 23:02 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions.patch (52.96 KB)
Ok, updated the patch to cvs.
------------------------------------------------------------------------
December 14, 2004 - 10:58 : Dries
Some more comments:
db_begin_transaction() and db_end_transaction() do not belong in
database.inc, but in database.mysql.inc and database.pgsql.inc
respectively.
The node module calls node_revisionsision_list() which is not defined.
(Fixed that on my local copy.)
Do db_begin_transaction() and db_end_transaction() deprecate Jeremy's
table locking patch?
The upgrade path assigns the wrong user ID to each revision.
The upgrade path assigns the wrong date to each revision (that or a
node's revision page shows the wrong usernames/dates).
The coding style needs a bit of work, but we can worry about that
later.
------------------------------------------------------------------------
December 14, 2004 - 19:34 : Nick Nassar
If you need any help getting those things fixed, just let me know.
------------------------------------------------------------------------
December 14, 2004 - 19:50 : Nick Nassar
How this relates to Jeremy's node locking patch:
There was lots of discussion, and node locking was decided against
because from an end user point of view you never want a node to be
locked. He's now advocating for a much simpler patch that warns users
if their changes will overwrite someone else's. That patch still has a
race condition, which might be fixed using db_begin_transaction().
http://drupal.org/node/6025
------------------------------------------------------------------------
December 15, 2004 - 00:26 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_0.patch (55.96 KB)
Here is an updated patch that tries to address Dries' concerns.
------------------------------------------------------------------------
December 15, 2004 - 10:32 : Dries
Attachment: http://drupal.org/files/issues/revisions-bug.png (76.06 KB)
It didn't fix the aforementioned bugs. See attached screenshot.
------------------------------------------------------------------------
January 6, 2005 - 22:15 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_1.patch (51.77 KB)
Ok, here is a new version. Dries and myself worked hard at it, so please
have a look.
what is still missing
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
------------------------------------------------------------------------
January 19, 2005 - 23:43 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_2.patch (49.49 KB)
Here is an updated patch. We discussed keeping the current title in the node
table and also in the revisions table. This is content duplication but
will save many joins as many queries only need the title of a node.
Discussion is welcome.
------------------------------------------------------------------------
January 20, 2005 - 01:33 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_3.patch (29.93 KB)
I've implemented the aforementioned solution. This makes the patch much
smaller. The patch now also removes taxonomy_node_has_term() which
wasn't used anywhere. I'd really appreciate it if some people could test
drive the patch. It will be another huge improvement for 4.6.
------------------------------------------------------------------------
January 20, 2005 - 02:05 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_4.patch (30 KB)
Another revision. Steven didn't like my literal $node->vid in queries.
------------------------------------------------------------------------
January 20, 2005 - 03:10 : killes(a)www.drop.org
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
These issues are still open, btw. Especially the first one needs to be
tackled.
------------------------------------------------------------------------
January 25, 2005 - 22:11 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_5.patch (51.13 KB)
Here is a patch that has the database tables updated for forum, book,
and page module.
------------------------------------------------------------------------
January 30, 2005 - 00:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_6.patch (49.18 KB)
Yet another update to keep it working with head. The patch now also
removes the table definitions for the page table.
------------------------------------------------------------------------
January 30, 2005 - 00:57 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_7.patch (55.69 KB)
Sorry, that was the old version, this is the right one.
------------------------------------------------------------------------
January 31, 2005 - 21:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_8.patch (55.71 KB)
Updated once more.
------------------------------------------------------------------------
January 31, 2005 - 22:52 : Dries
Anyone to help review/test this?
------------------------------------------------------------------------
January 31, 2005 - 23:22 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_9.patch (49.29 KB)
Updated again, the update functions occurred twice. Thanks Bart.
------------------------------------------------------------------------
February 2, 2005 - 02:27 : killes(a)www.drop.org
Don't know if the db I am using is corrupted or what. I still have
some difficulties.
The latest patch is attached.
------------------------------------------------------------------------
February 2, 2005 - 02:27 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_10.patch (49.67 KB)
I am probably slowly going mad ...
------------------------------------------------------------------------
February 2, 2005 - 03:54 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_11.patch (48.95 KB)
The update issue still needs investigating. This patch is updated for
cvs.
------------------------------------------------------------------------
February 2, 2005 - 22:20 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_12.patch (49.83 KB)
Ok, here is a new version. I've solved my troubles with book.module.
There are still some issues with forum module. Possibly due to
inconsistent database.
------------------------------------------------------------------------
February 2, 2005 - 23:31 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_13.patch (49.83 KB)
Turns out the drupal.org database had indeed some quirks. Please run
this query in your oldest db and tell me the result:
select nid,type from node where type like '%/%';
If you get a non-zero result we might need to add another security
update.
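For anyone who wants to see what the check looks for before running it
on a real site, here is a minimal sketch against a throwaway SQLite
database. The table layout and the sample rows are made up for
illustration; only the SELECT itself comes from the comment above.

```python
import sqlite3

# Hypothetical stand-in for an old Drupal database: only the two columns
# that the query touches are modelled here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO node (nid, type) VALUES (?, ?)",
    [(1, "story"), (2, "book"), (3, "project/issue")],  # made-up sample rows
)

# Same query as in the comment: node types that contain a slash.
suspect = conn.execute(
    "SELECT nid, type FROM node WHERE type LIKE '%/%'"
).fetchall()

print(suspect)
```

An empty list corresponds to the "0 results" answers reported further
down in the thread.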
The patch could use still more testing, though.
------------------------------------------------------------------------
February 3, 2005 - 03:16 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_14.patch (49 KB)
Ok, we are getting somewhere. At a first glance the update is working.
There is a problem remaining: the revisions tab will be shown whether
the node has revisions or not. Not sure we can/need to fix this.
People with a drupal.org account can log in at
http://killes.drupaldevs.org/revision/ and poke around. Your
permissions will be the same as on drupal.org. Feel free to break
everything but don't forget to file complaints here. (Note: this is
only a pruned version of the drupal.org database with all project nodes
and nodes with nids > 7000 dropped).
------------------------------------------------------------------------
February 3, 2005 - 06:19 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_15.patch (52.39 KB)
There was some error in node_save and also the patches to the
database.inc files got lost...
------------------------------------------------------------------------
February 3, 2005 - 09:07 : robertDouglass
Submitting book pages doesn't work on your test site. It puts the entire
content of the preview inside the body textarea. I wrote a sentence in
the body and the log, and pressing preview put several lines of HTML
containing both sentences in the body textarea on the preview page,
plus the book page wouldn't submit.
-R
------------------------------------------------------------------------
February 3, 2005 - 09:50 : Junyor
0 results here. I started using Drupal with version 4.4, though.
------------------------------------------------------------------------
February 4, 2005 - 01:56 : killes(a)www.drop.org
@Junyor: Thanks, that's a good sign. Maybe somebody else has an older db
to try.
@robertDouglass: The first effect you describe is due to drupaldevs
running on PHP 5. I am unsure why the second thing does not work. In
node_save() the node object has a nid although there is none in the
form. Very strange.
I've enabled display of db queries on the testsite.
------------------------------------------------------------------------
February 4, 2005 - 21:17 : dmjossel
No results here on the query:
select nid,type from node where type like '%/%';
On a database that was put in place prior to Drupal 4 and is now
running on 4.5.2.
------------------------------------------------------------------------
February 4, 2005 - 22:44 : killes(a)www.drop.org
@dmjossel: thanks.
@all. The strange problem I reported was apparently php 5 related.
After applying Steven's php 5 patch it went away. One error is
remaining: If I create a new forum topic it will be shown as part of
the book on preview. Hmm, that was due to a db that got corrupted
during testing so that is fixed too.
Please poke around at the test site and look if you find more errors.
------------------------------------------------------------------------
February 5, 2005 - 09:16 : Steven
By the way, I just remembered that Drupal.org has some blogs lingering
on in the database even though blog.module is not enabled. Perhaps this
is causing troubles?
------------------------------------------------------------------------
February 5, 2005 - 13:22 : Anonymous
I can't see why it would. Drupal.org will need extra updates for images
and project nodes because those have their own tables. GK.
------------------------------------------------------------------------
February 6, 2005 - 14:49 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_16.patch (52.49 KB)
Updated to apply to cvs again.
------------------------------------------------------------------------
February 22, 2005 - 22:15 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_17.patch (49.64 KB)
Updated again.
All we need is a patch to upload module and an upgrade path for it.
------------------------------------------------------------------------
March 4, 2005 - 06:22 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_18.patch (52.31 KB)
Updated once more. Moved log field from book to node_revisions table as
discussed in Antwerp. upload module still missing.
We need to decide under which circumstances the log field should be
displayed. Should that be added to the workflow? Should it depend on
the revisions setting?
------------------------------------------------------------------------
March 5, 2005 - 21:27 : Anonymous
Attachment: http://drupal.org/files/issues/revisions_20.patch (75.52 KB)
Ok, here it is: Yet another revision of this grrrrreat patch.
Changes from previous versions:
- supports versioning for uploaded files. A problem is that if you
delete a file, it will be gone for all revisions.
- the log field is now in the node_revisions table, but each module has
to decide whether to show it or not.
I've implemented it for the page and the book type nodes. Also, the
field can be edited when adding non-book nodes to the book. The log is
displayed on the revisions page and if a node is moderated.
- the revisions are moved to an old_revisions table to a) get the node
table smaller and b) still leave them available for contrib modules
that want to retrieve old version data.
The patch has been applied to killes.drupaldevs.org/revision where it
can be tested by anybody (especially people who have "site admin"
rights on drupal.org)
The database is from drupal.org and you should be able to log in with
your pass or simply mail yourself a new one.
Gerhard
------------------------------------------------------------------------
March 5, 2005 - 21:51 : Anonymous
Attachment: http://drupal.org/files/issues/revisions_21.patch (59.42 KB)
BTW, I marked this a bug because atm the revisions field can grow quite
big. Neil has reported problems from some users who were not able to
load some nodes due to too many large revisions.
Also, some unrelated stuff crept into the patch. New version attached.
------------------------------------------------------------------------
March 8, 2005 - 07:56 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_22.patch (60.29 KB)
Ok, I think I got it.
Changes to last version:
- uploads are now properly versioned.
Missing are still pgsql checks and updates.
------------------------------------------------------------------------
March 10, 2005 - 18:58 : Anonymous
Was able to get http://drupal.org/files/issues/revisions_21.patch to
work with drupal-cvs.tar.gz (10 March 2005) by:
- includes/database.mysql.inc: Commenting out duplicates for functions
function db_begin_transaction and function db_commit_transaction
- modules: node.module: Removing "'title' => $node->title," from
$node_table_values variable declaration and removing "'title' =>
"'%s'"," from "$node_table_types" variable declaration.
Happy to submit a patch if requested. I'll watch this thread.
------------------------------------------------------------------------
March 12, 2005 - 01:59 : killes(a)www.drop.org
The duplicate function has been removed in rev 22 of this patch.
Why do you think the changes in node_save are needed? Titles are saved
in both tables for performance reasons.
------------------------------------------------------------------------
March 13, 2005 - 18:12 : jlerner
Hi - I posted comment #62. The changes to node_save appear to be needed
because recent patches (both 21 and 22) remove the field 'title' from
table 'node'. So without the changes to node_save, node.module is
broken and generates errors.
Joshua
------------------------------------------------------------------------
March 13, 2005 - 18:29 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_23.patch (61.17 KB)
Thanks, Joshua, for catching this. node:title is there to stay.
------------------------------------------------------------------------
April 13, 2005 - 18:29 : moshe weitzman
since HEAD is open again, perhaps it is a good time to revisit this
patch.
once this is committed, let's address - http://drupal.org/node/11071
"node_validate does not respect group editing"
------------------------------------------------------------------------
April 18, 2005 - 17:43 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_24.patch (60.39 KB)
Updated.
------------------------------------------------------------------------
April 18, 2005 - 18:16 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_24_0.patch (60.39 KB)
Updated.
------------------------------------------------------------------------
April 19, 2005 - 07:19 : Dries
I'll commit this patch later this week! If you haven't checked this
patch already, I urge you to test/check it out because it will have
significant impact on existing code and modules!
------------------------------------------------------------------------
April 19, 2005 - 07:21 : Dries
Also, what do people think about the n.title being duplicated?
------------------------------------------------------------------------
April 19, 2005 - 07:26 : chx
I won't lose any sleep because of duplicated titles...
------------------------------------------------------------------------
April 19, 2005 - 20:35 : killes(a)www.drop.org
Let me explain why I have chosen to duplicate the title (and also the
uid): If you look at all the queries in Drupal, you will find that most
of them only need the title and the uid of a node. That is, by
duplicating it, we save expensive joins on the node_revisions table.
Due to this fact, this patch is actually a performance improvement.
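A rough sketch of the idea, using SQLite so it runs anywhere. The
column lists are an assumption made for illustration (the real tables
in the patch have more fields); the point is only that a listing can
be served from the node table alone, while a full load joins on the
current vid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
  nid   INTEGER PRIMARY KEY,
  vid   INTEGER,   -- current revision of this node
  uid   INTEGER,   -- duplicated from the current revision
  title TEXT       -- duplicated from the current revision
);
CREATE TABLE node_revisions (
  vid   INTEGER PRIMARY KEY,
  nid   INTEGER,
  uid   INTEGER,
  title TEXT,
  body  TEXT
);
""")
conn.execute("INSERT INTO node VALUES (1, 11, 5, 'Second draft')")
conn.executemany(
    "INSERT INTO node_revisions VALUES (?, ?, ?, ?, ?)",
    [(10, 1, 5, "First draft", "old body"),
     (11, 1, 5, "Second draft", "new body")],
)

# Listing query: title and uid come straight from node, no join needed.
listing = conn.execute(
    "SELECT nid, title, uid FROM node ORDER BY nid"
).fetchall()

# Full load of one node: this is where the join on the current vid happens.
full = conn.execute(
    "SELECT n.nid, r.title, r.body FROM node n "
    "INNER JOIN node_revisions r ON r.vid = n.vid WHERE n.nid = ?",
    (1,),
).fetchone()
```

The price is that title and uid have to be written to both tables on
every save.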
A note about updating contrib module:
Strictly speaking they wouldn't need to be updated. They only need to
if their authors decide that their info should be available for
revisioning. The upgrade path for forum.module in my update.inc patch
(+ the forum patch)
should show you what needs to be done.
I will write a note for the update page once the patch hits core.
------------------------------------------------------------------------
April 24, 2005 - 23:21 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_25.patch (60.38 KB)
Updated to cvs.
Dries: Based on some remarks in #drupal this is the last update I am
going to do. Apply it or won't fix it.
------------------------------------------------------------------------
April 30, 2005 - 05:42 : Jeremy(a)kerneltrap.org
Attachment: http://drupal.org/files/issues/revisions_25.patch.patch (528 bytes)
That's a big patch. I've only started looking through it. I noticed
one little typo, affecting updates. A patch to your last patch is
attached.
I'm running with the revision patch on my dev server now happily. I
like the concept.
What happens if you click 'stop' on your browser in the middle of a
MySQL "transaction"? I assume that kills the connection to MySQL, and
the lock is freed? But this then leaves changes only partially
applied?
What exactly does locking/unlocking the tables buy us in MySQL? I
don't see anywhere that we detect if an apply fails part way through,
and thus roll back...? What am I missing?
------------------------------------------------------------------------
April 30, 2005 - 09:11 : Dries
Jeremy: many of us are worried about the performance ramifications this
patch introduces. Early experiments showed a small performance
improvement (while a performance regression might be expected). More
performance reports from large sites like kerneltrap.org will certainly
help this patch. Would you mind doing a quick performance comparison and
reporting back with some numbers? Thanks.
------------------------------------------------------------------------
April 30, 2005 - 14:38 : Jeremy(a)kerneltrap.org
Dries: I'm not running HEAD on kerneltrap, so this really isn't a
possibility. Furthermore, until I understand why we're locking tables,
I don't like it. The idea of revisions in their own tables is great.
The idea of locking tables to get there (without any obvious benefit)
really worries me.
------------------------------------------------------------------------
April 30, 2005 - 16:16 : killes(a)www.drop.org
@Jeremy: Thanks for looking at the patch! Also for catching the typo. :)
Did you try to upgrade your database? If yes, how did it go? One of
Dries' concerns is the complexity of the upgrade. How many nodes and
revisions did the db have?
About database locking: This part of the patch was created by Nick and
I simply continued to use it.
Maybe the code should rather be:
  if (db_begin_transaction(array('{node}', '{node_revisions}',
      '{watchdog}', '{sessions}', '{files}'))) {
    db_query($node_query, $node_table_values);
    db_query($revisions_query, $revisions_table_values);
    db_commit_transaction();
    ...
  }
The idea is probably to avoid two updates at the same time. I don't
think the locking helps if you abort the script at an inconvenient
time. Rollbacks aren't implemented in all mysql versions.
We could omit the db locking if deemed inappropriate. Maybe Nick can
explain his ideas behind this.
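To illustrate what Jeremy is asking about, here is a sketch of the
behaviour a real transaction would give: both writes succeed, or
neither is kept. Note that this is not what the patch does (the patch
locks tables); it uses SQLite's transactions purely as an example, and
the table layout and the save_node() helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, title TEXT)")
conn.execute("CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, title TEXT)")
conn.commit()


def save_node(conn, nid, vid, title):
    """Write the node row and its revision row as one unit."""
    # Used as a context manager, the connection commits on success and
    # rolls back if either statement raises.
    with conn:
        conn.execute(
            "INSERT INTO node (nid, vid, title) VALUES (?, ?, ?)",
            (nid, vid, title),
        )
        conn.execute(
            "INSERT INTO node_revisions (vid, nid, title) VALUES (?, ?, ?)",
            (vid, nid, title),
        )


save_node(conn, 1, 10, "Hello")

# The second save reuses vid 10, so the revision insert fails on the
# primary key and the node insert made just before it is rolled back.
try:
    save_node(conn, 2, 10, "Clash")
    failed = False
except sqlite3.IntegrityError:
    failed = True

node_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

Table locks alone give the mutual exclusion but not the rollback,
which is the gap raised above.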
@Dries: I wonder who the "many of us" are. They certainly haven't
spoken to me. Moshe had some reservations about the upgrade path and
project module, but the time that project module abused revisions to
store issue updates was long ago and his reservations were resolved.
Nobody else (besides you of course and now Jeremy) has voiced
reservations in a way that was audible to me.
If you grep through the patch you will notice that there are only four
queries which have a join on the node_revisions table. Two of them are
in node_load and in the other cases the join replaced a join on the
node table. The two queries in node_load are the only ones that have
both a join on the node and the revisions query. Thus, loading of
individual nodes might become somewhat slower. All other queries will
be faster since the node table is now much smaller. Also, node loading
does not have to be slower, it depends on your node table. If you had
a lot of revisions and thus a large table, then the new scheme will
make your queries actually faster since we do not load the revisions
on each and every node load anymore. If you didn't have many revisions
your node_load might be somewhat slower.
WRT the update script, Karoly pointed out that we could use multiple
insert queries instead of one query per revision. This would probably
make the update somewhat faster. I am willing to work on this iff you
declare that you will commit the patch afterwards. I'd need to know if
this will work on pgsql and on all supported mysql versions before I
work on it.
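A sketch of what batching the inserts could look like, again on
SQLite. The helper names, the three-column table and the batch size
are assumptions made for illustration; whether the multi-row VALUES
syntax is accepted by pgsql and by every supported mysql version is
exactly the open question above and is not answered here.

```python
import sqlite3


def chunked(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_revisions_batched(conn, rows, batch_size=100):
    """Insert rows with one multi-row INSERT per batch; return the statement count."""
    statements = 0
    for batch in chunked(rows, batch_size):
        placeholders = ", ".join(["(?, ?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(
            "INSERT INTO node_revisions (vid, nid, title) VALUES " + placeholders,
            params,
        )
        statements += 1
    conn.commit()
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, title TEXT)")

# 250 made-up revisions spread over 50 nodes.
rows = [(vid, 1 + vid % 50, "revision %d" % vid) for vid in range(1, 251)]
statements = insert_revisions_batched(conn, rows)
total = conn.execute("SELECT COUNT(*) FROM node_revisions").fetchone()[0]
```

Here 250 revisions go in with three statements instead of 250.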
About locking: Database locking is dog slow, at least on mysql. I was
using locks in an earlier version of the upgrade script but had to
remove it for (serious!) performance reasons.
------------------------------------------------------------------------
May 9, 2005 - 17:07 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_26.patch (46.45 KB)
Ok, another update, cause I need it myself.
I've left out the transaction stuff for now. It is in principle
unrelated to this patch and should be discussed elsewhere.
This also makes the patch smaller and easier to review (hint, hint).
------------------------------------------------------------------------
May 9, 2005 - 22:32 : killes(a)www.drop.org
The patch contained the update functions twice.
------------------------------------------------------------------------
May 9, 2005 - 22:32 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_27.patch (39.05 KB)
The patch contained the update functions twice.
------------------------------------------------------------------------
May 9, 2005 - 23:23 : Dries
Got one error during the upgrade path:
ALTER TABLE {book} ADD PRIMARY KEY vid (vid)
FAILED
------------------------------------------------------------------------
May 9, 2005 - 23:26 : killes(a)www.drop.org
This happened to me as well when I tested this patch. The reason is
that for some reason the vid is not unique. Most likely there are some
entries with vid = 0 in there. Can you check which node types those
have? It always was an error in the test database. See:
http://drupal.org/node/7582#comment-20678
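A sketch of the kind of check that would show why the primary key
cannot be added: look for vid values that occur more than once, then
see which node types the vid = 0 rows belong to. The table layouts and
rows are made up; run the two SELECTs against the real book and node
tables to get actual numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT)")
conn.execute("CREATE TABLE book (nid INTEGER, vid INTEGER, parent INTEGER)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(1, "forum"), (2, "forum"), (3, "book"), (4, "book")],  # made-up rows
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 0, 0), (3, 12, 1), (4, 13, 1)],  # two rows share vid 0
)

# vid values that appear more than once would violate a primary key on vid.
duplicates = conn.execute(
    "SELECT vid, COUNT(*) FROM book GROUP BY vid HAVING COUNT(*) > 1 ORDER BY vid"
).fetchall()

# Which node types do the rows with vid = 0 belong to?
types_with_zero_vid = conn.execute(
    "SELECT n.type, COUNT(*) FROM book b "
    "INNER JOIN node n ON n.nid = b.nid "
    "WHERE b.vid = 0 GROUP BY n.type ORDER BY n.type"
).fetchall()
```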
------------------------------------------------------------------------
May 9, 2005 - 23:27 : Dries
Actually, I got 2850 errors during the upgrade.
Some of these:
sprintf() [function.sprintf]: Too few arguments in
drupal-cvs/includes/database.inc on line 154.
Some of these:
Query was empty query: in drupal-cvs/includes/database.mysql.inc on
line 66.
And this:
Unknown table 'n' in field list query: SELECT n.nid, n.vid FROM node
INNER JOIN files f ON n.nid = f.nid in
drupal-cvs/includes/database.mysql.inc on line 66.
:-)
------------------------------------------------------------------------
May 9, 2005 - 23:29 : Dries
Or this:
user error: Unknown column 'log' in 'field list'
query: SELECT parent, weight, log FROM book WHERE nid = 1 in
drupal-cvs/includes/database.mysql.inc on line 66
------------------------------------------------------------------------
May 9, 2005 - 23:52 : Dries
The time required to generate my main page went from 902 ms (before
upgrade) to 2139 ms (after upgrade).
The time required to generate a forum listing (?q=forum/x) went from
1872 ms (before upgrade) to 2874 ms (after upgrade).
Maybe this is because my database is not consistent as the result of
the upgrade errors (yet I don't see any errors on the pages I
benchmarked).
------------------------------------------------------------------------
May 10, 2005 - 02:24 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_28.patch (53.47 KB)
Ok, let me get to this from the bottom to the top:
- my test runs indicated a different development wrt timing. If I had
gotten your results, I would have stopped working on this long ago. So your
results are wrong for some reason.
- user error: Unknown column 'log' in 'field list'
Wasn't my day, the book patch got lost. It is contained now. First -R
the old patch, then apply this one.
- Unknown table 'n' in field list query:
Walkah found this, but I forgot to fix it. Fixed now.
- I've no idea where the other queries come from. I am hoping that
either your test db is borken or they are follow ups from the other
ones.
If you let me have your test db, I'll try some debugging.
Thanks for wasting your time, too.
------------------------------------------------------------------------
May 10, 2005 - 07:07 : Dries
I double-checked and the numbers don't seem to lie. I'll test some more
after work on another machine to make sure it is not platform-specific.
Issue status update for http://drupal.org/node/22551
Project: Drupal
Version: cvs
Component: node system
Category: bug reports
Priority: critical
Assigned to: Anonymous
Reported by: jjeff
Updated by: jjeff
Status: patch
Attachment: http://drupal.org/files/issues/node_no-options.patch (1009 bytes)
When there are no node options (such as 'published', 'in queue', etc)
set for a node-type, the variable_get() in line 1327 does not return an
array. So all of the in_array() functions that follow generate errors.
Here is a patch that substitutes an empty array for $node_options if it
isn't an array. This fixes the problem.
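The shape of the fix, transcribed to Python for illustration only (the
attached patch is PHP and is not reproduced here). The settings store,
the variable name and the default list are hypothetical stand-ins.

```python
def variable_get(store, name, default):
    """Hypothetical stand-in for a settings lookup with a default."""
    return store.get(name, default)


def node_options(store, node_type):
    """Return the option list for a node type, never a non-list value."""
    options = variable_get(store, "node_options_" + node_type, ["status", "promote"])
    # The guard described above: anything that is not a list becomes an
    # empty list, so later membership tests cannot fail.
    if not isinstance(options, list):
        options = []
    return options


# A node type saved with no options ends up with a non-list value
# (0 is used here as an assumed example of such a value).
store = {"node_options_page": 0}

page_options = node_options(store, "page")
is_published = "status" in page_options   # safe: tests against []
story_options = node_options(store, "story")  # unset, falls back to the default
```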
-Jeff
jjeff
The 4.6 changelog states the following:
- usability:
...
* refactored the statistics pages.
with not much detail apart from it being rewritten.
When I access the log pages for 4.6, there are lots of missing
functions, most notably:
1. There is no way to separate external from internal referrers, which
makes this a much less useful referrer log than before.
2. Top pages cannot be sorted by date
3. Recent hits cannot be sorted by number of hits
To me, this is reduced functionality in a newer release, and does not
rest well with many, just like the "Remember Me" checkbox that was
removed in 4.5.
Was there a discussion on this on this list that I missed?
Issue status update for http://drupal.org/node/22035
Project: Drupal
Version: cvs
Component: base system
Category: tasks
Priority: normal
Assigned to: Anonymous
Reported by: mikeryan
Updated by: mikeryan
Status: patch
re: worst-case scenario - I understand that.
re: # queries - I would not want to assume adding to the number of
queries is necessarily a significant performance hit (with database
caching on, of course). I understand the suspicion - I've been around
long enough to remember when you did everything you could to avoid
making one single SQL query more than you absolutely had to. But modern
databases (including MySQL) are very well optimized for this kind of
application (lots and lots of small queries). And, I have seen
real-life cases where adding caching of data that's already being
cached somewhere upstream actually degrades performance.
As long as we're examining this particular area of performance under a
microscope, let's make sure we squeeze every millisecond of savings we
can out of it - I'd just like to see the numbers for a cached
database/no Drupal caching combination for comparison.
Thanks.
mikeryan
Previous comments:
------------------------------------------------------------------------
May 4, 2005 - 21:10 : mikeryan
See Investigate use of conf_url_rewrite [1] for context...
The current core support for translating incoming path aliases to the
internal Drupal path (drupal_get_normal_path) and substituting aliases
when generating links to internal paths (drupal_get_path_alias) does
not scale well with many aliases. The ease with which pathauto enables
site administrators to generate large numbers of aliases exposes this
issue, but it is inherent in the core implementation, because
drupal_get_path_map reads in the entire url_alias table at bootstrap
time. I'd like to discuss ideas on how to improve the performance...
Most obviously, why not simply query the url_alias table as needed
instead of loading the whole table? In the incoming case, only a single
simple SELECT is necessary, which will always be more efficient than
reading the table. In the outgoing case, there might be some slight
performance advantage to caching the table with a small number of
aliases, but the disadvantage can become huge as the alias table grows.
One note - I've noticed that the src column in url_alias is not
indexed, I think adding an index should significantly help the
performance in the outgoing case if we were to do individual SELECTs.
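A minimal sketch of the per-lookup approach, assuming a url_alias
table with pid, src and dst columns (the real schema may differ) and
using SQLite for convenience. The function names are hypothetical and
are not the ones in bootstrap.inc.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT UNIQUE)"
)
# The index suggested above, so outgoing lookups by src do not scan the table.
conn.execute("CREATE INDEX url_alias_src ON url_alias (src)")
conn.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    [("node/23", "article/first-post"), ("node/42", "about")],  # made-up aliases
)


def alias_to_internal(conn, alias):
    """Incoming case: one SELECT per request, keyed on the alias."""
    row = conn.execute(
        "SELECT src FROM url_alias WHERE dst = ?", (alias,)
    ).fetchone()
    return row[0] if row else alias


def internal_to_alias(conn, path):
    """Outgoing case: one SELECT per generated link, keyed on the internal path."""
    row = conn.execute(
        "SELECT dst FROM url_alias WHERE src = ?", (path,)
    ).fetchone()
    return row[0] if row else path
```

Paths without an alias fall through unchanged in both directions, so
internal_to_alias(conn, "node/7") simply returns "node/7".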
Any other thoughts?
[1] http://drupal.org/node/21938
------------------------------------------------------------------------
May 4, 2005 - 22:02 : mikeryan
Attachment: http://drupal.org/files/issues/bootstrap.inc_2.patch (1.06 KB)
What the hell, I went ahead and gave it a shot... On my home system,
where I'm testing out 4.6.0, page loads have been taking several
seconds, which I attributed to the fact that it's an old computer and
I'm multi-tasking like crazy. I implemented my own suggestion, and now
pages load in about one second, it makes an incredible difference
(FWIW, my url_alias table has over 4000 rows).
Patches attached, go give it a try...
------------------------------------------------------------------------
May 4, 2005 - 22:02 : mikeryan
Attachment: http://drupal.org/files/issues/common.inc_11.patch (723 bytes)
One attachment per note, that's tedious...
------------------------------------------------------------------------
May 4, 2005 - 22:03 : mikeryan
Attachment: http://drupal.org/files/issues/path.module_0.patch (1.34 KB)
------------------------------------------------------------------------
May 4, 2005 - 22:03 : mikeryan
Attachment: http://drupal.org/files/issues/database.mysql_4.patch (553 bytes)
------------------------------------------------------------------------
May 4, 2005 - 22:04 : mikeryan
Attachment: http://drupal.org/files/issues/database.pgsql_2.patch (502 bytes)
------------------------------------------------------------------------
May 4, 2005 - 22:07 : mikeryan
Note: I just noticed that the database.*sql patches showed an extra diff
(not mine) to the location field in locales_source. I did my diffs
against a freshly-updated DRUPAL-4-6, and had made the edits against a
release from a few days ago... Those locales_source changes should
probably be removed from my patches before applying.
------------------------------------------------------------------------
May 7, 2005 - 19:13 : mikeryan
I upgraded Fenway Views [2] to Drupal 4.6.0 today, incorporating these
patches. Performance is very noticeably improved.
[2] http://fenway-views.com/
------------------------------------------------------------------------
May 8, 2005 - 09:31 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/pathalias.patch (4.98 KB)
I've merged the patch into one. Much more convenient. I also changed it
to cvs as only there new features will be added.
Upgrade path needs to be added.
Mike, can you run some more tests? We are especially interested in
"hard" data ie numbers with possibly error bars. it would be
interesting to know how this patch affects sites that have only a few
path aliases vs those with a lot of them. Also, how pages with a lot of
node links (tracker) are affected.
It might be worthwhile to add static caching in drupal_get_normal_path
if it gets called more than once for a link on one page view.
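A sketch of per-request caching on top of per-alias queries: each
distinct path costs one query, repeats are served from memory. The
class and its names are hypothetical, the table layout is assumed, and
a PHP version would presumably use a static array inside the function
instead.

```python
import sqlite3


class AliasLookup:
    """Looks up aliases one at a time and remembers the answers for this page view."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.queries = 0  # counts the SQL queries actually sent

    def alias_for(self, path):
        if path not in self.cache:
            self.queries += 1
            row = self.conn.execute(
                "SELECT dst FROM url_alias WHERE src = ?", (path,)
            ).fetchone()
            # Misses are cached too, so unaliased paths are not re-queried.
            self.cache[path] = row[0] if row else path
        return self.cache[path]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    [("node/1", "welcome"), ("node/2", "contact")],  # made-up aliases
)

lookup = AliasLookup(conn)
links = ["node/1", "node/2", "node/1", "node/1", "node/3"]
aliases = [lookup.alias_for(path) for path in links]
```

Five links are resolved with three queries here; how much this helps
on a real page depends on how often the same path is linked.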
------------------------------------------------------------------------
May 8, 2005 - 13:21 : mikeryan
Thanks for merging the patches, I wasn't aware you could patch multiple
files at once. If this is accepted, of course, adding the index on src
should be incorporated into updates.inc.
In terms of hard data, I've never profiled PHP code - what tool(s) do
you use? I couldn't find anything referenced in the PHP manual...
re: pages with lots of links - on Fenway Views the big test is the
calendar [3], and the performance improvement seems even more dramatic
here. I don't know why - the url_alias table is read once per page, so
I would expect as you do that preloading the table would tend to look
better when you have a hundred internal links on the page. Maybe we're
both underestimating exactly how fast a simple MySQL query on an
indexed key can be (I'm using MySQL 4.0.20, BTW)...
[3] http://fenway-views.com/calendar
------------------------------------------------------------------------
May 8, 2005 - 13:26 : killes(a)www.drop.org
As we speak, Mathias is doing some tests. The tool of choice is usually
apache bench. If you have a lot of url aliases, the generated array
will be huge. Part of the improvement could be due to the fact that you
need less memory now.
------------------------------------------------------------------------
May 8, 2005 - 13:28 : mikeryan
Great, thanks!
------------------------------------------------------------------------
May 8, 2005 - 13:34 : Dries
On drupal.org it takes 800 ms or more to generate a page. Of these 800
ms, only 4 ms is spent building the path alias map (incl. the SQL query
time which takes about 1 ms). That is, building the map takes 0.5% of
the total time, which is negligible. There are only 321 aliases in our
url_alias table, though. How many aliases do you have?
What is more, when we build drupal.org's main page, we query this map
215 times. I believe your patch would introduce 215 SQL queries ...
I'm afraid that if we'd apply your patch, we'd pay a serious
performance penalty (unless we have many more aliases).
Can you provide some numbers/measurements?
------------------------------------------------------------------------
May 8, 2005 - 13:37 : Dries
Note: my measurements did not take into account the time spent searching
the map. On the main page, we search this array 215 times.
------------------------------------------------------------------------
May 8, 2005 - 14:08 : puregin
I believe that it's felt that adding path aliases would improve the
Drupal documentation. This may well add 400+ paths to the URL aliases
table on Drupal.org
------------------------------------------------------------------------
May 8, 2005 - 14:27 : Dries
I spent some more time investigating this.
The relevant code in bootstrap.inc is this:
  function drupal_get_path_alias($path) {
    if (($map = drupal_get_path_map()) && ($newpath = array_search($path, $map))) {
      return $newpath;
    }
    ...
Turns out that the time to build the map is negligible (< 5ms, see
previous comments), however the total time spent on 'path aliasing' is
about 70ms per page! 50ms of these 70ms are spent on $map =
drupal_get_path_map(). The remaining 20ms is spent on $newpath =
array_search($path, $map).
The first call to drupal_get_path_map() takes 3 to 5 ms, each
subsequent call takes about 0.3 ms. Searching the array costs 0.1 ms.
However, if you have to do this 130+ times, this adds up to a whopping
70 ms. Remember that we have 321 aliases in our map.
As drupal_get_path_alias() is typically called from url(), which
in turn is typically called from l() I set out to investigate the time
spent in l(). Looks like l() easily gets called 120 times/page, and
that each call to l() costs about 0.9 ms. That is about 2 or 3 times
the cost of the /average SQL query/. The total time spent in l() is
115 ms!
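The arithmetic behind these figures can be rechecked with a few lines.
All inputs are the numbers quoted in this thread; the call counts are
approximate ("130+", "easily 120"), so the products are rough bounds
and not measurements.

```python
# Per-call costs quoted above, in milliseconds.
map_call_ms = 0.3   # each drupal_get_path_map() call after the first
search_ms = 0.1     # each array_search() over the 321-entry map
link_call_ms = 0.9  # each l() call

# Alias lookups: "130+" calls in this comment, 215 in the earlier one.
alias_low_ms = 130 * (map_call_ms + search_ms)
alias_high_ms = 215 * (map_call_ms + search_ms)

# l(): about 120 calls per page.
link_total_ms = 120 * link_call_ms

# Shares of an 800 ms page.
page_ms = 800
map_build_share = 4 / page_ms    # building the map once
link_share = 115 / page_ms       # reported total time in l()

print(round(alias_low_ms, 1), round(alias_high_ms, 1), round(link_total_ms, 1))
print(round(100 * map_build_share, 1), round(100 * link_share, 1))
```

The reported 70 ms for aliasing sits between the two bounds, and the
reported 115 ms for l() is close to the 120-call estimate.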
------------------------------------------------------------------------
May 8, 2005 - 18:25 : adrian
This is a complete aside, and probably off-topic, but I would like to
mention that I was actually
looking at rewriting url() to be able to handle 'external urls' ,
because I want to do aliases using the site subdomain, like for
instance :
http://category.sitename.com/article/title_goes_here -> node/23
Rewriting url to allow this would also allow us to use l() for external
links of any kind, getting rid of a bunch of inline html.
------------------------------------------------------------------------
May 8, 2005 - 18:46 : chx
adrian, I have written a patch for l to support external links but Ber
said that a module like weblink shall handle those.
------------------------------------------------------------------------
May 8, 2005 - 19:03 : mikeryan
I've got over 4000 aliases on Fenway Views - another pathauto user
reported over 6000.
I don't have a profiling tool to give timing data, but adding a quick
counter shows that for the Fenway Views calendar [4],
drupal_get_path_alias() is called 404 times and drupal_get_normal_path
is called 227 times. And trust me, this page loads MUCH faster with my
patch than it does with the path map.
I think those SQL queries are a lot cheaper than one might expect...
[4] http://fenway-views.com/calendar
------------------------------------------------------------------------
May 9, 2005 - 00:24 : mathias
Attachment: http://drupal.org/files/issues/pathalias-with-caching.patch (5.62 KB)
Some benchmarks, with MySQL and Drupal caching disabled.
2 SETS OF TESTS
=======================================
1) 500 nodes and aliases
2) 20,000 nodes and aliases
3 TYPES OF TESTS PER SET
=======================================
1) Baseline unmodified Drupal
2) The following change inside drupal_get_path_map():
- if (is_null($map)) {
+ if ($map === NULL) {
$map = array(); // Make $map non-null in case no aliases are defined.
3) Pull aliases only when needed rather than loading the entire alias
table.
SET 1 - 500 Nodes and 500 Aliases
using: ab -c 10 -n 100 [homepage]
=======================================
Baseline
Time taken for tests: 13.00 seconds
Requests per second: 3.85 [#/sec] (mean)
Transfer rate: 56.35 [Kbytes/sec] received
is_null() VS NULL
Time taken for tests: 12.463 seconds
Requests per second: 4.01 [#/sec] (mean)
Transfer rate: 57.63 [Kbytes/sec] received
Per Instance Alias Lookup (32 additional queries)
Time taken for tests: 11.502 seconds
Requests per second: 4.35 [#/sec] (mean)
Transfer rate: 63.30 [Kbytes/sec] received
SET 2 - 20,000 Nodes and 20,000 Aliases
using: ab -c 5 -n 50 [homepage]
=======================================
Baseline
Time taken for tests: 204.583 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
is_null() VS NULL
Time taken for tests: 127.629 seconds
Requests per second: 0.39 [#/sec] (mean)
Transfer rate: 5.35 [Kbytes/sec] received
Per Instance Alias Lookup (32 additional queries)
Time taken for tests: 28.788 seconds
Requests per second: 1.74 [#/sec] (mean)
Transfer rate: 23.59 [Kbytes/sec] received
ANALYSIS
=======================================
Converting 'is_null()' to '=== NULL' is a no-brainer and results in a
nice performance gain. And while per-instance alias lookups also give a
huge boost for sites with thousands of aliases, their benefits are
entirely dependent on the number of system URLs and menu items visible
per request. As noted above, I tested the standard homepage, which added
32 additional queries. A request for something like the admin interface
would presumably show less benefit. I would've tested this, but I
didn't know how to invoke an authenticated page request with 'ab'. A
positive of this approach is that we would no longer be storing all the
aliases in memory.
What other concerns do we have with this last approach? Am I properly
testing the strain on the database?
------------------------------------------------------------------------
May 9, 2005 - 01:08 : Dries
Can you try making the following changes?
1. To common.inc:
function drupal_rebuild_path_map() {
- drupal_get_path_map('rebuild');
+ drupal_get_path_map(TRUE);
}
2. To bootstrap.inc:
-function drupal_get_path_map($action = '') {
+function drupal_get_path_map($rebuild = FALSE) {
static $map = NULL;
- if ($action == 'rebuild') {
+ if ($rebuild) {
$map = NULL;
}
That is, replace the string comparison with a boolean comparison. Not
sure it is going to make a significant difference but it might be
another micro-improvement.
I'll test your patch shortly.
------------------------------------------------------------------------
May 9, 2005 - 08:21 : mathias
Dries, those changes had no real performance gain.
20,000 Nodes and 20,000 Aliases
using: ab -c 5 -n 50 [homepage]
=======================================
Baseline
Time taken for tests: 204.583 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
String VS Boolean
Time taken for tests: 204.190 seconds
Requests per second: 0.24 [#/sec] (mean)
Transfer rate: 3.34 [Kbytes/sec] received
------------------------------------------------------------------------
May 9, 2005 - 09:01 : Dries
Before this patch can be committed, we need to do more testing. I'd
like to know how this behaves on sites with few or no path aliases, and
on sites like drupal.org, with a modest amount of path aliases.
The reason I'm asking is because MySQL is often the bottleneck and not
PHP/Apache. This is the case on drupal.org, for example. The proposed
patch moves some of the processing costs from PHP to MySQL. On
drupal.org, the amount of SQL queries/page would double. Needless to
say, this is somewhat scary. ;)
More numbers would be appreciated.
------------------------------------------------------------------------
May 9, 2005 - 22:56 : mikeryan
I'm curious, why test with MySQL caching disabled? Since much of the
issue is the expense of making (potentially many) more queries, this
wouldn't seem to reflect the performance gain in practice.
The caching (compared to making the query each time) seems like
overkill to me - with MySQL caching enabled, I would think this
complicates the code for little (if any) performance gain. Did you do
any profiling using my original (cacheless) patch?
Thanks.
------------------------------------------------------------------------
May 10, 2005 - 01:23 : mathias
In an ideal world everyone would be running a query cache but I wanted
to see how this would hold up under the worst-case scenario.
I tested the original patch which didn't use static variables and the
amount of queries doubled. For example, to load the front page, Drupal
queried the url_alias table 22 times looking for an alias for 'node'.
There's no need to send duplicate queries to the DB, since that's
where most bottlenecks seem to lie.
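The idea of looking up aliases one at a time while remembering the answers in a static variable can be sketched roughly as follows. This is not the attached patch: example_get_path_alias() and the $lookup callback are hypothetical names, the callback stands in for a query against url_alias, and the closure syntax assumes a modern PHP rather than the version Drupal targeted at the time.

```php
<?php
// Hypothetical helper: look up one alias at a time and remember the answer
// for the rest of the request. $query is any callable that takes a system
// path and returns its alias, or FALSE when none exists.
function example_get_path_alias($path, $query) {
  static $cache = array();
  if (!array_key_exists($path, $cache)) {
    $cache[$path] = call_user_func($query, $path);
  }
  // Paths without an alias are returned unchanged.
  return $cache[$path] === FALSE ? $path : $cache[$path];
}

// Stand-in for the database: counts how many "queries" are issued.
$queries = 0;
$lookup = function ($path) use (&$queries) {
  $queries++;
  $aliases = array('node/23' => 'article/title_goes_here');
  return isset($aliases[$path]) ? $aliases[$path] : FALSE;
};

// 22 requests for 'node' cost one query instead of 22.
for ($i = 0; $i < 22; $i++) {
  example_get_path_alias('node', $lookup);
}
$alias = example_get_path_alias('node/23', $lookup);
```

Misses are cached as well as hits, so a path with no alias, such as 'node' in this sketch, is only queried once per request.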
Issue status update for http://drupal.org/node/21067
Project: Drupal
Version: 4.6.0
Component: base system
Category: bug reports
Priority: critical
Assigned to: CdnStrangequark
Reported by: jwells
Updated by: killes(a)www.drop.org
-Status: active
+Status: patch
Attachment: http://drupal.org/files/issues/valid_email.patch (1.31 KB)
Our intrepid Debian developer has cooked up a patch. It is based on RFC
2822 but it needs testing.
killes(a)www.drop.org
Previous comments:
------------------------------------------------------------------------
April 22, 2005 - 07:00 : jwells
When attempting to use a sub-domain email address for a new account, it
won't pass the syntax test. We know that it's really the base - but I'm
sure a lot of end users don't know.
newaccount(a)research.drupal.org - this type of address will fail, though
it is actually legal
------------------------------------------------------------------------
May 10, 2005 - 17:00 : CdnStrangequark
Not just sub-domain emails. If you have an email address of the form:
"first.last(a)somewhere.com" it will also fail even though this is
perfectly valid.
------------------------------------------------------------------------
May 10, 2005 - 18:30 : CdnStrangequark
After attempting to enter more emails on one of my new sites, I also
discovered that the validation fails in yet more perfectly valid cases.
For example: "myemail(a)somewhere.xx", where xx is the country domain code
(like .ca or .us). Not all country codes are accepted.
Here is a replacement I made for the code in common.inc:
*valid_email_address($mail)* that works just great:
$user = "[-a-z0-9!#$%&'*+/=?^_`{|}~]";
$domain = "([a-z]([-a-z0-9]*[a-z0-9]+)?)";
$regex = "^$user+(\.$user+)*@($domain{1,63}\.)+$domain{2,63}$";
// Return a 1 or 0 to mimic results of preg_match
if (eregi($regex, $mail)) {
  return 1;
} else {
  return 0;
}
The only thing this doesn't do is allow for "user@localhost" but does
anyone really do that anyway? The code could be modified to do it
through an alternate check on $domain though.
PS: I left this post's status as active and unassigned cause I'm kinda
new here and don't know the process for submitting patches and bug
fixes. Hope someone can put this code in the core though cause I'm sure
we're not the only ones who have run into the problem.
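For readers on a current PHP, where eregi() is no longer available, the same pattern could be tried with preg_match() along the following lines. This is a sketch only: example_valid_email_address() is a hypothetical name, the pattern is carried over from the comment above rather than checked against RFC 2822, and its nested quantifiers may backtrack heavily on long inputs.

```php
<?php
// Hypothetical port of the pattern above to PCRE. The /i flag replaces
// eregi()'s case-insensitivity, and "/" inside the class is escaped
// because it is also the pattern delimiter.
function example_valid_email_address($mail) {
  $user = '[-a-z0-9!#$%&\'*+\/=?^_`{|}~]';
  $domain = '(?:[a-z](?:[-a-z0-9]*[a-z0-9]+)?)';
  $regex = "/^$user+(?:\\.$user+)*@(?:$domain{1,63}\\.)+$domain{2,63}$/i";
  // preg_match() returns 1 on a match and 0 otherwise.
  return preg_match($regex, $mail);
}

// Address shapes taken from the reports in this issue.
$results = array(
  'first.last@somewhere.com' => example_valid_email_address('first.last@somewhere.com'),
  'newaccount@research.drupal.org' => example_valid_email_address('newaccount@research.drupal.org'),
  'myemail@somewhere.ca' => example_valid_email_address('myemail@somewhere.ca'),
  'user@localhost' => example_valid_email_address('user@localhost'),
);
```

As the comment notes, an address without a dot in the domain part, such as "user@localhost", is still rejected by this pattern.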
Issue status update for http://drupal.org/node/21067
Project: Drupal
Version: 4.6.0
Component: base system
Category: bug reports
Priority: critical
Assigned to: CdnStrangequark
Reported by: jwells
Updated by: Cvbge
Status: patch
Please add a button 'I know my email is good, accept it!' that would be
displayed when email is found invalid.
Cvbge
Previous comments:
------------------------------------------------------------------------
May 10, 2005 - 23:07 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/valid_email.patch (1.31 KB)
Our intrepid Debian developer has cooked up a patch. It is based on RFC
2822 but it needs testing.
Issue status update for http://drupal.org/node/22531
Project: Drupal
Version: cvs
Component: database system
Category: bug reports
Priority: normal
Assigned to: Morbus Iff
Reported by: Morbus Iff
Updated by: Morbus Iff
Status: patch
Attachment: http://drupal.org/files/issues/_p_rssxmldb.patch (4.82 KB)
The CHANGELOG for Drupal 4.5 claims that a URL alias of "rss.xml" for
"node/feed" was added. This ONLY occurs in updates.inc however - if
people grab a fresh 4.5 (or now, 4.6), they are never going to get this
added alias. The attached patch adds this INSERT into the default
database.* files, adds another update into the updates.inc (so that
those who started with a fresh 4.5 or 4.6 will get it via update.php
the next time around) and fixes some minor whitespace/style
inconsistencies.
Morbus Iff
Issue status update for http://drupal.org/node/19442
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: wiz
Updated by: wiz
Status: active
When cache_set is called for a cache entry which already exists in that
very form (including the timestamp), the first UPDATE will change no
rows (db_affected_rows() == 0), and the subsequent INSERT will fail
because the row (i.e., the primary key) already exists.
This can happen only if calling cache_set with the same parameters
twice in one second -- but this can happen. Example:
locale_refresh_cache() may be called often from locale() when no rows
exist in the translation_* tables (which is a bug by itself). The
error message then is:
user error: Duplicate entry 'locale:de' for key 1
query: INSERT INTO cache (cid, data, created, expire, headers) VALUES
('locale:de', 'N;', 1111769323, 0, '') in
/var/www/drupal-cvs/includes/database.mysql.inc on line 66.
I can think of no obvious general fix, except to add another column to
cache that is updated with a random number, or to use a SELECT to check
for the existence of the row. The function locale_refresh_cache in
locale.module seems to have a bug too (separate post).
wiz
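The failure mode and the SELECT-based fix mentioned above can be illustrated with a small simulation. Everything here is hypothetical: ExampleCacheTable is an in-memory stand-in for the cache table, its update() method imitates MySQL's default behaviour of reporting zero affected rows when the stored values are unchanged, and example_cache_set() is not the real cache_set().

```php
<?php
// Hypothetical in-memory stand-in for the cache table, keyed by cid.
class ExampleCacheTable {
  public $rows = array();

  // Mimics UPDATE: reports 0 when the row is missing OR when the row
  // exists but the new values are identical to the stored ones.
  public function update($cid, $row) {
    if (!isset($this->rows[$cid]) || $this->rows[$cid] === $row) {
      return 0;
    }
    $this->rows[$cid] = $row;
    return 1;
  }

  // Mimics a SELECT that checks whether the primary key is present.
  public function exists($cid) {
    return isset($this->rows[$cid]);
  }

  // Mimics INSERT: fails on a duplicate primary key.
  public function insert($cid, $row) {
    if (isset($this->rows[$cid])) {
      throw new RuntimeException("Duplicate entry '$cid' for key 1");
    }
    $this->rows[$cid] = $row;
  }
}

// Sketch of the SELECT-based fix: only INSERT when the row is really missing.
function example_cache_set($table, $cid, $data, $created) {
  $row = array('data' => $data, 'created' => $created);
  if ($table->update($cid, $row) == 0 && !$table->exists($cid)) {
    $table->insert($cid, $row);
  }
}

$table = new ExampleCacheTable();
example_cache_set($table, 'locale:de', 'N;', 1111769323);
// Same values within the same second: update() reports 0 rows, but the
// existence check prevents the duplicate INSERT.
example_cache_set($table, 'locale:de', 'N;', 1111769323);
```

Without the exists() check, the second call would fall through to insert() and raise the duplicate-entry error quoted in the report. The extra SELECT costs one more query whenever the UPDATE changes nothing.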