[drupal-devel] [task] Performance issues with path aliases
Issue status update for http://drupal.org/node/22035 Project: Drupal Version: cvs Component: base system Category: tasks Priority: normal Assigned to: Anonymous Reported by: mikeryan Updated by: mathias Status: patch Attachment: http://drupal.org/files/issues/pathalias-with-caching.patch (5.62 KB) Some benchmarks, with MySQL and Drupal caching disabled. 2 SETS OF TESTS ======================================= 1) 500 nodes and aliases 2) 20,000 nodes and aliases 3 TYPES OF TESTS PER SET ======================================= 1) Baseline unmodified Drupal 2) The following change inside drupal_get_path_map(): - if (is_null($map)) { + if ($map === NULL) { $map = array(); // Make $map non-null in case no aliases are defined. 3) Pull aliases only when needed rather than loading the entire alias table. SET 1 - 500 Nodes and 500 Aliases using: ab -c 10 -n 100 [homepage] ======================================= Baseline Time taken for tests: 13.00 seconds Requests per second: 3.85 [#/sec] (mean) Transfer rate: 56.35 [Kbytes/sec] received is_null() VS NULL Time taken for tests: 12.463 seconds Requests per second: 4.01 [#/sec] (mean) Transfer rate: 57.63 [Kbytes/sec] received Per Instance Alias Lookup (32 additional queries) Time taken for tests: 11.502 seconds Requests per second: 4.35 [#/sec] (mean) Transfer rate: 63.30 [Kbytes/sec] received SET 2 - 20,000 Nodes and 20,000 Aliases using: ab -c 5 -n 50 [homepage] ======================================= Baseline Time taken for tests: 204.583 seconds Requests per second: 0.24 [#/sec] (mean) Transfer rate: 3.34 [Kbytes/sec] received is_null() VS NULL Time taken for tests: 127.629 seconds Requests per second: 0.39 [#/sec] (mean) Transfer rate: 5.35 [Kbytes/sec] received Per Instance Alias Lookup (32 additional queries) Time taken for tests: 28.788 seconds Requests per second: 1.74 [#/sec] (mean) Transfer rate: 23.59 [Kbytes/sec] received ANALYSIS ======================================= Converting 'is_null()' to '=== NULL' is a no brainer and results in a nice performance gain. And while per instance alias lookups also give a huge boost for site with thousands of aliases, they're benefits are entirely dependent on the number of system URLs and menu items visible per request. As noted above I tested the standard homepage which added 32 additional queries. A request for something like the admin interface would presumably show less benefits. I would've tested this, but I didn't know how to invoke an authenticated page request with 'ab'. A positive of this approach is we no longer would be storing all the aliases in memory. What other concerns do we have with this last approach? Am I properly testing the strain on the database? mathias Previous comments: ------------------------------------------------------------------------ May 4, 2005 - 19:10 : mikeryan See Investigate use of conf_url_rewrite [1] for context... The current core support for translating incoming path aliases to the internal Drupal path (drupal_get_path_alias) and substituting aliases when generating links to internal paths (drupal_get_normal_path) does not scale well with many aliases. The ease with which pathauto enables site administrators to generate large numbers of aliases exposes this issue, but it is inherent in the core implementation, because drupal_get_path_map reads in the entire url_alias table at bootstrap time. I'd like to discuss ideas on how to improve the performance... Most obviously, why not simply query the url_alias table as needed instead of loading the whole table? In the incoming case, only a single simple SELECT is necessary, which will always be more efficient than reading the table. In the outgoing case, there might be some slight performance advantage to caching the table with a small number of aliases, but the disadvantage can become huge as the alias table grows. One note - I've noticed that the src column in url_alias is not indexed, I think adding an index should significantly help the performance in the outgoing case if we were to do individual SELECTs. Any other thoughts? [1] http://drupal.org/node/21938 ------------------------------------------------------------------------ May 4, 2005 - 20:02 : mikeryan Attachment: http://drupal.org/files/issues/bootstrap.inc_2.patch (1.06 KB) What the hell, I went ahead and gave it a shot... On my home system, where I'm testing out 4.6.0, page loads have been taking several seconds, which I attributed to the fact that it's an old computer and I'm multi-tasking like crazy. I implemented my own suggestion, and now pages load in about one second, it makes an incredible difference (FWIW, my url_alias table has over 4000 rows). Patches attached, go give it a try... ------------------------------------------------------------------------ May 4, 2005 - 20:02 : mikeryan Attachment: http://drupal.org/files/issues/common.inc_11.patch (723 bytes) One attachment per note, that's tedious... ------------------------------------------------------------------------ May 4, 2005 - 20:03 : mikeryan Attachment: http://drupal.org/files/issues/path.module_0.patch (1.34 KB) ------------------------------------------------------------------------ May 4, 2005 - 20:03 : mikeryan Attachment: http://drupal.org/files/issues/database.mysql_4.patch (553 bytes) ------------------------------------------------------------------------ May 4, 2005 - 20:04 : mikeryan Attachment: http://drupal.org/files/issues/database.pgsql_2.patch (502 bytes) ------------------------------------------------------------------------ May 4, 2005 - 20:07 : mikeryan Note: I just noticed that the database.*sql patches showed an extra diff (not mine) to the location field in locales_source. I did my diffs against a freshly-updated DRUPAL-4-6, and had made the edits against a release from a few days ago... Those locales_source changes should probably be removed from my patches before applying. ------------------------------------------------------------------------ May 7, 2005 - 17:13 : mikeryan I upgraded Fenway Views [2] to Drupal 4.6.0 today, incorporating these patches. Performance is very noticeably improved. [2] http://fenway-views.com/ ------------------------------------------------------------------------ May 8, 2005 - 07:31 : killes@www.drop.org Attachment: http://drupal.org/files/issues/pathalias.patch (4.98 KB) I've merged the patch into one. Much more convenient. I also changed it to cvs as only there new features will be added. Upgrade path needs to be added. Mike, can you run some more tests? We are especially interested in "hard" data ie numbers with possibly error bars. it would be interesting to know how this patch affects sites that have only a few path aliases vs those with a lot of them. Also, how pages with a lot of node links (tracker) are affected. It might be worthwhile to add static caching in drupal_get_normal_path if it gets calle dmore than once for a link on one page view. ------------------------------------------------------------------------ May 8, 2005 - 11:21 : mikeryan Thanks for merging the patches, I wasn't aware you could patch multiple files at once. If this is accepted, of course, adding the index on src should be incorporated into updates.inc. In terms of hard data, I've never profiled PHP code - what tool(s) do you use? I couldn't find anything referenced in the PHP manual... re: pages with lots of links - on Fenway Views the big test is the calendar [3], and the performance improvement seems even more dramatic here. I don't know why - the url_alias table is read once per page, so I would expect as you do that preloading the table would tend to look better when you have a hundred internal links on the page. Maybe we're both underestimating exactly how fast a simple MySQL query on an indexed key can be (I'm using MySQL 4.0.20, BTW)... [3] http://fenway-views.com/calendar ------------------------------------------------------------------------ May 8, 2005 - 11:26 : killes@www.drop.org As we speak, Mathias is doing some tests. The tool of choice is usually apache bench. If you have a lot of url aliases, the generated array will be huge. Part of the improvement could be due to the fact that you need less memory now. ------------------------------------------------------------------------ May 8, 2005 - 11:28 : mikeryan Great, thanks! ------------------------------------------------------------------------ May 8, 2005 - 11:34 : Dries On drupal.org it takes 800 ms or more to generate a page. Of these 800 ms, only 4 ms is spent building the path alias map (incl. the SQL query time which takes about 1 ms). That is, building the map takes 0.5% of the total time which is neglible. In our url_alias table are only 321 aliases though. How many aliases do you have? What is more, when we build drupal.org's main page, we query this map 215 times. I believe your patch would introduce 215 SQL queries ... I'm afraid that if we'd apply your patch, we'd pay a serious performance penalty (unless we have many more aliases). Can you provide some numbers/measurements? ------------------------------------------------------------------------ May 8, 2005 - 11:37 : Dries Note: my measurements did not take into account the time spent searching the map. On the main page, we search this array 215 times. ------------------------------------------------------------------------ May 8, 2005 - 12:08 : puregin I believe that it's felt that adding path aliases would improve the Drupal documentation. This may well add 400+ paths to the URL aliases table on Drupal.org ------------------------------------------------------------------------ May 8, 2005 - 12:27 : Dries I spent some more time investigating this. The relevant code in bootstrap.inc is this: function drupal_get_path_alias($path) { if (($map = drupal_get_path_map()) && ($newpath = array_search($path, $map))) { return $newpath; } ... Turns out that the time to build the map is neglible (< 5ms, see previous comments), however the total time spent on 'path aliasing' it about 70ms per page! 50ms of these 70ms are spent on $map = drupal_get_path_map(). The remaining 20ms is spent on $newpath = array_search($path, $map). The first call to drupal_get_path_map() takes 3 to 5 ms, each subsequent call takes about 0.3 ms. Searching the array costs 0.1 ms. However, if you have to do this 130+ times, this adds up to a whopping 70 ms. Remind that we have 321 aliases in our map. As drupal_get_path_alias() is typically called from code>url(), which in turn is typically called from l() I set out to investigate the time spent in l(). Looks like l() easily gets called 120 times/page, and that each call to l() costs about 0.9 ms. That is about 2 or 3 times the cost of the /average SQL query/. The total time spent in l() is 115 ms! ------------------------------------------------------------------------ May 8, 2005 - 16:25 : adrian This is a complete aside, and probably off-topic, but I would like to mention that I was actually looking at rewriting url() to be able to handle 'external urls' , because I want to do aliases using the site subdomain, like for instance : http://category.sitename.com/article/title_goes_here -> node/23 Rewriting url to allow this would also allow us to use l() for external links of any kind, getting rid of a bunch of inline html. ------------------------------------------------------------------------ May 8, 2005 - 16:46 : chx adrian, I have written a patch for l to support external links but Ber said that a module like weblink shall handle those. ------------------------------------------------------------------------ May 8, 2005 - 17:03 : mikeryan I've got over 4000 aliases on Fenway Views - another pathauto user reported over 6000. I don't have a profiling tool to give timing data, but adding a quick counter shows that for the Fenway Views calendar [4], drupal_get_path_alias() is called 404 times and drupal_get_normal_path is called 227 times. And trust me, this page loads MUCH faster with my patch than it does with the path map. I think those SQL queries are a lot cheaper than one might expect... [4] http://fenway-views.com/calendar
participants (1)
-
mathias