Guidelines for writing efficient SQL code
Dear Friends, I welcome your comments on this short guide for writing efficient SQL code in Drupal: http://drupal.org/node/559302 Feel free to start hunting for speed on the two issues mentioned (logging and LEFT JOINs). I will add several tips in the next days. The forum code proved to be heavily broken and I could debug it in 4 hours. So it is very likely that the entire Drupal 6.dev is at stake. Probably Drupal 7 alpha too. Because Drupal makes little use of LEFT JOINs in the code, it is likely that it cannot scale well. Hence a reputation to be slow. But it can change rapidly if you fix the code and apply the tips. I will be back tomorrow with more tips. Hunt for speed now! Kind regards, Jean-Michel
I'm not super sure how to react to this, especially after seeing this thread: http://drupal.org/node/559424 Jean-Michel, I think you misunderstand joins as they pertain to best practice and performance. Are we all basically in agreement that INNER JOIN is the faster of the two for most purposes? -D Jean-Michel Pouré wrote:
Dear Friends,
I welcome your comments on this short guide for writing efficient SQL code in Drupal: http://drupal.org/node/559302
Feel free to start hunting for speed on the two issues mentioned (logging and LEFT JOINs). I will add several tips in the next days.
The forum code proved to be heavily broken and I could debug it in 4 hours. So it is very likely that the entire Drupal 6.dev is at stake. Probably Drupal 7 alpha too.
Because Drupal makes little use of LEFT JOINs in the code, it is likely that it cannot scale well. Hence a reputation to be slow. But it can change rapidly if you fix the code and apply the tips.
I will be back tomorrow with more tips. Hunt for speed now!
Kind regards, Jean-Michel
Le mardi 25 août 2009 à 16:27 -0700, Domenic Santangelo a écrit :
Jean-Michel, I think you misunderstand joins as they pertain to best practice and performance. Are we all basically in agreement that INNER JOIN is the faster of the two for most purposes?
Dear Friends, I rewrote the guide: http://drupal.org/node/559302 It will be expanded further in the next days. Feel free to comment. Kind regards, Jean-Michel
Do LEFT JOINS outperform INNER JOINS? Yes Do INNER JOINS outperform LEFT JOINS? Yes It depends on the query. The MySQL query optimizer treats both differently. That's why its best to try different methods of joins and analyze each query to see where the best performance can be gained, and dependent upon the usage of said query. Also on indexes you say "Therefore the proposed solution to enhance Drupal is to hunt for multi-column indexes and replace them with single indexes.". MySQL makes very effecient use of multi column indexes. On countless queries in Drupal changing to only single column indexes would result in full table scans, temporary tables and filesorts in numerous places. When writing queries they should be targeted for performance on all supported RDBMS, yet the emphasis should be put on MySQL. The simple fact that a vast (vast) majority of Drupal sites out there run MySQL is reason enough for that, and until there is some major cosmic shift in what hosting companies offer that's going to remain the fact. Perhaps a future feature would be to have the installer optionally determine which RDBMS is being used and have targeted indexes for all in the schema - say $schema['table']['indexes']['mysql']=array(...). $schema['table']['indexes']['pgsql']=array(...). If no RDBMS is specified then the indexes go everywhere ($schema['table']['indexes'] - same as now). || Jamie Holly http://www.intoxination.net http://www.hollyit.net Jean-Michel Pouré wrote:
Le mardi 25 août 2009 à 16:27 -0700, Domenic Santangelo a écrit :
Jean-Michel, I think you misunderstand joins as they pertain to best practice and performance. Are we all basically in agreement that INNER JOIN is the faster of the two for most purposes?
Dear Friends,
I rewrote the guide: http://drupal.org/node/559302
It will be expanded further in the next days.
Feel free to comment.
Kind regards, Jean-Michel
I am far from a MySql expert, as I am sure many on this list can attest to. But I have seen one thing that seems to make a big difference on some occasions and possibly somone can illuminate me on why. That is a numeric (say int(11)) index is used with "WHERE abc = 123" seems to be considerably slower than "WHERE abc = '123'", as explained by ANALYZE. Yet, we Drupallers mostly use the non-quoted version. Nancy E. Wichmann, PMP Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed). Pierre. On Wed, 2009-08-26 at 09:18 -0400, Nancy Wichmann wrote:
I am far from a MySql expert, as I am sure many on this list can attest to. But I have seen one thing that seems to make a big difference on some occasions and possibly somone can illuminate me on why. That is a numeric (say int(11)) index is used with "WHERE abc = 123" seems to be considerably slower than "WHERE abc = '123'", as explained by ANALYZE. Yet, we Drupallers mostly use the non-quoted version.
Nancy E. Wichmann, PMP
Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
Pierre Rineau wrote:
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed).
Then why, Pierre, is the quoted version faster, by far? At least in MySql. Please stop disparaging MySql. The vast majority of sites use it, so it can't be all bad. Nancy E. Wichmann, PMP Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
All I'm asking is Standard SQL, I don't blame you to use MySQL. Pierre. On Wed, 2009-08-26 at 10:34 -0400, Nancy Wichmann wrote:
Pierre Rineau wrote:
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed).
Then why, Pierre, is the quoted version faster, by far? At least in MySql.
Please stop disparaging MySql. The vast majority of sites use it, so it can't be all bad.
Nancy E. Wichmann, PMP
Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
This is a moot point anyway as of Drupal 7, as you shouldn't be quoting anything. You should be using placeholders/prepared statements for all queries, and the database will deal with it. --Larry Garfield Pierre Rineau wrote:
All I'm asking is Standard SQL, I don't blame you to use MySQL.
Pierre.
On Wed, 2009-08-26 at 10:34 -0400, Nancy Wichmann wrote:
Pierre Rineau wrote:
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed). Then why, Pierre, is the quoted version faster, by far? At least in MySql.
Please stop disparaging MySql. The vast majority of sites use it, so it can't be all bad.
Nancy E. Wichmann, PMP
Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
Quoting Nancy Wichmann <nan_wich@bellsouth.net>:
Pierre Rineau wrote:
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed).
Then why, Pierre, is the quoted version faster, by far? At least in MySql.
Please stop disparaging MySql. The vast majority of sites use it, so it can't be all bad.
It makes no sense to me why a quoted numeric string is faster than a native integer unless the database engine is broken. Not talking about MySQL vs anything else but integer comparison should be faster than string comparison. My guess is that the database engine is converting the integer to string when storing it in the index file but I have no proof of that. Have you tried your testing with InnoDB? -- Earnie -- http://r-feed.com/ -- http://for-my-kids.com/ -- http://www.4offer.biz/ -- http://give-me-an-offer.com/
I actually tried this on MySQL 5.0.17 using MyISAM. I tried it on primary, standard index and no index columns and restarted MySQL between each run and really didn't notice any different in a comments table with about 1.5 million rows (at the most .02ms and it would very in either direction - quoted or none). I wonder if there was a release out there where this may have been broken in MySQL. Jamie Holly http://www.intoxination.net http://www.hollyit.net Earnie Boyd wrote:
Quoting Nancy Wichmann <nan_wich@bellsouth.net>:
Pierre Rineau wrote:
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed).
Then why, Pierre, is the quoted version faster, by far? At least in MySql.
Please stop disparaging MySql. The vast majority of sites use it, so it can't be all bad.
It makes no sense to me why a quoted numeric string is faster than a native integer unless the database engine is broken. Not talking about MySQL vs anything else but integer comparison should be faster than string comparison. My guess is that the database engine is converting the integer to string when storing it in the index file but I have no proof of that. Have you tried your testing with InnoDB?
-- Earnie -- http://r-feed.com/ -- http://for-my-kids.com/ -- http://www.4offer.biz/ -- http://give-me-an-offer.com/
Quoted integers are typecast from string back to integers. If they weren't, the index wouldn't be usable. If you notice a speed improvement in quoting an integer, either the benchmark is invalid for some reason (did you test the quoted version after the regular int version? Was it in the query cache? Was it in the buffer pool already?) or your in a more complicated situation where quoting the integer is forcing you not to use the index, and the index has a very low cardinality. That is very unlikely, so I'd guess your benchmark is flawed. On Wed, Aug 26, 2009 at 2:46 PM, Jamie Holly<hovercrafter@earthlink.net> wrote:
I actually tried this on MySQL 5.0.17 using MyISAM. I tried it on primary, standard index and no index columns and restarted MySQL between each run and really didn't notice any different in a comments table with about 1.5 million rows (at the most .02ms and it would very in either direction - quoted or none). I wonder if there was a release out there where this may have been broken in MySQL.
Jamie Holly http://www.intoxination.net http://www.hollyit.net
Earnie Boyd wrote:
Quoting Nancy Wichmann <nan_wich@bellsouth.net>:
Pierre Rineau wrote:
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed).
Then why, Pierre, is the quoted version faster, by far? At least in MySql.
Please stop disparaging MySql. The vast majority of sites use it, so it can't be all bad.
It makes no sense to me why a quoted numeric string is faster than a native integer unless the database engine is broken. Not talking about MySQL vs anything else but integer comparison should be faster than string comparison. My guess is that the database engine is converting the integer to string when storing it in the index file but I have no proof of that. Have you tried your testing with InnoDB?
-- Earnie -- http://r-feed.com/ -- http://for-my-kids.com/ -- http://www.4offer.biz/ -- http://give-me-an-offer.com/
-- Narayan Newton GA Member Drupal Association Tag1Consulting
I didn't notice any real speed difference (it was minute and could be contributed to anything). I restarted the server daemon between each run and I keep caching off on my devel server just for better optimization during development as it is. Thats on Ubuntu 9.04 64 bit, dual core 2.2ghz running 4 gigs. I just went ahead and ran them with query cache on and saw no difference between quoted or unquoted integers. The only thing I did notice (I ran each twice on NID in comments) was that the query cache doesn't kick in if you change the field from non quote to quote in the where clause. It apparently views them as different queries, which is a good thing. My original tests were on PID and NID. For the no index test I just dropped the NID index. Jamie Holly http://www.intoxination.net http://www.hollyit.net Narayan Newton wrote:
Quoted integers are typecast from string back to integers. If they weren't, the index wouldn't be usable.
If you notice a speed improvement in quoting an integer, either the benchmark is invalid for some reason (did you test the quoted version after the regular int version? Was it in the query cache? Was it in the buffer pool already?) or your in a more complicated situation where quoting the integer is forcing you not to use the index, and the index has a very low cardinality. That is very unlikely, so I'd guess your benchmark is flawed.
On Wed, Aug 26, 2009 at 2:46 PM, Jamie Holly<hovercrafter@earthlink.net> wrote:
I actually tried this on MySQL 5.0.17 using MyISAM. I tried it on primary, standard index and no index columns and restarted MySQL between each run and really didn't notice any different in a comments table with about 1.5 million rows (at the most .02ms and it would very in either direction - quoted or none). I wonder if there was a release out there where this may have been broken in MySQL.
Jamie Holly http://www.intoxination.net http://www.hollyit.net
Earnie Boyd wrote:
Quoting Nancy Wichmann <nan_wich@bellsouth.net>:
Pierre Rineau wrote:
If you test a integer field, use no quotes, if you test a varchar field, use quotes, that's it. Your DBMS will be smarter than you (except MySQL, but for kitten sake, do standard SQL, it will be easier for PostgreSQL user like me to port your code if needed).
Then why, Pierre, is the quoted version faster, by far? At least in MySql.
Please stop disparaging MySql. The vast majority of sites use it, so it can't be all bad.
It makes no sense to me why a quoted numeric string is faster than a native integer unless the database engine is broken. Not talking about MySQL vs anything else but integer comparison should be faster than string comparison. My guess is that the database engine is converting the integer to string when storing it in the index file but I have no proof of that. Have you tried your testing with InnoDB?
-- Earnie -- http://r-feed.com/ -- http://for-my-kids.com/ -- http://www.4offer.biz/ -- http://give-me-an-offer.com/
Earnie Boyd wrote:
It makes no sense to me why a quoted numeric string is faster than a native integer unless the database engine is broken.
It is possible that the site has a less than optimal version of MySql, although I know they are fairly current. At the moment, I can't remember the actual query but I do remember it surprised me a lot. It was on something like 60K rows. I also remember that quoted made it use the index, while the unquoted did not. The funny thing is that, like everyone here gas said, the standard access throughout the site is unquoted. I may try to go in later and recreate the problem. Nancy E. Wichmann, PMP Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
Le mercredi 26 août 2009 à 08:30 -0400, Jamie Holly a écrit :
When writing queries they should be targeted for performance on all supported RDBMS, yet the emphasis should be put on MySQL. The simple fact that a vast (vast) majority of Drupal sites out there run MySQL is reason enough for that, and until there is some major cosmic shift in what hosting companies offer that's going to remain the fact.
Dear Jamie, I have been rewriting a generic guide for both MySQL and PostrgeSQL: Guidelines for writing efficient SQL code http://drupal.org/node/559302 These are very generic guidelines explaining why you should always log server-side queries, use Devel module and ANALYSE your queries. I also explain the notion of sequential scan and CPU time needed. Then I analyse simple facts using EXPLAIN.
Also on indexes you say "Therefore the proposed solution to enhance Drupal is to hunt for multi-column indexes and replace them with single indexes.". MySQL makes very effecient use of multi column indexes. On countless queries in Drupal changing to only single column indexes would result in full table scans, temporary tables and filesorts in numerous places.
About dual-field indexing: PostgreSQL and MySQL work the same way. Some databases claim that a dual index works equally right and left. This is not portable and probably not true, as multi-column indexing may result in a sequential scan in complex queries: Reference: According to http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html: MySQL cannot use an index if the columns do not form a leftmost prefix of the index. http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index. Therefore, when needed, it could be needed to add single indexes AND keep dual indexes.
Do LEFT JOINS outperform INNER JOINS? Yes Do INNER JOINS outperform LEFT JOINS? Yes
I don't believe so because of the nature of INNER JOINs which remove NULL rows and thus consume more CPU time. They only work faster on indexes on dual-fields when used properly. This needs to be studied using ANALYSE on very large SQL servers, not simple installs. Drupal website database probably is a good testing system. I hope that you can comment my code and improve the English and technical aspects. Kind regards, Jean-Michel
Dear Jean-Michel, Here http://drupal.org/node/559474 I think that is something wrong.. Say that the example: INNER JOINS .. "takes 400 ms" LEFT JOINS .. "executes in 60 second" 60 second isn't correct for sure.. -- ------------------------------------- Marco Sousa
Le mercredi 26 août 2009 à 16:17 +0200, Marco Sousa a écrit :
Here http://drupal.org/node/559474 I think that is something wrong..
Say that the example: INNER JOINS .. "takes 400 ms" LEFT JOINS .. "executes in 60 second"
60 second isn't correct for sure..
Dear Marco. 60 ms, thanks.
A good decision on wether to use multi-column indexes will take into account the cardinality of the data. Sometimes a multi-column index is simply the right answer, particularly if the values in the first column of the index do not limit the rows sufficiently. Also in where exists clauses the rdbms will not even load the row if the primary key is the composite index, but it will have to. Indexes are best thought of like caches, You use the right caching mechanism for the job, and you only do it when you can demonstrate performance improvement. The mysql reference is true but not a matter of concern in a large number of cases when composite keys are being used. Analyising complex queries is a good idea, but hunting out and eliminating multi-column indexes doesn't seem helpful. On Aug 26, 2009, at 6:40 AM, Jean-Michel Pouré wrote:
Le mercredi 26 août 2009 à 08:30 -0400, Jamie Holly a écrit :
When writing queries they should be targeted for performance on all supported RDBMS, yet the emphasis should be put on MySQL. The simple fact that a vast (vast) majority of Drupal sites out there run MySQL is reason enough for that, and until there is some major cosmic shift in what hosting companies offer that's going to remain the fact.
Dear Jamie,
I have been rewriting a generic guide for both MySQL and PostrgeSQL: Guidelines for writing efficient SQL code http://drupal.org/node/ 559302
These are very generic guidelines explaining why you should always log server-side queries, use Devel module and ANALYSE your queries.
I also explain the notion of sequential scan and CPU time needed.
Then I analyse simple facts using EXPLAIN.
Also on indexes you say "Therefore the proposed solution to enhance Drupal is to hunt for multi-column indexes and replace them with single indexes.". MySQL makes very effecient use of multi column indexes. On countless queries in Drupal changing to only single column indexes would result in full table scans, temporary tables and filesorts in numerous places.
About dual-field indexing: PostgreSQL and MySQL work the same way.
Some databases claim that a dual index works equally right and left. This is not portable and probably not true, as multi-column indexing may result in a sequential scan in complex queries:
Reference: According to
http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html:
MySQL cannot use an index if the columns do not form a leftmost prefix of the index.
http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
Therefore, when needed, it could be needed to add single indexes AND keep dual indexes.
Do LEFT JOINS outperform INNER JOINS? Yes Do INNER JOINS outperform LEFT JOINS? Yes
I don't believe so because of the nature of INNER JOINs which remove NULL rows and thus consume more CPU time. They only work faster on indexes on dual-fields when used properly.
This needs to be studied using ANALYSE on very large SQL servers, not simple installs. Drupal website database probably is a good testing system.
I hope that you can comment my code and improve the English and technical aspects.
Kind regards, Jean-Michel
Le mercredi 26 août 2009 à 07:25 -0700, David Metzler a écrit :
A good decision on wether to use multi-column indexes will take into account the cardinality of the data. Sometimes a multi-column index is simply the right answer, particularly if the values in the first column of the index do not limit the rows sufficiently. Also in where exists clauses the rdbms will not even load the row if the primary key is the composite index, but it will have to. Indexes are best thought of like caches, You use the right caching mechanism for the job, and you only do it when you can demonstrate performance improvement.
Thanks David, I added this comment on the page, which is a perfect summary. Kind regards, Jean-Michel
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jean-Michel Pouré schrieb:
Dear Friends,
I welcome your comments on this short guide for writing efficient SQL code in Drupal: http://drupal.org/node/559302
Feel free to start hunting for speed on the two issues mentioned (logging and LEFT JOINs). I will add several tips in the next days.
The forum code proved to be heavily broken and I could debug it in 4 hours. So it is very likely that the entire Drupal 6.dev is at stake. Probably Drupal 7 alpha too.
Because Drupal makes little use of LEFT JOINs in the code, it is likely that it cannot scale well. Hence a reputation to be slow. But it can change rapidly if you fix the code and apply the tips.
I will be back tomorrow with more tips. Hunt for speed now!
Your "tips" are mostly garbage. They may help certain queries under certain RDBMSses, but are not generally applicable. Cheers, Gerhard, who actually spent some time to _remove_ LEFT JOINS in Drupal 4.x because they were slower... P.S.: If you ever mention something like "we should leave mysql and go pgsql only because mysql is not a serious database" again, I'll block your account on d.o. There's enough stupid in the world, we don't need to give it a place on d.o. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkqUgOkACgkQfg6TFvELooRolACfbCk/RHEKDIEM8kGWOi4lEE3l 0vgAoLjKQ61D57Lc6l8mAnOvC53COsmj =vq0l -----END PGP SIGNATURE-----
Le mercredi 26 août 2009 à 02:25 +0200, Gerhard Killesreiter a écrit :
Your "tips" are mostly garbage. They may help certain queries under certain RDBMSses, but are not generally applicable.
I am removing specific MySQL and PostgreSQL information: http://drupal.org/node/559302 First page gives information about misleading Drupal stats: http://drupal.org/node/559320 Second page is a short introduction to EXPLAIN ANALYSE: http://drupal.org/node/559720 Third page is an introduction to using LEFT JOINs in place of INNER JOINs: http://drupal.org/node/559474 (I don't believe that INNER JOINs are faster than LEFT JOINs because an INNER JOIN removes NULL values and thus processes using sequential scans internaly) Indexes on dual fields may result in slow queries http://drupal.org/node/555558 Avoid Distinct on JOINS whenever possible Avoid using DISTINCT in JOINs whenever possible and write straight JOINs Kind regards, Jean-Michel
participants (11)
-
David Metzler -
Domenic Santangelo -
Earnie Boyd -
Gerhard Killesreiter -
Jamie Holly -
Jean-Michel Pouré -
larry@garfieldtech.com -
Marco Sousa -
Nancy Wichmann -
Narayan Newton -
Pierre Rineau