Hello, A while ago, someone (not me!) brought down our collocation server with a home made script which had many queries like this: SELECT * FROM ... I.e. the script was indiscriminately selecting everything from the table even though he might have needed only a few fields. This created heavy traffic between the SQL server and the web server which overwhelmed the platform. To be honest, I don't know the details (he might have been needlessly transferring whole images stored in the DB or something like this!) In Drupal 6.9, I count 120 occurrences of SELECT * FROM ... , not including SELECT COUNT(*) but including variations of SELECT n.*, pi.* FROM ... . Wouldn't it be a good idea to explicitly tell which fields are needed, in order to minimize traffic between SQL server and web server? The SQL coding standards say nothing about this: http://drupal.org/node/2497 Should SELECT * FROM be considered a kind of bug? Should this whole discussion be a 'won't fix' ? (why?) Or is this something worth considering in the issue queue? Blessings, Augustin.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, No actually in e-Commerce I have set standards where we do this where ever we can. this means that with very little code an external module can extend a table. Give that we use drupal_write_record() to write all records someone can use the hook_schema_alter() and the hook_form_alter() to add columns to a table. Take the ec_product table, we do a SELECT * from {ec_product} WHERE vid = %d. Then using the hook_form_alter() to add additional fields to the form to capture data. Other than validating that the field is correct when e-Commerce calls drupal_write_record() then entire node is past, which means no additional processing needs to be done, and the new custom fields will get saved as required. But you can have problems like what you describe is there number of columns is a large number. So if you are joining 20 odd tables with a SELECT * to return all fields can cause problems. It is really up to the developer to make sure they implement it correctly. Gordon. On 13/02/2009, at 4:08 PM, augustin (beginner) wrote:
Hello,
A while ago, someone (not me!) brought down our collocation server with a home made script which had many queries like this:
SELECT * FROM ...
I.e. the script was indiscriminately selecting everything from the table even though he might have needed only a few fields. This created heavy traffic between the SQL server and the web server which overwhelmed the platform. To be honest, I don't know the details (he might have been needlessly transferring whole images stored in the DB or something like this!)
In Drupal 6.9, I count 120 occurrences of SELECT * FROM ... , not including SELECT COUNT(*) but including variations of SELECT n.*, pi.* FROM ... .
Wouldn't it be a good idea to explicitly tell which fields are needed, in order to minimize traffic between SQL server and web server?
The SQL coding standards say nothing about this: http://drupal.org/node/2497
Should SELECT * FROM be considered a kind of bug? Should this whole discussion be a 'won't fix' ? (why?) Or is this something worth considering in the issue queue?
Blessings,
Augustin.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iEYEARECAAYFAkmVA3cACgkQngavurZvkryJJwCgip4b17BquRwlmoLUeMX3ZrKK yGkAmQGe4YusD6sU9umm0E1AEJvdgu5M =RKUH -----END PGP SIGNATURE-----
I would say "SELECT * FROM ..." is not a bug. While it wise to consider selecting only a set of fields if that is all that is needed, there are times when one wants all the fields in the table. This is particularly true when building an object/array from data in the database where on row represents all or part of the object/array. A good example are node objects where as developers we expect it to completely represent the node. Nevets
SELECT * is slower than listing out fields explicitly, because the database then needs to take the time to expand * out into the available fields before compiling the query. However, it's not so much slower that its use would kill the DB server unless you have a table with hundreds of fields that you're not using (or lots and lots of very large BLOB or LOB fields), in which case you have a bad database design to start with. So I don't believe that SELECT * would be solely responsible for melting your server. That said, for both that performance reason and for maintainability/readability of the code I generally discourage SELECT *, and try to avoid it myself. The D7 database layer allows it in static queries (which are almost entirely un-interpolated) and has a mechanism for supporting them in dynamic queries (fields('tablename') with no second parameter), but I still avoid it myself and encourage others to do so as well. So no, not a bug, but I do agree with discouraging it in most cases. The only time you'd want it is if you have a table where you expect the columns to be dynamic, which has all sorts of other maintenance issues anyway. On Thursday 12 February 2009 11:08:25 pm augustin (beginner) wrote:
Hello,
A while ago, someone (not me!) brought down our collocation server with a home made script which had many queries like this:
SELECT * FROM ...
I.e. the script was indiscriminately selecting everything from the table even though he might have needed only a few fields. This created heavy traffic between the SQL server and the web server which overwhelmed the platform. To be honest, I don't know the details (he might have been needlessly transferring whole images stored in the DB or something like this!)
In Drupal 6.9, I count 120 occurrences of SELECT * FROM ... , not including SELECT COUNT(*) but including variations of SELECT n.*, pi.* FROM ... .
Wouldn't it be a good idea to explicitly tell which fields are needed, in order to minimize traffic between SQL server and web server?
The SQL coding standards say nothing about this: http://drupal.org/node/2497
Should SELECT * FROM be considered a kind of bug? Should this whole discussion be a 'won't fix' ? (why?) Or is this something worth considering in the issue queue?
Blessings,
Augustin.
-- Larry Garfield larry@garfieldtech.com
On Fri, 13 Feb 2009 01:01:55 -0600 Larry Garfield <larry@garfieldtech.com> wrote:
So no, not a bug, but I do agree with discouraging it in most cases. The only time you'd want it is if you have a table where you expect the columns to be dynamic, which has all sorts of other maintenance issues anyway.
+1 It's always a risky bargain between extensibility, maintenance, performance and clean design. Tables should be as small as possible. It helps to keep a clear hierarchy of data and helps performances because it exploit more locality of the data. *Somehow* good design tend to produce queries that return most of the columns. When you extend tables defined upstream it means you don't trust upstream design decisions (and it will bite you) or you're very very tight with resources and you've been bitten by performance problems. Ideally I don't want to know why Drupal core decided to put those columns in users. I'd expect they made a general choice that help extend the user object and keep the base user object as thin as possible. I don't want to mess with their decisions, their API, their responsibility of the cost of building up a base user object, upgrading it to newer Drupal versions... *Dynamically* extended tables makes the cost of a service unpredictable. Service x add column A, B, C. Service y add column D, E, F. ... wooohp. Explicitly naming columns make refactoring apparently more boring but definitively more under control. It promotes better coding making people more aware of what they're doing. It makes easier to deploy tests and it makes easier to find places where you've to refactor. As with set/get methods in OOP it makes easier to hide where data are actually coming from. It gives an order to columns making select easier to reuse correctly. Not only it helps to spot errors, but it spots them earlier. The first time you'll have to refactor 100 places where you had to add a column (say you moved from name -> name + surname) it will give you a clue there is something that doesn't work with your design... but at least it will tell you all the 100 places you've to refactor. Every time you write a SELECT * it is a missed chance to use grep as a debugging tool. This alone is worth getting rid of all the SELECT * and even n.* And if it is not for debugging purposes it helps to understand the code (especially for contrib). Where does this stuff come from/is used...? let's grep it. -- Ivan Sergio Borgonovo http://www.webthatworks.it
Should SELECT * FROM be considered a kind of bug?
I personally consider it a bug, but I still haven't yet trained my fingers to do it the proper way. I subscribe to the logic from: http://www.parseerror.com/sql/select*isevil.html -- Morbus Iff ( i'm the droid you're looking for ) Technical: http://www.oreillynet.com/pub/au/779 Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
I think it's a bug to do it automatically - I think it's a good default to list the fields you need. But, if you have a reason to use SELECT * FROM... then definitely use it. -- John Fiala
Although there are merits in this argument, I wouldn't classify it as a bug. There's a lot of generic/abstracted data loading, and as long as there's code at there that dynamically adds columns, select * actually is the sanest way to do things. Also, the performance costs are database dependent. Mysql may work one way, but other db's work another. Any DB with precompiled cached queries is not going to carry significant parsing overhead. The amount of data transfered in most cases is dependent on the size of the data in the column and not the number of columns. So in some tables it may make sense. You might get the name of the image file in drupal, but you're not likely to get the image. Finally in my experience, most database performance problems lie in what is in the WHERE or JOIN, and not what's in the column list. Dave On Feb 13, 2009, at 4:37 AM, Morbus Iff wrote:
Should SELECT * FROM be considered a kind of bug?
I personally consider it a bug, but I still haven't yet trained my fingers to do it the proper way. I subscribe to the logic from:
http://www.parseerror.com/sql/select*isevil.html
-- Morbus Iff ( i'm the droid you're looking for ) Technical: http://www.oreillynet.com/pub/au/779 Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
I give *little* regard for performance (in this case), and everything for my *expectations* and *documentation*. I could care less if it's faster or not. I care more about clarity. David Metzler wrote:
Although there are merits in this argument, I wouldn't classify it as a bug. There's a lot of generic/abstracted data loading, and as long as there's code at there that dynamically adds columns, select * actually is the sanest way to do things.
Also, the performance costs are database dependent. Mysql may work one way, but other db's work another. Any DB with precompiled cached queries is not going to carry significant parsing overhead. The amount of data transfered in most cases is dependent on the size of the data in the column and not the number of columns. So in some tables it may make sense. You might get the name of the image file in drupal, but you're not likely to get the image.
Finally in my experience, most database performance problems lie in what is in the WHERE or JOIN, and not what's in the column list.
-- Morbus Iff ( masochism-oriented recombinant bot (unlisted series) ) Technical: http://www.oreillynet.com/pub/au/779 Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
I'm with Morbus on the clarity argument. And I also argue for better performance. It's not just the extra time the server needs to find the column names, but all of the extra data being buffered and sent over the wire. On large sites with load-balanced web server front ends, and mirrored or slaved database servers on the backend, operations people are usually trying to squeeze out every inch of performance possible from the site/database combined performance. Even though it may add only a millisecond per operation, when there are millions of operations being done, it quickly adds up. I'm well aware of the arguments against premature optimization, but given the arguments which Morbus presented, there's very little reason to not avoid "SELECT * " except when absolutely needed. ..chris On Fri, Feb 13, 2009 at 9:15 AM, Morbus Iff <morbus@disobey.com> wrote:
I give *little* regard for performance (in this case), and everything for my *expectations* and *documentation*. I could care less if it's faster or not. I care more about clarity.
David Metzler wrote:
Although there are merits in this argument, I wouldn't classify it as a bug. There's a lot of generic/abstracted data loading, and as long as there's code at there that dynamically adds columns, select * actually is the sanest way to do things.
Also, the performance costs are database dependent. Mysql may work one way, but other db's work another. Any DB with precompiled cached queries is not going to carry significant parsing overhead. The amount of data transfered in most cases is dependent on the size of the data in the column and not the number of columns. So in some tables it may make sense. You might get the name of the image file in drupal, but you're not likely to get the image.
Finally in my experience, most database performance problems lie in what is in the WHERE or JOIN, and not what's in the column list.
-- Morbus Iff ( masochism-oriented recombinant bot (unlisted series) ) Technical: http://www.oreillynet.com/pub/au/779 Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
I would really like to hear David Strauss's comments on this thread. David, if you are out there, care to chime in? -Mike __________________ Michael Prasuhn mike@mikeyp.net http://mikeyp.net
I would really like to hear David Strauss's comments on this thread. David, if you are out there, care to chime in?
There could be only one reason to chime his name, and that would be for his inner knowledge of MySQL and the effects this may have on perf. While I'm sure I'll learn something on a tangential curve, it'd have little effect on why /I/ consider them not OK. -- Morbus Iff ( cheese and rice saves ) Technical: http://www.oreillynet.com/pub/au/779 Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
One way of both specifying the columns, and accounting for dynamic schema, is http://api.drupal.org/api/function/drupal_schema_fields_sql - which you can then build your query from. I'd imagine the hit to the schema is going to cost more than the MySQL internals for SELECT *, but otherwise it lets you get the best of both worlds in terms of correctness. Nat
Thanks everybody for your comments. I'm glad I asked! It seems that there is a small consensus to at least discourage its use where it's not actually necessary. Given the response so far, there is ground for further discussion on this topic. Thus, I have created a new sub-page to the SQL coding standard here: http://drupal.org/node/374660 Obviously, it needs to be completed. Also, I have counted the number of occurrences of SELECT * or SELECT x.* in Drupal 7 (HEAD), and I found almost 150 of them! So, this is the issue to refine the SQL coding standard by discussing actual code examples found in Drupal core: http://drupal.org/node/374661 Blessings, Augustin.
I would definitely agree here. Selecting from watchdog definitely deserves to have the columns listed. Selecting from taxonomies and categories are the same. But there are places in CCK and views that select * might be considered more appropriate. As a developer I'd love to see such comments in a patch review or code review, but I wouldn't love to see a bunch of issues filed because someone grepped my code for select *. I was really pointing out the fallacy of the select * = poor performance argument, and that you might do as well to find out how many ROWS were returned by the select statement in question, plus start searching for outer joins ,etc. And don't event get me started on insert statements without column lists.. dave On Feb 13, 2009, at 7:15 AM, Morbus Iff wrote:
I give *little* regard for performance (in this case), and everything for my *expectations* and *documentation*. I could care less if it's faster or not. I care more about clarity.
David Metzler wrote:
Although there are merits in this argument, I wouldn't classify it as a bug. There's a lot of generic/abstracted data loading, and as long as there's code at there that dynamically adds columns, select * actually is the sanest way to do things. Also, the performance costs are database dependent. Mysql may work one way, but other db's work another. Any DB with precompiled cached queries is not going to carry significant parsing overhead. The amount of data transfered in most cases is dependent on the size of the data in the column and not the number of columns. So in some tables it may make sense. You might get the name of the image file in drupal, but you're not likely to get the image. Finally in my experience, most database performance problems lie in what is in the WHERE or JOIN, and not what's in the column list.
-- Morbus Iff ( masochism-oriented recombinant bot (unlisted series) ) Technical: http://www.oreillynet.com/pub/au/779 Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
One additional (bugworthy) note: Using SELECT * on a {node} query breaks db_rewrite_sql() in Drupal 5 and Drupal 6. [1] This opens a quasi-security hole where content may be shown to non-privileged users. This is due to the fact that node_db_rewrite_sql() adds 'distinct' == TRUE, and you cannot run DISTINCT(*) queries. So, in the case of node list queries, SELECT * is _definitely_ a bug. - Ken Rickard Palantir.net [1] -- http://drupal.org/node/324070 On Fri, Feb 13, 2009 at 10:23 PM, David Metzler <metzlerd@metzlerd.com> wrote:
I would definitely agree here. Selecting from watchdog definitely deserves to have the columns listed. Selecting from taxonomies and categories are the same. But there are places in CCK and views that select * might be considered more appropriate. As a developer I'd love to see such comments in a patch review or code review, but I wouldn't love to see a bunch of issues filed because someone grepped my code for select *.
I was really pointing out the fallacy of the select * = poor performance argument, and that you might do as well to find out how many ROWS were returned by the select statement in question, plus start searching for outer joins ,etc.
And don't event get me started on insert statements without column lists..
dave
On Feb 13, 2009, at 7:15 AM, Morbus Iff wrote:
I give *little* regard for performance (in this case), and everything for my *expectations* and *documentation*. I could care less if it's faster or not. I care more about clarity.
David Metzler wrote:
Although there are merits in this argument, I wouldn't classify it as a bug. There's a lot of generic/abstracted data loading, and as long as there's code at there that dynamically adds columns, select * actually is the sanest way to do things. Also, the performance costs are database dependent. Mysql may work one way, but other db's work another. Any DB with precompiled cached queries is not going to carry significant parsing overhead. The amount of data transfered in most cases is dependent on the size of the data in the column and not the number of columns. So in some tables it may make sense. You might get the name of the image file in drupal, but you're not likely to get the image. Finally in my experience, most database performance problems lie in what is in the WHERE or JOIN, and not what's in the column list.
-- Morbus Iff ( masochism-oriented recombinant bot (unlisted series) ) Technical: http://www.oreillynet.com/pub/au/779 Enjoy: http://www.disobey.com/ and http://www.videounderbelly.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
-- Ken Rickard agentrickard@gmail.com http://ken.therickards.com
participants (12)
-
augustin (beginner) -
Chris Johnson -
David Metzler -
Gordon Heydon -
Ivan Sergio Borgonovo -
John Fiala -
Ken Rickard -
Larry Garfield -
Michael Prasuhn -
Morbus Iff -
Nathaniel Catchpole -
Steve Ringwood