Re: [development] Database / SQL future thoughts
Perhaps before this discussion gets too far astray -- or maybe such digression is impossible to avoid! :-) My point in posting this was to elicit folks who were interested in the database usage and performance for the core parts of Drupal, and in particular the folks who I through the past years have observed as being especially astute and involved (and if I left your name off that short list, it's only because I forgot your name or other such failure on my part -- no slight or insult intended). I was particularly hopeful that folks would read the comments, and take to heart those written in favor of SQL, such as those by Robert Young. I didn't want to bias the conversation by injecting my opinion too early, but I'm in the camp that says things like BigTable are really just reinventions of older technology, AND that the death of RDBMS/SQL is not nearly as soon as some might think. For Drupal, I think it may make sense to attempt to make the SQL as simple as possible, avoiding the really complex capabilities and doing more of that work in PHP, where most of the developers actually have some expertise. This is especially true since MySQL's MyISAM table engine is really poor at doing anything beyond that. If we were to move to using MVCC[1] engines like Postgres, Oracle and MySQL with InnoDB or Falcon, then more complex SQL would be useful. Unfortunately, most Drupal developers don't have the SQL skills needed, and even worse, most people are not going to have the ability to physically optimize the database for performance (i.e. skilled DBA person, access to MySQL settings at most hosting services, etc.). Hence, the former course of using only the simplest, most primitive SQL statements appeals to me for Drupal. It's that last point I'd really like to have criticized. Is it possible to do that? Especially in light of DBTNG, where is the line between simple statements and complex statements? Lastly, although many Drupal-based sites are database limited, many are PHP-code interpretation (# of files/lines parsed) or bandwidth (fast JS/image pages) limited, too. My goal is simple: make my hundred module websites respond in less than 2 seconds! :-D Thanks for participating in this discussion, no matter which tangent you're on. ..chris [1] http://en.wikipedia.org/wiki/Multiversion_concurrency_control
Chris Johnson wrote: *snip*
Hence, the former course of using only the simplest, most primitive SQL statements appeals to me for Drupal.
It's that last point I'd really like to have criticized. Is it possible to do that? Especially in light of DBTNG, where is the line between simple statements and complex statements?
Simple vs. complex is not a binary statement anyway, but a continuum. In D7, some queries are more verbose than they used to be (complex update statements, for instance) but a lot of more horrifically complex queries are now quite easy. So we've effectively flattened the curve and moved most DB interaction into the "Can't click and drool but you can do a lot more" realm. The other advantage is that we have, at least, points at which DB-specific optimizations can be made. Mostly we're only doing that for modifier queries (insert and merge, specifically), because that's where the SQL syntax differs the most. Restructuring a SELECT statement is an order of magnitude harder, and as the article notes much of it has to be done on the DB side anyway, not the application side. (Some queries simply cannot avoid filesort in MySQL; other DBs may offer workarounds.) The other catch is that Drupal's heavily-modifiable design often precludes "one query to rule them all" design, as that requires knowing everything you'll need before hand. With modules and hooks, you almost never do. See also this great read: http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx Really, I see "object store" databases (CouchDB and friends) as simply solving a different problem space that formerly was crammed into SQL out of "it's the only tool available". RDBMSs are still around, and will remain around, for a LOOOONG time. Just look at a typical file system on your computer. I'd bet good money it's hierarchical, despite the supposed advantages of "huge-data-soup-with-craploads-of-indexes" architecture. However, that exact architecture ran the Palm OS for years. The author may or may not be right about Oracle and MySQL, but either way I'm not worried that SQL databases will go away any time in the next decade. They may have reached a maturity point and not change dramatically (dear god I hope so), but SQL is not going anywhere. There may be advantages to having an "object store" general front-end that can then be connected to CouchDB or similar (hm, another handlers use case...), but that's separate from having an SQL interface in core. They're just different tools for different problems. For most of our problems, simple relational models are perfectly fine.
Lastly, although many Drupal-based sites are database limited, many are PHP-code interpretation (# of files/lines parsed) or bandwidth (fast JS/image pages) limited, too. My goal is simple: make my hundred module websites respond in less than 2 seconds! :-D
With the registry now in place we really need to start properly splitting up core modules to reduce code weight. :-) --Larry Garfield
Hey Chris, I'm probably not the person you aimed this discussion at but I do have something that might be relevant. (I'm just a lowly contrib developer, but I do have a few years of drupal dev under my belt). If I understand correctly where Drupal is going with storage abstraction, I can't see Bigtable as a viable replacement for SQL, so I think I'm in the same camp as you, but I do see folks starting to get interested in using the right storage mechanism for the right task. The PDO SQLLite discussion we had a while ago was I think somewhat analogous. (although admittedly not the same) I could see using some kind of BigTable style storage for storing the cached data for drupal and therefore scaling cached pages across servers. Then the cache wouldn't rely strongly on the important relational data in a backend mysql or postgres engine. This seems like a productive use of those kind of storage techniques. I could even see caching common search results using the cache api in the search module (if it doesn't already). There are many in this community who will, rightly so, express concern about the added complexity of relying on multiple storage engines, but this is one case where I think the results might be worth it. Dave On May 5, 2009, at 9:23 AM, Chris Johnson wrote:
Perhaps before this discussion gets too far astray -- or maybe such digression is impossible to avoid! :-)
My point in posting this was to elicit folks who were interested in the database usage and performance for the core parts of Drupal, and in particular the folks who I through the past years have observed as being especially astute and involved (and if I left your name off that short list, it's only because I forgot your name or other such failure on my part -- no slight or insult intended).
I was particularly hopeful that folks would read the comments, and take to heart those written in favor of SQL, such as those by Robert Young. I didn't want to bias the conversation by injecting my opinion too early, but I'm in the camp that says things like BigTable are really just reinventions of older technology, AND that the death of RDBMS/SQL is not nearly as soon as some might think.
For Drupal, I think it may make sense to attempt to make the SQL as simple as possible, avoiding the really complex capabilities and doing more of that work in PHP, where most of the developers actually have some expertise. This is especially true since MySQL's MyISAM table engine is really poor at doing anything beyond that. If we were to move to using MVCC[1] engines like Postgres, Oracle and MySQL with InnoDB or Falcon, then more complex SQL would be useful. Unfortunately, most Drupal developers don't have the SQL skills needed, and even worse, most people are not going to have the ability to physically optimize the database for performance (i.e. skilled DBA person, access to MySQL settings at most hosting services, etc.).
Hence, the former course of using only the simplest, most primitive SQL statements appeals to me for Drupal.
It's that last point I'd really like to have criticized. Is it possible to do that? Especially in light of DBTNG, where is the line between simple statements and complex statements?
Lastly, although many Drupal-based sites are database limited, many are PHP-code interpretation (# of files/lines parsed) or bandwidth (fast JS/image pages) limited, too. My goal is simple: make my hundred module websites respond in less than 2 seconds! :-D
Thanks for participating in this discussion, no matter which tangent you're on.
..chris
[1] http://en.wikipedia.org/wiki/Multiversion_concurrency_control
On Tuesday 05 May 2009 8:39:08 pm David Metzler wrote:
Hey Chris,
I'm probably not the person you aimed this discussion at but I do have something that might be relevant. (I'm just a lowly contrib developer, but I do have a few years of drupal dev under my belt).
If I understand correctly where Drupal is going with storage abstraction, I can't see Bigtable as a viable replacement for SQL, so I think I'm in the same camp as you, but I do see folks starting to get interested in using the right storage mechanism for the right task. The PDO SQLLite discussion we had a while ago was I think somewhat analogous. (although admittedly not the same)
I could see using some kind of BigTable style storage for storing the cached data for drupal and therefore scaling cached pages across servers. Then the cache wouldn't rely strongly on the important relational data in a backend mysql or postgres engine. This seems like a productive use of those kind of storage techniques. I could even see caching common search results using the cache api in the search module (if it doesn't already).
There are many in this community who will, rightly so, express concern about the added complexity of relying on multiple storage engines, but this is one case where I think the results might be worth it.
Dave
In many many cases, the storage engine depends on your scale. For the run of the mill Drupal site, for instance, MySQL-based caching is perfectly fine and is also the lowest barrier to entry (since the main system runs on SQL). For higher end sites, memcache, CouchDB, Flatfile, or whatever else could make more sense. Fortunately those high end sites are also the ones run by people who, if we provide them with proper mechanisms to swap out backends cleanly, would know how to do so and why. So we just need to make more key parts of Drupal pluggable and those who actually care about such differences can leverage them as appropriate, and those who don't need to care won't have to. -- Larry Garfield larry@garfieldtech.com
On May 5, 2009, at 9:55 PM, Larry Garfield wrote:
So we just need to make more key parts of Drupal pluggable and those who actually care about such differences can leverage them as appropriate, and those who don't need to care won't have to.
This principle is probably important enough to be enshrined as one of Drupal's core values. --Jeff
On Tue, May 5, 2009 at 10:40 PM, Jeff Eaton <jeff@viapositiva.net> wrote:
On May 5, 2009, at 9:55 PM, Larry Garfield wrote:
So we just need to make more key parts of Drupal pluggable and those who actually care about such differences can leverage them as appropriate, and those who don't need to care won't have to.
This principle is probably important enough to be enshrined as one of Drupal's core values.
--Jeff
Excellent point, Jeff. If we provide pluggable subsystems and good APIs, then people can use the "right tool" for the job. I'd like to learn more about how CouchDB and Tokyo Cabinets can be scaled horizontally more easily than SQL. As many of us know, it's not easy with engines like MySQL and Postgres. Surely other technologies would face similar hurdles to ensuring coherency and reliability of the data, and hence would likewise have technical challenges in scaling -- wouldn't they? Thanks for directing attention to this presumed advantage, Kyle. David Metzler -- everyone is welcome to join the conversation, of course -- else why post to a list with thousands of subscribers? Thanks for your input. ..chris
Hi guys, you find a outstanding presentation about the usage of datatypes, storage engines, replication, query optimization .... from the ZendCon2008 http://devzone.zend.com/article/4497 includes slides and audio. Best Thomas Zahreddin With more Code than you can handle, don't ask for more code, ask for more cooperation.
participants (6)
-
Chris Johnson -
David Metzler -
Jeff Eaton -
Larry Garfield -
larry@garfieldtech.com -
Thomas Zahreddin