Referential integrity -- finally?

newer
Re: [development] Drag and Drop UI...

Dries Buytaert

19 Jan 2007 19 Jan '07

8:45 p.m.

Hello world, Fact: Drupal 6 will finally drop support for MySQL 3.x. The good news is that we can _finally_ start caring about referential integrity. As of MySQL 4, you can define foreign keys, and even though they don't actually work with MyISAM tables they would work with InnoDB tables. Plus, the PostgreSQL community would love us for it. To make a long story short: more DBAs and/or business critical websites would take us seriously. In other words: I'd happily accept patches that add foreign keys to Drupal tables. It woud not be perfect as we can't use cascading deletes, for example. Nevertheless, it would be a great step in the right direction -- assuming we can make it all work in a SQL- compliant way without leaving MyISAM users in the cold. If you don't know what referential integrity is, please read up on it: http://www.databasejournal.com/features/mysql/print.php/2248101 http://www.databasejournal.com/features/mysql/print.php/ 10897_2248101_2 If you want to make a contribution to Drupal but you don't know where to start, this would be a great project to wetten your toes in Drupal! -- Dries Buytaert :: http://www.buytaert.net/

Show replies by date

Morbus Iff

20 Jan 20 Jan

1 a.m.

New subject: [development] Referential integrity -- finally?

...

In other words: I'd happily accept patches that add foreign keys to Drupal tables. It woud not be perfect as we can't use cascading deletes, for example. Nevertheless, it would be a great step in the right direction -- assuming we can make it all work in a SQL-compliant way without leaving MyISAM users in the cold.

I'm personally not a fan of it, primarily because it caused me headaches in the past. I had to deal with a rather large import to CiviCRM, which is all InnoDB and foreign keys and, for speed reasons (2 million records), could not use their API. This forced me to add the records to the tables manually. And, since the import data was always changing, I had to code the importer in such a way that it could start over - delete all the import data and prepare the database for a fresh import. Foreign keys added tons and tons of time to this - both for the import (though, I have no true facts to back this up) and for the actual deletion, since the rows had to be deleted in quite a specific way, otherwise the foreign keys would complain about splintered data. Likewise, all the foreign keys in CiviCRM actively caused lookups to slow down tremendously - for a 2 million person database (with roughly three tables having 2 million rows, and one having nearly 10 million), I eventually had to remove all the foreign key constraints (along with a number of other optimizations that were contributed back to dlobo and CiviCRM 1.5). To some degree, "referential integrity" sounds all awesome and "yay, save me from myself" but, really, Drupal's been working just fine without it, and I see no huge reason to just "add it on". And, for the sake of those not on infra, my comments from there: For what it's worth, I'm not a fan of InnoDB myself. A default configuration of InnoDB uses a single database file that can grow quite large (2GB+ for one of my CiviCRM installations), and then there's all the index-y and cache tables (whose purpose I never really grasped). Unless you have binary logging turned on (which, in turn, create more files), you have a single point of failure. Single points of failures are never a good thing. Yes, there's a configuration option in newer MySQLs to force InnoDB back to one file per table (as per MyISAM). Still, the likelihood of any of our common users being able to tweak that IS NULL. -- Morbus Iff ( i still fail to see what this has to do with morocco ) Technical: http://www.oreillynet.com/pub/au/779 Culture: http://www.disobey.com/ and http://www.gamegrene.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus

FGM

1:48 a.m.

New subject: [development] Referential integrity -- finally?

One thing I noticed when working with Firebird is for mass imports, it is almost always much faster to lock the DB, deactivate keys and indexes, perform the mass operations, and reactivate all once you're over: rebuilding the constraints once beats maintaining them on a large set of operations. I suspect the picture could look the same with MySQL. But of course, it would have to be benchmarked. ----- Original Message ----- From: "Morbus Iff" <morbus@disobey.com> To: <development@drupal.org> Sent: Saturday, January 20, 2007 1:00 AM Subject: Re: [development] Referential integrity -- finally? [...]

...

I'm personally not a fan of it, primarily because it caused me headaches in the past. I had to deal with a rather large import to CiviCRM, which is all InnoDB and foreign keys and, for speed reasons (2 million records), could not use their API. This forced me to add the records to the tables manually. And, since the import data was always changing, I had to code the importer in such a way that it could start over - delete all the import data and prepare the database for a fresh import. Foreign keys added tons and tons of time to this - both for the import (though, I have no true facts to back this up) and for the actual deletion, since the rows had to be deleted in quite a specific way, otherwise the foreign keys would complain about splintered data. [...]

Morbus Iff

2:04 a.m.

New subject: [development] Referential integrity -- finally?

...

One thing I noticed when working with Firebird is for mass imports, it is almost always much faster to lock the DB, deactivate keys and indexes, perform the mass operations, and reactivate all once you're over: rebuilding the constraints once beats maintaining them on a large set of operations.

I can attest to this, regardless of benchmarks :) -- Morbus Iff ( keep out of reach of children ) Technical: http://www.oreillynet.com/pub/au/779 Culture: http://www.disobey.com/ and http://www.gamegrene.com/ aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus

Khalid B

22 Jan 22 Jan

5:25 a.m.

New subject: [development] Referential integrity -- finally?

On 1/19/07, Morbus Iff <morbus@disobey.com> wrote:

...

...
One thing I noticed when working with Firebird is for mass imports, it is almost always much faster to lock the DB, deactivate keys and indexes, perform the mass operations, and reactivate all once you're over: rebuilding the constraints once beats maintaining them on a large set of operations.

I can attest to this, regardless of benchmarks :)

Me too. I recall in Oracle and Teradata (a datawarehousing DB), it was faster to drop the indexes altogether, do the mass inserts/updates/whatever, then rebuild the indexes after the fact.

Dries Buytaert

20 Jan 20 Jan

4:56 p.m.

New subject: [development] Referential integrity -- finally?

On 20 Jan 2007, at 01:00, Morbus Iff wrote:

...

To some degree, "referential integrity" sounds all awesome and "yay, save me from myself" but, really, Drupal's been working just fine without it, and I see no huge reason to just "add it on".

And then, the DBA said to the PHP developer: "Hey, my PHP code is spaghetti code, but it is working just fine. I see no huge reason to make it readable". Or says the security expert to the PHP developer: "Hey, I don't sanitize my inputs, but it is working just fine. I see no huge reason to make it secure". There are people that care deeply about clean and readable code, people that care deeply about secure code, and people that care deeply about the correctness of their data. When talking with people of the PostgreSQL community, for example, it is clear that they steer away from Drupal, because we don't take their industry serious. By adding support for referential integrity, we're reaching out to people that know more about databases. It wouldn't hurt to have some database experts in our community. -- Dries Buytaert :: http://www.buytaert.net/

Sammy Spets

21 Jan 21 Jan

8:02 a.m.

New subject: [development] Referential integrity -- finally?

I've been quiet on this topic thus far and figured that since i'm maintaining PostgreSQL for core, I might as well speak up. I agree that ref integrity is important for any respectable application. There are the occasions with Drupal (when developing mainly) where a query goes in with an ID of zero. It can go un-noticed for sometime as well. +1 to referential integrity. It'll involve a lot of effort to ensure queries are ordered correctly (most of them should be) and their success will need to be tested. No small feat. -- <spam> -- It's also funny to see how many people out there disregard PostgreSQL as a usable DBMS. They seem to think MySQL is the bee's knees of free DBMS's. True: they install MySQL at many ISPs. I believe the main reason for that initially was because of phpMyAdmin. phpPgAdmin now exists and allows people to gain access to the _better_ DBMS. As far as recommending a DBMS for new sites hosted on VPS's or dedicated boxes goes. I'd cheer out a very loud and unbiased "PostgreSQL". Last year saw many modules gain PostgreSQL capabilities. Some patches I submitted _still_ haven't been committed. Bummer eh? -- <spam> -- -- Sammy Spets Synerger Pty Ltd http://synerger.com On 20-Jan-07 16:56, Dries Buytaert wrote:

...

On 20 Jan 2007, at 01:00, Morbus Iff wrote:

...
To some degree, "referential integrity" sounds all awesome and "yay, save me from myself" but, really, Drupal's been working just fine without it, and I see no huge reason to just "add it on".

And then, the DBA said to the PHP developer: "Hey, my PHP code is spaghetti code, but it is working just fine. I see no huge reason to make it readable".

Or says the security expert to the PHP developer: "Hey, I don't sanitize my inputs, but it is working just fine. I see no huge reason to make it secure".

There are people that care deeply about clean and readable code, people that care deeply about secure code, and people that care deeply about the correctness of their data.

When talking with people of the PostgreSQL community, for example, it is clear that they steer away from Drupal, because we don't take their industry serious. By adding support for referential integrity, we're reaching out to people that know more about databases. It wouldn't hurt to have some database experts in our community.

-- Dries Buytaert :: http://www.buytaert.net/

adrian rossouw

22 Jan 22 Jan

11:31 a.m.

New subject: [development] Referential integrity -- finally?

On 20 Jan 2007, at 5:56 PM, Dries Buytaert wrote:

...

When talking with people of the PostgreSQL community, for example, it is clear that they steer away from Drupal, because we don't take their industry serious. By adding support for referential integrity, we're reaching out to people that know more about databases. It wouldn't hurt to have some database experts in our community.

Maintaining schemas is still a lot of work,. and i fear adding another layer of complexity to it will start making it unmanageable. Oracle, MSSQL, db2, mysql, sql lite, postgresql. This means we're going to be doing essentially the same work 6 times over. And that's only for core, the majority of contrib modules will still not work on other db's. I also seriously doubt that just adding referential integrity is going to draw lots of postgresql experts out of the ether.. -- Adrian

Sammy Spets

11:48 p.m.

New subject: [development] Referential integrity -- finally?

Good point Adrian, I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code. What do you guys think of this? -- Sammy Spets Synerger Pty Ltd http://synerger.com On 22-Jan-07 12:31, adrian rossouw wrote:

...

On 20 Jan 2007, at 5:56 PM, Dries Buytaert wrote:

When talking with people of the PostgreSQL community, for example, it is clear that they steer away from Drupal, because we don't take their industry serious. By adding support for referential integrity, we're reaching out to people that know more about databases. It wouldn't hurt to have some database experts in our community.

Maintaining schemas is still a lot of work,. and i fear adding another layer of complexity to it will start making it unmanageable.

Oracle, MSSQL, db2, mysql, sql lite, postgresql.

This means we're going to be doing essentially the same work 6 times over. And that's only for core, the majority of contrib modules will still not work on other db's.

I also seriously doubt that just adding referential integrity is going to draw lots of postgresql experts out of the ether..

-- Adrian

Earnest Berry III

23 Jan 23 Jan

12:02 a.m.

New subject: [development] Referential integrity -- finally?

I like it. Now we just need people to implement that layer. It would be nice if someone would create a script that would run through a set of tables and create that code for you. What I mean by this is, for me anyway, when I'm rapidly developing a module, especially one that has a lot of tables, the most tedious thing to do is go back, and place all my create statements in db_query, and in quotes, and then format it all for my install script. I think it would help module developers move to the new abstraction layer faster and easier it you could give it a list of tables, and it would create the php.install (the bulk of it anyway) for you. E.g. webhicks module builder. It builds the general frame work for you for a module. - Earnest Sammy Spets wrote:

...

Good point Adrian,

I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code.

What do you guys think of this?

Khalid B

2 a.m.

New subject: [development] Referential integrity -- finally?

There was a discussion a while back where Adrian proposed using array data to abstract the database creation. Each column and its attributes (number, text, ...etc.) will become like FormAPI is today. This removes the overhead of maintaining two (or more) separate schemas in the install. Here is the discussion http://lists.drupal.org/archives/development/2006-05/msg00456.html Here is the issue for it http://drupal.org/node/63049

adrian rossouw

8:49 a.m.

New subject: [development] Referential integrity -- finally?

On 23 Jan 2007, at 12:48 AM, Sammy Spets wrote:

...

I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code.

What do you guys think of this?

http://drupal.org/node/63049 Dries didn't like it. I seriously think we have no other choice.

Gerhard Killesreiter

1:04 p.m.

New subject: [development] Referential integrity -- finally?

Sammy Spets wrote:

...

Good point Adrian,

I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code.

What do you guys think of this?

Problem is that Dries doesn't like this _at_ _all_. :p His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL. While his argument has merit, I think it is put ad absurdum by the fact, that Drupal doesn't have any plain sql files anymore. Cheers, Gerhard

Derek Wright

6:14 p.m.

New subject: [development] Referential integrity -- finally?

On Jan 23, 2007, at 4:04 AM, Gerhard Killesreiter wrote:

...

His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

the main problem with this argument is the there's no "Standard" in the "Standard Query Language" for table creation and schema modifications across all the databases. so, while anyone who knows any variant of SQL (preferably the ANSI standard) can write SELECT and UPDATE and JOIN and the rest, you *have* to have DB-specific knowledge to create tables and add/remove/alter indexes or columns. to me, the lack of a standard here is the central point -- if there was a standard a) our situation would be totally different and b) we wouldn't need an abstraction layer. IMHO, it's unwise to expect every drupal developer to learn all the DB-specific quirks of creating tables on *every* possible DB we're planning to support. furthermore, it's crazy to expect that they're going to be able to test on all of them, since for example, i have no interest in paying top dollar for an oracle license anytime soon. ;) but, if the people who care about each of these DBs can spend the time to write, test, and maintain a small layer of DB abstraction functions that at least properly create/modify schemas, and drupal developers mostly stick to ANSI SQL everywhere else, we're 95% done with supporting whatever DBs people want to use. -derek p.s. relatively speaking, table creation and schema modification is small percentage of the SQL queries in drupal. basically, it's everything in the .install files, but nothing in the .module or .inc files. so, having a reasonable DB abstraction layer for the .install files doesn't change the fact that most of drupal development can still happen in "raw" SQL (except, of course, for all the drupalisms we already have, like {table_name}, db_query() %d and %s handling, db_rewrite_sql(), etc). p.p.s. if people really don't want to be bothered, there'd be nothing stopping them (except peer pressure) from continuing to use DB- specific schema queries directly in their .install files. so, if new drupal developers are doing a custom module for a site that will only ever run on some DB, and they don't care about writing portable code, they don't have to learn anything new and there's no further barrier of entry. however, it they're going to become responsible maintainers of important contribs, they're already expected to learn more than nothing, and in this case, a layer of code that hides the ugly details about how oracle vs. postgres vs. mssql vs. sqlite vs. mysql vs. whatever creates tables would be welcomed as a blessing, not cursed as another weird API to learn.

Earnest Berry III

7:29 p.m.

New subject: [development] Referential integrity -- finally?

I like Derek's point. Most of this discussion is for minor/major module/contrib developers who want to on some level, support more DB's. Writing my own "Hello World" or "Quote of the Day" module for my own liking is not something that we are including in this discussion. So I would def. welcome the abstraction layer. Perhaps we should start getting some sort of road map together in this thread? I think we're all in agreement that: 1. RI should be included in core in 6 2. A new DB abstration layer for schema creation/modification should be created and/or made avail. 3. Go Colts! (oh, sorry, that's my own thing :), scratch #3) Derek Wright wrote:

...

On Jan 23, 2007, at 4:04 AM, Gerhard Killesreiter wrote:

...
His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

the main problem with this argument is the there's no "Standard" in the "Standard Query Language" for table creation and schema modifications across all the databases. so, while anyone who knows any variant of SQL (preferably the ANSI standard) can write SELECT and UPDATE and JOIN and the rest, you *have* to have DB-specific knowledge to create tables and add/remove/alter indexes or columns. to me, the lack of a standard here is the central point -- if there was a standard a) our situation would be totally different and b) we wouldn't need an abstraction layer.

IMHO, it's unwise to expect every drupal developer to learn all the DB-specific quirks of creating tables on *every* possible DB we're planning to support. furthermore, it's crazy to expect that they're going to be able to test on all of them, since for example, i have no interest in paying top dollar for an oracle license anytime soon. ;)

but, if the people who care about each of these DBs can spend the time to write, test, and maintain a small layer of DB abstraction functions that at least properly create/modify schemas, and drupal developers mostly stick to ANSI SQL everywhere else, we're 95% done with supporting whatever DBs people want to use.

-derek

p.s. relatively speaking, table creation and schema modification is small percentage of the SQL queries in drupal. basically, it's everything in the .install files, but nothing in the .module or .inc files. so, having a reasonable DB abstraction layer for the .install files doesn't change the fact that most of drupal development can still happen in "raw" SQL (except, of course, for all the drupalisms we already have, like {table_name}, db_query() %d and %s handling, db_rewrite_sql(), etc).

p.p.s. if people really don't want to be bothered, there'd be nothing stopping them (except peer pressure) from continuing to use DB-specific schema queries directly in their .install files. so, if new drupal developers are doing a custom module for a site that will only ever run on some DB, and they don't care about writing portable code, they don't have to learn anything new and there's no further barrier of entry. however, it they're going to become responsible maintainers of important contribs, they're already expected to learn more than nothing, and in this case, a layer of code that hides the ugly details about how oracle vs. postgres vs. mssql vs. sqlite vs. mysql vs. whatever creates tables would be welcomed as a blessing, not cursed as another weird API to learn.

Dries Buytaert

24 Jan 24 Jan

4:20 p.m.

New subject: [development] Referential integrity -- finally?

On 23 Jan 2007, at 18:14, Derek Wright wrote:

...

...
His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

the main problem with this argument is the there's no "Standard" in the "Standard Query Language" for table creation and schema modifications across all the databases. so, while anyone who knows any variant of SQL (preferably the ANSI standard) can write SELECT and UPDATE and JOIN and the rest, you *have* to have DB-specific knowledge to create tables and add/remove/alter indexes or columns. to me, the lack of a standard here is the central point -- if there was a standard a) our situation would be totally different and b) we wouldn't need an abstraction layer.

The quality of this discussion has increased a lot since we discussed this last time for Drupal 5.0. Great! The 'database scheme definition' vs 'standard query language' (or 'data manipulation language') argument make sense to me -- although it would be great if you had pointers to references. Either way, I'm willing to reverse my position on creating database abstraction functions for create or altering SQL tables -- but not to build SELECT/UPDATE/INSERT queries. We'll want to document some of the knowledge captured in this thread in the code comments -- it doesn't hurt to explain our design goals/ decisions. -- Dries Buytaert :: http://www.buytaert.net/

adrian rossouw

6:01 p.m.

New subject: [development] Referential integrity -- finally?

On 24 Jan 2007, at 5:20 PM, Dries Buytaert wrote:

...

The 'database scheme definition' vs 'standard query language' (or 'data manipulation language') argument make sense to me -- although it would be great if you had pointers to references.

Either way, I'm willing to reverse my position on creating database abstraction functions for create or altering SQL tables -- but not to build SELECT/UPDATE/INSERT queries.

Great news. Perhaps we should contact the people handling the different DBMS ports. ? Also, this might make sqllite more easily implemented. I believe it doesn't even have an alter mechanism, and you need to use application code to recreate tables and the like. http://code.jenseng.com/db/

Khalid B

6:15 p.m.

New subject: [development] Referential integrity -- finally?

Dries So DDL abstraction is OK, but no DML abstraction. This is great. Do you mean +1 on this now? http://drupal.org/node/63049

Earnest Berry III

7:58 p.m.

New subject: [development] Referential integrity -- finally?

I'm planning on handling the MS-SQL port; I agree, the create DML functions should be created.....but SQL should be left to the user. I know we're always trying to get new/more developers. But, if the information that a "newbie" needs from the database requires a complex query, the either: A) Need to be adapt enough at programming to use simple SQL statements and application logic to produce the desired results (this will then prob. be refactored by the community after having a look at it, and some nice person will write the complex query for them) B) Contact someone for help C) If they are not adapt at programming, then they can't really create a module in the first place, and that is a whole other issue. So yes, we want more people and s short learning curent, which I believe is there. I think this talk is for people who want to better maintain their modules (DB wise), support more DB's, and still offer the fine SQL control for optimization for advanced queries/statements/etc. We should prob. start to come up with some standards for naming during creations, etc. Example, if I do a db_create_table(array('col1' => array('type' => 'number'), 'options' => array('index' => array('col1'))))); // Creates a table with a column col1 with an index The index should be called 'idx_col1' (naming convention index = idx). so that when other DML statements happen, we're accessing the same index/PK/FK names across DB's. So, I'll start: indexs = idx_[col_name1]_[col_name2]_[...] Primary Key = PK FK = FK_[parent_col]_[child_col]_[..] Good idea/bad idea? adrian rossouw wrote:

...

On 24 Jan 2007, at 5:20 PM, Dries Buytaert wrote:

...
The 'database scheme definition' vs 'standard query language' (or 'data manipulation language') argument make sense to me -- although it would be great if you had pointers to references.

Either way, I'm willing to reverse my position on creating database abstraction functions for create or altering SQL tables -- but not to build SELECT/UPDATE/INSERT queries.

Great news.

Perhaps we should contact the people handling the different DBMS ports. ?

Also, this might make sqllite more easily implemented. I believe it doesn't even have an alter mechanism, and you need to use application code to recreate tables and the like. http://code.jenseng.com/db/

Frando (Franz Heinzmann)

8:40 p.m.

New subject: [development] Referential integrity -- finally?

This whole issue was already discussed somewhere near the end of the last cycle, and back then I wrote an implentation of this whole thing. It was the result of the discussion back then, and is actually very similar to what you are proposing. The patch is here: http://drupal.org/node/63049#comment-98967 It provides a very basic API for creating tables, adding and removing columns, and adding and removing indexes. Type declarations are based on the concept of some default data types that provide default parameterers that can all be overridden easily (comparable to fapi). So far, it's only done in/for MySQL (as I'm not at all familiar with Postgres), however, it shouldn't be too hard to port it to other engines for people who are familiar with them. It will probably not apply cleanly anymore, and there is probably space for improvements, however, the concept should become clear, and a reroll will be easy. regards, frando Earnest Berry III schrieb:

...

I'm planning on handling the MS-SQL port; I agree, the create DML functions should be created.....but SQL should be left to the user. I know we're always trying to get new/more developers. But, if the information that a "newbie" needs from the database requires a complex query, the either:

A) Need to be adapt enough at programming to use simple SQL statements and application logic to produce the desired results (this will then prob. be refactored by the community after having a look at it, and some nice person will write the complex query for them) B) Contact someone for help C) If they are not adapt at programming, then they can't really create a module in the first place, and that is a whole other issue.

So yes, we want more people and s short learning curent, which I believe is there. I think this talk is for people who want to better maintain their modules (DB wise), support more DB's, and still offer the fine SQL control for optimization for advanced queries/statements/etc.

We should prob. start to come up with some standards for naming during creations, etc. Example, if I do a db_create_table(array('col1' => array('type' => 'number'), 'options' => array('index' => array('col1'))))); // Creates a table with a column col1 with an index The index should be called 'idx_col1' (naming convention index = idx). so that when other DML statements happen, we're accessing the same index/PK/FK names across DB's.

So, I'll start: indexs = idx_[col_name1]_[col_name2]_[...] Primary Key = PK FK = FK_[parent_col]_[child_col]_[..]

Good idea/bad idea?

adrian rossouw wrote:

...
On 24 Jan 2007, at 5:20 PM, Dries Buytaert wrote:

...
The 'database scheme definition' vs 'standard query language' (or 'data manipulation language') argument make sense to me -- although it would be great if you had pointers to references.

Either way, I'm willing to reverse my position on creating database abstraction functions for create or altering SQL tables -- but not to build SELECT/UPDATE/INSERT queries.

Great news.

Perhaps we should contact the people handling the different DBMS ports. ?

Also, this might make sqllite more easily implemented. I believe it doesn't even have an alter mechanism, and you need to use application code to recreate tables and the like. http://code.jenseng.com/db/

-- Frando? Gaunab? -Unbiskant: http://unbiskant.org

Earnest Berry III

23 Jan 23 Jan

6:15 p.m.

New subject: [development] Referential integrity -- finally?

Hm. I was also thinking.... If we have RI...we should start to think about transactions implementation of some sort, thus you don't have a parent table w/o any child tables (node table entry w/o a node_revisions table entry). Although, this should be handled at the application level (what I like to call pessimistic coding: e.g. in hook_insert a simple if(!db_query($my_sql_statement)) { node_delete($node->nid); // set error msg }. Thus we get do not get any 'phantom' parent tables. I think this is something that should be added to the developer hand book perhaps. A section on error handling. Most modules follow an 'optimistic' coding practice where most things are assumed that they work with no errors. Sorry, getting off topic. Also with regards to RI..it shouldn't be to bad. The main parent table in Drupal is the node table. After that, most everything references the nid generated from the node. If we do go RI however, I think that doing away with auto_increment would make things easier. Gerhard Killesreiter wrote:

...

Sammy Spets wrote:

...
Good point Adrian,

I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code.

What do you guys think of this?

Problem is that Dries doesn't like this _at_ _all_. :p

His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

While his argument has merit, I think it is put ad absurdum by the fact, that Drupal doesn't have any plain sql files anymore.

Cheers, Gerhard

Dries Buytaert

6:35 p.m.

New subject: [development] Referential integrity -- finally?

On 23 Jan 2007, at 13:04, Gerhard Killesreiter wrote:

...

...
I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code. What do you guys think of this?

Problem is that Dries doesn't like this _at_ _all_. :p

His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

When working with databases you want to know _exactly_ what is happening with your fields, types, keys and indices -- and the order thereof. An abstraction layer makes that less transparant. Plus, SQL is already an abstraction layer so we'd be adding an abstraction layer on top of an abstraction layer. It's a little implementing a new template language in PHP (cfr. Smarty) -- PHP is already a template language. At the same time, an abstraction layer makes it easier to port and update modules. Conclusion: there is no win-win situation -- each direction has advantages and disadvantages.

...

While his argument has merit, I think it is put ad absurdum by the fact, that Drupal doesn't have any plain sql files anymore.

Yes, with the install system, the facts has changed. "Ad absurdum" is too strong a word (but it sure sounds cool. ;) I don't mind us opening this discussion again. The more I think about it, the more it boils down to this question. What do we care more about? Optimizing Drupal for developers, or getting more database experts on board? -- Dries Buytaert :: http://www.buytaert.net/

Earnest Berry III

7:46 p.m.

New subject: [development] Referential integrity -- finally?

Hm. Could there be a way to keep both optimization and the abstraction layer? e.g. I can still do my case $db_type for fine grained schema creation control, and then after my switch run a general db_create_table($table_array) for tables that do not need the fine-grained control.....but then again....that's riding the fence...hmm...yeah, no win-win here. blah. Dries Buytaert wrote:

...

On 23 Jan 2007, at 13:04, Gerhard Killesreiter wrote:

...
...
I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code. What do you guys think of this?

Problem is that Dries doesn't like this _at_ _all_. :p

His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

When working with databases you want to know _exactly_ what is happening with your fields, types, keys and indices -- and the order thereof. An abstraction layer makes that less transparant.

Plus, SQL is already an abstraction layer so we'd be adding an abstraction layer on top of an abstraction layer. It's a little implementing a new template language in PHP (cfr. Smarty) -- PHP is already a template language.

At the same time, an abstraction layer makes it easier to port and update modules.

Conclusion: there is no win-win situation -- each direction has advantages and disadvantages.

...
While his argument has merit, I think it is put ad absurdum by the fact, that Drupal doesn't have any plain sql files anymore.

Yes, with the install system, the facts has changed. "Ad absurdum" is too strong a word (but it sure sounds cool. ;) I don't mind us opening this discussion again.

The more I think about it, the more it boils down to this question. What do we care more about? Optimizing Drupal for developers, or getting more database experts on board?

-- Dries Buytaert :: http://www.buytaert.net/

Michael Favia

7:49 p.m.

New subject: [development] Referential integrity -- finally?

Earnest Berry III wrote:

...

Hm. Could there be a way to keep both optimization and the abstraction layer? e.g. I can still do my case $db_type for fine grained schema creation control, and then after my switch run a general db_create_table($table_array) for tables that do not need the fine-grained control.....but then again....that's riding the fence...hmm...yeah, no win-win here. blah.

i disagree with the riding the fence comment. There CAN be more than 1 way to do something "right" in drupal if we'd like no? I agree i like to unify where i can but sometimes there is a good argument to provide two different ways to do something because the use case changes (which seems to be the case here). One could even use the other in its implementation if desired to reduce code duplication. -- Michael Favia michael@favias.org tel. 512.585.5650 http://michael.favias.org

Derek Wright

8:10 p.m.

New subject: [development] Referential integrity -- finally?

On Jan 23, 2007, at 9:35 AM, Dries Buytaert wrote:

...

Plus, SQL is already an abstraction layer so we'd be adding an abstraction layer on top of an abstraction layer.

not true for schema manipulation, which is the only thing many of us are arguing for. SQL is not an abstraction layer for this, since there's no standard. every DB has its own syntax for this crap, which is why we need our own abstraction layer for it.

...

The more I think about it, the more it boils down to this question. What do we care more about? Optimizing Drupal for developers, or getting more database experts on board?

i think that's a false dichotomy... we don't have to choose one or the other. i may not be a DB expert, but i'm expert in other fields of CS. abstraction layers that hide ugly details i don't want to be bothered to think about, freeing my mind to focus on the problems i care about, are almost always welcome (assuming they're done right, and don't make my problems worse). also, i continue to question this "if we build it, they will come" attitude towards DB experts. i don't think the sheer fact of if we add RI or not, or have a schema creation abstraction layer or not, will cause more DB experts to join our development efforts or not. but, again, i'm no DB expert, so i don't know how they think. ;) but, this seems a little wonky to me. an analogy for something i know more about: CVS and revision control... what attracted me to drupal wasn't the great use of CVS and powerful tools drupal had built to take advantage of it. ;) i just needed an interactive website for my band where people could signup for what rehearsals and gigs they were planning to come to that i posted to an online calendar. turned out i also happened to be a CVS expert, and decided to start putting my expertise to use fixing the tattered ruins known as the contributions repository (since the lack of real releases was making my efforts to build the site for my band more painful). i don't think the DB experts are just floating around out there, waiting for The Right CMS(tm) to show up that does everything they ever wanted with all their DB ideas so they can finally get their hands dirty doing interesting work. they're going to come to Drupal for all sorts of random reasons, and may or may not put their DB expertise to use making Drupal better... -derek

Michael Favia

8:23 p.m.

New subject: [development] Referential integrity -- finally?

Derek Wright wrote:

...

i don't think the DB experts are just floating around out there, waiting for The Right CMS(tm) to show up that does everything they ever wanted with all their DB ideas so they can finally get their hands dirty doing interesting work. they're going to come to Drupal for all sorts of random reasons, and may or may not put their DB expertise to use making Drupal better...

Agreed. With my only concern being that a DB expert will first look at the state of our storage mechanisms as his primary judgment of the overall quality of the system (because it is what he knows) and might unfairly deduce that the other fields of CS arent taken seriously either. Assuming such an assumption is valid concerning DB management with regards to RI and schema abstraction, etc. This isnt something I agree with but it is just a point worth considering IMO. Demonstrating our willingness and /attempt/ to do things /right/ by them opens a lot of doors to individuals who might be wiling to improve it. To bring this back to your experience: You saw a good project trying to do things right (CVS + contribs) but needing help getting it done properly so you opted to help. Olive branches are powerful things. -- Michael Favia michael@favias.org tel. 512.585.5650 http://michael.favias.org

Earnest Berry III

8:47 p.m.

New subject: [development] Referential integrity -- finally?

I fine some disagreement in the "DB Expert" comments. I think using more DB's will attract more "use", not "DB Experts". In most shops, in which they are large enough to have a sole, full-time DBA, and perhaps even a DBO...the decision to make a CMS or new application is a "group" effort. Thus, when we go into a meeting for a new application, it's a joint effort of everyone being presented with the problems, the DBA presenting the schema and sproc's (DBA's like abstraction layers too :) ), and then the developers providing looking at how they are going ot access and manipulate the data. This presents a 'groud-up'/bare-bones approach. If Drupal used more DB's, this would allow a lot of the frame-work to be used by the develper, and the DBA could still create their own underlying schema in which Drupal could query. So, that in turn, I think will open up Drupal to more "shops". A true DB-Expert these days is far beyond Drupal (as a general 'challenge' anyway), and more concerned and having "fun" with new native XML table-types, Geo-Spatial datatypes/tables, clustering, replication, etc. So in summary, I think the we should look at this as opening up Drupal to more "shops"/"uses", rather than "DB-Experts". Also, about the abstraction layer removing the "ugly mess" below. This is nice...and not so nice sometimes. I think of it like a car. I like that all I have to do is turn my car on, and it runs fine. I doubt any F1 team would like a car where all they had to do was turn the car on and didn't have to worry about camber, PSI, gear-selection, etc. Now, most schema items are trivial, and the detailed mess is fine as abstracted...but there are going to be times where people want the mess. Bring me the mess..sometimes :). I know in my last project I was saved my MS-SQL offering "locking hints" where I could tell the DB what type of lock I wanted it to acquire in my query. Locking is usually abstracted in MySQL (offering only table lock and row locking, and mostly done automatically for you by the RDMS). Michael Favia wrote:

...

Derek Wright wrote:

...
i don't think the DB experts are just floating around out there, waiting for The Right CMS(tm) to show up that does everything they ever wanted with all their DB ideas so they can finally get their hands dirty doing interesting work. they're going to come to Drupal for all sorts of random reasons, and may or may not put their DB expertise to use making Drupal better...

Agreed. With my only concern being that a DB expert will first look at the state of our storage mechanisms as his primary judgment of the overall quality of the system (because it is what he knows) and might unfairly deduce that the other fields of CS arent taken seriously either. Assuming such an assumption is valid concerning DB management with regards to RI and schema abstraction, etc. This isnt something I agree with but it is just a point worth considering IMO. Demonstrating our willingness and /attempt/ to do things /right/ by them opens a lot of doors to individuals who might be wiling to improve it.

To bring this back to your experience: You saw a good project trying to do things right (CVS + contribs) but needing help getting it done properly so you opted to help. Olive branches are powerful things.

Khalid B

8:35 p.m.

New subject: [development] Referential integrity -- finally?

Dries When it comes to abstraction, one has to differentiate between two things. When it comes to DML (Data manipulation language, such as INSERT, SELECT, UPDATE, ...etc.) abstraction normally becomes a lowest common denominator and performance may suffer. For DDL (Data definition language, such as CREATE), there is no standard for column types and syntax, and hence abstraction buys us portability. Sites that need tuning (e.g. adding an index, ..etc.) will do so regardless of whether we do abstraction or not. Therefore, I am opposed to too much abstraction in DML, but happy with abstraction, as suggested by Adrian in the issue mentioned, for DDL.

Earnest Berry III

8:49 p.m.

New subject: [development] Referential integrity -- finally?

Yes. Little to 0 abstraction at the DML level. I don't think anyone was thinking that though. Was anyone considering abstraction DML statements? Khalid B wrote:

...

Dries

When it comes to abstraction, one has to differentiate between two things.

When it comes to DML (Data manipulation language, such as INSERT, SELECT, UPDATE, ...etc.) abstraction normally becomes a lowest common denominator and performance may suffer.

For DDL (Data definition language, such as CREATE), there is no standard for column types and syntax, and hence abstraction buys us portability. Sites that need tuning ( e.g. adding an index, ..etc.) will do so regardless of whether we do abstraction or not.

Therefore, I am opposed to too much abstraction in DML, but happy with abstraction, as suggested by Adrian in the issue mentioned, for DDL.

Darrel O'Pry

8:54 p.m.

New subject: [development] Referential integrity -- finally?

On Tue, 2007-01-23 at 18:35 +0100, Dries Buytaert wrote:

...

The more I think about it, the more it boils down to this question. What do we care more about? Optimizing Drupal for developers, or getting more database experts on board?

I think the question of what database folks want or feel should be garnered from database folk, not assumed. The conversations I have with the DBA's I work with normally go along the lines of the database should handle data and data related functions, not the application. Only business and presentation logic should be in the application. They're interested in stored procedures, transactions, triggers, etc. Which Drupal will probably never have since it caters to the least common denominator. A schema abstraction layer won't help or hinder DBA's. They're going to be examining the schema from the DB not Drupal. It will make life easier for developers and adventurous database folks. .darrel.

Gerhard Killesreiter

11:08 p.m.

New subject: [development] Referential integrity -- finally?

Dries Buytaert wrote:

...

On 23 Jan 2007, at 13:04, Gerhard Killesreiter wrote:

...
Problem is that Dries doesn't like this _at_ _all_. :p

His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

When working with databases you want to know _exactly_ what is happening with your fields, types, keys and indices -- and the order thereof.

Currently, we don't care at all about the order of fields in a table. Furthermore, if we add a field later, we just append it to the table. This causes the structure of an old Drupal install to be different from new installs. And, quite frankly, most Drupal developers including myself wouldn't have a clue for what order of fields should be preferred.

...

An abstraction layer makes that less transparant.

True, but for example the functions used for pgsql are quite transparent.

...

Plus, SQL is already an abstraction layer so we'd be adding an abstraction layer on top of an abstraction layer. It's a little implementing a new template language in PHP (cfr. Smarty) -- PHP is already a template language.

Well, Drupal _is_ already an abstration layer on top of PHP. We have hooks, special conventions (check_plain & firends), etc.

...

At the same time, an abstraction layer makes it easier to port and update modules.

Conclusion: there is no win-win situation -- each direction has advantages and disadvantages.

Yeah, but we need a decision on what to do. This is not only valid for db_add_columns & friends, but also for other things like db_insert &tc that have been proposed.

...

...
While his argument has merit, I think it is put ad absurdum by the fact, that Drupal doesn't have any plain sql files anymore.

Yes, with the install system, the facts has changed. "Ad absurdum" is too strong a word (but it sure sounds cool. ;)

I knew you'd like it. :p

...

I don't mind us opening this discussion again.

I think we should have added a database.*sql file to Drupal 5 before releasing it. If I download an unkown product that depends on a database, I tend to first look at the sql definitions. It tells me a lot about how well engineered the project is. It would not hurt if the db schema would be documented.

...

The more I think about it, the more it boils down to this question. What do we care more about? Optimizing Drupal for developers, or getting more database experts on board?

Well, he have been using the current way of doing SQL for some time now and the number of SQL savvy Drupal people is still low. Maybe we try something else now? Also, I have found a few people in #drupal that help me with with SQL related questions. Nobody has yet complained about our API. Of course i usually feed them the processed queries from the database logs. Cheers, Gerhard

Larry Garfield

24 Jan 24 Jan

2:32 a.m.

New subject: [development] Referential integrity -- finally?

On Tuesday 23 January 2007 4:08 pm, Gerhard Killesreiter wrote:

...

...
Conclusion: there is no win-win situation -- each direction has advantages and disadvantages.

Yeah, but we need a decision on what to do.

This is not only valid for db_add_columns & friends, but also for other things like db_insert &tc that have been proposed.

db_insert() and friends have been in the helpers module for several months now. I've used them with great success. (I'd be happy to see them moved into core, but I don't expect miracles. <g>) However, I also disagree that there's no win-win situation, because it doesn't have to be either/or. I maintain a few small modules. None have particularly complex SQL. My SQL experience is 95% MySQL, with a wee bit of MS SQL Server thrown in for bad measure. Currently none of my modules support PostgreSQL. I simply don't know it, nor do I have any personal motivation to learn it. Similarly I know almost nothing about Oracle. Really, there's no way that any of my modules are ever going to support PostgreSQL and Oracle and SQLite unless there's a common abstracted create_table_schema() function of some sort, or someone else writes the install code for it. I'm sure some would say I'm being lazy/selfish/irresponsible (and maybe I won't disagree with them <g>), but the way I see it, they're a side hobby for me, for the most part, so I don't see a reason to spend extra time on learning and supporting a database that I never use and can't even test on. However, I *would* be willing to learn a few extra PHP function calls that would let me support "all supported databases", at least enough to be fully functional if not fully optimized. For more complex tables or where heavy field-order optimization matters, there's still raw db_query(). I see nothing at all wrong with an install function that looks like this: create_table('foo', array('nid'=>'int', 'title'=>'string')); switch ($db_type) { case 'mysql': // MySQL is faster if we do X instead db_query("ALTER TABLE {foo} CHANGE ..."); break; } Allowing both a "generic" DDL API and an option for more specific tweaks where needed does create a win-win situation. Module authors can easily make their code function on all databases Drupal supports while still having the option of making database-specific optimizations. My modules can "just work" everywhere, and CCK can be fine-tuned to each database as needed, and everyone wins.

...

Also, I have found a few people in #drupal that help me with with SQL related questions. Nobody has yet complained about our API. Of course i usually feed them the processed queries from the database logs.

Cheers, Gerhard

Well some of us are just PHP people who spend too much time writing SQL. :-) -- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson

Sammy Spets

1:18 a.m.

New subject: [development] Referential integrity -- finally?

I agree that a schema definition is best left to be standard SQL for many reasons. Most importantly, Dries mentioned, is lowering the barrier for new module developers. New modules aren't the issue for me because the number of new modules added to the repository are far lower than the number of updates written. My quest is really to get the updates (usually done after a module developer has gained some experience) performed through abstracted functions. This would considerably lower the amount of work required by people in my position. If it works in MySQL, it works in PostgreSQL. Simple. Adding a table to a database is very easy to test. Meaning it's dealt with quickly if a module developer contacts someone savvy with PostgreSQL. Likewise, if the module developer is writing an update and finds they need something other than what's provided by the abstraction functions. In that event, they can contact someone savvy with PostgreSQL to assist them to write the PostgreSQL part. Several people are talking to me about some update code they are writing when it goes outside their PostgreSQL scope. This has been fantastic for Drupal because people using PostgreSQL don't get install errors or errors when they update some module. All i'm suggesting is we move the update code to abstracted functions and leave hook_install() as is. -- Sammy Spets Synerger Pty Ltd http://synerger.com On 23-Jan-07 18:35, Dries Buytaert wrote:

...

On 23 Jan 2007, at 13:04, Gerhard Killesreiter wrote:

...
...
I think there needs to be some effort put in to better the abstraction we currently have. e.g db_add_column() only exists for PostgreSQL. We'd get rid of a mountain of code if all updates were moved to use that function for both MySQL and PostgreSQL. It would also reduce the barrier for module developers to write PostgreSQL compatible code. What do you guys think of this?

Problem is that Dries doesn't like this _at_ _all_. :p

His argument is IIRC that we should stick to as plain SQL as possible to not further increase the barrier of entry for people already knowing SQL.

When working with databases you want to know _exactly_ what is happening with your fields, types, keys and indices -- and the order thereof. An abstraction layer makes that less transparant.

Plus, SQL is already an abstraction layer so we'd be adding an abstraction layer on top of an abstraction layer. It's a little implementing a new template language in PHP (cfr. Smarty) -- PHP is already a template language.

At the same time, an abstraction layer makes it easier to port and update modules.

Conclusion: there is no win-win situation -- each direction has advantages and disadvantages.

...
While his argument has merit, I think it is put ad absurdum by the fact, that Drupal doesn't have any plain sql files anymore.

Yes, with the install system, the facts has changed. "Ad absurdum" is too strong a word (but it sure sounds cool. ;) I don't mind us opening this discussion again.

The more I think about it, the more it boils down to this question. What do we care more about? Optimizing Drupal for developers, or getting more database experts on board?

-- Dries Buytaert :: http://www.buytaert.net/

Rowan Kerr

3:35 a.m.

New subject: [development] Referential integrity -- finally?

On 23-Jan-07, at 12:35 PM, Dries Buytaert wrote:

...

When working with databases you want to know _exactly_ what is happening with your fields, types, keys and indices -- and the order thereof. An abstraction layer makes that less transparant.

But when you're working with Drupal, you're not working with database fields, tables, etc.. you're working with Nodes, Terms, Vocabularies, Comments, Users and the like. The fact that these just happen to be stored in a SQL database is not terribly important to ones line of thinking when developing most modules. (If you're really concerned about the specifics of your data at the database level, you would mandate that the application use stored procedures for everything. ;) Requiring that someone be well versed in SQL to do a lot of development in Drupal is already a barrier to entry. And most people are not expert DBA's. Just look at the schemas for some of core and contrib :)

...

At the same time, an abstraction layer makes it easier to port and update modules.

PHP has a crap database API. So (to Adrian's point) the only way to avoid killing contrib developers and core maintainers with schema management across X number of databases X-1 of which will most likely not be their primary SQL dialect is to provide a thin layer for the parts of SQL that are completely not standardized across implementations. So that for sure includes table creation, and altering, and also potentially date/time functions if we were to ever consider moving away from timestamps to leverage good database support for date and time data types. Module developers are already NOT writing pure SQL to create and alter their tables.. so we've already thrown up a layer in there that is ideal for neither developers nor db admins. Why not help developers out so that they need only write one db_table_create() or db_table_alter() statement instead of 2 or more? -Rowan

Chris Johnson

4:33 p.m.

New subject: [development] Referential integrity -- finally?

On 1/23/07, Dries Buytaert <dries.buytaert@gmail.com> wrote:

...

When working with databases you want to know _exactly_ what is happening with your fields, types, keys and indices -- and the order thereof. An abstraction layer makes that less transparant.

I believe most of that can be accomadated in a good abstraction layer. Few Drupal developers know enough about any given database that they worry about order of fields and keys. Just look at all the effort that Killes went through recently with trying to optimize database usage for drupal.org. There were a few of us providing small pieces of additional information to him, and a lot of reading The Fine Manual and testing things to see what happened. I say it is far better to have the abstraction layer so that all of contrib works on all supported databases, (and we can stop chasing database-specific install and upgrade bugs), than it is to allow easy fine-tuning of the database tables by contrib authors. And we can do both. If performance optimization of a table on a particular database gets to be crucial for a particular contrib author, she can either help improve the core abstraction code to fix it, or make it possible through the abstraction layer. Or she can *bypass* the abstraction layer and code for the database specifically as we do now, at the cost of losing compatibility with other databases. That's a win-win situation.

...

Plus, SQL is already an abstraction layer so we'd be adding an abstraction layer on top of an abstraction layer. It's a little implementing a new template language in PHP (cfr. Smarty) -- PHP is already a template language.

No, that analogy doesn't hold up. SQL is no more an abstraction layer than PHP is or any other high-level language. SQL and PHP are both interpreted at run time by interpreters written in C/C++. (Come on, let's all code right in machine language! :-)

...

The more I think about it, the more it boils down to this question. What do we care more about? Optimizing Drupal for developers, or getting more database experts on board?

Why not both? Point the database experts at the core and abstraction SQL, and give the developers an easier way to be cross-database compatible. Or do I misunderstand your question? ..chrisxj

Rowan Kerr

23 Jan 23 Jan

5:53 a.m.

New subject: [development] Referential integrity -- finally?

On 22-Jan-07, at 5:31 AM, adrian rossouw wrote:

...

Maintaining schemas is still a lot of work,. and i fear adding another layer of complexity to it will start making it unmanageable.

If the schemas are designed correctly, there should not need to be many changes made over time. :)

...

This means we're going to be doing essentially the same work 6 times over. And that's only for core, the majority of contrib modules will still not work on other db's.

Or we could write some schema migration helper functions when queries between different databases are too different?

...

I also seriously doubt that just adding referential integrity is going to draw lots of postgresql experts out of the ether..

This affects more than just PostgreSQL. Having RI, while not some sort of silver bullet, will make database design sticklers take Drupal more seriously and open up new areas for deployment. -Rowan

Darrel O'Pry

6:05 a.m.

New subject: [development] Referential integrity -- finally?

On Mon, 2007-01-22 at 23:53 -0500, Rowan Kerr wrote:

...

On 22-Jan-07, at 5:31 AM, adrian rossouw wrote:

...
Maintaining schemas is still a lot of work,. and i fear adding another layer of complexity to it will start making it unmanageable.

If the schemas are designed correctly, there should not need to be many changes made over time. :)

...
This means we're going to be doing essentially the same work 6 times over. And that's only for core, the majority of contrib modules will still not work on other db's.

Or we could write some schema migration helper functions when queries between different databases are too different?

cck has some code that sort of does that for normalizing field tables when you have shared field instances or multivalue fields.

...

...
I also seriously doubt that just adding referential integrity is going to draw lots of postgresql experts out of the ether..

This affects more than just PostgreSQL. Having RI, while not some sort of silver bullet, will make database design sticklers take Drupal more seriously and open up new areas for deployment.

-Rowan

adrian rossouw

8:54 a.m.

New subject: [development] Referential integrity -- finally?

On 23 Jan 2007, at 6:53 AM, Rowan Kerr wrote:

...

On 22-Jan-07, at 5:31 AM, adrian rossouw wrote:

...
Maintaining schemas is still a lot of work,. and i fear adding another layer of complexity to it will start making it unmanageable.

If the schemas are designed correctly, there should not need to be many changes made over time. :)

Schemas will change, either due to new features being developed requiring new columns, or through data optimization. Any change made to the schema needs to be made once for each database type supported, and then one update needs to be written and tested for each database type supported. That means adding a column to a table for a new feature requires 12 changes to the source. And that's just one column.

Gerhard Killesreiter

20 Jan 20 Jan

1:35 a.m.

New subject: [development] Referential integrity -- finally?

Dries Buytaert wrote:

...

Hello world,

Fact: Drupal 6 will finally drop support for MySQL 3.x.

Can we please drop 4.0 while we are at it? http://drupal.org/node/105855

...

The good news is that we can _finally_ start caring about referential integrity. As of MySQL 4, you can define foreign keys, and even though they don't actually work with MyISAM tables they would work with InnoDB tables. Plus, the PostgreSQL community would love us for it. To make a long story short: more DBAs and/or business critical websites would take us seriously.

Sounds like a good idea.

...

In other words: I'd happily accept patches that add foreign keys to Drupal tables. It woud not be perfect as we can't use cascading deletes, for example. Nevertheless, it would be a great step in the right direction -- assuming we can make it all work in a SQL-compliant way without leaving MyISAM users in the cold.

This sounds as if we'd still need to check the integrity in the code, too. The main effort would probably to add foreign keys to the install functions and to check that we do inserts and deletes in the right order. A nice fringe benefit from this would be that we finally could automatically make plots from our db schemes as most programs doing this require foreign keys to work properly. People interested in other database work can contact me for a list of slow queries. 8-) Cheers, Gerhard

Bèr Kessels

10:40 a.m.

New subject: [development] Referential integrity -- finally?

Hello, Op vrijdag 19 januari 2007 20:45, schreef Dries Buytaert:

...

As of MySQL 4, you can define foreign keys, and even though they don't actually work with MyISAM tables they would work with InnoDB tables. Plus, the PostgreSQL community would love us for it.

Drupal is kindof database independent. Though no-one has managed to integrate sqlite because of 'things' (locking) that Drupal requires. Would using foreign keys and dependign on them not narrow the posibilities for other DBAs a lot? And do we care? Bèr -- Drupal, Ruby on Rails and Joomla! development: webschuur.com | Drupal hosting: www.sympal.nl

Karoly Negyesi

12:39 p.m.

New subject: [development] Referential integrity -- finally?

...

Drupal is kindof database independent. Though no-one has managed to integrate sqlite because of 'things' (locking) that Drupal requires.

My hopes are that SQLite support will happen with Drupal 6. Tadej Baša has written the code and he says he is running Drupal with it since 4.5 which is quite believable. He also volunteers to be the maintainer: http://lists.drupal.org/pipermail/development/2007-January/021806.html My hopes are that Drupal will run out of the box, without any setup whatsoever. LOCKs are on their way out, killes is nurturing the no lock patch I abandoned. And even if they do not die, they are there only to avoid some races which simply do not happen on the small sites that are targets for SQLite (ie. quite small sites).

...

Would using foreign keys and dependign on them not narrow the posibilities for other DBAs a lot? And do we care?

We will not depend on them as MyISAM happily ignores FKs. Regards, NK

Khalid B

22 Jan 22 Jan

5:35 a.m.

New subject: [development] Referential integrity -- finally?

On 1/20/07, Karoly Negyesi <karoly@negyesi.net> wrote:

...

...
Drupal is kindof database independent. Though no-one has managed to integrate sqlite because of 'things' (locking) that Drupal requires.

My hopes are that SQLite support will happen with Drupal 6. Tadej Baša has written the code and he says he is running Drupal with it since 4.5 which is quite believable. He also volunteers to be the maintainer: http://lists.drupal.org/pipermail/development/2007-January/021806.html My hopes are that Drupal will run out of the box, without any setup whatsoever. LOCKs are on their way out, killes is nurturing the no lock patch I abandoned. And even if they do not die, they are there only to avoid some races which simply do not happen on the small sites that are targets for SQLite (ie. quite small sites).

Wasn't the other problem is that SQLite does not have some ALTER variant that we use, so updates to the schema not possible? Has this been recently solved too? If so, then good news. Back on topic, referential integrity is the way forward. Not only does it make us friends with the PostgreSQL folk, but it is good practice in general to have it, regardless of the underlying database engine. The debate about RI is a non-debate. It is a historical issue. There was a database used by hosting companies called mSQL (mini SQL) and it was not free, MySQL stepped in to fill the gap with a free database (as in cost-free initially), with the same syntax, ...etc.. MySQL's popularity skyrocketed, then it became free (as in liberty). They argued that RI is not necessary since web apps are mostly read intensive, and they shined in the read intensive arena. Now, true databases all have RI. MySQL themselves had InnoDB, and most hosting companies for the last 2 years or so support InnoDB. If only Oracle would not have nabbed that, it would have been the way forward. Now they are developing Falcon and solidDB, ...etc. Regardless, RI is needed. +1.

Earnest Berry III

5:53 a.m.

New subject: [development] Referential integrity -- finally?

I will agree from experiance. It is MUCH faster, when you get into larger data sets, to drop FK/RI all together, and handle the logic at the application level. However, "indexes" in general are another story. That starts to become database specific and has to do with the ways in which you can manipulate the query plan per DB implementation (Oracle, Sybase, MSSQL, etc.). In general though, for mass inserts, it is faster to drop all indexes, insert/update, and then re-create the index. I use this technique on a couple apps with GREAT success. I think the DB use issues, IMHO, comes from and depends on the applications purpose. Most Drupal apps are blogs/news content/etc. and are READ/SELECT intensive, thus most DB will do, and it's just as good to use MySQL, SQLite, etc. However, when you get into transaction/OLAP/e-commerce heavy type apps with many simultaneous UPDATE/INSERTS/transactions.....this, I find, is where the scalability/usability of MySQL, comes into play. With that said...I wonder if we should have an option to not use RI. But...dont' get me wrong, I want RI in Drupal, and I think RI needs to go into core, and the above situation I have discussed should be considered a minority and something dealt with by the developer/enduser. On 1/21/07, Khalid B <kb@2bits.com> wrote:

...

On 1/20/07, Karoly Negyesi <karoly@negyesi.net> wrote:

...
...
Drupal is kindof database independent. Though no-one has managed to integrate sqlite because of 'things' (locking) that Drupal requires.

My hopes are that SQLite support will happen with Drupal 6. Tadej Baša has written the code and he says he is running Drupal with it since 4.5 which is quite believable. He also volunteers to be the maintainer: http://lists.drupal.org/pipermail/development/2007-January/021806.html My hopes are that Drupal will run out of the box, without any setup whatsoever. LOCKs are on their way out, killes is nurturing the no lock patch I abandoned. And even if they do not die, they are there only to avoid some races which simply do not happen on the small sites that are targets for SQLite (ie. quite small sites).

Wasn't the other problem is that SQLite does not have some ALTER variant that we use, so updates to the schema not possible? Has this been recently solved too? If so, then good news.

Back on topic, referential integrity is the way forward. Not only does it make us friends with the PostgreSQL folk, but it is good practice in general to have it, regardless of the underlying database engine.

The debate about RI is a non-debate. It is a historical issue. There was a database used by hosting companies called mSQL (mini SQL) and it was not free, MySQL stepped in to fill the gap with a free database (as in cost-free initially), with the same syntax, ...etc.. MySQL's popularity skyrocketed, then it became free (as in liberty). They argued that RI is not necessary since web apps are mostly read intensive, and they shined in the read intensive arena.

Now, true databases all have RI. MySQL themselves had InnoDB, and most hosting companies for the last 2 years or so support InnoDB. If only Oracle would not have nabbed that, it would have been the way forward. Now they are developing Falcon and solidDB, ...etc.

Regardless, RI is needed.

+1.

Victor Kane

20 Jan 20 Jan

12:56 p.m.

New subject: [development] Referential integrity -- finally?

Referential integrity is good. No doubt about it. But the purchase of InnoDB by Oracle last year is worrying, if Drupal is going to have any kind of dependence on that storage engine. Of course Drupal should support all databases, but I would think twice about creating dependencies upon commercial-only software (if InnoDB became some kind of "standard" for modules...). saludos, Victor Kane http://awebfactory.com.ar On 1/19/07, Dries Buytaert <dries@buytaert.net> wrote:

...

Hello world,

Fact: Drupal 6 will finally drop support for MySQL 3.x.

The good news is that we can _finally_ start caring about referential integrity. As of MySQL 4, you can define foreign keys, and even though they don't actually work with MyISAM tables they would work with InnoDB tables.

Karoly Negyesi

3:28 p.m.

New subject: [development] Referential integrity -- finally?

Noone said we want to depend on InnoDB. Noone. Ever. And noone. Ever. Will. (In private mail, please mail me the thought process how you managed to arrive to this assumption from Dries' mail.) Regards NK

...

Referential integrity is good. No doubt about it. But the purchase of InnoDB by Oracle last year is worrying, if Drupal is going to have any kind of dependence on that storage engine. Of course Drupal should support all databases, but I would think twice about creating dependencies upon commercial-only software (if InnoDB became some kind of "standard" for modules...).

saludos,

Victor Kane http://awebfactory.com.ar

On 1/19/07, Dries Buytaert <dries@buytaert.net> wrote:

...
Hello world,

Fact: Drupal 6 will finally drop support for MySQL 3.x.

The good news is that we can _finally_ start caring about referential integrity. As of MySQL 4, you can define foreign keys, and even though they don't actually work with MyISAM tables they would work with InnoDB tables.

Dries Buytaert

4:44 p.m.

New subject: [development] Referential integrity -- finally?

On 20 Jan 2007, at 12:56, Victor Kane wrote:

...

Referential integrity is good. No doubt about it. But the purchase of InnoDB by Oracle last year is worrying, if Drupal is going to have any kind of dependence on that storage engine.

I wasn't suggesting we'd make Drupal core use InnoDB tables. If we start caring about referential integrity, MyISAM would continue to work -- and would probably continue to be the default table engine. The MyISAM storage engine simply ignores all statements that have to do with referential integrity. However, referential integrity could be a good reason to use InnoDB tables or PostgreSQL. Being sure that my data is in a consistent/correct state, matters a great deal for me.

...

Of course Drupal should support all databases, but I would think twice about creating dependencies upon commercial-only software (if InnoDB became some kind of "standard" for modules...).

The versions of InnoDB that are released as open source will continue to be available. MySQL, the company, might stop supporting it, but MySQL, the community, might maintain the InnoDB storage engine. It's unclear how the cards will be played at this time. The one thing that is clear is that MySQL, the company, is investing heavily in Falcon, a new storage engine that might replace InnoDB. Plus, you could also look at this from a different perspective. It could mean that -- all of a sudden -- Drupal works with Oracle (assuming they add InnoDB support to Oracle). You'll find that many serious DBAs (often Oracle users) care a lot about referential integrity. -- Dries Buytaert :: http://www.buytaert.net/

Victor Kane

5:19 p.m.

New subject: [development] Referential integrity -- finally?

...

...
Referential integrity is good. No doubt about it. But the purchase of InnoDB by Oracle last year is worrying, if Drupal is going to have any kind of dependence on that storage engine.

I wasn't suggesting we'd make Drupal core use InnoDB tables. If we start caring about referential integrity, MyISAM would continue to work -- and would probably continue to be the default table engine. The MyISAM storage engine simply ignores all statements that have to referential integrity could be a good reason to use InnoDB tables or PostgreSQL. Being sure that my data is in a consistent/correct state, matters a great deal for me.

I agree completely. In the year or so that I have been working with Drupal I can see indeed that "the Drupal way" is to solve our problems (and the need for referential integrity is key here) basing ourselves fully on GPL solutions; and in this sense the excellent PostgreSQL (a product some say even excels when compared to fully commercial products such as Oracle) perhaps should be brought more into the center of the picture, or elevated from its status as "also supported", this without demoting MySQL in any way. And I have put this philosophy into practice by filing issues where modules have not supported several PostgreSQL installations I have been responsible for.

...

Plus, you could also look at this from a different perspective. It could mean that -- all of a sudden -- Drupal works with Oracle (assuming they add InnoDB support to Oracle). You'll find that many serious DBAs (often Oracle users) care a lot about referential integrity.

excellent! Victor Kane http://awebfactory.com.ar

Dries Buytaert

11:24 p.m.

New subject: [development] Referential integrity -- finally?

On 20 Jan 2007, at 17:19, Victor Kane wrote:

...

In the year or so that I have been working with Drupal I can see indeed that "the Drupal way" is to solve our problems (and the need for referential integrity is key here) basing ourselves fully on GPL solutions; and in this sense the excellent PostgreSQL (a product some say even excels when compared to fully commercial products such as Oracle) perhaps should be brought more into the center of the picture, or elevated from its status as "also supported", this without demoting MySQL in any way. And I have put this philosophy into practice by filing issues where modules have not supported several PostgreSQL installations I have been responsible for.

Absolutely -- and for that we need to conquer the heart of PostgreSQL users. As explained, they care a lot about these things. Here is what a PostgreSQL person said to me:

...

As someone who is about as deep as you can get in the postgresql community without being core :). This is what I would like to see.

Truly normalized schema Every table must have a natural primary key Relational Integrity Removal of ilike searches. Use lower() = lower() it is faster

(I modified and stripped his/her mail to hide his identity -- he might have been OK with me forwarding the entire message but I haven't checked so ...) -- Dries Buytaert :: http://www.buytaert.net/

7112

Age (days ago)

7117

Last active (days ago)

List overview

Download

47 comments

19 participants

participants (19)

adrian rossouw
Bèr Kessels
Chris Johnson
Darrel O'Pry
Derek Wright
Dries Buytaert
Dries Buytaert
Earnest Berry III
FGM
Frando (Franz Heinzmann)
Gerhard Killesreiter
Karoly Negyesi
Khalid B
Larry Garfield
Michael Favia
Morbus Iff
Rowan Kerr
Sammy Spets
Victor Kane