Do not let postgresql hold back great patches
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it. Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not. Regards, Karoly Negyesi
Is there a reliable way to search for patches requiring pgSQL testing? Perhaps these sorts of things should be committed and then separate issues opened for pgSQL testing? In the case of this issue, ALTER IGNORE will not work on pgSQL. [Will follow-up there.] It is also possible (and quick) to run EXPLAIN against pgSQL if we have a list of queries that I could test. - Ken Rickard agentrickard Note: Not a pgSQL guru, but have it installed on the Mac. On Nov 11, 2007 5:17 PM, Karoly Negyesi <karoly@negyesi.net> wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it.
Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not.
Regards,
Karoly Negyesi
On Nov 11, 2007 7:31 PM, Ken Rickard <agentrickard@gmail.com> wrote:
Is there a reliable way to search for patches requiring pgSQL testing?
Not really. In the Postgresql group[1] I created a post[2] to track things like this. Others have proposed a taxonomy to track stuff like this. I think the problem with that is that project_issue doesn't support editing taxonomies during followups (I assume there's an issue - I didn't look). Greg [1] http://groups.drupal.org/postgresql [2] http://groups.drupal.org/node/6980 -- Greg Knaddison Denver, CO | http://knaddison.com World Spanish Tour | http://wanderlusting.org/user/greg
On Nov 11, 2007, at 4:07 PM, Greg Knaddison wrote:
I think the problem with that is that project_issue doesn't support editing taxonomies during followups
Right.
(I assume there's an issue - I didn't look).
There's always an issue. ;) http://drupal.org/node/187480 -Derek (dww)
On Nov 11, 2007 5:17 PM, Karoly Negyesi <karoly@negyesi.net> wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it.
Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not.
I know many will not like what I will say, but I have to say it. A prime example of where MySQL works fine to solve an issue with a few lines is this issue http://drupal.org/node/83738. All the huge changes and jumping through hoops is because of accomodating PostgreSQL We introduced schema changes just because PostgreSQL cannot do case insensitive matching by default, while MySQL works fine. For the sake of 5% (or 1%) of the sites, we are increasing complexity.
Regards,
Karoly Negyesi
-- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
I would agree with Khalid, actually. Having rock-solid code on a popular platform seems ideal, at least until something like PDO is ready to handle database abstraction for us. I only installed pgSQL to be a "responsible" module maintainer after my requests for pgSQL testers went unanswered. At work, we're entirely MySQL. - Ken Rickard On Nov 11, 2007 7:30 PM, Khalid Baheyeldin <kb@2bits.com> wrote:
On Nov 11, 2007 5:17 PM, Karoly Negyesi <karoly@negyesi.net> wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it.
Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not.
I know many will not like what I will say, but I have to say it.
A prime example of where MySQL works fine to solve an issue with a few lines is this issue http://drupal.org/node/83738. All the huge changes and jumping through hoops is because of accomodating PostgreSQL
We introduced schema changes just because PostgreSQL cannot do case insensitive matching by default, while MySQL works fine. For the sake of 5% (or 1%) of the sites, we are increasing complexity.
Regards,
Karoly Negyesi
-- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
PDO will only abstract the API for us. It will not help us abstract query syntax or solve the fact that there is no such thing as "SQL", just vaguely similar languages called MySQL, PostgreSQL, SQLite, Oracle, etc. On Sunday 11 November 2007, Ken Rickard wrote:
I would agree with Khalid, actually. Having rock-solid code on a popular platform seems ideal, at least until something like PDO is ready to handle database abstraction for us.
I only installed pgSQL to be a "responsible" module maintainer after my requests for pgSQL testers went unanswered.
At work, we're entirely MySQL.
- Ken Rickard
On Nov 11, 2007 7:30 PM, Khalid Baheyeldin <kb@2bits.com> wrote:
On Nov 11, 2007 5:17 PM, Karoly Negyesi <karoly@negyesi.net> wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it.
Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not.
I know many will not like what I will say, but I have to say it.
A prime example of where MySQL works fine to solve an issue with a few lines is this issue http://drupal.org/node/83738. All the huge changes and jumping through hoops is because of accomodating PostgreSQL
We introduced schema changes just because PostgreSQL cannot do case insensitive matching by default, while MySQL works fine. For the sake of 5% (or 1%) of the sites, we are increasing complexity.
Regards,
Karoly Negyesi
-- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
-- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
I would think this is a slippery slope, making sure a patch work against at least 2 db backends in a reasonable way is an important indication of quality Larry Garfield wrote:
PDO will only abstract the API for us. It will not help us abstract query syntax or solve the fact that there is no such thing as "SQL", just vaguely similar languages called MySQL, PostgreSQL, SQLite, Oracle, etc.
On Sunday 11 November 2007, Ken Rickard wrote:
I would agree with Khalid, actually. Having rock-solid code on a popular platform seems ideal, at least until something like PDO is ready to handle database abstraction for us.
I only installed pgSQL to be a "responsible" module maintainer after my requests for pgSQL testers went unanswered.
At work, we're entirely MySQL.
- Ken Rickard
On Nov 11, 2007 7:30 PM, Khalid Baheyeldin <kb@2bits.com> wrote:
On Nov 11, 2007 5:17 PM, Karoly Negyesi <karoly@negyesi.net> wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it.
Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not.
I know many will not like what I will say, but I have to say it.
A prime example of where MySQL works fine to solve an issue with a few lines is this issue http://drupal.org/node/83738. All the huge changes and jumping through hoops is because of accomodating PostgreSQL
We introduced schema changes just because PostgreSQL cannot do case insensitive matching by default, while MySQL works fine. For the sake of 5% (or 1%) of the sites, we are increasing complexity.
Regards,
Karoly Negyesi
-- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
On Sun, 11 Nov 2007 20:14:30 -0600 George Kappel <gkappel@herrspacific.com> wrote:
I would think this is a slippery slope, making sure a patch work against at least 2 db backends in a reasonable way is an important indication of quality
Agree. I think supporting one DB is not just a matter of losing support but it is a matter of loosing freedom. Once drupal lose the infrastructure for DB independence, it will be hard to put it back. Support for 2 DB is actually too few to keep the code sane. The second will just be a special case of the first rather than an abstraction. I am *ing scared of mono-cultures. No one said DB abstraction had to be easy, actually it isn't and there are plenty of DB abstraction layers out there to prove it. I was actually looking at http://www.sqlalchemy.org/ (python) to learn something. Any chance we could introduce, get inspired, rely on an existent DB abstraction layer? OK, I'll stop to pretend to be a software architect and I'm going to install D6 on pgsql now to see if I can help in anyway. Installation completed... BTW I've some older drupal sites on pgsql. The reasons I chose pgsql were historical (more mature transaction, stored procedures...). If I'd have to chose for the future I *may* reconsider to use mysql. All my pgsql sites are running smoothly even if I had to patch some modules. I could live with a mysql only D6. I'd just consider it risky for the future of drupal itself. Wasn't Microsoft interested in supporting drupal running on IIS/MSSQL? http://hojtsy.hu/blog/2007-nov-04/adventures-redmond-microsoft-open-source-a... http://buytaert.net/microsoft-and-drupal BTW as far as I know there is no way to introduce a unique constraint in pgsql without eliminating the duplicates first. For the case exposed here http://drupal.org/node/146466 mysql ALTER IGNORE TABLE {search_index} ADD UNIQUE KEY sid_word_type (sid, word, type)"); ALTER IGNORE TABLE {search_dataset} ADD UNIQUE KEY sid_type (sid, type) would turn to be pgsql delete from {search_index} where exists ( select 'x' from {search_index} i where i.sid = {search_index}.sid and i.word = {search_index}.word and i.type = {search_index}.type and i.oid < {search_index}.oid ); alter table {search_index} add constraint sid_word_type unique (sid, word, type); delete from {search_dataset} where exists ( select 'x' from {search_dataset} i where i.sid={search_dataset}.sid and i.type = {search_dataset}.type and i.oid < {search_dataset}.oid ); alter table {search_dataset} add constraint sid_type unique (sid, type); untested on drupal... tested on a *test* table, going to test further shortly. (oid is a bit of pgmagic, oid is the object id, so you're not risking to kill all the row and leave at least one instance). -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Mon, 12 Nov 2007 12:34:33 +0100 Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
(oid is a bit of pgmagic, oid is the object id, so you're not risking to kill all the row and leave at least one instance).
It seems you can't access oid directly on newer postgresql. It was working in 7.4, it doesn't in 8.1. Investigating... -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Mon, 12 Nov 2007 13:15:17 +0100 Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 12 Nov 2007 12:34:33 +0100 Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
(oid is a bit of pgmagic, oid is the object id, so you're not risking to kill all the row and leave at least one instance).
It seems you can't access oid directly on newer postgresql. It was working in 7.4, it doesn't in 8.1.
Investigating...
default oid assignment has been dropped from >8.0. I can't judge how much reliable this solution could be: http://archives.postgresql.org/pgsql-sql/2003-01/msg00513.php -- Ivan Sergio Borgonovo http://www.webthatworks.it
I'm here, and willing to test on postgresql.. I'm not full time but rather a weekend hacker. Please send me stuff. (Shiny on #drupal) On Mon, 2007-11-12 at 12:34 +0100, Ivan Sergio Borgonovo wrote:
On Sun, 11 Nov 2007 20:14:30 -0600 George Kappel <gkappel@herrspacific.com> wrote:
I would think this is a slippery slope, making sure a patch work against at least 2 db backends in a reasonable way is an important indication of quality
Agree.
I think supporting one DB is not just a matter of losing support but it is a matter of loosing freedom. Once drupal lose the infrastructure for DB independence, it will be hard to put it back.
Support for 2 DB is actually too few to keep the code sane. The second will just be a special case of the first rather than an abstraction.
I am *ing scared of mono-cultures.
No one said DB abstraction had to be easy, actually it isn't and there are plenty of DB abstraction layers out there to prove it.
I was actually looking at http://www.sqlalchemy.org/ (python) to learn something. Any chance we could introduce, get inspired, rely on an existent DB abstraction layer?
OK, I'll stop to pretend to be a software architect and I'm going to install D6 on pgsql now to see if I can help in anyway. Installation completed...
BTW I've some older drupal sites on pgsql. The reasons I chose pgsql were historical (more mature transaction, stored procedures...). If I'd have to chose for the future I *may* reconsider to use mysql. All my pgsql sites are running smoothly even if I had to patch some modules. I could live with a mysql only D6. I'd just consider it risky for the future of drupal itself.
Wasn't Microsoft interested in supporting drupal running on IIS/MSSQL?
http://hojtsy.hu/blog/2007-nov-04/adventures-redmond-microsoft-open-source-a...
http://buytaert.net/microsoft-and-drupal
BTW as far as I know there is no way to introduce a unique constraint in pgsql without eliminating the duplicates first.
For the case exposed here http://drupal.org/node/146466
mysql ALTER IGNORE TABLE {search_index} ADD UNIQUE KEY sid_word_type (sid, word, type)"); ALTER IGNORE TABLE {search_dataset} ADD UNIQUE KEY sid_type (sid, type)
would turn to be
pgsql delete from {search_index} where exists ( select 'x' from {search_index} i where i.sid = {search_index}.sid and i.word = {search_index}.word and i.type = {search_index}.type and i.oid < {search_index}.oid );
alter table {search_index} add constraint sid_word_type unique (sid, word, type);
delete from {search_dataset} where exists ( select 'x' from {search_dataset} i where i.sid={search_dataset}.sid and i.type = {search_dataset}.type and i.oid < {search_dataset}.oid );
alter table {search_dataset} add constraint sid_type unique (sid, type);
untested on drupal... tested on a *test* table, going to test further shortly.
(oid is a bit of pgmagic, oid is the object id, so you're not risking to kill all the row and leave at least one instance).
Alright let me throw my 2 cents in here .... Again many may not like what i have to say. I have experience with both mysql and postgresql (along with SQL server and oracle). They are both pretty cool databases (though I tend to lean towards postgres, its much more mature, the command line client is a lot better, and its had a PL/SQL language for much longer then mysql) Everything I read in the previous messages about keeping DB independance I think is dead on. Any application that only supports one database shuts the door on potential users and more then likely is using some boofed up, non-standard SQL. Neither Mysql or Postgres is evil, they are just tools. Its all about how you use those tools. Same as the argument, guns don't kill people, people kill people.
From my limited experience with Drupal, its bad queries that are breaking Postgres support. ... '1' is not an int ... its a string. Don't user '%s' for every parameter you pass to db_query ...Realize there is a difference between storing a string, storing an int, storing a timestamp, etc ... and please stay away from functions or constructs that only work for one db platform.
On a side note, I've been away from the community for a little while (close to a year), and I really would like to jump back in. My jedi skill of programming have gotten a lot better (lots of jquery and lots of php), and I would like to try my hand at designing architecture or even opening up og2list to qmail, or maybe even making a second attempt at porting mantis to drupal. Thanks for you time -Trevor (tmck) On Nov 16, 2007 3:50 AM, Brenda Wallace <brenda@wallace.net.nz> wrote:
I'm here, and willing to test on postgresql.. I'm not full time but rather a weekend hacker. Please send me stuff. (Shiny on #drupal)
On Mon, 2007-11-12 at 12:34 +0100, Ivan Sergio Borgonovo wrote:
On Sun, 11 Nov 2007 20:14:30 -0600 George Kappel <gkappel@herrspacific.com> wrote:
I would think this is a slippery slope, making sure a patch work against at least 2 db backends in a reasonable way is an important indication of quality
Agree.
I think supporting one DB is not just a matter of losing support but it is a matter of loosing freedom. Once drupal lose the infrastructure for DB independence, it will be hard to put it back.
Support for 2 DB is actually too few to keep the code sane. The second will just be a special case of the first rather than an abstraction.
I am *ing scared of mono-cultures.
No one said DB abstraction had to be easy, actually it isn't and there are plenty of DB abstraction layers out there to prove it.
I was actually looking at http://www.sqlalchemy.org/ (python) to learn something. Any chance we could introduce, get inspired, rely on an existent DB abstraction layer?
OK, I'll stop to pretend to be a software architect and I'm going to install D6 on pgsql now to see if I can help in anyway. Installation completed...
BTW I've some older drupal sites on pgsql. The reasons I chose pgsql were historical (more mature transaction, stored procedures...). If I'd have to chose for the future I *may* reconsider to use mysql. All my pgsql sites are running smoothly even if I had to patch some modules. I could live with a mysql only D6. I'd just consider it risky for the future of drupal itself.
Wasn't Microsoft interested in supporting drupal running on IIS/MSSQL?
http://hojtsy.hu/blog/2007-nov-04/adventures-redmond-microsoft-open-source-a...
http://buytaert.net/microsoft-and-drupal
BTW as far as I know there is no way to introduce a unique constraint in pgsql without eliminating the duplicates first.
For the case exposed here http://drupal.org/node/146466
mysql ALTER IGNORE TABLE {search_index} ADD UNIQUE KEY sid_word_type (sid, word, type)"); ALTER IGNORE TABLE {search_dataset} ADD UNIQUE KEY sid_type (sid, type)
would turn to be
pgsql delete from {search_index} where exists ( select 'x' from {search_index} i where i.sid = {search_index}.sid and i.word = {search_index}.word and i.type = {search_index}.type and i.oid < {search_index}.oid );
alter table {search_index} add constraint sid_word_type unique (sid, word, type);
delete from {search_dataset} where exists ( select 'x' from {search_dataset} i where i.sid={search_dataset}.sid and i.type = {search_dataset}.type and i.oid < {search_dataset}.oid );
alter table {search_dataset} add constraint sid_type unique (sid, type);
untested on drupal... tested on a *test* table, going to test further shortly.
(oid is a bit of pgmagic, oid is the object id, so you're not risking to kill all the row and leave at least one instance).
I would think this is a slippery slope, making sure a patch work against at least 2 db backends in a reasonable way is an important indication of quality
If pigs can fly they are better animals.Maybe you want to support your statement with something. Though I said that "has a decent hope (use common sense) to work on postgresql". Once it's committed those who care about postgresql, if
We introduced schema changes just because PostgreSQL cannot do case insensitive matching by default, while MySQL works fine. For the sake of 5% (or 1%) of the sites, we are increasing complexity.
Amen brother. I am happy which simply removes LOWER from those queries and kicks the ball to the postgresql half of the playground. As I stated on my blog I have nothing against posgresql just I do not want to deal / held back with it. Any support for this solution?
Quoting Karoly Negyesi <karoly@negyesi.net>:
I would think this is a slippery slope, making sure a patch work against at least 2 db backends in a reasonable way is an important indication of quality
If pigs can fly they are better animals.Maybe you want to support your statement with something.
Though I said that "has a decent hope (use common sense) to work on postgresql". Once it's committed those who care about postgresql, if
We introduced schema changes just because PostgreSQL cannot do case insensitive matching by default, while MySQL works fine. For the sake of 5% (or 1%) of the sites, we are increasing complexity.
Amen brother. I am happy which simply removes LOWER from those queries and kicks the ball to the postgresql half of the playground. As I stated on my blog I have nothing against posgresql just I do not want to deal / held back with it. Any support for this solution?
IMO, we need to focus on Drupal SQL[1] and not MySql SQL or Postgresql SQL or Oracle SQL for coding purposes. Another layer of abstraction handles the nitty gritty of each different DB. AFAIR, the code prior to 6 was both MySql and PgSql friendly. So making a change in a different direction for Drupal 6 doesn't seem TRT to me. However if Drupal 5 wasn't as PgSql friendly as I remember then I agree with Karoly. For Drupal 7 I would like to see Drupal SQL handle all of the syntax changes needed for various DB engines; that means no more testing of $db_type outside of the Drupal SQL abstraction. Then a company with Oracle, Sybase and DB2 could display data from all three database types comfortably with Drupal. [1] http://drupal.org/node/191486 Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
IMO, we need to focus on Drupal SQL[1] and not MySql SQL or Postgresql SQL or Oracle SQL for coding purposes. Another layer of abstraction handles the nitty gritty of each different DB.
http://adodb.sourceforge.net/. We don't want to go that route. ~Konstantin
I'm just catching up on this thread, so this may have already been answered. Regarding the case insensitivy, why can't one use ILIKE during the postGres search in the SQL statement to force case-insensitive?...again, as I read more through this thread, this may have been answered. From: development-bounces@drupal.org [mailto:development-bounces@drupal.org] On Behalf Of Khalid Baheyeldin Sent: Sunday, November 11, 2007 7:31 PM To: development@drupal.org Subject: Re: [development] Do not let postgresql hold back great patches On Nov 11, 2007 5:17 PM, Karoly Negyesi <karoly@negyesi.net> wrote: In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it. Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not. I know many will not like what I will say, but I have to say it. A prime example of where MySQL works fine to solve an issue with a few lines is this issue http://drupal.org/node/83738. All the huge changes and jumping through hoops is because of accomodating PostgreSQL We introduced schema changes just because PostgreSQL cannot do case insensitive matching by default, while MySQL works fine. For the sake of 5% (or 1%) of the sites, we are increasing complexity. Regards, Karoly Negyesi -- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
On 11 Nov 2007, at 23:17, Karoly Negyesi wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. Once it's committed those who care about postgresql, if they so want to, can test and if there is a need, fix it.
Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not.
No. Supporting multiple databases is a good thing and something we want to get better at. While MySQL is popular in our developer world, it /only/ has a 30% market share in the real world. Making a MySQL-only CMS is like making a Firefox-only website. It's OK for techies, but not for mam and dad. I'm not willing to drop PostgreSQL support, certainly not after we put time and effort in the schema API and now we started to scratch the surface of object relational mapping. PostgreSQL is not the ugly sister. Admittedly though, we keep waiting for her prince on a white horse. ;) -- Dries Buytaert :: http://www.buytaert.net/
I agree with the earlier statement. MySQL seems "smarter/easier" because most devel in the Drupal work use it all the time. I've been on other projects where pgSQL is the "smarter/easier" method. Multiple-DB support is just good practice for a project that is to be widely adopted and touts itself as a "flexible" CMS. There's no fine print that says "Flexible...only on Apache 2.2 running CentOS 5 with kernel X; further MySQL 5.x and PHP 5.1x and only with the following php libs installed....". We not doing data-warehousing, or complex SQL Data cubes using mutli-dimensional series slices...thus, in my very humble opinion, we SHOULD be able to support multiple database with just good clean SQL code and logic. We, as a community, should also be able to handle the "situations" that arise b/t databases types; such as this case-insensitivity issue. We all have friends across the pond ( and by pond I mean a linux guy has a windows-expert friend, a MySQL guy has a PostGres friend, etc. etc.), and could be able to cross that pond for information to help form a solution. Anyway, that's my 2 bits...well, 1.5 bits.
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql.
No. Supporting multiple databases is a good thing and something we want to get better at.
We agree.
I'm not willing to drop PostgreSQL support
Same as I answered to Barry. Somebody care to explain how my words can mean "drop postgresql support" How? I said " a decent hope (use common sense) to work on postgresql." by (use common sense) I meant "please do not debate on what's a decent hope". All I asked for is to commit something, move on and let the postgresql people test/fix once it's in thus not letting the very limited postgresql testing community hold back the whole of Drupal. In the past we committed patches that were incomplete one way or another so we could build on them. That's all I am asking here. I am working on the LOWER patch. I have carefully worded the original letter I am following up to now. I am truly baffled that despite all these efforts, people think I want to drop PostgreSQL. If you can point that out in my letters or my work, I would be grateful because surely all these great people take that motion from somewhere but where...? Maybe my blog post? But some people did not like it, so I yielded to fate and try to find solutions that are good for everyone.
Op maandag 12 november 2007, schreef Dries Buytaert:
No. Supporting multiple databases is a good thing and something we want to get better at.
I would like everyone who commented in this thread to head over to http://groups.drupal.org/node/6772 "Active Records, a possible approach for consistent Data APIs" and to read the excellent proposal in PDF [1] there. It is a very drupalish approach of an "industry standard"-way of doing db abstraction. I believe that - with your input - we can make that into something we all will love. Bèr [1] http://groups.drupal.org/files/drupal_active_records.pdf -- Drupal, Ruby on Rails and Joomla! development http://webschuur.com
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql.
Kind of like saying: "Software really only needs to support Windows. The other platforms don't matter." Except in this case our "Windows" is only 30% of the market. You need to understand that MySQL is broken in a lot of ways. A lot of the "missing support for pgsql" bugs are actually "we are depending on extremely broken and non-portable mysql behavior" bugs. If we go down the route of accepting MySQL-only code, it will be very difficult to get out.
Prime example: http://drupal.org/node/146466 this is the most important patch we have currently as it makes Drupal search speedy and nice. And it is more or less on hold just because noone is sure whether the postgresql update works or not.
Actually, that patch was broken for MySQL too (see http://drupal.org/node/192348) and got committed anyway because no one tested its upgrade path. Separate rant. Thanks, Barry
If we go down the route of accepting MySQL-only code, it will be very difficult to get out.
http://lists.drupal.org/pipermail/development/2007-November/027299.html "In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql." How can people read "Let's drop postgresql!" into my words? I am nonplussed.
"Karoly Negyesi" <karoly@negyesi.net> wrote:
If we go down the route of accepting MySQL-only code, it will be very difficult to get out.
http://lists.drupal.org/pipermail/development/2007-November/027299.html
"In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql."
How can people read
"Let's drop postgresql!"
into my words? I am nonplussed.
In my opinion, the reason for that is that it's sooo damn easy to install PostgreSQL and test a patch against it. Why would you request that patches get committed when it's "likely" that it will work with a supported backend, when it's so easy to go the extra few feet and verify? -- Bill Moran http://www.potentialtech.com
In my opinion, the reason for that is that it's sooo damn easy to install PostgreSQL and test a patch against it.
You are kidding me. First of all, we do not even have enough patch testers. Second, damn easy to install PostgreSQL? Really? Where is the handbook page? The video?
"Karoly Negyesi" <karoly@negyesi.net> wrote:
In my opinion, the reason for that is that it's sooo damn easy to install PostgreSQL and test a patch against it.
You are kidding me. First of all, we do not even have enough patch testers. Second, damn easy to install PostgreSQL? Really? Where is the handbook page? The video?
I'm not kidding you. Of course, if your distro sucks, it might not be easy, but on FreeBSD, I do make install; /usr/local/etc/rc.d/postgresql initdb; /usr/local/etc/rc.d/postgresql start If you need a video to get you through that, you're not going to be much help as a patch tester anyway ... Have you even tried installing it? You're criticizing quite a bit, but I'm not seeing any first-hand description, just a baseless dismissal. -- Bill Moran http://www.potentialtech.com
Bill Moran <wmoran@potentialtech.com> wrote:
"Karoly Negyesi" <karoly@negyesi.net> wrote:
In my opinion, the reason for that is that it's sooo damn easy to install PostgreSQL and test a patch against it.
You are kidding me. First of all, we do not even have enough patch testers. Second, damn easy to install PostgreSQL? Really? Where is the handbook page? The video?
Wait a minute ... allow me to retract my previous statement and get out of the trap you tricked me into. Why are you being so confrontational? You _asked_ _why_ and I gave you my opinion. Now you're arguing that my opinion is no good, but it doesn't change the fact that it's still my opinion. Stop turning this into an argument and perhaps you'll get more support. I have no desire to argue the relative difficulty of installing PostgreSQL on this list. If you want to discuss that, take it to the PostgreSQL mailing lists. -- Bill Moran http://www.potentialtech.com
On Nov 16, 2007 6:20 PM, Bill Moran <wmoran@potentialtech.com> wrote:
Bill Moran <wmoran@potentialtech.com> wrote:
"Karoly Negyesi" <karoly@negyesi.net> wrote:
In my opinion, the reason for that is that it's sooo damn easy to install PostgreSQL and test a patch against it.
You are kidding me. First of all, we do not even have enough patch
testers. Second, damn easy to install PostgreSQL? Really? Where is the handbook page? The video?
Wait a minute ... allow me to retract my previous statement and get out of the trap you tricked me into.
Why are you being so confrontational? You _asked_ _why_ and I gave you my opinion. Now you're arguing that my opinion is no good, but it doesn't change the fact that it's still my opinion.
Stop turning this into an argument and perhaps you'll get more support. I have no desire to argue the relative difficulty of installing PostgreSQL on this list. If you want to discuss that, take it to the PostgreSQL mailing lists.
Bill, Karoly is not being confrontational. Why do you say so? His point is that we do not get enough testers IN GENERAL, let alone PostgreSQL. The user base of PostgreSQL vs. MySQL on Drupal is far in favor of MySQL, and hence testing and installed base is vastly in favor of MySQL. Yes, PostgreSQL may be a better engine, may be damn easy to install, but that is not the point. The point is having ENOUGH USER BASE WHO care to spend the time to test patches, compatibility, upgrade path, ....etc. The LOWER issue that is being refered to is this http://drupal.org/node/83738. I have voiced my concern, together with others, on adding so many columns just to get around the case sensitive of a database that is not much in use. So, Karoly is proposing that we test on MySQL, then throw the issue over the fence to the folk who use (and care about) PostgreSQL for them to handle, because of the INSTALLED BASE issue, and the TESTING RESOURCES issue. This is NOT a technology issue. -- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
So, Karoly is proposing that we test on MySQL, then throw the issue over the fence to the folk who use (and care about) PostgreSQL for them to handle, because of the INSTALLED BASE issue, and the TESTING RESOURCES issue. This is NOT a technology issue.
The previous analogy of Windows is incorrect. A more proper analogy is : "I test on LAMP, because that is what I use. If others want to make Drupal work on Windows and IIS, then let them test there, and propose patches that do not put Drupal through hoops to support that. We are not specifically excluding Windows, and in fact some people use Drupal just like that". Same for PostgreSQL, MS-SQL, SQLite, ...etc. -- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
"Khalid Baheyeldin" <kb@2bits.com> wrote:
The LOWER issue that is being refered to is this http://drupal.org/node/83738.
I have voiced my concern, together with others, on adding so many columns just to get around the case sensitive of a database that is not much in use.
I agree with your concern 100%. In PostgreSQL, the correct way to solve this problem would be to add an index: create index lower_name on users (LOWER(name)); Said index will then be used any time a WHERE clause references LOWER(name). This avoids the problems of adding a column. Does MySQL not have this capability? -- Bill Moran http://www.potentialtech.com
On Nov 16, 2007, at 6:06 PM, Bill Moran wrote:
In PostgreSQL, the correct way to solve this problem would be to...
a) Please read the issue. This approach was proposed months ago and discussed at length. b) Please don't reply to the issue on this mailing list. It splinters the conversation, and spams hundreds of developers who might not care about LOWER() performance problems). Reply in the issue so that all the replies are in 1 place and so that people who care can find it, and people who don't won't be bothered. ;) Thanks, -Derek (dww)
"Khalid Baheyeldin" <kb@2bits.com> wrote:
On Nov 16, 2007 6:20 PM, Bill Moran <wmoran@potentialtech.com> wrote:
Bill Moran <wmoran@potentialtech.com> wrote:
"Karoly Negyesi" <karoly@negyesi.net> wrote:
In my opinion, the reason for that is that it's sooo damn easy to install PostgreSQL and test a patch against it.
You are kidding me. First of all, we do not even have enough patch
testers. Second, damn easy to install PostgreSQL? Really? Where is the handbook page? The video?
Wait a minute ... allow me to retract my previous statement and get out of the trap you tricked me into.
Why are you being so confrontational? You _asked_ _why_ and I gave you my opinion. Now you're arguing that my opinion is no good, but it doesn't change the fact that it's still my opinion.
Stop turning this into an argument and perhaps you'll get more support. I have no desire to argue the relative difficulty of installing PostgreSQL on this list. If you want to discuss that, take it to the PostgreSQL mailing lists.
Bill,
Karoly is not being confrontational. Why do you say so?
When these kinds of things happen, I'm torn between defending my viewpoint and letting it drop for the sake of peace and quiet. In this case, I will state my reason then let it drop. The conversation went as follows: Karoly: Why is everyone against my viewpoint. Me: I think it's because ... Karoly: You're wrong. If he didn't want my opinion, why did he ask? It felt confrontational and pointlessly argumentative to me, and it also felt like it was about to degrade into "MySQL vs. PostgreSQL". That's how I saw it. If others saw it differently, then that's their viewpoint. -- Bill Moran http://www.potentialtech.com
On Nov 16, 2007 9:11 PM, Bill Moran <wmoran@potentialtech.com> wrote:
"Khalid Baheyeldin" <kb@2bits.com> wrote:
On Nov 16, 2007 6:20 PM, Bill Moran <wmoran@potentialtech.com> wrote:
Bill Moran <wmoran@potentialtech.com> wrote:
"Karoly Negyesi" <karoly@negyesi.net> wrote:
In my opinion, the reason for that is that it's sooo damn easy to install PostgreSQL and test a patch against it.
You are kidding me. First of all, we do not even have enough patch
testers. Second, damn easy to install PostgreSQL? Really? Where is the handbook page? The video?
Wait a minute ... allow me to retract my previous statement and get out of the trap you tricked me into.
Why are you being so confrontational? You _asked_ _why_ and I gave you my opinion. Now you're arguing that my opinion is no good, but it doesn't change the fact that it's still my opinion.
Stop turning this into an argument and perhaps you'll get more support. I have no desire to argue the relative difficulty of installing PostgreSQL on this list. If you want to discuss that, take it to the PostgreSQL mailing lists.
Bill,
Karoly is not being confrontational. Why do you say so?
When these kinds of things happen, I'm torn between defending my viewpoint and letting it drop for the sake of peace and quiet. In this case, I will state my reason then let it drop.
The conversation went as follows: Karoly: Why is everyone against my viewpoint. Me: I think it's because ... Karoly: You're wrong.
If he didn't want my opinion, why did he ask? It felt confrontational and pointlessly argumentative to me, and it also felt like it was about to degrade into "MySQL vs. PostgreSQL".
That's how I saw it. If others saw it differently, then that's their viewpoint.
I think we are going in circles here. His very point is that HE IS NOT ADVOCATING that Drupal become a MySQL only monoculture. Some thought it to be so, but it is not. Nor do I believe that anyone supports that for that matter. His point, that the lack of resources behind PostgreSQL is delaying the acceptance of patches that work in MySQL, thus holding Drupal back. I believe I covered this point in my previous two emails. Let those who use PostgreSQL put the effort of testing the patches, not hold everything back. -- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
On Friday 16 November 2007, Khalid Baheyeldin wrote:
I think we are going in circles here.
His very point is that HE IS NOT ADVOCATING that Drupal become a MySQL only monoculture. Some thought it to be so, but it is not. Nor do I believe that anyone supports that for that matter.
His point, that the lack of resources behind PostgreSQL is delaying the acceptance of patches that work in MySQL, thus holding Drupal back.
I believe I covered this point in my previous two emails. Let those who use PostgreSQL put the effort of testing the patches, not hold everything back.
I expect this issue to become much bigger come Drupal 7. There's talk of getting MS SQL support in core (courtesy Microsoft), Oracle in core, SQLite in core... Even if we can support all of those different databases at the same time, maintaining support for them is going to be a huge pain. 99% of people reviewing patches are using MySQL. Will we say that no patch with a query can be committed until one of the three people who uses Oracle with Drupal has tested it? Will we not be able to commit a patch until Microsoft confirms it doesn't break MS SQL? Perhaps that's a bit hyperbolic, but since, for example, I am completely incapable of testing against MS SQL (I don't have a Windows computer capable of running it, even if I was willing to pay for it) and I suspect many other people are, this issue is going to crop up more and more, and we are going to need to figure out how we're going to address it. -- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
On Fri, 16 Nov 2007 19:19:57 -0500 "Khalid Baheyeldin" <kb@2bits.com> wrote:
So, Karoly is proposing that we test on MySQL, then throw the issue over the fence to the folk who use (and care about) PostgreSQL for them to handle, because of the INSTALLED BASE issue, and the TESTING RESOURCES issue. This is NOT a technology issue.
That doesn't make it a good decision. Again... putting aside the ignorance of someone about postgres, forgetting completely the topic about what's better, *splitting* support for the DB is not healthy for the project at this stage. The path should be to make the DB abstraction layer more DB agnostic and make developer more aware there is not just mysql. Everything will look terrible if you design for mysql and then try to port it to any other DB. This increase the inertia of moving to a more agnostic DB layer and supporting anything else than mysql. So "commit something, move on and let the postgresql people test/fix" is not a solution. It is a dead end. If the few pg people will stop to bother you at *every step* till we won't have a DB abstraction layer and some way to deal with DB sensible patches), mysql people will become even less aware there are other DB. As a side note pg dev generally tend to know more about mysql than mysql dev tend to know about pg. When pg people will come out and say: "what kind of garbage did you wrote? It can't be ported to pg unless we...." what you'll end up with will be the mysql people saying "we can't mess with the code for a DB that have just [put ridiculously low unsupported percentage here] market share". And things will get worse. It is NOT the way you build the premises for a decent SPI! Larry Garfield made a good point too. But until you've the tools (abstraction layer that should mitigate the problem quite substantially and tune the patch system to make dev aware of "DB" sensible patches) committing in the hope things will work on pg is not going to make things better. If you lower the support level for pgsql now you'll never have a sane DB abstraction layer and sooner or later you'll support just one DB. pgsql is just what we have... it could be anything else *reasonably different* from mysql. Supporting 2 "anything" generally is not enough to build up a sane SPI. What you end up is: case A is supported and case B is special case of A, when you need to support case C everything is so entangled that you realise trying to build a SPI that way was wasted time. Once drupal become a one db application it will be extremely hard to go back. This will "hold drupal back" much more than supporting pgsql upfront. MS seemed to be interested in supporting drupal (is it?). They should be interested to support MS SQL as well, that should help to have a more agnostic db layer. (Larry... maybe they can give you a couple of licenses of SQL 200X ;) but they won't provide the exorcist ). As for me I installed D6 on pgsql. I'll try to learn how to contribute the "drupalish" way (cvs, issues etc...). I posted some patch for postgres mainly for modules in the past. I'll try to help with beta and rc too. As for the lower() problem I can't see why a "lowered" index can't get in. Again... it is a matter of good design and *freedom*. Maybe mysql will be where pg is now, maybe it won't, maybe you can't turn php into python in 1 or 2 years as you can't turn mysql in pg. I don't care. Fortunately we have mysql, and postgres and sqlite and firebird and ... to chose from. ************* It is not a matter of pg vs mysql. ************* I don't want to bet on others projects blindly, I don't like mono-cultures. DB abstraction is a known problem and everybody know adding one more layer has impact on performance. We have to live with it. -- Ivan Sergio Borgonovo http://www.webthatworks.it
When pg people will come out and say: "what kind of garbage did you wrote? It can't be ported to pg unless we...."
Let's get back to the original post: In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql. So, we are just skipping the final fine tune / testing steps. We do write garbge nor do we commit MysQL only code. I am willing (and I always did) to peek into the postgresql manual to see whether it has a solution to any problem at hand. This is not the same as actually writing the thing and testing but it gives, you know, a decent hope, that it will work on postgresql.
Question: Should this approach be taken with Javascript-related code on any non-Firefox, non-IE browsers? --Jeff On Nov 17, 2007, at 7:38 AM, Karoly Negyesi wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql.
So, we are just skipping the final fine tune / testing steps. We do write garbge nor do we commit MysQL only code. I am willing (and I always did) to peek into the postgresql manual to see whether it has a solution to any problem at hand. This is not the same as actually writing the thing and testing but it gives, you know, a decent hope, that it will work on postgresql.
On Nov 17, 2007, at 7:38 AM, Karoly Negyesi wrote:
In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql.
On Saturday, 17. November 2007, Jeff Eaton wrote:
Question: Should this approach be taken with Javascript-related code on any non-Firefox, non-IE browsers?
I think the main point to consider here is: Will such issues hold back a release (i.e. Postgres support is still a critical issue - "deferred, but not dropped") or will it be released with Postgres left in an unfunctionable state (or, for that matter, Opera/Safari/Konqueror for JavaScript stuff)? The latter would be where "dropping support" starts. Personally, I see no problem with the former.
Question: Should this approach be taken with Javascript-related code on any non-Firefox, non-IE browsers?
I would exercise caution here. While I can very see how a given patch might get an untested postgresql version just based on the postgresql manual, this gets very tricky with browsers. The decent hope here is greatly deceiving...
It's easier to find work-arounds and translations from standard SQL to MySQL syntax, and do the same thing to translate to another reasonable database (Posgres, Oracle, MSSQL), than it is to reverse translate MySQL (or other database) specific features to other databases. Standard SQL is the sort of the greatest common factor (in the mathematical sense). Even more significantly, hundreds of MySQL people have spent significant effort in coming up with standard SQL equivalents in MySQL, so most of that work is already done for us. Maybe there is an open-source tool which can validate standard SQL which could be used. Then our DB abstraction could require a verifiable standard SQL syntax which would then be translated (where necessary) into the MySQL, Oracle, Postgres, etc. equivalents. This might only be for CRUD operations (data manipulation statements) and not for DDL, for which we already have a reasonable API and which are much less standardized. I myself always use MySQL for my Drupal installs -- because it will work. Thus, I never test them with Postgres, even though I run Postgres on several systems, am very comfortable with using it, etc. I just don't have the time to hassle with discovering some obscure SQL bug resulting from database incompatabilitilies when I'm just trying to make a site work. If I were able to devote more time to just testing Drupal for the sake of testing it and with no further purpose, I could and would use Postgres more of the time. But even then -- if my goal were to test the new menu code, I'd want to eliminate any other possible error causing sources, such as using Postgres. So unless we actually push the issue of supporting multiple databases as a community, the vast majority of developers will never make the effort. And with each passing release spent in that mode, the harder it will become to reverse it. We will end up supporting only one database, MySQL, de facto.
Maybe there is an open-source tool which can validate standard SQL which could be used. Then our DB abstraction could require a verifiable standard SQL syntax which would then be translated (where necessary) into the MySQL, Oracle, Postgres, etc. equivalents. This might only be for CRUD operations (data manipulation statements) and not for DDL, for which we already have a reasonable API and which are much less standardized.
Good. I do not know about open source but http://developer.mimer.com/validator/index.htm offers what you want.
2007/11/28, Karoly Negyesi <karoly@negyesi.net>:
Maybe there is an open-source tool which can validate standard SQL which could be used. Then our DB abstraction could require a verifiable standard SQL syntax which would then be translated (where necessary) into the MySQL, Oracle, Postgres, etc. equivalents. This might only be for CRUD operations (data manipulation statements) and not for DDL, for which we already have a reasonable API and which are much less standardized.
Good. I do not know about open source but http://developer.mimer.com/validator/index.htm offers what you want.
Mimer also provide a free webservice [1] for validating SQL against ISO standards. This webservice has been implemented by two opensource projects: phpMyAdmin [2] and SQuirreL SQL [3]. Can we implement this on devel module? AFAIK it uses SOAP. [1] http://sqlvalidator.mimer.com/index.html [2] http://www.phpmyadmin.net [3] http://squirrel-sql.sourceforge.net Cheers, Henrique
On Wed, 28 Nov 2007 11:48:23 +0100 "Chris Johnson" <cxjohnson@gmail.com> wrote:
It's easier to find work-arounds and translations from standard SQL to MySQL syntax, and do the same thing to translate to another reasonable database (Posgres, Oracle, MSSQL), than it is to reverse translate MySQL (or other database) specific features to other databases.
Agree.
Standard SQL is the sort of the greatest common factor (in the mathematical sense). Even more significantly, hundreds of MySQL people have spent significant effort in coming up with standard SQL equivalents in MySQL, so most of that work is already done for us.
Maybe there is an open-source tool which can validate standard SQL which could be used. Then our DB abstraction could require a verifiable standard SQL syntax which would then be translated (where necessary) into the MySQL, Oracle, Postgres, etc. equivalents. This might only be for CRUD operations (data manipulation statements) and not for DDL, for which we already have a reasonable API and which are much less standardized.
On the long run this doesn't give space for optimisations. People should write SQL with standards in mind and not MySQL in mind but the DB abstraction layer should leave a door open for optimisation. There should be a default path that doesn't impact on performances (including both mysql and pgsql code) that let module writers write custom code for a particular DB *if* they are willing to do so. There was a thread that share some aspect with this problem about "numbers of loaded module and performance" and guidelines for module developers.
I myself always use MySQL for my Drupal installs -- because it will work. Thus, I never test them with Postgres, even though I run Postgres on several systems, am very comfortable with using it, etc. I just don't have the time to hassle with discovering some obscure SQL bug resulting from database incompatabilitilies when I'm just trying to make a site work.
I use pgsql for drupal and till now I installed stable versions of drupal where most stuff was working out of the box in core (there was a minor issue in length of watchdog messages where MySQL silently trim stuff while pg complains) and fixes for the modules I use were trivial. I've to learn how to use the drupal "patch/issues" infrastructure to give more chances the patch I made get into modules/core smoother. I still have to read carefully all the coding standards.
So unless we actually push the issue of supporting multiple databases as a community, the vast majority of developers will never make the effort. And with each passing release spent in that mode, the harder it will become to reverse it. We will end up supporting only one database, MySQL, de facto.
Absolutely agree. But it is a problem of chicken and egg. dev don't care cos there isn't enough abstraction in DB layer and the code is too mysqlish, the code continue to be that way because most dev are surmounted/ignore how to really deal with multiple DB. I sympathise with Chris Johnson. I understand the frustration of core dev when they have to deal with the latitancy/scarcity of pg dev trying to improve/give suggestions etc... about performance on pg. But again this is a problem of chicken and egg. Now everything that is going to change/add work to the MySQL side is seen as an annoyance because it is actually complicated to improve stuff for pg without degrading performances for MySQL. In the light of the fact that MySQL is going to have the same features pg had for years, I would think it would be a good bet to improve the abstraction layer for BOTH so to be able to use the most advanced features that BOTH will have. Anyone that didn't deal with MyISAM only know there are places where some of these features can improve performances and code writing *a lot*. I think most people here know there is no silver bullet when you've to deal with flexibility and performance and DB abstraction layers are not an exception. -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Nov 28, 2007 1:45 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Wed, 28 Nov 2007 11:48:23 +0100 "Chris Johnson" <cxjohnson@gmail.com> wrote:
Maybe there is an open-source tool which can validate standard SQL which could be used. Then our DB abstraction could require a verifiable standard SQL syntax which would then be translated (where necessary) into the MySQL, Oracle, Postgres, etc. equivalents. This might only be for CRUD operations (data manipulation statements) and not for DDL, for which we already have a reasonable API and which are much less standardized.
On the long run this doesn't give space for optimisations. People should write SQL with standards in mind and not MySQL in mind but the DB abstraction layer should leave a door open for optimisation. There should be a default path that doesn't impact on performances (including both mysql and pgsql code) that let module writers write custom code for a particular DB *if* they are willing to do so.
Well, actually I'm very much in favor of optimized SQL, because I'm very much in favor of optimized Drupal. I hate software bloat. So what I really think -- and this is a truly unpopular position -- is that most of the SQL should be written by "core" SQL coders, and the rest of us should be accessing data through object abstractions, i.e. get me a node, get me a user. Underneath is the highly optimized, shiney, sparkly, incredibly brilliant PHP and SQL code written by the 'database access guru core team'. Ok, I'll put the crack pipe down now. My goal with the above suggestion was just to make it easier for people to write the standard SQL to MySQL || PGSQL || MS SQL || Oracle abstraction-layer translation stuff by helping authors avoid difficult to translate non-standard-isms. If developers can avoid creating modules which depend on unique special extension X in MySQL or special extension Y in Postgres (unless they really meant to), then we'd be better off in our question to support more than one database. I suppose I have not made things much clearer. For some reason, I'm plagued with complicated phraseology today. :-/
In response to "Chris Johnson" <cxjohnson@gmail.com>:
On Nov 28, 2007 1:45 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Wed, 28 Nov 2007 11:48:23 +0100 "Chris Johnson" <cxjohnson@gmail.com> wrote:
Maybe there is an open-source tool which can validate standard SQL which could be used. Then our DB abstraction could require a verifiable standard SQL syntax which would then be translated (where necessary) into the MySQL, Oracle, Postgres, etc. equivalents. This might only be for CRUD operations (data manipulation statements) and not for DDL, for which we already have a reasonable API and which are much less standardized.
On the long run this doesn't give space for optimisations. People should write SQL with standards in mind and not MySQL in mind but the DB abstraction layer should leave a door open for optimisation. There should be a default path that doesn't impact on performances (including both mysql and pgsql code) that let module writers write custom code for a particular DB *if* they are willing to do so.
Well, actually I'm very much in favor of optimized SQL, because I'm very much in favor of optimized Drupal. I hate software bloat.
So what I really think -- and this is a truly unpopular position -- is that most of the SQL should be written by "core" SQL coders, and the rest of us should be accessing data through object abstractions, i.e. get me a node, get me a user. Underneath is the highly optimized, shiney, sparkly, incredibly brilliant PHP and SQL code written by the 'database access guru core team'.
Ok, I'll put the crack pipe down now.
My goal with the above suggestion was just to make it easier for people to write the standard SQL to MySQL || PGSQL || MS SQL || Oracle abstraction-layer translation stuff by helping authors avoid difficult to translate non-standard-isms. If developers can avoid creating modules which depend on unique special extension X in MySQL or special extension Y in Postgres (unless they really meant to), then we'd be better off in our question to support more than one database.
I suppose I have not made things much clearer. For some reason, I'm plagued with complicated phraseology today. :-/
I think you should smoke some more of that crack pipe. If you look at an OS (Linux or BSD or whatever) you'll see the kernel divided into machine dependent and machine independent parts. For example, you can't boot an ARM processor the same way you boot an i386 CPU, but most of the other things the kernel does can come from the same code. I expect that the right way to make Drupal _truly_ database neutral is to have database-dependent and database-independent code. You'd only see this kind of thing at the lower levels, and higher logic code shouldn't even care about it. Take, for example, that COUNT(*) is _really_ fast on MySQL + MyISAM, but is slow on just about every other DB. So, replace it with $db->count(), which does an SQL COUNT(*) on MySQL + MyISAM, but uses some other technique on other databases. The programmer no longer cares _how_ such a thing gets done, he only cares that he wants to know how many rows are in a particular query -- Bill Moran http://www.potentialtech.com
Quoting Bill Moran <wmoran@potentialtech.com>:
If you look at an OS (Linux or BSD or whatever) you'll see the kernel divided into machine dependent and machine independent parts. For example, you can't boot an ARM processor the same way you boot an i386 CPU, but most of the other things the kernel does can come from the same code.
I expect that the right way to make Drupal _truly_ database neutral is to have database-dependent and database-independent code. You'd only see this kind of thing at the lower levels, and higher logic code shouldn't even care about it.
Drupal SQL http://drupal.org/node/191486
Take, for example, that COUNT(*) is _really_ fast on MySQL + MyISAM, but is slow on just about every other DB. So, replace it with $db->count(), which does an SQL COUNT(*) on MySQL + MyISAM, but uses some other technique on other databases. The programmer no longer cares _how_ such a thing gets done, he only cares that he wants to know how many rows are in a particular query
And we can optimize in the abstraction. We need to trim the number of places to worry with optimization. Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
On Nov 30, 2007 9:47 AM, Bill Moran <wmoran@potentialtech.com> wrote:
I expect that the right way to make Drupal _truly_ database neutral is to have database-dependent and database-independent code. You'd only see this kind of thing at the lower levels, and higher logic code shouldn't even care about it.
The alternatives are: 1. Calling modules would do 'case' statements for each type of database. No one wants that, unless it is a highly optimized site with lots of customization ( i.e. not the normal Drupal core). 2. Use the lowest common denominator for SQL and live with performance and scalability issues. 3. Do database specific stuff (including optimization) in the abstraction layer (this is what Bill Moran advocates above). I am for #3 as the way forward. -- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
On Sunday 02 December 2007, Khalid Baheyeldin wrote:
On Nov 30, 2007 9:47 AM, Bill Moran <wmoran@potentialtech.com> wrote:
I expect that the right way to make Drupal _truly_ database neutral is to have database-dependent and database-independent code. You'd only see this kind of thing at the lower levels, and higher logic code shouldn't even care about it.
The alternatives are:
1. Calling modules would do 'case' statements for each type of database. No one wants that, unless it is a highly optimized site with lots of customization ( i.e. not the normal Drupal core).
2. Use the lowest common denominator for SQL and live with performance and scalability issues.
3. Do database specific stuff (including optimization) in the abstraction layer (this is what Bill Moran advocates above).
I am for #3 as the way forward.
#3 is what I'm pushing for in the D7 database overhaul, within reason. How to abstract everything without making the syntax insane or the performance terrible is what I'm currently trying to wrap my head around. The biggest problem is field type handling, which I want to keep away from module developers but varies greatly depending on the database. -- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
On 03 Dec 2007, at 6:38 AM, Larry Garfield wrote:
#3 is what I'm pushing for in the D7 database overhaul, within reason. How to abstract everything without making the syntax insane or the performance terrible is what I'm currently trying to wrap my head around. The biggest problem is field type handling, which I want to keep away from module developers but varies greatly depending on the database.
and the biggest issue here is date / time types. it's fscking crazy how different these are between databases.
On Monday 03 December 2007, adrian rossouw wrote:
On 03 Dec 2007, at 6:38 AM, Larry Garfield wrote:
#3 is what I'm pushing for in the D7 database overhaul, within reason. How to abstract everything without making the syntax insane or the performance terrible is what I'm currently trying to wrap my head around. The biggest problem is field type handling, which I want to keep away from module developers but varies greatly depending on the database.
and the biggest issue here is date / time types.
it's fscking crazy how different these are between databases.
Actually, that one I think I have a solution for, or the beginnings of one. In PHP 5, there is the DateTime class to represent a date/time in some timezone. It's great, but not a proper value object. Writing a value object around DateTime is easy, however, and I've already done one. So, each database driver defines internally its own format string to convert a DateTime/DateValue class into the string format it expects for a date/time stamp. The database driver then detects the presence of an object as an argument to a query/prepared statement and formats it to its desired format, which is then inserted into the query. That works for putting a date/time value into a query (be it insert, update, delete, or select). For reading, I believe we can use schema API to check if a field is a timestamp field and then cast it back to a DateValue object. What the performance implication of that is, though, I do not know. It may be something that we just have to swallow if we want real database independence, along with always using an object to represent a date/time (which I think we should be doing anyway). As for in-SQL operations on the date value, like MONTH() or YEAR() in MySQL? If someone has an idea for those that doesn't involve regexing every query I would dearly love to hear it. :-) -- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
On Mon, 3 Dec 2007 00:49:01 -0600 Larry Garfield <larry@garfieldtech.com> wrote:
As for in-SQL operations on the date value, like MONTH() or YEAR() in MySQL? If someone has an idea for those that doesn't involve regexing every query I would dearly love to hear it. :-)
EXTRACT both in mysql and pg manual. There is not such a beast in MS SQL[*]. Something to get around it in MS SQL could be DATEPART (there is a DATE_PART in mysql and pg if I remember right). Anyway you can't handle all the functions avoiding search&replace. a) we can chose some popular functions to have a good optimisation b) if queries will be built from objects they should be more regular so search&replace could be done without regexp Anyway I really would involve in such decisions people that already did the MS/Oracle/DB2 ports. If we were doing a "general" abstraction layer we could just borrow a more mature one. Unfortunately PHP is not compiled (sort of). [*] as it looks clear... DB abstraction should span more than 2 DB -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Dec 3, 2007 1:49 AM, Larry Garfield <larry@garfieldtech.com> wrote:
As for in-SQL operations on the date value, like MONTH() or YEAR() in MySQL? If someone has an idea for those that doesn't involve regexing every query I would dearly love to hear it. :-)
The current use of UNIX timestamp is a compromise, and has its issues, but still allows in-SQL operations. In-SQL operations is very very important. Going to ActiveRecord type of abstraction without the ability to do aggregate operations on the data is just not an option. Without in-SQL, doing things like: select n.nid, n.type, n.title, count(*) as num_votes, avg(value) as avg_vote from votingapi_vote v inner join node n on content_id = n.nid where n.type = 'image' and n.created between unix_timestamp('2007-11-01 00:00:00') and unix_timestamp('2007-11-30 23:59:59') group by v.content_id having num_votes
1 order by avg_vote desc, num_votes desc limit 20;
Would be impossible, or would entail very expensive full table scans.
On Mon, 3 Dec 2007 18:42:56 -0500, "Khalid Baheyeldin" <kb@2bits.com> wrote:
On Dec 3, 2007 1:49 AM, Larry Garfield <larry@garfieldtech.com> wrote:
As for in-SQL operations on the date value, like MONTH() or YEAR() in MySQL? If someone has an idea for those that doesn't involve regexing every
query
I would dearly love to hear it. :-)
The current use of UNIX timestamp is a compromise, and has its issues, but still allows in-SQL operations.
In-SQL operations is very very important. Going to ActiveRecord type of abstraction without the ability to do aggregate operations on the data is just not an option.
Without in-SQL, doing things like:
select n.nid, n.type, n.title, count(*) as num_votes, avg(value) as avg_vote from votingapi_vote v inner join node n on content_id = n.nid where n.type = 'image' and n.created between unix_timestamp('2007-11-01 00:00:00') and unix_timestamp('2007-11-30 23:59:59') group by v.content_id having num_votes
1 order by avg_vote desc, num_votes desc limit 20;
Would be impossible, or would entail very expensive full table scans.
Except that unix_timestamp() is MySQL-specific, isn't it? --Larry Garfield
Yes, but it is one level above native date/time fields. All databases have equivalent functions for date math and the like (EXTRACT, INTERVAL, ...etc). My point was the need for in-SQL operations for queries, which no matter how you abstract the individual CRUD (we already have user_load() and node_load() for that), you still need in-SQL for reporting and aggreggation. On Dec 3, 2007 7:40 PM, Larry Garfield <larry@garfieldtech.com> wrote:
On Mon, 3 Dec 2007 18:42:56 -0500, "Khalid Baheyeldin" <kb@2bits.com> wrote:
On Dec 3, 2007 1:49 AM, Larry Garfield <larry@garfieldtech.com> wrote:
As for in-SQL operations on the date value, like MONTH() or YEAR() in MySQL? If someone has an idea for those that doesn't involve regexing every
query
I would dearly love to hear it. :-)
The current use of UNIX timestamp is a compromise, and has its issues, but still allows in-SQL operations.
In-SQL operations is very very important. Going to ActiveRecord type of abstraction without the ability to do aggregate operations on the data is just not an option.
Without in-SQL, doing things like:
select n.nid, n.type, n.title, count(*) as num_votes, avg(value) as avg_vote from votingapi_vote v inner join node n on content_id = n.nid where n.type = 'image' and n.created between unix_timestamp('2007-11-01 00:00:00') and unix_timestamp('2007-11-30 23:59:59') group by v.content_id having num_votes
1 order by avg_vote desc, num_votes desc limit 20;
Would be impossible, or would entail very expensive full table scans.
Except that unix_timestamp() is MySQL-specific, isn't it?
--Larry Garfield
-- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
On Monday 03 December 2007, Khalid Baheyeldin wrote:
Yes, but it is one level above native date/time fields.
All databases have equivalent functions for date math and the like (EXTRACT, INTERVAL, ...etc).
My point was the need for in-SQL operations for queries, which no matter how you abstract the individual CRUD (we already have user_load() and node_load() for that), you still need in-SQL for reporting and aggreggation.
Agreed. Doing that in a database independent fashion is the $64,000 question. -- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
Quoting Larry Garfield <larry@garfieldtech.com>:
Except that unix_timestamp() is MySQL-specific, isn't it?
Isn't it preferred to use the ansi-sql92 TIMESTAMP column and allow the DB engine to do TRT with it? Then we can use PHP5 functions variant_date_from_timestamp and variant_date_to_timestamp to convert it. Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
adrian rossouw wrote:
On 03 Dec 2007, at 6:38 AM, Larry Garfield wrote:
#3 is what I'm pushing for in the D7 database overhaul, within reason. How to
abstract everything without making the syntax insane or the performance
terrible is what I'm currently trying to wrap my head around. The biggest
problem is field type handling, which I want to keep away from module
developers but varies greatly depending on the database.
and the biggest issue here is date / time types.
it's fscking crazy how different these are between databases.
I could have swore that some where I mentioned creating query constructs with the syntatical properties of SQL statements and mapping them to phrases... Now that I have my old servers back in hand the idea might be found at http://darrelopry.darkstyles.doesntexist.com/node/63... While that was specifically geared toward the idea of schema... I'm sure the idea of mapping syntactical and grammatical abstractions to phrases can be applied to queries themselves... and with a little introspection into the query construct individual database engines could try to output optimized queries to their databases. see also mentions of this concept in machine translation projects aka food for thought: http://www.linguistics.pitt.edu/news-events/flyers/Aranovich.pdf http://books.google.com/books?id=SZwjb_56JzkC&pg=PA118&lpg=PA118&dq=language... http://publications.csail.mit.edu/abstracts/abstracts05/jsylee/jsylee.html The analysis, transfer, generation approach seems to be pretty common in machine translation projects. Which is technically what we're doing. There are of corners that can be cut since SQL is limited in scope and has a strongly defined structure compared to natural language. We can skip the analysis part, since we as developers are doing the analysis to create the queries, the query construct is the transfer/abstraction of the syntactical structure of the query and the generation is the responsibility of the database specific abstraction layer. just one approach I like... it may be way overkill...
On Mon, 03 Dec 2007 01:52:13 -0500 Darrel O'Pry <dopry@thing.net> wrote:
adrian rossouw wrote:
On 03 Dec 2007, at 6:38 AM, Larry Garfield wrote:
#3 is what I'm pushing for in the D7 database overhaul, within reason. How to
abstract everything without making the syntax insane or the performance
terrible is what I'm currently trying to wrap my head around. The biggest
problem is field type handling, which I want to keep away from module
developers but varies greatly depending on the database.
and the biggest issue here is date / time types.
it's fscking crazy how different these are between databases.
I could have swore that some where I mentioned creating query constructs with the syntatical properties of SQL statements and mapping them to phrases...
If by query construct you mean a string... as Larry on his website (somewhere... connection is not that good right now) it is not the most efficient. Un-serialised stuff (program logic) -> serialized stuff (sql alike construct) -> unserialized stuff (sql parser). What have been done in schema api is more on the spot. You keep the different part of the query separated and let the DB specific engine glue them. That way you skip the analysis and the "transfer" part is a sort of compilation, but generally much easier. You'd still have to provide a common way to pump custom code directly in the DB depending on the db_type that's not so ugly as a switch and that provide a fall-back. You've some added feature too... since the "compiler" understand a bit of SQL you could avoid some SQL injection too (even if this as I suspect is what Larry define "The biggest problem is field type handling".) But... this will require a lot of rewriting of modules. While there is no special logic in building up tables, and you generally don't do it on the fly, passing objects in spite of strings is a quite different thing. As a side note objects may offer some extra caching for stuff like record count and paging. Something that seems critical in terms of optimisation across several DB.
Now that I have my old servers back in hand the idea might be found
the old server looks buys ;)
at http://darrelopry.darkstyles.doesntexist.com/node/63... While
-- Ivan Sergio Borgonovo http://www.webthatworks.it
Quoting Khalid Baheyeldin <kb@2bits.com>:
3. Do database specific stuff (including optimization) in the abstraction layer (this is what Bill Moran advocates above).
I am for #3 as the way forward.
Yes, it is the way to move forward. Doing these things anywhere else just causes the frustration demonstrated by this thread. We end up with one way to code the data selections, table creations, etc within everything but the Drupal SQL [1] abstraction engine. Most of it is already taken care of, we just need to formalize, tweak (read patch) and document. [1] http://drupal.org/node/191486 Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
On Mon, 03 Dec 2007 08:07:53 -0500 Earnie Boyd <earnie@users.sourceforge.net> wrote:
Quoting Khalid Baheyeldin <kb@2bits.com>:
3. Do database specific stuff (including optimization) in the abstraction layer (this is what Bill Moran advocates above).
I am for #3 as the way forward.
Yes, it is the way to move forward. Doing these things anywhere else just causes the frustration demonstrated by this thread. We end up with one way to code the data selections, table creations, etc within everything but the Drupal SQL [1] abstraction engine. Most of it is already taken care of, we just need to formalize, tweak (read patch) and document.
There are plenty DB abstraction layers and ORM. I'd hate to redo the work in a "not so smart way". At the lowest level you can find things like PDO, adodb etc... with no surprise at the higher level you can find things like SQLAlchemy, Geniusql, storm all written in python. PHP just recently start to resemble an OO language. Stuff like ORM are too general beasts, to be fun to use they need a stronger OO language and carry an overhead most people using Drupal won't be happy to live with. RoR has ActiveRecord but RoR stays at a lower level in the pyramid of framework->CMS. To increase abstraction at the lowest level we can just pick up PDO. At a higher abstraction layer, as usual, it is important to guess which is the most suitable level of abstraction and generalisation to avoid to be stuck in the future while not degrading performances. I'd provide more than one very thin layer of abstraction. A higher level could be an ORM specialised on the most important objects in Drupal (nodes, users, taxonomy...). At a lower level an SQL rewriting engine that could be extended later that will support the most popular SQL functions and operations across different DB. At the lowest level a way to directly write SQL with a default not optimised route using the second level of abstraction and a specific DB route for each DB if the developers are willing to optimise, but in a more elegant way than writing switch. As a side note without serialising, un-serialising, re-serialising, the second level of abstraction could provide more security for SQL injection since it will actually be a sort of SQL compiler rather than a regexp filter. I'd rely a lot on conventions to keep the code short and less error prone. Eg. if you're using group by, the returned fields should be in the select statement and in the group by statement in pg. If you pass fields names with a certain convention, fields will be put in the group by statement by the compiler without the need to list them twice and so on... Similarly for the distinct statement. Somehow we could even pre-compile SQL statement or save some cached info (as count, I'm testing count caching across pages to speed up paging on MS SQL). -- Ivan Sergio Borgonovo http://www.webthatworks.it
Beware the premature optimization. Saying that one way to avoid these problems is to use lowest common denominator SQL sounds awful on the surface, because -- horrors! -- we can't use all those great, unique to one database only, functions we optimize performance. But in reality, very few of the database-unique SQL syntax is really highly optimized behavior (e.g. count(*) on a MySQL MyISAM table). Most of them are just "different" ways of doing things than standard or than other databases. concat() is not faster than standard SQL ||, for instance. ;-) We "waste" far more database performance by poorly designed tables, poorly designed indexes and poorly designed algorithms than we ever gain in MySQL unique features, I'll bet. Let's be sure we focus our optimization effort in places it really helps, and not let it get in the way of making the software more portable and useable. I now surrender the soapbox.
On Mon, 3 Dec 2007 16:51:18 +0100, "Chris Johnson" <cxjohnson@gmail.com> wrote:
Let's be sure we focus our optimization effort in places it really helps, and not let it get in the way of making the software more portable and useable.
I now surrender the soapbox.
One of the GHOP tasks we have active is to do a performance profile of Drupal to determine where to focus such energies in Drupal 7. :-) Let's revisit the performance issue after that's been done. --Larry Garfield
On Mon, 3 Dec 2007 16:51:18 +0100 "Chris Johnson" <cxjohnson@gmail.com> wrote:
Beware the premature optimization.
Just let the door open to it (that is more a design choice rather than optimisation itself)... and it may be a political move as well to stop resistance by people that would say that "this is going to kill the performance on my beloved DB". -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Mon, 3 Dec 2007 15:22:09 +0100, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 03 Dec 2007 08:07:53 -0500 Earnie Boyd <earnie@users.sourceforge.net> wrote:
Quoting Khalid Baheyeldin <kb@2bits.com>:
3. Do database specific stuff (including optimization) in the abstraction layer (this is what Bill Moran advocates above).
I am for #3 as the way forward.
Yes, it is the way to move forward. Doing these things anywhere else just causes the frustration demonstrated by this thread. We end up with one way to code the data selections, table creations, etc within everything but the Drupal SQL [1] abstraction engine. Most of it is already taken care of, we just need to formalize, tweak (read patch) and document.
There are plenty DB abstraction layers and ORM. I'd hate to redo the work in a "not so smart way".
At the lowest level you can find things like PDO, adodb etc... with no surprise at the higher level you can find things like SQLAlchemy, Geniusql, storm all written in python. PHP just recently start to resemble an OO language.
Were you at DrupalCon Barcelona? :-) I had a session on the database abstraction question, and the direction that I'm coding for in D7. Slides and video are still available online, I think: http://drupal.org/node/180155
Stuff like ORM are too general beasts, to be fun to use they need a stronger OO language and carry an overhead most people using Drupal won't be happy to live with.
Actually, I consider node_load() an ORM. Not a particularly advanced one, but a basic ORM nonetheless. Before anyone says that it should become more full-featured, that way lies the Data API saga. :-)
RoR has ActiveRecord but RoR stays at a lower level in the pyramid of framework->CMS.
To increase abstraction at the lowest level we can just pick up PDO. At a higher abstraction layer, as usual, it is important to guess which is the most suitable level of abstraction and generalisation to avoid to be stuck in the future while not degrading performances.
I'd provide more than one very thin layer of abstraction. A higher level could be an ORM specialised on the most important objects in Drupal (nodes, users, taxonomy...). At a lower level an SQL rewriting engine that could be extended later that will support the most popular SQL functions and operations across different DB. At the lowest level a way to directly write SQL with a default not optimised route using the second level of abstraction and a specific DB route for each DB if the developers are willing to optimise, but in a more elegant way than writing switch.
That's sort of what I'm looking at. Layer 1 is a revised db_query() to use PDO in a type-agnostic way. Layer 3 is node_load() and kin. Layer 2, for insert/update/delete, are new utility functions that build the query for you based on arrays. The part I'm still trying to figure out is Layer 2 for SELECT statements. So far the best I've come up with is a very-simple-api query builder (probably a fluent API) object that for complex queries (complex meaning "contains a function") you *must* use. I'm not sure if that is an acceptable solution, however.
As a side note without serialising, un-serialising, re-serialising, the second level of abstraction could provide more security for SQL injection since it will actually be a sort of SQL compiler rather than a regexp filter.
Yep. String-parsing a serialized string is a design flaw.
I'd rely a lot on conventions to keep the code short and less error prone.
Eg. if you're using group by, the returned fields should be in the select statement and in the group by statement in pg. If you pass fields names with a certain convention, fields will be put in the group by statement by the compiler without the need to list them twice and so on...
Similarly for the distinct statement.
Somehow we could even pre-compile SQL statement or save some cached info (as count, I'm testing count caching across pages to speed up paging on MS SQL).
Well, using formal prepared statements will cache the query itself, at least on databases that benefit from that. (MySQL doesn't all that much, but many others do.) I don't know if caching a built query string is going to buy us all that much. If the query string is that expensive to build, then we need to revise our query building mechanism. --Larry Garfield
On Mon, 3 Dec 2007 10:31:22 -0600 Larry Garfield <larry@garfieldtech.com> wrote:
Were you at DrupalCon Barcelona? :-) I had a session on the database abstraction question, and the direction that I'm coding for in D7. Slides and video are still available online, I think:
No, I wasn't in Barcelona and thx for the link. I did miss this one.
Stuff like ORM are too general beasts, to be fun to use they need a stronger OO language and carry an overhead most people using Drupal won't be happy to live with.
Actually, I consider node_load() an ORM. Not a particularly advanced one, but a basic ORM nonetheless. Before anyone says that it should become more full-featured, that way lies the Data API saga. :-)
If you call it OORM One Object Relational Model I may agree ;) That's why I wrote real ORM are too general. We actually know we've to deal with nodes, taxonomy, users... and yeah we've to leave the doors open for other objects but as Drupal is a framework for CMS we need an "ORM" for a CMS not for everything.
I'd provide more than one very thin layer of abstraction. A higher level could be an ORM specialised on the most important objects in Drupal (nodes, users, taxonomy...). At a lower level an SQL rewriting engine that could be extended later that will support the most popular SQL functions and operations across different DB. At the lowest level a way to directly write SQL with a default not optimised route using the second level of abstraction and a specific DB route for each DB if the developers are willing to optimise, but in a more elegant way than writing switch.
That's sort of what I'm looking at. Layer 1 is a revised
I'm married ;)
db_query() to use PDO in a type-agnostic way. Layer 3 is node_load() and kin. Layer 2, for insert/update/delete, are new utility functions that build the query for you based on arrays.
I think we could a) get a broader picture looking at real ORM (as SQLAlchemy and Co. and maybe something in RoR) then narrow down what we generally (with a reasonable abundant generally) do and later write a "sql cross compiler". I did some experiments that should be similar to yours with arrays to live with MS SQL and pgsql and recycle Drupal api but well... I'm not that convinced. I mean... tons of people did it before me and I don't feel as smart as getting it right without looking at real implementations reasonably carefully. Another important thing is the acceptance of such a method and making it uniform to all the rest (schema api). So... some how arrays (aka meta info) are the only way to go and it looks to late unless we're going to change schema api as well. ORM in python are well pythonic. A specialised-ORM in Drupal has to be drupalish or there won't be acceptance and it will be hard to "remember". BTW I'm working on profile info and it is a pain how forms are so similar to the profile array but still different. And it is not that fun to write a form2profile function to transform an array in another array. It would be very nice if you could juggle schema and query meta info. eg. most of the times $header for table api could be deduced from the field list or the opposite...
The part I'm still trying to figure out is Layer 2 for SELECT statements. So far the best I've come up with is a very-simple-api query builder (probably a fluent API) object that for complex queries (complex meaning "contains a function") you *must* use.
I didn't get it.
I'm not sure if that is an acceptable solution, however.
Any sample?
I'd rely a lot on conventions to keep the code short and less error prone.
Eg. if you're using group by, the returned fields should be in the select statement and in the group by statement in pg. If you pass fields names with a certain convention, fields will be put in the group by statement by the compiler without the need to list them twice and so on...
Similarly for the distinct statement.
Somehow we could even pre-compile SQL statement or save some cached info (as count, I'm testing count caching across pages to speed up paging on MS SQL).
Well, using formal prepared statements will cache the query itself, at least on databases that benefit from that. (MySQL doesn't all that much, but many others do.) I don't know if caching a built
A concrete example: on MS SQL paging is different. To obtain performance speed-up I used *_data_seek. I could to better using TOP too (LIMIT in pg/mysql jargon), but then I wouldn't have the rows number. I could cache the row number in a query_cache object and use TOP the next page. Having an api that relate queries with some cached data that every DB abstraction layer could use for its optimisation may come handy. Each DB abstraction layer will do it as it needs with different data... but with the same tool.
query string is going to buy us all that much. If the query string is that expensive to build, then we need to revise our query building mechanism.
Reasonable. -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Mon, 3 Dec 2007 18:56:41 +0100, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
I'd provide more than one very thin layer of abstraction. A higher level could be an ORM specialised on the most important objects in Drupal (nodes, users, taxonomy...). At a lower level an SQL rewriting engine that could be extended later that will support the most popular SQL functions and operations across different DB. At the lowest level a way to directly write SQL with a default not optimised route using the second level of abstraction and a specific DB route for each DB if the developers are willing to optimise, but in a more elegant way than writing switch.
That's sort of what I'm looking at. Layer 1 is a revised
I'm married ;)
ROFL!
db_query() to use PDO in a type-agnostic way. Layer 3 is node_load() and kin. Layer 2, for insert/update/delete, are new utility functions that build the query for you based on arrays.
I think we could a) get a broader picture looking at real ORM (as SQLAlchemy and Co. and maybe something in RoR) then narrow down what we generally (with a reasonable abundant generally) do and later write a "sql cross compiler".
I've started looking into code for ADODB, but dear god that thing is ginormous. I am a big big fan of small but powerful code. :-)
I did some experiments that should be similar to yours with arrays to live with MS SQL and pgsql and recycle Drupal api but well... I'm not that convinced. I mean... tons of people did it before me and I don't feel as smart as getting it right without looking at real implementations reasonably carefully. Another important thing is the acceptance of such a method and making it uniform to all the rest (schema api). So... some how arrays (aka meta info) are the only way to go and it looks to late unless we're going to change schema api as well. ORM in python are well pythonic. A specialised-ORM in Drupal has to be drupalish or there won't be acceptance and it will be hard to "remember".
As powerful as arrays are, they're not the end-all and be-all. I have a feeling that the days of no-objects-in-core are numbered. We just have to be careful not to reimplement Java. :-)
BTW I'm working on profile info and it is a pain how forms are so similar to the profile array but still different. And it is not that fun to write a form2profile function to transform an array in another array. It would be very nice if you could juggle schema and query meta info. eg. most of the times $header for table api could be deduced from the field list or the opposite...
The part I'm still trying to figure out is Layer 2 for SELECT statements. So far the best I've come up with is a very-simple-api query builder (probably a fluent API) object that for complex queries (complex meaning "contains a function") you *must* use.
I didn't get it.
I'm not sure if that is an acceptable solution, however.
Any sample?
Well, I've nothing written yet. It's still in the "simmer in the back of my head for a while" stage. But off the cuff, I'm thinking something vaguely along the lines of: $select = db_select(); $select->join('node', 'n')->join('user', 'u')->field('uid', 'n')->countField('nid', 'n', 'num_nids') ->where('type', 'n', 'page')->group('uid', 'n')->order('num_nids', 'DESC')->limit(0, 5); $result = $select->execute(); Which in MySQL would translate to: SELECT n.uid, count(n.nid) as num_nids FROM node n INNER JOIN users u ON n.uid=u.uid WHERE n.type='page' GROUP BY n.uid ORDER BY num_nids DESC LIMIT 5 A Fluent API (mutator methods return $this, so you can chain calls like in jQuery) makes the syntax much easier and shorter, and lets you specify things in any order. It also then serves as a very easy arbitrary query builder, which (maybe maybe maybe) can then become the core of Views-partially-in-core. Of course, SELECT statements can get extremely complex, which is why I'm wary of this method as it means we will almost certainly end up making certain edge-case queries impossible. As I said, still percolating. As Gabor said, though, let's focus on D6 for the time being. :-) Contact me off-list if you want to keep up this conversation. --Larry Garfield
On Nov 28, 2007 5:48 AM, Chris Johnson <cxjohnson@gmail.com> wrote:
If I were able to devote more time to just testing Drupal for the sake of testing it and with no further purpose, I could and would use Postgres more of the time. But even then -- if my goal were to test the new menu code, I'd want to eliminate any other possible error causing sources, such as using Postgres.
So unless we actually push the issue of supporting multiple databases as a community, the vast majority of developers will never make the effort. And with each passing release spent in that mode, the harder it will become to reverse it. We will end up supporting only one database, MySQL, de facto.
Very good post Chris. This is exactly the problem: those who use Drupal on PostgreSQL must be the ones who would resolve the SQL differences, test and debug patches, ...etc. They are the ones that have a vested interest in it more than the rest of us. All of us want a cross platform cross database CMS, but the testing work must be done by those who use PostgreSQL. The rest of us have no time or resources to dedicate for something we will not directly use. -- Khalid M. Baheyeldin 2bits.com http://2bits.com Drupal optimization, development, customization and consulting.
Khalid wrote:
This is exactly the problem: those who use Drupal on PostgreSQL must be the ones who would resolve the SQL differences, test and debug patches, ...etc.
They are the ones that have a vested interest in it more than the rest of us. All of us want a cross platform cross database CMS, but the testing
work must be done by those who use PostgreSQL.
Which brings us back to the original topic of this thread. And the fact there's a release critical issue[1] which has been waiting for 4 days and 16 hours now for a quick test on postgresql. I don't think I need to point out the irony of this situation. 1. http://drupal.org/node/164532 Improve table indices for common queries.
This is exactly the problem: those who use Drupal on PostgreSQL must be the ones who would resolve the SQL differences, test and debug patches,...etc.
All of us want a cross platform cross database CMS,
While we MySQL developers are willing to validate our queries against an SQL validator but the problems are usually not on this level. Some examples are: text fields without defaults work on MySQL (even they can't have a default if i remember correctly) but not on pgsql so we need to supply a value. DDL have different syntax -- schema solved a lot but not all. So let me reiterate my proposal: If a patch works on MySQL and has a good chance of working on postgresql -- because the queries validate or the specific features exist on postgresql but there can be syntax differences etc -- then please commit it. This is not, again NOT going down the "one database route" -- we will validate our queries, will still look into the postgresql manual as we did before etc to ensure things are OK -- just please let the postgresql folks test when they have time / resource without holding up the other whatever significant percent of core developers.
In response to "Karoly Negyesi" <karoly@negyesi.net>:
This is exactly the problem: those who use Drupal on PostgreSQL must be the ones who would resolve the SQL differences, test and debug patches,...etc.
All of us want a cross platform cross database CMS,
While we MySQL developers are willing to validate our queries against an SQL validator but the problems are usually not on this level. Some examples are: text fields without defaults work on MySQL (even they can't have a default if i remember correctly) but not on pgsql so we need to supply a value. DDL have different syntax -- schema solved a lot but not all.
Before I start, let me reiterate for anyone who may have missed it earlier, that I am a rabid PostgreSQL advocate. That being said, I _agree_ with Karoly _in_theory_. PostgreSQL compat should _not_ hold Drupal back. To let it hold Drupal back would be a horrible idea. It would be utterly destructive to the project. Unfortunately, I don't see any _actual_ evidence that the problem you describe even exists, and I see a LOT of false information being put forth in this thread. Let me enumerate a few items of concern: *) The argument has been made that PostgreSQL compatibility is holding Drupal back, then 1 (One!) patch is referred to again and again. How important is this one patch? If this _1_ patch is holding Drupal back, please commit it! *) The argument was made that PostgreSQL is too complicated to install and test on, yet I described the setup in a single email. If anyone found my instructions too difficult to follow, _please_ let me know and I will go to efforts to clarify. *) Now you make a ridiculous claim as to the SQL syntax of PostgreSQL: "text fields without defaults [don't work in pgsql]" test=# create table testkaroly (testfield TEXT); CREATE TABLE test=# \d testkaroly Table "public.testkaroly" Column | Type | Modifiers -----------+------+----------- testfield | text | test=# create table testkaroly2 (testfield TEXT DEFAULT 'test'); CREATE TABLE test=# \d testkaroly2 Table "public.testkaroly2" Column | Type | Modifiers -----------+------+---------------------- testfield | text | default 'test'::text As you can see clearly, TEXT fields work with and without defaults. What completely unreliable source are you using for your information? In any event, once I have some life issues sorted out I will be more involved in Drupal development (hopefully come January). Until then, I simply feel obligated to debunk such blatantly incorrect assertions as this text/default thing. -- Bill Moran http://www.potentialtech.com
On Nov 29, 2007 3:03 AM, Khalid Baheyeldin <kb@2bits.com> wrote:
On Nov 28, 2007 5:48 AM, Chris Johnson <cxjohnson@gmail.com> wrote:
So unless we actually push the issue of supporting multiple databases as a community, the vast majority of developers will never make the effort. And with each passing release spent in that mode, the harder it will become to reverse it. We will end up supporting only one database, MySQL, de facto.
Very good post Chris.
This is exactly the problem: those who use Drupal on PostgreSQL must be the ones who would resolve the SQL differences, test and debug patches, ...etc.
They are the ones that have a vested interest in it more than the rest of us. All of us want a cross platform cross database CMS, but the testing work must be done by those who use PostgreSQL.
This is exactly NOT what I meant. If "all of us want a cross platform cross database CMS", then we /all/ have to bear some of the burden of making it so -- not just those second class citizens who use the less popular database. Otherwise, we're back to square one again.
This is exactly NOT what I meant. If "all of us want a cross platform cross database CMS", then we /all/ have to bear some of the burden of making it so -- not just those second class citizens who use the less popular database. Otherwise, we're back to square one again.
Yes we take some of the burden: we do not use MYSQLisms, we make sure that the queries validate --this can be part of the new policy, something positive that came out of this talk -- we check the postgresql manual if we do something funny -- there is just the actual test we are talking of. I do not think our viewpoints are too far away :)
In response to "Karoly Negyesi" <karoly@negyesi.net>:
This is exactly NOT what I meant. If "all of us want a cross platform cross database CMS", then we /all/ have to bear some of the burden of making it so -- not just those second class citizens who use the less popular database. Otherwise, we're back to square one again.
Yes we take some of the burden: we do not use MYSQLisms, we make sure that the queries validate --this can be part of the new policy, something positive that came out of this talk -- we check the postgresql manual if we do something funny -- there is just the actual test we are talking of. I do not think our viewpoints are too far away :)
I do. I think you're trying to make it seem like people agree with you when they really don't. -- Bill Moran http://www.potentialtech.com
Quoting Ivan Sergio Borgonovo <mail@webthatworks.it>:
The path should be to make the DB abstraction layer more DB agnostic and make developer more aware there is not just mysql. Everything will look terrible if you design for mysql and then try to port it to any other DB. This increase the inertia of moving to a more agnostic DB layer and supporting anything else than mysql.
Which is what I suggested about 4 days ago and created http://drupal.org/node/191486 to take this on. Writing to Drupal SQL within the code base and letting the further abstraction handle the differences. Drupal SQL would have a format similar to MySql so that the commands should be able to be passed directly to the MySql layer when the db_type is mysql but other db_type would parse the string further to make changes as needed. This can also fix quoting differences so I can quote the columns in MySql style with the back quote character "`" in my Drupal SQL code and when the db_type is pgsql I change it to the single quote "'" character (assuming that is what pgsql uses to quote columns). Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
Karoly,
"In my opinion, any patch should be committable to core once it works on mysql and has a decent hope (use common sense) to work on postgresql."
How can people read "Let's drop postgresql!" into my words?
Because: 1. You posted on your blog that you want to drop support for pgsql: "I want Drupal core to work with MySQL and that's it." (http://www.drupal4hu.com/node/64) 2. In conversations on IRC and elsewhere, you have expressed great frustration at being bothered with pgsql. 3. You've specifically asked me why I think it is important that the mysql and pgsql databases have consistent schemas, expressing that you think it is not important. So, when you say "any patch should be committable to core once it works on mysql," it sounds an awful lot like "because I don't think support for other databases is important at all and I'd be happy to see it go." You are correct that the particular message I replied to did not call for de-supporting pgsql but I think it is reasonable to assume your previous statements apply to your current statements, unless you disclaim them. Honestly, I have no real opinion about pgsql support per se. I've never used it except for D6 development and testing. The decision to support it in Drupal was made before I arrived in the project and I don't know why. However, I believe the discipline of supporting more than mysql is healthy and necessary for the project. I do think Larry has a point, though: This situation will only get harder and we cannot hold up all patches until they work on every conceivable platform. If we have consensus on what platforms we support and lack of hostility to re-opening closed issues due to portability issues that come up later, and sometimes "mysql-only developers" even having to rewrite code that works on mysql in order to make it portable, then a process that includes commits before patches are tested on all supported platforms is probably sensible. HOWEVER, I still think the mandatory works-before-commit set of platforms should include more than just mysql; including at least two DBMS's makes it much more likely that the code will be portable for additional DBMS's later *without* requiring rewriting. One more thing. I use XAMPP on Windows for development. Here is my process for using PostgreSQL 8.3: 1. Download and run the installer. 2. Uncomment the "extension=php_pgsql.dll" line in Apache's php.ini. 3. Re-start Apache. 4. Change $db_url in settings.php. Not very hard. If a handful of core developers did this and just used pgsql for all core development, we'd have a better product and process. Barry
assume your previous statements apply to your current statements, unless you disclaim them.
Really? So if I write a mail which makes it crystal clear I changed my view then I should add "Hey! look! I changed my opinion!". Sure.
I do think Larry has a point, though: This situation will only get harder and we cannot hold up all patches until they work on every conceivable platform. If we have consensus on what platforms we support and lack of hostility to re-opening closed issues due to portability issues that come up later, and sometimes "mysql-only developers" even having to rewrite code that works on mysql in order to make it portable, then a process that includes commits before patches are tested on all supported platforms is probably sensible.
Here we agree.
HOWEVER, I still think the mandatory works-before-commit set of platforms should include more than just mysql; including at least two DBMS's makes it much more likely that the code will be portable for additional DBMS's later *without* requiring rewriting.
Here we don't.
Here is my process for using PostgreSQL 8.3:
Huh, nice. OK, I shall admit that I already have installed postgresql on Tank but I am so unsure about it... anyone could help the process with a tutorial which makes sure the database can accept connections from localhost and only there? Also, add a "drupal" user with "drupal" password which can do... anything whatever it means. This process is clear on MySQL and after two years with occassionally working with pgsql, I could not figure out its grant system. I am daft, I know.
"Karoly Negyesi" <karoly@negyesi.net> wrote:
Huh, nice. OK, I shall admit that I already have installed postgresql on Tank but I am so unsure about it... anyone could help the process with a tutorial which makes sure the database can accept connections from localhost and only there?
Once it's installed, there's another step called "initializing the cluster" ... which is basically creating an empty database. Some packaging systems might do this for you. If PostgreSQL starts, then it's done (it won't implicitly create an empty system). PG is designed to allow multiple version on a single machine. I don't know how many people actually do this, but it's a godsend when you need it. The point is that each "cluster" (as it's called) is a directory with all the configuration and database files for that running instance -- usually /usr/local/pgsql or something like like that. do a "find / -name postgresql.conf" and you'll find it. That's the first file you want to look at. There's a lot of esoteric tuning options, but the obvious one near the beginning is called "listen_address" which defaults to "localhost" only. (Although different package systems may install a custom config file, so you should check)
Also, add a "drupal" user with "drupal" password which can do... anything whatever it means. This process is clear on MySQL and after two years with occassionally working with pgsql, I could not figure out its grant system. I am daft, I know.
I doubt you're daft, but I'm confused as to where the problem is. PostgreSQL's grant system is simpler than MySQL's in my experience. The only thing that might be complex is that PG versions greater than 8 have unified users and groups into a single concept called a "role". If you just think of a "user" as a role with login permissions, you'll be OK. If you connect with the psql command, you can issue "\du" to get a list of users. I'm guessing by default that there's only the default superuser on your system. To create a new one: CREATE ROLE drupal WITH LOGIN; Or, to create it with superuser permissions: CREATE ROLE drupal WITH LOGIN SUPERUSER; You can also tweak roles after creation: ALTER ROLE drupal PASSWORD = 'somepassword'; Then create a database: CREATE DATABASE drupal; And tweak settings as you like: GRANT all ON DATABASE drupal TO drupal; PG has a two-pronged approach to security. The role system which I just described is actually the second layer. The first layer is called "host-based authentication". These two layers combined basically equate to MySQL's role system. In the same directory as the postgresql.conf file, you'll find a pg_hba.conf file. In this are lines that control what users from what hosts can connect to what databases. For your purposes, you probably want lines like: host all all 127.0.0.1/32 trust local all all trust The first means that any user connecting via the loopback is allowed to connect without a password. The second means that any user can connect to any database via the local unix socket without a password. You can get more specific if you want: # Allow anyone from local network to connect to database myaccount # as user drupal, as long as they know the password host drupal myaccount 192.168.5.0/24 password # Allow anyone to connect to database drupal as any user from a # single remote machine without a password host all drupal 172.16.56.21/32 trust psql is your friend. Some useful commands for seeing what your database looks like: # Show all roles (The "u" is legacy from when they were users) \du # Show all databases \l # Show the table structure of table "users" \d users # Show the permissions on all objects in the database \dp # Show the built-in help \? If you get stuck or have questions, ask. -- Bill Moran http://www.potentialtech.com
Here's a question - I'm interested in picking up a book on postrgeSQL - something that will both teach me how to set it up properly and how to configure it, as well as telling me how it's particular dialect of SQL works. I thought I'd start with O'Reilly, but they don't have any books for it listed in their latest catalog. I looked around, and there's only a few books I can find that are recent. So, how new of a book do I need for postgreSQL? And which one would be best? -- John Fiala
"John Fiala" <jcfiala@gmail.com> wrote:
Here's a question - I'm interested in picking up a book on postrgeSQL - something that will both teach me how to set it up properly and how to configure it, as well as telling me how it's particular dialect of SQL works. I thought I'd start with O'Reilly, but they don't have any books for it listed in their latest catalog. I looked around, and there's only a few books I can find that are recent.
So, how new of a book do I need for postgreSQL? And which one would be best?
Personally, I'd make sure whatever you get is written at least for 8.X. The differences (especially in tuning and management) between 7.X and 8.X are pretty significant. Beyond that, I can't offer a lot of help. Personally, I've been able to use the online docs (http://www.postgresql.org/docs/8.2/static/index.html) for all my reference purposes. There is a page referencing books in print on the web site: http://www.postgresql.org/docs/books/ Hope this helps. -- Bill Moran http://www.potentialtech.com
participants (25)
-
adrian rossouw -
Barry Jaspan -
Bill Moran -
Brenda Wallace -
Bèr Kessels -
catch -
Chris Johnson -
Darrel O'Pry -
Derek Wright -
Dries Buytaert -
Earnest Berry III -
Earnie Boyd -
George Kappel -
Greg Knaddison -
Henrique Recidive -
Ivan Sergio Borgonovo -
Jakob Petsovits -
Jeff Eaton -
John Fiala -
Karoly Negyesi -
Ken Rickard -
Khalid Baheyeldin -
Konstantin Käfer -
Larry Garfield -
Trevor Mckeown