[development] Do not let postgresql hold back great patches

Larry Garfield larry at garfieldtech.com
Mon Dec 3 16:31:22 UTC 2007


On Mon, 3 Dec 2007 15:22:09 +0100, Ivan Sergio Borgonovo <mail at webthatworks.it> wrote:
> On Mon, 03 Dec 2007 08:07:53 -0500
> Earnie Boyd <earnie at users.sourceforge.net> wrote:
> 
>> Quoting Khalid Baheyeldin <kb at 2bits.com>:
> 
>> > 3. Do database specific stuff (including optimization) in the
>> > abstraction layer (this is what Bill Moran advocates above).
> 
>> > I am for #3 as the way forward.
> 
>> Yes, it is the way to move forward.  Doing these things anywhere
>> else just causes the frustration demonstrated by this thread.  We
>> end up with one way to code the data selections, table creations,
>> etc within everything but the Drupal SQL [1] abstraction engine.
>> Most of it is already taken care of, we just need to formalize,
>> tweak (read patch) and document.
> 
> There are plenty DB abstraction layers and ORM. I'd hate to redo the
> work in a "not so smart way".
> 
> At the lowest level you can find things like PDO, adodb etc... with
> no surprise at the higher level you can find things like SQLAlchemy,
> Geniusql, storm all written in python.
> PHP just recently start to resemble an OO language.

Were you at DrupalCon Barcelona? :-)  I had a session on the database abstraction question, and the direction that I'm coding for in D7.  Slides and video are still available online, I think:

http://drupal.org/node/180155

> Stuff like ORM are too general beasts, to be fun to use they need a
> stronger OO language and carry an overhead most people using Drupal
> won't be happy to live with.

Actually, I consider node_load() an ORM.  Not a particularly advanced one, but a basic ORM nonetheless.  Before anyone says that it should become more full-featured, that way lies the Data API saga. :-)

> RoR has ActiveRecord but RoR stays at a lower level in the pyramid of
> framework->CMS.
> 
> To increase abstraction at the lowest level we can just pick up PDO.
> At a higher abstraction layer, as usual, it is important to guess
> which is the most suitable level of abstraction and generalisation to
> avoid to be stuck in the future while not degrading performances.
> 
> I'd provide more than one very thin layer of abstraction.
> A higher level could be an ORM specialised on the most important
> objects in Drupal (nodes, users, taxonomy...).
> At a lower level an SQL rewriting engine that could be extended later
> that will support the most popular SQL functions and operations
> across different DB.
> At the lowest level a way to directly write SQL with a default not
> optimised route using the second level of abstraction and a specific
> DB route for each DB if the developers are willing to optimise, but
> in a more elegant way than writing switch.

That's sort of what I'm looking at.  Layer 1 is a revised db_query() to use PDO in a type-agnostic way.  Layer 3 is node_load() and kin.  Layer 2, for insert/update/delete, are new utility functions that build the query for you based on arrays.  The part I'm still trying to figure out is Layer 2 for SELECT statements.  So far the best I've come up with is a very-simple-api query builder (probably a fluent API) object that for complex queries (complex meaning "contains a function") you *must* use.  I'm not sure if that is an acceptable solution, however.  

> As a side note without serialising, un-serialising, re-serialising,
> the second level of abstraction could provide more security for SQL
> injection since it will actually be a sort of SQL compiler rather
> than a regexp filter.

Yep.  String-parsing a serialized string is a design flaw.  

> I'd rely a lot on conventions to keep the code short and less
> error prone.
> 
> Eg. if you're using group by, the returned fields should be in the
> select statement and in the group by statement in pg.
> If you pass fields names with a certain convention, fields will be
> put in the group by statement by the compiler without the need to
> list them twice and so on...
> 
> Similarly for the distinct statement.
> 
> Somehow we could even pre-compile SQL statement or save some cached
> info (as count, I'm testing count caching across pages to speed up
> paging on MS SQL).

Well, using formal prepared statements will cache the query itself, at least on databases that benefit from that.  (MySQL doesn't all that much, but many others do.)  I don't know if caching a built query string is going to buy us all that much.  If the query string is that expensive to build, then we need to revise our query building mechanism.  

--Larry Garfield



More information about the development mailing list