[development] DB abstraction layer

Ivan Sergio Borgonovo mail at webthatworks.it
Mon Dec 3 17:56:41 UTC 2007


On Mon, 3 Dec 2007 10:31:22 -0600
Larry Garfield <larry at garfieldtech.com> wrote:

> Were you at DrupalCon Barcelona? :-)  I had a session on the
> database abstraction question, and the direction that I'm coding
> for in D7.  Slides and video are still available online, I think:

> http://drupal.org/node/180155

No, I wasn't in Barcelona, and thanks for the link. I did miss that one.

> > Stuff like ORMs are too-general beasts: to be fun to use they
> > need a stronger OO language, and they carry an overhead most
> > people using Drupal won't be happy to live with.

> Actually, I consider node_load() an ORM.  Not a particularly
> advanced one, but a basic ORM nonetheless.  Before anyone says that
> it should become more full-featured, that way lies the Data API
> saga. :-)

If you call it an OORM, One Object Relational Model, I may agree ;)
That's why I wrote that real ORMs are too general. We actually know
we have to deal with nodes, taxonomy, users... and yes, we have to
leave the door open for other objects, but since Drupal is a
framework for CMSs we need an "ORM" for a CMS, not for everything.

> > I'd provide more than one very thin layer of abstraction.
> > A higher level could be an ORM specialised on the most important
> > objects in Drupal (nodes, users, taxonomy...).
> > At a lower level, an SQL rewriting engine, extensible later, that
> > supports the most popular SQL functions and operations across
> > different DBs.
> > At the lowest level, a way to write SQL directly, with a default
> > non-optimised route through the second level of abstraction and
> > a specific route for each DB if developers are willing to
> > optimise, but in a more elegant way than writing switch
> > statements.

> That's sort of what I'm looking at.  Layer 1 is a revised

I'm married ;)

> db_query() to use PDO in a type-agnostic way.  Layer 3 is
> node_load() and kin.  Layer 2, for insert/update/delete, are new
> utility functions that build the query for you based on arrays.

I think we could (a) get a broader picture by looking at real ORMs
(such as SQLAlchemy and co., and maybe something in RoR), (b) narrow
down what we generally do (with a reasonably broad "generally"), and
(c) later write an "SQL cross-compiler".
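
To give an idea of what I mean by "cross-compiler" (everything below
is hypothetical, quoting/escaping omitted for brevity):

<?php
// Hypothetical: one query description, compiled differently per
// driver.
$query = array(
  'fields' => array('nid', 'title'),
  'table'  => 'node',
  'limit'  => 10,
);

function sql_compile($query, $driver) {
  $fields = implode(', ', $query['fields']);
  if ($driver == 'mssql') {
    // MS SQL has no LIMIT; TOP goes right after SELECT.
    return "SELECT TOP {$query['limit']} $fields FROM {$query['table']}";
  }
  // pg and mysql share the LIMIT syntax.
  return "SELECT $fields FROM {$query['table']} LIMIT {$query['limit']}";
}

// sql_compile($query, 'mssql') => "SELECT TOP 10 nid, title FROM node"
?>
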
I did some experiments with arrays, probably similar to yours, to
live with MS SQL and pgsql and recycle the Drupal API, but well...
I'm not that convinced. I mean... tons of people have done this
before me, and I don't feel smart enough to get it right without
studying real implementations reasonably carefully.
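
Roughly, what I played with looks like your Layer 2 (the names here
are made up, not existing API):

<?php
// Hypothetical: build an INSERT from a column => value array and
// let db_query() do the quoting through the usual %s placeholders.
function db_insert_fields($table, $fields) {
  $cols = implode(', ', array_keys($fields));
  $placeholders = implode(', ', array_fill(0, count($fields), "'%s'"));
  return db_query("INSERT INTO {" . $table . "} ($cols) VALUES ($placeholders)",
    array_values($fields));
}

// db_insert_fields('mytable', array('nid' => 42, 'title' => 'hello'));
?>
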
Another important thing is the acceptance of such a method, and
making it uniform with all the rest (Schema API). So... somehow
arrays (aka meta info) are the only way to go, and it looks too late
unless we're going to change Schema API as well.
ORMs in Python are, well, Pythonic. A specialised ORM in Drupal has
to be Drupalish, or there won't be acceptance and it will be hard to
"remember".

BTW, I'm working on profile info, and it's a pain how forms are so
similar to the profile array but still different. It's not much fun
to write a form2profile function just to transform one array into
another. It would be very nice if you could juggle schema and query
meta info: e.g. most of the time the $header for Table API could be
deduced from the field list, or vice versa...
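
Something along these lines (a sketch; only t(), db_query() and
tablesort_sql() are real):

<?php
// Hypothetical: one field list drives both the query and the Table
// API $header, instead of maintaining the two by hand.
$fields = array(
  'n.nid'   => t('Id'),
  'n.title' => t('Title'),
);

$header = array();
foreach ($fields as $field => $label) {
  $header[] = array('data' => $label, 'field' => $field);
}

$sql = 'SELECT ' . implode(', ', array_keys($fields)) . ' FROM {node} n';
$result = db_query($sql . tablesort_sql($header));
?>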

> The part I'm still trying to figure out is Layer 2 for SELECT
> statements.  So far the best I've come up with is a very-simple-API
> query builder object (probably a fluent API) that you *must* use
> for complex queries (complex meaning "contains a function").

I didn't get it.

> I'm not sure if that is an acceptable solution, however.  

Any sample?
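
Just to check I'm picturing it right, is it something like this?
(Pure guess, none of these names exist:)

<?php
// Hypothetical fluent builder; the "contains a function" part is
// the COUNT() expression.
$query = db_select('node', 'n');
$query->fields('n', array('uid'));
$query->addExpression('COUNT(n.nid)', 'posts');
$query->groupBy('n.uid');
$result = $query->execute();
?>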

> > I'd rely a lot on conventions to keep the code short and less
> > error prone.
> > 
> > E.g. if you're using GROUP BY, in pg the returned fields must
> > appear both in the SELECT list and in the GROUP BY clause.
> > If you pass field names with a certain convention, the compiler
> > will put them in the GROUP BY clause without the need to list
> > them twice, and so on...
> > 
> > Similarly for DISTINCT.
> > 
> > Somehow we could even pre-compile SQL statements or save some
> > cached info (such as counts; I'm testing count caching across
> > pages to speed up paging on MS SQL).
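
(To make the convention above concrete: a '+' prefix could mean "put
me in GROUP BY too". Hypothetical syntax:)

<?php
// Hypothetical: '+' marks a field that must also appear in GROUP
// BY, so pg-style queries don't have to list it twice.
function sql_split_group_by($fields) {
  $select = array();
  $group  = array();
  foreach ($fields as $field) {
    if ($field[0] == '+') {
      $field = substr($field, 1);
      $group[] = $field;
    }
    $select[] = $field;
  }
  return array(implode(', ', $select), implode(', ', $group));
}

list($select, $group_by) = sql_split_group_by(array('+u.uid', 'COUNT(n.nid) AS posts'));
// $select   => 'u.uid, COUNT(n.nid) AS posts'
// $group_by => 'u.uid'
?>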

> Well, using formal prepared statements will cache the query itself,
> at least on databases that benefit from that.  (MySQL doesn't all
> that much, but many others do.)  I don't know if caching a built

A concrete example: on MS SQL, paging is different. To obtain a
speed-up I used *_data_seek. I could do better using TOP too (LIMIT
in pg/mysql jargon), but then I wouldn't have the row count. I could
cache the row count in a query_cache object and use TOP on the next
page. Having an API that relates queries to some cached data, which
every DB abstraction layer could use for its own optimisation, may
come in handy. Each DB abstraction layer would do it as it needs,
with different data... but with the same tool.
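
In code, the idea is roughly this (all hypothetical, and note that
cache_set()'s signature depends on the Drupal version):

<?php
// Hypothetical: cache the expensive COUNT once per query, so on
// later pages each driver can use its own paging trick (TOP,
// LIMIT...).
function pager_row_count($count_sql) {
  $cid = 'rowcount:' . md5($count_sql);
  if ($cached = cache_get($cid)) {
    return $cached->data;
  }
  $count = db_result(db_query($count_sql));
  cache_set($cid, $count);  // argument order differs across versions
  return $count;
}

// pager_row_count('SELECT COUNT(*) FROM {node} WHERE status = 1');
?>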

> query string is going to buy us all that much.  If the query string
> is that expensive to build, then we need to revise our query
> building mechanism.  

Reasonable.
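
For reference, the PDO idiom is to prepare once and execute many
times:

<?php
// Prepared statement with PDO: drivers that support it can reuse
// the parsed statement/plan across executions.
$pdo = new PDO('mysql:host=localhost;dbname=drupal', 'user', 'password');
$stmt = $pdo->prepare('SELECT title FROM node WHERE uid = :uid');
$titles = array();
foreach (array(1, 2, 3) as $uid) {
  $stmt->execute(array(':uid' => $uid));
  $titles[] = $stmt->fetchColumn();
}
?>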

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


