On Mon, 3 Dec 2007 10:31:22 -0600 Larry Garfield <larry@garfieldtech.com> wrote:
Were you at DrupalCon Barcelona? :-) I had a session on the database abstraction question, and the direction that I'm coding for in D7. Slides and video are still available online, I think:
No, I wasn't in Barcelona and thx for the link. I did miss this one.
Stuff like ORM are too general beasts, to be fun to use they need a stronger OO language and carry an overhead most people using Drupal won't be happy to live with.
Actually, I consider node_load() an ORM. Not a particularly advanced one, but a basic ORM nonetheless. Before anyone says that it should become more full-featured, that way lies the Data API saga. :-)
If you call it OORM One Object Relational Model I may agree ;) That's why I wrote real ORM are too general. We actually know we've to deal with nodes, taxonomy, users... and yeah we've to leave the doors open for other objects but as Drupal is a framework for CMS we need an "ORM" for a CMS not for everything.
I'd provide more than one very thin layer of abstraction. A higher level could be an ORM specialised on the most important objects in Drupal (nodes, users, taxonomy...). At a lower level an SQL rewriting engine that could be extended later that will support the most popular SQL functions and operations across different DB. At the lowest level a way to directly write SQL with a default not optimised route using the second level of abstraction and a specific DB route for each DB if the developers are willing to optimise, but in a more elegant way than writing switch.
That's sort of what I'm looking at. Layer 1 is a revised
I'm married ;)
db_query() to use PDO in a type-agnostic way. Layer 3 is node_load() and kin. Layer 2, for insert/update/delete, are new utility functions that build the query for you based on arrays.
I think we could a) get a broader picture looking at real ORM (as SQLAlchemy and Co. and maybe something in RoR) then narrow down what we generally (with a reasonable abundant generally) do and later write a "sql cross compiler". I did some experiments that should be similar to yours with arrays to live with MS SQL and pgsql and recycle Drupal api but well... I'm not that convinced. I mean... tons of people did it before me and I don't feel as smart as getting it right without looking at real implementations reasonably carefully. Another important thing is the acceptance of such a method and making it uniform to all the rest (schema api). So... some how arrays (aka meta info) are the only way to go and it looks to late unless we're going to change schema api as well. ORM in python are well pythonic. A specialised-ORM in Drupal has to be drupalish or there won't be acceptance and it will be hard to "remember". BTW I'm working on profile info and it is a pain how forms are so similar to the profile array but still different. And it is not that fun to write a form2profile function to transform an array in another array. It would be very nice if you could juggle schema and query meta info. eg. most of the times $header for table api could be deduced from the field list or the opposite...
The part I'm still trying to figure out is Layer 2 for SELECT statements. So far the best I've come up with is a very-simple-api query builder (probably a fluent API) object that for complex queries (complex meaning "contains a function") you *must* use.
I didn't get it.
I'm not sure if that is an acceptable solution, however.
Any sample?
I'd rely a lot on conventions to keep the code short and less error prone.
Eg. if you're using group by, the returned fields should be in the select statement and in the group by statement in pg. If you pass fields names with a certain convention, fields will be put in the group by statement by the compiler without the need to list them twice and so on...
Similarly for the distinct statement.
Somehow we could even pre-compile SQL statement or save some cached info (as count, I'm testing count caching across pages to speed up paging on MS SQL).
Well, using formal prepared statements will cache the query itself, at least on databases that benefit from that. (MySQL doesn't all that much, but many others do.) I don't know if caching a built
A concrete example: on MS SQL paging is different. To obtain performance speed-up I used *_data_seek. I could to better using TOP too (LIMIT in pg/mysql jargon), but then I wouldn't have the rows number. I could cache the row number in a query_cache object and use TOP the next page. Having an api that relate queries with some cached data that every DB abstraction layer could use for its optimisation may come handy. Each DB abstraction layer will do it as it needs with different data... but with the same tool.
query string is going to buy us all that much. If the query string is that expensive to build, then we need to revise our query building mechanism.
Reasonable. -- Ivan Sergio Borgonovo http://www.webthatworks.it