On Mon, 3 Dec 2007 15:22:09 +0100, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 03 Dec 2007 08:07:53 -0500 Earnie Boyd <earnie@users.sourceforge.net> wrote:
Quoting Khalid Baheyeldin <kb@2bits.com>:
3. Do database specific stuff (including optimization) in the abstraction layer (this is what Bill Moran advocates above).
I am for #3 as the way forward.
Yes, it is the way to move forward. Doing these things anywhere else just causes the frustration demonstrated by this thread. We end up with one way to code the data selections, table creations, etc within everything but the Drupal SQL [1] abstraction engine. Most of it is already taken care of, we just need to formalize, tweak (read patch) and document.
There are plenty DB abstraction layers and ORM. I'd hate to redo the work in a "not so smart way".
At the lowest level you can find things like PDO, adodb etc... with no surprise at the higher level you can find things like SQLAlchemy, Geniusql, storm all written in python. PHP just recently start to resemble an OO language.
Were you at DrupalCon Barcelona? :-) I had a session on the database abstraction question, and the direction that I'm coding for in D7. Slides and video are still available online, I think: http://drupal.org/node/180155
Stuff like ORM are too general beasts, to be fun to use they need a stronger OO language and carry an overhead most people using Drupal won't be happy to live with.
Actually, I consider node_load() an ORM. Not a particularly advanced one, but a basic ORM nonetheless. Before anyone says that it should become more full-featured, that way lies the Data API saga. :-)
RoR has ActiveRecord but RoR stays at a lower level in the pyramid of framework->CMS.
To increase abstraction at the lowest level we can just pick up PDO. At a higher abstraction layer, as usual, it is important to guess which is the most suitable level of abstraction and generalisation to avoid to be stuck in the future while not degrading performances.
I'd provide more than one very thin layer of abstraction. A higher level could be an ORM specialised on the most important objects in Drupal (nodes, users, taxonomy...). At a lower level an SQL rewriting engine that could be extended later that will support the most popular SQL functions and operations across different DB. At the lowest level a way to directly write SQL with a default not optimised route using the second level of abstraction and a specific DB route for each DB if the developers are willing to optimise, but in a more elegant way than writing switch.
That's sort of what I'm looking at. Layer 1 is a revised db_query() to use PDO in a type-agnostic way. Layer 3 is node_load() and kin. Layer 2, for insert/update/delete, are new utility functions that build the query for you based on arrays. The part I'm still trying to figure out is Layer 2 for SELECT statements. So far the best I've come up with is a very-simple-api query builder (probably a fluent API) object that for complex queries (complex meaning "contains a function") you *must* use. I'm not sure if that is an acceptable solution, however.
As a side note without serialising, un-serialising, re-serialising, the second level of abstraction could provide more security for SQL injection since it will actually be a sort of SQL compiler rather than a regexp filter.
Yep. String-parsing a serialized string is a design flaw.
I'd rely a lot on conventions to keep the code short and less error prone.
Eg. if you're using group by, the returned fields should be in the select statement and in the group by statement in pg. If you pass fields names with a certain convention, fields will be put in the group by statement by the compiler without the need to list them twice and so on...
Similarly for the distinct statement.
Somehow we could even pre-compile SQL statement or save some cached info (as count, I'm testing count caching across pages to speed up paging on MS SQL).
Well, using formal prepared statements will cache the query itself, at least on databases that benefit from that. (MySQL doesn't all that much, but many others do.) I don't know if caching a built query string is going to buy us all that much. If the query string is that expensive to build, then we need to revise our query building mechanism. --Larry Garfield