Re: [development] Database / SQL future thoughts

5 May 2009

On Tue, May 5, 2009 at 9:30 PM, Jeff Eaton <jeff@viapositiva.net> wrote:
...
On May 5, 2009, at 1:54 PM, Bertrand Mansion wrote:
...
Well, I think I know everything there is to know about Drupal. I have
been developing modules for it for 4 years now and have deployed a dozen
websites for customers, some quite large... I think it is a good
opportunity now, with the emergence of these new databases, to think
about what we have been doing for years, and how.
Instead of being arrogant and underestimating others, you should start
by asking yourself if there really isn't any other way to better
manage tags (and cache, and sessions, and hierarchies, and callbacks,
and file storage, etc).

You're quite right, Bertrand, and I apologize for the snarkiness of my
comment. Since that article you pointed to described Drupal's tagging system
*without* suggesting it was a fundamentally flawed approach, I assumed you
were not familiar with the Taxonomy system's internals. This is not a matter
of arrogance but of misinterpreting your statement. As I'm sure you know
from being on the devel list, there is an unending stream of "Drupal Should
Do X Like Y, And Here's A Blog Post To Prove It" comments that are not
necessarily rooted in familiarity with the way the system already works.

Greg's statement still holds, though: Taxonomy in its present form is a
generalized metadata system, and the optimizations discussed in the first
two parts of the article you linked to are not possible without building an
entirely different set of specialized systems. The third model explained in
that article is what Drupal currently uses.

A number of other developers have suggested that other approaches might be
good -- rather than tying ourselves to a relational model, we should
consider treating nodes as cached objects, for example. Doing so would
probably yield some great improvements for the specific use cases we
optimize the storage mechanism for. I could be wrong, but at present the use
of a traditional SQL backend is still our best bet for a generalized system
that allows users to design their schemas and their views in an ad-hoc
fashion without writing code.

Is there any way that something other than SQL could leverage multiple
loosely connected systems like CCK, Taxonomy, and Views without crippling
performance in other areas? That's not a rhetorical question; I'm curious
and would like to know if I'm overlooking some fundamental issues.

This is exactly what I am investigating for another project where I
use Tokyo Tyrant with PHP. I have neither figures nor concrete
solutions yet, but I find it very interesting and challenging to try to
think differently. In an application I wrote, I chose to manage my
tags differently (a single table with lots of redundancy, but fast),
and it worked well. In another application, I tried a different way
of handling parent/child relations (not Celko's way, but using LIKE,
depths and paths); it also worked well, and was easier to manage and
faster.
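
To make the "LIKE, depths and paths" idea concrete, here is a minimal
sketch. It is not the code from the application mentioned above: the
table name, column names and helper functions are all made up for
illustration, and SQLite stands in for whatever database was actually used.
Each row stores its full ancestry as a path string, so a whole subtree can
be fetched with a single LIKE on the path prefix.

```python
import sqlite3

# Hypothetical schema: every term stores its ancestry ("/1/2/3/") and depth.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT, path TEXT, depth INTEGER)"
)

def add_term(name, parent_id=None):
    """Insert a term and derive its path and depth from its parent."""
    cur = conn.execute(
        "INSERT INTO term (name, path, depth) VALUES (?, '', 0)", (name,)
    )
    term_id = cur.lastrowid
    if parent_id is None:
        path, depth = f"/{term_id}/", 0
    else:
        parent_path, parent_depth = conn.execute(
            "SELECT path, depth FROM term WHERE id = ?", (parent_id,)
        ).fetchone()
        path, depth = f"{parent_path}{term_id}/", parent_depth + 1
    conn.execute(
        "UPDATE term SET path = ?, depth = ? WHERE id = ?", (path, depth, term_id)
    )
    return term_id

def descendants(term_id, max_depth=None):
    """Return names of all descendants, optionally limited to max_depth levels."""
    path, depth = conn.execute(
        "SELECT path, depth FROM term WHERE id = ?", (term_id,)
    ).fetchone()
    sql = "SELECT name FROM term WHERE path LIKE ? AND id != ?"
    params = [path + "%", term_id]
    if max_depth is not None:
        sql += " AND depth <= ?"
        params.append(depth + max_depth)
    sql += " ORDER BY path"
    return [row[0] for row in conn.execute(sql, params)]

animals = add_term("animals")
mammals = add_term("mammals", animals)
cats = add_term("cats", mammals)
birds = add_term("birds", animals)

print(descendants(animals))               # whole subtree, one query
print(descendants(animals, max_depth=1))  # direct children only
```

The trade-off in this sketch is that moving a subtree means rewriting the
path of every row beneath it, in exchange for reads that need no
recursion or self-joins.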

With these new databases, at first, I found it very difficult to
forget everything about relational DBs and ORM-like solutions where a
table looks like an object. I am almost sure that Drupal, like any
other CMS, could take advantage of systems like CouchDB or Tokyo
Cabinet (or others). Take for example the 'node' and 'node_revision'
tables: in such DBs they wouldn't need to be separate entities.
CouchDB has versioning. Tokyo Cabinet can compress your data on the
fly so you can store many versions of your node without having to
worry about relations.
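
As a rough illustration of a node and its revisions living in one
record, here is a sketch under loud assumptions: a plain Python dict
stands in for the key-value store, zlib stands in for the store's own
compression, and the key format and function names are invented for the
example. It does not use the Tokyo Cabinet or CouchDB APIs.

```python
import json
import zlib

# Stand-in for a key-value database: key -> compressed bytes.
store = {}

def _load_doc(nid):
    """Fetch and decompress the whole record for a node, or None."""
    raw = store.get(f"node:{nid}")
    if raw is None:
        return None
    return json.loads(zlib.decompress(raw).decode("utf-8"))

def save_node(nid, fields):
    """Append a new revision to the node's record; return its 1-based number."""
    doc = _load_doc(nid) or {"nid": nid, "revisions": []}
    doc["revisions"].append(dict(fields))
    store[f"node:{nid}"] = zlib.compress(json.dumps(doc).encode("utf-8"))
    return len(doc["revisions"])

def load_node(nid, revision=None):
    """Return the latest revision, or a specific one if a number is given."""
    doc = _load_doc(nid)
    if doc is None:
        return None
    revisions = doc["revisions"]
    return revisions[-1] if revision is None else revisions[revision - 1]

save_node(1, {"title": "Hello", "body": "First draft"})
save_node(1, {"title": "Hello world", "body": "Second draft"})
print(load_node(1)["title"])              # latest revision
print(load_node(1, revision=1)["title"])  # an older revision, same record
```

There is no join between a node table and a revision table here; the
cost is that every save rewrites the whole record.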

CCK wouldn't even be needed, because Tokyo Cabinet tables, like
CouchDB documents, can have an arbitrary number of fields. It is your
application that decides which fields are required, not your database.
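
A small sketch of what "the application decides" could look like. The
content types, field names and the missing_fields() helper are all
hypothetical; records are plain dicts that may carry any extra fields.

```python
# Hypothetical per-type requirements, kept in application code rather
# than in a database schema.
REQUIRED_FIELDS = {
    "article": {"title", "body"},
    "event": {"title", "start_date"},
}

def missing_fields(record):
    """Return the sorted names of required fields that are absent or empty."""
    required = REQUIRED_FIELDS.get(record.get("type"), set())
    return sorted(name for name in required if record.get(name) in (None, ""))

event = {"type": "event", "title": "DrupalCon", "venue": "Paris"}
print(missing_fields(event))  # 'venue' is simply stored; 'start_date' is flagged
```

Adding a field to a content type then means changing this mapping, not
altering a table.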

I find it quite exciting; you should see for yourself.

PS: I'll release a PHP class that talks with Tokyo Tyrant soon,
probably in PEAR.

-- 
Bertrand Mansion
Mamasam