On Tue, May 5, 2009 at 9:30 PM, Jeff Eaton <jeff@viapositiva.net> wrote:
On May 5, 2009, at 1:54 PM, Bertrand Mansion wrote:
Well, I think I know everything there is to know about Drupal. I have been developing modules for it for 4 years now and have deployed a dozen websites for customers, some quite large... I think it is a good opportunity now, with the emergence of these new databases, to think about what we have been doing for years, and how.
Instead of being arrogant and underestimating others, you should start by asking yourself if there really isn't any other way to better manage tags (and cache, and sessions, and hierarchies, and callbacks, and file storage, etc).
You're quite right, Bertrand, and I apologize for the snarkiness of my comment. Since that article you pointed to described Drupal's tagging system *without* suggesting it was a fundamentally flawed approach, I assumed you were not familiar with the Taxonomy system's internals. This is not a matter of arrogance but of misinterpreting your statement. As I'm sure you know from being on the devel list, there is an unending stream of "Drupal Should Do X Like Y, And Here's A Blog Post To Prove It" comments that are not necessarily rooted in familiarity with the way the system already works.
Greg's statement, though, stands: Taxonomy as it presently stands is a generalized metadata system, and the optimizations discussed in the first two parts of the article you linked to are not possible without building an entirely different set of specialized systems. The third model, explained in the article that you linked to, is what Drupal uses currently.
A number of other developers have suggested that other approaches might be good -- rather than tying ourselves to a relational model, we should consider treating nodes as cached objects, for example. Doing so would probably yield some great improvements for the specific use cases we optimize the storage mechanism for. I could be wrong, but at present the use of a traditional SQL backend is still our best bet for a generalized system that allows users to design their schemas and their views in an ad-hoc fashion without writing code.
Is there any way that something other than SQL could leverage multiple loosely connected systems like CCK, Taxonomy, and Views without crippling performance in other areas? That's not a rhetorical question; I'm curious and would like to know if I'm overlooking some fundamental issues.
This is exactly what I am investigating for another project where I use Tokyo Tyrant with PHP. I don't have figures or concrete solutions yet, but I find it very interesting and challenging to try to think differently.

In an application I wrote, I chose to manage my tags differently (only one table, with lots of redundancy, but fast), and it worked well. In another application, I also tried another way to deal with child/parent relations (not Celko's way, but using LIKE, depths and paths); it also worked well, and was easier to manage and faster.

With these new databases, at first, I found it very difficult to forget everything about relational DBs and ORM-like solutions where a table looks like an object. I am almost sure that Drupal, like any other CMS, could take advantage of systems like CouchDB or Tokyo Cabinet (or others). Take for example the 'node' and 'node_revision' tables: in such DBs they wouldn't need to be separate entities. CouchDB has versioning. Tokyo Cabinet can compress your data on the fly, so you can store many versions of your node without having to worry about relations. CCK wouldn't even be needed, because Tokyo Cabinet tables, like CouchDB's, can have an arbitrary number of fields. It is your application which decides which fields are required, not your database.

I find it quite exciting; you should see for yourself.

PS: I'll release a PHP class that talks with Tokyo Tyrant soon, probably in PEAR.

--
Bertrand Mansion
Mamasam
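
For readers unfamiliar with the "LIKE, depths and paths" approach to child/parent relations mentioned above, here is a minimal sketch of the general idea (often called a materialized path). It is not the code from the application described in the message: the `term` table, its columns, and the helper functions are hypothetical names chosen for illustration, and Python with SQLite stands in for whatever language and database were actually used.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical 'term' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE term ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " path TEXT NOT NULL,"      # e.g. '1/2/4/' = ids from the root down to this row
        " depth INTEGER NOT NULL)"  # 0 for a root term
    )
    return conn


def add_term(conn, name, parent_path=""):
    """Insert a term under parent_path ('' for a root) and return its path."""
    cur = conn.execute(
        "INSERT INTO term (name, path, depth) VALUES (?, '', 0)", (name,)
    )
    term_id = cur.lastrowid
    path = f"{parent_path}{term_id}/"
    depth = path.count("/") - 1
    conn.execute(
        "UPDATE term SET path = ?, depth = ? WHERE id = ?", (path, depth, term_id)
    )
    return path


def descendants(conn, path):
    """All terms below 'path', at any depth, using a single prefix LIKE."""
    rows = conn.execute(
        "SELECT name FROM term WHERE path LIKE ? AND path != ? ORDER BY path",
        (path + "%", path),
    )
    return [name for (name,) in rows]


def children(conn, path):
    """Direct children only: same prefix, exactly one level deeper."""
    parent_depth = path.count("/") - 1
    rows = conn.execute(
        "SELECT name FROM term WHERE path LIKE ? AND depth = ? ORDER BY path",
        (path + "%", parent_depth + 1),
    )
    return [name for (name,) in rows]
```

The trade-off in this sketch is that reading a whole subtree is one indexed-prefix query with no recursion or joins, while moving a subtree requires rewriting the `path` and `depth` of every row beneath it.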