nodify.module idea: 'everything is a node'
Hi folks,

An idea I'm hoping for some feedback on before wading into coding.

As Drupal devs we've often talked about how to give all Drupal objects (terms, vocabularies, users, nodes, etc.) all the features of nodes. Instead of relying on separate APIs, methods, access control etc. for each object type, we look for a single system that each of these could tap into. Instead of implementing separate ways of adding characteristics (e.g., user profiles), we look for the benefits of a single, soundly developed one (content.module).

I'm pondering a 'nodify' module that would accomplish these aims by attaching content.module content types to each object. Create a user/role/term/vocabulary and it gets all the characteristics of an admin-configurable content type.

We already have similar implementations. One I've been looking at with interest is fago's usernode module. It's a node type module that attaches nodes to users. (The module can be manually edited to use a different content type instead, e.g. a CCK one.) Basically we'd do the same thing, with two differences:

1. Instead of doing so for just one Drupal object type (users), we'd construct a set of methods that could apply to all types.
2. Instead of implementing a custom node type through a node module, we'd use content.module to create a new, extensible node type (getting the advantage of being able to add fields).

As a result, we'd be able to do things with non-node objects like:
- attach anything implemented through nodeapi (e.g., mailing)
- implement access control through node_access modules
- add CCK field types
- expose them directly to views
- control rendering through the nodeapi 'view' op
- etc.

Specifically, I'd see:

1. A core set of handlers and methods, like:
- nodify_create_content_type() to create a content type with a name like 'nodify_term' or 'nodify_user'
- a nodeapi hook implementation
- a hook to load data on supported object types
- custom hooks as needed to call methods in all supported object types. E.g., nodify_help() might call help in all supported object types' .inc files for messages specific to that object type, e.g., a term.

2. For each module with one or more supported object types, an include file (e.g., taxonomy.inc for terms and vocabularies, user.inc for users and roles) with specific methods:
- a hook implementation to describe available supported object types. E.g., taxonomy.inc would return something about supporting 'term' and 'vocabulary'.
- custom hook implementations, e.g., of nodify_help
- hook implementations (user.inc implements hook_user(), taxonomy.inc implements hook_taxonomy()) to attach the corresponding node to the object
- form alters: e.g., the term editing form is altered to add the node edit form for the 'nodify_term' content type
- views hooks

Questions:
- does this approach sound useful?
- who's working on something similar? better?
- pitfalls?
- anyone wanting to work on this?

Thanks,
Nedjo
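To make the include-file idea above a little more concrete, here is a minimal Python sketch, assuming each include simply registers the object types it supports plus an optional help callback. Every name in it (NodifyInclude, register_include, nodify_content_type_name, the help text and the admin path) is hypothetical; in the PHP module the include files would presumably be discovered by file name rather than registered explicitly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical signature for an include file's help callback: path -> message or None.
HelpCallback = Callable[[str], Optional[str]]


@dataclass
class NodifyInclude:
    """What a hypothetical taxonomy.inc or user.inc would describe about itself."""
    module: str
    object_types: List[str]
    help: Optional[HelpCallback] = None


_INCLUDES: List[NodifyInclude] = []


def register_include(include: NodifyInclude) -> None:
    _INCLUDES.append(include)


def supported_object_types() -> List[str]:
    return [t for inc in _INCLUDES for t in inc.object_types]


def nodify_content_type_name(object_type: str) -> str:
    """Content type attached to an object type, e.g. 'term' -> 'nodify_term'."""
    if object_type not in supported_object_types():
        raise KeyError(f"unsupported object type: {object_type}")
    return f"nodify_{object_type}"


def nodify_help(path: str) -> List[str]:
    """Ask every include for object-type-specific help on this path."""
    messages: List[str] = []
    for inc in _INCLUDES:
        if inc.help is not None:
            text = inc.help(path)
            if text:
                messages.append(text)
    return messages


def _taxonomy_help(path: str) -> Optional[str]:
    # Hypothetical admin path, used only for illustration.
    if path.startswith("admin/content/taxonomy"):
        return "Terms also carry the fields of the nodify_term content type."
    return None


register_include(NodifyInclude("taxonomy", ["term", "vocabulary"], _taxonomy_help))
register_include(NodifyInclude("user", ["user", "role"]))
```

Help is registered per include rather than per object type, so taxonomy.inc answers once even though it covers both terms and vocabularies.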
Funnily enough, it's part of my 2007 wishes for Drupal (http://plusvite.net/?c2e4e998), but I saw it as a more radical change, one that would actually reduce code volume and API complexity: just four generalized hooks (CRUD) could be used on each of these, instead of having specific hooks like hook_user with sub-ops, hook_insert/delete..., hook_taxonomy with sub-ops...

It would make our overall entity model more homogeneous and simpler, and would probably require less code overall than either the current system or one using a nodify module. But it's a fairly major restructuring that would probably seriously break backwards compatibility.

Regarding pitfalls, the case of the category module, which brought this (and more) to terms, is certainly an interesting precedent. This module had a difficult start, IIRC.

Frederic

----- Original Message -----
From: "Nedjo Rogers" <nedjo@islandnet.com>
To: <development@drupal.org>
Sent: Friday, January 19, 2007 9:31 PM
Subject: [development] nodify.module idea: 'everything is a node'
[...]
Questions: - does this approach sound useful? - who's working on something similar? better? - pitfalls? - anyone wanting to work on this? [...]
Nedjo Rogers wrote:
Questions: - does this approach sound useful? - who's working on something similar? better? - pitfalls? - anyone wanting to work on this?
Thanks, Nedjo
I've done these kinds of schemes before (I do it with a variety of CiviCRM objects). It has its pluses and minuses. First, it's worth thinking about specific use cases, since in many cases, I've discovered that a node devoted to the object is *not* what people really want. More typically, what they *really* want is to see the object as an element in a page, or as an item in a list. For the first, CCK does the job better than a stand-alone node object. For the latter, Views does a very good job. I've found that almost everything I originally thought to do via "nodification" I ended up doing via CCK or Views, as a result. If you do want to do it, I've found it to be pretty simple to implement. If the objects map well to the kind of info people like to use for nodes (i.e., it's not weird to see it as a page by itself, you need to control access like a node, etc.), then it's not a bad solution. Rob
In addition to Wolfgang's usernode module, there is a node comment module by Robert that just hit CVS last week, which allows nodes (think CCK) to be comments. Everything-as-a-node certainly has advantages: uniformity, generality and abstraction come to mind.
On 19 Jan 2007, at 21:31, Nedjo Rogers wrote:
Questions: - does this approach sound useful? - who's working on something similar? better? - pitfalls? - anyone wanting to work on this?
I've said it before and I'll say it again; the nodification of such objects is not a good idea for a variety of reasons. Three obvious reasons are: it does _not_ become easier for your users, it does _not_ become easier for the average developer (unless you're a Drupal expert, the code becomes more difficult to understand), and it will be an incredible resource hog. -1 for 'comments as nodes', 'users as nodes' or 'categories as nodes'. -- Dries Buytaert :: http://www.buytaert.net/
Dries,

You and some others obviously know more than I do about resource use in core, but:
- it seems that the changes for 6.0 (as I understand them, the node loading mechanism becoming lighter before rendering) might make loading faster
- the core code might become more difficult to understand (although I'm not sure why it would, but I'm no core expert), but the API used by contrib modules could actually become simpler, with a slightly reduced number of hooks and no additional parameters, which could make contrib code easier to write and maintain: CRUD operations like loading, updating, creating and deleting are already hooked for each of these entities, but each in a different way. Would it not be simpler to use the same way for every entity?
- you say it does not become easier for users, and this may be true. But it certainly does not become more difficult for them, since most users are not even aware of the term "node"
- reducing the number of tables might reduce load (on the other hand, same-table joins like (content)node->nid to (user)node->nid can be less efficient on some engines), which might indeed be a source of resource hogging, as you suggest. This needs benchmarking, though: that is the only engineering proof. Maybe it was done during one of the previous discussions on this recurring theme?

Overall, though, I have the slight impression that the rejection of the idea might stem from the software having held "nodes", from its origin, as the basic content entity, wrapped, complemented and organized by every other entity. Such a change would make nodes the basic entity, not only for content, with content nodes becoming a specialization of this more abstract object, which is a somewhat different point of view.

Note that this would pave the way for interesting evolutions like potential revisions on terms, users and comments (and more). Now *that* would be a major change. Is this also considered when rejecting Nedjo's idea?
Frederic ----- Original Message ----- From: "Dries Buytaert" <dries.buytaert@gmail.com> To: <development@drupal.org> Sent: Saturday, January 20, 2007 11:54 AM Subject: Re: [development] nodify.module idea: 'everything is a node'
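Frederic's remark about revisions on terms, users and comments can be pictured as one revision store keyed by (entity type, id), roughly a generalized node_revisions table. This is a hypothetical Python illustration; RevisionStore and its methods are invented names, and Drupal does not store revisions this way.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (entity type, id)


class RevisionStore:
    """Append-only history for any entity type, keyed by (type, id)."""

    def __init__(self) -> None:
        self._history: Dict[Key, List[Dict[str, Any]]] = {}

    def save(self, entity_type: str, entity_id: int, data: Dict[str, Any]) -> int:
        """Store a copy and return its revision number (1-based)."""
        history = self._history.setdefault((entity_type, entity_id), [])
        history.append(deepcopy(data))
        return len(history)

    def load(self, entity_type: str, entity_id: int,
             revision: Optional[int] = None) -> Optional[Dict[str, Any]]:
        """Latest revision by default, or a specific one; None if missing."""
        history = self._history.get((entity_type, entity_id))
        if not history:
            return None
        if revision is None:
            return deepcopy(history[-1])
        if not 1 <= revision <= len(history):
            return None
        return deepcopy(history[revision - 1])


store = RevisionStore()
store.save("term", 4, {"name": "Drupal"})
store.save("term", 4, {"name": "Drupal CMS"})
store.save("user", 4, {"name": "fgm"})  # same id, different type: separate history
```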
Hi,

My personal rule in this area is that if it is content, then it is a node. So things which are not content, like users, categories (and yes, I have had someone say for E-Commerce 'a transaction should be a node'), should not be nodes. The only borderline case is a comment, as sometimes a comment can actually be content, but the overhead of a node for a comment really makes it impractical.

Also, building a one-size-fits-all box for every type of object means that there will be overheads in areas which some types of objects do not need.

Gordon.
Aren't we really talking polymorphism here when we say we want to use something like the node API "toolkit" on other objects? If I read and understand correctly, there are two major suggestions. One group wants to make more objects be nodes, so that this benefit is gained. Others don't want some other object types to be nodes and have suggested generalizing the node API to obtain the benefits. Using node-like API functions on other objects sounds like polymorphism to me, which is a "solved problem" so to speak. Any Java or C++ programmers out there who might comment on this? (I only know enough Java and C++ to be dangerous. :-)
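One way to see "Drupal as the virtual machine" is name-based dispatch: a hook is invoked by building the function name <module>_<hook> for each module and calling it if it exists. The Python sketch below is loosely modelled on Drupal's module_invoke_all(); the module list, the loaders and the simple dictionary merge are assumptions made for illustration, not Drupal's exact merge semantics.

```python
from typing import Any, Dict, List

# Hypothetical list of enabled modules, in invocation order.
ENABLED_MODULES: List[str] = ["node", "taxonomy", "upload"]


def node_load(obj_id: int) -> Dict[str, Any]:
    return {"title": f"Node {obj_id}"}


def upload_load(obj_id: int) -> Dict[str, Any]:
    return {"files": []}

# No taxonomy_load is defined here, so the taxonomy module is simply skipped.


def module_invoke_all(hook: str, *args: Any) -> Dict[str, Any]:
    """Call <module>_<hook> for every enabled module that defines it; merge results."""
    merged: Dict[str, Any] = {}
    for module in ENABLED_MODULES:
        fn = globals().get(f"{module}_{hook}")
        if callable(fn):
            merged.update(fn(*args))
    return merged
```

The "polymorphism" here is by naming convention: the same call reaches whichever modules happen to implement the hook.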
It's interesting to watch this idea evolve, and I thought I'd try to sum up our current situation (in Drupal 5) regarding content-related entities and the hooks available for them. This gives a table of all the content-related hooks as far as I could remember them. I've tried to align horizontally the hooks pertaining to semantically similar operations, and of course the columns are used for object types. The table is available here: http://plusvite.net/?245fea66

I've also written a blog post about it and my impressions, to keep the discussion going: http://plusvite.net/?266be461

Frederic G. MARAND
fgm / osinet

----- Original Message -----
From: "Chris Johnson" <cxjohnson@gmail.com>
To: <development@drupal.org>
Sent: Monday, January 22, 2007 9:33 PM
Subject: Re: [development] nodify.module idea: 'everything is a node'
Interesting. Thank you for creating the summary table.
As someone from the world of Java and C++, I am fascinated, in a very non-sectarian way, by how the hook mechanism in general manages to allow modules to "override" core behavior. Dries has mentioned this in several presentations, and there is the article on OOP and Drupal in the API docs.

In no way is this the place for disputing programming paradigms and their relative merits. But since someone asked for an opinion, I would like to point out that polymorphism is not simply overriding a function by the same name: it assumes that a virtual machine traverses a hierarchy of inherited types, automatically picks out the right one, and invokes its implementation. So there wouldn't be any redundant code anywhere, and (this is the important thing) I could write code for "nodes" that would work for an "image" type inherited from node without changing anything.

I am already fantasizing (and have been for some time) about what an object-oriented reverse engineering of Drupal might look like, and one day it will have to be done. Right now, however, when we want to override theme behavior for a given node, we have to copy node.tpl.php to node-modulename.tpl.php and change the default behavior; the theme function acts like a virtual machine and chooses the function corresponding to the "object" by the file name. So the theme function is actually implementing behavior that belongs to a virtual machine. It may be efficient in execution, but in Drupal this has to be done again and again, so it is not polymorphism in the strict sense.

I have been following many of these discussions, thinking to myself that the essence of the problem is "what is the most efficient way to make procedural, event-driven (callback hook) code act like an OOP virtual machine?" At some point, the only way to solve things completely will be to move to PHP5 classes or to Ruby... although in a year or so, these languages may very well converge on a single VM...
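To make Victor's contrast concrete, a small Python sketch: pick_template mimics choosing a template by file name (the most specific file that exists wins, falling back to node.tpl.php), while the two classes let the language runtime make the same choice through an overridden method. The two-entry candidate list is a simplification; the real theme layer may consider more suggestions.

```python
from typing import Iterable, List, Optional


def pick_template(node_type: str, available: Iterable[str]) -> Optional[str]:
    """Name-based dispatch: the most specific template file that exists wins."""
    files = set(available)
    for candidate in (f"node-{node_type}.tpl.php", "node.tpl.php"):
        if candidate in files:
            return candidate
    return None


# The same decision made by the runtime's dynamic dispatch instead of by file names.
class Node:
    def render(self) -> str:
        return "generic node"


class ImageNode(Node):
    def render(self) -> str:  # overrides Node.render; callers need not know the subtype
        return "image node"


def render_all(nodes: Iterable[Node]) -> List[str]:
    return [n.render() for n in nodes]
```

render_all is written once against Node and works unchanged for ImageNode, which is the property Victor calls polymorphism in the strict sense.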
But the object-oriented paradigm will be mandatory at some point. God knows when; things are working out pretty well without it :)

Victor Kane
http://awebfactory.com.ar

On 1/22/07, Chris Johnson <cxjohnson@gmail.com> wrote:
Aren't we really talking polymorphism here when we say we want to use something like the node API "toolkit" on other objects?
If I read and understand correctly, there are 2 major suggestions. One group wants to make more objects be nodes, so that this benefit is gained. Others don't want some other object types to be nodes and have suggested generalizing the node API to obtain the benefits.
Using node-like API functions on other objects sounds like polymorphism to me, which is a "solved problem" so to speak. Any Java or C++ programmers out there who might comment on this? (I only know enough Java and C++ to be dangerous. :-)
The proposed concepts are somewhat similar, but there are differences. E.g. hook_form_alter() allows modules to change the node forms. If two modules change the form, it still works fine as long as they change different parts of the form. With OO programming techniques this wouldn't be possible, as module2 would need to subtype the form of module1, which is already subtyping the form. So the hook system seems to be a bit more powerful (at least in this case). I like Drupal's hook system; it's easy to understand and flexible. -fago
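fago's point can be shown with a Python sketch in which two independent modules alter the same form array in turn, without either one subclassing the other. The functions only mirror the shape of Drupal 5's hook_form_alter($form_id, &$form); apply_form_alters, the form ids and the elements are hypothetical. The sketch works on a copy, whereas Drupal alters the form in place by reference.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List

Form = Dict[str, Any]
FormAlter = Callable[[str, Form], None]


def module1_form_alter(form_id: str, form: Form) -> None:
    # Changes only the title element.
    if form_id == "story_node_form":
        form["title"]["#title"] = "Headline"


def module2_form_alter(form_id: str, form: Form) -> None:
    # Adds a new element; knows nothing about module1.
    if form_id == "story_node_form":
        form["subtitle"] = {"#type": "textfield", "#title": "Subtitle"}


def apply_form_alters(form_id: str, form: Form, alters: List[FormAlter]) -> Form:
    """Pass the same form array through every alter function in turn."""
    altered = deepcopy(form)  # keep the caller's form untouched in this sketch
    for alter in alters:
        alter(form_id, altered)
    return altered


base_form: Form = {"title": {"#type": "textfield", "#title": "Title"}}
story_form = apply_form_alters("story_node_form", base_form,
                               [module1_form_alter, module2_form_alter])
```

Both changes end up in story_form even though neither module is aware of the other.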
You could use something like an Accumulator within a Visitor Pattern to allow altering forms/node objects/link lists/etc. But some more thinking would have to go into this, and I think it very much resembles the current approach.
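One possible reading of the Visitor-plus-Accumulator suggestion, sketched in Python: form elements accept visitors (double dispatch through visit_textfield/visit_fieldset), each module is a visitor that overrides only the element types it cares about, and all visitors append to one shared accumulator of requested changes. The element classes, visitor names and change tuples are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Change = Tuple[str, str]  # (kind of change, element detail)


@dataclass
class Textfield:
    name: str
    title: str

    def accept(self, visitor: "FormVisitor", acc: List[Change]) -> None:
        visitor.visit_textfield(self, acc)


@dataclass
class Fieldset:
    name: str
    children: List[Union["Fieldset", Textfield]] = field(default_factory=list)

    def accept(self, visitor: "FormVisitor", acc: List[Change]) -> None:
        visitor.visit_fieldset(self, acc)
        for child in self.children:
            child.accept(visitor, acc)


class FormVisitor:
    """Default visitor does nothing; each module overrides only what it needs."""

    def visit_textfield(self, element: Textfield, acc: List[Change]) -> None:
        pass

    def visit_fieldset(self, element: Fieldset, acc: List[Change]) -> None:
        pass


class RequireTitle(FormVisitor):  # hypothetical module 1
    def visit_textfield(self, element: Textfield, acc: List[Change]) -> None:
        if element.name == "title":
            acc.append(("require", element.name))


class CollectLabels(FormVisitor):  # hypothetical module 2
    def visit_textfield(self, element: Textfield, acc: List[Change]) -> None:
        acc.append(("label", element.title))


def run_visitors(form: Fieldset, visitors: List[FormVisitor]) -> List[Change]:
    """Each visitor walks the whole form; all of them share one accumulator."""
    acc: List[Change] = []
    for visitor in visitors:
        form.accept(visitor, acc)
    return acc


node_form = Fieldset("node", [Textfield("title", "Title"), Textfield("body", "Body")])
changes = run_visitors(node_form, [RequireTitle(), CollectLabels()])
```

As the message says, the result closely resembles the current hook approach: independent contributors, no subclassing of each other, collected into one shared structure.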
fago and Victor: I was actually thinking about polymorphism on a higher, more abstract level compared to having an API which can do the same things to a variety of different objects, e.g. nodes, users, vocabs, etc. That is, imagine Drupal to be the virtual machine itself -- and ignore "proper" OO language ideas like class inheritance, etc. Anyway, an interesting comparison nonetheless. Thanks for the follow-up responses.
Absolutely. Here we are speaking in terms of analogies, in somewhat general terms, and above all in terms of the real-world usability of the API. Your idea is a positive one, and opens the way for future mapping to alternative paradigms.
On Saturday, 20.01.2007, at 11:54 +0100, Dries Buytaert wrote:
On 19 Jan 2007, at 21:31, Nedjo Rogers wrote:
Questions: - does this approach sound useful? - who's working on something similar? better? - pitfalls? - anyone wanting to work on this?
I've said it before and I'll say it again; the nodification of such objects is not a good idea for a variety of reasons. Three obvious reasons are: it does _not_ become easier for your users, it does _not_ become easier for the average developer (unless you're a Drupal expert, the code becomes more difficult to understand), and it will be an incredible resource hog.
-1 for 'comments as nodes', 'users as nodes' or 'categories as nodes'.
Initially I was also a fan of the "everything is a node" approach. Nowadays, however, I think differently. IMO nodes are and should be site content, and nothing else. But people want to use the great tools that are available for nodes for other things too. To break out in this direction, we could generalize the API around all the different things (nodes, users, taxonomy, comments, ...). Such a generalized API would make things a lot easier for developers. I agree with Bèr that there would be a need for such a general "drupal object". So nodes, users, taxonomies, comments, ... could all be such a "drupal object", all sharing the same API. With that, tools might become more powerful, as they could easily be written to work with any kind of drupal object.

@usernodes: The usernode module is useful as it enables views to list users. It's also useful for theming the user page like any other node. But I don't think users should be nodes: users are not site content. However, the user page is, or might be, so perhaps this should be a node. Which leads to another point: the user profile is site content. So perhaps some more things should be nodes, or be rebuilt as another general drupal object.

regards,
fago
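A minimal sketch of fago's "general drupal object": if nodes and users both satisfy one small shared interface, a tool such as a simple lister can be written once and applied to either. The DrupalObject protocol, its label() method and list_labels() are hypothetical, chosen only to illustrate the idea in Python.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


class DrupalObject(Protocol):
    """Hypothetical shared API every object type would provide."""
    object_type: str

    def label(self) -> str: ...


@dataclass
class Node:
    nid: int
    title: str
    object_type: str = "node"

    def label(self) -> str:
        return self.title


@dataclass
class User:
    uid: int
    name: str
    object_type: str = "user"

    def label(self) -> str:
        return self.name


def list_labels(objects: Iterable[DrupalObject]) -> List[str]:
    """A 'tool' written once against the shared API, e.g. a tiny Views-like lister."""
    return [f"{o.object_type}: {o.label()}" for o in objects]


listing = list_labels([Node(1, "Hello world"), User(2, "fago")])
```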
Hello,

On Friday 19 January 2007 21:31, Nedjo Rogers wrote:
- does this approach sound useful?
In contrast to the points Dries raises, there are a couple of good sides to more 'general object handling':
* Views can be used for stuff like users, blocks, categories etc.
* Workflows and actions can apply to these objects, simply by reusing existing modules, instead of having to duplicate this into a 'commentworkflow.module'.
* CCK-like field building works, without having to develop an entirely new CCK for each and every part of Drupal.
* Permissions come for free. Currently there is no way to, e.g., apply proper permissions to (parts of) profiles, comments, categories, blocks etc.
* Theming is a lot easier, especially now that in 5 the node is abstracted into a real hierarchical object and passed as such to the theme. Instead of wading through five entirely different concepts (just look at theme_node vs. theme_profile, it will make you frown), theming becomes much more consistent and general.

However, maybe it's a bad idea to call it nodifying. Nodes have been serving a specific purpose for too long in Drupal. I guess many developers can no longer see past nodes-as-primary-content, and therefore lack the ability to think Out Of The Box when it comes to this. Instead, call it some kind of general object layer (a Drupalified, Hookified and simplified Active Record, maybe). That way people won't confuse 'general objects' with, for example, the {node} database table, and won't need to wrap their brains around the idea of a 'node being more than a title+body', etc. Because right now people often fail to think about the Larger idea of generalised objects, and instead come up with arguments like 'but putting everything in a node table will be bad for performance'. Sure it will, but the whole database architecture is not even being discussed at this point.
- who's working on something similar? better?

A lot. Off the top of my head:
* microcontent => blocks as nodes
* usernodes (+nodefamily+cck+pageroute) => profiles as nodes (also refer to groups.drupal.org about this)
* node aggregator (there are two) => RSS aggregation items + feeds are normal nodes
Bèr -- Drupal, Ruby on Rails and Joomla! development: webschuur.com | Drupal hosting: www.sympal.nl
On 20 Jan 2007, at 12:19, Bèr Kessels wrote:
However, maybe it's a bad idea to call it nodifying. Nodes have been serving a specific purpose for too long in Drupal. I guess many developers can no longer see past nodes-as-primary-content, and therefore lack the ability to think Out Of The Box when it comes to this.
That way people won't confuse 'general objects' with, for example, the {node} database table, and won't need to wrap their brains around the idea of a 'node being more than a title+body', etc. Because right now people often fail to think about the Larger idea of generalised objects, and instead come up with arguments like 'but putting everything in a node table will be bad for performance'. Sure it will, but the whole database architecture is not even being discussed at this point.
This certainly doesn't hold for me. Stop underestimating people's ability to think. :) As you said, there might be room for a light-weight container object that nodes, comments, users and taxonomies all extend from. It's something I've been thinking about for quite a while. But that's not what FGM proposed, and not what I responded to. -- Dries Buytaert :: http://www.buytaert.net/
On Saturday 20 January 2007 17:04, Dries Buytaert wrote:
This certainly doesn't hold for me. Stop underestimating people's ability to think. :)
Nah, I think we can all think pretty well :). Still, too often a discussion like 'comments as nodes' is smothered in arguments like 'but on Drupal.org we have 111140 nodes and 190926 comments: comments as nodes would make the node table grow 180%'. My point is that people often fail to look further than 'stick it all in the nodes table' or 'pipe it all through the node hooks' when we discuss more node-ifying, simply because they have very clear and well-defined ideas about a 'node'.
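A quick check of the figures in that example: folding 190926 comments into a table of 111140 nodes gives

```latex
\[
\frac{111140 + 190926}{111140} = \frac{302066}{111140} \approx 2.718,
\qquad
\frac{190926}{111140} \approx 1.718,
\]
```

so the table would grow by roughly 172%, in the same ballpark as the 180% quoted.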
As you said, there might be room for a light-weight container object that nodes, comments, users and taxonomies all extend from. It's something I've been thinking about for quite a while. But that's not what FGM proposed, and not what I responded to.
I don't find any references to particular hooks, nor any proposed database architectures, in his mail. All I see is a general idea about 'objectifying' stuff, with the ultimate goal of making 'stuff' more consistent, in data, architecture and APIs. No reference to actual end-user UIs yet. No reference to, for example, the /admin/node interface or whether it should remain the same when stuff is nodified more. Therefore, let's call this 'objectifying' and 'consistencifying' core, instead of nodifying. My point is that the word+thing 'node' has its place in Drupal, hence calling 'things' nodes only 'adds noise' to the actual discussion, no?

Bèr-ifying
-- 
Drupalifying, Rubyifying on Rails and Joomla!ifying developmentified: webschuur.com | Drupal hostingified: www.sympal.nl
Dries wrote:
there might be room for a light-weight container object that nodes, comments, users and taxonomies all extend from.
Agreed, this is a much more promising approach than what I outlined, and proof once again that some quick, sound advice from the dev list can save a lot of useless coding :)

So I'm seeing two stages to implementing this.

1. Write a generic object.inc or object.module. Maybe this has two tables of its own:

  CREATE TABLE {object} (
    oid int unsigned NOT NULL auto_increment,
    type_id int unsigned NOT NULL default '0',
    PRIMARY KEY (oid, type_id),
    KEY type_id (type_id)
  )

  CREATE TABLE {object_type} (
    type_id int unsigned NOT NULL auto_increment,
    name varchar(32) NOT NULL default '',
    PRIMARY KEY (type_id)
  )

with some initial default values for the object types:

  INSERT INTO {object_type} (type_id, name) VALUES
    (1, 'node'), (2, 'user'), (3, 'comment'), (4, 'vocabulary'), (5, 'term')

Every object is now uniquely identified not by a single id but by two: the object id and the object type id. Modules can register their own custom object types. Possibly other core Drupal objects also register: files, roles.

object.module or object.inc has methods similar to our current ones: object_load(), object_save(), object_delete(), and uses a hook_object() similar to our current hook_nodeapi(), hook_user() and hook_taxonomy() set. The object methods call any implementations specific to the type. E.g., object_load() invokes the function named $object_type->name .'_load'.

2. Having reworked our existing object handling to extend a base object, we then look one by one at the systems currently specific to nodes and consider attaching them to the base object instead, so that they still apply to nodes but are also applicable to other object types. E.g.:
* the node_access table becomes object_access, and gains a field, type_id (object type).
* Current nodeapi modules could optionally be rewritten as hook_object implementations, so that they can attach their behaviours to any object type.
* Possibly, node_revisions becomes object_revisions.

Does this sound right? Doable? For 6.0?
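A Python sketch of the object_load() dispatch described above, under these assumptions: objects are addressed by the (oid, type_id) pair from the proposed {object} table, the type name selects a type-specific loader (standing in for calling the function named <type name>_load), and every hook_object()-style implementation then sees the loaded object. Only node and term loaders are sketched; all function bodies and example_object are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

ObjectKey = Tuple[int, int]  # (oid, type_id), mirroring the proposed {object} primary key

# Rows of the proposed {object_type} table.
OBJECT_TYPES: Dict[int, str] = {1: "node", 2: "user", 3: "comment", 4: "vocabulary", 5: "term"}


def node_load(oid: int) -> Dict[str, Any]:
    return {"oid": oid, "title": f"Node {oid}"}


def term_load(oid: int) -> Dict[str, Any]:
    return {"oid": oid, "name": f"Term {oid}"}


# Stand-in for looking up the function named <type name>_load; only two are sketched.
LOADERS: Dict[str, Callable[[int], Dict[str, Any]]] = {"node": node_load, "term": term_load}

# hook_object()-style implementations receive (op, type name, object) for every type.
HookObject = Callable[[str, str, Dict[str, Any]], None]
HOOK_OBJECT: List[HookObject] = []


def object_load(key: ObjectKey) -> Optional[Dict[str, Any]]:
    oid, type_id = key
    type_name = OBJECT_TYPES[type_id]
    loader = LOADERS.get(type_name)
    if loader is None:
        return None
    obj = loader(oid)
    obj["type"] = type_name
    for hook in HOOK_OBJECT:
        hook("load", type_name, obj)
    return obj


def example_object(op: str, type_name: str, obj: Dict[str, Any]) -> None:
    """A hypothetical module attaching the same behaviour to every object type."""
    if op == "load":
        obj["language"] = "en"


HOOK_OBJECT.append(example_object)
```

Note that node 7 and term 7 are distinct objects because the type id is part of the key, which is the point of identifying objects by two ids.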
I've remained a quiet observer so far in this discussion, mainly because I'm well aware of Dries' opinion on the 'everything is a node' idea. But now that the discussion has moved away from 'nodify', I think it's safe to join in. :P I agree that creating a new 'base object' in Drupal is an excellent idea, and one that will seriously improve Drupal's ability to share more functionality between more parts of itself. I would suggest calling the new base thing an 'entity', rather than an 'object', though. The category module, which I wrote, implements the 'terms and vocabularies as nodes' idea. I wrote the category module because I wanted to do things with categories that are not available in the taxonomy module, but that are built in to the node system (and the book module). What I did was a hack, but it was necessary: however, I will be very happy if Drupal is improved so that this hack is no longer necessary (although I have my doubts about a 'base entity' being able to completely remove the need for the category module). As the author and as a heavy user of the category module, I have found that it's very useful to have these features (and more) shared between the node system and the category system: - body text (filtered) - commenting - versioning - adding CCK fields - displaying lists with views - metadata (e.g. author, created/updated time, published, promoted) - hierarchical (i.e. parent-child) relationships - display of TOC and prev/next links (for hierarchical nodes/categories) - tagging (i.e. ability to tag categories with other categories, rather than just tagging nodes) - automatic URL generation with pathauto - more powerful theming, e.g. 
per-content-type theming, CCK field theming So, if we develop a new 'base entity' for Drupal, then I would like to see everything in the above list be moved from the node system to the base entity, because everything in the above list is useful to share between (at the very least) the node system and the category/taxonomy system. Honestly, from looking at that list, you may be thinking what I've thought all along: "if you want to share that many things between nodes and categories, why not just have categories as nodes?" Which is of course why I wrote the category module in the first place, and why I still think that it was a good idea to do so. Also, regarding the point in my list "hierarchical (i.e. parent-child) relationships", it would be great if this new 'base entity' incorporated the idea of a 'relationships API' that various people have been talking about. Because relationship handling is a feature that needs to be shared between the node, category, and comment systems, and most probably between many more. And regarding the points in my list about CCK and Views: I think that one of the key aims of this 'base entity' idea should be to allow CCK and Views to 'move up the chain', so that they can be utilised by ANY entity within Drupal. If you think that adding custom fields to your book reviews is cool, and that displaying your mom's cooking recipes in a custom list is cool, imagine how much cooler it would be if you could add custom fields to e.g. your site's comments, and if you could pass your site's users into a custom view listing! People think that CCK and Views are 'killer modules' now, but they're nothing compared to what they would be if Drupal allowed them to break out of being node-centric. These modules are too important to be tied down to the node system: they need to be 'services' that are available anywhere and everywhere within the Drupal entity model. Cheers, Jaza. On 1/21/07, Nedjo Rogers <nedjo@islandnet.com> wrote:
Dries wrote:
there might be room for a light-weight container object, that both nodes, comments, users and taxonomies extend from.
Agreed, this is a much more promising approach than what I outlined--and proof once again that some quick, sound advice from the dev list can save a lot of useless coding :)
So I'm seeing two stages to implementing this.
1. Write a generic object.inc or object.module. Maybe this has two tables of its own:
CREATE TABLE {object} (
  oid int unsigned NOT NULL auto_increment,
  type_id int unsigned NOT NULL default '0',
  PRIMARY KEY (oid, type_id),
  KEY type_id (type_id)
);
CREATE TABLE {object_type} (
  type_id int unsigned NOT NULL auto_increment,
  name varchar(32) NOT NULL default '',
  PRIMARY KEY (type_id)
);
with some default initial values for the object types:
INSERT INTO {object_type} (type_id, name) VALUES (1, 'node'), (2, 'user'), (3, 'comment'), (4, 'vocabulary'), (5, 'term');
Every object is now uniquely identified not by a single ID but by two: the object ID and the object type ID. Modules can register their own custom object types. Possibly other core Drupal objects also register: files, roles.
object.module or object.inc has methods similar to our current ones: object_load(), object_save() and object_delete(), and uses a hook_object() similar to our current set of hook_nodeapi(), hook_user() and hook_taxonomy(). The object methods call any implementations specific to the type. E.g., object_load() invokes the function named $object_type->name . '_load'.
2. Having reworked our existing object handling to extend a base object, we then look one by one at the systems currently specific to nodes and consider attaching them to the base object instead, so that they still apply to nodes but are also applicable to other object types. E.g.:
* node_access table becomes object_access, and gains a field, type_id (object type).
* Current nodeapi modules could optionally be rewritten as hook_object() implementations, so that they can attach their behaviours to any object type.
* Possibly, node_revisions becomes object_revisions.
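The object_load() dispatch from step 1, together with the hook_object() pass that step 2 would lean on, can be sketched as below. This is a minimal standalone sketch, not Drupal code: the OBJECT_TYPES and OBJECT_HOOK_MODULES constants stand in for the {object_type} table and for module_implements(), and node_load(), user_load() and example_object() are stubs invented for illustration.

```php
<?php
// Hypothetical sketch of the proposed object.inc. None of these functions
// are existing Drupal APIs; the loaders and the hook are stubs.

// Stand-in for the {object_type} table: type_id => name.
const OBJECT_TYPES = [1 => 'node', 2 => 'user', 3 => 'comment', 4 => 'vocabulary', 5 => 'term'];

// Stand-in for module_implements('object'): modules implementing hook_object().
const OBJECT_HOOK_MODULES = ['example'];

// Stub type-specific loaders, found through the "<type>_load" convention.
function node_load(int $oid): ?array {
  return $oid === 12 ? ['title' => 'Hello'] : null;
}

function user_load(int $oid): ?array {
  return $oid === 1 ? ['name' => 'admin'] : null;
}

// Stub hook_object() implementation for a hypothetical "example" module. It
// sees every loaded object, whatever its type, as hook_nodeapi() sees nodes.
function example_object(string $op, array &$object): void {
  if ($op === 'load') {
    $object['example_loaded'] = true;
  }
}

// Loads the object identified by the compound key (type_id, oid).
function object_load(int $type_id, int $oid): ?array {
  if (!array_key_exists($type_id, OBJECT_TYPES)) {
    return null; // Unknown object type.
  }
  $loader = OBJECT_TYPES[$type_id] . '_load';
  if (!function_exists($loader)) {
    return null; // Registered type without a loader.
  }
  $object = $loader($oid);
  if ($object === null) {
    return null; // No such object of this type.
  }
  // Record both halves of the compound key on the loaded object.
  $object['type_id'] = $type_id;
  $object['oid'] = $oid;
  foreach (OBJECT_HOOK_MODULES as $module) {
    $hook = $module . '_object';
    if (function_exists($hook)) {
      $hook('load', $object); // Passed by reference, so hooks can add data.
    }
  }
  return $object;
}

// Usage: a node and a user go through the same loading path.
$node = object_load(1, 12);
$user = object_load(2, 1);
```

Because the type-specific loader is looked up by name, a module that registers a new object type only has to supply a `<type>_load()` function to take part.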
Does this sound right? Doable? For 6.0?
I don't like the idea of storing a 'thing' in one table alone. It's not good for performance and horrible when you want to normalise your data. Therefore, I think it's a much better idea to go the Active Record route: define good defaults/standards and provide APIs that can pull information from these standardised tables:
drupal_object_load('comment', array('id' => 450));
or
drupal_object_load('term', array('author' => 'Joe', 'created_at' => '12-01-2007'));
Of course we need to elaborate on all the details and the hows, whens and ifs. But the idea is simple: you define what 'object' to grab. drupal_object_load() will then:
1) look for an $object_name . '_load' function. If it exists, it will leave the loading to that function.
2) if it does not exist, it will simply try to grab a record from the {$object_name} table.
Obviously we need to add some more hooks (to let other modules change the objects before or after they are loaded, etc.).
On Sunday 21 January 2007 03:15, Jeremy Epstein wrote:
- body text (filtered)
- commenting
- versioning
- adding CCK fields
- displaying lists with views
- metadata (e.g. author, created/updated time, published, promoted)
- hierarchical (i.e. parent-child) relationships
- display of TOC and prev/next links (for hierarchical nodes/categories)
- tagging (i.e. ability to tag categories with other categories, rather than just tagging nodes)
- automatic URL generation with pathauto
- more powerful theming, e.g. per-content-type theming, CCK field theming
Two more fields, borrowed from Active Record (Ruby on Rails), are *extremely* useful, especially because Rails core inserts/updates them for you, provided you named them correctly:
updated_at
created_at
Furthermore, Rails has a very nice way of handling foreign keys (re: that other discussion), also on non-FK-enabled DB engines, by using standardised table and column names (note the plural and singular!).
N-to-N tables: terms, nodes. The join table must then be called nodes_terms, and nodes_terms has the columns | id (primary key) | node_id | term_id |. Active Record will then sort out the joins for you, without any programming and with only one statement by the developer. Could something like this be achieved in a simpler, Drupal way?
1-N tables: comments, nodes. No join table. comments has, in addition to its own columns, one more: node_id. Active Record will then load the node with id node_id whenever you load a certain comment.
And please, let's not start arguing about whether or not you like RoR, but instead focus on simplifying this idea and boiling it down until it reaches a Drupal-worthy concept. Maybe we should, or could, have a look at how hooks can solve parts of this? And whether it is actually a Drupal-worthy concept to start with?
Bèr
-- 
Drupal, Ruby on Rails and Joomla! development: webschuur.com | Drupal hosting: www.sympal.nl
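Bèr's two-step drupal_object_load() and the Rails-style naming conventions can be sketched together, since both derive SQL from names alone. This is a hedged sketch under stated assumptions: it only builds SQL strings and never touches a database, build_select(), join_table_name(), foreign_key(), n_to_n_join() and load_parent() are hypothetical helpers, and the singularization rule (drop a trailing "s") is a naive assumption that only holds for regular plurals.

```php
<?php
// Hypothetical sketch of Bèr's convention-based loader plus Rails-style
// naming rules. It only builds SQL strings; nothing is executed.

// 1) Delegate to "<name>_load" if it exists; 2) otherwise query {<name>}.
function drupal_object_load(string $object_name, array $conditions) {
  $loader = $object_name . '_load';
  if (function_exists($loader)) {
    return $loader($conditions);
  }
  return build_select($object_name, $conditions);
}

// Builds a parameterized SELECT; values are returned as arguments, never
// interpolated into the SQL text.
function build_select(string $table, array $conditions): array {
  $where = [];
  foreach (array_keys($conditions) as $column) {
    $where[] = "$column = ?";
  }
  $sql = 'SELECT * FROM {' . $table . '}';
  if ($where) {
    $sql .= ' WHERE ' . implode(' AND ', $where);
  }
  return ['sql' => $sql, 'args' => array_values($conditions)];
}

// N-to-N: the join table is both plural table names in alphabetical order.
function join_table_name(string $a, string $b): string {
  $names = [$a, $b];
  sort($names);
  return implode('_', $names);
}

// Foreign key "<singular>_id". Dropping a trailing "s" is a naive
// assumption that only works for regular plurals such as "nodes".
function foreign_key(string $table): string {
  $singular = substr($table, -1) === 's' ? substr($table, 0, -1) : $table;
  return $singular . '_id';
}

// All rows of $to linked to one row of $from, every join derived from names.
function n_to_n_join(string $from, string $to): string {
  $join = join_table_name($from, $to);
  return "SELECT $to.* FROM $from"
    . " INNER JOIN $join ON $join." . foreign_key($from) . " = $from.id"
    . " INNER JOIN $to ON $to.id = $join." . foreign_key($to)
    . " WHERE $from.id = ?";
}

// 1-N: a comment row carries node_id, so its node needs no join table.
function load_parent(array $child_row, string $parent_table): array {
  return build_select($parent_table, ['id' => $child_row[foreign_key($parent_table)]]);
}

$comment_query = drupal_object_load('comment', ['id' => 450]);
$terms_of_node = n_to_n_join('nodes', 'terms');
```

The design choice being illustrated is that no relationship is declared anywhere: the join table and key columns follow entirely from the table names.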
On 22 Jan 2007, at 10:54 AM, Bèr Kessels wrote:
I don't like the idea of storing a 'thing' in one table alone. It's not good for performance and horrible when you want to normalise your data.
Anyway, this is part of what the data API work I presented at DrupalCon tackled; you can see some example code in my sandbox (along with my slides). You would define one function (model_X), which would define the fields and constraints of the object. You can define multiple views (forms or displays) of the object by defining display_X or form_X functions. The FAPI callbacks now become CRUD functions for the model, i.e. create_X, load_X, update_X, delete_X. Just by defining the model and at least one of the CRUD functions, FAPI can automatically create a form for you. Instead of having node_load() etc., we would have one function, drupal_load('X', $id, $id2, $id3); (or drupal_load('X', array(/* fields */));). Inside drupal_load(), drupal_delete(), etc., we would have a mechanism that manages the object table, i.e. when inserting a node, it adds an entry to {objects} with type 'node' and oid 12. When loading a node, it also loads the info from the object table. It'll also be able to load any object using drupal_load('object', $oid);
-- 
Adrian
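Adrian's model-driven approach can be sketched in a few functions. This is a minimal sketch assuming in-memory arrays in place of the database. The `Store` class, `drupal_model()`, `drupal_model_form()` and `drupal_insert()` are hypothetical names. `drupal_load()` is simplified to a single ID rather than the `$id, $id2, $id3` form in his message. Only model_story() is written by hand; the form, the validation and the {objects} bookkeeping are all derived from it.

```php
<?php
// Hypothetical sketch of Adrian's data API idea. model_story() declares the
// fields; a generic builder derives a form; drupal_insert()/drupal_load()
// maintain a shared objects registry. In-memory arrays replace the database.

final class Store {
  /** @var array<int, array> the {objects} table: oid => [type, id] */
  public static array $objects = [];
  /** @var array<string, array<int, array>> per-type tables */
  public static array $tables = [];
}

// The model: field names, types and constraints of the "story" object.
function model_story(): array {
  return [
    'title' => ['type' => 'text', 'required' => true],
    'body' => ['type' => 'textarea', 'required' => false],
  ];
}

function drupal_model(string $type): array {
  $callback = 'model_' . $type;
  return $callback();
}

// Derives a form definition from the model, so no form_story() is needed.
function drupal_model_form(string $type): array {
  $form = [];
  foreach (drupal_model($type) as $name => $field) {
    $form[$name] = [
      '#type' => $field['type'] === 'textarea' ? 'textarea' : 'textfield',
      '#required' => $field['required'],
    ];
  }
  return $form;
}

// Validates against the model, stores the record and registers it in {objects}.
function drupal_insert(string $type, array $values): array {
  foreach (drupal_model($type) as $name => $field) {
    if ($field['required'] && empty($values[$name])) {
      throw new InvalidArgumentException("Missing required field: $name");
    }
  }
  $id = count(Store::$tables[$type] ?? []) + 1;
  $oid = count(Store::$objects) + 1;
  Store::$objects[$oid] = ['type' => $type, 'id' => $id];
  Store::$tables[$type][$id] = $values + ['oid' => $oid];
  return ['id' => $id, 'oid' => $oid];
}

// drupal_load('story', $id) or drupal_load('object', $oid).
function drupal_load(string $type, int $id): ?array {
  if ($type === 'object') {
    $entry = Store::$objects[$id] ?? null;
    return $entry === null ? null : drupal_load($entry['type'], $entry['id']);
  }
  $record = Store::$tables[$type][$id] ?? null;
  return $record === null ? null : $record + ['type' => $type];
}

$ids = drupal_insert('story', ['title' => 'Hello', 'body' => 'First post']);
$form = drupal_model_form('story');
```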
On Monday, 22.01.2007, at 09:54 +0100, Bèr Kessels wrote:
I don't like the idea of storing a 'thing' in one table alone. It's not good for performance and horrible when you want to normalise your data.
I also don't like the idea of a big object table; I just don't see a need for it. I would prefer identifying an object by a pair: a type string like node/user/vocabulary/... plus an ID that is unique only within its type. I see three pros for this approach:
* better performance
* easier upgrade from currently existing databases
* more readable object references, as you have the type
Therefore, I think its a much better idea to go the Active Records route: Define good defaults/standards and provide APIs that can pull information from these standardised tables
Standardized tables are one way. Personally I don't like it that much. I think it would be more visible and flexible if object type properties were clearly stated somewhere, e.g. by a hook which aggregates object type definitions. I also want to point to this interesting issue related to this discussion: http://drupal.org/node/79684
Its Strategy 3 comes close to the "drupal object" idea. I would go and create the functions drupal_create(), drupal_read(), drupal_update() and drupal_delete() with appropriate arguments, e.g. drupal_read('node', array('nid' => 42)); Then we could go and generalize the node API to work with all Drupal objects. But it's probably not necessary or useful to apply the full node API to every Drupal object, e.g. the node access system. So perhaps it would make sense for the module declaring an object type to state which capabilities that type has. This could be done with the object type definitions mentioned above, which would be parsed by the API. So we could create object types with appropriate capabilities and appropriate resource usage. Basically this is the idea I have in mind; however, it needs to be elaborated further.
regards,
fago
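fago's suggestion, with type definitions gathered from a hook and capabilities declared per type, can be sketched as below. Under these assumptions the hook name `<module>_object_types()`, the `'id key'` and `'capabilities'` keys, and the {object_access} table (Nedjo's proposed rename of node_access) are all hypothetical. drupal_read() only builds SQL. The point shown is that a type pays for the access join only if it declares the capability.

```php
<?php
// Hypothetical sketch of fago's proposal: modules declare object types and
// their capabilities through a hook; the definitions are aggregated once and
// generic functions such as drupal_read() consult them.

function node_object_types(): array {
  return ['node' => ['id key' => 'nid', 'capabilities' => ['access', 'revisions']]];
}

function taxonomy_object_types(): array {
  return [
    'term' => ['id key' => 'tid', 'capabilities' => []],
    'vocabulary' => ['id key' => 'vid', 'capabilities' => []],
  ];
}

// Aggregates all definitions once per request (stand-in for module_invoke_all()).
function object_type_definitions(): array {
  static $types = null;
  if ($types === null) {
    $types = [];
    foreach (['node', 'taxonomy'] as $module) {
      $hook = $module . '_object_types';
      $types += $hook();
    }
  }
  return $types;
}

function object_type_supports(string $type, string $capability): bool {
  $types = object_type_definitions();
  return isset($types[$type]) && in_array($capability, $types[$type]['capabilities'], true);
}

// drupal_read('node', array('nid' => 42)). A real implementation would also
// filter the access join on the current user's grants; that is omitted here.
function drupal_read(string $type, array $conditions): array {
  $types = object_type_definitions();
  if (!isset($types[$type])) {
    throw new InvalidArgumentException("Unknown object type: $type");
  }
  $sql = 'SELECT t.* FROM {' . $type . '} t';
  if (object_type_supports($type, 'access')) {
    // $type is a declared key, not user input, so interpolation is safe here.
    $sql .= " INNER JOIN {object_access} a ON a.type = '$type' AND a.oid = t." . $types[$type]['id key'];
  }
  $where = [];
  foreach (array_keys($conditions) as $column) {
    $where[] = "t.$column = ?";
  }
  if ($where) {
    $sql .= ' WHERE ' . implode(' AND ', $where);
  }
  return ['sql' => $sql, 'args' => array_values($conditions)];
}

$node_query = drupal_read('node', ['nid' => 42]);
$term_query = drupal_read('term', ['tid' => 7]);
```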
I'm not sure that having a global object table in the database buys us anything, either. Having such a global object table *in documentation* certainly would be useful for understanding and clarity. Although I may yet be persuaded of its benefits, right now it makes more sense to me to have a small set of base objects, each persisted in the database by one or more tables devoted to that object. Continuing this theme in my idealized view of Drupal data architecture (and my view is always subject to change ;-), I see a small number (1 to 3?) of well-defined polymorphic APIs for handling these base objects, using highly optimized code and SQL. In theory, all access to them would be through such code, though in practice that is impossible. The compromise is to handle 80% of the cases in such a manner. The other 20% are novel cases.
In the discussion of whether everything should be a node, there are two recurring objections. I believe both can be addressed. The 3 numbered paragraphs below describe a data model that responds to the objections; the paragraphs after them look at additional implications.
The first objection is that database performance would be a nightmare if everything were stored in one table. I agree. However, it is not necessary to keep everything in one table.
The second objection is stated in different ways, so I'll synthesize: data nodes, taxonomy vocabularies and terms, and other things like comments and users are all different _kinds_of_things_, so it doesn't make sense to treat them the same. I respond by saying that they *are* the same thing: they are data structures with most fields and operations in common. The fact that they don't _mean_ the same thing to us meatpeople isn't relevant to the question of storage and low-level APIs, only to the implementation of each type's handler functions.
Here's what I have in mind.
1. Put each node type in its own table, and name the table for the type, e.g. {node_footype}. Each node-type table will have columns for the fields that are absolutely universal to all nodes, at a minimum the globally unique node ID, plus timestamps, etc. Note that the node type is not stored in a column in this table, as it is invariant across rows; the type name is implicit in the table name. (In memory, of course, a node's type identifier would need to be recorded.)
2. Each "node type component" that bundles its own fields and defines its own handlers will also have its own table. For example, Case Tracker and Category implement fields and behavior that can be attached to any node type. Such a component's table will only have the fields that pertain to the add-on functionality. Again, the type is not recorded in the table data; it is implicit in the table's name, e.g. {typecomp_casetracker}.
3. The fields that are added to the node type's definition individually would form an implicit node type component, so we'll put those in {typecomp_footype}, corresponding to {node_footype}. For example, the fields we would add with CCK, or in a module that defines a complete and independent node type. [Note this implies that node types and type components share a namespace. I don't believe that's a problem. It makes sense for a node type component to have dibs on the base node type of the same name. (If a node type name is already taken, you need to choose a different name for your module anyway.)]
So, the definition of a node type is recorded as a type name and a list of the associated add-ons, the type components. A type component is defined with a name and a list of field names and types, presumably from a catalog of field type definitions recorded elsewhere.
For any node type, regardless of the meaning of its data, whether those are content or categories or whatever, the database operations are the same. For example, say I want to retrieve all of one node's fields. I, the programmer, don't even necessarily know what they all are, and I don't need to. (Please excuse my sloppy syntax.)
$querytext = 'select * from node_' . $nodetype . ' ';
$jointype = 'inner';
foreach (list_type_components($nodetype) as $comp) {
  $querytext .= "$jointype join typecomp_$comp on node_$nodetype.node_id = typecomp_$comp.node_id" . ' ';
}
$querytext .= 'where blah blah whatever';
If the field names are encoded from the type component's name, e.g. my_component_name.text_field1, the developer will never need to know or care about the storage mechanism, as all he'll ever give or get is an associative array. Each type component's handler would add/remove/whatever the node's ID and type to/from an index table owned by that type component.
This covers the case where we want to manipulate nodes from the database without knowing their type, which is exactly when we want to work with everything of one metatype (taxonomy term, content, etc.). For each metatype, create a node type component that's exclusive to it, and *POOF!* All the semantic differences among node types take care of themselves in the naming conventions, and where necessary in the handler functions. I believe this mechanism might allow us to avoid the major database performance issues, particularly the self-join problem. It should also simplify the engine by merging all our current data metatypes into only one basic abstract type with one API, without introducing confusion for module-level developers.
-Edgar
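Edgar's per-component index table and the "exclusive component per metatype" trick can be sketched as follows. This is a minimal sketch: an in-memory array stands in for each component's index table, and the `ComponentIndex` class is hypothetical, as are the component name `taxo_term` and the node type names. Grouping by node type yields one load query per group.

```php
<?php
// Hypothetical sketch of Edgar's per-component index. Each component's
// handler records which nodes, of which node types, carry that component.
// A component exclusive to a metatype then answers "all taxonomy terms"
// without knowing their node types in advance.

final class ComponentIndex {
  /** @var array<string, array<int, string>> component => [node_id => node type] */
  private array $index = [];

  public function add(string $component, int $nodeId, string $nodeType): void {
    $this->index[$component][$nodeId] = $nodeType;
  }

  public function remove(string $component, int $nodeId): void {
    unset($this->index[$component][$nodeId]);
  }

  // Node IDs grouped by node type: one load query per group is then enough.
  public function nodesByType(string $component): array {
    $grouped = [];
    foreach ($this->index[$component] ?? [] as $nodeId => $nodeType) {
      $grouped[$nodeType][] = $nodeId;
    }
    return $grouped;
  }
}

$index = new ComponentIndex();
$index->add('taxo_term', 10, 'tag');
$index->add('taxo_term', 11, 'forum_container');
$index->add('taxo_term', 12, 'tag');
$index->remove('taxo_term', 11);
```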
On Monday 22 January 2007 7:59 pm, Edgar Whipple wrote:
Here's what I have in mind.
1. Put each node type in its own table, and name the table for the type, e.g. {node_footype}. Each node-type table will have columns for the fields that are absolutely universal to all nodes, at a minimum the globally unique node ID, plus timestamps, etc. Note that the node type is not stored in a column in this table, as it is invariant across rows; the typename is implicit in the table name. (In memory, of course, a node's type identifier would need to be recorded.)
And right there you lose the ability to treat a node as "just a node", because the node data is so spread out. You can't easily "find all nodes created after time X", because there's n different tables you have to search. (The system I use at work is broken down like that. It's nasty.) Of course, the flip side of the current system is that loading a node becomes very expensive because you have to load from the node table, then figure out what the node type is from that and then load data from another table, then potentially make other queries to load other fields. Unfortunately I don't know of any fancy SQL to determine a table to join against dynamically. (That would be ideal.) And of course, I'm seeing a trend toward more CCK-esque nodes, which means fields get split out into separate tables to allow for richer data types and multi-value fields. That complicates things in an entirely different way. -- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
On Mon, 2007-01-22 at 20:48 -0600, Larry Garfield wrote:
And right there you lose the ability to treat a node as "just a node", because the node data is so spread out. You can't easily "find all nodes created after time X", because there's n different tables you have to search.
I think I would tend to disagree. Yes, you need to load from the database a definition of each node type. However, you only have to do that once per node type, not once per transaction. Once you have that type definition in memory, you can build a single query for your node operation, regardless of how many tables the node data are spread out among. I'll offer a refinement of my previous example for illustration:
function retrieve_newer($nodetype, $cutoff_datetime, $comparison = '>=') {
  $querytext = 'SELECT * FROM node_' . $nodetype . ' ';
  $jointype = 'INNER';
  foreach (list_type_components($nodetype) as $comp) {
    $querytext .= "$jointype JOIN typecomp_$comp ON node_$nodetype.node_id = typecomp_$comp.node_id" . ' ';
  }
  /* etc. */
  return $resultset;
}
The developer/user never needs to know about the table structure. All he needs to know is the type identifier, or even just a type component identifier. For SELECTing node data, he gets all the fields together, and when he refers to $mynode[component_name][field_name] he'll get the fields he wants. The only way he wouldn't is if he asked for the wrong node type, and that's hard to do in this example. (As I mentioned previously, you would never even want a node's data without having at least partial knowledge of its type.) So, we'll have *at most* one query per node operation (less with lists of nodes), plus one query per node type per bootstrap. (We may be able to eliminate the latter. I'm about to post a proposed module loading scheme that could reduce or eliminate those.) Now, this does imply a separate query for each node type in any list of nodes. I think maybe we could work to improve even that with the right join setup, but even if I'm wrong, one query per node type ain't so bad.
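Edgar's retrieve_newer() can be made runnable under a few stated assumptions:
- a hard-coded registry (the hypothetical `type_components()`) replaces the stored type definitions;
- every node_<type> table has a `created` column;
- column aliases of the form `<component>__<field>` are used so rows can be regrouped into `$mynode[component_name][field_name]`;
- only SQL strings are produced.

The sketch also answers Larry's "find all nodes created after time X" case in the way Edgar concedes, with one query per matching node type.

```php
<?php
// Hypothetical, runnable take on Edgar's retrieve_newer(). Tables follow his
// node_<type> / typecomp_<component> naming; only SQL strings are produced.

function type_components(): array {
  return [
    'recipe' => ['recipe', 'category'],
    'article' => ['article', 'category'],
    'ticket' => ['ticket', 'casetracker'],
  ];
}

function retrieve_newer_sql(string $nodetype, string $comparison = '>='): string {
  // The operator is interpolated into the SQL, so it must be whitelisted.
  if (!in_array($comparison, ['>', '>=', '<', '<=', '='], true)) {
    throw new InvalidArgumentException("Unsupported comparison: $comparison");
  }
  $registry = type_components();
  $base = 'node_' . $nodetype;
  $sql = "SELECT * FROM $base";
  foreach ($registry[$nodetype] ?? [] as $comp) {
    $sql .= " INNER JOIN typecomp_$comp ON $base.node_id = typecomp_$comp.node_id";
  }
  // The cutoff itself is bound as a parameter, never interpolated.
  return $sql . " WHERE $base.created $comparison ?";
}

// "All nodes created after X" for every type carrying a given component:
// one query per matching node type.
function retrieve_newer_by_component(string $component, string $comparison = '>='): array {
  $queries = [];
  foreach (type_components() as $type => $components) {
    if (in_array($component, $components, true)) {
      $queries[$type] = retrieve_newer_sql($type, $comparison);
    }
  }
  return $queries;
}

// Regroups a flat row with "<component>__<field>" column aliases into
// $node[component][field]; other columns stay at the top level.
function group_fields(array $row): array {
  $node = [];
  foreach ($row as $column => $value) {
    $parts = explode('__', $column, 2);
    if (count($parts) === 2) {
      $node[$parts[0]][$parts[1]] = $value;
    } else {
      $node[$column] = $value;
    }
  }
  return $node;
}

$recipe_sql = retrieve_newer_sql('recipe');
$category_queries = retrieve_newer_by_component('category');
```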
And of course, I'm seeing a trend toward more CCK-esque nodes, which means fields get split out into separate tables to allow for richer data types and multi-value fields. That complicates things in an entirely different way.
Again, I believe I may not agree. Assume we require that *all* node fields be bundled up in the named groups that I called "node type components", in the manner of modules like Case Tracker or Category. Assume also that each type component has its own table, with the relationships I described before. Then, simply by knowing either the node's type identifier _or_ the identifier of any of the node's type components, we have full access to all of a node's data, *even if* we don't have full information beforehand about its type. _Complexity_ becomes much less of an issue at the module-development level. There is the performance load of table joins to consider, but I'm not worried about it much. For one, we have that now. For another, perhaps it will be offset by the reduced number of queries involved in a node transaction. -Edgar
I think Edgar has some novel ideas, and they provide an implementation for my previous ideas of base objects and a cross-object database API. Although we would need benchmarks to be sure, I suspect that for inner joins on a reasonable number[1] of node types (which should all have compact primary keys), it should be very fast. This is the kind of operation most databases are highly optimized at performing. [1] MySQL limits the number of tables in a join to either 31 or 61, depending on the length of a longword for the host and operating system. How many node types maximum would we ever have?
Nedjo Rogers wrote:
1. Write a generic object.inc or object.module. Maybe this has two tables of its own: [...] Every object is now uniquely identified not by a single id but by two: the object id and the object type id. Modules can register their own custom object_types. Possibly other core drupal objects also register: files, roles.
This has an implication that may not be your intention. In my experience, compound primary keys can be a real bear for subsequent programmers. They obfuscate the table relationships, and can make a field look like a key when it isn't. Perhaps it would be better to guarantee that every object's ID will be unique, regardless of type. Aesthetics aside, this would also leave the door open for node type transformations.
This quibble notwithstanding, it seems an excellent approach.
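Edgar's alternative to the compound (oid, type_id) key can be sketched as a single shared sequence. This is a minimal sketch in which the hypothetical `ObjectRegistry` class stands in for a sequences table. Every object type draws IDs from one sequence, so an ID alone identifies an object. Changing an object's type, the "node type transformation" he mentions, leaves its ID and every reference to it intact.

```php
<?php
// Hypothetical sketch: one shared sequence hands out IDs for every object
// type, so an ID alone identifies an object and a type change keeps the ID.

final class ObjectRegistry {
  private int $nextId = 1;
  /** @var array<int, string> oid => type name */
  private array $types = [];

  public function create(string $type): int {
    $oid = $this->nextId++;
    $this->types[$oid] = $type;
    return $oid;
  }

  public function typeOf(int $oid): ?string {
    return $this->types[$oid] ?? null;
  }

  // A "type transformation": the object keeps its ID, so references stay valid.
  public function changeType(int $oid, string $newType): void {
    if (!isset($this->types[$oid])) {
      throw new OutOfBoundsException("No object with ID $oid");
    }
    $this->types[$oid] = $newType;
  }
}

$registry = new ObjectRegistry();
$node = $registry->create('node');
$term = $registry->create('term'); // Never collides with $node.
$registry->changeType($node, 'term');
```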
INSERT INTO {object_type} (type_id, name) VALUES (1, 'node'), (2, 'user'), (3, 'comment'), (4, 'vocabulary'), (5, 'term'); 2. Having reworked our existing object handling to extend a base object, we then look one by one at the systems currently specific to nodes and consider attaching them to the base object instead, so that they still apply to nodes but are also applicable to other object types
An important point: in the current system, all of these data types can overlap. Users can be nodes using usernode; anything can be a vocabulary or a term with the category module. I think this is a good thing, and that the primary reason many people dislike the overlap is that it can't be done very cleanly at present. For this reason, and for the sake of CCK-style type inheritance, I think we need another layer between your 'object' and the entities named 'user', 'comment', 'term', etc. I'll explain in a separate followup.
E.g.: [...] Does this sound right? Doable? For 6.0?
I wasn't around for the 5.0 effort. Is it OK that this will pretty much break every existing module? Or, on the other hand, would it be feasible for a legacy-support layer to provide API translation?
-Edgar
This is a followup to Nedjo Rogers' idea about stripping the basic Drupal object down to the irreducible minimum, then building back up to current functionality in a cleaner way. I think that, in order to maximize both simplicity and flexibility, we need another abstraction layer beyond what you described. It would yield the logical (and desirable) conclusion to the direction CCK and node are currently traveling. Specifically, current node "types" are a basic skeletal node, plus one or more collections of fields-and-functions slapped on the side. If on the one hand we extend your approach by one additional layer, and on the other hand extend the Drupal-5/CCK approach to its logical conclusion, I think they will meet in the middle at a nice universal framework. My goal is to make all the principles learned from Drupal <= 5 as universal as we (usefully) can. [This is turning out long, so I'm sending it in parts. If you want to just skim, read the intro below, look for the headings marked LAYER X in parts 2 and 3, then skip to the discussions in parts 4 and following.] You [Nedjo] wrote, in part:
1. Write a generic object.inc or object.module. Maybe this has two tables of its own:
CREATE TABLE {object} (
  oid int unsigned NOT NULL auto_increment,
  type_id int unsigned NOT NULL default '0',
  PRIMARY KEY (oid, type_id),
  KEY type_id (type_id)
);
CREATE TABLE {object_type} (
  type_id int unsigned NOT NULL auto_increment,
  name varchar(32) NOT NULL default '',
  PRIMARY KEY (type_id)
);
with some default initial values for the object types:
INSERT INTO {object_type} (type_id, name) VALUES (1, 'node'), (2, 'user'), (3, 'comment'), (4, 'vocabulary'), (5, 'term');
Every object is now uniquely identified not by a single ID but by two: the object ID and the object type ID. Modules can register their own custom object types. Possibly other core Drupal objects also register: files, roles.
I think you're on the right track. Currently, our basic node type is represented by 'story': two data fields, title and body, plus some housekeeping. More complex node types, like most things out of CCK, add more fields. Case Tracker, Category, and other modules are even closer to what I have in mind. These modules don't represent new node types, although they do include basic example implementations. They do their real work by defining *type components*, bundles of functionality that are attached to a basic node type, making a new, custom type. Because these "bundles" or type components are orthogonal (read here as mutually irrelevant), you can combine them in many different ways within a single new node type. So, let's universalize that using your approach. First, I'll describe the layers of software logic I have in mind. Then I'll describe how what we now call the "core distribution" would become a reference implementation on top of that logic. I'm not necessarily suggesting we use this as a design roadmap for the new implementation, only as a conceptual tool for discussing and defining the roadmap. The difference between my idea and yours is this: I had the impression you were thinking modules would implement specific node types. I think most modules should be required to provide *separate* node type components (bundles), and only optionally provide an actual, instantiable node type. I believe your [Nedjo's] table {object} and the more fundamental half of your {object_type} correspond to my layers 0, 1 and 2. The more complex aspects of {object_type} are spread across layer 2 and part of layer 3. The rest of my layer 3, and layer 4, are the part that I think you may have implied, or at least had silently in mind, but didn't mention out loud. [Continued in part 2]
[Continued from part 1] I was saying I believe your [Nedjo's] table {object} and the more fundamental half of your {object_type} correspond to my layers 0, 1 and 2. The more complex aspects of {object_type} are spread across layer 2 and part of layer 3. The rest of my layer 3, and layer 4, are the part that I think you may have implied, or at least had silently in mind, but didn't mention out loud.
LAYER 0
Reduce the fundamental type of storable data to an irreducible "nude node". It has a unique, database-specific (non-portable) object ID, the class identifier that you called 'type_id', universal housekeeping info like timestamps, *and nothing else*. This type could actually be instantiated by a higher layer's functions, if that's ever useful, but the type doesn't record any useful data. I'm calling this layer "0" because there's no software associated with it. It's just a principle present in the design of the programmable layers.
LAYER 1
This is where we put logic for defining data fields. No actual, instantiable field *types*, just abstract logic (and the API specification) for defining individual fields: their names, their basic database-level data types, their validation parameters or handlers, and so on. It would be logically consistent to leave defining basic field types to a bunch of API calls in the reference implementation's installation. And, unless that produces unreasonably long installation times, I think we should, so everyone can see how it works. Perhaps it would make sense to provide predefined primitive field types as simple INSERT data in the core download, but also provide the programs that called the API to produce that data. (My goal is to avoid hand-written startup data as much as possible.) This will help ensure the core download is entirely consistent with the whole design, and it will let everyone see how it was all done, using the concrete example of the reference implementation.
LAYER 2
Two logical parts at this level.
On the left hand, the mechanism for bundling field types together into the named node type components I described above. Specifically, the API at this level lets you gather several defined field types together under a single name, and associate handler functions with it. The basic bundles that we've become accustomed to as Drupal's "built-in" types would be a reference implementation, just like the fields for Layer 1. On the right hand is the seemingly unrelated mechanism for actually storing node data in the database. However, these two things have to be together. There's no point being able to store data in the database if you can't define how it's organized. And there's no point defining how to organize data if you can't store that information in the database.
LAYER 3
Everything Drupal does could be accomplished with the API layers I've described, but it would be a major pain. The typing would be tedious and error-prone, and the learning curve would be too steep. Therefore, it is useful for some node types to have a type-specific API, as they do now. However, it is *not* useful for those APIs to all be unique, and it is *not* useful for every module to have one. This is quintuply true if we really do make everything a module. Many modules operate on only one node at a time, and the standard hooks are fully sufficient. Others, like many of the taxonomy-related modules, require something more elaborate, although we may be able to accommodate that with standard callbacks. And some node types' semantics are so different that it's useful to give them special function names, even if what those different functions do is all alike. So, the Layer 3 API will provide tools for defining those higher-level functions.
For example, these two hypothetical function calls, with appropriate values, would have exactly the same meaning and effect:
<code>
populate_node_field($node_id, $node_type, $field_name, $field_value);
assign_taxo_term($node_id, $vocabulary, $term);
</code>
But one of them makes a lot more sense, as users of the current APIs can no doubt attest. What Layer 3 should provide is a *standardized* means for defining shortcuts like this, for bypassing standard callbacks with lower-level API calls when necessary, and even for declaring that the standard APIs are bypassed altogether, when necessary. HOWEVER, by using the API to set that up, and requiring that contrib developers do the same, we can avoid some of the horrible database issues and the merely-awful-but-bad-enough namespace issues and incompatibilities that strike here and there. (Think node-type table name collisions in CCK issues, or [as much as I like the module] trying to integrate just about anything with Category.) The point is, a module would use *this* API to declare its *custom* API. Ideally, anyone reading it should understand in lower-API terms exactly what that module is trying to accomplish, without having to grovel through the implementation of that custom API. (Some of this layer's "API" might really just be further development of the current .info file requirements.) I think this is also the appropriate layer of abstraction for the APIs for admin pages, input filters, and a lot of the stuff that happens outside of actual nodes. (But not all; I think you'll be surprised.)
LAYER 4
Layers 1 through 3 create the engine. Layer 4 is where, in a very real sense, we create Drupal itself. These are the high-level APIs that module developers declare using Layer 3, and possibly using the lower layers: both the high-level APIs for Drupal's core node types, and all the contrib modules' special-purpose APIs (*when needed*, and we need to be strict about that).
A side note: some of my words might seem like a call to do away with the hooks API. I do, in fact, think the hooks idea should be reexamined, if only in case something better presents itself. However, I'm not advocating a change like that here. I believe this model will accommodate the hooks structure. More importantly, I strongly believe that this approach will facilitate creating a legacy-support API that fully implements the hooks system for the benefit of D5 modules.
LAYER 5
There is no layer 5, unless you want to count printable documentation. :-D
Part 3 of this <ahem> manifesto will describe examples of using this approach, Part 4 will look at some implications for contrib-level development, and Part 5 will go into the new module-loading mechanism I alluded to. [Continued in Part 3]
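The layering in Part 2 can be illustrated with a small in-memory sketch. The signatures are simplified from Edgar's example: a component name replaces `$node_type`, and the vocabulary acts as a field name. field_types(), component_definitions() and declare_shortcut() are hypothetical names.
- Layer 1 defines field types with validators.
- Layer 2 bundles fields into named components and stores values.
- Layer 3 declares a friendlier shortcut through an API rather than writing a bespoke storage path.

```php
<?php
// Hypothetical sketch of Edgar's Layers 1-3 with in-memory storage. The
// names echo his populate_node_field()/assign_taxo_term() example, but the
// signatures are simplified and nothing here is an existing API.

// Layer 1: field types, each with a validation callback.
function field_types(): array {
  return ['text' => 'is_string', 'integer' => 'is_int'];
}

// Layer 2 (left hand): components bundle named fields of known types.
function component_definitions(): array {
  return [
    'taxonomy' => ['tags' => 'text', 'forums' => 'text'],
    'story' => ['title' => 'text', 'weight' => 'integer'],
  ];
}

// Layer 2 (right hand): storage, as $node_store[node_id][component][field].
$node_store = [];

function populate_node_field(int $node_id, string $component, string $field, $value): void {
  global $node_store;
  $components = component_definitions();
  if (!isset($components[$component][$field])) {
    throw new InvalidArgumentException("Unknown field: $component.$field");
  }
  $validate = field_types()[$components[$component][$field]];
  if (!$validate($value)) {
    throw new InvalidArgumentException("Invalid value for $component.$field");
  }
  $node_store[$node_id][$component][$field] = $value;
}

// Layer 3: shortcuts are declared through an API rather than hand-written,
// so a reader can see they add no storage logic of their own.
function declare_shortcut(string $component): Closure {
  return function (int $node_id, string $field, $value) use ($component): void {
    populate_node_field($node_id, $component, $field, $value);
  };
}

// assign_taxo_term($node_id, $vocabulary, $term), with the vocabulary
// acting as the field name inside the taxonomy component.
$assign_taxo_term = declare_shortcut('taxonomy');
$assign_taxo_term(5, 'tags', 'php');
populate_node_field(5, 'taxonomy', 'forums', 'General');
```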
[Continued from Part 2] Again, for the sake of consistency and reliability, I'd strongly advocate that all the basic node types be defined in the reference-site implementation using a series of calls to the Layer 1, 2, and 3 APIs, and not defined directly in any layer's code. This will help ensure both that the APIs are correctly implemented, and [...]. However, for the sake of faster installation of basic node types, we might want to create predigested database data to include in the core download, as with Layer 2. For the sake of faster bootstrapping, we might also want to tag a few node types in the database as *preloadable*. Then, at new-module-install time, extract this information into a dynamically generated .inc file full of static array() definitions, to be used every time Drupal is bootstrapped. I've just realized I'm implicitly proposing a completely new mechanism for creating and installing modules. I think the idea I have in mind is a good one, but I'll save the details for followup part 5. As examples, the current modules that define field types and widgets for CCK are conceptually talking to my Layer 1. The modules that gather fields into meaningful arrangements that can be added on to existing node types, like Case Tracker or Category, talk to Layer 2. The ones that define a complete node type talk to Layer 3, and to lower layers only if they need to. Those that call upon another module's specialty API are calling Layer 4. Nobody ever talks directly to Layer 0, and when people talk, they're talking *in* Layer 5. :-> Using this approach, I think we can safely continue to call our basic object a "node", without losing any flexibility, and without confusing people. We'll be able to say, "Tada! Everything really is a node!" without terrorizing either the developers or the database admins. I'll say more about that in the next followup. [Continued in Part 4]
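The *preloadable* idea in Part 3 can be sketched with PHP's own var_export(), which emits valid PHP source for an array. This is a sketch under stated assumptions. Type definitions are plain arrays carrying a hypothetical `'preload'` flag. build_preload_source() and write_preload_file() are illustrative names. Bootstrap simply includes the generated file instead of querying the database.

```php
<?php
// Hypothetical sketch of Edgar's "preloadable" node types: at module-install
// time, flagged definitions are written to a PHP file that returns a static
// array; bootstrap includes that file instead of querying the database.

// Returns the source of a PHP file that returns the preloadable definitions.
function build_preload_source(array $definitions): string {
  $preload = array_filter($definitions, fn(array $d): bool => !empty($d['preload']));
  return "<?php\n// Generated at module-install time; do not edit.\nreturn "
    . var_export($preload, true) . ";\n";
}

// Called at module-install time.
function write_preload_file(array $definitions, string $path): void {
  if (file_put_contents($path, build_preload_source($definitions)) === false) {
    throw new RuntimeException("Could not write $path");
  }
}

$definitions = [
  'story' => ['components' => ['story'], 'preload' => true],
  'ticket' => ['components' => ['ticket', 'casetracker'], 'preload' => false],
];
$path = tempnam(sys_get_temp_dir(), 'preload');
write_preload_file($definitions, $path);

// At bootstrap: one include, no database query.
$preloaded = include $path;
```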
[Continued from Part 3]
In a previous followup I said that I think we really can make everything a node. I wanted to add a bit about why I think so, and about what some of the implications of my pre-roadmap concept might be. However, I covered much of what I had in mind in a separate post, under the subject "Everything's a node != Everything's in 1 table". So instead, I'll talk here about the effect my ideas might have on module-level developers. (Some of this isn't entirely consistent with my previous posts. Please excuse that; I'm thinking out loud here.)
I mentioned that I think there should be a unified API for handling all node types, including all the "metatypes": content nodes, taxonomy vocabularies and terms, users, and all the rest. The example of various modules proves that any of those "non-content" types can in fact represent content. Therefore, they should be nodifiable. But I don't think module developers should be forced to think of all node types in identical conceptual terms, and I certainly don't think users should have to deal with that. So here's an idea for avoiding it.
As I mentioned elsewhere, for $node_type = 'mynodetype', these two function calls have the same meaning:
  node_create($node_type, $field_values_associative_array);
  mynodetype_create($field_values_associative_array);
So, define default CRUD handlers that everyone can use, and declare them automatically for every new node-type component. (There are several possibilities for _how_, so I won't go into that here.) By simply adding a line to my_module.info, say:
  module_type = MODULE_NODE_TYPECOMPONENT
I get a full set of default handlers with *my own* module's name on the functions. That means developers can think in terms of taxonomy_term_create() and custom_contenttype_delete(), without having to think much about how they're implemented using the same APIs and the same storage engine.
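One way to picture "mynodetype_create($values) means node_create('mynodetype', $values)" is to bind the node type into each generic handler. The sketch below does that with functools.partial when a module's .info text declares module_type = MODULE_NODE_TYPECOMPONENT. It is Python rather than PHP, an in-memory dict stands in for the shared storage engine, and the .info parsing is reduced to key = value lines; all of these are assumptions for illustration.

```python
"""Hypothetical sketch: generic CRUD handlers re-exposed under each
node-type module's own name."""
import functools
import itertools
from typing import Any, Callable, Dict, Optional

_storage: Dict[int, Dict[str, Any]] = {}   # stand-in for the shared storage engine
_next_id = itertools.count(1)


def node_create(node_type: str, values: Dict[str, Any]) -> int:
    nid = next(_next_id)
    _storage[nid] = {"type": node_type, **values}
    return nid


def node_load(node_type: str, nid: int) -> Optional[Dict[str, Any]]:
    node = _storage.get(nid)
    if node is None or node["type"] != node_type:
        return None
    return dict(node)  # return a copy so callers can't mutate storage directly


def node_update(node_type: str, nid: int, values: Dict[str, Any]) -> None:
    node = _storage.get(nid)
    if node is None or node["type"] != node_type:
        raise KeyError(nid)
    node.update(values)
    node["type"] = node_type  # the type itself cannot be overwritten


def node_delete(node_type: str, nid: int) -> None:
    if node_load(node_type, nid) is not None:
        del _storage[nid]


DEFAULT_OPS: Dict[str, Callable[..., Any]] = {
    "create": node_create, "load": node_load,
    "update": node_update, "delete": node_delete,
}


def parse_info(text: str) -> Dict[str, str]:
    """Very small .info reader: 'key = value' per line."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            info[key.strip()] = value.strip()
    return info


def declare_default_handlers(module: str, info_text: str) -> Dict[str, Callable[..., Any]]:
    """Return {'<module>_create': ..., ...} for node-type component modules."""
    if parse_info(info_text).get("module_type") != "MODULE_NODE_TYPECOMPONENT":
        return {}
    return {f"{module}_{op}": functools.partial(fn, module) for op, fn in DEFAULT_OPS.items()}
```

Because every named handler is just the generic one with the type pre-filled, the storage engine stays single and uniform while each module still gets functions under its own name.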
Then, perhaps by adding a couple more lines:
  api_overrides = API_NODE_DELETE, API_NODE_VALIDATE
  api_extras = myfunctionname, myotherfunctionname
all sorts of possibilities open up. Developers can introduce whatever high-level functions they need, in a standardized and robust way, without complicating low-level data handling any more than absolutely necessary.
CONCLUSION
My point is that nodifying everything, with all the benefits that go with it, doesn't need to make Drupal any harder for ordinary developers to wrap their brains around. We could, in fact, make it easier, *if* we are willing enough to distill truly flexible, powerful back-end APIs and their implementations. In Part 3, I mentioned an idea for a new module installation model. I'll describe that idea further in my next and final followup. This, however, concludes my basic proposal.
-Edgar Whipple
drupaladmin@misterwhipple.com
www.misterwhipple.com/drupal5 (as soon as I copy the data to my hosting account)
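A rough reading of the two extra .info lines, sketched in Python: each API_NODE_* constant is assumed to name one default handler, an override is assumed to be found by the naming convention <module>_<op>, and extras are looked up by name in the module's function table. None of these conventions are specified in the post; they are one plausible interpretation, and the module name my_module and its functions are hypothetical.

```python
"""Hypothetical sketch: building a module's handler table from
api_overrides / api_extras declarations."""
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[..., Any]

# Default handlers every node type gets (stand-ins for the core versions).
DEFAULT_HANDLERS: Dict[str, Handler] = {
    "create": lambda node: f"default create {node['nid']}",
    "delete": lambda node: f"default delete {node['nid']}",
    "validate": lambda node: True,
}
PREFIX = "API_NODE_"


def parse_list(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def build_handler_table(module: str, info: Dict[str, str],
                        functions: Dict[str, Handler]) -> Tuple[Dict[str, Handler], Dict[str, Handler]]:
    table = dict(DEFAULT_HANDLERS)
    for constant in parse_list(info.get("api_overrides", "")):
        if not constant.startswith(PREFIX):
            raise ValueError(f"unknown API constant: {constant}")
        op = constant[len(PREFIX):].lower()          # API_NODE_DELETE -> "delete"
        if op not in table:
            raise ValueError(f"no default handler to override: {op}")
        override = functions.get(f"{module}_{op}")
        if override is None:
            raise ValueError(f"{module} declares {constant} but defines no {module}_{op}")
        table[op] = override
    extras = {}
    for name in parse_list(info.get("api_extras", "")):
        if name not in functions:
            raise ValueError(f"declared extra not defined: {name}")
        extras[name] = functions[name]
    return table, extras
```

Failing loudly on a declared-but-missing override keeps the .info file and the code honest with each other, which is the "standardized and robust" property the proposal is after.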
I mentioned in previous followups that I had an idea for a method of installing modules, and that I think it has the potential to improve both install time and bootstrap time, and perhaps robustness as well. It turns out the idea is simpler than I thought, and therefore perhaps slightly less exciting. So here's a little more detail about the core idea. But not a lot.
MODULE STORAGE SETUP
The simple idea is that, ideally, no module should ever set up its own storage in the database. Instead, let every module do any or all of the following:
1) Define new field types, implementing validation and widget handlers, specifying the underlying database primitive types (but no more than that), etc.
2) Define a new node-type component using existing field-type declarations, optionally defining non-default CRUD handlers, and *very* optionally defining higher-level API functions. (We could even set up default handlers for some of the more-common-but-not-universal additions, to encourage consistency in names and semantics. For example, more than a couple of modules do "set a relation of my custom type X between nodes Y and Z". Perhaps we could define generic, hook/callback-implemented APIs for that sort of thing.)
3) Declare one or more new node types, using only an identifier and a list of already-defined node-component types.
...and let the core API functions handle the specifics of setting it all up in the database. Of course the developer will still implement his callbacks and such (duh, right?); he's just not allowed to touch the node-data and other tables in the database.
I've dug into the guts of about a dozen modules so far. [I know; it's not much. Just give me time. :-) ] And so far, as best I can tell, none of them would need direct access if new core functions along the lines I have described were available. Making these feasible and then enforcing them could prevent all sorts of database awfulness, starting with namespace collisions.
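The three declaration steps can be sketched as a core object that owns every table: modules hand it field types (with primitive types only), components built from those field types, and node types listed by component, and the core alone generates the CREATE TABLE statement. This is Python with the standard-library sqlite3 module; the class and method names are hypothetical, and prefixing each column with its component name is one assumed way to avoid the namespace collisions mentioned above.

```python
"""Hypothetical sketch: the core derives all storage from declarations."""
import re
import sqlite3
from typing import Dict, List

ALLOWED_PRIMITIVES = {"INTEGER", "TEXT", "REAL"}
_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def _check_name(name: str) -> str:
    # Only plain lowercase identifiers ever reach the generated SQL.
    if not _NAME.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


class Core:
    """Owns every table; modules only hand it declarations."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.field_types: Dict[str, str] = {}            # field type -> primitive
        self.components: Dict[str, Dict[str, str]] = {}  # component -> {field: field type}
        self.node_types: Dict[str, List[str]] = {}       # node type -> components

    def declare_field_type(self, name: str, primitive: str) -> None:   # step 1
        if primitive not in ALLOWED_PRIMITIVES:
            raise ValueError(f"unsupported primitive: {primitive}")
        self.field_types[_check_name(name)] = primitive

    def declare_component(self, name: str, fields: Dict[str, str]) -> None:  # step 2
        for field_name, field_type in fields.items():
            _check_name(field_name)
            if field_type not in self.field_types:
                raise ValueError(f"undeclared field type: {field_type}")
        self.components[_check_name(name)] = dict(fields)

    def declare_node_type(self, name: str, components: List[str]) -> None:  # step 3
        missing = [c for c in components if c not in self.components]
        if missing:
            raise ValueError(f"undeclared components: {missing}")
        self.node_types[_check_name(name)] = list(components)
        self._create_storage(name)

    def _create_storage(self, node_type: str) -> None:
        # Prefix each column with its component name so two components can
        # both declare a field called "title" without colliding.
        columns = ["nid INTEGER PRIMARY KEY"]
        for component in self.node_types[node_type]:
            for field_name, field_type in self.components[component].items():
                columns.append(f"{component}__{field_name} {self.field_types[field_type]}")
        self.conn.execute(f"CREATE TABLE node_{node_type} ({', '.join(columns)})")
```

A single table per node type is just the simplest choice for the sketch; since modules never see the SQL, the core could switch to per-component tables later without touching any module.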
It would also make it easier for developers to discern what they really need to add, and what they don't. I have more in mind, but I'll stop there. -Edgar
Brilliant! This unified CRUD interface with custom non-CRUD handlers (and the rest of your proposal) describes a good implementation of the vague ideas I've been having about the Drupal database for a couple of years or so. There are many side benefits to this sort of architecture. ..chrisxj
There might be room for a lightweight container object that nodes, comments, users and taxonomies all extend.
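A minimal sketch of what such a container might look like, written as Python dataclasses: a base class carrying an id and a bag of field values, which node, comment, user and term classes extend with their own few attributes. All class and attribute names here are illustrative assumptions, not an existing Drupal API.

```python
"""Hypothetical light-weight container shared by nodes, comments, users
and taxonomy terms."""
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Container:
    id: Optional[int] = None
    # Each instance gets its own dict (default_factory avoids shared state).
    fields: Dict[str, Any] = field(default_factory=dict)

    @property
    def object_type(self) -> str:
        # Derived from the subclass name: Node -> "node", Term -> "term".
        return type(self).__name__.lower()

    def get_field(self, name: str, default: Any = None) -> Any:
        return self.fields.get(name, default)

    def set_field(self, name: str, value: Any) -> None:
        self.fields[name] = value


@dataclass
class Node(Container):
    node_type: str = "page"


@dataclass
class Comment(Container):
    nid: Optional[int] = None   # node the comment belongs to


@dataclass
class User(Container):
    name: str = ""


@dataclass
class Term(Container):
    vid: Optional[int] = None   # vocabulary the term belongs to
```

Code that only needs ids and field values (attaching fields, access checks, listing in views) can then accept any Container, while each subclass keeps its type-specific attributes.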
Our discussion has raised a lot of exciting possibilities; thanks to all who contributed. I suspect the best way forward is to start with a minimal implementation of what everyone seems to agree on: a consistent set of common handlers for all object types (nodes, users, etc.) and a single hook that acts on them all. From there we'll be much better positioned to consider further, more radical steps if they're needed. A first draft of a patch: http://drupal.org/node/113435 Please, tear it apart!
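To make "common handlers for all object types plus a single hook" concrete, here is a small Python sketch. It is not the linked patch; the class, method and op names are hypothetical. Every object type goes through the same save/load/delete handlers, and each hook implementation is registered once and receives (op, object_type, obj) for every type.

```python
"""Hypothetical sketch: shared handlers for every object type and one hook
that sees all of them. Names are illustrative, not taken from the patch."""
from typing import Any, Callable, Dict, List, Optional, Tuple

HookImpl = Callable[[str, str, Dict[str, Any]], None]


class ObjectAPI:
    def __init__(self) -> None:
        self.store: Dict[Tuple[str, int], Dict[str, Any]] = {}
        self.hook_impls: List[HookImpl] = []

    def implement_hook(self, fn: HookImpl) -> None:
        self.hook_impls.append(fn)

    def _invoke(self, op: str, object_type: str, obj: Dict[str, Any]) -> None:
        # The single hook: every implementation sees every type and op.
        for fn in self.hook_impls:
            fn(op, object_type, obj)

    def save(self, object_type: str, obj: Dict[str, Any]) -> None:
        self._invoke("presave", object_type, obj)
        self.store[(object_type, obj["id"])] = dict(obj)
        self._invoke("save", object_type, obj)

    def load(self, object_type: str, object_id: int) -> Optional[Dict[str, Any]]:
        stored = self.store.get((object_type, object_id))
        if stored is None:
            return None
        obj = dict(stored)  # hooks may add data to the loaded copy
        self._invoke("load", object_type, obj)
        return obj

    def delete(self, object_type: str, object_id: int) -> None:
        obj = self.store.pop((object_type, object_id), None)
        if obj is not None:
            self._invoke("delete", object_type, obj)
```

A module that wants to act on users and terms alike (logging, access data, extra fields) implements the hook once instead of separate hook_user and hook_taxonomy implementations.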
Participants (15):
- adrian rossouw
- Bèr Kessels
- Chris Johnson
- Dries Buytaert
- Edgar Whipple
- fago
- FGM
- Gordon Heydon
- Jeremy Epstein
- Khalid B
- Larry Garfield
- Michelangelo Partipilo
- Nedjo Rogers
- Rob Thorne
- Victor Kane