[development] Everything's a node != Everything's in 1 table

Tue Jan 23 03:28:22 UTC 2007

On Mon, 2007-01-22 at 20:48 -0600, Larry Garfield wrote:

> And right there you lose the ability to treat a node as "just a node", because 
> the node data is so spread out.  You can't easily "find all nodes created 
> after time X", because there's n different tables you have to search.  

I'd tend to disagree. Yes, you need to load a definition of each node
type from the database. However, you only have to do that once per node
type, not once per transaction. Once you have that type definition in
memory, you can build a single query for your node operation, regardless
of how many tables the node data are spread across. I'll offer a
refinement of my previous example for illustration:

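function retrieve_newer($nodetype, $cutoff_datetime, $comparison = '>=') {
  // Start from the per-type base table, then join in one table per
  // type component.
  $querytext = 'SELECT * FROM node_' . $nodetype . ' ';
  $jointype = 'INNER';
  foreach (list_type_components($nodetype) as $comp) {
    $querytext .= "$jointype JOIN typecomp_$comp
      ON node_$nodetype.node_id
        = typecomp_$comp.node_id" . ' ';
  }
  /* etc.: add the WHERE clause from $cutoff_datetime and $comparison,
     run the query, and collect the rows into $resultset. */
  return $resultset;
}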
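To make the query-building step concrete, here is a minimal, self-contained sketch. It only builds the SQL text and does not touch a database. The stubbed list_type_components(), the "story" and "event" types, the "created" column, and the build_newer_query() name are all assumptions made for illustration, not part of any existing schema or API.

```php
<?php
// Hypothetical stand-in for the type-definition lookup; a real
// implementation would read this from the database once per type.
function list_type_components($nodetype) {
  $definitions = array(
    'story' => array('body', 'taxonomy'),
    'event' => array('body', 'schedule'),
  );
  return isset($definitions[$nodetype]) ? $definitions[$nodetype] : array();
}

// Builds the SQL text only. Returns array(query, arguments) so the
// caller can bind the cutoff value instead of splicing it in, or
// FALSE if the type name or operator is not acceptable.
function build_newer_query($nodetype, $cutoff, $comparison = '>=') {
  // Identifiers cannot be bound as parameters, so restrict them.
  if (!preg_match('/^[a-z0-9_]+$/', $nodetype)) {
    return FALSE;
  }
  if (!in_array($comparison, array('<', '<=', '=', '>=', '>'), TRUE)) {
    return FALSE;
  }
  $base = 'node_' . $nodetype;
  $sql = "SELECT * FROM $base";
  foreach (list_type_components($nodetype) as $comp) {
    $table = 'typecomp_' . $comp;
    $sql .= " INNER JOIN $table ON $base.node_id = $table.node_id";
  }
  // "created" is an assumed column name on the base table.
  $sql .= " WHERE $base.created $comparison ?";
  return array($sql, array($cutoff));
}

list($sql, $args) = build_newer_query('story', 1169500000);
```

The caller still sees one function call and one query, however many component tables the type happens to have.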

The developer/user never needs to know about the table structure. All he
needs to know is the type identifier, or even just a type component
identifier. For SELECTing node data, he gets all the fields together,
and when he refers to 

  $mynode[component_name][field_name]

he'll get the fields he wants. The only way he wouldn't is if he asked
for the wrong node type, and that's hard to do in this example. (As I
mentioned previously, you would never even want a node's data without
having at least partial knowledge of its type.)
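As a rough sketch of how a flat result row could become the nested $mynode[component_name][field_name] structure: the code below assumes each column is aliased "component__field" in the SELECT list. That naming convention, the nest_row() name, and the sample fields are hypothetical; a plain SELECT * would not produce such aliases on its own.

```php
<?php
// Regroups a flat result row into $node[component][field].
// Assumes each column was aliased "component__field" in the SELECT
// list; that convention is an illustration only.
function nest_row($row) {
  $node = array();
  foreach ($row as $column => $value) {
    $parts = explode('__', $column, 2);
    if (count($parts) == 2) {
      $node[$parts[0]][$parts[1]] = $value;
    }
    else {
      // Unprefixed columns are filed under the base node table.
      $node['node'][$column] = $value;
    }
  }
  return $node;
}

$mynode = nest_row(array(
  'node_id' => 42,
  'body__text' => 'Hello',
  'schedule__start' => '2007-01-23',
));
```

With that in place, $mynode['body']['text'] and $mynode['schedule']['start'] can be read without the caller knowing which table each field came from.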

So, we'll have *at most* one query per node operation--fewer with lists
of nodes--plus one query per node type per bootstrap. (We may be able to
eliminate the latter. I'm about to post a proposed module loading scheme
that could reduce or eliminate those.)
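The "one query per node type per bootstrap" cost can be kept to exactly that with a per-request cache. A minimal sketch, assuming a hypothetical load_type_definition() that stands in for the real database lookup (the counter exists only so the example can show how often it runs):

```php
<?php
// Hypothetical loader: stands in for the single database query that
// fetches a type's component list.
function load_type_definition($nodetype) {
  $GLOBALS['definition_queries']++;
  $stored = array('story' => array('body', 'taxonomy'));
  return isset($stored[$nodetype]) ? $stored[$nodetype] : array();
}

// Returns the component list, loading it at most once per request.
function get_type_components($nodetype) {
  static $cache = array();
  if (!isset($cache[$nodetype])) {
    $cache[$nodetype] = load_type_definition($nodetype);
  }
  return $cache[$nodetype];
}

$GLOBALS['definition_queries'] = 0;
get_type_components('story');
get_type_components('story');
$first = get_type_components('story');
```

Three lookups of the same type trigger the loader once; every later node operation on that type reuses the in-memory definition.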

Now, this does imply a separate query for each node type in any list of
nodes. I think maybe we could work to improve even that with the right
join setup, but even if I'm wrong, one query per node type ain't so bad.
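For a mixed list, "one query per node type" amounts to grouping the node ids by type first and then issuing one batched query per group. A small sketch of the grouping step, with hypothetical function names and sample data:

```php
<?php
// Groups node ids by type so a listing needs one query per type
// rather than one per node. Input is an array of
// array('node_id' => ..., 'type' => ...) records.
function group_ids_by_type($nodes) {
  $by_type = array();
  foreach ($nodes as $node) {
    $by_type[$node['type']][] = (int) $node['node_id'];
  }
  return $by_type;
}

// Builds the WHERE fragment for one type's batch of ids. The ids
// are cast to integers above, so they are safe to splice in.
function id_filter($nodetype, $ids) {
  return 'node_' . $nodetype . '.node_id IN (' . implode(', ', $ids) . ')';
}

$groups = group_ids_by_type(array(
  array('node_id' => 1, 'type' => 'story'),
  array('node_id' => 2, 'type' => 'event'),
  array('node_id' => 3, 'type' => 'story'),
));
$filter = id_filter('story', $groups['story']);
```

Three nodes of two types yield two groups, so two queries, and the count grows with the number of types in the list rather than the number of nodes.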

> And of course, I'm seeing a trend toward more CCK-esque nodes, which means 
> fields get split out into separate tables to allow for richer data types and 
> multi-value fields.  That complicates things in an entirely different way.

Again, I'm not sure I agree. Assume we require that *all* node fields
be bundled up in the named groups that I called "node type components",
in the manner of modules like Case Tracker or Category. Assume also that
each type-component has its own table, with the relationships I
described before.

Then, simply by knowing either the node's type identifier _or_ the
identifier of any of the node's type components, we have full access to
all of a node's data, *even if* we don't have full information
beforehand of its type. _Complexity_ becomes much less of an issue at
the module-development level.
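Starting from a component identifier instead of a type identifier is just a reverse lookup over the same type definitions. A minimal sketch, using hypothetical in-memory definitions and function names:

```php
<?php
// Hypothetical in-memory type definitions: type => component list.
function all_type_definitions() {
  return array(
    'story' => array('body', 'taxonomy'),
    'event' => array('body', 'schedule'),
  );
}

// Finds every node type that includes the given component.
function types_with_component($component) {
  $types = array();
  foreach (all_type_definitions() as $type => $components) {
    if (in_array($component, $components, TRUE)) {
      $types[] = $type;
    }
  }
  return $types;
}

$schedule_types = types_with_component('schedule');
```

A module that only knows about its own component can find the node types that use it and, from there, the full set of tables to join.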

There is the performance load of table joins to consider, but I'm not
worried about it much. For one, we already pay that cost now. For
another, perhaps it will be offset by the reduced number of queries
involved in a node transaction.

-Edgar