Re: [drupal-devel] Random: Drupal Relationships

12 Mar 2005


      Morbus Iff wrote:
...
tid | uid    | type    | id
I seriously considered adding a "type" column to the relativity table 
for future expansion.  Only considering the needs of relativity module, 
this would not be needed, though.  The rules defining relationships, 
however, require much more flexibility, and Aran Deltac has proposed a 
table structure for holding per-node relationship data:

CREATE TABLE relativity_variable (
  vid tinyint(10) unsigned NOT NULL auto_increment,
  nid int(10) unsigned NOT NULL default '0',
  node_type varchar(32) default '',
  node_relation tinyint unsigned default 0,
  var_namespace varchar(32) default '',
  var_key varchar(32) default '',
  PRIMARY KEY (vid),
  KEY (var_namespace(2),var_key(6)),
  KEY (node_type,node_relation)
);

CREATE TABLE relativity_variable_value (
  vid tinyint(10) unsigned NOT NULL,
  var_value longtext default '',
  PRIMARY KEY (vid)
);

This may be too relativity.module-specific for a generic relationship 
table.  The var_namespace would basically keep separate features form 
polluting the namespace of other features.  This could be "taxonomy", 
"relativity", "book",  "section", "friend", etc.  The performance of 
maintaining separate "variable" and "variable_value" tables would need 
to be assessed against putting the longtext "value" as part of the same 
"variable" table.  Don't let the naming convention throw you off, 
"relativity_variable" would hold per-relationship rules (per-node, 
per-user, etc) and other meta-information.  It would allow for multiple 
rules to govern each relationship, too ("max_child_nodes" => 10, 
"allowed_child_types" => "page,story,image", or "friend_type" => 
"coworker", "friend_level" => "acquaintance").
...
Here is the table structure and example data (again, assume that if an 
ID makes sense in the column, it'd be there, and that "f" is "from", 
"r" is "relationship" and "t" is "to"). I've include explanatory 
sentences under each column:
f_type | f_data | r_type | r_data | r_who | t_type | t_data
 ------------------------------------------------------------
 term   | cat  | taxo   |        | morb  | image  | 67
  "taxonomy relationship between term cat and image 67, by morbus"
term   | dog  | taxo   |        | moshe | node   | 17
  "taxonomy relationship between term dog and node 17, by moshe
user   | morb | user   | friend | moshe  | user   | dries
  "user relationship: moshe asserts that
   user morbus is a friend of user dries"
...
* anything can relate to anything else.
 * the types of relationship are defined by the module.
 * relationships may have additional data (rating, "friend").
The above is pretty powerful (but at what cost - who knows a lot about 
gigantic tables - is the lookup insanely bad?), and is flexible enough 
to handle a lot of features that would normally require overhauling 
existing tables or creating new ones entirely. Two recent modules I've 
seen, for example, serve the following purpose:
* different URLs use different templates (sections module, I think).
 * nodes can be in a child/parent rel (node_relativity, I think).
I'm assuming they both do this magic with new tables. But, what about:
url   | /forums | sections   |        | morbus | template  | happy.tpl
 node  | 16      | relativity | child   | morbus | node      | 19
 node  | 18      | relativity | inspired | morbus | node      | 25
And so on.
I don't know what this would do to performance on a system with a lot of 
relationships.  I could see a site with a few thousand nodes, a thousand 
or so users, each with an average buddylist of 10 users, a large 
folksonomy and a book or two getting pretty large.  Indexes could 
obviously take care of much of this, but constantly hitting a single 
table of some 50k records might hinder scalability to some degree.  To 
your point, this would need to be tested by someone more familiar with 
large tables.  I, too, am interested to see what a seasoned DBA would 
have to say about this arrangement.

-Mark