relationships API vs i18n
Hi,

The more I think about how the non-i18n Drupal world would benefit from i18n features coming into core (so how do we convince you with all the good stuff :), the more I think that relations are perfect implementation candidates for node translation connection storage.

Previous discussion and implementation:

Create generalized relationship module http://drupal.org/node/28480
Relationship - Node linking and Metadata http://drupal.org/project/relationship
Relationships & site structuring group http://groups.drupal.org/relationships-site-structuring

The RDF triple approach is a true and tried method to handling relations (see the existing module description). I would propose that we should include a small triple handler in core. This would be good for i18n and other modules to use, so we might get to a win-win situation. Contrib modules can extend on this triple handling with RDF generation and such.

Who is willing to work on relations for core? IMHO we should look into reimplementing the book module functionality first with this, since the book module interface is quirky as it is anyway. There, the storage of weight might be a good question. While triples don't allow you to store weights, the RDF representation allows for 'primary data' to be connected to the triple (rdf:value for those RDF-literate). This approach can be used to represent the weight IMHO.

Gabor
Wow. Triples in core. Triples as a potential i18n solution. *Very* mind bending. Consider: triples could actually point to translated content that is somewhere completely different. Talk to the NINA / CivicActions guys ... They want to do a bunch more stuff with RDF.
-- Boris Mann Vancouver 778-896-2747 San Francisco 415-367-3595 Skype borismann http://www.bryght.com
On 09 Nov 2006, at 10:10, Gabor Hojtsy wrote:
The RDF triple approach is a true and tried method to handling relations (see the existing module description). I would propose that we should include a small triple handler in core. This would be good for i18n and other modules to use, so we might get to a win-win situation. Contrib modules can extend on this triple handling with RDF generation and such.
Interesting thought, but it is not quite clear what you envision the 'triple handler' to do? Do you suggest we store RDF in the database? (I don't like that idea.) Or should we just look at RDF and use matching terminology/properties so we can generate RDF if we want to? The CivicActions people were doing quite a bit of work with RDF, and RDF triples, in specific. -- Dries Buytaert :: http://www.buytaert.net/
On Thu, 9 Nov 2006, Dries Buytaert wrote:
On 09 Nov 2006, at 10:10, Gabor Hojtsy wrote:
The RDF triple approach is a true and tried method to handling relations (see the existing module description). I would propose that we should include a small triple handler in core. This would be good for i18n and other modules to use, so we might get to a win-win situation. Contrib modules can extend on this triple handling with RDF generation and such.
Interesting thought, but it is not quite clear what you envision the 'triple handler' to do? Do you suggest we store RDF in the database? (I don't like that idea.) Or should we just look at RDF and use matching terminology/properties so we can generate RDF if we want to?
We should not store RDF. RDF at its basics is just a triple description format (a predicate about a subject relating to an object). This is like "node/6 is translation of node/2". We need to somehow address node/6 and node/2, and we should be able to mark that the relationship is about translation.

We can of course introduce a new table into core to handle this just for i18n, and it would be very simple to do:

CREATE TABLE translation ( nid1 int, nid2 int );

The problem is that it is too domain specific and not really forward looking IMHO. We might just end up with this, but we might get further if we think ahead.

There is an ongoing relationships discussion in the community to somehow provide this triple-like system to relate nodes to each other (and even nodes to users etc).

By offloading this work to a relationships API, I hope to get more out of this "related nodes" concept than just binary relations of translation nodes. Also this would be something to do for those who are absolutely not interested in i18n, but would like to help bring some cool stuff into Drupal.

The existing relations efforts (some ready code) already operate in this field, and category module uses a similar conceptual approach. I hope that this way we can also revitalize the book module.

Whether you generate RDF out of these simple triples is up to you (certainly not with core support).

Gabor
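To make the "node/6 is translation of node/2" idea concrete, here is a minimal sketch of the specialized two-column table. It uses Python's sqlite3 module with an in-memory database purely so the example is runnable; that choice, and the helper names `add_translation` and `translations_of`, are assumptions for illustration and are not part of any Drupal API.

```python
import sqlite3

# In-memory SQLite database, used only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translation (nid1 INTEGER NOT NULL, nid2 INTEGER NOT NULL)"
)


def add_translation(conn, nid, source_nid):
    """Record that node `nid` is a translation of node `source_nid`."""
    conn.execute(
        "INSERT INTO translation (nid1, nid2) VALUES (?, ?)", (nid, source_nid)
    )


def translations_of(conn, source_nid):
    """Return the node IDs recorded as translations of `source_nid`, sorted."""
    rows = conn.execute(
        "SELECT nid1 FROM translation WHERE nid2 = ? ORDER BY nid1", (source_nid,)
    )
    return [row[0] for row in rows]


add_translation(conn, 6, 2)  # "node/6 is translation of node/2"
add_translation(conn, 9, 2)
print(translations_of(conn, 2))
```

The table only knows about one kind of relationship between nodes, which is exactly the "too domain specific" limitation raised in the message.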
Gabor Hojtsy wrote:
By offloading this work to a relationships API, I hope to get more out of this "related nodes" concept than just binary relations of translation nodes. Also this would be something to do for those who are absolutely not interested in i18n, but would like to help bring some cool stuff into Drupal.
The existing relations efforts (some ready code) already operate in this field, and category module uses a similar conceptual approach. I hope that this way we can also revitalize the book module.
That sort of relationship model is (IMO) a necessity for Drupal's long-term health. Like everyone else I took a crack at implementing such a model and have had it sitting in my sandbox for a few months. If we do use 'relationships' as the basis for translation, I hope we take the time to do it in such a way that taxonomy terms, buddy lists, and other kinds of connections can be represented as well. --Jeff
+1 for relations. I remember sharing some emails with Jeff (eaton) about this a long time ago; but yes, triples are pretty much the way to go. So we have:

CREATE TABLE relation ( type CHAR(12), nid1 int, nid2 int, priority int );

which takes care of all possible relations (books, buddylists, relativity, etc.) and has a lightweight ordering mechanism for things like page numbers, etc.

-Arnab
It would be good to support relations between entities of any type, not just nodes. So instead, we could have: CREATE TABLE relation ( type char(12), id1 int, type1 char(12), id2 int, type2 char(12), weight int ); I also give my +1 to a lightweight triple-based relationship API in core. I know that dman (author of the relationships / RDF module) has been putting a lot of work into this stuff, although he has expressed concern that his module is currently too heavy for most simple uses. It would be great if we could get a super-light version of his API into core. Cheers, Jaza.
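The entity-typed `relation` table proposed above can be sketched as follows. This is an illustration under stated assumptions, not a proposed core implementation: it runs on an in-memory SQLite database through Python's sqlite3 module, and the helpers `relate` and `related_to`, along with the relation type names 'child_of' and 'buddy_of', are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE relation (
        type   CHAR(12) NOT NULL,  -- relation type, e.g. 'child_of'
        id1    INTEGER  NOT NULL,  -- subject ID
        type1  CHAR(12) NOT NULL,  -- subject entity type, e.g. 'node'
        id2    INTEGER  NOT NULL,  -- object ID
        type2  CHAR(12) NOT NULL,  -- object entity type, e.g. 'user'
        weight INTEGER  NOT NULL DEFAULT 0
    )
    """
)


def relate(conn, rel_type, subject, obj, weight=0):
    """Store one triple; `subject` and `obj` are (entity_type, id) pairs."""
    conn.execute(
        "INSERT INTO relation (type, id1, type1, id2, type2, weight) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (rel_type, subject[1], subject[0], obj[1], obj[0], weight),
    )


def related_to(conn, rel_type, obj):
    """Return subjects related to `obj` by `rel_type`, ordered by weight."""
    rows = conn.execute(
        "SELECT type1, id1 FROM relation "
        "WHERE type = ? AND type2 = ? AND id2 = ? ORDER BY weight, id1",
        (rel_type, obj[0], obj[1]),
    )
    return [(row[0], row[1]) for row in rows]


# Book pages as children of node 10, ordered by weight.
relate(conn, "child_of", ("node", 12), ("node", 10), weight=2)
relate(conn, "child_of", ("node", 11), ("node", 10), weight=1)
# A buddy-list style relation between two users.
relate(conn, "buddy_of", ("user", 3), ("user", 1))

print(related_to(conn, "child_of", ("node", 10)))
```

One table holds both the book hierarchy and the buddy list here; the cost is that every lookup has to filter on the string-typed columns as well as the integer IDs.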
On 9-Nov-06, at 8:21 PM, Jeremy Epstein wrote:
It would be good to support relations between entities of any type, not just nodes. So instead, we could have:
CREATE TABLE relation ( type char(12), id1 int, type1 char(12), id2 int, type2 char(12), weight int );
Would a "module" field here also be helpful, so you know which module set the relation? And a "name" field perhaps? (maybe that's getting a little too frilly ;))
On 9-Nov-06, at 10:14 PM, Angela Byron wrote:
On 9-Nov-06, at 8:21 PM, Jeremy Epstein wrote:
It would be good to support relations between entities of any type, not just nodes. So instead, we could have:
CREATE TABLE relation ( type char(12), id1 int, type1 char(12), id2 int, type2 char(12), weight int );
Would a "module" field here also be helpful, so you know which module set the relation? And a "name" field perhaps? (maybe that's getting a little too frilly ;))
Frilly, perhaps... but we've seen time and time again how tracking the module that is "maintaining" the relationship can be *very* helpful (files table anyone?) ... name is perhaps excessive, but personally I'd +1 it. -- James Walker :: http://walkah.net/ :: xmpp:walkah@walkah.net
Module sounds dangerous to me. I can imagine two different modules that both need a particular relationship. If it is already established, that is OK. If not, then they establish it with their module name.
On Fri, 10 Nov 2006, Jeremy Epstein wrote:
It would be good to support relations between entities of any type, not just nodes. So instead, we could have:
CREATE TABLE relation ( type char(12), id1 int, type1 char(12), id2 int, type2 char(12), weight int );
I also give my +1 to a lightweight triple-based relationship API in core. I know that dman (author of the relationships / RDF module) has been putting a lot of work into this stuff, although he has expressed concern that his module is currently too heavy for most simple uses. It would be great if we could get a super-light version of his API into core.
Question is if the above is lightweight enough. Scanning for integers in a table is the quickest method to grab some record. Question is what type of stuff do we want to have relations with. Eventually everything becomes a node as it seems from ongoing development :) :) See profile nodes, category module for taxonomy as nodes and such.

Well, OK, we are not there yet, so we need relations between different types of stuff. Maybe if we need to sacrifice some design perfectness for speed, we can do:

CREATE TABLE relation_nid_nid ( id1 INT, id2 INT, type varchar, weight INT )
CREATE TABLE relation_nid_tid ( id1 INT, id2 INT, type varchar, weight INT )
...

The same column names are to ease(?) SQL selection. You can also represent tid->nid relations by a reverse relation type in the same table.

This design does not allow for querying all stuff that relates to a node at once, but is a lot quicker for situations when you only need to find some particular type of entities related to nodes (like taxonomy items, other nodes, users, etc). This would be rather limiting, since the created tables would define the possible relation types. If there is no tid_uid table, then you can't have that relation.

Note that I am just bringing this up because we need to keep performance in mind. Maybe this is a horrible idea (and I certainly see some of its weak points), but it might energize some discussion in this field.

Gabor
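The per-pair layout could look roughly like the sketch below, which routes each relation to a table chosen by the two entity kinds and fails when no such table exists. It is only an illustration: SQLite via Python's sqlite3 stands in for the real database, and the `PAIR_TABLES` mapping, the helper names, the `id1` indexes and the relation type names are all assumptions.

```python
import sqlite3

# Only these entity-kind pairs get a table; any other pair is unsupported.
PAIR_TABLES = {
    ("nid", "nid"): "relation_nid_nid",
    ("nid", "tid"): "relation_nid_tid",
}

conn = sqlite3.connect(":memory:")
for table in PAIR_TABLES.values():
    conn.execute(
        f"CREATE TABLE {table} "
        "(id1 INTEGER, id2 INTEGER, type VARCHAR(32), weight INTEGER)"
    )
    # Integer index on the lookup column, the cheap scan the design relies on.
    conn.execute(f"CREATE INDEX {table}_id1 ON {table} (id1)")


def table_for(kind1, kind2):
    """Look up the table for a pair of entity kinds, or raise if none exists."""
    try:
        return PAIR_TABLES[(kind1, kind2)]
    except KeyError:
        raise ValueError(f"no relation table for {kind1} -> {kind2}") from None


def relate(conn, kind1, id1, kind2, id2, rel_type, weight=0):
    """Insert one relation into the table that matches the entity kinds."""
    conn.execute(
        f"INSERT INTO {table_for(kind1, kind2)} (id1, id2, type, weight) "
        "VALUES (?, ?, ?, ?)",
        (id1, id2, rel_type, weight),
    )


def related_ids(conn, kind1, id1, kind2, rel_type):
    """Return IDs of `kind2` entities related to (`kind1`, `id1`)."""
    rows = conn.execute(
        f"SELECT id2 FROM {table_for(kind1, kind2)} "
        "WHERE id1 = ? AND type = ? ORDER BY weight, id2",
        (id1, rel_type),
    )
    return [row[0] for row in rows]


relate(conn, "nid", 6, "nid", 2, "translation_of")
relate(conn, "nid", 6, "tid", 40, "tagged_with")
print(related_ids(conn, "nid", 6, "tid", "tagged_with"))
```

Calling `table_for("tid", "uid")` raises `ValueError`, which mirrors the limitation described in the message: without a tid_uid table that relation cannot be stored.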
If we do use 'relationships' as the basis for translation, I hope we take the time to do it in such a way that taxonomy terms, buddy lists, and other kinds of connections can be represented as well.
"take the time"! we have been talking about this for about 5 years. in that time, i don't recall any core patch for this API. even a simple nid => nid system would be very helpful. i'm not necessarily advocating that, but i submit that a decent implementation that exists is preferable over a great one that doesn't. in other words, i'll take almost anything thats simple.
On 09 Nov 2006, at 22:21, Gabor Hojtsy wrote:
We can of course introduce a new table into core to handle this just for i18n, and it would be very simple to do:
CREATE TABLE translation ( nid1 int, nid2 int );
The problem is that it is too domain specific and not really forward looking IMHO. We might just end up with this, but we might get further if we think ahead.
I don't see the added value of doing this through a relationships API with a generic table. It would only save 10 lines of code, and make things slower and harder to grok. Having simple, specialized tables (like the one above) is anything but a bad thing, IMO. -- Dries Buytaert :: http://www.buytaert.net/
On 11/10/06, Dries Buytaert <dries.buytaert@gmail.com> wrote:
Having simple, specialized tables (like the one above) is anything but a bad thing, IMO.
I completely agree with this; there is a tradeoff in keeping specialized tables for high-volume tasks (e.g. taxonomy), and low-volume, high-variability tasks ("related to", "books", etc.). But we should not forget the latter case. For the latter case, having a new table for each of these relationships (most of which are custom ones anyway) is fairly overkill (since there will be ~10-20 tuples per table, then). The concept of relations is used for everything that's not high volume, and does not deserve a special table. Think of it as a "catchall" for all relations that are not treated specially. -Arnab
The last few posts to this thread assume that we need to generalize the API *and* generalize the storage table. A lightweight triples API could take a table name as a starting parameter (with a sensible default, of course), and then modules like the buddylist could add their own triple table but still use the API.

So before we start pouring cement around our database foundation, let's try to identify the types of functions we want in the API. Aside from putting triples into the database and getting them out again, what would, say, i18n need?

Furthermore, every ambitious developer under the sun has taken a stab at this already (dikini, vauxia, eaton, myself, dmann, jaza...), so what have we learned from it?

And finally, it would be silly to talk about this without asking what role chx's tree API might play. Triples can result in trees, too, after all.

-Robert
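Robert's suggestion of a triples API that takes a table name with a sensible default might be sketched like this. Everything named here is hypothetical: the functions `triple_install`, `triple_add` and `triple_objects`, the default table name `relation`, the `buddylist_triple` table and the column names are assumptions made for the example, and Python's sqlite3 is used only to keep it runnable.

```python
import re
import sqlite3

DEFAULT_TABLE = "relation"  # hypothetical default table name


def _checked(table):
    """Allow only plain identifiers, since table names cannot be bound as parameters."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"invalid table name: {table!r}")
    return table


def triple_install(conn, table=DEFAULT_TABLE):
    """Create a triple table with the shared column layout if it is missing."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {_checked(table)} "
        "(subject INTEGER NOT NULL, predicate TEXT NOT NULL, object INTEGER NOT NULL)"
    )


def triple_add(conn, subject, predicate, obj, table=DEFAULT_TABLE):
    """Store one (subject, predicate, object) triple in the given table."""
    conn.execute(
        f"INSERT INTO {_checked(table)} (subject, predicate, object) VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )


def triple_objects(conn, subject, predicate, table=DEFAULT_TABLE):
    """Return the objects stored for a subject and predicate, sorted."""
    rows = conn.execute(
        f"SELECT object FROM {_checked(table)} "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
triple_install(conn)                      # shared default table
triple_install(conn, "buddylist_triple")  # a module's own table, same API

triple_add(conn, 6, "translation_of", 2)
triple_add(conn, 1, "buddy_of", 3, table="buddylist_triple")

print(triple_objects(conn, 6, "translation_of"))
print(triple_objects(conn, 1, "buddy_of", table="buddylist_triple"))
```

The API stays generic while storage can be split per module, so a high-volume module keeps its own small table without reimplementing the helper functions.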
Furthermore, every ambitious developer under the sun has taken a stab at this already (dikini, vauxia, eaton, myself, dmann, jaza...), so what have we learned from it?
Me too - I started one for the FRBR's implementation of relations for the LibDB module. I planned to spit out RDF too. What I've learned from talking to a number of the folks above is that: we're not going to agree on an implementation <g>. Vauxia, specifically, has done a large amount of speed/performance testing, I believe. -- Morbus Iff ( strive for mediocrity ) Technical: http://www.oreillynet.com/pub/au/779 Culture: http://www.disobey.com/ and http://www.gamegrene.com/ icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
Robert Douglass wrote:
The last few posts to this thread assume that we need to generalize the API *and* generalize the storage table. A lightweight triples API could take a table name as a starting parameter (with a sensible default, of course), and then modules like the buddylist could add their own triple table but still use the API.
So before we start pouring cement around our database foundation, let's try to identify the types of functions we want in the API. Aside from putting triples into the database and getting them out again, what would, say, i18n need?
Here are a couple of the priorities I came up with when I made my attempt:

1) Must be able to capture different relationship types. (Standard taxonomy terms used the 'describes' relationship type. Book pages used the 'child_of' relationship type.)
2) Use an actual ID for each relationship. Currently, node/taxonomy intersections have no id, so it's impossible to hang other meta-data tables off of them.
3) Capture optional weight, so relationships can be ranked.
4) Capture an optional uid, so that relationships can be user-specific.

If we add to that the ability to break things out into custom tables for optimization purposes, the way cache tables work in 5.0, I think we have a real winner. My schema worked out to something like this:

----
rid int(10) NOT NULL auto_increment,
source_id int(10) NOT NULL default 0,
relationship_type text NOT NULL,
target_id int(10) NOT NULL default 0,
uid int(10) default NULL,
weight tinyint(3) NOT NULL default 0,
----

I also had a small .inc with helper functions in my sandbox. The real problem isn't, IMO, writing this kind of glue. It's converting the monsters like taxonomy over to the new schema.

--Jeff
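The four priorities above can be exercised with a small sketch of the schema. It is an approximation: the MySQL column types are translated to SQLite so it runs with Python's sqlite3. The table names `relationship` and `relationship_meta`, the helper functions and the sample metadata are assumptions for illustration, since the message gives only the column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE relationship (
        rid               INTEGER PRIMARY KEY AUTOINCREMENT,
        source_id         INTEGER NOT NULL DEFAULT 0,
        relationship_type TEXT    NOT NULL,
        target_id         INTEGER NOT NULL DEFAULT 0,
        uid               INTEGER DEFAULT NULL,
        weight            INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical metadata table keyed on the relationship ID.
    CREATE TABLE relationship_meta (
        rid   INTEGER NOT NULL REFERENCES relationship (rid),
        name  TEXT    NOT NULL,
        value TEXT
    );
    """
)


def add_relationship(conn, source_id, relationship_type, target_id, uid=None, weight=0):
    """Insert a relationship and return its new rid."""
    cur = conn.execute(
        "INSERT INTO relationship "
        "(source_id, relationship_type, target_id, uid, weight) "
        "VALUES (?, ?, ?, ?, ?)",
        (source_id, relationship_type, target_id, uid, weight),
    )
    return cur.lastrowid


def set_meta(conn, rid, name, value):
    """Attach one metadata value to an existing relationship."""
    conn.execute(
        "INSERT INTO relationship_meta (rid, name, value) VALUES (?, ?, ?)",
        (rid, name, value),
    )


def children(conn, target_id):
    """Return the sources that are 'child_of' the target, ranked by weight."""
    rows = conn.execute(
        "SELECT source_id FROM relationship "
        "WHERE relationship_type = 'child_of' AND target_id = ? "
        "ORDER BY weight, source_id",
        (target_id,),
    )
    return [row[0] for row in rows]


add_relationship(conn, 12, "child_of", 10, weight=1)
add_relationship(conn, 11, "child_of", 10, weight=-1)
# A user-specific 'describes' relationship with metadata hung off its rid.
tagging = add_relationship(conn, 5, "describes", 11, uid=7)
set_meta(conn, tagging, "note", "added during review")

print(children(conn, 10))
```

Because every row has its own `rid`, other tables can reference a single relationship, which is not possible with a bare (source, target) intersection.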
FWIW, http://drupal.org/project/nat is a quickly put together module that uses the taxonomy system to establish node-node relationships. -K
On Fri, 10 Nov 2006, Dries Buytaert wrote:
On 09 Nov 2006, at 22:21, Gabor Hojtsy wrote:
We can of course introduce a new table into core to handle this just for i18n, and it would be very simple to do:
CREATE TABLE translation ( nid1 int, nid2 int );
The problem is that it is too domain specific and not really forward looking IMHO. We might just end up with this, but we might get further if we think ahead.
I don't see the added value of doing this through a relationships API with a generic table. It would only save 10 lines of code, and make things slower and harder to grok. Having simple, specialized tables (like the one above) is anything but a bad thing, IMO.
OK, I am fine with a simple specialized solution. Gabor
On Nov 9, 2006, at 11:26 PM, Dries Buytaert wrote:
Having simple, specialized tables (like the one above) is anything but a bad thing, IMO.
so, everyone who wants to use relationships has to fend for themselves? provide their own DB tables, their own APIs, etc, etc? :(

things i want to use relationships for:

project world:
1) issue <-> issue (duplicate, depends on, etc)
2) issue <-> user (currently hard-coded as 3 things: "assigned", "participated", "created", but this is inflexible and there's lots of room for improvement).
3) issue <-> cvs commit (currently only 1 way... the commit points to the issue, not the other way)
4) project <-> project (related projects, dependencies, sub-projects, etc)
5) project <-> user (e.g. email notifications of new releases, but lots of potential)

other stuff:
6) user <-> nid (that's all the signup.module should be)

... i could go on and on...

-1 to me having to implement separate DB tables and APIs for each of these things. that's going to kill progress on any of this (in terms of the time it'll take me to do it), increase code bloat, and encourage incompatible APIs that are harder for other developers to grok, work with, extend, etc.

+1 to giving the relationship itself an id that you can associate metadata with (that's what signup.module would be doing). in fact, jeff eaton's proposal to the list seems fairly complete and straightforward.

that's how i see it, at least. ;)

thanks,
-derek
participants (14)
- Angela Byron
- Arnab Nandi
- Boris Mann
- Derek Wright
- Dries Buytaert
- Gabor Hojtsy
- James Walker
- Jeff Eaton
- Jeremy Epstein
- Karthik
- Morbus Iff
- Moshe Weitzman
- Robert Douglass
- Walt Daniels