Pathauto scaleability - path column in node / user table
One of the things we discussed doing once 4.8 is open, around the time of Vancouver, was that using the path table to maintain an alias for each and every node was just not going to scale. So, simply adding a 'path' column, which is in effect the preferred alias for that object, and then using a node_link($node) function (or just l($node->path)), would allow us to run modules like pathauto on very large production sites with thousands of nodes. This is a fairly simple, albeit fairly large patch. Do we have any volunteers / naysayers? -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
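A rough sketch of the scaling argument, assuming one alias query per rendered node link today. The hypothetical functions below count lookups for both approaches; the names, the in-memory arrays standing in for tables, and the fallback to node/<nid> are illustrative assumptions, not Drupal code.

```php
<?php
// Hypothetical sketch: count the alias lookups needed to render links for a
// list of nodes, (a) querying an alias table per link versus (b) reading a
// 'path' column loaded with the node row. Function names are illustrative.

// (a) Current approach: one alias-table lookup per outgoing link.
function links_via_alias_table(array $nodes, array $url_alias, int &$lookups): array {
  $links = array();
  foreach ($nodes as $node) {
    $lookups++;                                  // stands in for one SQL query
    $src = 'node/' . $node->nid;
    $links[] = $url_alias[$src] ?? $src;
  }
  return $links;
}

// (b) Proposed approach: the preferred alias arrives with the node itself.
// Falling back to node/<nid> when the column is empty is an assumption.
function links_via_path_column(array $nodes): array {
  $links = array();
  foreach ($nodes as $node) {
    $links[] = $node->path !== '' ? $node->path : 'node/' . $node->nid;
  }
  return $links;
}
```

With 1000 aliased nodes on a page, (a) performs 1000 lookups while (b) performs none, and both yield the same links.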
This would be just the 'primary' alias for a given object, right? Url alias table holds additional aliases that work but aren't used for building links. +1 from me. I hope someone volunteers for this. I don't think it is too much code. Just fooling with the path functions mainly. On 5/8/06 7:22 AM, "Adrian Rossouw" <adrian@bryght.com> wrote:
One of the things we discussed doing once 4.8 is open, around the time of Vancouver, was that using the path table to maintain an alias for each and every node was just not going to scale.
So, simply adding a 'path' column, which is in effect the preferred alias for that object, and then using a node_link($node) function (or just l($node->path)), would allow us to run modules like pathauto on very large production sites with thousands of nodes.
This is a fairly simple, albeit fairly large patch. Do we have any volunteers / naysayers?
-- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
On 08 May 2006, at 2:16 PM, Moshe Weitzman wrote:
This would be just the 'primary' alias for a given object, right? Url alias table holds additional aliases that work but aren't used for building links. +1 from me. I hope someone volunteers for this. I don't think it is too much code. Just fooling with the path functions mainly. Yeah. That and making any links to nodes be done via a node_link function.
So please. volunteers =) -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
I'd be up for taking a look at this if someone (you?) could play guide. Perhaps the first thing is to write up a battleplan (even if it is small)? I could do that. Dan
oops... I meant to reply to Adrian 1:1 - I don't want to over commit - but I'm interested in poking around at this... Dan
On 5/8/06, Moshe Weitzman <weitzman@tejasa.com> wrote:
This would be just the 'primary' alias for a given object, right? Url alias table holds additional aliases that work but aren't used for building links. +1 from me. I hope someone volunteers for this. I don't think it is too much code. Just fooling with the path functions mainly.
What about UI issues? If we're going to favor primary aliases over other ones, we should probably change something in the UI to reflect this way of thinking as well. Grtz, Breyten :-)
On 08 May 2006, at 8:31 PM, Breyten Ernsting wrote:
What about UI issues? If we're going to favor primary aliases over other ones, we should probably change something in the UI to reflect this way of thinking as well.
The path you set on the node form is this path; any other path you set is on admin/paths. -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
This is a fairly simple, albeit fairly large patch. Do we have any volunteers / naysayers?
This will be a big performance boost, and chx will not complain about the number of queries to url_alias when devel.module is enabled.
From Moshe's comment, do I understand that we will keep the multiple alias (i.e. one url can have multiple aliases), but only optimizing it where nodes are concerned?
Say "node/123" has an alias of "this_is_an_article". Are we going to have that alias in the url_alias table AS WELL AS in the node table? If so, this can lead to data integrity issues, where the alias in the node table says one thing and the ones in the url_alias says another. If we forbid node/123 aliases from going into the url_alias table, then it is not an issue.
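One way to picture the "forbid" option is a single save routine that routes node aliases to the node row and everything else to url_alias, so the two stores cannot disagree. This is a hypothetical sketch with arrays standing in for the two tables; save_alias() is not an existing Drupal function.

```php
<?php
// Hypothetical sketch of the "single source of truth" option: aliases for
// node/<nid> go only into the node's 'path' column, never into url_alias.
// Arrays stand in for the node and url_alias tables.

function save_alias(array &$node_table, array &$url_alias, string $src, string $dst): void {
  if (preg_match('#^node/(\d+)$#', $src, $m)) {
    $nid = (int) $m[1];
    $node_table[$nid]['path'] = $dst;   // primary alias stored on the node row
    unset($url_alias[$src]);            // drop any stale copy in url_alias
    return;
  }
  $url_alias[$src] = $dst;              // non-node paths keep using url_alias
}
```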
On 08 May 2006, at 17:03, Khalid B wrote:
This is a fairly simple, albeit fairly large patch. Do we have any volunteers / naysayers?
This will be a big performance boost, and chx will not complain about the number of queries to url_alias when devel.module is enabled.
Depends. However, with a quick hack, it is easy to measure the potential performance gain. Just make drupal_lookup_path() return the original path if the path is of the format "node/<number>". It should give a reasonable estimation, and would be pretty useful data. -- Dries Buytaert :: http://www.buytaert.net/
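A sketch of that measurement hack, written as a stand-alone stand-in rather than a patch to the real drupal_lookup_path() (whose signature and internals are not reproduced here). The query counter and alias array are assumptions added to make the skipped lookups visible.

```php
<?php
// Hypothetical stand-in for the benchmark hack: paths of the form
// "node/<number>" are returned unchanged without touching the alias table.

function lookup_path_with_hack(string $path, array $url_alias, int &$queries): string {
  if (preg_match('#^node/\d+$#', $path)) {
    return $path;            // skip the lookup entirely for node paths
  }
  $queries++;                // would be a url_alias query in Drupal
  return $url_alias[$path] ?? $path;
}
```

Note that node/1/edit still goes through the lookup, since only bare node/<number> paths are short-circuited.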
On Mon, 08 May 2006 17:03:07 +0200, Khalid B <kb@2bits.com> wrote:
This is a fairly simple, albeit fairly large patch. Do we have any volunteers / naysayers?
This will be a big performance boost, and chx will not complain about the number of queries to url_alias when devel.module is enabled.
Me? Complaining? About the number of queries? That's not a problem with me, I even support the current solution for 4.6 as well. My problem with the whole thing is that it is slow. String matching is always slow. This however would mean that for outgoing node links at least we will have no string matching. Oh joy.
Karoly Negyesi wrote:
On Mon, 08 May 2006 17:03:07 +0200, Khalid B <kb@2bits.com> wrote:
This is a fairly simple, albeit fairly large patch. Do we have any volunteers / naysayers?
This will be a big performance boost, and chx will not complain about the number of queries to url_alias when devel.module is enabled.
Me? Complaining? About the number of queries? That's not a problem with me, I even support the current solution for 4.6 as well.
I was (one of the) one(s) complaining about the number of url_alias queries. They seriously impact page performance with many menu items, IMO.
Adrian Rossouw wrote:
One of the things we discussed doing once 4.8 is open, around the time of Vancouver, was that using the path table to maintain an alias for each and every node was just not going to scale.
So, simply adding a 'path' column, which is in effect the preferred alias for that object, and then using a node_link($node) function (or just l($node->path)), would allow us to run modules like pathauto on very large production sites with thousands of nodes.
This is a fairly simple, albeit fairly large patch. Do we have any volunteers / naysayers?
I have one request: When this goes in, ensure that modules can affect the generated URL. This is essential to being able to keep context on node views. Nodes often exist in more than one context (imagine the multi-parented book) but it's impossible to do the breadcrumbs right for it. However, if that context could be passed into the node, we could do some very cool stuff.
On 08 May 2006, at 5:56 PM, Earl Miles wrote:
I have one request: When this goes in, ensure that modules can affect the generated URL. This is essential to being able to keep context on node views. Nodes often exist in more than one context (imagine the multi-parented book) but it's impossible to do the breadcrumbs right for it. However, if that context could be passed into the node, we could do some very cool stuff.
You mean like with custom_url_rewrite?
Personally, I've always thought url_rewrite should be a hook, not a function. Many modules might want to alter URLs for context and the like. -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
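A rough sketch of what a hook-based rewrite could look like: each enabled module gets a chance to alter an outgoing path, in order. The hook name <module>_url_rewrite_outbound(), the $context array, and the book example are all hypothetical, not an existing Drupal API.

```php
<?php
// Hypothetical hook implementation: keep book context on node links,
// e.g. so a multi-parent book page can build the right breadcrumb.
function book_url_rewrite_outbound(string $path, array $context): string {
  if (isset($context['book']) && preg_match('#^node/\d+$#', $path)) {
    return $path . '?book=' . rawurlencode((string) $context['book']);
  }
  return $path;
}

// Hypothetical dispatcher: let every module that implements the hook
// rewrite the path in turn, passing each result to the next module.
function rewrite_outbound_url(string $path, array $context, array $modules): string {
  foreach ($modules as $module) {
    $hook = $module . '_url_rewrite_outbound';
    if (function_exists($hook)) {
      $path = $hook($path, $context);
    }
  }
  return $path;
}
```

Unlike a single custom function, any number of modules can participate, at the cost of one function_exists() check per module per link.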
Adrian Rossouw wrote:
One of the things we discussed doing once 4.8 is open, around the time of Vancouver, was that using the path table to maintain an alias for each and every node was just not going to scale.
So, simply adding a 'path' column, which is in effect the preferred alias for that object, and then using a node_link($node) function (or just l($node->path)), would allow us to run modules like pathauto on very large production sites with thousands of nodes.
I actually scribbled something like this in my notebook a few nights ago. Change the $path argument of l() to be $target which is either a path (string) or object with a link child. If the target is an object, then use $target->link without an alias lookup. This could be used for nodes, users, etc. -- Neil Drumm http://delocalizedham.com/
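A minimal sketch of this idea, assuming the object already carries a link property. link_to() is a hypothetical stand-in, not Drupal's l(), and the alias array replaces the url_alias table.

```php
<?php
// Hypothetical link builder: $target is either a path string (resolved
// through the alias map) or an object carrying its own link.
function link_to(string $text, $target, array $url_alias, int &$lookups): string {
  if (is_object($target)) {
    if (!isset($target->link)) {
      throw new InvalidArgumentException('Target object has no link property.');
    }
    $href = $target->link;                       // no alias lookup needed
  }
  else {
    $lookups++;                                  // string path: look up alias
    $href = $url_alias[$target] ?? $target;
  }
  return '<a href="/' . htmlspecialchars($href) . '">' . htmlspecialchars($text) . '</a>';
}
```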
Neil Drumm wrote:
Adrian Rossouw wrote:
One of the things we discussed doing once 4.8 is open, around the time of Vancouver, was that using the path table to maintain an alias for each and every node was just not going to scale.
So, simply adding a 'path' column, which is in effect the preferred alias for that object, and then using a node_link($node) function (or just l($node->path)), would allow us to run modules like pathauto on very large production sites with thousands of nodes.
I actually scribbled something like this in my notebook a few nights ago.
Change the $path argument of l() to be $target which is either a path (string) or object with a link child. If the target is an object, then use $target->link without an alias lookup. This could be used for nodes, users, etc.
IMHO a good idea! Goba
Gabor Hojtsy wrote:
Neil Drumm wrote:
IMHO a good idea!
+1.
To think about: Ensure the menu entry option for nodes somehow uses this or can skip the lookup step. This one is somewhat difficult for me to figure out how to do right, but it's a Giant Issue.
Just one quick question: Who uses multiple aliases for a node? Personally, I think the applications are very limited. So I would favor something like this, too.
Konstantin Käfer wrote:
Just one quick question: Who uses multiple aliases for a node? Personally, I think the applications are very limited. So I would favor something like this, too.
Even if you use multiple aliases it does not matter, since a primary alias needs to be used when generating output. Incoming requests can still utilize multiple aliases as stored in the url_alias table. Goba
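A sketch of that split, assuming the primary alias is stored on the object and url_alias is read as an alias-to-system-path map for incoming requests. The function names are illustrative.

```php
<?php
// Outgoing links: always the single primary alias, no table lookup.
function outbound_path(stdClass $node): string {
  return $node->path;
}

// Incoming requests: any known alias (old or new) resolves to the
// system path; unknown paths pass through unchanged.
function inbound_path(string $requested, array $url_alias_by_dst): string {
  return $url_alias_by_dst[$requested] ?? $requested;
}
```

This is also what keeps old pathauto aliases working after a naming-scheme change: they stay in the incoming map while links use only the new primary alias.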
May I ask why this cannot be accomplished using a JOIN (and adding appropriate field(s) to the url_alias table) rather than adding a new field to the node table? -K
On 5/8/06, Karthik <narakasura@gmail.com> wrote:
May I ask why this cannot be accomplished using a JOIN (and adding appropriate field(s) to the url_alias table) rather than adding a new field to the node table?
Not all aliases in the url_alias table are nodes, and hence we cannot have a "nid" column. If we do a join, it will have to be on a string (e.g. "node/123"), which will be very expensive as joins go: the same type of performance drag as the concat of realm in node access, etc.
Karthik wrote:
Not all aliases in the url_alias table are nodes, and hence we cannot have a "nid" column.
The id column can be an index with default 0 for non-nodal rows or something similar.
And for users? Goba
Karthik wrote:
Not all aliases in the url_alias table are nodes, and hence we cannot have a "nid" column.
The id column can be an index with default 0 for non-nodal rows or something similar.
-K
But this also leaves out 'user' and any other non-node object that may want URLs of this nature.
But this also leaves out 'user' and any other non-node object that may want URLs of this nature.
Then we need a 'realm'/'type' which is also indexed. Is the alternative to add a path varchar field to the node, user, foo and bar tables? Or even a path ID? -K
On Mon, 08 May 2006 22:18:10 +0200, Karthik <narakasura@gmail.com> wrote:
But this also leaves out 'user' and any other non-node object that may want URLs of this nature.
Then we need a 'realm'/'type' which is also indexed. Is the alternative to add a path varchar field to the node, user, foo and bar tables? Or even a path ID?
If you arrived here, then add table_name and id fields to aliases table.
I had to do something like this in VotingAPI -- I store everything as 'content_id' and 'content_type', with content_type generally corresponding to the name of the table that holds the data in question. If we were to standardize on something for this, it could be nice. -Jeff
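A sketch of the typed-key layout, assuming aliases keyed by a content type and an integer id. Arrays stand in for an indexed table, so this shows only the data layout and the one-time parsing of paths like node/123, not actual query performance. split_system_path() and index_aliases() are hypothetical.

```php
<?php
// Parse a system path into (content_type, content_id) once, at save time.
// Only node and user paths are handled here; that scope is an assumption.
function split_system_path(string $src): ?array {
  if (preg_match('#^(node|user)/(\d+)$#', $src, $m)) {
    return array($m[1], (int) $m[2]);
  }
  return null;   // other paths would stay in the string-keyed table
}

// Re-key existing aliases by type and integer id.
function index_aliases(array $url_alias): array {
  $index = array();
  foreach ($url_alias as $src => $dst) {
    $key = split_system_path($src);
    if ($key !== null) {
      list($type, $id) = $key;
      $index[$type][$id] = $dst;
    }
  }
  return $index;
}
```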
-----Original Message----- From: Karthik [mailto:narakasura@gmail.com] Sent: Monday, May 08, 2006 3:18 PM To: development@drupal.org Subject: Re: [development] Pathauto scaleability - path column in node / user table
But this also leaves out 'user' and any other non-node object that may want URLs of this nature.
Then we need a 'realm'/'type' which is also indexed. Is the alternative to add a path varchar field to the node, user, foo and bar tables? Or even a path ID?
-K
Earl Miles wrote:
Karthik wrote:
Not all aliases in the url_alias table are nodes, and hence we cannot have a "nid" column.
The id column can be an index with default 0 for non-nodal rows or something similar.
-K
But this also leaves out 'user' and any other non-node object that may want URLs of this nature.
So? Adding an alias column to the node table leaves out all the non-node objects, too. It's 6 of one, half a dozen of the other, i.e. a wash or very little difference. Both implementations can be made to work, and they both waste space in database tables under most circumstances.
On 09 May 2006, at 12:16 AM, Chris Johnson wrote:
So? Adding an alias column to the node table leaves out all the non-node objects, too. It's 6 of one, half a dozen of the other, i.e. a wash or very little difference. Both implementations can be made to work, and they both waste space in database tables under most circumstances.
Umm, with the implementation of l(), any other table could get the path property too. -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
On 5/9/06, Earl Miles <merlin@logrus.com> wrote:
But this also leaves out 'user' and any other non-node object that may want URLs of this nature.
In order for (potentially) any object to manage its own primary aliases, I think we need a hook_path(). I really do. Modules would simply implement this hook by returning, say, the table and the field where the path is stored. I guess they'd also return the URL fragment that is their 'realm' - for node, this would be 'node' (since all node paths are in the form 'node/x'). E.g.:
function node_path() { return array('table' => 'node', 'field' => 'path', 'path' => 'node'); }
Or perhaps we could even accomplish this without a _path() hook, by using hook_db_rewrite_sql()? It's about time that we added path support to this hook anyway. Particularly if we modified this hook to support SELECT and other clauses (see <http://drupal.org/node/60853>), it could work. I love the idea: allowing modules to manage their own paths. But it has to be implemented so that any module can manage its own paths, not just node module. Cheers, Jaza.
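A sketch of how core might collect such declarations, reusing the node_path() example from the message. user_path(), the module list, and both helper functions are assumptions, and the sketch does nothing to address the complexity concern raised in reply.

```php
<?php
// node_path() as proposed in the message; user_path() is a hypothetical
// second implementation for illustration.
function node_path() {
  return array('table' => 'node', 'field' => 'path', 'path' => 'node');
}

function user_path() {
  return array('table' => 'users', 'field' => 'path', 'path' => 'user');
}

// Collect hook_path() implementations, keyed by the URL realm they own.
function collect_path_info(array $modules): array {
  $info = array();
  foreach ($modules as $module) {
    $hook = $module . '_path';
    if (function_exists($hook)) {
      $declared = $hook();
      $info[$declared['path']] = $declared;
    }
  }
  return $info;
}

// Which table and column hold the primary alias for a path like "node/12"?
function path_storage_for(string $src, array $info): ?array {
  $realm = explode('/', $src, 2)[0];
  return $info[$realm] ?? null;
}
```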
On 5/9/06, Jeremy Epstein <jazepstein@gmail.com> wrote:
On 5/9/06, Earl Miles <merlin@logrus.com> wrote:
But this also leaves out 'user' and any other non-node object that may want URLs of this nature.
In order for (potentially) any object to manage its own primary aliases, I think we need a hook_path(). I really do. Modules would simply implement this hook by returning, say, the table and the field where the path is stored. I guess they'd also return the URL fragment that is their 'realm' - for node, this would be 'node' (since all node paths are in the form 'node/x'). E.g.:
function node_path() { return array('table' => 'node', 'field' => 'path', 'path' => 'node'); }
I think this is a bad idea because it will add way too much complexity (nullifying most of the performance gain?). I don't see why we need a hook for things like that. As the module author, you know where the path is stored. You can retrieve it, and pass it along to bypass the lookup mechanism. -- Dries Buytaert :: http://buytaert.net/
On 5/8/06, Konstantin Käfer <kkaefer@gmail.com> wrote:
Just one quick question: Who uses multiple aliases for a node? Personally, I think the applications are very limited. So I would favor something like this, too.
Here is a use case: If you use pathauto, and had some scheme of aliasing nodes (e.g. using title only), then changed the scheme to be something else (e.g. vocab/term/title), keeping the old aliases will prevent your visitors from having 404s, and search engines from referring visitors to the 404 page. All you need to do is to check the pathauto option that says add a new alias if one already exists, and you are done. Note: I am FOR the proposed change, not against it. I don't use these multiple aliases myself.
On 08 May 2006, at 7:21 PM, Neil Drumm wrote:
Change the $path argument of l() to be $target which is either a path (string) or object with a link child. If the target is an object, then use $target->link without an alias lookup. This could be used for nodes, users, etc.
+1, very simple fix. HOWEVER, we also need a way to default to node/nid then. So node->link (and I prefer path, really) needs to be initialised at all times.
-- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
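A minimal sketch of that initialisation step, assuming it runs once when a node is loaded so that link builders never see a missing value. attach_node_link() is a hypothetical name, not a Drupal API.

```php
<?php
// Ensure every loaded node has ->link set: the primary alias when one
// exists, otherwise the system path node/<nid>.
function attach_node_link(stdClass $node): stdClass {
  $node->link = (isset($node->path) && $node->path !== '')
    ? $node->path
    : 'node/' . $node->nid;
  return $node;
}
```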
On 08 May 2006, at 7:21 PM, Neil Drumm wrote:
Change the $path argument of l() to be $target which is either a path (string) or object with a link child. If the target is an object, then use $target->link without an alias lookup. This could be used for nodes, users, etc.
One thing though: it would be nice if you could link the tabs on the node too, i.e.: journal/my-title-here/edit instead of node/1233/edit -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
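A sketch of how aliased tabs could work in both directions, assuming the primary alias is on the node and url_alias maps aliases to system paths. Both functions are hypothetical; incoming resolution here simply peels trailing segments off until a known alias remains.

```php
<?php
// Outgoing: build the tab link from the node's primary alias.
function node_tab_link(stdClass $node, string $op): string {
  $base = $node->path !== '' ? $node->path : 'node/' . $node->nid;
  return $base . '/' . $op;
}

// Incoming: strip trailing segments until the remaining prefix is a known
// alias, then re-append them to the system path.
function resolve_tab_path(string $requested, array $alias_to_src): string {
  $parts = explode('/', $requested);
  $suffix = array();
  while (count($parts) > 0) {
    $prefix = implode('/', $parts);
    if (isset($alias_to_src[$prefix])) {
      return implode('/', array_merge(array($alias_to_src[$prefix]), $suffix));
    }
    array_unshift($suffix, array_pop($parts));
  }
  return $requested;   // no alias matched: leave the path untouched
}
```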
+1 on the node operations using the aliases (and +1 on the performance boost). I'm starting to roll out Drupal to a lot of very limited-knowledge clients and having consistency in the URLs is a big plus (and them not having to see "node").
ie : journal/my-title-here/edit instead of node/1233/edit
Rob Roy Barreca Electronic Insight Corporation 12526 High Bluff Drive, Suite 300 San Diego, CA 92130 http://www.electronicinsight.com rob@electronicinsight.com
Adrian Rossouw wrote:
So, simply adding a 'path' column, which is in effect the preferred alias for that object *snip*
This sounds like a GoodThing to me. Something cool that could be done with this also, would be to redirect extra aliases from the table to the preferred alias. That'd help a lot with web analytics, etc by being able to have multiple ways for people to arrive at the same location but also being able to always track it as a single path. -Rowan
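A sketch of that redirect check, assuming an alias-to-system-path map for incoming requests and the primary alias per system path. The 301 status, the return shape, and the function name are illustrative choices.

```php
<?php
// Returns array(status, location) when a secondary alias should be
// redirected to the preferred one, or null to serve the request normally.
function preferred_alias_redirect(string $requested, array $alias_to_src, array $primary_by_src): ?array {
  if (!isset($alias_to_src[$requested])) {
    return null;                                 // not an alias: serve as-is
  }
  $src = $alias_to_src[$requested];
  $primary = $primary_by_src[$src] ?? null;
  if ($primary === null || $primary === $requested) {
    return null;                                 // already the preferred alias
  }
  return array(301, '/' . $primary);             // permanent redirect
}
```

A permanent redirect also nudges search engines toward the preferred alias over time.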
participants (17)
- Adrian Rossouw
- Breyten Ernsting
- Chris Johnson
- Dan Robinson
- Dries Buytaert
- Earl Miles
- Gabor Hojtsy
- Jeff Eaton
- Jeremy Epstein
- Karoly Negyesi
- Karthik
- Khalid B
- Konstantin Käfer
- Moshe Weitzman
- Neil Drumm
- Rob Barreca
- Rowan Kerr