Re: [infrastructure] Drupal.org should have better URL aliases
To Adrian's point, node IDs aren't useful to your casual web visitor. Therefore Drupal distros should default to include and have enabled pathauto and perhaps follow the WordPress model (the only model with which I'm familiar; if another system is better, I'd be interested in its structure as well).
I think WP doesn't use an intermediate table that contains (source URL, destination URL)-tuples. Instead, it directly translates the URL to a database query that looks up the post.

Drupal:
 + Any URL can be aliased, not just those related to posts.
 + Any URL can have multiple aliases: 'team', 'about' and 'about-us' can point to the same page or content. (A bad idea as Google will penalize you for this.)
 - Performance penalty when URL aliases are enabled; up to 100+ queries/page. These queries are very fast though.
 + Very fast when URL aliases are disabled.

WordPress:
 - Only some URLs can be aliased.
 - Limited to one alias per post.
 + Small performance penalty.
 - Not as fast as Drupal when URL aliases are disabled.

-- Dries Buytaert :: http://www.buytaert.net/
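The two lookup strategies compared above can be sketched roughly as follows. This is a minimal illustration in Python, not Drupal or WordPress code: the table contents, the year/month/day/slug pattern and both function names are assumptions made up for the example.

```python
import re

# Strategy 1: an intermediate alias table of (alias, internal path) pairs.
# Any path can be aliased, and several aliases may point at the same path.
ALIAS_TABLE = {
    "about": "node/1",
    "about-us": "node/1",
    "team": "node/1",
    "downloads": "project/list",
}


def resolve_with_alias_table(path):
    """Return the internal path for an alias, or the path itself if unaliased."""
    return ALIAS_TABLE.get(path, path)


# Strategy 2: translate the URL straight into query parameters.
# Only URLs matching a known pattern can be resolved this way.
POST_PATTERN = re.compile(
    r"^(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<slug>[a-z0-9-]+)$"
)


def resolve_with_pattern(path):
    """Return the parameters a post lookup query would use, or None."""
    match = POST_PATTERN.match(path)
    if match is None:
        return None
    params = match.groupdict()
    return {
        "year": int(params["year"]),
        "month": int(params["month"]),
        "day": int(params["day"]),
        "slug": params["slug"],
    }
```

The first strategy accepts arbitrary aliases at the cost of a lookup per path; the second needs no alias storage but only understands URLs that fit the pattern.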
That's good to reflect on, but I think we do need to get automatic aliases into Drupal and then (obviously) figure out how to make them scale performance-wise. For example, why do you want multiple aliases per node? That doesn't seem terribly useful.

Anyway, maybe we can simplify Drupal's solution so that it scales. I mean, what good is it if you can only use it on sites with < 100 aliases?

Chris
-- [ infrastructure | http://lists.drupal.org/listinfo/infrastructure ]
On 27 Dec 2005, at 11:30 AM, Dries Buytaert wrote:
To Adrian's point, node IDs aren't useful to your casual web visitor. Therefore Drupal distros should default to include and have enabled pathauto and perhaps follow the WordPress model (the only model with which I'm familiar; if another system is better, I'd be interested in its structure as well).
I think WP doesn't use an intermediate table that contains (source URL, destination URL)-tuples. Instead, it directly translates the URL to a database query that looks up the post.
Why not specify that every node / user has a custom path, stored in the node / user table?

During initialisation it just means we have to do 2 more small lookups to see if the URL has been aliased. When creating links we have to make sure to use the $node->path or $user->path property.

This would cut down on the number of alias queries, as you almost always have the node object / user object at hand when you are linking to it.

We'd still need the alias table for anything that isn't one of these primary objects, though.

You can still provide additional aliases with the alias table, but these won't be used for outgoing links in preference to the node/object path.
-- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
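A rough sketch of the proposal above, written in Python purely for illustration. The `Node` and `User` classes, the sample data and the `link_to` and `resolve` helpers are all hypothetical; none of them correspond to actual Drupal APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    nid: int
    title: str
    path: str = ""  # custom path stored alongside the node itself


@dataclass
class User:
    uid: int
    name: str
    path: str = ""  # custom path stored alongside the user itself


NODES = [Node(1, "About", "about"), Node(42, "Foo", "project/foo")]
USERS = [User(3, "adrian", "people/adrian")]
ALIASES = {"downloads": "project/list"}  # fallback for non-node, non-user paths


def link_to(obj):
    """Build an outgoing link from the object in hand; no alias query needed."""
    if obj.path:
        return "/" + obj.path
    if isinstance(obj, Node):
        return f"/node/{obj.nid}"
    return f"/user/{obj.uid}"


def resolve(path):
    """Resolve an incoming path: two small lookups, then the alias table."""
    for node in NODES:
        if node.path == path:
            return f"node/{node.nid}"
    for user in USERS:
        if user.path == path:
            return f"user/{user.uid}"
    return ALIASES.get(path, path)
```

Outgoing links cost nothing extra because the path travels with the object; only incoming requests pay for the two lookups before falling back to the alias table.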
Why not specify that every node / user has a custom path , in the user / node table.
During initialisation it just means we have to do 2 more small lookups to see if the url has been aliased. Creating links we have to make sure to use the $node->path or $user->path property.
This would cut down on a number of queries for aliases as you almost always have the node object / user object when you are busy linking to it
We'd still need the alias table for anything that aren't these primary objects though.
You can still provide additional aliases with the alias table, but these won't be used as outgoing links over the node/object path.
We use a custom URL aliasing method on Weblabor.hu (Drupal 4.6.x), which does the path loading for nodes in node_load directly, by joining on the url_alias table (this is quite similar to what Adrian suggests, but leaves the data storage alone in the url_alias table). This largely reduces the number of paths that you need to load (and we modified the loading of paths to work similarly to how it is in 4.7, so that only the non-node aliases are precached).

Goba
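The join described above might look something like this. It is a sketch using Python's sqlite3 module, not the Weblabor.hu code; the simplified two-column `url_alias` schema, the sample rows and this `node_load` function are assumptions for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a tiny node and url_alias table."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE url_alias (src TEXT, dst TEXT);
        INSERT INTO node VALUES (1, 'About');
        INSERT INTO node VALUES (2, 'Unaliased page');
        INSERT INTO url_alias VALUES ('node/1', 'about');
        """
    )
    return db


def node_load(db, nid):
    """Load a node and its alias (if any) in a single query."""
    row = db.execute(
        "SELECT n.nid, n.title, a.dst "
        "FROM node n LEFT JOIN url_alias a ON a.src = 'node/' || n.nid "
        "WHERE n.nid = ?",
        (nid,),
    ).fetchone()
    if row is None:
        return None
    # Fall back to the internal path when no alias row was joined.
    return {"nid": row[0], "title": row[1], "path": row[2] or f"node/{row[0]}"}
```

Because the alias arrives with the node, link generation for nodes needs no separate alias query, while the alias data itself stays in the url_alias table.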
On Tuesday 27 December 2005 10:30, Dries Buytaert wrote:
I think WP doesn't use an intermediate table that contains (source URL, destination URL)-tuples. Instead, it directly translates the URL to a database query that looks up the post.
I will try to explain the RoR way of doing this. It's absolutely brilliant in its simplicity. Really! And it has absolutely no performance penalty. (But then again, PHP != Ruby.)

RoR has, to start with, a much cleaner default URL structure:

  example.com/say/hello <= example.com/controller/action

For Drupal that would be:

  example.com/say/hello <= example.com/module/function

We currently hardwire this in hook_menu. Now, I must add here that because RoR 'builds on top of existing stuff' you get all sorts of actions for free: my_controller/list, my_controller/add, etc.

The next step, obviously, is to map these to new actions. RoR calls this routing. We call it hook_menu. In fact, they are so similar in nature that I suspect the menu author had a peek at RoR :)

So, the good news is that in RoR you need nothing more. The URL generation in RoR uses these maps to make its URLs. That is clean, simple and fits well with the philosophy 'don't repeat yourself' (aka no copy-paste coding).

I suggest we adopt this mechanism for Drupal in core. Then we can do all our path/URL mapping in code. Code we already have. Variables that are already built and loaded ($menu).

What we cannot do is hand-editing in our GUIs. But then I ask myself: if I have pathauto, which does all the mapping in code, then why the ** do I need to store that in a database, other than for some caching?

Hence I suggest we ignore the hand-coded URL mapping for now, and focus on this menu mapping. After that, we can just use our current database mapping (path.module), through this menu mechanism, to handle those few hand-edited paths. Or is there anyone out there who maintains more than a few hundred hand-edited path aliases? And if so, is that because you want to, or because you lack a [book-structure-mapper|taxonomy-mapper|username-mapper]?
Some pseudoish flowchart code:

  function hook_menu() {
    $items[] = array(
      'title' => 'foo',
      'original_path' => 'bar/%d',
      'original_replace' => 'oid',   // object property substituted for %d
      'map_path' => 'bar/%s',
      'map_replace' => 'name',       // object property substituted for %s (urlencoded)
    );
    // The *_replace entries say what replaces the sprintf tokens in a path.
    return $items;
  }
  ....
  function create_path($path, $object) {
    global $menu;
    $mid = menu_find_by_path($path);  // pseudo-helper: find the menu item for this path
    $item = $menu[$mid];
    $original_path = sprintf($item['original_path'], $object->{$item['original_replace']});
    $new_path = sprintf($item['map_path'], urlencode($object->{$item['map_replace']}));
    return str_replace($original_path, $new_path, $path);
  }

This way we need no extra hooks, no extra variables, and no extra memory. Or hardly, anyway. While it saves a lot of code.

Bèr 'who wants to think this through more thoroughly, though' Kessels
--
| Bèr Kessels | webschuur.com | website development |
| Jabber & Google Talk: ber@jabber.webschuur.com
| http://bler.webschuur.com | http://www.webschuur.com |
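As a runnable illustration of the idea, here is a small Python sketch in which both the internal and the public path are computed from a mapping held in code, with nothing stored per object. The `MAPPINGS` structure, its keys and this `create_path` signature are assumptions made for the example, not an existing Drupal interface.

```python
from urllib.parse import quote

# One mapping entry per kind of object, loosely analogous to a menu item.
MAPPINGS = {
    "bar": {
        "original_path": "bar/{oid}",  # internal path, built from the object's id
        "map_path": "bar/{name}",      # public path, built from the object's name
    },
}


def create_path(kind, obj):
    """Return (internal_path, public_path) for an object, computed on the fly."""
    item = MAPPINGS[kind]
    internal = item["original_path"].format(oid=obj["oid"])
    public = item["map_path"].format(name=quote(obj["name"], safe=""))
    return internal, public
```

For example, `create_path("bar", {"oid": 12, "name": "hello world"})` yields the pair `bar/12` and `bar/hello%20world`. This covers only the generation direction; getting from a public path back to the object id would still need a lookup by name.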
Hence I suggest we ignore the hand-coded urlmapping for now. And focus on this menu-mapping. After that, we can just use our current database mapping (path.module), through this menu mechanism, to handle those few hand-edited paths. Or is there anyone out there, who maintains over a few hundred hand-edited path aliases? and if so, is that because you want to, or because you lack a [book-structure-mapper|taxonomy-mapper|username-mapper] ?
That's a bold statement, but one I've been thinking more and more about lately. I find that custom URL aliases are only REALLY useful for the 'special' pages like 'about' and 'downloads' and 'colophon' that I want to add for each site. It's not very difficult to graft that on top of a robust algorithmic URL mapping system. The real key, I think, is making sure that the algorithmic system can be customized by users who want to map based on a variety of criteria.

--Jeff
This way we need no extra hooks, no extra variables, and no extra memory. Or hardly, anyway. While it saves a lot of code.
Erm. Does that imply that each page should be in the 'menu tree'?

How does the system know that 'project/foo' is node 42 of type project, that 'blog/2005/12/27/hello_world' is node 69 of type blog, or that 'about' is node 1 of type page? And vice versa, that node 42 is 'project/foo', that node 69 is 'blog/2005/12/27/hello_world' or that node 1 is 'about'?

I don't see how we can save hooks, variables, memory and code. Can you provide an example in pseudo-code that clarifies the workflow of (i) generating a page with clean URLs and (ii) loading a page based on a clean URL?

-- Dries Buytaert :: http://www.buytaert.net/
On Tuesday 27 December 2005 17:21, Dries Buytaert wrote:
This way we need no extra hooks, no extra variables, and no extra memory. Or hardly, anyway. While it saves a lot of code.
Erm. Does that imply that each page should be in the 'menu tree'?
No. Hence the %foo tokens. Similar to locale replacement
How does the system know that 'project/foo' is node 42 of type project, that 'blog/2005/12/27/hello_world' is node 69 of type blog, or that 'about' is node 1 of type page? And vice versa, that node 42 is 'project/foo', that node 69 is 'blog/2005/12/27/hello_world' or that node 1 is 'about'?
It does, because menu knows so. Take 'blog/2005/12/27/hello_world'. Let me try to explain :)

One way is rather easy:

blogfurls.module:

  function blogfurls_menu() {
    $items[] = array(
      'title' => 'blog thing',
      'original_path' => '/blog/%year/%month/%day/%title',
      'original_replace' => array('%year' => '$obj->year', ..., '%title' => '$obj->title'),
      // NOTE: '$obj->year' is a string! Note the ' '.
      'callback' => 'blogfurls_show_blog',
      ...
    );
    return $items;
  }

  function blogfurls_show_blog() {
    global $menu;
    $args['title'] = arg(4);
    $args['date'] = make_date_from_url(arg(1), ..., arg(3));
    $node = node_load($args);
    return theme('node', $node);
  }

The other part is a bit harder to explain. And in fact, I have not yet figured out the details of how to do this in PHP, so here is the Rails code :p

  ActionController::Routing::.... do |map|
    ...
    map.connect "blog/:year/:month/:day/:title",  # in Ruby ":month/" is like the "$month/" in PHP
      :controller => "blog",
      :action => "show_date",  # we call this callback, in Drupal
      :requirements => {:year => /(19|20)\d\d/, :month => /[01]?\d/, etc ...}  # an array with preg-stuff to filter out dates
  end

Now RoR's smartness comes into play, and here is where I have no clue how to do this in Drupal, yet.

  @link = url_for(:day => "27")

will return http://.../blog/2005/12/27 (the rest is just figured out by "sensible magic").

  @link = url_for(:year => "2005")

will return http://.../blog/2005/

Nifty! You see that its url_for function is a bit more complex than our url feature. It does not take strings, but arguments. Maybe that direction is the key to success here?
I don't see how we can save hooks, variables, memory and code. Can you provide an example in pseudo-code that clarifies the workflow of (i) generating a page with clean URLs and (ii) loading a page based on a clean URL?
(i) Nothing, null, voidness. We don't do anything on page generation! We should not do anything wrt URL stuff on page generation (except some smarter cache generation, but that is totally off topic).

(ii) See above. The part where I have no clue yet :)

[ Bèr Kessels | Drupal services www.webschuur.com ]
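One way to picture a route that works in both directions is the Python sketch below. It is a simplified stand-in for the routing idea, not Rails's implementation: the `Route` class, its method names and the `day` requirement are assumptions for the example, and it deliberately does not reproduce Rails's behaviour of filling in missing parameters from the current request.

```python
import re


class Route:
    """A path pattern with named segments, e.g. 'blog/:year/:month/:day/:title'."""

    def __init__(self, pattern, requirements=None):
        self.segments = pattern.split("/")
        self.requirements = requirements or {}

    def recognize(self, path):
        """Return the named parameters for a matching path, else None."""
        parts = path.strip("/").split("/")
        if len(parts) != len(self.segments):
            return None
        params = {}
        for segment, part in zip(self.segments, parts):
            if segment.startswith(":"):
                name = segment[1:]
                rule = self.requirements.get(name)
                if rule and not re.fullmatch(rule, part):
                    return None
                params[name] = part
            elif segment != part:
                return None
        return params

    def generate(self, **params):
        """Build a path from parameters, stopping at the first missing one."""
        out = []
        for segment in self.segments:
            if segment.startswith(":"):
                if segment[1:] not in params:
                    break
                out.append(str(params[segment[1:]]))
            else:
                out.append(segment)
        return "/".join(out)


BLOG = Route(
    "blog/:year/:month/:day/:title",
    {"year": r"(19|20)\d\d", "month": r"[01]?\d", "day": r"[0-3]?\d"},
)
```

Here `BLOG.recognize("blog/2005/12/27/hello_world")` yields the four named parameters as strings, and `BLOG.generate(year=2005)` yields `blog/2005`. The same pattern drives both loading a page from a clean URL and producing the clean URL for a link, which is the property the routing approach relies on.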
The other part, is a bit harder to explain. And in fact, i have not yet figured out the details of how to do this in PHP: so here is the Rails code :p
  ActionController::Routing::.... do |map|
    ...
    map.connect "blog/:year/:month/:day/:title",  # in Ruby ":month/" is like the "$month/" in PHP
      :controller => "blog",
      :action => "show_date",  # we call this callback, in Drupal
      :requirements => {:year => /(19|20)\d\d/, :month => /[01]?\d/, etc ...}  # an array with preg-stuff to filter out dates
  end

Now RoR's smartness comes into play, and here is where I have no clue how to do this in Drupal, yet.
You don't know how to do it, yet you repeatedly state that we can save hooks, variables, memory and code? I'm not convinced yet, but hopefully you can come up with a working prototype though. It looks like an interesting path to explore. :) -- Dries Buytaert :: http://www.buytaert.net/
On Wednesday 28 December 2005 08:24, Dries Buytaert wrote:
It looks like an interesting path to explore.
In fact, this is the only thing I am sure of. On IRC I had a live conversation in which I managed to fill in some of the gaps in my idea. It's nothing more than a 'direction to explore' yet. It's by no means a fully worked-out proof of concept.

And unfortunately I will not (be able to) develop anything real. There are too many parts of Drupal that I am already neglecting. Besides, my own contributions, mostly betterupload and relations, are a tad more important to me than this.

I put this idea forward because, while working on RoR, the concept and simplicity (of use) of the routing struck me. I actually wanted to bring this up in one of my various 'druby on rails' posts :)

Bèr
-- [ Bèr Kessels | Drupal services www.webschuur.com ]
On Tuesday 27 December 2005 17:21, Dries Buytaert wrote:
How does the system know that 'project/foo' is node 42 of type project, that 'blog/2005/12/27/hello_world' is node 69 of type blog, or that 'about' is node 1 of type page? And vice versa, that node 42 is 'project/foo', that node 69 is 'blog/2005/12/27/hello_world' or that node 1 is 'about'?
Okay, before anyone asks: 'how about user-defined mappings?' Sure. The example:

user_path_mapping.module

Save strings from an interface:

  variable_set('my_map_1', $post['map1']);

And in the menu:

  function user_path_mapping_menu() {
    $items[] = array(
      'title' => 'cool thing',
      'original_path' => variable_get('my_map_1', ''),  // the pattern saved above
      'original_replace' => array('%year' => '$obj->year', ..., '%title' => '$obj->title'),  // somehow get this from a UI too
      'callback' => 'user_path_mapping_show',
      ...
    );
    return $items;
  }

Bèr
--
PGP ber@webschuur.com http://www.webschuur.com/sites/webschuur.com/files/ber_webschuur.asc
PGP berkessels@gmx.net http://www.webschuur.com/sites/webschuur.com/files/ber_gmx.asc
Hence I suggest we ignore the hand-coded urlmapping for now. And focus on this menu-mapping. After that, we can just use our current database mapping (path.module), through this menu mechanism, to handle those few hand-edited paths. Or is there anyone out there, who maintains over a few hundred hand-edited path aliases? and if so, is that because you want to, or because you lack a [book-structure-mapper|taxonomy-mapper|username-mapper] ?
We do use custom URL aliases, not autogenerated ones, on Weblabor.hu, because we would like to have short URLs. A recent news item:

URL: http://weblabor.hu/hirek/20051218/mysql2pgsql
Title: MySQL kompatibilitási réteg készül PostreSQL-hez

The title translates to: MySQL compatibility layer in the works for PostgreSQL. Note that the author targets Drupal too, but that is a different question. :)

Now you see that if this title were moved directly into the URL,

 - some accented chars would get there, which should either be URL escaped (as is the case for Albrecht Dürer, for example: http://hu.wikipedia.org/wiki/Albrecht_D%C3%BCrer), or autoreplaced with unaccented variants. The first looks ugly (and spam suspicious) in an email, the second can lead to funny meaning changes due to the missing accent.
 - the URL would be overly long, or, if truncated, it would not catch the essence of the topic.

Note that we include the date and the word 'hirek' (news) in here to better signal what this URL is about (the date and the 'hirek' prefix are enforced by our interface).

This is not to completely oppose a functionally automatic alias system, but to indicate that some people might need to have very different path values (e.g. for custom company needs, replacing an old site, translating path values to some local language, etc.).

Goba
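The two ways of handling accented characters mentioned above can be compared with a short Python sketch. The function names are made up for the example, and replacing spaces with underscores is an assumption borrowed from the Wikipedia URL quoted above.

```python
import unicodedata
from urllib.parse import quote


def escaped(title):
    """Percent-encode the title as UTF-8, replacing spaces with underscores."""
    return quote(title.replace(" ", "_"))


def transliterated(title):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.replace(" ", "_")
```

With these, `escaped("Albrecht Dürer")` gives `Albrecht_D%C3%BCrer`, while `transliterated("Albrecht Dürer")` gives `Albrecht_Durer`. Neither addresses the length problem, which is why a hand-picked short alias can still be preferable.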
On Tuesday 27 December 2005 17:26, Gabor Hojtsy wrote:
This is not to completely oppose a functionally automatic alias system, but to indicate that some people might need to have very different path values (eg. for custom company needs, replacing an old site, translating path values to some local language, etc).
I might have been unclear. I did not mean to do away with the handmade aliases. For those who wish, they should remain, in path.module, stored in the database. It is just for those URLs that are autogenerated that it makes little sense to put them in a database.

As I mentioned in my reply to Neil: if we can autogenerate something on each _save, _edit, _insert etc., surely we can do so on _load? It would save a lot of DB queries, caching and memory-expensive arrays.

Bèr
--
| Bèr Kessels | webschuur.com | website development |
| Jabber & Google Talk: ber@jabber.webschuur.com
| http://bler.webschuur.com | http://www.webschuur.com |
Dries Buytaert wrote:
I think WP doesn't use an intermediate table that contains (source URL, destination URL)-tuples. Instead, it directly translates the URL to a database query that looks up the post.
Which is exactly what we should be doing.

This works cleanly for objects with unique titles such as taxonomy terms and users, since there is a one-to-one mapping of titles to objects. Another indexed column would have to be added to save the URL-sanitized version of the title for quick lookups.

When something is renamed we need to do a permanent redirect at the previous URL. This can be done by extending URL aliasing or by a parallel system, which would be similar to URL aliasing since both work with pairs of URLs.

I suggest worrying about the harder problem, node URLs, after we solve the easy problems.

I might code this if I find the time, but not for 4.7; it is too late in the development cycle to be adding a redirect API.

-- Neil Drumm http://delocalizedham.com/
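A minimal sketch of the sanitized-title lookup with a permanent redirect on rename, in Python. The `slugify` rule, the `UserDirectory` class and the `user/` prefix are assumptions for illustration, and the dictionaries merely stand in for an indexed column and a redirect table.

```python
import re


def slugify(title):
    """Lowercase the title and collapse runs of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


class UserDirectory:
    def __init__(self):
        self.by_slug = {}    # stands in for an indexed, URL-sanitized column
        self.redirects = {}  # old slug -> new slug, answered with HTTP 301

    def add(self, uid, name):
        self.by_slug[slugify(name)] = uid

    def rename(self, old_name, new_name):
        """Move the object to its new slug and remember the old one."""
        old, new = slugify(old_name), slugify(new_name)
        self.by_slug[new] = self.by_slug.pop(old)
        self.redirects[old] = new

    def lookup(self, slug):
        """Return (status, value): 200 with the uid, 301 with the new path, or 404."""
        if slug in self.by_slug:
            return 200, self.by_slug[slug]
        if slug in self.redirects:
            return 301, "user/" + self.redirects[slug]
        return 404, None
```

After a rename, a request for the old slug gets a 301 pointing at the new path rather than a 404, so existing links keep working.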
Neil Drumm wrote:
This works cleanly for objects with unique titles such as taxonomy terms and users since there is a one to one mapping of titles to objects.
Actually, I'm guessing taxonomy isn't so simple. It would be closer to .../taxonomy/{vocabulary}/{term} assuming that each vocabulary has uniquely-named terms. -- Neil Drumm http://delocalizedham.com/
On Tuesday 27 December 2005 19:12, Neil Drumm wrote:
Actually, I'm guessing taxonomy isn't so simple. It would be closer to .../taxonomy/{vocabulary}/{term} assuming that each vocabulary has uniquely-named terms.
Aah. Now I see. Sorry. No, we are completely in agreement, Neil. Sorry again! -- | Bèr Kessels | webschuur.com | website development | | Jabber & Google Talk: ber@jabber.webschuur.com | http://bler.webschuur.com | http://www.webschuur.com |
On Tuesday 27 December 2005 18:16, Neil Drumm wrote:
Which is exactly what we should be doing. This works cleanly for objects with unique titles such as taxonomy terms and users since there is a one to one mapping of titles to objects. Another indexed column would have to be added to save the url-sanitized version of the title for quick lookups.
This is the problem: nothing /is/ static. /taxonomy/foo, yes, static enough. But what about taxonomy/foo+bar or taxonomy/foo+bar+baz or any other combination?

I truly believe that we should not stick *any* automatically generated alias in a database. If we can generate an alias on _save, _edit etc., surely we can do the same on _load?

Bèr
--
PGP ber@webschuur.com http://www.webschuur.com/sites/webschuur.com/files/ber_webschuur.asc
PGP berkessels@gmx.net http://www.webschuur.com/sites/webschuur.com/files/ber_gmx.asc
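To make the "generate on load" argument concrete, here is a small Python sketch in which aliases for arbitrary term combinations are computed when needed and resolved back without any stored alias row. The term table and both function names are hypothetical, and the sketch assumes term names are unique.

```python
TERMS = {1: "foo", 2: "bar", 3: "baz"}  # tid -> term name (assumed unique)
TERM_IDS = {name: tid for tid, name in TERMS.items()}


def alias_for_terms(tids):
    """Generate the public path for a combination of terms when it is needed."""
    return "taxonomy/" + "+".join(TERMS[tid] for tid in tids)


def terms_for_alias(path):
    """Resolve a public path back to term ids without a stored alias row."""
    prefix, _, names = path.partition("/")
    if prefix != "taxonomy" or not names:
        return None
    try:
        return [TERM_IDS[name] for name in names.split("+")]
    except KeyError:
        return None
```

Every combination such as taxonomy/foo+bar+baz resolves through the same term lookup, so the number of stored rows does not grow with the number of possible combinations.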
participants (7)
- Adrian Rossouw
- Bèr Kessels
- Chris Messina
- Dries Buytaert
- Gabor Hojtsy
- Jeff Eaton
- Neil Drumm