[drupal-devel] [feature] URL rewrite hook

Jose A Reyero drupal-devel at drupal.org
Thu Sep 15 19:34:55 UTC 2005


Issue status update for 
http://drupal.org/node/29030
Post a follow up: 
http://drupal.org/project/comments/add/29030

 Project:      Drupal
 Version:      cvs
 Component:    base system
 Category:     feature requests
 Priority:     normal
 Assigned to:  chx
 Reported by:  chx
 Updated by:   Jose A Reyero
 Status:       patch (code needs review)

Goba,


Well, about caching,  I think it's no problem as long as  it's separate
for 'incoming' and 'outgoing' paths. I mean 'some_strange_path' ->
'node/21', doesn't imply 'node/21' -> 'some_strange_path'


But, about the wrappers,I actually like them and I'm thinking of some
uses for them. In our case the 'custom_url_rewrite' function may call
again drupal_lookup_path, to check whether the 'language+path' has some
alias. So I'd like to have separate 'drupal_get_path_alias' and
'drupal_lookup_path' having the later restricted to sytem defined
paths.
Also, on incoming paths, we re-search again the path without language
on module_init, this way supporting aliases with and without
language...


I mean, ok to some 'two level caching' (saving that hook in the
wrappers), if it makes sense, but not that convinced about 'all in one
function/single caching'.


Mmm,  maybe I'm messing it too much... does it make sense for you?


And I was thinking too, what if we moved some ugly logic in common.inc
into drupal_get_normal_path() ? I'm talking about this:


  // Initialize $_GET['q'] prior to loading modules and invoking
hook_init().
  if (!empty($_GET['q'])) {
    $_GET['q'] = drupal_get_normal_path(trim($_GET['q'], '/'));
  }
  else {
    $_GET['q'] =
drupal_get_normal_path(variable_get('site_frontpage', 'node'));
  }


However, I dont want this thread to be still more complex, just
pointing at some possible use' of that wrappers.




Jose A Reyero



Previous comments:
------------------------------------------------------------------------

Wed, 17 Aug 2005 11:39:12 +0000 : chx

Attachment: http://drupal.org/files/issues/url_rewrite.patch (1.35 KB)

I need a URL rewrite hook sometimes, here is an implementation, with
ample comments. I looked into arg() to see whether it needs a reset
parameter, and it does not, but it contained a minor bug, which is also
fixed.




------------------------------------------------------------------------

Wed, 17 Aug 2005 11:46:45 +0000 : chx

Attachment: http://drupal.org/files/issues/url_rewrite_0.patch (1.73 KB)

Performance boost.




------------------------------------------------------------------------

Wed, 17 Aug 2005 11:48:51 +0000 : Jose A Reyero

+1


This is really needed by i18n module, and maybe other modules could use
it to add some extra information in the query string.




------------------------------------------------------------------------

Wed, 17 Aug 2005 11:50:16 +0000 : killes at www.drop.org

this seems just an evil plot to cater for the evil hack that i18n.module
is. --




------------------------------------------------------------------------

Wed, 17 Aug 2005 11:57:23 +0000 : chx

killes, while I know you do not like the current implementation of i18n,
that module exists and even works. If you do not like the current
approach, you can always write a better one. And also, please note we
try to introduce non-i18n specific solutions this time.




------------------------------------------------------------------------

Wed, 17 Aug 2005 12:05:10 +0000 : killes at www.drop.org

"it works" has never been a consideration in Drupal development, don't
let us start to use it. I don't need an i18n module. Had I had the urge
to write one, it would have been based on walkah's excellent start to be
seen here:


http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/walkah/translate_node/


I don't see anything non-i18n specific here. let us not pollute low
level functions such as url() with custom hacks.




------------------------------------------------------------------------

Wed, 17 Aug 2005 12:26:11 +0000 : Goba

Why extend url() and why not the alias retrieval functions? BTW at that
time it was decided that a single function should be used
(conf_url_rewrite()) and not a hook, because of performance reasons. If
you provide the functionality this was, then conf_url_rewrite() gets
confusing, and even meaningless if you do it in the alias retrieval
functions themselfs.




------------------------------------------------------------------------

Wed, 17 Aug 2005 12:49:30 +0000 : chx

Attachment: http://drupal.org/files/issues/url_rewrite_1.patch (1.95 KB)

Goba is so right, that I was already working on it.




------------------------------------------------------------------------

Thu, 18 Aug 2005 11:11:39 +0000 : Jose A Reyero

This second version is more limited, and wont be that useful, because in
case an alias exists, the rewritting is skipped.


IMHO, the aim for this patch should be allowing modules to add
information in the path or in the query string, for *all* the outgoing
urls. I was thinking of i18n module and language information, but this
can be useful for other modules too. 


So please, let's stick to the previous version of the patch




------------------------------------------------------------------------

Thu, 18 Aug 2005 11:45:23 +0000 : Goba

Jose, so you advocate only mangling URLs on output? Really?




------------------------------------------------------------------------

Thu, 18 Aug 2005 15:41:41 +0000 : Jose A Reyero

Goba, yes, my idea was to provide a hook, so any module can mangle the
path/query string. What will happen with that incoming paths is a
different story, I mean any module can access the query string any time
later... And I'd like this to be separated from path aliasing, which can
be done in a different step...


Ideally, of course, this would be done for outgoing and incoming urls,
but there's a number of issues, like incoming path processing being
done before module loading, in common.inc... Another issue, IMHO, is
that current url handling is a bit messy and could be better
streamlined. 


But the first patch was simple enough and quite straight forward, at
least it works for *all* outgoing URLs, and allows to rewrite query
strings, while the second one, I see much more limited use for it....




------------------------------------------------------------------------

Thu, 18 Aug 2005 17:07:13 +0000 : Goba

Jose, AFAIS if you mangle with the outgoing URLs (eg. put 'en/' before
all URLs), and there will be nothing in the incoming processing to
strip that, this will directly lead to 'page not found' replies. But I
probably fail to see the exact use case you would like to propose this
for.




------------------------------------------------------------------------

Thu, 18 Aug 2005 17:48:13 +0000 : Jose A Reyero

Goba, yes, basically you are right... But these are my use cases:


For i18n
----------
- Add language prefix at the beginning of the outgoing urls
- Remove it in the module init hook. 


Yes, I know... this is probably some of what killes calls evil hacks
:-(


But with this new hook, I could also think of adding language in the
query string. And maybe some other modules could want to add some info
in the query string. There are a number of reasons for language to be
in the URL -search engines, links...- and also the ability to create
path aliases with or without language...


Hackish? Yes. But right now we are facing the following dilemma: no
specific calls for non core modules -which I dont disagree with- but
then any implementation of this, for not to require patching, will have
to be in a module, and then incoming paths are first processed before
module loading, and on top of that we have the cache system.... so its
quite a complex thing....


I'd be happy with any idea to implement this more cleanly, or maybe we
should aim higher, like reworking the whole init thing and path
pre-processing... What we are trying for the moment is to introduce
only some general use hooks, like this small patch... otherwise it is a
too big all-or-nothing question to get this working in Drupal....


Thanks for your comments and I'd appreciate any suggestion.




------------------------------------------------------------------------

Mon, 29 Aug 2005 18:54:48 +0000 : Jose A Reyero

Attachment: http://drupal.org/files/issues/url_rewrite_2.patch (713 bytes)

Updated simplified patch.


As other patches are already in, we only need this to run i18n module
with Drupal 4.7 without patching!!




------------------------------------------------------------------------

Mon, 29 Aug 2005 19:29:45 +0000 : DFG

I would be more useful to have a similar hook in drupal_get_path_alias()
and drupal_get_internal_path().




------------------------------------------------------------------------

Tue, 30 Aug 2005 00:26:27 +0000 : fago

patch applies (with offset) and doesn't break anything, as i could see.


a lot of people are interested in i18n, so please include this last
one.
further the possibilty inject a query-string might be useful in other
cases.


+1




------------------------------------------------------------------------

Tue, 13 Sep 2005 12:48:29 +0000 : mgifford

This is a worth while hook to add to the core code.  It will ease the
implementation of more multi-lingual sites in drupal and provide better
support for a broader community of users/developers.


+1




------------------------------------------------------------------------

Tue, 13 Sep 2005 15:22:10 +0000 : Souvent22

+1. This patch is needed. We don't live in a vacum ya know, there are
more languages than english. :).  Hope this gets in, I just made a site
for someone in Italy, and this could help out when making modules.




------------------------------------------------------------------------

Tue, 13 Sep 2005 18:02:25 +0000 : Dries

If this patch gets committed, there will be a third mechanism to rewrite
URLs.  Also, it is a well-known fact that url() and l() are a
performance bottleneck.  I think we need to take a step back, see how
we can overcome the limitations of the current system and come to a
simple yet fast URL rewrite mechanism.  There ought to be a better way.




------------------------------------------------------------------------

Tue, 13 Sep 2005 18:24:22 +0000 : chx

Attachment: http://drupal.org/files/issues/url_rewrite_3.patch (1.87 KB)

This version implements a hook in drupal_get_path_alias and in
drupal_get_normal_path . Performance hit should be negligeble in most
cases: a foreach on an empty array.




------------------------------------------------------------------------

Tue, 13 Sep 2005 18:46:39 +0000 : chx

Attachment: http://drupal.org/files/issues/url_rewrite_4.patch (1.89 KB)




------------------------------------------------------------------------

Tue, 13 Sep 2005 18:50:22 +0000 : Dries

Moving the code around doesn't change a thing; it's still a third/new
mechanism to rewrite URLs.  I'll take a closer look at this as time
permits.




------------------------------------------------------------------------

Tue, 13 Sep 2005 19:00:54 +0000 : chx

Attachment: http://drupal.org/files/issues/url_rewrite_5.patch (1.89 KB)

No, it's a second mechanism only as it removes conf_url_rewrite --
conf_url_rewrite routines can be moved to a module and thus shared as a
module. And you can have more than one rewrite, this way.




------------------------------------------------------------------------

Tue, 13 Sep 2005 23:48:42 +0000 : Jose A Reyero

Attachment: http://drupal.org/files/issues/url_rewrite_alt.patch (1.61 KB)

+1 for chx (plus alternative patch)


I like chx's patch, and I think also that getting rid of
'conf_url_rewrite' and replacing it with a hook is a good thing.


However, if the main concern is performance, we could use too
'conf_url_rewrite', if only all paths were run through it
unconditionally.


So, making clear I'd prefer chx's solution, here's an alternative one.




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:09:18 +0000 : Goba

I like chx's generic version (url_rewrite_5.patch) best, as it replaces
an awkward URL rewrite mechanism (introduced by myself, pressed by
performance reasons), with a lot cleaner, albeit a little bit less
performant solution. BTW there is a spelling mistake, chx written
'inccoming' in the patch, plus I see no reason to check for
empty($arguments) in arg() at all, since having the $q set properly
(after this patch) would also ensure that $arguments is properly set.
Note that this patch also fixes  a small performance problem in arg():
now it always tries to do an explode, if $_GET['q'] is empty, that is
on the homepage.




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:39:26 +0000 : eldarin

Another approach which also works well:
in menu.module:menu_execute_active_handler() right after setting $path
to the q variable,
I call a module which I called "urlpatterns" which resolves incoming
URLs.
There I match against a set of URL regexp patterns configured by the
module admin.
That way the URL rewrite happens very early in the useragent request to
the server.


The reason for this is further down in
menu.module:menu_execute_active_handler(), where I have a AAA function
which decides if access should be given on a configurable URL basis -
configured with another module doing AAA.


That way security settings mimic .htaccess in some way, while having
the power of regexp flexibility as well as very good extendibility.


The get_normal_path() and get_path_alias() functions then are routed to
a check and lookup in the "urlpatterns" module as well as the AAA
module. The immediate benefits of this is that I don't link to anywhere
on the server, where access is denied to the user.


Just improves security somewhat, as well as performance when I only let
one "subscriber" hang on to the hook given in
menu_execute_active_handler(). In my specialized case, I see no reason
to have more than one URL rewriting module, since it would possible
become a large spaghetti mess with possible unpredictable results if
there is no central way of assuring URLs. I take care of "sub-URL
aliasing" with regexp rules, so there should be no reason to do so
either. It keeps security a bit tidier.




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:44:01 +0000 : Goba

Well, true the hook version of the patch does not ensure any order of
the modules being called (currently alphabetical), so it needs careful
programmers to implement the hooks.




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:44:56 +0000 : eldarin

In my opinion, performance and security is key to successful URL
aliasing/rewrites. Having it moved to a module is much better than the
current non-perfect scheme. I can't see the need for multi-module
direct access to rewriting URLs though. That's why I use weighted
regexp patterns which flexibly enough also work as sub-URL rewriting
for any module that would register such a rewrite rule - in the same
manner as the menu-building with callbacks.


Does this make any sense ? Should give much better performance and
security than the current (and suggested) schemes, no ?




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:48:16 +0000 : eldarin

Goba, yes.
The effect of the multi-module rewrite policy would be probable loss of
URL control for the site-admin and total chaos. It would also require
massive efforts from module contributors to ensure their module
rewriting would behave.


My outlined solution - which work well in practical terms for my needs
- handles this by allowing the siteadmin to modify weights - and even
disabling - of rules suggested by modules.


That should make life a lot easier for anyone.




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:51:02 +0000 : eldarin

I meant to say "multi-module direct rewrite policy", where modules have
direct control, even though siteadmin might have perceived control via
multiple admin configuration pages scattered between all the modules
who would implement a URL rewrite.
;-)




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:52:32 +0000 : Goba

OK, instead of talking, let us see, how your weighted regexps solution
works (in the form of a patch or at least a code example). I have an
idea, but it might be far away from what you actually do.




------------------------------------------------------------------------

Wed, 14 Sep 2005 09:59:40 +0000 : eldarin

The way I do revers lookup for outgoing matching is far from perfect
right now, and is something I haven't had time to completely figure
out. I was thinking of just using a similarly weighted reverse table -
using the same patterns for now acutally - but it would be better with
both incoming and outgoing separate although possibly overlapping
rulesets. They are exclusive in matching since they apply to the
domains of incoming, outgoing.


An idea would be to have the modules suggest the rules as one of three:
for incoming, for outgoing, for both.
My solution involves a lot of security configurations, that's why it
was setup in the very early menu URL handling, and only using cached
lookup in the other functions.




------------------------------------------------------------------------

Wed, 14 Sep 2005 10:04:47 +0000 : eldarin

The weighting of the ruleset is really straightforward - a treegraph
really. The intricate bit is in optimizing performance for this
tree-graph with regards to the security rules as well. Yes, the proof
is in the pudding, as they say ..
;-)




------------------------------------------------------------------------

Wed, 14 Sep 2005 10:19:30 +0000 : eldarin

I forgot to mention that this approach I used, broke normal core URL
aliasing support. I solved that by hooking the nodeapi for insert of
aliases just like the core.
Also - have the issues in http://drupal.org/node/21938 and
http://drupal.org/node/22035 since been overlooked ?


A further possibility would be to use t() for wording matching, but I
guess the best is to implement that directly on the rules and then
further expand the tree-graph if multilanguage aliasing was needed for
one site (think something in the lanes of pathauto). But I had no need
for translating on my project.




------------------------------------------------------------------------

Wed, 14 Sep 2005 12:04:40 +0000 : Jose A Reyero

Well, summarizing -and adding a little bit :-)


- eldarin's solution looks like an iteresting path to explore, but it
still has some side effects - I'll let this for the future, maybe
another thread?


- chx's patch loos like a good one, but has some performance concerns.


- then there's my patch which is like a performance-safe, middle way
alternative.


So I propose we focus on that performace side, and do some benchmarking
with chx's patch. 


I'd like to know whether we have some data about how a generic new hook
-not calling actually any module, only the hook execution itself-, and
being called like a hundred times per page, actually impacts
performance. Does anybody know?


And... in case performance is too bad.... I'm thinking of a variable,
that could be set by modules and hidden to the user, to enable/disable
this hook..... how does this sound like?




------------------------------------------------------------------------

Wed, 14 Sep 2005 12:14:02 +0000 : chx

Re. perf. I'll test these solutions with ab later but I am absolutely
not convinced that a foreach on an empty array is so much slower than a
function_exists against a non-existing function. I expect both to be
negligeble. Stay tuned.




------------------------------------------------------------------------

Wed, 14 Sep 2005 13:30:16 +0000 : Dries

Goba: do you have time to follow up on this, and to make the final
decision with regard to this problem?  I'm aksing because it takes some
digging to understand the problem and to evaluate the different
approaches.  If we want to commit this before the freeze, I'll have to
delegate this job.  (Also make sure left-overs and old mechanisms get
removed.)  If so, just give me a 'go' and I'll commit it after minimal
testing.




------------------------------------------------------------------------

Wed, 14 Sep 2005 20:30:52 +0000 : Goba

I have just arrived home, and I am going to go sleeping, as tomorrow I
am going to have lessons at the university. Sadly I won't be able to
look at the patches and ab results of chx before tomorrow afternoon.
Since I think i18n is one of the most important aspects of the upcoming
Drupal release, I will try to be around tomorrow afternoon to look at
results coming up until then.




------------------------------------------------------------------------

Wed, 14 Sep 2005 21:24:17 +0000 : Dries

If you can look at this patch before Sunday I'll make it slip it (given
you think it is ready).




------------------------------------------------------------------------

Thu, 15 Sep 2005 11:17:32 +0000 : chx

Attachment: http://drupal.org/files/issues/url_rewrite_6.patch (1.87 KB)

Removed array_reverse. Goba pointed out that URL rewrite implementations
should work in any order. So it's not en/node/18 but lang/en/node/18.




------------------------------------------------------------------------

Thu, 15 Sep 2005 14:56:11 +0000 : Goba

Attachment: http://drupal.org/files/issues/Drupal_path_alias_alternate_by_goba.patch (2.12 KB)

Chx, making the URL look uglier just to make it work in any order is not
an option IMHO. The solution needs to be better. Here is another
alternate version. What it does very closely resembles Jose's last
patch, keeping conf_url_rewrite (albeit with a different name
custom_url_rewrite) and with different parameterization. I made it to
be in line with the drupal_lookup_path() function in parameter order
and meaning, giving a third parameter to signal if the path is already
aliased (this might be badly needed in solutions where an already
aliased path should not be further aliased, we had enough discussion
about this in the past).


The pluses of this patch is that the architecture nearly stays the
same, the performance decrease should be negligible (a
function_exists() is always evaluated even if there is no such
function), but we know have an option to alias already aliased URLs
(but we don't need to). The custom_url_rewrite() function can actually
do a foreach on hooks, if someone wants to do that.


Preparing this patch, it struck me, that we don't actually need
drupal_get_normal_path() or drupal_get_path_alias(), as they are just
very thin wrappers around drupal_lookup_path(). In fact, by
incorporating their contents to drupal_lookup_path(), we can
(page-level) cache the results in the $map array. That is given that
the results of the custom_url_rewrite() are time insensitive (there is
no parameter to determine the return value other then those passed).


Opinions?




------------------------------------------------------------------------

Thu, 15 Sep 2005 18:05:05 +0000 : Jose A Reyero

Attachment: http://drupal.org/files/issues/Drupal_path_alias_alternate_by_goba_02.patch (2.08 KB)

Goba, I like your latest patch


Just another small twist, which is passing the actual original path,
more 'inexpensive' info..


> The custom_url_rewrite() function can actually do a foreach on hooks,
if someone wants to do that.


This one, I think,  is a great idea :-)


However I'd still like to know how 'expensive' one more hook for each
outgoing link is...




------------------------------------------------------------------------

Thu, 15 Sep 2005 18:17:32 +0000 : Goba

What I also "proposed", is that we get rid of the ...path_alias() and
...normal_path() wrapper functions, and make the final aliased URL
cached. Jose, would there be any problem with (page level) caching of
the resulting final aliased URL in the use case of the i18n module? I
can hardly think of a custom_url_alias() function which does not return
the same alias for the same parameters in different times.




------------------------------------------------------------------------

Thu, 15 Sep 2005 18:55:39 +0000 : chx

$_GET['q'] can change during the code flow thus arg() cache reset is
necessary. It was in my patch, and actually it's a bit unrelated issue.




------------------------------------------------------------------------

Thu, 15 Sep 2005 19:02:06 +0000 : Goba

Ah, anyway, I am busy implementing code which does custom alias caching
too.




------------------------------------------------------------------------

Thu, 15 Sep 2005 19:19:51 +0000 : Goba

Attachment: http://drupal.org/files/issues/Drupal_path_reform.patch (10.11 KB)

OK, here is an update, which fixes Chx's noted stuff, and integrates
custom alias handling into the page level caching system, removing
drupal_get_path_alias and drupal_get_normal_path on the way. I know I
also swapped the named actions of drupal_lookup_path(), but since the
verb is lookup, the $op should be "what to look up", not "what I have
passed".


Performance was taken care of as much as possible, so there will be no
aliasing if there are no aliases in the database AND no custom alias
function. If either is true, aliasing will be tried, and resultig
aliases will be cached (on the page level). The only conceptual
difference in this and the non-cached version is that the custom
function should work on a one to one relationship level, so that the
cache array does not change over time on the page. If this is not the
case, the custom alias results should not be cached.


Patch yet untested, waiting for opinions from the i18n guys. Does the
above constraints fit with the i18n plans?







More information about the drupal-devel mailing list