Issue status update for http://drupal.org/node/29030
Post a follow up: http://drupal.org/project/comments/add/29030

 Project:      Drupal
 Version:      cvs
 Component:    base system
 Category:     feature requests
 Priority:     normal
 Assigned to:  chx
 Reported by:  chx
 Updated by:   Goba
 Status:       patch (code needs review)

What I also "proposed" is that we get rid of the ...path_alias() and ...normal_path() wrapper functions and make the final aliased URL cached. Jose, would there be any problem with (page-level) caching of the resulting final aliased URL in the use case of the i18n module? I can hardly think of a custom_url_alias() function which does not return the same alias for the same parameters at different times.

Goba

Previous comments:
------------------------------------------------------------------------
Wed, 17 Aug 2005 11:39:12 +0000 : chx
Attachment: http://drupal.org/files/issues/url_rewrite.patch (1.35 KB)

I need a URL rewrite hook sometimes; here is an implementation, with ample comments. I looked into arg() to see whether it needs a reset parameter. It does not, but it contained a minor bug, which is also fixed.
------------------------------------------------------------------------
Wed, 17 Aug 2005 11:46:45 +0000 : chx
Attachment: http://drupal.org/files/issues/url_rewrite_0.patch (1.73 KB)

Performance boost.
------------------------------------------------------------------------
Wed, 17 Aug 2005 11:48:51 +0000 : Jose A Reyero

+1
This is really needed by the i18n module, and maybe other modules could use it to add some extra information to the query string.
------------------------------------------------------------------------
Wed, 17 Aug 2005 11:50:16 +0000 : killes@www.drop.org

This seems just an evil plot to cater for the evil hack that i18n.module is.
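chx's first patch puts the rewrite hook inside url() itself, so every outgoing link passes through it, and Jose notes that other modules could use this to add extra information to the query string. The following is only a minimal Python sketch of that idea, not the PHP patch; `url`, `OUTBOUND_HOOKS` and `add_language_query` are hypothetical names, and clean URLs under a base path of "/" are assumed.

```python
from urllib.parse import urlencode

# Hypothetical registry of outbound rewrite callbacks. Each callback takes
# (path, query) and returns a possibly modified (path, query) pair.
OUTBOUND_HOOKS = []


def url(path, query=None):
    """Build an outgoing URL, letting every registered hook rewrite it first.

    Assumes clean URLs and a base path of "/" (assumptions of this sketch).
    """
    query = dict(query or {})  # copy so hooks never mutate the caller's dict
    for hook in OUTBOUND_HOOKS:
        path, query = hook(path, query)
    suffix = "?" + urlencode(query) if query else ""
    return "/" + path + suffix


def add_language_query(path, query):
    """Example hook (hypothetical): tag every outgoing link with a language."""
    query.setdefault("lang", "en")
    return path, query


OUTBOUND_HOOKS.append(add_language_query)
```

With the example hook registered, `url("node/18")` yields `/node/18?lang=en`. The loop runs on every call to url(), which is why the per-link cost of such a hook becomes the central question later in the thread.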
------------------------------------------------------------------------
Wed, 17 Aug 2005 11:57:23 +0000 : chx

killes, while I know you do not like the current implementation of i18n, that module exists and even works. If you do not like the current approach, you can always write a better one. Also, please note that this time we are trying to introduce non-i18n-specific solutions.
------------------------------------------------------------------------
Wed, 17 Aug 2005 12:05:10 +0000 : killes@www.drop.org

"It works" has never been a consideration in Drupal development; let us not start using it. I don't need an i18n module. Had I had the urge to write one, it would have been based on walkah's excellent start, to be seen here: http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/walkah/translate_...

I don't see anything non-i18n-specific here. Let us not pollute low-level functions such as url() with custom hacks.
------------------------------------------------------------------------
Wed, 17 Aug 2005 12:26:11 +0000 : Goba

Why extend url(), and why not the alias retrieval functions? BTW, at that time it was decided that a single function (conf_url_rewrite()) should be used rather than a hook, for performance reasons. If you provide the functionality this way, then conf_url_rewrite() gets confusing, and even meaningless if you do it in the alias retrieval functions themselves.
------------------------------------------------------------------------
Wed, 17 Aug 2005 12:49:30 +0000 : chx
Attachment: http://drupal.org/files/issues/url_rewrite_1.patch (1.95 KB)

Goba is so right that I was already working on it.
------------------------------------------------------------------------
Thu, 18 Aug 2005 11:11:39 +0000 : Jose A Reyero

This second version is more limited and won't be as useful, because if an alias exists, the rewriting is skipped.
IMHO, the aim of this patch should be to allow modules to add information to the path or the query string for *all* outgoing URLs. I was thinking of the i18n module and language information, but this can be useful for other modules too.

So please, let's stick to the previous version of the patch.
------------------------------------------------------------------------
Thu, 18 Aug 2005 11:45:23 +0000 : Goba

Jose, so you advocate only mangling URLs on output? Really?
------------------------------------------------------------------------
Thu, 18 Aug 2005 15:41:41 +0000 : Jose A Reyero

Goba, yes, my idea was to provide a hook so any module can mangle the path/query string. What happens with those incoming paths is a different story; I mean, any module can access the query string at any time later... And I'd like this to be separate from path aliasing, which can be done in a different step...

Ideally, of course, this would be done for both outgoing and incoming URLs, but there are a number of issues, like incoming path processing being done before module loading, in common.inc...

Another issue, IMHO, is that current URL handling is a bit messy and could be better streamlined.

But the first patch was simple enough and quite straightforward; at least it works for *all* outgoing URLs and allows rewriting query strings, while for the second one I see much more limited use...
------------------------------------------------------------------------
Thu, 18 Aug 2005 17:07:13 +0000 : Goba

Jose, AFAIS, if you mangle the outgoing URLs (e.g. put 'en/' before all URLs) and there is nothing in the incoming processing to strip that, this will directly lead to 'page not found' replies. But I probably fail to see the exact use case you would like to propose this for.
------------------------------------------------------------------------
Thu, 18 Aug 2005 17:48:13 +0000 : Jose A Reyero

Goba, yes, basically you are right...
But these are my use cases:

For i18n
----------
- Add a language prefix at the beginning of outgoing URLs
- Remove it in the module init hook.

Yes, I know... this is probably some of what killes calls evil hacks :-(

But with this new hook, I could also think of adding the language to the query string. And maybe some other modules would want to add some info to the query string.

There are a number of reasons for the language to be in the URL (search engines, links...), and also the ability to create path aliases with or without language...

Hackish? Yes. But right now we are facing the following dilemma: no specific calls for non-core modules (which I don't disagree with), but then any implementation of this, so as not to require patching, will have to be in a module; incoming paths are processed before modules are loaded, and on top of that we have the cache system... so it's quite a complex thing...

I'd be happy with any idea to implement this more cleanly, or maybe we should aim higher, like reworking the whole init process and path preprocessing... What we are trying for the moment is to introduce only some general-use hooks, like this small patch... otherwise getting this working in Drupal becomes too big an all-or-nothing question...

Thanks for your comments; I'd appreciate any suggestions.
------------------------------------------------------------------------
Mon, 29 Aug 2005 18:54:48 +0000 : Jose A Reyero
Attachment: http://drupal.org/files/issues/url_rewrite_2.patch (713 bytes)

Updated, simplified patch. As other patches are already in, we only need this to run the i18n module with Drupal 4.7 without patching!!
------------------------------------------------------------------------
Mon, 29 Aug 2005 19:29:45 +0000 : DFG

It would be more useful to have a similar hook in drupal_get_path_alias() and drupal_get_internal_path().
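Jose's i18n use case pairs an outbound step (prefix each outgoing path with the language) with an inbound step (strip the prefix again in the module's init hook); Goba's objection is that without the inbound step the prefixed path leads straight to 'page not found'. A minimal Python sketch of that pair, assuming a hypothetical fixed set of language codes; the function names are hypothetical too.

```python
# Hypothetical set of enabled language codes.
LANGUAGES = {"en", "es", "de"}


def add_language_prefix(path, lang):
    """Outbound: node/18 -> en/node/18; the front page ("") becomes just "en"."""
    return f"{lang}/{path}" if path else lang


def strip_language_prefix(requested):
    """Inbound: return (language or None, internal path).

    If this step is skipped, "en/node/18" reaches the menu system unchanged
    and produces a 'page not found' reply.
    """
    head, _, rest = requested.partition("/")
    if head in LANGUAGES:
        return head, rest
    return None, requested
```

Every path produced by the outbound step survives the round trip, but an unprefixed incoming path whose first segment happens to be a language code will be treated as prefixed; the sketch does not address that ambiguity.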
------------------------------------------------------------------------
Tue, 30 Aug 2005 00:26:27 +0000 : fago

The patch applies (with offset) and doesn't break anything, as far as I could see. A lot of people are interested in i18n, so please include this last one. Furthermore, the possibility to inject a query string might be useful in other cases.
+1
------------------------------------------------------------------------
Tue, 13 Sep 2005 12:48:29 +0000 : mgifford

This is a worthwhile hook to add to the core code. It will ease the implementation of more multilingual sites in Drupal and provide better support for a broader community of users/developers.

+1
------------------------------------------------------------------------
Tue, 13 Sep 2005 15:22:10 +0000 : Souvent22

+1. This patch is needed. We don't live in a vacuum, you know; there are more languages than English. :) I hope this gets in. I just made a site for someone in Italy, and this could help when making modules.
------------------------------------------------------------------------
Tue, 13 Sep 2005 18:02:25 +0000 : Dries

If this patch gets committed, there will be a third mechanism to rewrite URLs. Also, it is a well-known fact that url() and l() are a performance bottleneck. I think we need to take a step back, see how we can overcome the limitations of the current system, and come up with a simple yet fast URL rewrite mechanism. There ought to be a better way.
------------------------------------------------------------------------
Tue, 13 Sep 2005 18:24:22 +0000 : chx
Attachment: http://drupal.org/files/issues/url_rewrite_3.patch (1.87 KB)

This version implements a hook in drupal_get_path_alias() and in drupal_get_normal_path(). The performance hit should be negligible in most cases: a foreach on an empty array.
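chx's url_rewrite_3.patch moves the hook into drupal_get_path_alias() and drupal_get_normal_path(), so that when no module implements it the only cost is a loop over an empty list. The following Python sketch shows that shape only; the alias table, the two hook lists, and the choice to run incoming hooks in reverse order (so each one undoes its outbound counterpart) are assumptions of the sketch, not details taken from the patch.

```python
# Hypothetical core alias table: internal path -> alias.
ALIAS_MAP = {"node/18": "about"}

# Hypothetical hook registries; empty lists mean no module implements the hook.
OUTGOING_HOOKS = []  # callables: path -> path, applied after the alias lookup
INCOMING_HOOKS = []  # callables: path -> path, applied before the reverse lookup


def drupal_get_path_alias(path):
    result = ALIAS_MAP.get(path, path)
    for hook in OUTGOING_HOOKS:  # an empty loop when nothing is registered
        result = hook(result)
    return result


def drupal_get_normal_path(path):
    # Undo outbound rewrites in reverse registration order (sketch assumption).
    for hook in reversed(INCOMING_HOOKS):
        path = hook(path)
    reverse_map = {alias: src for src, alias in ALIAS_MAP.items()}
    return reverse_map.get(path, path)
```

Registering a prefixing function on the outgoing side and a matching prefix-stripping function on the incoming side turns `node/18` into `en/about` and back again. Whether such hooks can rely on any particular order is exactly what Goba and chx debate further down.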
------------------------------------------------------------------------
Tue, 13 Sep 2005 18:46:39 +0000 : chx
Attachment: http://drupal.org/files/issues/url_rewrite_4.patch (1.89 KB)
------------------------------------------------------------------------
Tue, 13 Sep 2005 18:50:22 +0000 : Dries

Moving the code around doesn't change a thing; it's still a third/new mechanism to rewrite URLs. I'll take a closer look at this as time permits.
------------------------------------------------------------------------
Tue, 13 Sep 2005 19:00:54 +0000 : chx
Attachment: http://drupal.org/files/issues/url_rewrite_5.patch (1.89 KB)

No, it's only a second mechanism, as it removes conf_url_rewrite: conf_url_rewrite routines can be moved to a module and thus shared as a module. And this way you can have more than one rewrite.
------------------------------------------------------------------------
Tue, 13 Sep 2005 23:48:42 +0000 : Jose A Reyero
Attachment: http://drupal.org/files/issues/url_rewrite_alt.patch (1.61 KB)

+1 for chx (plus an alternative patch)

I like chx's patch, and I also think that getting rid of 'conf_url_rewrite' and replacing it with a hook is a good thing.

However, if the main concern is performance, we could also use 'conf_url_rewrite', if only all paths were run through it unconditionally.

So, making clear that I'd prefer chx's solution, here's an alternative one.
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:09:18 +0000 : Goba

I like chx's generic version (url_rewrite_5.patch) best, as it replaces an awkward URL rewrite mechanism (introduced by myself, under pressure for performance) with a much cleaner, albeit slightly less performant, solution.

BTW, there is a spelling mistake: chx wrote 'inccoming' in the patch. Also, I see no reason to check for empty($arguments) in arg() at all, since having $q set properly (after this patch) would also ensure that $arguments is properly set.
Note that this patch also fixes a small performance problem in arg(): currently it always tries to do an explode() if $_GET['q'] is empty, that is, on the homepage.
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:39:26 +0000 : eldarin

Another approach which also works well: in menu.module:menu_execute_active_handler(), right after setting $path to the q variable, I call a module I named "urlpatterns", which resolves incoming URLs. There I match against a set of URL regexp patterns configured by the module admin. That way the URL rewrite happens very early in the user agent's request to the server.

The reason for this is further down in menu.module:menu_execute_active_handler(), where I have an AAA function which decides whether access should be given on a configurable per-URL basis, configured with another module doing AAA. That way the security settings mimic .htaccess in some ways, while having the power of regexp flexibility as well as very good extensibility.

The get_normal_path() and get_path_alias() functions are then routed to a check and lookup in the "urlpatterns" module as well as the AAA module. The immediate benefit of this is that I don't link anywhere on the server where access is denied to the user. It improves security somewhat, as well as performance, since I only let one "subscriber" hang on to the hook given in menu_execute_active_handler().

In my specialized case, I see no reason to have more than one URL rewriting module, since it could possibly become a large spaghetti mess with unpredictable results if there is no central way of assuring URLs. I take care of "sub-URL aliasing" with regexp rules, so there should be no reason to do so either. It keeps security a bit tidier.
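eldarin's "urlpatterns" approach resolves the incoming path once, early in the request, against an admin-configurable list of weighted regexp rules. His code is not posted in the thread, so the following Python sketch only illustrates the general idea under stated assumptions: lower weights are tried first, the first full match wins, and the admin can disable a rule. `Rule` and `resolve_incoming` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    weight: int           # lower weights are tried first; the admin may change them
    pattern: str          # regexp that must match the whole incoming path
    replacement: str      # template for Match.expand(), e.g. r"node/\1"
    enabled: bool = True  # the admin may disable a rule suggested by a module


def resolve_incoming(path, rules):
    """Return the internal path produced by the first enabled matching rule."""
    for rule in sorted(rules, key=lambda r: r.weight):
        if not rule.enabled:
            continue
        match = re.fullmatch(rule.pattern, path)
        if match:
            return match.expand(rule.replacement)
    return path  # no rule matched: leave the path unchanged
```

For example, with `Rule(0, r"blog/latest", "node/99")` and `Rule(10, r"blog/(.+)", r"node/\1")`, the path `blog/latest` resolves to `node/99` because the lighter rule is tried first, while `blog/42` falls through to `node/42`. The sketch covers only the incoming direction; eldarin says his reverse (outgoing) lookup is still unsolved.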
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:44:01 +0000 : Goba

Well, true, the hook version of the patch does not ensure any particular order in which the modules are called (currently alphabetical), so it needs careful programmers to implement the hooks.
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:44:56 +0000 : eldarin

In my opinion, performance and security are key to successful URL aliasing/rewrites. Having it moved to a module is much better than the current imperfect scheme.

I can't see the need for multiple modules to have direct access to rewriting URLs, though. That's why I use weighted regexp patterns, which are flexible enough to also work as sub-URL rewriting for any module that would register such a rewrite rule, in the same manner as menu building with callbacks.

Does this make any sense? It should give much better performance and security than the current (and suggested) schemes, no?
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:48:16 +0000 : eldarin

Goba, yes. The effect of the multi-module rewrite policy would probably be a loss of URL control for the site admin, and total chaos. It would also require massive effort from module contributors to ensure that their modules' rewriting would behave.

My outlined solution, which works well in practical terms for my needs, handles this by allowing the site admin to modify the weights of (and even disable) rules suggested by modules. That should make life a lot easier for everyone.
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:51:02 +0000 : eldarin

I meant to say "multi-module direct rewrite policy", where modules have direct control, even though the site admin might have perceived control via multiple admin configuration pages scattered among all the modules that would implement a URL rewrite.
;-)
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:52:32 +0000 : Goba

OK, instead of talking, let us see how your weighted regexps solution works (in the form of a patch or at least a code example). I have an idea, but it might be far from what you actually do.
------------------------------------------------------------------------
Wed, 14 Sep 2005 09:59:40 +0000 : eldarin

The way I do reverse lookup for outgoing matching is far from perfect right now, and is something I haven't had time to completely figure out. I was thinking of just using a similarly weighted reverse table (actually using the same patterns for now), but it would be better to have separate, although possibly overlapping, rulesets for incoming and outgoing. They are exclusive in matching, since they apply to the incoming and outgoing domains respectively.

An idea would be to have the modules suggest each rule as one of three kinds: for incoming, for outgoing, or for both.

My solution involves a lot of security configuration; that's why it was set up in the very early menu URL handling, with only cached lookups in the other functions.
------------------------------------------------------------------------
Wed, 14 Sep 2005 10:04:47 +0000 : eldarin

The weighting of the ruleset is really straightforward: a tree graph, really. The intricate bit is optimizing performance for this tree graph with regard to the security rules as well.

Yes, the proof is in the pudding, as they say... ;-)
------------------------------------------------------------------------
Wed, 14 Sep 2005 10:19:30 +0000 : eldarin

I forgot to mention that this approach broke normal core URL aliasing support. I solved that by hooking into nodeapi for the insertion of aliases, just like core does.

Also, have the issues in http://drupal.org/node/21938 and http://drupal.org/node/22035 been overlooked since then?
A further possibility would be to use t() for matching wording, but I guess the best approach is to implement that directly on the rules and then further expand the tree graph if multilanguage aliasing were needed for a site (think of something along the lines of pathauto). But I had no need for translation on my project.
------------------------------------------------------------------------
Wed, 14 Sep 2005 12:04:40 +0000 : Jose A Reyero

Well, summarizing (and adding a little bit :-)):
- eldarin's solution looks like an interesting path to explore, but it still has some side effects. I'll leave this for the future, maybe another thread?
- chx's patch looks like a good one, but has some performance concerns.
- Then there's my patch, which is a performance-safe, middle-way alternative.

So I propose we focus on the performance side and do some benchmarking with chx's patch. I'd like to know whether we have any data on how much a generic new hook (not actually calling any module, only the hook execution itself), called something like a hundred times per page, actually impacts performance. Does anybody know?

And... in case performance is too bad... I'm thinking of a variable, which could be set by modules and hidden from the user, to enable/disable this hook... How does this sound?
------------------------------------------------------------------------
Wed, 14 Sep 2005 12:14:02 +0000 : chx

Re. performance: I'll test these solutions with ab later, but I am absolutely not convinced that a foreach on an empty array is much slower than a function_exists() against a non-existent function. I expect both to be negligible. Stay tuned.
------------------------------------------------------------------------
Wed, 14 Sep 2005 13:30:16 +0000 : Dries

Goba: do you have time to follow up on this and make the final decision on this problem? I'm asking because it takes some digging to understand the problem and evaluate the different approaches.
If we want to commit this before the freeze, I'll have to delegate this job. (Also make sure leftovers and old mechanisms get removed.) If so, just give me a 'go' and I'll commit it after minimal testing.
------------------------------------------------------------------------
Wed, 14 Sep 2005 20:30:52 +0000 : Goba

I have just arrived home, and I am going to sleep, as tomorrow I have lessons at the university. Sadly, I won't be able to look at chx's patches and ab results before tomorrow afternoon. Since I think i18n is one of the most important aspects of the upcoming Drupal release, I will try to be around tomorrow afternoon to look at whatever results have come in by then.
------------------------------------------------------------------------
Wed, 14 Sep 2005 21:24:17 +0000 : Dries

If you can look at this patch before Sunday, I'll let it slip in (given you think it is ready).
------------------------------------------------------------------------
Thu, 15 Sep 2005 11:17:32 +0000 : chx
Attachment: http://drupal.org/files/issues/url_rewrite_6.patch (1.87 KB)

Removed array_reverse. Goba pointed out that URL rewrite implementations should work in any order. So it's not en/node/18 but lang/en/node/18.
------------------------------------------------------------------------
Thu, 15 Sep 2005 14:56:11 +0000 : Goba
Attachment: http://drupal.org/files/issues/Drupal_path_alias_alternate_by_goba.patch (2.12 KB)

Chx, making the URL look uglier just to make it work in any order is not an option IMHO. The solution needs to be better.

Here is another alternative version. What it does very closely resembles Jose's last patch: it keeps conf_url_rewrite (albeit under a different name, custom_url_rewrite), with different parameterization.
I made it consistent with the drupal_lookup_path() function in parameter order and meaning, adding a third parameter to signal whether the path is already aliased (this might be badly needed in solutions where an already aliased path should not be aliased further; we had enough discussion about this in the past).

The pluses of this patch are that the architecture stays nearly the same and the performance decrease should be negligible (a function_exists() is always evaluated, even if there is no such function), while we now have the option to alias already aliased URLs (but we don't need to).

The custom_url_rewrite() function can actually do a foreach on hooks, if someone wants to do that.

While preparing this patch, it struck me that we don't actually need drupal_get_normal_path() or drupal_get_path_alias(), as they are just very thin wrappers around drupal_lookup_path(). In fact, by incorporating their contents into drupal_lookup_path(), we can (page-level) cache the results in the $map array. That is, provided that the results of custom_url_rewrite() are time-insensitive (there is no parameter determining the return value other than those passed).

Opinions?
------------------------------------------------------------------------
Thu, 15 Sep 2005 18:05:05 +0000 : Jose A Reyero
Attachment: http://drupal.org/files/issues/Drupal_path_alias_alternate_by_goba_02.patch (2.08 KB)

Goba, I like your latest patch.

Just another small twist: passing the actual original path, one more piece of 'inexpensive' info...
> The custom_url_rewrite() function can actually do a foreach on hooks, if someone wants to do that.
This one, I think, is a great idea :-) However, I'd still like to know how 'expensive' one more hook for each outgoing link is...