On 22 Jul 2006, at 19:32, Larry Garfield wrote:
On Saturday 22 July 2006 10:30, Dries Buytaert wrote:
1. Build a caching algorithm that uses a heuristic to pre-load frequently used URL aliases.
* Advantages: transparent, no configuration required
* Disadvantages: it's a heuristic, we don't know how it would perform, it might be tricky to implement, and MySQL does this implicitly (but not as aggressively).
I really like LRU conceptually, but I don't know how we'd implement it. If done in the database, we'd have to write the last-access time back to the database each time an alias is accessed, doubling the number of queries (unless someone knows of a portable update-on-access field in SQL?).
You'd only do that once in a while ... like, at most once every 10 minutes you'd update the 'frequency counter' (or whatnot).
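The throttled write-back can be sketched as follows. This is a minimal illustration in Python with SQLite, not Drupal code: it assumes a hypothetical `last_access` column on the url_alias table (Larry's LRU idea) and hypothetical helper names. Because the timestamp comes back in the same SELECT as the alias, a recently touched alias costs no extra query; the UPDATE only runs when the stored value is at least 10 minutes old.

```python
import sqlite3

# Refresh a stored timestamp at most once per interval ("at most once every
# 10 minutes"), so a hot alias costs one extra UPDATE per interval instead
# of one per page view.
FLUSH_INTERVAL = 600  # seconds

def make_db():
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: url_alias (src, dst) plus an assumed last_access
    # column (Unix timestamp) used for LRU-style ranking.
    db.execute("CREATE TABLE url_alias ("
               "src TEXT PRIMARY KEY, dst TEXT, last_access INTEGER DEFAULT 0)")
    return db

def lookup(db, src, now, stats):
    """Return the alias for src, or None. The timestamp is fetched by the
    same SELECT, so a fresh entry costs no extra query."""
    row = db.execute("SELECT dst, last_access FROM url_alias WHERE src = ?",
                     (src,)).fetchone()
    if row is None:
        return None
    dst, last_access = row
    if now - last_access >= FLUSH_INTERVAL:
        db.execute("UPDATE url_alias SET last_access = ? WHERE src = ?", (now, src))
        stats["updates"] += 1
    return dst

def preload(db, limit):
    """Heuristic pre-load: the `limit` most recently used aliases."""
    rows = db.execute("SELECT src, dst FROM url_alias "
                      "ORDER BY last_access DESC LIMIT ?", (limit,)).fetchall()
    return dict(rows)

db = make_db()
db.executemany("INSERT INTO url_alias (src, dst) VALUES (?, ?)",
               [("node/1", "about"), ("node/2", "contact")])
stats = {"updates": 0}
start = 1_000_000
for t in range(0, 1200, 10):  # 120 views of node/1, one every 10 s, for 20 min
    lookup(db, "node/1", start + t, stats)
print(stats["updates"])   # 2: at t=0 and t=600
print(preload(db, 1))     # {'node/1': 'about'}
```

In this sketch 120 page views produce 2 writes instead of 120, and `preload()` is one possible heuristic for option 1: load the N most recently used aliases up front. The throttle is per alias, so the write rate is bounded by the number of distinct hot aliases, not by traffic.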
Why would you need a textarea and regexes? Just add a "pre-cache" checkbox to the edit-alias screen. Then the first time the alias lookup is called, it does a quick "SELECT ... FROM ... WHERE precache=1". That gets you what the admin thinks are the most common aliases, and neither the UI nor the code could get any simpler.
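A minimal sketch of the checkbox approach, again in Python with SQLite purely for illustration. Only the `precache` column name comes from the query above; the rest of the schema, the column list of the bulk SELECT, and the `PrecacheLookup` class are hypothetical.

```python
import sqlite3

class PrecacheLookup:
    """Loads every alias an admin flagged 'pre-cache' on the first lookup,
    then falls back to one query per path for everything else."""

    def __init__(self, db):
        self.db = db
        self.cache = None  # filled lazily on the first lookup
        self.queries = 0

    def lookup(self, src):
        if self.cache is None:
            # One bulk query for the aliases the admin chose.
            rows = self.db.execute(
                "SELECT src, dst FROM url_alias WHERE precache = 1").fetchall()
            self.queries += 1
            self.cache = dict(rows)
        if src not in self.cache:
            row = self.db.execute(
                "SELECT dst FROM url_alias WHERE src = ?", (src,)).fetchone()
            self.queries += 1
            self.cache[src] = row[0] if row else None  # remember misses too
        return self.cache[src]

db = sqlite3.connect(":memory:")
# Hypothetical schema: url_alias (src, dst) plus the precache flag.
db.execute("CREATE TABLE url_alias (src TEXT PRIMARY KEY, dst TEXT, "
           "precache INTEGER DEFAULT 0)")
db.executemany("INSERT INTO url_alias VALUES (?, ?, ?)",
               [("node/1", "about", 1), ("node/2", "contact", 1),
                ("node/3", "archive", 0)])
paths = PrecacheLookup(db)
for p in ["node/1", "node/2", "node/1"]:
    paths.lookup(p)
print(paths.queries)  # 1: all three served by the bulk pre-cache query
```

Unflagged paths still cost one query each, so the benefit depends entirely on how well the admin's choice matches real traffic.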
That would solve the 'user complexity' problem.
Disadvantage: That's assuming the admin has any idea what the most common aliases are. :-)
That might actually be a very fair assumption. :)
On the subject of black-listing, though, does anyone ever alias a path that's under admin/? The biggest drain from aliasing that I see right now is all of the queries to look up paths that aren't aliased in the first place.
I don't, but apparently people who localize Drupal also choose to localize admin/* URLs. I think that, by now, we all agree that we can't make any assumptions about how people use the URL alias functionality. Let's keep that in mind during the remainder of this discussion.
4. Stop doing SQL queries once you have cached all possible URL aliases.
* Advantages: transparent, no configuration required, can co-exist with (1), (2), (3) and (5).
* Disadvantages: only works for a subset of all Drupal sites, not a solution for larger Drupal sites.
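Option 4 can be sketched like this (Python with SQLite for illustration, not Drupal code; the `CompletingCache` class is hypothetical). It counts the aliases once, like the COUNT(pid) query in drupal_lookup_path(), and stops querying as soon as every alias has been found, or immediately if there are none. It only covers the path-to-alias direction and assumes one alias per source path; with several aliases per path the count is never reached and it simply falls back to per-path queries.

```python
import sqlite3

class CompletingCache:
    """Counts aliases once per request; stops querying the database as soon
    as every alias has been seen (or right away if there are none)."""

    def __init__(self, db):
        self.db = db
        self.queries = 1
        self.total = db.execute("SELECT COUNT(*) FROM url_alias").fetchone()[0]
        self.cache = {}  # src -> dst, or None for "known not aliased"
        self.found = 0   # number of real aliases cached so far

    def lookup(self, src):
        if src in self.cache:
            return self.cache[src]
        if self.found == self.total:
            # Every alias is already in memory, so src cannot have one.
            return None
        row = self.db.execute(
            "SELECT dst FROM url_alias WHERE src = ?", (src,)).fetchone()
        self.queries += 1
        dst = row[0] if row else None
        self.cache[src] = dst
        if dst is not None:
            self.found += 1
        return dst

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE url_alias (src TEXT PRIMARY KEY, dst TEXT)")

empty = CompletingCache(db)  # no aliases at all
for path in ["node", "node/1", "admin"]:
    empty.lookup(path)
print(empty.queries)  # 1: only the COUNT query

db.execute("INSERT INTO url_alias VALUES ('node/feed', 'rss.xml')")
one = CompletingCache(db)    # a single alias
for path in ["node", "node/feed", "node/1", "admin"]:
    one.lookup(path)
print(one.queries)    # 3: COUNT, 'node', 'node/feed'; then no more queries
```

The single-alias case also illustrates the ordering caveat: queries stop only after the last alias has been looked up, so an alias rendered late in the page delays the savings.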
Also doesn't take into account the order in which the page is built. If you only have 5 aliases, but they're all primary links, those are built rather late (I think?). So the system wouldn't finish loading all aliases until it was nearly done with the page anyway.
That is correct. It would only stop executing SQL queries once drupal_lookup_path()'s local cache is 'complete'. If this is a simple change (a couple of lines of code), it might well be worth implementing. If it gets tricky, this optimization probably isn't worth it. It's worth investigating.

In a default Drupal install there is exactly one URL alias:

  modules/system/system.install: $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('node/feed', 'rss.xml')");
  modules/system/system.install: $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('rss.xml', 'node/feed')");

  $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));

Combined with the above SQL query taken from drupal_lookup_path(), this means that the check "$count > 0" in drupal_lookup_path() will _always_ be true, so we always query the database for each URL. If $count == 0, Drupal will not query the url_alias table. But unless you actually remove that one alias, you'll never take advantage of that check.

As an extra optimization, I suggest that we get rid of that rss.xml URL alias, so that by default Drupal doesn't generate hundreds of SQL queries, AND so that for people who don't use path aliases, drupal_lookup_path() never triggers any SQL queries. To get rid of the path alias, we could hardcode 'rss.xml' in the code (rather than the much uglier 'node/feed') and extend legacy.module to make the old URL work. It's also not clear why we insert two aliases ... looks like a bug in system.install.

Before we continue discussing this for hours and hours, I'm wondering: is someone willing to work on this? We've come to a point where some things could actually be implemented/fixed.

--
Dries Buytaert :: http://www.buytaert.net/