[development] Caching, caching, caching...
Earl Miles
merlin at logrus.com
Sat Jul 22 18:05:54 UTC 2006
Dries Buytaert wrote:
>
> On 22 Jul 2006, at 19:32, Larry Garfield wrote:
>
>> On Saturday 22 July 2006 10:30, Dries Buytaert wrote:
>>
>>> 1. Build a caching algorithm that uses a heuristic to pre-load
>>> frequently used URL aliases.
>>>
>>> * Advantages: transparent, no configuration required
>>>
>>> * Disadvantages: it's a heuristic, we don't know how it would
>>> perform, it might be tricky to implement, and MySQL does this
>>> implicitly (but not as aggressively).
>>
>>
>> I really like LRU conceptually, but I don't know how we'd implement it.
>> If done in the database, we'd have to write the last-access time back to
>> the database each time an alias is accessed, doubling the number of
>> queries (unless someone knows of a portable update-on-access field in
>> SQL?).
>
>
> You'd only do that once in a while ... like, at most once every 10
> minutes, you'd update the 'frequency counter' (or whatnot).
>
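The "at most once every 10 minutes" idea can be read as a throttled write-back of the last-access time: a lookup writes the timestamp back only when the stored one is older than the interval, so a busy alias costs one UPDATE per interval rather than one per hit. The PHP sketch below assumes a hypothetical 'accessed' column on {url_alias} (Drupal has no such column) and uses an array in place of the table; touch_alias() is an illustrative name, not a Drupal function.

```php
<?php
// Hypothetical sketch: each alias row carries a last-access timestamp in an
// assumed 'accessed' column, but the timestamp is only written back when the
// stored value is at least $interval seconds old. $aliases stands in for
// {url_alias}; none of this is existing Drupal code.

function touch_alias(array &$aliases, $src, $now, $interval = 600) {
  if (!isset($aliases[$src])) {
    return false;                        // not aliased: nothing to record
  }
  if ($now - $aliases[$src]['accessed'] < $interval) {
    return false;                        // stamp is recent enough: skip the write
  }
  // In Drupal this would be an UPDATE on the assumed 'accessed' column.
  $aliases[$src]['accessed'] = $now;
  return true;                           // one write was issued
}

$aliases = array(
  'node/1' => array('dst' => 'about', 'accessed' => 0),
);

$writes = 0;
// Simulate one lookup per second for 30 minutes.
for ($t = 1; $t <= 1800; $t++) {
  if (touch_alias($aliases, 'node/1', $t)) {
    $writes++;
  }
}
echo $writes, "\n";
```

Thirty minutes of once-per-second lookups produce three writes instead of 1,800, and because the stamp travels with the alias row, the check works within a single PHP request without any shared in-memory state.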
>> Why would you need a textarea and regexes? Just add a "pre-cache"
>> checkbox to the edit-alias screen. Then the first time the alias lookup
>> is called, it does a quick "SELECT ... FROM ... WHERE precache=1". That
>> gets you what the admin thinks are the most common aliases, and neither
>> the UI nor the code could get any simpler.
>
>
> That would solve the 'user complexity' problem.
>
>> Disadvantage: that assumes the admin has any idea what the most common
>> aliases are. :-)
>
>
> That might actually be a very fair assumption. :)
>
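A minimal PHP sketch of the pre-cache checkbox, assuming a hypothetical 'precache' column on {url_alias}: the first lookup in a request loads every flagged alias with one query, and any other path falls back to the usual single-row query, with the result (including "no alias") remembered for the rest of the request. AliasLookup and its query counter are illustrative only, and an array stands in for the table.

```php
<?php
// Hypothetical sketch of the "pre-cache" checkbox idea. The 'precache'
// column and this class are assumptions, not existing Drupal code.

class AliasLookup {
  private $table;           // rows of array('src', 'dst', 'precache')
  private $map = null;      // src => dst, or false for "known to have no alias"
  public $queries = 0;      // simulated database round trips

  public function __construct(array $table) {
    $this->table = $table;
  }

  public function alias($src) {
    if ($this->map === null) {
      // First call in this request:
      // SELECT src, dst FROM {url_alias} WHERE precache = 1
      $this->queries++;
      $this->map = array();
      foreach ($this->table as $row) {
        if ($row['precache']) {
          $this->map[$row['src']] = $row['dst'];
        }
      }
    }
    if (!array_key_exists($src, $this->map)) {
      // Not preloaded: SELECT dst FROM {url_alias} WHERE src = '%s'
      $this->queries++;
      $this->map[$src] = false;
      foreach ($this->table as $row) {
        if ($row['src'] === $src) {
          $this->map[$src] = $row['dst'];
          break;
        }
      }
    }
    return $this->map[$src];
  }
}

$lookup = new AliasLookup(array(
  array('src' => 'node/1', 'dst' => 'about', 'precache' => 1),
  array('src' => 'node/2', 'dst' => 'contact', 'precache' => 1),
  array('src' => 'node/3', 'dst' => 'archive', 'precache' => 0),
));
$lookup->alias('node/1');   // answered from the preloaded set
$lookup->alias('node/2');   // answered from the preloaded set
$lookup->alias('node/3');   // not flagged: one extra query
$lookup->alias('node/3');   // now cached for the rest of the request
echo $lookup->queries, "\n";
```

The four lookups cost two queries: the preload and the single miss for node/3. How much this saves depends entirely on how well the admin's choice of flagged aliases matches real traffic.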
>> On the subject of blacklisting, though, does anyone ever alias a path
>> that's under admin/? The biggest drain from aliasing that I see now is
>> all of the queries to look up paths that aren't aliased in the first
>> place.
>
>
> I don't, but apparently people who localize Drupal also choose to
> localize admin/* URLs. I think that, by now, we all agree that we can't
> make any assumptions about how people use the URL alias functionality.
> Let's keep that fact in mind during the remainder of this discussion.
>
>>> 4. Stop doing SQL queries once you have cached all possible URL aliases.
>>>
>>> * Advantages: transparent, no configuration required, can coexist
>>> with (1), (2), (3) and (5).
>>>
>>> * Disadvantages: only works for a subset of all Drupal sites, not
>>> a solution for larger Drupal sites.
>>
>>
>> Also doesn't take into account the order in which the page is built. If
>> you only have 5 aliases, but they're all primary links, those are built
>> rather late (I think?). So the system wouldn't finish loading all
>> aliases until it was nearly done with the page anyway.
>
>
> That is correct. It would only stop executing SQL queries once
> drupal_lookup_path()'s local cache is 'complete'. If this is a simple
> change (changing a couple lines of code), this might be well worth
> implementing. If it gets tricky, this optimization probably isn't
> worth it. It's worth investigating.
>
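Option (4) could be sketched roughly as follows, assuming aliases are looked up only in the src-to-dst direction and that each src has at most one row: keep the $count that drupal_lookup_path() already fetches, count how many distinct aliases the current request has found, and stop querying once the two match, because every remaining path must then be unaliased. The class name is hypothetical and an array stands in for {url_alias}.

```php
<?php
// Hypothetical sketch of "stop querying once the local cache is complete".
// Assumes one row per src; if a src had several rows, $found would never
// reach $count and the shortcut would simply never fire.

class CompletingAliasCache {
  private $table;         // src => dst, stands in for {url_alias}
  private $count;         // result of SELECT COUNT(pid) FROM {url_alias}
  private $map = array(); // src => dst, or false for "no alias"
  private $found = 0;     // distinct aliases found so far in this request
  public $queries = 0;    // simulated per-path queries

  public function __construct(array $table) {
    $this->table = $table;
    $this->count = count($table);
  }

  public function alias($src) {
    if (array_key_exists($src, $this->map)) {
      return $this->map[$src];
    }
    if ($this->found >= $this->count) {
      // Every alias is already in $map, so $src cannot have one.
      return $this->map[$src] = false;
    }
    // SELECT dst FROM {url_alias} WHERE src = '%s'
    $this->queries++;
    $dst = $this->table[$src] ?? false;
    if ($dst !== false) {
      $this->found++;
    }
    return $this->map[$src] = $dst;
  }
}

$cache = new CompletingAliasCache(array('node/1' => 'about', 'node/2' => 'contact'));
foreach (array('node/1', 'node/2', 'user', 'admin', 'node/5') as $path) {
  $cache->alias($path);
}
echo $cache->queries, "\n";
```

Five lookups cost two queries; once both aliases have been seen, 'user', 'admin' and 'node/5' are answered without touching the database. When $count is 0 the very first lookup already skips the query, so the existing "$count > 0" check falls out as a special case. As Larry points out, though, the saving only starts after the last alias has been seen.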
> In a default Drupal install there is exactly one URL alias:
>
> modules/system/system.install: $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('node/feed', 'rss.xml')");
> modules/system/system.install: $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('rss.xml', 'node/feed')");
>
> $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
>
> Combined with the above SQL query taken from drupal_lookup_path(), this
> means that the check "$count > 0" in drupal_lookup_path() will _always_
> be true, so we always query the database for each URL. If $count == 0,
> Drupal will not query the url_alias table. But unless you actually
> remove that one alias, you'll never take advantage of that check.
>
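To put numbers on the effect of that single default alias, here is a simplified PHP model of the guard (not Drupal's actual code): one COUNT(pid) query per request, then one query per distinct path whenever $count > 0. The page of 100 distinct node links is an arbitrary assumption, and queries_for_page() is an illustrative name.

```php
<?php
// Simplified model of the "$count > 0" guard in drupal_lookup_path():
// the COUNT(pid) query runs once per request, and per-path lookups are
// skipped only when $count is 0. $aliases stands in for {url_alias}.

function queries_for_page(array $aliases, array $paths) {
  $queries = 1;                     // SELECT COUNT(pid) FROM {url_alias}
  $count = count($aliases);
  $map = array();                   // per-request cache of lookups
  foreach ($paths as $path) {
    if ($count > 0 && !array_key_exists($path, $map)) {
      $queries++;                   // SELECT dst FROM {url_alias} WHERE src = '%s'
      $map[$path] = $aliases[$path] ?? false;
    }
  }
  return $queries;
}

$links = array();
for ($i = 1; $i <= 100; $i++) {
  $links[] = "node/$i";             // 100 distinct internal links on one page
}

echo queries_for_page(array(), $links), "\n";                         // no aliases at all
echo queries_for_page(array('node/feed' => 'rss.xml'), $links), "\n"; // only the default alias
```

With an empty table the page costs only the COUNT query (the script prints 1); with just the default node/feed to rss.xml alias it costs 101, even though none of the 100 links is aliased.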
> As an extra optimization, I suggest that we get rid of that rss.xml URL
> alias, so that by default Drupal doesn't generate hundreds of SQL
> queries, AND so that for people who don't use path aliases,
> drupal_lookup_path() never triggers any per-path SQL queries. To get rid
> of the path alias, we could hardcode 'rss.xml' in the code (rather than
> the much uglier 'node/feed') and extend legacy.module to make the old
> URL work.
>
> It's also not clear why we insert two aliases ... looks like a bug in
> system.install.
>
>
> I'm wondering, before we continue discussing this for hours and hours,
> whether someone is willing to work on this. We've come to a point where
> some things could actually be implemented or fixed.
Please look at http://drupal.org/node/40860
Many of you have looked at it, but it's been a while. I wrote this patch
six months ago because I identified the url alias query as an obvious
optimization problem.
The patch I wrote for this is quite a bit simpler than what Dries is
proposing here, but I found that it worked extremely well and really
reduced the number of url_alias queries being run. Most of the
url_alias queries were coming from the menu system (surprise).
In my patch, the system is vaguely heuristic: it datestamps the alias
cache, and when the cache is 'full' it removes the items that haven't
been checked recently.
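The eviction policy described above can be sketched in PHP as a bounded cache that stamps each entry with the time it was last checked and, when full, drops the entry with the oldest stamp (i.e. least-recently-used eviction). This is not the code from http://drupal.org/node/40860; the class name, the capacity and the caller-supplied $now clock are assumptions, and persisting the cache between requests is left out.

```php
<?php
// Hypothetical sketch of a datestamped, size-bounded alias cache with
// least-recently-checked eviction. Not the code from the patch.

class StampedAliasCache {
  private $capacity;
  private $entries = array();  // src => array('dst' => ..., 'checked' => time)

  public function __construct($capacity) {
    $this->capacity = $capacity;
  }

  // Returns the cached alias (false if cached as "no alias"), or null on a miss.
  public function get($src, $now) {
    if (!array_key_exists($src, $this->entries)) {
      return null;
    }
    $this->entries[$src]['checked'] = $now;  // refresh the datestamp
    return $this->entries[$src]['dst'];
  }

  public function set($src, $dst, $now) {
    if (!array_key_exists($src, $this->entries) && count($this->entries) >= $this->capacity) {
      // Cache is full: drop the entry that has gone unchecked the longest.
      $oldest = null;
      foreach ($this->entries as $key => $entry) {
        if ($oldest === null || $entry['checked'] < $this->entries[$oldest]['checked']) {
          $oldest = $key;
        }
      }
      unset($this->entries[$oldest]);
    }
    $this->entries[$src] = array('dst' => $dst, 'checked' => $now);
  }

  public function keys() {
    return array_keys($this->entries);
  }
}

$cache = new StampedAliasCache(2);
$cache->set('node/1', 'about', 100);
$cache->set('node/2', 'contact', 200);
$cache->get('node/1', 300);            // node/1 is now the most recently checked
$cache->set('node/3', 'archive', 400); // full: evicts node/2, last checked at 200
echo implode(', ', $cache->keys()), "\n";
```

node/2 is evicted because its stamp (200) is older than node/1's refreshed stamp (300), so the script prints "node/1, node/3". A linear scan for the oldest stamp is fine for a small cache; a much larger one would want an ordered structure.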
All the patch needs is some code cleanup (there were some objections to
my variable naming and to my use of a function reference, though I think
the latter is perfectly valid), and it needs to update the cache when url
aliases are actually changed in path.module. All of these are easy.
The hard part is probably updating the patch to HEAD, but even that may
not be all that bad; I've never tried.