[development] Caching, caching, caching...

Earl Miles merlin at logrus.com
Sat Jul 22 18:05:54 UTC 2006


Dries Buytaert wrote:
> 
> On 22 Jul 2006, at 19:32, Larry Garfield wrote:
> 
>> On Saturday 22 July 2006 10:30, Dries Buytaert wrote:
>>
>>> 1. Build a caching algorithm that uses a heuristic to pre-load
>>> frequently used URL aliases.
>>>
>>>     * Advantages: transparent, no configuration required
>>>
>>>     * Disadvantages: it's a heuristic, we don't know how it would
>>> perform, it might be tricky to implement, and MySQL does this
>>> implicitly (but not as aggressively).
>>
>>
>> I really like LRU conceptually, but I don't know how we'd implement it.
>> If done in the database, we'd have to write the last-access-time back to
>> the database each time an alias is accessed, doubling the number of
>> queries (unless someone knows of a portable update-on-access field in
>> SQL?).
> 
> 
> You'd only do that once in a while ... like, at most once every 10
> minutes, you'd update the 'frequency counter' (or whatnot).
> 
>> Why would you need a textarea and regexes? Just add a "pre-cache"
>> checkbox to the edit-alias screen. Then the first time the alias lookup
>> is called, it does a quick "SELECT ... FROM ... WHERE precache=1". That
>> gets you what the admin thinks are the most common aliases, and both the
>> UI and code couldn't get any simpler.
> 
> 
> That would solve the 'user complexity' problem.
> 
>> Disadvantage: That's assuming the admin has any idea what the most
>> common aliases are. :-)
> 
> 
> That might actually be a very fair assumption. :)
> 
>> On the subject of black-listing, though, does anyone ever alias a path
>> that's under admin/? The biggest drain from the aliasing now that I see
>> is all of the queries to look up paths that aren't aliased in the first
>> place.
> 
> 
> I don't, but - apparently - people who localize Drupal also choose to
> localize admin/* URLs. I think that, by now, we all agree on the fact
> that we can't make any assumption about how people use the URL alias
> functionality. Let's keep that fact in mind during the remainder of
> this discussion.
> 
>>> 4. Stop doing SQL queries when you cached all possible URL aliases.
>>>
>>>     * Advantages: transparent, no configuration required, can
>>> co-exist with (1), (2), (3) and (5).
>>>
>>>     * Disadvantages: only works for a subset of all Drupal sites, not
>>> a solution for larger Drupal sites.
>>
>>
>> Also doesn't take into account the order that the page is built. If you
>> only have 5 aliases, but they're all primary links, those are built
>> rather late (I think?). So the system wouldn't finish loading all
>> aliases until it was nearly done with the page anyway.
> 
> 
> That is correct. It would only stop executing SQL queries once
> drupal_lookup_path()'s local cache is 'complete'. If this is a simple
> change (changing a couple lines of code), this might be well worth
> implementing. If it gets tricky, this optimization probably isn't
> worth it. It's worth investigating.
> 
> In a default Drupal install there is exactly one URL alias:
> 
> modules/system/system.install:  $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('node/feed', 'rss.xml')");
> modules/system/system.install:  $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('rss.xml', 'node/feed')");
> 
> $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
> 
> Combined with the above SQL query taken from drupal_lookup_path(), this
> means that the check "$count > 0" in drupal_lookup_path() will _always_
> be true, and that we always query the database for each URL. If $count ==
> 0, Drupal will not query the url_alias table. But unless you actually
> remove that one alias, you'll never take advantage of that check.
> 
> As an extra optimization, I suggest that we get rid of that rss.xml URL
> alias, so that by default, Drupal doesn't generate hundreds of SQL
> queries, AND that for people who don't use path aliases,
> drupal_lookup_path() will never trigger any SQL queries. To get rid of
> the path alias, we could hardcode 'rss.xml' in the code (rather than the
> much uglier 'node/feed') and extend the legacy.module to make the old
> URL work.
> 
> It's also not clear why we insert two aliases ... looks like a bug in
> system.install.
> 
> 
> I'm wondering -- before we continue discussing this for hours and hours
> -- is someone willing to work on this? We've come to a point where
> some things could actually be implemented/fixed.

Please look at http://drupal.org/node/40860

Many of you have looked at it, but it's been a while. I wrote this patch 
six months ago, because I identified the url alias query as an obvious 
target for optimization.

The patch I wrote for this is quite a bit simpler than what Dries is 
proposing here, but I found that it worked extremely well, and greatly 
reduced the number of url_alias queries being issued. Most of the 
url_alias queries were coming from the menu system (surprise).

In my patch, the system is vaguely heuristic -- it datestamps the alias 
cache and when the cache is 'full' it removes items based on what hasn't 
been checked recently.
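
To make the idea concrete, here is a minimal sketch of a stamped cache that 
evicts its stalest entry once it is full. It is not the code from the patch: 
the class name AliasCache, its methods, the size limit, and the logical clock 
used in place of a real datestamp are all made up for illustration.

```php
<?php
// Hypothetical sketch, not the code from the patch: a bounded alias cache
// that stamps each entry on access and evicts the stalest entry when full.
class AliasCache {
  private $entries = array();  // path => array('alias' => ..., 'stamp' => ...)
  private $limit;              // maximum number of entries to keep
  private $clock = 0;          // logical clock; a real cache might use time()

  public function __construct($limit) {
    $this->limit = max(1, (int) $limit);
  }

  // Return the cached alias, or NULL when the path is not cached.
  // A hit refreshes the entry's stamp so it survives eviction longer.
  public function get($path) {
    if (!isset($this->entries[$path])) {
      return NULL;
    }
    $this->entries[$path]['stamp'] = ++$this->clock;
    return $this->entries[$path]['alias'];
  }

  // Store an alias, evicting the least recently checked entry if the
  // cache is full and the path is not already present.
  public function set($path, $alias) {
    if (!isset($this->entries[$path]) && count($this->entries) >= $this->limit) {
      $this->evictOldest();
    }
    $this->entries[$path] = array('alias' => $alias, 'stamp' => ++$this->clock);
  }

  // Report whether a path is cached, without refreshing its stamp.
  public function has($path) {
    return isset($this->entries[$path]);
  }

  // Remove the entry with the smallest stamp (linear scan).
  private function evictOldest() {
    $oldest = NULL;
    foreach ($this->entries as $path => $entry) {
      if ($oldest === NULL || $entry['stamp'] < $this->entries[$oldest]['stamp']) {
        $oldest = $path;
      }
    }
    if ($oldest !== NULL) {
      unset($this->entries[$oldest]);
    }
  }
}
```

The linear scan on eviction is the simplest thing that works; it only runs 
when the cache is full, so for a cache of modest size it should cost far less 
than the queries it saves.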

All the patch needs is some code cleanup (there were some objections to 
my variable naming and to my use of a function reference, though I think 
the latter is perfectly valid), and it needs to update the cache when url 
aliases are actually changed in path.module. All of these are easy.
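
As a rough illustration of that last point, the sketch below drops a cached 
entry whenever its alias is saved or deleted, so the next lookup goes back to 
the database. The function names example_alias_cache() and 
example_alias_changed() are hypothetical; they come from neither path.module 
nor the patch.

```php
<?php
// Hypothetical sketch: keep a static lookup cache in step with alias edits.
// None of these function names come from path.module or from the patch.
function example_alias_cache($op, $path = NULL, $alias = NULL) {
  static $cache = array();
  switch ($op) {
    case 'get':
      // NULL means "not cached", so the caller should query the database.
      return array_key_exists($path, $cache) ? $cache[$path] : NULL;
    case 'set':
      $cache[$path] = $alias;
      return $alias;
    case 'forget':
      unset($cache[$path]);
      return NULL;
  }
  return NULL;
}

// Meant to be called wherever an alias is saved or deleted, so that a
// stale entry is dropped instead of being served from the cache.
function example_alias_changed($path) {
  example_alias_cache('forget', $path);
}
```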

The hard part is probably updating the patch to HEAD, but even that may 
not be all that bad; I haven't tried.

