[development] Caching, caching, caching...

Sat Jul 22 17:57:50 UTC 2006

On 22 Jul 2006, at 19:32, Larry Garfield wrote:

> On Saturday 22 July 2006 10:30, Dries Buytaert wrote:
>
>> 1. Build a caching algorithm that uses a heuristic to pre-load
>> frequently used URL aliases.
>>
>>     * Advantages: transparent, no configuration required
>>
>>     * Disadvantages: it's a heuristic, we don't know how it would
>> perform, it might be tricky to implement, and MySQL does this
>> implicitly (but not as aggressively).
>
> I really like LRU conceptually, but I don't know how we'd implement it.  If
> done in the database, we'd have to write the last-access-time back to the
> database each time an alias is accessed, doubling the number of queries
> (unless someone knows of a portable update-on-access field in SQL?).

You'd only do that once in a while ... like, at most once every 10
minutes, you'd update the 'frequency counter' (or whatnot).
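
For illustration only, a minimal PHP sketch of that throttling.  It
assumes a hypothetical per-alias record with 'counter' and
'last_update' fields (neither exists in {url_alias} today), and a plain
array stands in for the database row.  The function name is made up.

```php
<?php
// Hypothetical sketch: alias_touch(), 'counter' and 'last_update' are
// illustrative names, not part of Drupal.  $row stands in for a DB row.

/**
 * Bump the frequency counter for an alias, but at most once per
 * $interval seconds.  Returns TRUE when a write would be issued.
 */
function alias_touch(array &$row, $now, $interval = 600) {
  if (($now - $row['last_update']) < $interval) {
    // Accessed again within the window: skip the write entirely.
    return FALSE;
  }
  // Window elapsed: this is the only place an UPDATE would happen.
  $row['counter']++;
  $row['last_update'] = $now;
  return TRUE;
}

// Five accesses spread over roughly twelve minutes (timestamps in seconds).
$row = array('counter' => 0, 'last_update' => 0);
$writes = 0;
foreach (array(1000, 1005, 1300, 1600, 1700) as $timestamp) {
  if (alias_touch($row, $timestamp)) {
    $writes++;
  }
}
// Only the accesses at 1000 and 1600 cause a write.
```

The counter then becomes a sampled frequency rather than an exact hit
count, which is probably good enough for ranking aliases.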

> Why would you need a textarea and regexes?  Just add a "pre-cache" checkbox to
> the edit-alias screen.  Then the first time the alias lookup is called, it
> does a quick "SELECT ... FROM ... WHERE precache=1".  That gets you what the
> admin thinks are the most common aliases, and both the UI and code couldn't
> get any simpler.

That would solve the 'user complexity' problem.
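
Roughly, and only as a sketch: the 'precache' column and both function
names below are assumptions, and an array stands in for the
{url_alias} table.  The preload corresponds to the single
"WHERE precache=1" query; every other path still costs one lookup the
first time it is seen.

```php
<?php
// Hypothetical sketch: the 'precache' flag and these function names are
// assumptions.  $table stands in for the {url_alias} table.

/** Load every alias flagged for pre-caching (one query in real life). */
function sketch_preload_aliases(array $table) {
  $map = array();
  foreach ($table as $row) {
    if (!empty($row['precache'])) {
      $map[$row['src']] = $row['dst'];
    }
  }
  return $map;
}

/** Look up one path, counting how many simulated queries are needed. */
function sketch_precache_lookup(array $table, array &$map, $src, &$queries) {
  if (array_key_exists($src, $map)) {
    // Preloaded or already seen (a remembered miss is stored as FALSE).
    return $map[$src];
  }
  $queries++;
  $map[$src] = FALSE;
  foreach ($table as $row) {
    if ($row['src'] === $src) {
      $map[$src] = $row['dst'];
      break;
    }
  }
  return $map[$src];
}

$table = array(
  array('src' => 'node/1', 'dst' => 'about', 'precache' => 1),
  array('src' => 'node/2', 'dst' => 'contact', 'precache' => 0),
);
$map = sketch_preload_aliases($table);
$queries = 0;
$first = sketch_precache_lookup($table, $map, 'node/1', $queries);  // preloaded
$second = sketch_precache_lookup($table, $map, 'node/2', $queries); // one query
$third = sketch_precache_lookup($table, $map, 'node/3', $queries);  // one query, no alias
$again = sketch_precache_lookup($table, $map, 'node/3', $queries);  // remembered miss
```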

> Disadvantage: That's assuming the admin has any idea what the most common
> aliases are. :-)

That might actually be a very fair assumption. :)

> On the subject of black-listing, though, does anyone ever alias a path that's
> under admin/?  The biggest drain from the aliasing now that I see is all of
> the queries to look up paths that aren't aliased in the first place.

I don't, but apparently people who localize Drupal also choose to
localize admin/* URLs.  I think that, by now, we all agree that we
can't make any assumptions about how people use the URL alias
functionality.  Let's keep that in mind during the remainder of this
discussion.

>> 4. Stop doing SQL queries when you cached all possible URL aliases.
>>
>>     * Advantages: transparent, no configuration required, can
>> co-exist with (1), (2), (3) and (5).
>>
>>     * Disadvantages: only works for a subset of all Drupal sites, not
>> a solution for larger Drupal sites.
>
> Also doesn't take into account the order in which the page is built.  If you
> only have 5 aliases, but they're all primary links, those are built rather
> late (I think?).  So the system wouldn't finish loading all aliases until it
> was nearly done with the page anyway.

That is correct.  It would only stop executing SQL queries once  
drupal_lookup_path()'s local cache is 'complete'.  If this is a  
simple change (changing a couple lines of code), this might be well  
worth implementing.  If it gets tricky, this optimization probably  
isn't worth it.  It's worth investigating.
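
To make the idea concrete, a rough sketch under stated assumptions:
the function name is made up, an array stands in for {url_alias}, only
the src-to-dst direction is handled, and each src is assumed to have
at most one alias.  Once the local cache holds as many entries as the
table has rows, any path not in the cache is known to be unaliased and
no query is needed.

```php
<?php
// Hypothetical sketch, not the real drupal_lookup_path().
// $table stands in for {url_alias}; $total is its row count.

function sketch_lookup_alias(array $table, array &$cache, $total, $src, &$queries) {
  if (isset($cache[$src])) {
    return $cache[$src];
  }
  if (count($cache) >= $total) {
    // Cache is 'complete': every alias is already loaded, so this
    // path cannot have one.  Skip the query.
    return FALSE;
  }
  $queries++; // simulated SELECT against {url_alias}
  foreach ($table as $row) {
    if ($row['src'] === $src) {
      $cache[$src] = $row['dst'];
      return $row['dst'];
    }
  }
  return FALSE;
}

$table = array(
  array('src' => 'node/1', 'dst' => 'about'),
  array('src' => 'node/2', 'dst' => 'contact'),
);
$cache = array();
$queries = 0;
$total = count($table);
$results = array();
foreach (array('node/9', 'node/1', 'node/2', 'node/9', 'node/1') as $path) {
  $results[] = sketch_lookup_alias($table, $cache, $total, $path, $queries);
}
// The first three lookups each cost a query; the last two cost none.
```

Note that the unaliased 'node/9' still costs a query the first time,
which is exactly the ordering problem Larry describes: the saving only
kicks in after the last alias has been seen.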

In a default Drupal install there is exactly one URL alias:

modules/system/system.install:  $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('node/feed', 'rss.xml')");
modules/system/system.install:  $ret[] = update_sql("INSERT INTO {url_alias} (src, dst) VALUES ('rss.xml', 'node/feed')");

$count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));

Combined with the above SQL query taken from drupal_lookup_path(),
this means that the check "$count > 0" in drupal_lookup_path() will
_always_ succeed, so we always query the database for each URL.
Only if $count == 0 does Drupal skip the url_alias table.  But
unless you actually remove that one alias, you'll never take
advantage of that check.

As an extra optimization, I suggest that we get rid of that rss.xml
URL alias, so that by default, Drupal doesn't generate hundreds of
SQL queries, AND that for people who don't use path aliases,
drupal_lookup_path() will never trigger any SQL queries.  To get rid
of the path alias, we could hardcode 'rss.xml' in the code (rather
than the much uglier 'node/feed') and extend the legacy.module to make
the old URL work.

It's also not clear why we insert two aliases ... looks like a bug in  
system.install.

I'm wondering -- before we continue discussing this for hours and
hours -- is someone willing to work on this?  We've come to a point
where some things could actually be implemented/fixed.

--
Dries Buytaert  ::  http://www.buytaert.net/