[development] One-to-one tables considered harmful

Walt Daniels wdlists at optonline.net
Sun Jun 3 22:25:48 UTC 2007


For a common example of a user noticeable slow query, go to modules and look
at all the contrib modules available for download. Or perhaps all the time
is formating that much information.

-----Original Message-----
From: development-bounces at drupal.org [mailto:development-bounces at drupal.org]
On Behalf Of David Strauss
Sent: Sunday, June 03, 2007 5:12 PM
To: development at drupal.org
Subject: Re: [development] One-to-one tables considered harmful

Earl Miles wrote:
> David Strauss wrote:
>> == What's bad about the current approach ==
>>
>> The arguments for splitting table into one-to-one relationships are:
>>
>> * Improving performance on systems that can only use table-level 
>> locking
>> * Minimizing invalidations of the query cache
> 
> You *completely* missed the real argument:
> 
> Drupal is modular. Modules tend to have their own tables. Often these 
> are 1::1 tables.

You misinterpreted my post. I said the 1::1 tables created by content types
are fine. We're talking about core tables here, like
{node_comment_statistics} that have columns often used in WHERE or ORDER BY
criteria.

> I fail to understand how it is MySQL is unable to optimize for the 
> 1::1 join case, especially when that 1::1 join is on the primary key. 
> What am I missing that makes this join worse than other basic joins?

I also explained this in my original post. Indexes cannot span JOINs. If you
don't understand precisely how databases use indexes to fulfill WHERE and
ORDER BY criteria, I'm afraid a full explanation is beyond what I could
include in an email. I'll try to give a brief one.

But here's what happens when you have criteria spanning a JOIN:
1. The database tries to minimize the rows involved in the JOIN using as
many criteria as it can.
2. The database performs a Cartesian JOIN on the remaining rows. This
creates a giant temporary table on disk.
3. The database runs remaining restrictions on the results.
4. The database sorts the results. This uses a filesort.
5. The database returns a range of the sorted results.

For a site like Drupal.org, the running time is about 30-45 seconds.

Here's what happens when all criteria (WHERE and ORDER BY are in an index on
one table):
1. The database uses the index to fully satisfy WHERE and ORDER BY criteria.
This does not use a temp table or a filesort.
2. The database runs JOINs on other tables to collect data for remaining
columns.
3. The results are already filtered and sorted, so the database returns a
range of the results.

For a site like Drupal.org, the running time is 30-45 milliseconds. The
difference is really that order of magnitude, and the difference only
increases as the site grows in size.




More information about the development mailing list