[development] One-to-one tables considered harmful

Earl Miles merlin at logrus.com
Mon Jun 4 06:05:09 UTC 2007

David Strauss wrote:
> Earl Miles wrote:
>> David Strauss wrote:
>>> == What's bad about the current approach ==
>>> The arguments for splitting table into one-to-one relationships are:
>>> * Improving performance on systems that can only use table-level locking
>>> * Minimizing invalidations of the query cache
>> You *completely* missed the real argument:
>> Drupal is modular. Modules tend to have their own tables. Often these
>> are 1::1 tables.
> You misinterpreted my post. I said the 1::1 tables created by content
> types are fine. We're talking about core tables here, like
> {node_comment_statistics} that have columns often used in WHERE or ORDER
> BY criteria.

No, you misinterpreted mine. Drupal is modular. comment.module is a module. 
Now, it's a poor module -- it has some rather nasty integration with the rest 
of the site which, IMO, it should not. But it worries me that one of the 
solutions is to re-integrate rather than to make it properly modular.

>> I fail to understand how it is MySQL is unable to optimize for the 1::1
>> join case, especially when that 1::1 join is on the primary key. What am
>> I missing that makes this join worse than other basic joins?
> I also explained this in my original post. Indexes cannot span JOINs. If
> you don't understand precisely how databases use indexes to fulfill
> WHERE and ORDER BY criteria, I'm afraid a full explanation is beyond
> what I could include in an email. I'll try to give a brief one.
> But here's what happens when you have criteria spanning a JOIN:
> 1. The database tries to minimize the rows involved in the JOIN using as
> many criteria as it can.
> 2. The database performs a Cartesian JOIN on the remaining rows. This
> creates a giant temporary table on disk.
> 3. The database runs remaining restrictions on the results.
> 4. The database sorts the results. This uses a filesort.
> 5. The database returns a range of the sorted results.
> For a site like Drupal.org, the running time is about 30-45 seconds.
> Here's what happens when all criteria (WHERE and ORDER BY are in an
> index on one table):
> 1. The database uses the index to fully satisfy WHERE and ORDER BY
> criteria. This does not use a temp table or a filesort.
> 2. The database runs JOINs on other tables to collect data for remaining
> columns.
> 3. The results are already filtered and sorted, so the database returns
> a range of the results.

So to me, instead of merging comment data into the node table, it sounds like 
the node_comment_statistics table needs to reflect other data that is being 
filtered in some way. IMO, this would be preferable. Doing it any other way 
might end up creating a requirement to have comment.module on (or make a site 
very, very useless if comment.module is off).

More information about the development mailing list