[development] One-to-one tables considered harmful

Sun Jun 3 23:15:19 UTC 2007

Larry Garfield wrote:
> On Sunday 03 June 2007, David Strauss wrote:
>> == What's bad about the current approach ==
> 
> *snip*
> 
>> First, I'd like to adopt a development philosophy on the Drupal project:
>> screw scalability if the database running Drupal doesn't support
>> row-level locking. Sites using only table-level locks are doomed to
>> scale poorly anyway because of the over-aggressive locking. We won't be
>> able to prevent that disaster with the tiny improvements in lock
>> granularity afforded by splitting tables into one-to-one table pairs.
> 
> *snip*
> 
>> If we rely on the query cache, we're also being hypocrites with our
>> stance on the first reason because many low-end hosts don't run the
>> query cache. Either we care about scalability on low-end configurations
>> or we don't. I'm suggesting "don't," but if people truly do want to care
>> about low-end scalability, we can't use the query cache argument.
> 
> Scalability on "low-end" hosts is very important for a number of reasons.  
> 
> 1) Most sites run on them.  How many web sites run on a shared host vs. a 
> dedicated farm?  I'd venture to say most sites, Drupal or otherwise, run on a 
> shared host.  That means we can't ignore that use case.

I disagree. If we optimize for large-scale use on InnoDB or PostgreSQL,
most optimizations will still translate to small installations on
inexpensive shared hosts.

In addition, most optimizations target scalability problems that only
appear with large sets of data. Smaller sites simply don't encounter
these problems because their tables never grow large enough to expose
them.

> 2) That's where people start.  If Drupal is slow and crappy unless you're 
> using InnoDB, a dedicated server, and an opcode cache, then no one is going 
> to give it a second thought.  Most people new to Drupal will try it out first 
> in a shared host, or a private dev box that no one bothered to optimize.  If 
> Drupal sucks in that case, then those people will never bother installing it 
> on high-end servers with carefully-tuned databases.

Drupal won't suck in these cases; it just won't reach its full
performance potential, and the difference between ideal tuning and no
tuning for a small installation is tiny. It's more important for Drupal
to reach its performance peak with large installations than with small
ones.

> True story: When I was first looking for a framework or CMS, I tried Typo3 
> before I tried Drupal.  I never actually got it installed because at the time 
> just running the installer died on my system (a stock Debian Sid PHP 
> configuration with no customization, at least at the time) because it hit the 
> default PHP memory limit.  Not knowing then what I do now about PHP 
> configuration and optimization, my response was "wtf?  What a memory hog, it 
> can't even install on a default setup!  Screw this, I'm trying Drupal."  

That's a PHP issue, and PHP configuration changes far less than database
configuration as sites grow. Even huge sites run PHP with memory limits.

> Now, I can certainly accept the argument that we shouldn't try to bend over 
> backward to get every last bit of performance out of MyISAM.  There comes a 
> point where a site really does need to have a dedicated box with hand-tuned 
> databases.  But that doesn't mean "don't care" about smaller hosts.  The 
> longer those small hosts hold out, the less expensive it is to run Drupal and 
> the more people use it.  (The menu handler split / split-mode-redux stuff I'm 
> doing is aimed primarily at shared hosts, but will have an impact on larger 
> sites, too.)

I think you're misinterpreting my suggestion. The cost of nearly
everything in Drupal can be described in a big-O-like form such as
O(q*n^p + m) (probably with other variables as well), where n is the
cardinality of the dataset and q, p, and m are constants. I'm arguing
that we should care more about a smaller p than a smaller q. If q is too
big, we certainly have to address that problem on big sites, but p
dominates the running time as n grows. Splitting tables into 1:1
relationships is akin to optimizing q.

Even if we completely focus on big-site performance, small-site
performance will generally continue to improve. Every big-site
optimization I'm currently pushing would create, at worst, a negligible
decline in performance on small sites. When n is very small, m dominates
performance considerations. Optimizing m is generally separate from
optimizing p or q.
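
To make the p-versus-q argument concrete, here is a minimal sketch in
Python of the q*n^p + m cost model described above. The constants are
made up purely for illustration and are not measurements from Drupal or
any database; cost() and compare() are hypothetical helper names.

```python
def cost(n, q, p, m):
    """Hypothetical cost model from this message: q * n**p + m."""
    return q * n**p + m


# Illustrative constants only (assumptions, not benchmark results).
BASE = dict(q=4.0, p=2.0, m=1000.0)


def compare(n):
    """Return (baseline, halved-q, lowered-p) costs for a dataset of size n."""
    base = cost(n, **BASE)
    # "Optimizing q": same growth rate, smaller constant factor.
    half_q = cost(n, q=BASE["q"] / 2, p=BASE["p"], m=BASE["m"])
    # "Optimizing p": slower growth rate, same constant factor.
    lower_p = cost(n, q=BASE["q"], p=1.5, m=BASE["m"])
    return base, half_q, lower_p


for n in (10, 1_000, 1_000_000):
    base, half_q, lower_p = compare(n)
    print(f"n={n:>9}: base={base:.3g} half_q={half_q:.3g} lower_p={lower_p:.3g}")
```

With these assumed constants, halving q never does better than halving
the total cost, while lowering p from 2 to 1.5 cuts the cost at
n = 1,000,000 by roughly three orders of magnitude. At n = 10 the fixed
term m accounts for about 70% of the baseline cost, so neither change
matters much on a small site.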
