[development] One-to-one tables considered harmful

Larry Garfield larry at garfieldtech.com
Mon Jun 4 00:20:45 UTC 2007


On Sunday 03 June 2007, David Strauss wrote:

> > Scalability on "low-end" hosts is very important for a number of reasons.
> >
> > 1) Most sites run on them.  How many web sites run on a shared host vs. a
> > dedicated farm?  I'd venture to say most sites, Drupal or otherwise, run
> > on a shared host.  That means we can't ignore that use case.
>
> I disagree. If we optimize for large-scale use on InnoDB or PostgreSQL,
> most optimizations will still translate to small installations on
> inexpensive shared hosts.
>
> In addition, most optimizations target scalability issues associated
> with large sets of data. Smaller sites simply don't encounter these
> issues because they're small.
>
> > 2) That's where people start.  If Drupal is slow and crappy unless you're
> > using InnoDB, a dedicated server, and an opcode cache, then no one is
> > going to give it a second thought.  Most people new to Drupal will try it
> > out first in a shared host, or a private dev box that no one bothered to
> > optimize.  If Drupal sucks in that case, then those people will never
> > bother installing it on high-end servers with carefully-tuned databases.
>
> Drupal won't suck in these cases, it just won't have fully realized
> performance, and the difference between ideal tuning and no tuning for a
> small installation is tiny. It's more important for Drupal to reach its
> performance peak with large installations than with small ones.
>
> > True story: When I was first looking for a framework or CMS, I tried
> > Typo3 before I tried Drupal.  I never actually got it installed because
> > at the time just running the installer died on my system (a stock Debian
> > Sid PHP configuration with no customization, at least at the time)
> > because it hit the default PHP memory limit.  Not knowing then what I do
> > now about PHP configuration and optimization, my response was "wtf?  What
> > a memory hog, it can't even install on a default setup!  Screw this, I'm
> > trying Drupal."
>
> That's a PHP issue, and PHP changes relatively less than databases as
> sites grow. Even huge sites run PHP with memory limits.

Yes it is.  It's also an extreme case.  I'm just pointing out that saying "feh,
small hosts" in general is going to bite us in the ass sooner or later.

> > Now, I can certainly accept the argument that we shouldn't try to bend
> > over backward to get every last bit of performance out of MyISAM.  There
> > comes a point where a site really does need to have a dedicated box with
> > hand-tuned databases.  But that doesn't mean "don't care" about smaller
> > hosts.  The longer those small hosts hold out, the less expensive it is
> > to run Drupal and the more people use it.  (The menu handler split /
> > split-mode-redux stuff I'm doing is aimed primarily at shared hosts, but
> > will have an impact on larger sites, too.)
>
> I think you're misinterpreting my suggestion. Everything in Drupal can
> be classified in some sort of big-O notation, like O(q*n^p + m) (and
> probably other variables), where n is the cardinality of the dataset and
> the other variables are constants. I'm arguing that we should care more
> about a smaller p than a smaller q. If q is too big, we certainly have
> to address that problem on big sites, but p dominates the running time.
> Splitting tables into 1:1 relationships is akin to optimizing q.
>
> Even if we completely focus on big-site performance, small-site
> performance will generally continue to improve. Every big-site
> optimization I'm currently pushing would create, at worst, a negligible
> decline in performance on small sites. 

Yeah, that's what we said about the 4.6->4.7 path alias change. :-)

> When n is very small, m dominates 
> performance considerations. Optimizing m is generally separate from
> optimizing p or q.

I think we're talking at cross purposes. :-)  I'm not saying "feh, big sites", 
or that we shouldn't consider retuning how we structure the database schema.  
I'm saying that optimizing for big, dedicated hosts at the expense of the 
$20/month hosts is a losing proposition.  It sounded like one could easily 
interpret your "screw table-level-locking setups" comment that way, which I 
believe would be a very bad way to go.  I wasn't commenting on the specific 
changes you propose.
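
To put rough numbers on the q*n^p + m model quoted above, here is a minimal
sketch in Python.  The function name `cost` and all of the constants are made
up for illustration; they are not measurements of Drupal or of any database.
It only shows the shape of the argument: when n is small the fixed overhead m
dominates, and when n is large a smaller p matters far more than a smaller q.

```python
def cost(n, q, p, m):
    """Hypothetical running-time model from the thread: q * n**p + m."""
    return q * n**p + m


# Illustrative constants only (assumptions, not benchmarks).
baseline = dict(q=1.0, p=2.0, m=100.0)
smaller_q = dict(q=0.5, p=2.0, m=100.0)  # halve the constant factor
smaller_p = dict(q=1.0, p=1.0, m=100.0)  # improve the growth rate instead

for n in (5, 1_000):
    base = cost(n, **baseline)
    # Share of the baseline cost that is just the fixed overhead m.
    overhead_share = baseline["m"] / base
    print(
        f"n={n}: baseline={base}, smaller q={cost(n, **smaller_q)}, "
        f"smaller p={cost(n, **smaller_p)}, overhead share={overhead_share:.4f}"
    )
```

With these made-up numbers, at n=5 the three variants land close to each
other because m makes up most of the cost, while at n=1000 lowering p shrinks
the cost by orders of magnitude and halving q only halves it.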

-- 
Larry Garfield			AIM: LOLG42
larry at garfieldtech.com		ICQ: 6817012

"If nature has made any one thing less susceptible than all others of 
exclusive property, it is the action of the thinking power called an idea, 
which an individual may exclusively possess as long as he keeps it to 
himself; but the moment it is divulged, it forces itself into the possession 
of every one, and the receiver cannot dispossess himself of it."  -- Thomas 
Jefferson

