On Sunday 03 June 2007, David Strauss wrote:
Scalability on "low-end" hosts is very important for a number of reasons.
1) Most sites run on them. How many web sites run on a shared host vs. a dedicated farm? I'd venture to say most sites, Drupal or otherwise, run on a shared host. That means we can't ignore that use case.
I disagree. If we optimize for large-scale use on InnoDB or PostgreSQL, most optimizations will still translate to small installations on inexpensive shared hosts.
In addition, most optimizations target scalability issues associated with large sets of data. Smaller sites simply don't encounter these issues because they're small.
2) That's where people start. If Drupal is slow and crappy unless you're using InnoDB, a dedicated server, and an opcode cache, then no one is going to give it a second thought. Most people new to Drupal will try it out first in a shared host, or a private dev box that no one bothered to optimize. If Drupal sucks in that case, then those people will never bother installing it on high-end servers with carefully-tuned databases.
Drupal won't suck in these cases; it just won't have fully realized performance, and the difference between ideal tuning and no tuning for a small installation is tiny. It's more important for Drupal to reach its performance peak with large installations than with small ones.
True story: When I was first looking for a framework or CMS, I tried Typo3 before I tried Drupal. I never actually got it installed because at the time just running the installer died on my system (a stock Debian Sid PHP configuration with no customization, at least at the time) because it hit the default PHP memory limit. Not knowing then what I do now about PHP configuration and optimization, my response was "wtf? What a memory hog, it can't even install on a default setup! Screw this, I'm trying Drupal."
That's a PHP issue, and PHP configuration changes far less than the database does as sites grow. Even huge sites run PHP with memory limits.
Yes it is. It's also an extreme case. I'm just pointing out that saying "feh, small hosts" in general is going to bite us in the ass sooner or later.
Now, I can certainly accept the argument that we shouldn't try to bend over backward to get every last bit of performance out of MyISAM. There comes a point where a site really does need to have a dedicated box with hand-tuned databases. But that doesn't mean "don't care" about smaller hosts. The longer those small hosts hold out, the less expensive it is to run Drupal and the more people use it. (The menu handler split / split-mode-redux stuff I'm doing is aimed primarily at shared hosts, but will have an impact on larger sites, too.)
I think you're misinterpreting my suggestion. Everything in Drupal can be classified in some sort of big-O notation, like O(q*n^p + m) (and probably other variables), where n is the cardinality of the dataset and the other variables are constants. I'm arguing that we should care more about a smaller p than a smaller q. If q is too big, we certainly have to address that problem on big sites, but p dominates the running time. Splitting tables into 1:1 relationships is akin to optimizing q.
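To make the p-versus-q point concrete, here is a minimal sketch of the cost model q*n^p + m described above. The two "designs" and all of their constants are made up for illustration and are not measurements of Drupal; the function names are hypothetical too. Design A has a large constant factor but a small exponent, design B the reverse.

```python
def cost(n, q, p, m):
    """Illustrative cost model from the message: q * n**p + m."""
    return q * n**p + m


# Hypothetical constants, chosen only to show the shape of the trade-off.
DESIGN_A = {"q": 100, "p": 1, "m": 50}  # big constant factor, small exponent
DESIGN_B = {"q": 1, "p": 2, "m": 5}     # small constant factor, bigger exponent


def first_n_where_a_wins(limit=1_000_000):
    """Return the smallest n at which design A is strictly cheaper than B."""
    for n in range(1, limit + 1):
        if cost(n, **DESIGN_A) < cost(n, **DESIGN_B):
            return n
    return None  # A never wins within the search limit


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, cost(n, **DESIGN_A), cost(n, **DESIGN_B))
    print("A becomes cheaper at n =", first_n_where_a_wins())
```

With these made-up numbers, B is cheaper on small datasets, but A overtakes it just past n = 100 and the gap keeps widening after that. This is the sense in which p dominates q once n is large.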
Even if we completely focus on big-site performance, small-site performance will generally continue to improve. Every big-site optimization I'm currently pushing would create, at worst, a negligible decline in performance on small sites.
Yeah, that's what we said about the 4.6->4.7 path alias change. :-)
When n is very small, m dominates performance considerations. Optimizing m is generally separate from optimizing p or q.
I think we're talking at cross purposes. :-) I'm not saying "feh, big sites", or that we shouldn't consider retuning how we structure the database schema. I'm saying that optimizing for big, dedicated hosts at the expense of the $20/month hosts is a losing proposition. It sounded like one could easily interpret your "screw table-level-locking setups" comment that way, which I believe would be a very bad way to go. I wasn't commenting on the specific changes you propose.

--
Larry Garfield			AIM: LOLG42
larry@garfieldtech.com		ICQ: 6817012

"If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson