[development] development with scalability in mind
hovercrafter at earthlink.net
Fri Dec 18 16:09:40 UTC 2009
A lot is going to depend on exactly what you are planning on doing with
the site. Will there be a lot of logged in users? How often will data be
changing? Are you going to have a lot of complex queries (ie: searches,
You said yesterday that the DB size would be about 20 GB. That alone
will present a performance hit, since the tables won't really fit into
memory, and it means the database will take up a huge chunk of that
single server.
If you don't have many logged in users and the data isn't changing all
that much, then you *may* be able to get by using Boost. If you are
going to have a bunch of logged in users then I would seriously look at
using alternative caching like Cacherouter. With a 20 GB database, the
more you can keep off of it the better.
If you have a lot of images and static content then I would seriously
look at pushing that off to a CDN to remove some of the burden on the
From a development standpoint, MySQL's slow query log is your
friend, plus the devel+performance logging module. Make sure none of
your common queries are doing nasty things like resorting to filesorts
on thousands of rows, and that all your queries are indexed properly.
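As a rough illustration of checking whether a common query needs a sort pass, here is a minimal sketch. It uses SQLite as a stand-in only because it ships with Python; on MySQL you would run EXPLAIN on the query and look for "Using filesort" in the Extra column instead. The table, column, and index names are made up for the example.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of each step in SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def needs_sort_pass(conn, sql):
    """True if SQLite has to build a temporary B-tree to satisfy ORDER BY.

    This is SQLite's rough equivalent of MySQL reporting "Using filesort".
    """
    return any("TEMP B-TREE" in step for step in query_plan(conn, sql))

# Hypothetical listing table with 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, created INTEGER)"
)
conn.executemany(
    "INSERT INTO node (title, created) VALUES (?, ?)",
    [(f"post {i}", 1000 - i) for i in range(1000)],
)

# A typical "latest posts" query that sorts on a column.
LISTING = "SELECT nid, title FROM node ORDER BY created DESC LIMIT 10"

sort_before_index = needs_sort_pass(conn, LISTING)   # no index on created yet
conn.execute("CREATE INDEX node_created ON node (created)")
sort_after_index = needs_sort_pass(conn, LISTING)    # index can supply the order
```

The same before/after comparison is what you would do by hand with the slow query log: find the query, look at its plan, add the index, and check the plan again.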
Also, be very careful when dealing with caching. One thing I have seen a
lot of is people who do "on demand" refreshing of expired caches. What
happens is that they check the expiration or some other metric when the
cache is pulled, and if it fails they run the query or code to regenerate
it. This is usually used on very server intensive queries. The problem
shows up in an example like this.
You have a query that takes 4 seconds to run:
- User A hits the site at 00:00:00.00 and the cache needs to be
refreshed, so the query is run.
- User B hits the site at 00:00:01.00. The query from user A is still
running, so the cache has not been updated yet. User B doesn't know
this, so the query is run again.
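To put a number on that example, here is a small sketch that simply counts how many times the slow query gets started under on-demand refreshing. It is a simulation on paper, not real caching code; the function name and the hit times are made up for illustration.

```python
def count_regenerations(hit_times, query_seconds):
    """Count how many hits start the slow query under on-demand refresh.

    Assumes the cache is expired at the first hit and becomes fresh again
    as soon as the first started query finishes.
    """
    cache_fresh_at = None  # time at which a finished query refills the cache
    runs = 0
    for t in sorted(hit_times):
        if cache_fresh_at is not None and t >= cache_fresh_at:
            break  # cache is fresh again, later hits are served from it
        runs += 1  # this hit sees an expired cache and runs the query itself
        finish = t + query_seconds
        if cache_fresh_at is None or finish < cache_fresh_at:
            cache_fresh_at = finish
    return runs

# One hit per second against a 4 second query: every hit that arrives
# before 00:00:04 starts its own copy of the query.
duplicate_runs = count_regenerations([0, 1, 2, 3, 5], query_seconds=4)
```

With one hit per second that is four copies of a four second query running at once; at higher hit rates the count grows with the number of hits that land inside the query's run time.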
On a high traffic site you can see how that will snowball into a bunch
of people running the same query. From a development standpoint, it's
best to put these kinds of routines into a cron job so the following happens:
- User A hits the site at 00:00:00.00 and the cache needs to be
refreshed. You have a special "cron" table in the DB and a record is
written saying that this item needs to be recomputed as of 00:00:00.00,
and User A is served the stale data.
- User B hits the site at 00:00:01.00 and the cache is still expired.
The code checks for the record in that cron table and moves on, just
serving the stale cache data.
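When cron next runs, it recomputes each item recorded in that table once
and clears the record. Running cache refreshes like this on cron removes
the possibility of the same query being run multiple times at once.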
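Here is a minimal in-memory sketch of that flow, just to show the moving parts. The class and method names are made up, a plain dict stands in for the "cron" table, and the caller passes in the current time so the example stays deterministic; a real site would use its own cache backend and a DB table.

```python
class CronRefreshedCache:
    """Serve stale data on expiry and leave regeneration to a cron run."""

    def __init__(self, compute, ttl_seconds):
        self._compute = compute      # the expensive query, called as compute(key)
        self._ttl = ttl_seconds
        self._data = {}              # key -> (value, expires_at)
        self._cron_table = {}        # key -> time the refresh was first requested

    def get(self, key, now):
        value, expires_at = self._data.get(key, (None, 0.0))
        if now >= expires_at:
            # Record the request once; later hits find the record and move on.
            self._cron_table.setdefault(key, now)
        return value                 # possibly stale, None if never computed

    def pending(self):
        """Keys waiting for the next cron run."""
        return sorted(self._cron_table)

    def run_cron(self, now):
        """Recompute every queued item exactly once, then clear its record."""
        for key in list(self._cron_table):
            self._data[key] = (self._compute(key), now + self._ttl)
            del self._cron_table[key]


calls = []

def slow_report(key):
    calls.append(key)                # track how often the "query" really runs
    return f"{key} v{len(calls)}"

cache = CronRefreshedCache(slow_report, ttl_seconds=60)
cache.get("report", now=0.0)         # cold cache: nothing to serve yet, queued
cache.run_cron(now=0.0)              # first computation
user_a = cache.get("report", now=61.0)   # expired: queued, stale value served
user_b = cache.get("report", now=62.0)   # record already there, stale value served
runs_before_cron = len(calls)
cache.run_cron(now=65.0)             # the single regeneration
user_c = cache.get("report", now=66.0)
```

However many users hit the expired item between cron runs, the expensive query runs once per cron run. The trade-off is that data stays stale until cron fires, so the cron interval sets the worst-case staleness.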
On a cost comparison, sometimes two servers are cheaper than one. With
the size of your database and traffic predictions you will probably end
up having to dump a lot of extra hardware into that single server to
make one "super server", whereas if you have one web server and one
database server you could possibly get by with medium or large servers,
since each would be tuned specifically to its job.
On 12/18/2009 10:47 AM, Walt Daniels wrote:
> I listened to the presentation and found it interesting. It was missing some
> of what I wanted. My site is too big for shared hosting but cannot afford
> going beyond one dedicated machine. Clearly cache the hell out of everything
> is probably the best advice but perhaps there are other tweaks that should
> be looked at as well. A question I had submitted before the talk did not get
> covered. I would like to see a graph, perhaps a nomogram, of something like
> max hits per hour vs. appropriate technology (both hardware and software).
> -----Original Message-----
> From: development-bounces at drupal.org [mailto:development-bounces at drupal.org]
> On Behalf Of Kieran Lal
> Sent: Friday, December 18, 2009 12:09 AM
> To: development
> Subject: Re: [development] development with scalability in mind
> On Thu, Dec 17, 2009 at 7:10 PM, Susan Stewart
> <hedgemage at binaryredneck.net> wrote:
> > On 12/17/2009 09:31 AM, Kieran Lal wrote:
> >> A more appropriate approach for a site of that size is to build a
> >> cluster of servers in a high availability configuration which provides
> >> more flexibility to use various web scaling technologies. You'll see
> >> that's an approach taken with even moderately sized Drupal sites.
> >> I'll be covering all of this in quite a bit of detail in my
> >> presentation in 2.5 hours.
> > Unfortunately, I missed it due to a client meeting...is there a
> > transcript or recording of this anywhere?
> The recorded video will be posted here:
> Keep in mind this was a one hour introductory webinar covering
> scalability and performance for Drupal. I covered a lot of material
> quickly, and tried to touch on a lot of relevant performance and
> scalability technologies and techniques.
> > --Susan
> > --
> > "We all declare for liberty; but in using the same word we do not all mean
> the same thing. With some the word liberty may mean for each man to do as he
> pleases with himself, and the product of his labor; while with others, the
> same word may mean for some men to do as they please with other men, and the
> product of other men's labor. Here are two, not only different, but
> incompatible things, called by the same name - liberty. And it follows that
> each of the things is, by the respective parties, called by two different
> and incompatible names - liberty and tyranny."
> > --Abraham Lincoln