[development] development with scalability in mind

Jamie Holly hovercrafter at earthlink.net
Fri Dec 18 16:09:40 UTC 2009

A lot is going to depend on exactly what you are planning on doing with 
the site. Will there be a lot of logged-in users? How often will data be 
changing? Are you going to have a lot of complex queries (e.g., searches, 

You said yesterday that the DB size would be about 20 GB. That alone 
will be a performance hit: the tables won't really fit into memory, and 
the database will take up a huge chunk of that single server's 
resources.

If you don't have many logged-in users and the data isn't changing all 
that much, then you *may* be able to get by using Boost, which serves 
cached static pages to anonymous visitors. If you are going to have a 
lot of logged-in users, then I would seriously look at an alternative 
cache backend like Cacherouter. With a 20 GB database, the more load 
you can keep off of it the better.

If you have a lot of images and static content, then I would also 
seriously look at pushing that off to a CDN to take some of the burden 
off the server.

From a development standpoint, MySQL's slow query log is your friend, 
along with Devel and its performance logging module. Make sure none of 
your common queries are doing nasty things like resorting to filesorts 
on thousands of rows, and that all your queries are indexed properly.
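
As a rough illustration of that check, here is a minimal Python sketch 
that scans the rows MySQL returns from EXPLAIN and flags the two things 
mentioned above: a filesort over a large number of rows, and a query 
that uses no index at all. The "key", "rows" and "Extra" columns are 
real EXPLAIN output columns; the function name, the 1000-row threshold 
and the sample rows are assumptions made up for the example, not output 
from a real site.

```python
def find_problem_queries(explain_rows, row_threshold=1000):
    """Return (table, reason) pairs for EXPLAIN rows that look expensive.

    explain_rows: list of dicts shaped like MySQL EXPLAIN output rows.
    row_threshold: hypothetical cutoff for "too many rows to filesort".
    """
    problems = []
    for row in explain_rows:
        extra = row.get("Extra") or ""
        estimated_rows = row.get("rows") or 0
        if row.get("key") is None:
            # No index was chosen for this table access.
            problems.append((row["table"], "no index used"))
        elif "Using filesort" in extra and estimated_rows >= row_threshold:
            # MySQL has to sort the result set itself.
            problems.append(
                (row["table"], "filesort on %d rows" % estimated_rows)
            )
    return problems


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"table": "node", "key": "PRIMARY", "rows": 1, "Extra": ""},
    {"table": "node", "key": "node_created", "rows": 48000,
     "Extra": "Using where; Using filesort"},
    {"table": "users", "key": None, "rows": 12000, "Extra": "Using where"},
]

for table, reason in find_problem_queries(sample_rows):
    print(table, "->", reason)
```

In practice you would feed this the EXPLAIN output for the queries that 
show up in the slow query log, rather than hand-written rows.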

Also, be very careful when dealing with caching. One thing I have seen 
a lot of is people doing "on demand" refreshing of expired caches. They 
check the expiration or some other metric when the cache is pulled, and 
if the check fails they run the query or code to regenerate it. This is 
usually done for very server-intensive queries. The problem shows up in 
an example like this.

You have a query that takes 4 seconds to run:

- User A hits the site at 00:00:00.00 and the cache needs to be 
refreshed, so the query is run.

- User B hits the site at 00:00:01.00. The query from user A is still 
running, so the cache has not been updated yet. User B's request has no 
way of knowing a refresh is already in progress, so the query is run 
again.
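
To put numbers on that scenario, here is a small Python sketch that 
counts how many requests end up running the query under on-demand 
refresh. It is a simplified model, not a measurement: it assumes the 
cache is expired when the first request arrives, becomes fresh only when 
that first query finishes, and does not expire again within the window. 
The function name and the one-request-per-second traffic are 
assumptions for the example.

```python
def count_query_runs(arrival_times, query_seconds):
    """Count requests that run the query under on-demand refresh.

    Every request that arrives before the first query has finished
    still sees an expired cache and starts its own copy of the query.
    """
    if not arrival_times:
        return 0
    fresh_at = min(arrival_times) + query_seconds
    return sum(1 for t in arrival_times if t < fresh_at)


# One request per second for ten seconds, with the 4-second query.
arrivals = [float(second) for second in range(10)]
print(count_query_runs(arrivals, 4.0))  # 4 with these inputs
```

With ten requests per second instead of one, the same 4-second window 
would contain 40 copies of the query, all competing for the same 
database.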

On a high-traffic site you can see how that will snowball into a bunch 
of people running the same query. From a development standpoint, it's 
best to put these kinds of routines into a cron job so the following happens:

- User A hits the site at 00:00:00.00 and the cache needs to be 
refreshed. You have a special "cron" table in the DB, and a record is 
written saying that this item needs to be recomputed as of 00:00:00.00. 
User A is served the stale data.

- User B hits the site at 00:00:01.00 and the cache is still expired. 
The code finds the existing record in that cron table and moves on, 
just serving the stale cache data.

Running cache refreshes on cron like this removes the possibility of 
the same query being run multiple times at once.
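
The flow above can be sketched in a few lines of Python. This is only 
an in-memory illustration under stated assumptions: the dictionary 
named refresh_queue stands in for the "cron" table, timestamps are 
passed in by hand instead of read from a clock, and the class and 
method names are invented for the example. A real Drupal site would do 
the same thing in PHP with a database table and hook_cron.

```python
class CronRefreshedCache:
    """Serve stale data on expiry and leave recomputation to cron."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}        # key -> (value, expires_at)
        self.refresh_queue = {}  # key -> time refresh was requested

    def put(self, key, value, now):
        self.entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        value, expires_at = self.entries[key]
        if now >= expires_at and key not in self.refresh_queue:
            # First request to notice the expiry records it once.
            self.refresh_queue[key] = now
        return value  # possibly stale, but no query is run here

    def run_cron(self, recompute, now):
        """Recompute each queued item exactly once, then clear it."""
        for key in list(self.refresh_queue):
            self.put(key, recompute(key), now)
            del self.refresh_queue[key]


calls = []  # records every time the expensive query actually runs


def slow_query(key):
    calls.append(key)
    return "fresh:" + key


cache = CronRefreshedCache(ttl_seconds=60)
cache.put("front_page", "stale:front_page", now=-60.0)  # expires at 0.0
user_a = cache.get("front_page", now=0.0)  # queues the refresh
user_b = cache.get("front_page", now=1.0)  # sees the record, moves on
cache.run_cron(slow_query, now=5.0)        # the query runs once, here
user_c = cache.get("front_page", now=6.0)
print(user_a, user_b, user_c, len(calls))
```

The trade-off is that users see stale data until the next cron run, so 
the cron interval sets the upper bound on how out of date the cache can 
get.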

On a cost comparison, sometimes two servers are cheaper than one. With 
the size of your database and your traffic predictions, you will 
probably end up having to put a lot of extra hardware into that single 
server to make one "super server", whereas with one web server and one 
database server you could possibly get by with two medium or large 
servers, since each would be tuned specifically for its job.

Jamie Holly

On 12/18/2009 10:47 AM, Walt Daniels wrote:
> I listened to the presentation and found it interesting. It was missing some
> of what I wanted. My site is too big for shared hosting but cannot afford
> going beyond one dedicated machine. Clearly cache the hell out of everything
> is probably the best advice but perhaps there are other tweaks that should
> be looked at as well. A question I had submitted before the talk did not get
> covered. I would like to see a graph, perhaps a nomogram, of something like
> max hits per hour vs. appropriate technology (both hardware and software).
> -----Original Message-----
> From: development-bounces at drupal.org [mailto:development-bounces at drupal.org]
> On Behalf Of Kieran Lal
> Sent: Friday, December 18, 2009 12:09 AM
> To: development
> Subject: Re: [development] development with scalability in mind
> On Thu, Dec 17, 2009 at 7:10 PM, Susan Stewart
> <hedgemage at binaryredneck.net>  wrote:
> >  On 12/17/2009 09:31 AM, Kieran Lal wrote:
> >>
> >>  A more appropriate approach for a site of that size is to build a
> >>  cluster of servers in a high availability configuration which provides
> >>  more flexibility to use various web scaling technologies.  You'll see
> >>  that's an approach taken with even moderately sized Drupal sites.
> >>  I'll be covering all of this in quite a bit of detail in my
> >>  presentation in 2.5 hours.
> >
> >  Unfortunately, I missed it due to a client meeting...is there a
> >  transcript or recording of this anywhere?
> The recorded video will be posted here:
> http://acquia.com/community/resources/recorded_webinars
> Keep in mind this was a one hour introductory webinar covering
> scalability and performance for Drupal.  I covered a lot of material
> quickly, and tried to touch on a lot of relevant performance and
> scalability technologies and techniques.
> Cheers,
> Kieran
> >
> >  --Susan
> >
> >  --
> >  "We all declare for liberty; but in using the same word we do not all mean
> the same thing. With some the word liberty may mean for each man to do as he
> pleases with himself, and the product of his labor; while with others, the
> same word may mean for some men to do as they please with other men, and the
> product of other men's labor. Here are two, not only different, but
> incompatible things, called by the same name - liberty. And it follows that
> each of the things is, by the respective parties, called by two different
> and incompatible names - liberty and tyranny."
> >  --Abraham Lincoln
> >
> >
