[development] Re: enterprise needs

Benson Wong mostlygeek at gmail.com
Mon Feb 27 18:45:58 UTC 2006


> I've never used mod_gzip for performance, mainly as a cost control tool.

Performance is essentially how fast you can deliver the content to the
user, including making the transport of data more efficient.

>
> > 3. Get a faster DB server....

> A local database server is far faster in terms of throughput as long as
> it is not tight on memory, and has a sizable query cache.
>

What you save in network transport, you lose in sharing resources with
the web server. In a busy environment, more hardware = more
performance. In a small environment, debating this will be splitting
hairs.

> > 4. Get faster (or more) Web Servers. Maybe not the same specs as
> > above, but fast anyways.
> If you already have the PHP cache, and database server separated, you
> should concentrate your resources on processors and memory, fast
> drives are less important

I think we made the same point...?

> You will also need to figure out how you are going to handle files.
> I've played with two approaches
> 1) NFS shared files directory...
> 2) external server for file store.(requires a good bit of coding but can
> be done)
>
> > 5. Get a load balancer, or a reverse HTTP proxy (squid) to distribute the load
>
> I haven't tested drupal behind a true load balancer, but did have
> difficulties with sessions and authenticate behind a geographically
> distributed squid cluster... If anyone has had success with drupal
> behind squid please share that on drupal.org.
>
> Similar file issues apply.
>
> > 6. Do MySQL replication
>
> At present you can only use mysql replication as a hot standby, drupal
> does not support mysql replication at present because it does not
> differentiate between read and write queries.
>

Going to address all of these together. I make decisions based on the
fact that > I < will have to maintain all of it. So the simpler the
better. Basically I try to maximize my coffee time (low maintenance
system).

Once you start dabbling in MySQL replication, multiple web servers,
trying to sync/share your media (image, video, etc) files between web
servers, you're basically committing to a much more complex
architecture. That also means more resources, more people and more
money. It's not a small step.

Also note that Drupal's code wasn't designed to scale with multiple
servers. This is not a bad thing, I prefer to keep vertical
scaling (complexity) out of the code.

So I would build this, if I really wanted a _FAST_ architecture.
Haven't tested this in practice, so obviously, no real experience to
offer, just research and theory.

1. Get an HTTP load balancer that sits in front of a couple of proxy
servers, either:
 a) squid proxy servers or
 b) light weight apache servers + mod_proxy

2. The proxy servers
These figure out where to send requests. Need to see how configurable
they are, what you want them to do is:
 a) Request for image/static/media files -> redirect request to
http://static.drupal.org
 b) Request for dynamic pages: node/32
  - spreads across any number of _READ ONLY_ web servers (horizontal scaling)
 c) Request for writes: node/add, node/32/edit, etc,
  - sends to READ/WRITE web server
 d) cron.php -> read/write web server,
 e) etc.
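
A rough sketch of the routing rules in step 2, just to make the idea
concrete. It is not squid or mod_proxy configuration. The server names,
the list of static file suffixes, and the assumption that clean URLs are
on (node/32 rather than ?q=node/32) are all made up for illustration.

```python
import itertools
from urllib.parse import urlsplit

# Hypothetical backends; only static.drupal.org appears in the post above.
STATIC_HOST = "http://static.drupal.org"
WRITE_SERVER = "rw-web-1"
READ_SERVERS = ["ro-web-1", "ro-web-2", "ro-web-3"]

# Assumed list of suffixes that count as static/media files.
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js", ".mov")

_read_pool = itertools.cycle(READ_SERVERS)


def route(method, url):
    """Return (action, target) for one request, following rules a) to d)."""
    path = urlsplit(url).path.strip("/")
    parts = path.split("/")

    # a) static/media files: redirect to the static host
    if path.lower().endswith(STATIC_SUFFIXES):
        return ("redirect", f"{STATIC_HOST}/{path}")

    # d) cron.php has to run where writes are allowed
    if path == "cron.php":
        return ("proxy", WRITE_SERVER)

    # c) writes: any non-GET/HEAD request, node/add, node/N/edit, ...
    is_write_path = parts[:2] == ["node", "add"] or parts[-1] in ("edit", "delete")
    if method not in ("GET", "HEAD") or is_write_path:
        return ("proxy", WRITE_SERVER)

    # b) plain dynamic reads: round-robin over the read-only pool
    return ("proxy", next(_read_pool))
```

For example, route("GET", "/node/32/edit") goes to the read/write
server, while repeated route("GET", "/node/32") calls rotate through the
read-only pool. A real setup would need a longer list of write paths
than the two patterns checked here.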

3. READ ONLY web servers
 - drupal configured with a _READ_ONLY_ mysql user, and pointing at
any number of replicated mysql slaves.

4. WRITE web server
 - drupal configured with a mysql user with INSERT/UPDATE/DELETE privileges

Some vertical scaling,
I would hack session.inc so that it will use a session database. I
would define a database somewhere just for holding session data, and
give the READ web servers INSERT,UPDATE,DELETE privileges to it.
Drupal already supports multiple databases, so at most < 20 lines of
code.
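
To show the shape of the session idea (this is not the session.inc
change itself): a read-only web server holds two connections, one it
cannot write to for content, and one writable connection to a database
that holds nothing but sessions. The sketch uses SQLite in memory as a
stand-in for MySQL, with PRAGMA query_only standing in for a SELECT-only
MySQL user. The table layouts and function names are invented.

```python
import sqlite3
import time


def open_connections():
    """Return (content_db, session_db) as a read-only web server would see them."""
    content_db = sqlite3.connect(":memory:")
    content_db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
    content_db.execute("INSERT INTO node VALUES (32, 'hello')")
    content_db.commit()
    # Stand-in for a MySQL user that only has SELECT on a replicated slave.
    content_db.execute("PRAGMA query_only = ON")

    # Separate database just for sessions; INSERT/UPDATE/DELETE allowed.
    session_db = sqlite3.connect(":memory:")
    session_db.execute(
        "CREATE TABLE sessions (sid TEXT PRIMARY KEY, data TEXT, ts REAL)"
    )
    return content_db, session_db


def write_session(session_db, sid, data):
    """Create or overwrite one session row in the session database."""
    session_db.execute(
        "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)",
        (sid, data, time.time()),
    )
    session_db.commit()


def read_session(session_db, sid):
    """Return the stored session data, or None if the session is unknown."""
    row = session_db.execute(
        "SELECT data FROM sessions WHERE sid = ?", (sid,)
    ).fetchone()
    return row[0] if row else None
```

Session writes go through session_db only, so the content connection
can stay strictly read-only on every READ web server.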

The key is duplicating types of servers. I would probably combine the
proxy servers and the web servers, and add more as I need them.
Keeping configurations sync'd is just sysadmin scripting.

Will have 2 database servers, master and a slave. Note that master and
slave _should_ be the same hardware. With MySQL replication, your
slaves run all the same write queries as your master, so a slower slave
may lag and be really out of sync. (real world advice from Cal @ flickr).

All of the above stuff would apply to almost any LAMP web application.
It's really a huge jump in cost and complexity once you get to this
point. I wouldn't bother unless you need to handle 30 to 100 requests
/ second, and you have a dedicated 100Mb/sec pipe to your cage.
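
For a feel of how those two numbers relate, here is some back-of-envelope
arithmetic: the average response size at which the request rate alone
would fill the pipe. It assumes Mb means megabits (decimal) and ignores
protocol overhead, so treat the result as an upper bound, not a
measurement.

```python
def bytes_per_request(pipe_megabits_per_sec, requests_per_sec):
    """Average response size (bytes) that would saturate the pipe."""
    bytes_per_sec = pipe_megabits_per_sec * 1_000_000 / 8
    return bytes_per_sec / requests_per_sec


for rps in (30, 100):
    kb = bytes_per_request(100, rps) / 1000
    print(f"{rps} req/s on 100 Mb/s: about {kb:.0f} kB per request")
```

So at 100 requests/second a 100Mb/sec pipe fills up once the average
response is around 125 kB, and at 30 requests/second around 417 kB.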

