Scalability, Load Balancing, and High Availability - Possible SoC Proposal
Hi All,
I've been thinking about a possible summer of code idea, but I'm not sure it will fly so I wanted to get some feedback.
The basic idea is to look into solutions for running Drupal simultaneously across multiple servers for the purposes of running extremely large/busy sites, for load balancing on multiple servers (both database and web servers), or just for high availability (if one server dies, another will kick in). The project would consist of developing and documenting ways that this can be accomplished and developing tools or other Drupal patches/modules that are needed to accomplish the various tasks.
One of the main problems with this as a SoC proposal is that a large majority of what needs to be done here is OS/web server/database server specific. There would probably be a lot of Drupal specific stuff, but I'm not sure exactly how much.
Even if this won't fly as a SoC project, I would still be interested in working on this. What's already been done in this area? Maybe some of this has already been solved?
Thanks for any feedback,
Scott
Scott Hadfield wrote:
Hi All,
I've been thinking about a possible summer of code idea, but I'm not sure it will fly so I wanted to get some feedback.
The basic idea is to look into solutions for running Drupal simultaneously across multiple servers for the purposes of running extremely large/busy sites, for load balancing on multiple servers (both database and web servers), or just for high availability (if one server dies, another will kick in). The project would consist of developing and documenting ways that this can be accomplished and developing tools or other Drupal patches/modules that are needed to accomplish the various tasks.
I don't think that Drupal needs to be patched for this (you might get better performance from some pending patches, but in principle Drupal will run just fine in such a scenario). To run Drupal - or any other script - in such a scenario is generally well understood but not well documented. redLED started something about this a while ago.
One of the main problems with this as a SoC proposal is that a large majority of what needs to be done here is OS/web server/database server specific. There would probably be a lot of Drupal specific stuff, but I'm not sure exactly how much.
I am afraid all you'd really do is to try things and to document them. As I understand SoC this does not qualify as a project since no code is produced.
Even if this won't fly as a SoC project, I would still be interested in working on this. What's already been done in this area? Maybe some of this has already been solved?
As I said: most of this has, but since it depends on your particular situation, most solutions aren't really applicable to other people's.
Cheers,
Gerhard
On 3/20/07, Gerhard Killesreiter <gerhard@killesreiter.de> wrote:
I don't think that Drupal needs to be patched for this (you might get better performance from some pending patches, but in principle Drupal will run just fine in such a scenario).
To run Drupal - or any other script - in such a scenario is generally well understood but not well documented. redLED started something about this a while ago.
Agreed. It's making choices in your architecture and scaling the LAMP stack.
Some other ideas:
* There is the memcached module that might go in a direction to work with code
* status/performance/etc. monitoring of multiple Drupal sites through a dashboard
* .... sorry, it's late and I'm out of ideas...
-- Boris Mann Skype borismann http://www.bryght.com
On Tuesday 20 March 2007 10:19, Gerhard Killesreiter wrote:
One of the main problems with this as a SoC proposal is that a large majority of what needs to be done here is OS/web server/database server specific. There would probably be a lot of Drupal specific stuff, but I'm not sure exactly how much.
I am afraid all you'd really do is to try things and to document them. As I understand SoC this does not qualify as a project since no code is produced.
You could turn the documents into code if you think of producing a Drupal installation profile, or even makefiles or vendor packages (.deb, .rpm, etc.) that contain all the required options, settings, and configurations. That would be a concrete result. However, maintenance of such packages or profiles is a lot of work, so you will need a plan for that too if you choose this route.
-- Drupal, Ruby on Rails and Joomla! development: webschuur.com | Drupal hosting: www.sympal.nl
I'm going to submit a SoC application for a scripting tool that quickly builds a Drupal virtual machine image from configuration parameters such as the type and configuration of the web server, PHP, and database; the Drupal configuration and enabled modules; the database topology; and the sample data and users to load.
Robert Douglass spotted this as part of a performance and scalability testing environment: http://lists.drupal.org/pipermail/development/2007-March/023034.html
Allister
On 3/19/07, Scott Hadfield <hadsie@gmail.com> wrote:
Hi All,
I've been thinking about a possible summer of code idea, but I'm not sure it will fly so I wanted to get some feedback.
The basic idea is to look into solutions for running Drupal simultaneously across multiple servers for the purposes of running extremely large/busy sites, for load balancing on multiple servers (both database and web servers), or just for high availability (if one server dies, another will kick in). The project would consist of developing and documenting ways that this can be accomplished and developing tools or other Drupal patches/modules that are needed to accomplish the various tasks.
One of the main problems with this as a SoC proposal is that a large majority of what needs to be done here is OS/web server/database server specific. There would probably be a lot of Drupal specific stuff, but I'm not sure exactly how much.
Even if this won't fly as a SoC project, I would still be interested in working on this. What's already been done in this area? Maybe some of this has already been solved?
I think this is a good idea for a project. Jeremy will be presenting many lessons from KernelTrap and the CivicSpace ASP at the Drupal Scalability workshop on Saturday. The Drupal community has learned a lot; in particular, Gerhard has captured and tested significant performance and scalability improvements in building the Drupal.org infrastructure. Having an intern document all these lessons would be a great asset for the community and would help in the adoption of Drupal by businesses that ultimately ask the question, "Will Drupal scale?"
CivicSpace has learned a lot about high availability and scalability in building our ASP and we'd be interested in sharing what we've learned, which we have documented fairly well. I'd be happy to make much of this available if we believed that work would receive good stewardship.
Topics to consider:
Scaling
1) Scaling web servers horizontally. Shared file systems for horizontally scaling web servers.
2) Scaling databases.
3) Performance tuning queries. Configuring databases for load characteristics. Managing IO bottlenecks.
4) Logging - managing logs for scalability and IO tuning
High Availability
5) Network availability - Bonding, CARP
6) Database failover - replication with master and slaves
7) Restoration - text and binary database restoration, remote recovery
I don't think there is anything wrong with doing lots of research and then writing small amounts of code for tools where appropriate. My suspicion is that a well-prepared project would win wide support, and Google will support a project that has the support of the mentoring organizations.
Cheers,
Kieran
Thanks for any feedback,
Scott
-- To strive, to seek, to find, and not to yield.
On 3/20/07, Scott Hadfield <hadsie@gmail.com> wrote:
Hi All,
I've been thinking about a possible summer of code idea, but I'm not sure it will fly so I wanted to get some feedback.
The basic idea is to look into solutions for running Drupal simultaneously across multiple servers for the purposes of running extremely large/busy sites, for load balancing on multiple servers (both database and web servers), or just for high availability (if one server dies, another will kick in). The project would consist of developing and documenting ways that this can be accomplished and developing tools or other Drupal patches/modules that are needed to accomplish the various tasks.
One of the main problems with this as a SoC proposal is that a large majority of what needs to be done here is OS/web server/database server specific. There would probably be a lot of Drupal specific stuff, but I'm not sure exactly how much.
Even if this won't fly as a SoC project, I would still be interested in working on this. What's already been done in this area? Maybe some of this has already been solved?
Thanks for any feedback, Scott
Scott,
Drupal.org already runs on 2 web servers + 1 db server, so the underpinnings are there.
Yes, it involves a lot of tuning of the LAMP stack. See here: http://2bits.com/articles/drupal-performance-tuning-and-optimization-for-lar...
However, it also involves finding and eliminating bottlenecks in modules (upcoming article in the above series). These are definitely code changes, and should be contributed as patches. Offending modules that do not scale well are locale and statistics in core, as well as many in contrib (e.g. gsitemap).
I would be interested in mentoring such a project, although a multi-server setup is not something I have available. I can provide limited access to a very large Drupal site though (half a million page views per day).
-- 2bits.com http://2bits.com Drupal development, customization and consulting.
Participants (7):
- Allister Beharry
- Boris Mann
- Bèr Kessels
- Gerhard Killesreiter
- Khalid Baheyeldin
- Kieran Lal
- Scott Hadfield