Just some thoughts based on the discussion so far, from the perspective of a large enterprise that is evaluating our future use of Drupal. For perspective, we run about 10 million page views per month across 30+ sites.
- 4.6 v. 4.7. We use what is appropriate at the time our projects start, with an eye towards maintaining an upgrade path. In one case, that means we're running a security-patched 4.5. In another, 4.6.5 (which is our current standard). While I was at OSCMS, I heard good, passionate reasons for upgrading to 4.7, but I can't justify it for two reasons: 1) It's still in 'beta' and I don't have the resources to help move it out of beta (more on that below); 2) We 'froze' our selection of contributed modules on November 15, 2005 for this project. The module compatibility issue would be large if we moved to 4.7.
- Drupal already _is_ ready for enterprise use: it just depends on your enterprise. Bryght, OurMedia, and NowPublic all come to mind as 'enterprise' Drupal companies. The issue (at least in my experience) is that those three are all 'new' ventures that used Drupal as the baseline of their IT infrastructure. Integrating Drupal into an existing IT structure (including all the 'policies and procedures' that you currently have in place) can be quite daunting.
- Commit to Drupal or not? This depends on the fundamental question: "How much time does Drupal save you versus 1) creating your own CMS; 2) using some other OSS or commercial CMS?" This one, I think, is a no-brainer, with one caveat, which is:
- Commit to Drupal.org. After spending the week in Vancouver, I made the following report to our management team. If we are to go forward using Drupal, we need a dedicated support-and-development team of 3 people. (And my eyeball predictions of such things are usually pretty accurate.) The team lead will have, as one of his/her main responsibilities, the task of being our public face within the Drupal community. This includes going to events, encouraging contributions of code, and helping with documentation. This is crucial, because without being a valued member of the community, all you'll ever really have is your own Drupal fork to support. If your organization goes forward with the idea of contributing patches to core, helping document Drupal, and so forth, then you will have a voice in future Drupal development.
/climbs off soapbox.
All that said, we still haven't committed to Drupal long-term because for us there is a big issue that hasn't been fully solved: Drupal scalability. I'm pretty confident that Drupal scales, but optimizing servers for Drupal is still a pretty rare skill. (And, yes, we are diving into the forums on the topic.)
And, wouldn't you know, solving the scalability problem conflicts with our current IT culture.
Ken Rickard wrote:
All that said, we still haven't committed to Drupal long-term because for us there is a big issue that hasn't been fully solved: Drupal scalability. I'm pretty confident that Drupal scales, but optimizing servers for Drupal is still a pretty rare skill. (And, yes, we are diving into the forums on the topic.)
And, wouldn't you know, solving the scalability problem conflicts with our current IT culture.
That's a lot of great background on your decision-making process and needs. Please keep us updated as the decisions unfold. -Robert
All that said, we still haven't committed to Drupal long-term because for us there is a big issue that hasn't been fully solved: Drupal scalability. I'm pretty confident that Drupal scales, but optimizing servers for Drupal is still a pretty rare skill. (And, yes, we are diving into the forums on the topic.)
I think I blabbed about this at OSCMS in Vancouver, at the Drupal for the Enterprise discussion. My philosophy: go for the lowest hanging fruit, and work your way up the tree. You can probably quadruple Drupal's pages per second in under half an hour.
Here's my scaling tree. As you progress up the tree, you will find that time, money, maintenance, and headaches all increase.
1. Use a PHP cache:
I found that using APC speeds up Drupal by a lot: 3 to 5 times the page views per second. This was _literally_ a 5 minute install (on FreeBSD) for a 300% to 500% performance improvement. I think at that point my dev server's SATA HDDs were the bottleneck. It sits beside me, and when I hit it with ab, I can hear the HDDs wrrrrr like crazy.
2. Use mod_gzip (or ob_gzhandler in PHP; I prefer mod_gzip, or mod_deflate in Apache 2)
The benefits of this are amazing, considering the minimal effort it takes to implement. It doesn't matter if it takes Drupal 0.002 seconds to generate 40K of HTML if it takes 1 to 2 seconds for a client to download it (more if using a modem). mod_gzip usually gives 10% to 80% compression depending on the size of the pages. Amazing results for 10 minutes of work.
3. Get a faster DB server. I'm thinking of a 3x15K SCSI (RAID 5), dual Xeon MySQL server from freebsdsystems.com for my next installation. These things rip. Expensive (~$5K to $7K) but fast. An average Drupal dev charges like $100 USD an hour these days, right? A super fast DB server is still more bang for your buck than 50 to 70 hours of code performance tuning.
4. Get faster (or more) web servers. Maybe not the same specs as above, but fast anyway.
5. Get a load balancer, or a reverse HTTP proxy (squid), to distribute the load.
6. Do MySQL replication.
7. Profile / tune Drupal's code (shudder).
Ben.
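To get a feel for step 2 before touching the web server config, here is a minimal sketch in Python that gzips a page-sized chunk of markup and reports the saving. The sample page is made up for illustration (it is not real Drupal output), and it is far more repetitive than a real page, so it will compress better than the 10% to 80% range quoted above.

```python
import gzip


def compression_saving(payload: bytes) -> float:
    """Return the fraction of bytes saved by gzip (0.0 means nothing saved)."""
    compressed = gzip.compress(payload)
    return 1 - len(compressed) / len(payload)


# Hypothetical stand-in for a roughly 40K HTML page; real pages are less repetitive.
row = b'<div class="node"><h2 class="title">Example title</h2><p>Example teaser text.</p></div>\n'
page = b"<html><body>\n" + row * 450 + b"</body></html>\n"

saving = compression_saving(page)
print(f"{len(page)} bytes before gzip, {saving:.0%} saved")
```

Running the same function over a saved copy of one of your own pages gives a more honest number than the synthetic sample.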
Benson, please write this as a handbook page and link to it from the performance forum http://drupal.org/forum/49
6. Do MySQL replication
By this do you mean geographically separate web servers?
7. Profile / tune Drupal's code (shudder).
Drupal's base is great. I work with "so called" enterprise software for a living and that code will make you run away screaming. The only thing Drupal lacks for enterprise acceptance is a marketing team and a price tag. Pat
pat@linuxcolumbus.com wrote:
The only thing Drupal lacks for enterprise acceptance is a marketing team and a price tag.
Yes, marketing directed at enterprises would help, if it's more enterprise users that Drupal wants. (IMHO, most won't contribute back to the community, Drupal service providers excepted).
I also work for a large enterprise (multinational investment bank), and when we selected a CMS last year, Drupal didn't make the short list. We ended up with a very expensive system because it offered the following features that Drupal either doesn't do, or doesn't do to the level we needed for our (granted, industry-specific) requirements.
1. True multi-language support. All content and interfaces in multiple languages, with workflow for helping translators.
2. Content staging and approval workflow.
3. Version control of everything (content, templates, images, etc.). We needed to be able to see what was on our site on a given day.
4. Multiple dev teams with their own dev servers, mastering their own content.
5. Multi-target publishing, with atomic copies and rollback.
6. LDAP authentication and roles-based authorisation, or integration with a product like Netegrity.
What we got in the end was basically a pimped up rcs/rsync (except for the price tag of course!)
I really like Drupal, and am quite happy that it doesn't have all of these features. I don't think they would apply to the majority of users.
Cheers, Simon
On 26-Feb-06, at 5:54 PM, Simon Croome wrote:
pat@linuxcolumbus.com wrote:
The only thing Drupal lacks for enterprise acceptance is a marketing team and a price tag.
Yes, marketing directed at enterprises would help, if it's more enterprise users that Drupal wants. (IMHO, most won't contribute back to the community, Drupal service providers excepted).
Well...these enterprise users actually usually contribute indirectly, through one of two methods:
1. They work with Drupal consultants who get paid to modify/module/whatever and who themselves build contributing back into their support/pricing structure
2. They commit to Drupal as a platform and get the most value from integrating/contributing directly
There is a longer discussion about some of these issues (and educating enterprise on the values of "community ROI") on the consultant's mailing list.
I also work for a large enterprise (multinational investment bank), and when we selected a CMS last year, Drupal didn't make the short list. We ended up with a very expensive system because it offered the following features that either Drupal doesn't do, or maybe not to the level we needed for our (granted industry-specific) requirements.
Yep. There are lots of "high end" features for which Drupal is not necessarily the right choice...some notes on how this might be implemented, merely for interest's sake.
1. True multi-language support. All content and interfaces in multiple languages, with workflow for helping translators.
Work in progress. Having true multi-language support in a CMS that can be run in a shared hosting account would make Drupal truly shine.
2. Content staging and approval workflow.
Staging site with publish-subscribe. Approval puts content into a publish queue that can get pushed to 1 or more live sites.
3. Version control of everything (content, templates, images, etc.). We needed to be able to see what was on our site on a given day.
Education has some similar requirements, but more around archiving. Could be done with a non-Drupal solution -- i.e. SVN.
4. Multiple dev teams with their own dev servers, mastering their own content.
I like publish-subscribe for this. Yes, I need to invest some time/money into feedback/funding for JVD to do more with this, or for some other people to dive in.
5. Multi-target publishing, with atomic copies and rollback.
Could also be pub-sub, and the rollback feature could potentially be added.
6. LDAP authentication and roles-based authorisation, or integration with a product like Netegrity.
LDAP "works today", although configuring it is non-trivial.
What we got in the end was basically a pimped up rcs/rsync (except for the price tag of course!)
I really like Drupal, and am quite happy that it doesn't have all of these features. I don't think they would apply to the majority of users.
Yep. All about picking the right tools for the job. -- Boris Mann Vancouver 778-896-2747 San Francisco 415-367-3595 SKYPE borismann http://www.bryght.com
On 27 Feb 2006, at 02:34, pat@linuxcolumbus.com wrote:
6. Do MySQL replication
By this do you mean geographically separate web servers?
Not necessarily. You can use MySQL replication and clustering for (at least) two reasons:
1. Redundancy. You can use MySQL's replication functionality to have a backup database. It's a "hot spare" so it can take over instantly without downtime. No need to restore a backup from tape. Depending on the amount of "replication traffic" and your internet connection, different database servers could be in geographically separate locations.
2. Performance. You can use MySQL's replication functionality to scale your database layer. You can use it to distribute the workload among multiple database servers that are exact copies of one another (load balancing). Occasionally companies use geographically separate servers to improve performance; by bringing the data closer to the user's geographical location you can eliminate some network latency.
-- Dries Buytaert :: http://www.buytaert.net/
2. Performance. You can use MySQL's replication functionality to scale your database layer. You can use it to distribute the workload among multiple database servers that are exact copies of one another (load balancing).
Now, MySQL 5 has the NDB engine (MySQL Cluster), which looked promising from what I have read, but I have not tested it yet, much less with Drupal. Anyone?
Karoly Negyesi wrote:
2. Performance. You can use MySQL's replication functionality to scale your database layer. You can use it to distribute the workload among multiple database servers that are exact copies of one another (load balancing).
Now, MySQL 5 has the NDB engine (MySQL Cluster), which looked promising from what I have read, but I have not tested it yet, much less with Drupal. Anyone?
Looks nice, but requires the entire DB to be held in RAM. Which rather limits its applications :) jh
On Mon, 2006-02-27 at 08:51 +0100, Dries Buytaert wrote:
On 27 Feb 2006, at 02:34, pat@linuxcolumbus.com wrote:
6. Do MySQL replication
By this do you mean geographically separate web servers?
Not necessarily. You can use MySQL replication and clustering for (at least) two reasons:
1. Redundancy. You can use MySQL's replication functionality to have a backup database. It's a "hot spare" so it can take over instantly without downtime. No need to restore a backup from tape. Depending on the amount of "replication traffic" and your internet connection, different database servers could be in geographically separate locations.
2. Performance. You can use MySQL's replication functionality to scale your database layer. You can use it to distribute the workload among multiple database servers that are exact copies of one another (load balancing). Occasionally companies use geographically separate servers to improve performance; by bringing the data closer to the user's geographical location you can eliminate some network latency.
-- Dries Buytaert :: http://www.buytaert.net/
I will point out that Drupal currently does not support MySQL's replication technology, since we haven't implemented a concept of slave/master...
Two quick ways I can think of adding it are...
1) further db abstraction a la db_select, db_insert, db_update.....
2) using an if (strncasecmp(ltrim($query), 'SELECT', 6) != 0) { db_set_active('master'); } in db_query.
With 1) you may get slightly better performance since you aren't constantly parsing strings, but you start making some major changes to Drupal's db abstraction layer.
.darrel.
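The second option boils down to looking at the first keyword of each query and choosing a connection from it. A minimal, language-neutral sketch of that routing rule, written in Python rather than Drupal's PHP; the function name and the "master"/"slave" labels are assumptions for illustration, not part of any Drupal API:

```python
def pick_connection(query: str) -> str:
    """Route plain SELECTs to a slave; send everything else to the master."""
    words = query.split(None, 1)  # split() with no separator ignores leading whitespace
    first_word = words[0].upper() if words else ""
    # Anything that is not clearly a read goes to the master, which is the safe default.
    return "slave" if first_word == "SELECT" else "master"


for sql in ("SELECT nid FROM node", "  select * from users", "UPDATE node SET status = 1"):
    print(pick_connection(sql), "<-", sql)
```

A rule this simple has known gaps: a read issued right after a write may hit a slave that has not caught up yet, so code that must see its own writes would still need a way to force the master.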
On 27 Feb 2006, at 6:11 PM, Darrel O'Pry wrote:
with 1) you may get slightly better performance since you aren't constantly parsing strings, but you start making some major changes to drupal's db_abstraction layer.
How long before it makes sense to just use a third party db abstraction library ? I'm not hot on the idea of re-inventing the wheel yet again. -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
On Mon, 2006-02-27 at 18:25 +0200, Adrian Rossouw wrote:
On 27 Feb 2006, at 6:11 PM, Darrel O'Pry wrote:
with 1) you may get slightly better performance since you aren't constantly parsing strings, but you start making some major changes to drupal's db_abstraction layer.
How long before it makes sense to just use a third party db abstraction library ?
I'm not hot on the idea of re-inventing the wheel yet again.
Neither am I unless we can improve on the original. Do you have any in mind that will work for Drupal?
I think db_rewrite_sql will be hard to find an equivalent for, but I'm sure we can hack that in... I'd ideally want support for
1) mysql 2) postgresql 3) oracle 4) db2 5) sqlite
Features I'd like to see are 1) replication/cluster support 2) query result caching 3) transaction support
2) query result caching
modules are welcome to use the cache_set/cache_get API for this. forum.module used to do this, but it was removed when node_access went in. this really makes sense only for expensive queries that aren't personalized and i can't think of any of those in core right now.
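For readers who have not used this pattern, here is a minimal sketch of caching the result of an expensive, non-personalized query. It is written in Python with an in-memory dictionary; the function names echo cache_set/cache_get, but the signatures, the TTL handling and the cached_query helper are assumptions made for illustration and do not mirror Drupal's actual API.

```python
import time

_cache = {}  # key -> (value, expiry timestamp); stands in for a real cache table


def cache_set(key, value, ttl_seconds=300, now=None):
    """Store a value together with the time at which it expires."""
    now = time.time() if now is None else now
    _cache[key] = (value, now + ttl_seconds)


def cache_get(key, now=None):
    """Return the cached value, or None if it is missing or expired."""
    now = time.time() if now is None else now
    entry = _cache.get(key)
    if entry is None or entry[1] <= now:
        return None
    return entry[0]


def cached_query(key, run_query, ttl_seconds=300):
    """Run the expensive query only when no fresh result is cached."""
    result = cache_get(key)
    if result is None:
        result = run_query()
        cache_set(key, result, ttl_seconds)
    return result
```

The key has to capture everything the result depends on. For a personalized query that would include the user, which is why this only pays off for results shared by many visitors.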
Moshe Weitzman wrote:
2) query result caching
modules are welcome to use the cache_set/cache_get API for this. forum.module used to do this, but it was removed when node_access went in. this really makes sense only for expensive queries that aren't personalized and i can't think of any of those in core right now.
The forum index page is both personalized and *massively* expensive. jh
On Mon, 2006-02-27 at 11:49 -0500, Moshe Weitzman wrote:
2) query result caching
modules are welcome to use the cache_set/cache_get API for this. forum.module used to do this, but it was removed when node_access went in. this really makes sense only for expensive queries that aren't personalized and i can't think of any of those in core right now.
I'm just thinking per run. I thought db_query already did that... But looking at the code in HEAD, it doesn't... I would at the least like to see some form of transaction support, and support for master/slave database clustering and replication. I should probably stand down from this discussion, as I won't be implementing anything related to the db abstraction until the file handling is somewhat better than palatable.
On 2/27/06, Darrel O'Pry <dopry@thing.net> wrote:
Neither am I unless we can improve on the original. Do you have any in mind that will work for Drupal?
I think db_rewrite_sql will be hard to find an equivalent for, but I'm sure we can hack that in... I'd ideally want support for
1) mysql 2) postgresql 3) oracle 4) db2 5) sqlite
features I'd like to see are 1) replication/cluster support 2) query result caching 3) transaction support
I think we are going overboard here. Support in core for all these bells and whistles is overkill. If this is pluggable, then I am not against it, but if it is to be a standard feature, even for shared hosts, then I am.
As for a db abstraction layer, I am conflicted on this.
- It creates an external dependency (e.g. PEAR DB)
- We may be able to bundle it to get over this.
- Remember the xmlrpc fiasco? We relied on something external, then had security issues, and re-wrote it in-house.
On the other hand:
- Why reinvent the wheel?
- NIH (Not invented here)?
- We could do with better support for other databases, e.g. SQLite, better support for PostgreSQL, and whatever will happen because of the InnoDB/Sleepycat/Oracle thing.
Adrian Rossouw wrote:
On 27 Feb 2006, at 6:11 PM, Darrel O'Pry wrote:
with 1) you may get slightly better performance since you aren't constantly parsing strings, but you start making some major changes to drupal's db_abstraction layer.
How long before it makes sense to just use a third party db abstraction library ?
I'm not hot on the idea of re-inventing the wheel yet again.
-- Adrian Rossouw
I think that may be a tougher question than it appears on the surface. I would say we should use a third party db abstraction just as soon as one with a good fit and performance is written. :-)
Right now, most db abstractions are written to be ultimately generic, so that as many people as possible might be able to use them. The number which were written intentionally to be lightweight and for web applications is very small. In the former category are things like ADOdb and PEAR DB. In the latter category is PHPLIB's. ADOdb and PEAR DB might have as many lines of code by themselves as all of Drupal core -- they're that big.
None of them abstract the database completely, really. You still have to write database-specific SQL statements.
A real abstraction would allow you to encapsulate the database objects as objects, and the code reading and writing the attributes of those objects would never need to use SQL. I'm not sure if we want to go that far, or not. (Incidentally, I'm writing such a "real" object abstraction for MySQL and PostgreSQL for my employer right now.)
Are there any other db abstraction libraries out there we should look at? I certainly would be interested in taking a look at them myself.
..chris
On 27 Feb 2006, at 7:15 PM, Chris Johnson wrote:
Are there any other db abstraction libraries out there we should look at? I certainly would be interested in taking a look at them myself.
I'm not saying we have to use one. I'm just saying we should put some research time into the question. I _want_ something lightweight that doesn't try to abstract objects etc. That kind of abstraction will happen with the views query builder and the like; it shouldn't be part of the db api. -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
On Mon, Feb 27, 2006 at 11:15:01AM -0600, Chris Johnson wrote:
None of them abstract the database completely, really. You still have to write database-specific SQL statements.
If abstraction libraries are being looked at, or the abstraction layer in Drupal will be refactored, I encourage folks to look at PEAR's new MDB2 package. Very good abstraction, including table creation and alterations.
A real abstraction would allow you to encapsulate the database objects as objects,
Personally, I prefer directly writing SQL over data objects. --Dan -- T H E A N A L Y S I S A N D S O L U T I O N S C O M P A N Y data intensive web and database programming http://www.AnalysisAndSolutions.com/ 4015 7th Ave #4, Brooklyn NY 11232 v: 718-854-0335 f: 718-854-0409
On Monday 27 February 2006 11:15, Chris Johnson wrote:
A real abstraction would allow you to encapsulate the database objects as objects, and the code reading and writing the attributes of those objects would never need to use SQL. I'm not sure if we want to go that far, or not. (Incidentally, I'm writing such a "real" object abstraction for MySQL and PostgreSQL for my employer right now.)
Are there any other db abstraction libraries out there we should look at? I certainly would be interested in taking a look at them myself.
Having written a Drupal-esque db abstraction layer at work, I found that abstracting insert, update, and delete statements is quite easy. They're all very regular, and in fact the abstracted form I had takes associative arrays which makes them far easier to work with, even without database abstraction. (I'm not sure if something similar would be of interest to Drupal, but I'd be happy to code it up once HEAD opens again.) Select statements are the tricky one, since they can get very complicated. I've not figured out how to properly abstract those. That's where the major DB libraries get so big, and where stuff like PEAR::DB::DataObjects becomes attractive. Has anyone found a good abstraction model for select statements that they like? -- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
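As an illustration of why insert, update, and delete are the easy part, here is a rough sketch of builders that take associative arrays (Python dicts here) and return a parameterized statement plus its values. The function names and the %s placeholder style are assumptions for the example; this is not the layer described above, nor Drupal's API.

```python
def build_insert(table, fields):
    """Build an INSERT from a dict of column -> value."""
    columns = ", ".join(fields)
    placeholders = ", ".join(["%s"] * len(fields))
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", list(fields.values())


def build_where(where):
    """Build an AND-ed equality WHERE clause; refuse to build an unconditional one."""
    if not where:
        raise ValueError("refusing to build a statement without conditions")
    return " AND ".join(f"{column} = %s" for column in where), list(where.values())


def build_update(table, fields, where):
    """Build an UPDATE from dicts of new values and equality conditions."""
    assignments = ", ".join(f"{column} = %s" for column in fields)
    conditions, condition_values = build_where(where)
    sql = f"UPDATE {table} SET {assignments} WHERE {conditions}"
    return sql, list(fields.values()) + condition_values


def build_delete(table, where):
    """Build a DELETE from a dict of equality conditions."""
    conditions, condition_values = build_where(where)
    return f"DELETE FROM {table} WHERE {conditions}", condition_values


# Table and column names are assumed to be trusted; only the values are parameterized.
print(build_update("node", {"title": "Hello", "status": 1}, {"nid": 5}))
```

Select statements do not fit this mould because joins, grouping, and subqueries have no single obvious dictionary shape, which is the difficulty raised above.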
I will point to the fact that currently drupal will not support mysql's replication technology since we haven't implemented a concept of slave/master...
Two quick ways I can think of adding it are... 1) further db abstraction a la db_select, db_insert, db_update..... 2) using an if (strncasecmp(ltrim($query), 'SELECT', 6) != 0) { db_set_active('master'); } in db_query.
with 1) you may get slightly better performance since you aren't constantly parsing strings, but you start making some major changes to drupal's db_abstraction layer.
Are you suggesting that we add a connection pooling mechanism to Drupal? I'm not convinced that is a good idea. There are both software and hardware load balancers that do exactly that. You open a MySQL connection to db-server.example.com (the load balancer), which maps it onto db-server-1.example.com, db-server-2.example.com, etc. These load balancers can do 'health checks' to see if the database servers are still running, whether they are in a consistent state, whether they are properly replicated, etc. Good health checks can be complicated and therefore this could be tricky to implement in PHP ... -- Dries Buytaert :: http://www.buytaert.net/
On Mon, 2006-02-27 at 17:37 +0100, Dries Buytaert wrote:
Are you suggesting that we add a connection pooling mechanism to Drupal? I'm not convinced that is a good idea. There are both software and hardware load balancers that do exactly that. You open a MySQL connection to db-server.example.com (the load balancer), which maps it onto db-server-1.example.com, db-server-2.example.com, etc. These load balancers can do 'health checks' to see if the database servers are still running, whether they are in a consistent state, whether they are properly replicated, etc. Good health checks can be complicated and therefore this could be tricky to implement in PHP ...
-- Dries Buytaert :: http://www.buytaert.net/
In no way am I suggesting adding connection pooling. That's far outside the scope of my suggestion... MySQL's replication only supports a single master/write server, so Drupal needs to know which server to query for write operations. I can implement my own MySQL load balancing through sqlrelay, round-robin DNS, or LVS. But none of them will get my UPDATE, INSERT, CREATE, ALTER, etc. queries to my MySQL master server. The db abstraction layer is where I've placed this logic in the past.
On Sun, 2006-02-26 at 20:34 -0500, pat@linuxcolumbus.com wrote:
6. Do MySQL replication
By this do you mean geographically separate web servers?
http://dev.mysql.com/doc/refman/5.0/en/replication-intro.html
Better response time for clients can be achieved by splitting the load for processing client queries between the master and slave servers. SELECT queries may be sent to the slave to reduce the query processing load of the master. Statements that modify data should still be sent to the master so that the master and slave do not get out of synchrony. This load-balancing strategy is effective if non-updating queries dominate, but that is the normal case.
-- brian@brianpuccio.net GPG Key ID 0xBBD2401F
pat@linuxcolumbus.com wrote:
7. Profile / tune Drupal's code (shudder).
Drupal's base is great. I work with "so called" enterprise software for a living and that code will make you run away screaming.
The only thing Drupal lacks for enterprise acceptance is a marketing team and a price tag.
And about 100 database indexes. :) jh
On 27 Feb 2006, at 01:34, Benson Wong wrote:
1. Use a PHP cache:
I found that using APC speeds up Drupal by a lot: 3 to 5 times the page views per second. This was _literally_ a 5 minute install (on FreeBSD) for a 300% to 500% performance improvement. I think at that point my dev server's SATA HDDs were the bottleneck. It sits beside me, and when I hit it with ab, I can hear the HDDs wrrrrr like crazy.
Small nit: sounds like lack of memory could be the bottleneck, not necessarily your disks. With enough memory, both Drupal's source files and your database are kept in memory. -- Dries Buytaert :: http://www.buytaert.net/
Benson Wong wrote:
Here's my scaling tree. As you progress up the tree, you will find that time, money, maintenance, headaches will all increase.
Nice summary. However, this isn't really Drupal specific, but applies to all PHP applications or even web apps. Cheers, Gerhard
On 2/27/06, Gerhard Killesreiter <gerhard@killesreiter.de> wrote:
Nice summary. However, this isn't really Drupal specific, but applies to all PHP applications or even web apps.
It is called the law of diminishing returns. You spend 10 and get 100, then spend 20, and get 60, then spend 40 to get 30. Once you reach a certain point it is too expensive since you are not getting much in return.
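Working the figures from that example through makes the drop-off concrete. A tiny Python sketch that only uses the numbers given above:

```python
# (spend, gain) pairs taken directly from the example above.
steps = [(10, 100), (20, 60), (40, 30)]

returns = [gain / spend for spend, gain in steps]

for (spend, gain), ratio in zip(steps, returns):
    print(f"spend {spend:>2} -> gain {gain:>3} ({ratio:.2f} per unit spent)")
```

The return per unit spent falls from 10 to 3 to 0.75, so the third step already gives back less than it costs.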
On Sun, 2006-02-26 at 16:34 -0800, Benson Wong wrote:
1. Use a PHP cache:
I found that using APC speeds up Drupal a lot: 3 to 5 times the page views per second. This was _literally_ a 5 minute install (on FreeBSD) for a 300% to 500% performance improvement. I think at that point my dev server's SATA HDDs were the bottleneck. It sits beside me, and when I hit it with ab I can hear the HDDs wrrrrr like crazy.
This one works... really works... and is a must for any busy Drupal site.
2. Use mod_gzip (or ob_gzhandler in PHP; I prefer mod_gzip, or mod_deflate in Apache 2)
The benefits of this are amazing, considering the minimal effort it takes to implement. It doesn't matter if it takes Drupal 0.002 seconds to generate 40K of HTML if it takes 1 to 2 seconds for a client to download it (more if using a modem). mod_gzip usually gives 10% to 80% compression, depending on the size of the pages. Amazing results for 10 minutes of work.
I've never used mod_gzip for performance, mainly as a cost control tool.
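To put rough numbers on the download-time point above, here is a small sketch that compresses a 40K page with zlib (the deflate algorithm behind mod_gzip and mod_deflate) and estimates transfer time over a modem. The page content and the 56k link speed are assumptions made up for this example, and the calculation ignores latency and protocol overhead. The synthetic page is highly repetitive, so it compresses far better than real pages would; the 10% to 80% range quoted above is the figure to expect in practice.

```python
import zlib


def transfer_seconds(size_bytes, link_bits_per_second):
    """Time to move size_bytes over a link, ignoring latency and overhead."""
    return size_bytes * 8 / link_bits_per_second


# Hypothetical 40K page built from repeated markup (not real Drupal output).
snippet = b'<div class="node"><h2>Title</h2><p>Some body text here.</p></div>\n'
page = (snippet * 700)[:40 * 1024]
compressed = zlib.compress(page, 6)

MODEM_BPS = 56_000  # nominal 56k modem, an assumption for illustration

plain_time = transfer_seconds(len(page), MODEM_BPS)
gzip_time = transfer_seconds(len(compressed), MODEM_BPS)

print(f"uncompressed: {len(page)} bytes, {plain_time:.2f} s")
print(f"compressed:   {len(compressed)} bytes, {gzip_time:.2f} s")
```

The same saving applies to bandwidth bills, which is the cost-control angle mentioned in the reply.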
3. Get a faster DB server. I'm thinking of a 3x15K SCSI (RAID 5), dual Xeon MySQL server from freebsdsystems.com for my next installation. These things rip. Expensive (~$5K to $7K) but fast. An average Drupal dev charges something like $100 USD an hour these days, right? A super fast DB server is still more bang for your buck than 50 to 70 hours of code performance tuning.
This will be most effective for large, consistent throughput with multiple web servers. A local database server is far faster in terms of throughput as long as it is not tight on memory and has a sizable query cache. That being said, if you are going to implement a separate DB server, it is best done over a private gigabit network.
4. Get faster (or more) Web Servers. Maybe not the same specs as above, but fast anyways. If you already have the PHP cache and the database server separated, you should concentrate your resources on processors and memory; fast drives are less important at this juncture of the configuration, since most everything will be cached...
You will also need to figure out how you are going to handle files. I've played with two approaches: 1) an NFS shared files directory... 2) an external server for the file store (requires a good bit of coding, but can be done).
5. Get a load balancer, or a reverse HTTP proxy (squid) to distribute the load
I haven't tested Drupal behind a true load balancer, but I did have difficulties with sessions and authentication behind a geographically distributed squid cluster... If anyone has had success with Drupal behind squid, please share that on drupal.org. Similar file issues apply.
6. Do MySQL replication
At present you can only use MySQL replication as a hot standby. Drupal does not support MySQL replication because it does not differentiate between read and write queries.
7. Profile / tune Drupal's code (shudder).
I think the community as a whole works at tuning the code, and caching the results of complex functions where possible. Removing/disabling all unused modules may help a teensy bit.
Ben.
memcache and sqlrelay are other possibly helpful tools in various configurations. .darrel.
6. Do MySQL replication
At present you can only use MySQL replication as a hot standby. Drupal does not support MySQL replication because it does not differentiate between read and write queries.
Can't you use master-master configurations? Not sure if that will work properly (never played with it); I guess masters can be out of sync at times. I understood what you wrote in your previous e-mail now; we should separate read from write queries so we can support master-slave configurations (and get most out of database replication). -- Dries Buytaert :: http://www.buytaert.net/
On Mon, 2006-02-27 at 17:46 +0100, Dries Buytaert wrote:
Can't you use master-master configurations? Not sure if that will work properly (never played with it); I guess masters can be out of sync at times.
Master-master replication in MySQL is asking for trouble. You can try it, but it won't be pretty. MySQL 5's NDB clustering is promising, but is for in-memory-only databases... Hopefully that technology will be built on. Maybe one of the PostgreSQL folks can chime in on PostgreSQL's clustering and replication technologies.
I understood what you wrote in your previous e-mail now; we should separate read from write queries so we can support master-slave configurations (and get most out of database replication).
-- Dries Buytaert :: http://www.buytaert.net/
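The read/write separation discussed in this exchange could look roughly like the sketch below. It is not Drupal code and not a proposal for the database layer's API: the class, the server names, and the rule "only SELECT and SHOW count as reads" are all assumptions made for illustration. Anything not recognised as a read goes to the master, which is the safe default.

```python
import itertools

# Assumption for this sketch: only these statement types are safe on a slave.
READ_VERBS = {"SELECT", "SHOW"}


def is_read_query(sql):
    """Classify a statement by its first keyword."""
    words = sql.lstrip().split(None, 1)
    return bool(words) and words[0].upper() in READ_VERBS


class QueryRouter:
    """Send reads to slaves in round-robin order and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def pick(self, sql):
        if self._slaves is not None and is_read_query(sql):
            return next(self._slaves)
        return self.master


# Hypothetical server names.
router = QueryRouter("db-master", ["db-slave-1", "db-slave-2"])
for statement in ("SELECT nid FROM node", "UPDATE node SET status = 1", "SELECT 1"):
    print(f"{router.pick(statement):<11} <- {statement}")
```

A real implementation would also have to cope with replication lag, since a read sent to a slave right after a write may not see that write yet, and with statements such as SELECT ... FOR UPDATE, which this first-keyword check would wrongly treat as plain reads.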
On Mon, 2006-02-27 at 11:22 -0600, Chris Johnson wrote:
Somewhere in the list of diminishing returns of optimizations to make are also these two items, which were mentioned in my Drupal Enterprise-wide session in Vancouver:
1. Tune your MySQL or PostgreSQL database. There are lots of knobs that can be twiddled, and just a couple of them can make a big difference.
2. Tune your operating system for the task at hand -- e.g. web server or database server, or some hybrid if on the unfortunate end.
Yes, it is true these things are general to all kinds of applications, not just Drupal.
Kieran recently posted a large PHP snippet that helped with tuning MySQL. It's highly worthwhile as a starting point; I've already used it to tweak my MySQL tuning for my non-Drupal database applications.
..chrisxj
Ohh... I keep thinking wouldn't it be nice to have a Drupal mysql tuning module.
Chris Johnson wrote:
Kieran recently posted a large PHP snippet that helped with tuning MySQL. It's highly worthwhile as a starting point; I've already used it to tweak my MySQL tuning for my non-Drupal database applications.
+1 With a caveat regarding InnoDB use, it's a very nice all-on-one-page summary of what MySQL's up to. jh
I've never used mod_gzip for performance, mainly as a cost control tool.
Performance is essentially how fast you can deliver the content to the user, including making the transport of data more efficient.
3. Get a faster DB server....
A local database server is far faster in terms of throughput as long as it is not tight on memory, and has a sizable query cache.
What you save in network transport, you lose in sharing resources with the web server. In a busy environment, more hardware = more performance. In a small environment, debating this will be splitting hairs.
4. Get faster (or more) Web Servers. Maybe not the same specs as above, but fast anyways. If you already have the PHP cache and the database server separated, you should concentrate your resources on processors and memory; fast drives are less important
I think we made the same point...?
You will also need to figure out how you are going to handle files. I've played with two approaches 1) NFS shared files directory... 2) external server for file store.(requires a good bit of coding but can be done)
5. Get a load balancer, or a reverse HTTP proxy (squid) to distribute the load
I haven't tested Drupal behind a true load balancer, but I did have difficulties with sessions and authentication behind a geographically distributed squid cluster... If anyone has had success with Drupal behind squid, please share that on drupal.org.
Similar file issues apply.
6. Do MySQL replication
At present you can only use MySQL replication as a hot standby. Drupal does not support MySQL replication because it does not differentiate between read and write queries.
Going to address all of these together. I make decisions based on the fact that > I < will have to maintain all of it, so the simpler the better. Basically I try to maximize my coffee time (low-maintenance system). Once you start dabbling in MySQL replication, multiple web servers, and trying to sync/share your media (image, video, etc.) files between web servers, you're basically committing to a much more complex architecture. That also means more resources, more people, and more money. It's not a small step. Also note that Drupal's code wasn't designed to scale across multiple servers. This is not a bad thing; I prefer to keep vertical scaling (complexity) out of the code.
So I would build this if I really wanted a _FAST_ architecture. I haven't tested this in practice, so obviously I have no real experience to offer, just research and theory.
1. Get an HTTP load balancer that sits in front of a couple of proxy servers, either:
   a) squid proxy servers, or
   b) lightweight Apache servers + mod_proxy
2. The proxy servers figure out where to send requests. I need to see how configurable these are; what you want them to do is:
   a) Requests for image/static/media files -> redirect the request to http://static.drupal.org
   b) Requests for dynamic pages (node/32) -> spread across any number of _READ ONLY_ web servers (horizontal scaling)
   c) Requests for writes (node/add, node/32/edit, etc.) -> send to the READ/WRITE web server
   d) cron.php -> read/write web server
   e) etc.
3. READ ONLY web servers - Drupal configured with a _READ_ONLY_ MySQL user, and pointing at any number of replicated MySQL slaves.
4. WRITE web server - Drupal configured with a MySQL user with INSERT/UPDATE/DELETE privileges.
For some vertical scaling, I would hack session.inc so that it uses a session database. I would define a database somewhere just for holding session data, and give the READ web servers INSERT, UPDATE, DELETE privileges to it. Drupal already supports multiple databases, so at most < 20 lines of code.
The key is duplicating types of servers. I would probably combine the proxy servers and the web servers, and add more as I need them. Keeping configurations sync'd is just sysadmin scripting. I will have 2 database servers, a master and a slave. Note that the master and slave _should_ be the same hardware. With MySQL replication, your slaves run all the same write queries as your master, so on weaker hardware a slave may lag and be really out of sync (real-world advice from Cal @ flickr). All of the above would apply to almost any LAMP web application. It's really a huge jump in cost and complexity once you get to this point. I wouldn't bother unless you need to handle 30 to 100 requests/second, and you have a dedicated 100Mb/sec pipe to your cage.
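The proxy routing rules in step 2 of that design could be sketched as a single decision function. This is only an illustration of the idea, not a squid or mod_proxy configuration: the pool names, the file suffix list, the assumption that paths look like /node/32 (clean URLs), and the rule that anything other than GET or HEAD counts as a write are all choices made for this sketch rather than part of the original proposal.

```python
import re

# Assumed suffixes for static/media files.
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")

# Paths the design above lists as writes: node/add, node/32/edit, cron.php.
WRITE_PATTERNS = [
    re.compile(r"^/node/add(/|$)"),
    re.compile(r"^/node/\d+/edit$"),
    re.compile(r"^/cron\.php$"),
]


def choose_pool(method, path):
    """Return which server pool should handle a request: static, write or read."""
    if path.lower().endswith(STATIC_SUFFIXES):
        return "static"
    # Assumption: form submissions (POST etc.) always go to the write server.
    if method.upper() not in ("GET", "HEAD"):
        return "write"
    if any(pattern.search(path) for pattern in WRITE_PATTERNS):
        return "write"
    return "read"


for method, path in [
    ("GET", "/files/logo.png"),
    ("GET", "/node/32"),
    ("GET", "/node/32/edit"),
    ("POST", "/node/32"),
    ("GET", "/cron.php"),
]:
    print(f"{method:<4} {path:<16} -> {choose_pool(method, path)}")
```

Even "read" requests from logged-in users touch session data, which is why the design above gives the read-only servers write access to a separate session database.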
One thing that has helped us not worry about taking the next step (distributed webservers, etc.) is the use of a RAID with a RAM front end. We use an Apple Xserve RAID (~$7,000, http://www.apple.com/xserve/raid/). Advantages:
- uses standard SATA drives so replacement drives are cheap (I mount them on the hotswappable drive carrier myself in about 5 minutes)
- 512MB RAM in front of the drives, transparent to the server. Battery backup protects the data in the RAM and onboard drive write cache is off.
What this means for our setup is that Drupal is nearly always doing I/O with RAM. Of course, this is dependent on your setup and use case.
On Mon, 2006-02-27 at 10:45 -0800, Benson Wong wrote:
What you save in network transport, you lose in sharing resources with the web server. In a busy environment, more hardware = more performance. In a small environment, debating this will be splitting hairs.
Everyone in a large environment had to transition from a small environment. Not everyone has the resources to go from "oh, I have everything on one machine" to "I have a database cluster feeding my web cluster, with a load balancer in front, proxied by a squid cluster." And not all web sites and communities explode; some just grow gradually and consistently. The normal requirements I try to meet during a cluster build-out are to maintain active failover capabilities, and at least minimal load-sharing capabilities. That said, my transitions normally go something like...
1) Stand-alone server working all by itself... - gotta start somewhere
2) 2 servers... 1 db master/http, 1 db slave/http - provides a failover HA cluster that can provide performance gains through RR DNS. You will need NFS or some other solution to keep file systems in sync at this juncture. (I prefer NFS and an rsync backup.)
3) 3 servers... 1 db master, 2 http/db slaves - here you split off all your db write access.
4) 4 servers... at this juncture I would suggest playing with different configurations and technologies... completely separate db cluster, load balancing, proxies, separate static server, etc.
I haven't really had to progress past 3 without getting the budget I needed to get 10 servers in the mix and play... There are also different ideas; some people just expand on the 2-server idea, adding more servers as db slave/http servers... It's easy to replicate. A monkey can set up SystemImager, and you can have an easily scalable cluster until you get to the point that you have to separate services to get any more effectiveness out of it, and the administrative issues are easy... Oh, something is broken? Re-image it. Oh, the master is dead? OK, well, re-IP slave one (heartbeat and mon can automate this)... There are administrative, economic, and technical factors which must all be satisfied, and which will probably be unique for each individual build-out.
My experience with HA build outs and web clustering is a bit more general than drupal. I'd like to hear from more drupalites who have built out large clusters about their progression and growth paths... I think it is something that can benefit a lot of people.
On Feb 26, 2006, at 4:34 PM, Benson Wong wrote:
I think I blabbed about this at OSCMS in Vancouver, at Drupal for the Enterprise discussion. My philosophy go for the lowest hanging fruit, and work your way up the tree. You can probably quadrupal Drupal's pages per second in under a half hour.
Benson, could you review this Drupal performance tuning page and provide me with feedback? http://drupal.org/node/2601 Cheers, Kieran
On Mon, 2006-02-27 at 09:42 -0800, Kieran Lal wrote:
Benson, could you review this Drupal performance tuning page and provide me with feedback?
Cheers, Kieran
On the MySQL page (http://drupal.org/node/51263) I'd drop the pconnect, especially now that locks are being used in some places. I'd also advise people to increase their MySQL max clients setting (max_connections) appropriately if they are going to use pconnect... Suggested reading... http://www.php.net/manual/en/features.persistent-connections.php This all amounts to: use pconnect with caution...
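The max clients advice above amounts to a sizing check. With persistent connections, each web server worker process can keep its own connection open, so the worst case is roughly workers times web servers. The sketch below does that arithmetic; the worker counts, the reserve, and the limits are hypothetical values chosen for illustration, not recommendations.

```python
def persistent_connections_needed(web_servers, workers_per_server, db_users=1):
    """Worst case: every worker holds one open connection per distinct DB login."""
    return web_servers * workers_per_server * db_users


def max_connections_ok(max_connections, web_servers, workers_per_server,
                       db_users=1, reserve=10):
    """Check the limit covers the worst case plus a reserve for admin and cron use."""
    needed = persistent_connections_needed(web_servers, workers_per_server, db_users)
    return needed + reserve <= max_connections


# Hypothetical setup: 2 web servers, 150 Apache workers each.
needed = persistent_connections_needed(2, 150)
print(f"worst-case persistent connections: {needed}")
for limit in (100, 400):
    print(f"max_connections={limit}: enough? {max_connections_ok(limit, 2, 150)}")
```

If the limit is too low, workers that cannot get a connection will fail their requests, which is one reason to treat pconnect with caution.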
Potential issue with pconnect for those who want to use it: the last sequence inserted may not be correct. Check this thread http://lists.drupal.org/pipermail/development/2005-December/thread.html#1209... On 2/27/06, Darrel O'Pry <dopry@thing.net> wrote:
On the MySQL page (http://drupal.org/node/51263) I'd drop the pconnect, especially now that locks are being used in some places. I'd also advise people to increase their MySQL max clients setting (max_connections) appropriately if they are going to use pconnect...
Suggested reading... http://www.php.net/manual/en/features.persistent-connections.php
This all amounts to: Use pconnect with caution...
Khalid B wrote:
Potential issue with pconnect for those who want to use it: the last sequence inserted may not be correct.
Check this thread
http://lists.drupal.org/pipermail/development/2005-December/thread.html#1209...
I have to say, having tried it on MBR.org in an extreme-load environment, it really doesn't make much difference either way. Query and index optimisation (or rather, our severe lack thereof) renders the whole issue moot for now. jh
I am trying to get the pay pal module to use tax. It does not appear to be passing the tax to paypal even though I have defined 7% tax for Canada. Does anyone know how to get the tax working or should I hack it :)
Sorry for hijacking the last thread! My mistake. I am trying to get the pay pal module to use tax. It does not appear to be passing the tax to paypal even though I have defined 7% tax for Canada. Does anyone know how to get the tax working or should I hack it :)
stuff@trackingsolutions.ca wrote:
Sorry for hijacking the last thread! My mistake.
I am trying to get the pay pal module to use tax. It does not appear to be passing the tax to paypal even though I have defined 7% tax for Canada. Does anyone know how to get the tax working or should I hack it :)
This is not a support forum. Please use the drupal.org forums. jh
It is development if I see code like:
'item_name' => $item_name,
'item_number' => $txnid,
'amount' => $txn->subtotal,
'shipping' => $txn->shipping_cost,
'no_shipping' => 1,
'return' => $return_url,
'cancel' => $cancel_url,
'currency_code' => variable_get('paypal_currency_code', 'USD')
Should there not be an entry like:
'tax' => $txn->tax,
Or should I go and ask this in support where people that have read the manual can answer me? Is this not a developer question? Will I get one straight answer here? Or just deferrals? Why do most of my questions get pushed by some quick answer?
Thanks
On February 27, 2006 11:44 am, John Handelaar wrote:
This is not a support forum.
Please use the drupal.org forums.
jh
stuff@trackingsolutions.ca wrote:
Is this not a developer question?
No, it's a support issue, or it's a bug report which you should file in the bug system at drupal.org under the module concerned.
Will I get one straight answer here? Or just deferrals? Why do most of my questions get pushed by some quick answer?
Because you refuse to do as you're advised. jh
Thanks for all your help John. On February 27, 2006 12:00 pm, John Handelaar wrote:
No, it's a support issue, or it's a bug report which you should file in the bug system at drupal.org under the module concerned.
Because you refuse to do as you're advised.
jh
Hi, On Mon, 2006-02-27 at 11:51 -0700, stuff@trackingsolutions.ca wrote:
It is development if I see code like:
'item_name' => $item_name,
'item_number' => $txnid,
'amount' => $txn->subtotal,
'shipping' => $txn->shipping_cost,
'no_shipping' => 1,
'return' => $return_url,
'cancel' => $cancel_url,
'currency_code' => variable_get('paypal_currency_code', 'USD')
Should there not be an entry like:
'tax' =>$txn->tax,
This does not exist in 4.6, and is not going to exist in 4.7. The problem is that tax just changes the gross, so it really can't be tracked. What is going to happen in 4.7 is that there will be an area for miscellaneous transactions, so that any kind of additional transaction can be tracked. This will help with the problem of passing tax to PayPal.
Or should I go and ask this in support where people that have read the manual can answer me?
Is this not a developer question?
This is not the area to be asking specific E-Commerce questions.
Will I get one straight answer here? Or just deferrals? Why do most of my questions get pushed by some quick answer?
The best place is most likely in the E-Commerce project. There are a lot of people who are following development and helping other people. Gordon.
Thanks Gordon, This helps a lot. Thanks On February 27, 2006 02:34 pm, Gordon Heydon wrote:
Hi,
This does not exist in 4.6, and is not going to exist in 4.7. The problem is that tax just changes gross, so it really can't be tracked.
What is going to happen in 4.7 is that there is going to be an area for miscellaneous transactions, so that any kind of additional transaction can be tracked.
This will help with the problem of passing tax to paypal.
This is not the area to be asking specific E-Commerce questions.
The best place is most likely in the E-Commerce project. There are a lot of people who are following development and helping other people.
Gordon.