[development] No suitable nodes available at RackspaceCloud (Mosso)

Tomáš Fülöpp (vacilando.org) tomi at vacilando.org
Thu Feb 18 09:42:41 UTC 2010


Hi,

At RackspaceCloud (formerly Mosso) I've been plagued by a very unfortunate
problem that is crippling both my work and the work of my clients -- namely
the infamous error message "Unfortunately there were no suitable nodes
available to serve this request." Those of you at RS Cloud must have bumped
into it. It is cryptic and happens unpredictably. The cloud is very stable
and scalable, but with any slightly heavier Drupal installation people
start getting these errors.

*Basically, it is a generic error thrown by load-balanced systems that
occurs as a result of a script exceeding a maximum timeout value (not the
PHP timeout value!). If a client connection does not receive a response from
the server after approximately 30 to 60 seconds the load balancer will close
the connection and the client will immediately receive the error message. In
most cases, the script will continue to execute until it reaches completion,
throws an error, or times out on the server, but the client will not see the
page load as expected and will instead receive this error.*
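
To make the quoted behaviour concrete, here is a minimal sketch in Python (not Drupal's PHP, and not Rackspace's actual implementation). It assumes the balancer applies an idle timeout, i.e. it gives up once no bytes have arrived from the backend for `idle_timeout` seconds. The function name `first_idle_cutoff` and its parameters are hypothetical, introduced only for illustration.

```python
from typing import Optional, Sequence


def first_idle_cutoff(send_times: Sequence[float],
                      idle_timeout: float = 30.0) -> Optional[float]:
    """Return when an idle-timeout balancer would drop the connection.

    send_times: seconds after the request at which the backend sends any
    bytes; the last entry is assumed to be the end of the response.
    Returns None if no silent gap ever exceeds idle_timeout.
    """
    last_activity = 0.0  # the request reaches the balancer at t = 0
    for t in sorted(send_times):
        if t - last_activity > idle_timeout:
            # Silence lasted too long: the client gets the error here,
            # even though the script may keep running on the server.
            return last_activity + idle_timeout
        last_activity = t
    if not send_times:
        # The backend never sent anything at all.
        return idle_timeout
    return None


# A 90-second page that stays silent until it is done is cut off at 30 s.
print(first_idle_cutoff([90.0]))                    # 30.0
# The same page emitting something every 20-25 s would survive.
print(first_idle_cutoff([20.0, 45.0, 70.0, 90.0]))  # None
```

Under that assumption, what matters is the longest silent gap rather than the total running time of the script.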

I've used Boost for anonymous pages, Parallel, Memcache, etc., all of which
helped, and anonymous users *usually* don't get this error. The problem is
with admin pages or any other slightly heavier work by logged-in users, even
on basic Drupal websites with not too many modules! Pages like the list of
modules or the status page, i.e. those making heavy database or file
requests or API calls in PHP, are very likely to time out.

Over the past year I've had a number of discussions with techs and admins at
that cloud, but the situation is unresolved. They recognize the problem but
maintain that it is due to the special/unusual setup they use for their
cloud. It is not a problem for some other CMSs and frameworks; e.g. a very
heavy MediaWiki installation runs just fine. Drupal seems to be less
compatible with their system, somehow, somewhere.

*Now, why do I mention all this on the development list? I've been intrigued
by one little ray of hope in their words: "if a client connection does not
receive a response from the server after approximately 30 to 60 seconds the
load balancer will close the connection and the client will immediately
receive the error message". Their techs said that if I were able to emit any
kind of intermediary response to the client while the page is being rendered,
then this would be solved.*
Indeed, that is a bit like how the Batch API works in Drupal (with it I often
run night-long scripts without problems). I wonder, maybe this is a more
generic problem for any system that employs load balancers?
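
The "intermediary response" idea can be sketched independently of Drupal. The following Python generator is only an illustration of the general technique, not a tested fix for RackspaceCloud: it runs the slow work in a background thread and yields a harmless byte at a fixed interval until the real output is ready. The name `stream_with_heartbeat`, the single-space heartbeat and the 10-second interval are all assumptions.

```python
import queue
import threading
from typing import Callable, Iterator


def stream_with_heartbeat(work: Callable[[], bytes],
                          interval: float = 10.0,
                          heartbeat: bytes = b" ") -> Iterator[bytes]:
    """Yield `heartbeat` every `interval` seconds until work() returns,
    then yield its result as the final chunk."""
    done = queue.Queue(maxsize=1)

    def runner() -> None:
        # Run the slow page build off the streaming thread and hand back
        # either its result or the exception it raised.
        try:
            done.put((True, work()))
        except Exception as exc:
            done.put((False, exc))

    threading.Thread(target=runner, daemon=True).start()

    while True:
        try:
            ok, value = done.get(timeout=interval)
        except queue.Empty:
            # Still working: emit a byte so the connection is not silent.
            yield heartbeat
            continue
        if not ok:
            raise value
        yield value
        return
```

A streaming-capable server could send each yielded chunk to the client as it is produced. Two caveats apply to any such scheme: the bytes must actually leave the server (output buffering or compression in between may hold them back), and the HTTP status and headers are already sent with the first heartbeat, so a later failure can no longer change them. In PHP the analogous step would presumably be echoing a byte and calling `flush()` at intervals during page rendering, which is untested here.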

*So my question to you, colleagues, is -- do you see any place in Drupal's
processing chain that could be used, and approximately how, to make sure
that the load balancer keeps the connection open?* If you have any ideas,
wild or proven, I will be happy to test and develop them further and bring
them back to the community, of course. If this succeeds, I think many of us
will be relieved (and able to focus on development again!).

Thank you for any ideas - on and off this list.

Best regards,

Tomáš / Vacilando