I've read the manual for this module at length, but in order to keep my site alive I need to understand better what the module concretely does, so here is a list of questions and a request for a few hints.
What I'd like to know is the concrete difference between the throttle levels. From what I read, it seems that the throttle disables the blocks and modules I checked when it reaches the maximum throttle level (5). So what is the difference between the lower levels? Does the module do anything else to optimize the site besides disabling those modules and blocks? And does it disable all the modules at once at level 5, or is there some kind of regulated progression?
My guess is that the throttle also enables and controls the use of cached pages even when the site is set not to use the cache. Is this true? And if I've set the site to always use the cache by default, is the throttle basically useless (except when it reaches level 5)?
Does it do anything else behind the scenes, like slowing down PHP and MySQL requests?
I ask because my hosting company disabled my account for more than a day: Drupal made the whole server crash due to a high load. I cannot tell whether it was an unrelated problem or the fact that my site was linked on an extremely popular board (which it was), bringing an insane number of guests all at once. So I don't know whether the crash was simply due to the load, or whether the load caused an overflow in PHP that made a process go haywire and use up all the resources until the crash.
I've attached to the mail a report from the server that the provider sent me. I'd be glad if someone could glance at it and tell me what the problem could have been: the load, or a bug.
For example:
  PID USER     PRI NI  SIZE   RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
28678 admin137  35 10 11788   11M  4532 R N  43.1  1.1 0:00   0 /usr/bin/php
28682 admin137  33 10  6812  6808  4432 R N  38.3  0.6 0:00   0 /usr/bin/php
MySQL is not mentioned, but I read that high load generally comes from it rather than from PHP itself. So do these PHP processes I've pasted here already include MySQL's CPU usage, or was the problem exclusively in PHP?
Can someone help me figure all this out? I definitely do not have bandwidth problems; it's all about resource usage.
-HRose / Abalieno
What I'd like to know is the concrete difference between the throttle levels. From what I read, it seems that the throttle disables the blocks and modules I checked when it reaches the maximum throttle level (5). So what is the difference between the lower levels? Does the module do anything else to optimize the site besides disabling those modules and blocks? And does it disable all the modules at once at level 5, or is there some kind of regulated progression?
In 4.6, the throttle has been simplified. It only has two values: on and off.
In 4.5 and earlier, the different throttle levels were not really utilized. Level 5 is when blocks and modules would be auto-disabled.
My guess is that the throttle also enables and controls the use of cached pages even when the site is set not to use the cache. Is this true? And if I've set the site to always use the cache by default, is the throttle basically useless (except when it reaches level 5)?
No, the throttle does not affect the cache in any way.
Does it do anything else behind the scenes, like slowing down PHP and MySQL requests?
No, it does not.
I ask because my hosting company disabled my account for more than a day: Drupal made the whole server crash due to a high load. I cannot tell whether it was an unrelated problem or the fact that my site was linked on an extremely popular board (which it was), bringing an insane number of guests all at once. So I don't know whether the crash was simply due to the load, or whether the load caused an overflow in PHP that made a process go haywire and use up all the resources until the crash.
A link from a popular site can certainly cause a CPU spike. Enabling the cache can help quite a bit, though every time a user leaves a comment the cache is flushed, so if you have active discussions the benefit of the cache can be minimal.
If a server crashed due to CPU load, it sounds like they've got some configuration issues that they need to address.
What modules do you have enabled? Different modules require different resources. If you're using contributed modules, this is an unknown: a module may or may not be resource-intensive. Among the core modules, I believe the forum module is probably the most CPU-intensive (but I'm basing this on a discussion I seem to remember from a couple of years ago, so I could be totally wrong...)
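To make the cache-flush point concrete, here is a toy model of a whole-page cache that is emptied whenever a comment is posted. It is not Drupal's cache code; the `PageCache` class and `simulate()` function are hypothetical names, and the numbers only illustrate the trend.

```php
<?php
// Toy model of a page cache for anonymous visitors (hypothetical,
// not Drupal's implementation). Posting a comment flushes everything.
class PageCache
{
    /** @var array<string, string> rendered pages keyed by path */
    private array $pages = [];
    public int $misses = 0;

    // Return the cached page, or render it (the expensive part) and store it.
    public function get(string $path, callable $render): string
    {
        if (isset($this->pages[$path])) {
            return $this->pages[$path];
        }
        $this->misses++;
        return $this->pages[$path] = $render($path);
    }

    // A new comment invalidates every cached page.
    public function onCommentPosted(): void
    {
        $this->pages = [];
    }
}

// 100 anonymous views of one node, with a comment posted every
// $commentEvery views (0 = no comments). Returns how many full renders ran.
function simulate(int $commentEvery): int
{
    $cache = new PageCache();
    for ($i = 1; $i <= 100; $i++) {
        $cache->get('node/1', fn(string $p): string => "<html>$p</html>");
        if ($commentEvery > 0 && $i % $commentEvery === 0) {
            $cache->onCommentPosted();
        }
    }
    return $cache->misses;
}

echo simulate(0), "\n"; // 1: quiet node, rendered once
echo simulate(2), "\n"; // 50: busy discussion, half the views re-render
```

The more often comments arrive relative to page views, the closer the cache gets to rendering every request anyway.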
Any modules you don't actually use should be disabled. Any modules you don't absolutely need should be set to auto-throttle so they can be disabled when the site becomes busy.
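The difference between "disabled" and "auto-throttled" can be sketched as a filter over the module list. This is a hypothetical model for illustration (the `active_modules()` function and the flag layout are made up, not Drupal's API): disabled modules never run, and throttled modules run only while the throttle is off.

```php
<?php
// Hypothetical model: which modules run, given each module's
// "enabled" and "throttle" checkboxes and the current throttle state.
function active_modules(array $modules, bool $throttleOn): array
{
    $active = [];
    foreach ($modules as $name => $flags) {
        if (!$flags['enabled']) {
            continue; // unused modules: never loaded
        }
        if ($throttleOn && $flags['throttle']) {
            continue; // auto-throttled modules: skipped while the site is busy
        }
        $active[] = $name;
    }
    return $active;
}

$modules = [
    'node'   => ['enabled' => true,  'throttle' => false],
    'forum'  => ['enabled' => true,  'throttle' => true],
    'search' => ['enabled' => true,  'throttle' => true],
    'poll'   => ['enabled' => false, 'throttle' => false],
];

echo implode(', ', active_modules($modules, false)), "\n"; // node, forum, search
echo implode(', ', active_modules($modules, true)), "\n";  // node
```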
Here are some tips that might help you to tune a pre-4.6 throttle module: http://drupal.org/node/4342
-Jeremy
In 4.6, the throttle has been simplified. It only has two values: on and off.
Bah! I'm tweaking the module to take advantage of all the levels, and now you tell me that they are being removed in the next version.
Can I still use my modified module by swapping it in when the new release comes out, or are there other dependencies?
Right now I've set 'default_nodes_main' to adjust dynamically with the load, so that at level 0 the main page shows 25 nodes, decreasing by 5 for each level down to just 5 nodes at level 5. I believe this modification could dramatically improve performance, since it dynamically tunes how crowded the main page is.
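The front-page scheme above can be written as a single mapping from level to node count. This is a minimal sketch; `nodes_for_level()` is a hypothetical helper, and in Drupal its result would be stored in the 'default_nodes_main' variable.

```php
<?php
// 25 nodes at level 0, 5 fewer per level, never fewer than 5.
// Hypothetical helper; the result would go into 'default_nodes_main'.
function nodes_for_level(int $level): int
{
    $level = max(0, min(5, $level)); // clamp to the valid 0-5 range
    return max(5, 25 - 5 * $level);
}

foreach (range(0, 5) as $level) {
    echo "level $level: ", nodes_for_level($level), " nodes\n";
}
// levels 0..5 give 25, 20, 15, 10, 5, 5
```

Deriving the count from the level, instead of adding or subtracting 5 from the current value, keeps it consistent even if the variable is edited by hand or the level changes by more than one step.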
No, the throttle does not affect the cache in any way.
Does it do anything else behind the scenes, like slowing down PHP and MySQL requests?
No, it does not.
This is actually impossible, or else there's something seriously wrong with Drupal's calculation. For my experiments I set the detection accuracy to 100% (so a check on every hit) and the auto-throttle at 1. With these settings Drupal is supposed to go into emergency mode after five *refreshes* of the page.
Even with my crappy connection I'm able to refresh an example page every 4-5 seconds, and I kept doing that CONTINUOUSLY. My Drupal installation has the cache disabled.
Every 15-20 refreshes of the page, I kept seeing the throttle level *drop* from 5 to 4. How is this possible if I was refreshing continuously and Drupal processes all my requests?
Something happens: my refreshes are counted up to a point, then they are ignored.
Also: would it be useful to tell the throttle module to enable caching when the site reaches the fifth level? Wouldn't that be better than having the cache always disabled?
A link from a popular site can certainly cause a CPU spike. Enabling the cache can help quite a bit, though every time a user leaves a comment the cache is flushed, so if you have active discussions the benefit of the cache can be minimal.
No, in this specific case it was just people hitting a specific node all at once.
Any modules you don't actually use should be disabled. Any modules you don't absolutely need should be set to auto-throttle so they can be disabled when the site becomes busy.
But if I have the forum module enabled, and no one is browsing the forum and there are no blocks related to it in the sidebar, does it matter whether it's active or not?
-HRose / Abalieno
Can I still use my modified module by swapping it in when the new release comes out, or are there other dependencies?
Possibly. It depends on how you've implemented it, and how many core modules you're willing to patch. Any core modules that use the throttle module would have to be updated to recognize multiple throttle levels.
Right now I've set 'default_nodes_main' to adjust dynamically with the load, so that at level 0 the main page shows 25 nodes, decreasing by 5 for each level down to just 5 nodes at level 5. I believe this modification could dramatically improve performance, since it dynamically tunes how crowded the main page is.
Yes, things like this benefit from having multiple throttle levels. But it was decided that the configuration and concepts were too difficult, thus it was simplified: http://lists.drupal.org/archives/drupal-devel/2004-08/msg00486.html
Here's a followup thread showing what other modules are affected: http://lists.drupal.org/archives/drupal-devel/2004-11/msg00375.html
And here's a thread discussing another simplification of the module (I believe in 4.6): http://lists.drupal.org/archives/drupal-devel/2004-10/msg00414.html
This is actually impossible, or else there's something seriously wrong with Drupal's calculation. For my experiments I set the detection accuracy to 100% (so a check on every hit) and the auto-throttle at 1. With these settings Drupal is supposed to go into emergency mode after five *refreshes* of the page.
Even with my crappy connection I'm able to refresh an example page every 4-5 seconds, and I kept doing that CONTINUOUSLY. My Drupal installation has the cache disabled.
Every 15-20 refreshes of the page, I kept seeing the throttle level *drop* from 5 to 4. How is this possible if I was refreshing continuously and Drupal processes all my requests?
Something happens: my refreshes are counted up to a point, then they are ignored.
This is by design, well, the old design. Essentially, at throttle level 5 the throttle tuned itself to no longer perform database queries. See here: http://lists.drupal.org/archives/drupal-devel/2003-12/msg00218.html
In any case, this was determined to be too complex, and is no longer true in 4.6+.
Also: would it be useful to tell the throttle module to enable caching when the site reaches the fifth level? Wouldn't that be better than having the cache always disabled?
Such applications are certainly possible, and have been discussed. However, this is not currently done.
A link from a popular site can certainly cause a CPU spike. Enabling the cache can help quite a bit, though every time a user leaves a comment the cache is flushed, so if you have active discussions the benefit of the cache can be minimal.
No, in this specific case it was just people hitting a specific node all at once.
Once a page is cached, performance for anonymous guests should be significantly better. If you're making lots of modifications to the core code, this may no longer be true.
Any modules you don't actually use should be disabled. Any modules you don't absolutely need should be set to auto-throttle so they can be disabled when the site becomes busy.
But if I have the forum module enabled, and no one is browsing the forum and there are no blocks related to it in the sidebar, does it matter whether it's active or not?
It should not cause undue overhead in this case. I just used it as an example...
-Jeremy
Can I still use my modified module by swapping it in when the new release comes out, or are there other dependencies?
Possibly. It depends on how you've implemented it, and how many core modules you're willing to patch. Any core modules that use the throttle module would have to be updated to recognize multiple throttle levels.
Sigh... I can't. I cannot maintain an alternate version, since I know zero PHP.
But would it be possible to just modify throttle.module to add more dynamic settings to it? My idea isn't to change every module, but just to add functions on top of what Drupal will do in the next version.
Yes, things like this benefit from having multiple throttle levels. But it was decided that the configuration and concepts were too difficult, thus it was simplified: http://lists.drupal.org/archives/drupal-devel/2004-08/msg00486.html
I rather dislike this continuous loss of features in the name of "simplification". Designing a good system doesn't mean removing its potential, but organizing and presenting it better. But this is a personal opinion, and not an important one.
The original patch suggested by Njivy was *wonderful* because it finally allowed customizing all the levels, but then Dries asked to simply strip it down, removing outright *all* the strengths of the module. Great work.
You know what could work wonderfully? Keep each of the six levels (0-5) and let the user type in the number of pages for each one directly. This would allow using all the levels, or just the first and the last, depending on what the admin chooses.
Anyway, how does the new system work? I tried to follow your links, but I didn't grasp what the new switch is based on. The current throttle checks the hits from time to time, depending on how it's set. The link says that the new module (4.6.0) no longer uses the access log, so what does it use?
Side question: why is a spider bot counted as "many different users" and not as a single one requesting different pages? If I'm an anonymous user and I keep refreshing the page, I'm still counted as "1" in the "Who's online" block. So why does Dries want to base the throttle on the number of anonymous users and NOT on the number of pages served? I don't understand how it could work; that block doesn't seem consistent. If there's just one user savagely spamming my site, the method Dries proposes won't detect it. I also don't understand what he writes in that link; it actually sounds like the OPPOSITE should happen. If the throttle checks how many pages are served in a minute, it should easily detect the spam, while if it is based on the "Who's online" block it will simply ignore an emergency.
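The per-level proposal can be sketched as a table of admin-entered thresholds, where the level is the highest one whose threshold has been reached. The `level_for_hits()` function and the threshold arrays are hypothetical settings, not part of any Drupal release.

```php
<?php
// Hypothetical per-level thresholds (page views per minute) typed in
// by the admin; the level is the highest one whose threshold is reached.
function level_for_hits(int $hitsPerMinute, array $thresholds): int
{
    $level = 0;
    foreach ($thresholds as $lvl => $minHits) {
        if ($hitsPerMinute >= $minHits) {
            $level = $lvl; // keys are ascending, so the last match wins
        }
    }
    return $level;
}

// Using all the levels, with graduated thresholds.
$graded = [1 => 100, 2 => 200, 3 => 400, 4 => 800, 5 => 1600];

// Using just the first and the last: levels 1-4 are unreachable.
$onOff = [1 => PHP_INT_MAX, 2 => PHP_INT_MAX, 3 => PHP_INT_MAX,
          4 => PHP_INT_MAX, 5 => 1000];

echo level_for_hits(250, $graded), "\n";  // 2
echo level_for_hits(1200, $onOff), "\n";  // 5
```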
In the first case the site will resist, in the second case it will be swamped.
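The two load metrics being argued about can be compared on the same request log. In this sketch "distinct visitors" is approximated by distinct IP addresses, which is an assumption for illustration (4.6 actually counts rows in the sessions table); the function names are hypothetical.

```php
<?php
// Each log entry is [ip, unix_timestamp].
// Hits in the window: roughly what the pre-4.6 access-log check measured.
function hits_in_window(array $log, int $now, int $window = 60): int
{
    $n = 0;
    foreach ($log as $entry) {
        if ($entry[1] >= $now - $window) {
            $n++;
        }
    }
    return $n;
}

// Distinct visitors in the window: a rough stand-in for "Who's online".
function distinct_visitors(array $log, int $now, int $window = 60): int
{
    $ips = [];
    foreach ($log as [$ip, $time]) {
        if ($time >= $now - $window) {
            $ips[$ip] = true;
        }
    }
    return count($ips);
}

$now = 1000;
// One client refreshing every second for a minute.
$single = [];
for ($t = $now - 59; $t <= $now; $t++) {
    $single[] = ['10.0.0.1', $t];
}
// A crawler spreading 60 requests over 30 addresses.
$crawler = [];
for ($i = 0; $i < 60; $i++) {
    $crawler[] = ['192.0.2.' . ($i % 30), $now - $i];
}

echo hits_in_window($single, $now), ' / ', distinct_visitors($single, $now), "\n";   // 60 / 1
echo hits_in_window($crawler, $now), ' / ', distinct_visitors($crawler, $now), "\n"; // 60 / 30
```

Both logs produce the same hit count, but a single aggressive client barely moves the visitor count, while a crawler using many addresses shows up in both metrics.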
" The past months drupal.org often got hit by Evil Crawlers. Unlike Good Crawlers (i.e. Googlebot) they don't adapt their request rate to the sender, and request thousands of pages like there is no tomorrow. Clearly, they can put a lot of pressure on the server. "
That's why it *makes sense* to enable the throttle if the spider is putting stress on the site. Who cares if it's not a "normal" load causing the switch? I care about the server; I want my site to survive. Why should the throttle let the spider do its work unaffected if that damages the server?
Something happens: my refreshes are counted up to a point, then they are ignored.
This is by design, well, the old design. Essentially, at throttle level 5 the throttle tuned itself to no longer perform database queries. See here: http://lists.drupal.org/archives/drupal-devel/2003-12/msg00218.html
I've read that message, but then how does the throttle know when to switch back to level 4? If no more checks are performed, why doesn't the site stay at level 5 forever once it reaches it?
Wouldn't it be better to add a setting to the throttle module to flag the "emergency status" for a fixed period of time?
This is my proposal: let the admin set the number of hits before the throttle switches on, then add a field where the admin can choose how long emergency mode will remain active, regardless of usage. When this timeout is over (for example after ten minutes), a new check is performed to decide whether conditions allow going back to "normal" mode.
This would completely remove the overhead, because checks would be performed only between timeouts.
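The time-based proposal can be sketched as a small state machine. The `EmergencyThrottle` class and its settings are hypothetical; the hit-counting query is passed in as a callback so that it visibly runs only when a check is actually due.

```php
<?php
// Hypothetical sketch: once the hit threshold is crossed, emergency
// mode stays on for a fixed hold period regardless of traffic; the
// load is checked again only after the period expires.
class EmergencyThrottle
{
    private int $maxHitsPerMinute;
    private int $holdSeconds;
    private ?int $until = null; // end of the current emergency period

    public function __construct(int $maxHitsPerMinute, int $holdSeconds)
    {
        $this->maxHitsPerMinute = $maxHitsPerMinute;
        $this->holdSeconds = $holdSeconds;
    }

    public function isOn(int $now): bool
    {
        return $this->until !== null && $now < $this->until;
    }

    // $countHits stands in for the counting query; it is not called
    // at all while an emergency period is running.
    public function onRequest(int $now, callable $countHits): bool
    {
        if ($this->isOn($now)) {
            return true;
        }
        if ($countHits() > $this->maxHitsPerMinute) {
            $this->until = $now + $this->holdSeconds;
            return true;
        }
        $this->until = null;
        return false;
    }
}

$throttle = new EmergencyThrottle(100, 600); // 100 hits/min, hold 10 minutes
$checks = 0;
$busy = function () use (&$checks): int { $checks++; return 500; };
$quiet = function () use (&$checks): int { $checks++; return 10; };

var_dump($throttle->onRequest(0, $busy));    // bool(true): emergency starts
var_dump($throttle->onRequest(300, $quiet)); // bool(true): still held, no check
var_dump($throttle->onRequest(600, $quiet)); // bool(false): re-checked, back to normal
echo $checks, "\n";                          // 2
```

In this sketch the counting cost disappears only during an emergency period; in normal mode every request is still checked.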
Basing the throttle module on the "Who's online" block is NOT CONSISTENT.
In any case, this was determined to be too complex, and is no longer true in 4.6+.
Drupal 4.6.0 is due out next week. I guess the system should already be well tested and defined, shouldn't it?
Once a page is cached, performance for anonymous guests should be significantly better. If you're making lots of modifications to the core code, this may no longer be true.
Yes, but I don't want the site to always use the cache by default. This is why I asked whether it would be useful to switch it on and off dynamically based on the throttle status. That's what I coded myself into the current module; I hope it will work:
  // $throttle (current level), $throttle_new (target level), $hits and $type
  // are assumed to be set by the surrounding throttle code.
  if ($throttle_new < $throttle) {
    // Step down one level; derive settings from the new level, not the target.
    $level = $throttle - 1;
    variable_set('throttle_level', $level);
    // 25 nodes at level 0, 5 fewer per level, never fewer than 5.
    variable_set('default_nodes_main', max(5, 25 - 5 * $level));
    watchdog($type, t('Throttle: %hits hits in past minute; throttle decreased to level %level.', array('%hits' => "<em>$hits</em>", '%level' => "<em>$level</em>")));
    if ($level == 0) {
      variable_set('cache', 0); // back to normal: page cache off
    }
  }
  elseif ($throttle_new > $throttle) {
    // Step up one level.
    $level = $throttle + 1;
    variable_set('throttle_level', $level);
    variable_set('default_nodes_main', max(5, 25 - 5 * $level));
    watchdog($type, t('Throttle: %hits hits in past minute; throttle increased to level %level.', array('%hits' => "<em>$hits</em>", '%level' => "<em>$level</em>")));
    if ($level == 5) {
      variable_set('cache', 1); // emergency level: page cache on
    }
  }
-HRose / Abalieno
But would it be possible to just modify throttle.module to add more dynamic settings to it? My idea isn't to change every module, but just to add functions on top of what Drupal will do in the next version.
Go for it. If you feel you've made useful modifications, you can submit patches via Drupal's project tracker.
I rather dislike this continuous loss of features in the name of "simplification". Designing a good system doesn't mean removing its potential, but organizing and presenting it better. But this is a personal opinion, and not an important one.
The system has to be usable. This was a complaint of the previous throttle system, that it was too complicated for people to be able to understand/use it.
You know what could work wonderfully? Keep each of the six levels (0-5) and let the user type in the number of pages for each one directly. This would allow using all the levels, or just the first and the last, depending on what the admin chooses.
You are asking for added complexity, which makes it much more confusing to the end user. As the author of the throttle module, I have mixed feelings about this. My personal preference is for everything to be configurable. That is, until I have to support it. Then simplicity is preferable.
Anyway, how does the new system work? I tried to follow your links, but I didn't grasp what the new switch is based on. The current throttle checks the hits from time to time, depending on how it's set. The link says that the new module (4.6.0) no longer uses the access log, so what does it use?
Take a look at throttle_exit(). It counts the number of anonymous users, and the number of registered users that are currently online, using the sessions table. If the number of either is greater than the configured maximum, the throttle is enabled. If the number is lower than both, then it is disabled. This test is done for every page load.
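The 4.6 rule described here can be sketched as a pure function over session rows instead of real database queries. This is an illustration, not the actual throttle_exit() code: the function name is hypothetical, counting each registered account once is a choice made for the sketch, and treating a maximum of 0 as "don't test this count" is an assumption.

```php
<?php
// Each session row: ['uid' => int, 'timestamp' => int]; uid 0 = anonymous guest.
// Enable the throttle if either count exceeds its configured maximum.
function throttle_should_enable(array $sessions, int $now, int $window,
                                int $maxGuests, int $maxUsers): bool
{
    $guests = 0;
    $users = [];
    foreach ($sessions as $row) {
        if ($row['timestamp'] < $now - $window) {
            continue; // not "currently online"
        }
        if ($row['uid'] === 0) {
            $guests++;
        } else {
            $users[$row['uid']] = true; // count each account once
        }
    }
    $tooManyGuests = $maxGuests > 0 && $guests > $maxGuests;
    $tooManyUsers = $maxUsers > 0 && count($users) > $maxUsers;
    return $tooManyGuests || $tooManyUsers;
}

$sessions = [
    ['uid' => 0, 'timestamp' => 950],
    ['uid' => 0, 'timestamp' => 990],
    ['uid' => 0, 'timestamp' => 999],
    ['uid' => 0, 'timestamp' => 10],  // too old to count
    ['uid' => 5, 'timestamp' => 995],
    ['uid' => 5, 'timestamp' => 998], // same account, counted once
];

var_dump(throttle_should_enable($sessions, 1000, 900, 2, 0)); // bool(true): 3 guests > 2
var_dump(throttle_should_enable($sessions, 1000, 900, 3, 1)); // bool(false)
```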
Side question: why is a spider bot counted as "many different users" and not as a single one requesting different pages? If I'm an anonymous user and I keep refreshing the page, I'm still counted as "1" in the "Who's online" block. So why does Dries want to base the throttle on the number of anonymous users and NOT on the number of pages served? I don't understand how it could work; that block doesn't seem consistent. If there's just
Spiders generally use a large number of IP addresses, not just one IP address. The more aggressive the bot, generally the larger the number of IP addresses that are being used.
That's not to say a spammer/spider couldn't use just one IP address.
one user savagely spamming my site, the method Dries proposes won't detect it. I also don't understand what he writes in that link; it actually sounds like the OPPOSITE should happen. If the throttle checks how many pages are served in a minute, it should easily detect the spam, while if it is based on the "Who's online" block it will simply ignore an emergency.
A normal Drupal user is only aware of the number displayed in the Who's online block. That's their view of what's happening on the site. Whether or not this is sufficient for the throttle remains to be seen once 4.6 is released. In particular, Dries found that the old method was too difficult to configure and in practice didn't work for drupal.org. I have not heard any complaints about the new method, but we'll see how it goes once 4.6 is released. I suspect the change will be met favorably.
I've read that message, but then how does the throttle know when to switch back to level 4? If no more checks are performed, why doesn't the site stay at level 5 forever once it reaches it?
In pre-4.6 it was a cron event. In 4.6+, the throttle does one or two queries per page load regardless of whether the throttle is enabled. (Whether it is one or two depends on whether the throttle is testing the number of users, the number of guests, or both.)
This is my proposal: let the admin set the number of hits before the throttle switches on, then add a field where the admin can choose how long emergency mode will remain active, regardless of usage. When this timeout is over (for example after ten minutes), a new check is performed to decide whether conditions allow going back to "normal" mode.
That's how the cron implementation worked. Look at throttle_cron() in the pre-4.6 version of the throttle module.
Basing the throttle module on the "Who's online" block is NOT CONSISTENT.
Writing in capital letters is annoying.
Yes, but I don't want the site to always use the cache by default. This is why I asked whether it would be useful to switch it on and off dynamically based on the throttle status. That's what I coded myself into the current module; I hope it will work:
I can't imagine why you'd not want the cache enabled. However, it's simple enough to tie it to the throttle, as you've done.
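With the two-state throttle in 4.6, the same idea reduces to a single switch. In this sketch the `$settings` array stands in for Drupal's variable storage, `apply_throttle_settings()` is a hypothetical helper, and the 5/25 node counts are simply the values used earlier in this thread.

```php
<?php
// Hypothetical helper: derive cache and front-page settings from an
// on/off throttle. $settings stands in for Drupal's stored variables.
function apply_throttle_settings(array $settings, bool $throttleOn): array
{
    $settings['cache'] = $throttleOn ? 1 : 0;               // page cache only while busy
    $settings['default_nodes_main'] = $throttleOn ? 5 : 25; // lighter front page while busy
    return $settings;
}

$settings = ['cache' => 0, 'default_nodes_main' => 25, 'site_name' => 'example'];
$busy = apply_throttle_settings($settings, true);
echo $busy['cache'], ' ', $busy['default_nodes_main'], "\n"; // 1 5
```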
-Jeremy