[development] Cpu usage: something is very wrong (caching).

Augustin (Beginner) drupal.beginner at wechange.org
Tue Aug 1 14:31:14 UTC 2006


Hello,

See below for some data from my live site.

I just reviewed the Caching, Caching, Caching thread, but I didn't see that it 
addressed an issue that I recently learned about and that I find somewhat 
shocking: the whole {cache} is being emptied whenever a node is created or 
edited, meaning that even a medium-traffic website with many nodes but only 
one or two added every day will see a very high cpu usage, because the 100s 
of nodes will always be purged from {cache} before Drupal has a chance to 
serve the cached page to a second visitor.

This approach defeats the whole caching purpose.
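
A minimal sketch of the effect described above, written in Python rather 
than Drupal's PHP. It is a toy model, not Drupal's cache code: the function 
name simulate, its parameters, and the numbers (150 pages, 40 views between 
edits) are assumptions chosen only for illustration. It compares emptying the 
whole cache on every edit with dropping only the edited page.

```python
import random


def simulate(num_pages, views_per_edit, edits, flush_all, seed=0):
    """Return the fraction of page views served from the cache (toy model)."""
    rng = random.Random(seed)
    cache = set()          # pages whose rendered output is currently cached
    hits = 0
    views = 0
    for _ in range(edits):
        for _ in range(views_per_edit):
            page = rng.randrange(num_pages)
            views += 1
            if page in cache:
                hits += 1          # served from cache, no rendering needed
            else:
                cache.add(page)    # rendered once, then cached
        edited = rng.randrange(num_pages)
        if flush_all:
            cache.clear()          # whole cache emptied on any edit
        else:
            cache.discard(edited)  # hypothetical: only the edited page dropped
    return hits / views


# Same random traffic in both runs; only the invalidation policy differs.
flush = simulate(150, 40, 50, flush_all=True)
selective = simulate(150, 40, 50, flush_all=False)
print("hit ratio, flush everything:  %.2f" % flush)
print("hit ratio, drop edited page:  %.2f" % selective)
```

With few views between edits and many pages, the flush-everything policy 
rarely gets to serve a page twice, which is the situation described above.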

I may have missed other threads addressing the issues I mention below. Please 
accept my apologies if I did.



Below is the kind of CPU usage I get for my sites.
I must give you some background information first, so that you can understand 
the figures.
I am hosted at a nice little hosting-cooperative (i.e. I am host, and hosted, 
I "own" part of the hosting coop -- in French: http://ouvaton.coop/).

I think there are about 5~6000 sites hosted, and I have two sites there. 
My sites are very low traffic (only ~20 page views a day).

Almost every CMS is represented among those sites (phpBB, MediaWiki, phpNuke, 
you name it...). 
A large proportion of sites use Spip, though, which has an excellent on-file 
caching method (bypassing php and SQL).
Drupal is under-represented (I wonder if I am the only one using Drupal 
there).

Now, I have an interesting bit of data in my panel, which is the CPU usage, 
the Bandwidth, and the number of hits IN PERCENTAGE of the total of all the 
sites co-hosted there.

What I found out is that since the beginning, my CPU usage is way above the 
average compared to the other sites.

Here is what I have today for one of my two sites (I have very similar figures 
for the other site):
(use fixed fonts)

---------------------------------------------
          |  For the week  | For the day    |
          |  rank  -  %    |  rank  -  %    |
--------------------------------------------|
cpu       | 127th - 0.153% |  38th - 0.425% |
hit       | 538th - 0.032% | 577th - 0.030% |
Bandwidth | 449th - 0.041% | 496th - 0.037% |
---------------------------------------------

I don't have any galleries or many graphics, which could explain the low 
Bandwidth ranking, but it is on par with the Hit ranking. 
The CPU ranking, however, is consistently one order of magnitude above the 
average of the other 5-6000 sites.

What's worse, the CPU figure is much higher for the last day than for the 
week. New content is not added every day, and I observe such a spike each 
time content is added. 

Out of the 576 sites that have a HIGHER hit figure than my site, 539 sites 
needed LESS CPU power than I did!
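
The 539 figure can be rechecked from the "For the day" column of the table 
above. This is only a sketch of the arithmetic, assuming that ranks have no 
ties and that both rankings cover the same set of sites; the variable names 
are made up for illustration.

```python
# Day ranks and shares copied from the table above.
hit_rank_day = 577
cpu_rank_day = 38
cpu_share_day, hit_share_day = 0.425, 0.030
cpu_share_week, hit_share_week = 0.153, 0.032

sites_with_more_hits = hit_rank_day - 1   # 576 sites rank above mine on hits
sites_with_more_cpu = cpu_rank_day - 1    # 37 sites rank above mine on CPU

# Even if every one of the 37 heavier CPU users also has more hits,
# the remaining sites with more hits all used less CPU than my site.
min_busier_sites_using_less_cpu = sites_with_more_hits - sites_with_more_cpu

# CPU share relative to hit share, as a rough "CPU cost per hit" indicator.
ratio_day = cpu_share_day / hit_share_day
ratio_week = cpu_share_week / hit_share_week

print(min_busier_sites_using_less_cpu)
print(round(ratio_day, 1), round(ratio_week, 1))
```

Under those assumptions, 539 is a lower bound rather than an exact count.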

Now I understand why the cluster of high-end drupal.org servers is having 
trouble keeping up when a new node is created every couple of minutes!

I have observed that my own stats have been getting worse and worse as the 
number of nodes increases. At the beginning, with a dozen nodes or so, the 
CPU usage never got very high, because the ratio of pages served from the 
cache was very high. Now, with a meager ~150 nodes, I find out that my CPU 
usage never goes below a certain level because all nodes have to be 
recomputed every couple of days, whether they have changed or not. 


While I am looking forward to having more visitors, and especially more 
contributors, I shudder at the thought of the potential cost in CPU power.


The formidable strength of Drupal is its flexibility. Everything (everything!) 
can be customized, down to form elements, and link attributes! There is no 
HTML in core and everything is abstracted. 

Of course, it comes at a price, which is the price of CPU computing power. 
Instead of printing a link (<a href="">click</a>) directly, Drupal has to 
parse large arrays, check the hooks, etc... before printing this simple piece 
of html. Same for forms and everything else...

I would have thought that this was acceptable if the computing was done once 
and the cache served many times, if not from file, at least from the DB.


The algorithm can be improved in places, yes.
SQL can be streamlined, of course.
But all of this still requires CPU crunching.

So, I would like to officially join the ranks behind Gerhard for an improved 
caching system. 
I also think we should seriously consider a file-caching mechanism (there was 
a patch proposed and used on a site, using mod_rewrite magic). We could also 
think of per-block file/DB caching for those blocks that are often updated 
(e.g. active forum topics), while blocks that rarely change (e.g. primary 
links, syndicate block, etc.) can be hard-coded within the cached page.
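
One way to picture the per-block idea is the rough Python sketch below. It is 
not the patch mentioned above and not Drupal code: the class PageFileCache, 
the placeholder syntax <!--block:NAME-->, and the helper build_page are all 
hypothetical. Unlike a mod_rewrite setup, it still runs application code on 
every request; it only shows stable blocks baked into a cached file while an 
often-updated block is filled in at serve time.

```python
import tempfile
from pathlib import Path


class PageFileCache:
    """Toy on-file page cache with placeholders for volatile blocks."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.renders = 0  # number of expensive full-page builds

    def _file_for(self, url_path):
        # Map a URL path such as "/node/1" to a flat file name.
        safe = url_path.strip("/").replace("/", "_") or "index"
        return self.directory / (safe + ".html")

    def serve(self, url_path, build_page, volatile_blocks):
        cached = self._file_for(url_path)
        if not cached.exists():
            # Expensive step: done once, stable blocks are baked in.
            self.renders += 1
            cached.write_text(build_page(url_path), encoding="utf-8")
        html = cached.read_text(encoding="utf-8")
        # Cheap step: fill in only the blocks that change often.
        for name, render_block in volatile_blocks.items():
            html = html.replace("<!--block:%s-->" % name, render_block())
        return html


def build_page(url_path):
    # Hypothetical page: primary links are hard-coded, topics are not.
    return ("<ul>primary links</ul><h1>%s</h1><!--block:active_topics-->"
            % url_path)


with tempfile.TemporaryDirectory() as tmp:
    cache = PageFileCache(tmp)
    topics = ["first"]
    blocks = {"active_topics": lambda: ", ".join(topics)}
    first = cache.serve("/node/1", build_page, blocks)
    topics.append("second")  # the volatile block changes
    second = cache.serve("/node/1", build_page, blocks)
    print(cache.renders)
    print(second)
```

The page body is built once, yet the second response still shows the updated 
topics block.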


The bottom line is that the cache should be more durable, and the ratio of 
pages served from cache to pages computed on the fly should be much higher 
than it is now.

What can I do?
I am not as experienced a coder as you are, so I don't really know where to 
start.
But if someone makes a start, I can follow the issue and test the patches.
Once a fairly complete and stable patch comes to light, I can test it on my 
live site, and see what difference it makes when comparing my own little 
Drupal site against the thousands of non-Drupal sites hosted at the same 
place.


Also, why not introduce some amount of caching (at least for some blocks, or 
the body of old nodes) for registered users?

yours,

Augustin.

P.S.
If you think your site's users would be interested, I wouldn't mind a 
charitable plug for my site wechange.org :)



-- 
http://www.wechange.org/
Because we and the world need to change.
 
http://www.reuniting.info/
Intimate Relationships, peace and harmony in the couple.

