[development] Cpu usage: something is very wrong (caching).
Augustin (Beginner)
drupal.beginner at wechange.org
Tue Aug 1 14:31:14 UTC 2006
Hello,
See below for some data from my live site.
I just reviewed the Caching, Caching, Caching thread, but I didn't see that it
addressed an issue I recently learned about and find somewhat shocking: the
whole {cache} table is emptied whenever a node is created or edited. This
means that even a medium-traffic website with many nodes, but only one or two
added every day, will see very high CPU usage, because the hundreds of cached
node pages are purged from {cache} before Drupal has a chance to serve any of
them to a second visitor.
This approach defeats the whole purpose of caching.
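To make the point concrete, here is a toy sketch (in Python, purely
illustrative -- this is not Drupal's code, and `PageCache` is an invented
name) contrasting the current behaviour, flushing the whole cache on every
node save, with the selective alternative of invalidating only the entry
that actually changed:

```python
class PageCache:
    """Toy stand-in for the {cache} table: cache id -> rendered page."""

    def __init__(self):
        self.store = {}

    def get(self, cid):
        return self.store.get(cid)

    def set(self, cid, page):
        self.store[cid] = page

    def flush_all(self):
        """What a node save does today: empty the whole cache."""
        self.store.clear()

    def invalidate(self, cid):
        """Selective alternative: drop only the affected page."""
        self.store.pop(cid, None)


cache = PageCache()
for nid in range(150):                  # warm the cache with 150 node pages
    cache.set(f"node/{nid}", f"rendered page {nid}")

cache.flush_all()                       # one node edit today...
print(len(cache.store))                 # -> 0: all 150 pages must be rebuilt

for nid in range(150):                  # warm it again
    cache.set(f"node/{nid}", f"rendered page {nid}")
cache.invalidate("node/0")              # selective: only one page rebuilt
print(len(cache.store))                 # -> 149
```

The difference between rebuilding 150 pages and rebuilding 1 after each edit
is exactly the CPU spike I describe below.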
I may have missed other threads addressing the issues I mention below. Please
accept my apologies if I did.
Below is the kind of cpu usage I get for my sites.
I must give you some background information first, so that you can understand
the figures.
I am hosted at a nice little hosting cooperative (i.e. I am both host and
hosted: I "own" part of the hosting coop -- en français: http://ouvaton.coop/).
I think there are about 5-6000 sites hosted there, and I have two sites among
them.
My sites are very low traffic (~20 page views a day, only).
Almost all CMSes are represented among those thousands of sites (phpBB,
MediaWiki, phpNuke, you name it...).
A large proportion of the sites use Spip, though, which has an excellent
file-based caching method (bypassing PHP and SQL).
Drupal is under-represented (I wonder if I am the only one using Drupal
there).
Now, I have an interesting bit of data in my panel: the CPU usage, the
bandwidth, and the number of hits, each as a PERCENTAGE of the total across
all the sites co-hosted there.
What I found out is that since the beginning, my CPU usage is way above the
average compared to the other sites.
Here is what I have today for one of my two sites (I have very similar figures
for the other site):
(use fixed fonts)
-------------------------------------------------
          |  For the week   |  For the day    |
          |   rank - %      |   rank - %      |
-------------------------------------------------
CPU       | 127th - 0.153%  |  38th - 0.425%  |
Hits      | 538th - 0.032%  | 577th - 0.030%  |
Bandwidth | 449th - 0.041%  | 496th - 0.037%  |
-------------------------------------------------
I don't have a gallery or many graphics, which could have explained a low
bandwidth ranking; in any case, the bandwidth ranking is on par with the hit
ranking.
The CPU ranking, however, is consistently one order of magnitude above the
average of the other 5-6000 sites.
What's worse, the CPU rank is much worse (38th!) for the last day. New
content is not added every day, and I observe such a spike each time content
is added.
Out of the 576 sites that got MORE hits than my site, 539 sites needed LESS
CPU power than I did!
Now I understand why the cluster of high-end drupal.org servers has trouble
keeping up when a new node is created every couple of minutes!
I have observed that my own stats have been getting worse and worse as the
number of nodes increases. At the beginning, with a dozen nodes or so, the
CPU usage never got very high, because the ratio of pages served from the
cache was very high. Now, with a meager ~150 nodes, I find that my CPU
usage never goes below a certain level, because all nodes have to be
recomputed every couple of days, whether they have changed or not.
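A back-of-the-envelope simulation shows why the hit ratio collapses under
these conditions. The figures are my own numbers taken as assumptions (~150
nodes, ~20 views a day, the whole cache flushed a couple of times a day);
this is a sketch of the flushing pattern, not a measurement of Drupal itself:

```python
import random

random.seed(1)            # deterministic run
nodes = 150               # assumed: node pages on the site
views_per_day = 20        # assumed: daily page views
flushes_per_day = 2       # assumed: full cache flushes per day (node saves)

cache = set()             # set of node ids currently cached
hits = misses = 0
for day in range(30):
    for v in range(views_per_day):
        # a node save empties the whole cache, at evenly spaced points
        if v % (views_per_day // flushes_per_day) == 0:
            cache.clear()
        nid = random.randrange(nodes)   # visitor picks a random node page
        if nid in cache:
            hits += 1
        else:
            misses += 1
            cache.add(nid)

print(f"hit ratio over a month: {hits / (hits + misses):.0%}")
```

With only ~10 views between flushes spread over 150 pages, almost no page is
requested twice before the cache is emptied again, so nearly every view is
computed from scratch.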
While I am looking forward to having more visitors, and especially more
contributors, I shudder at the thought of the potential cost in CPU power.
The formidable strength of Drupal is its flexibility. Everything (everything!)
can be customized, down to form elements, and link attributes! There is no
HTML in core and everything is abstracted.
Of course, it comes at a price, which is the price of CPU computing power.
Instead of printing a link (<a href="">click</a>) directly, Drupal has to
parse large arrays, check the hooks, etc., before printing this simple piece
of HTML. The same goes for forms and everything else...
I would have thought this acceptable if the computing were done once and the
cached result served many times -- if not from a file, at least from the DB.
The algorithm can be improved in places, yes.
SQL can be streamlined, of course.
But all of this still requires CPU crunching.
So, I would like to officially join the ranks behind Gerhard for an improved
caching system.
I also think we should seriously consider a file-caching mechanism. (There
was a patch proposed, and used on a site, employing mod_rewrite magic.) We
could think of per-block file/DB caching for blocks that are updated often
-- e.g. active forum topics -- while blocks that rarely change -- e.g.
primary links, the syndicate block, etc. -- could be hard-coded within the
cached page.
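A loose sketch of that per-block idea (the class, block names, and TTLs are
all illustrative assumptions on my part, not an existing Drupal API):
volatile blocks get a short time-to-live, while stable blocks stay cached
until they are explicitly changed.

```python
import time


class BlockCache:
    """Toy per-block cache: block id -> (rendered html, expiry or None)."""

    def __init__(self):
        self.store = {}

    def set(self, bid, html, ttl=None, now=None):
        now = time.time() if now is None else now
        expiry = None if ttl is None else now + ttl
        self.store[bid] = (html, expiry)

    def get(self, bid, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(bid)
        if entry is None:
            return None
        html, expiry = entry
        if expiry is not None and now >= expiry:
            del self.store[bid]         # stale: force a rebuild
            return None
        return html


cache = BlockCache()
cache.set("primary_links", "<ul>...</ul>")               # stable: no expiry
cache.set("active_topics", "<ul>...</ul>", ttl=60, now=0)  # volatile: 60 s

print(cache.get("active_topics", now=30))   # still fresh
print(cache.get("active_topics", now=90))   # expired -> None, rebuild it
print(cache.get("primary_links", now=90))   # never expires until edited
```

The stable blocks could then be written straight into the cached page file,
with only the volatile ones fetched per request.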
The bottom line is that the cache should be more durable, and the ratio of
pages served from cache to pages computed on the fly should be much higher
than it is now.
What can I do?
I am not as experienced a coder as you are, so I don't really know where to
start.
But if someone makes a start, I can follow the issue and test the patches.
Once a fairly complete and stable patch comes to light, I can test it on my
live site, and see what difference it makes when comparing my own little
Drupal site against the thousands of non-Drupal sites hosted at the same
place.
Also, why not introduce some amount of caching (at least for some blocks, or
the bodies of old nodes) for registered users?
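One loose way this could work (the helper below is hypothetical, not an
existing API): key the cached copy by role set rather than by individual
user, so everyone sharing the same roles shares one cached copy of a block.

```python
def block_cache_key(block_id, roles):
    """Build a cache id shared by everyone with the same set of roles."""
    return f"{block_id}:{':'.join(sorted(roles))}"


key_a = block_cache_key("recent_comments", {"authenticated"})
key_b = block_cache_key("recent_comments", {"authenticated"})
key_c = block_cache_key("recent_comments", {"authenticated", "moderator"})

print(key_a == key_b)   # True: same role set shares one cache entry
print(key_a == key_c)   # False: moderators get their own variant
```

Blocks with genuinely per-user content (e.g. "who's online" greetings) would
of course have to opt out.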
yours,
Augustin.
P.S.
If you think your site's users would be interested, I wouldn't mind a
charitable plugin to my site wechange.org :)
--
http://www.wechange.org/
Because we and the world need to change.
http://www.reuniting.info/
Intimate Relationships, peace and harmony in the couple.