[drupal-devel] [feature] Add timing information to statistics page
    kbahey 
    drupal-devel at drupal.org
       
    Wed May 11 22:39:41 UTC 2005
    
    
  
Issue status update for http://drupal.org/node/22623
 Project:      Drupal
 Version:      cvs
 Component:    statistics.module
 Category:     feature requests
 Priority:     normal
 Assigned to:  Anonymous
 Reported by:  Dries
 Updated by:   kbahey
 Status:       patch
Useful info for sure.
I am with Jeremy that this needs to be parameterized.
Also, are these figures real? Does the average time to generate the
forum page is 30 seconds? I find this hard to swallow.
kbahey
Previous comments:
------------------------------------------------------------------------
May 11, 2005 - 15:05 : Dries
Attachment: http://drupal.org/files/issues/timer.patch (8.56 KB)
I extended the current timer_start() function in bootstrap.inc and added
two new function, timer_read() and timer_stop().  I've used these to
profile drupal.org.  (A small change will be needed to devel.module.)
I've also extended the accesslog-table to keep track of page generation
times.  The information is shown on the "top pages" page
(admin/logs/pages).  It shows each page's average page generation time,
and the total page generation time.  This lets one identify (i) slow
pages and (ii) resource heavy pages.  This is the most essential and
most useful information when tuning your Drupal site.
Please review, or I'll commit it as is.
------------------------------------------------------------------------
May 11, 2005 - 15:13 : Dries
Attachment: http://drupal.org/files/issues/timer.pdf (22 KB)
Screenshot.
------------------------------------------------------------------------
May 11, 2005 - 16:04 : Dries
Setting to patch.  Take a look at the patch + screenshot please.
------------------------------------------------------------------------
May 11, 2005 - 16:19 : moshe weitzman
looks ok ... note that there are some unrelated (i think) variable_set()
changes in the patch.
------------------------------------------------------------------------
May 11, 2005 - 16:20 : Amazon
It's great to have this level of logging granularity.
I am concerned that all the logging would add over head when having
multiple crawlers crawling the site during testing.  Would it be
possible to support batching the log entries and submitting them to the
database as batch query at set increments instead of for every page.
For example I routinely run several wget crawls for different roles
when testing an upgrade.  If I ran 5 crawlers and each of 10, 000+
pages were being crawled then I am concerned the performance would be
impacted.
I understand this isn't the intent of this patch but non intrusive
logging is important in generating accurate performance measurements.
Thank-you for your consideration.
Kieran
------------------------------------------------------------------------
May 11, 2005 - 16:20 : moshe weitzman
in the pdf, i think you are swapping 'total' and 'average' column
headers
------------------------------------------------------------------------
May 11, 2005 - 16:34 : Bèr Kessels
I like the idea. a lot. But from the pdf it is not clear that timer
deals with the generation tilme o a page.
* Are we not boing too geeky here (again)? devel.module does farily
granual timing already. Is Joe User really waiting for timers on how
long it took to make a page. I think not.
* If we really want this in core, it should at least be made very clear
that this is the time it took to make a page. I presented the screenshot
to a client, as a small test. And just like i thought, his reaction was
" Wow, this is handy, I can now see how long a user spent at a certain
page. This will tell me what posts are actually read, and which are
only opened in a browser only to be closed right away". I almost made
the same mistake, but the geek in me told me to look at the code and
see that it is page generation times. Joe user will not ever look at
the code.
* What is the penalty. accesslog already is a heavy load table. Do we
really want aditional data in that table.
* We have access hooks (called after page is loaded) to do aditional
statistics magic. IT is not userd by anything else then the stats in
core. Why cant this go into an advanced stats module?
------------------------------------------------------------------------
May 11, 2005 - 17:03 : Dries
Moshe: the column names are not swapped.  What makes you think they are?
 What can be done to make it more clear?
Moshe: the variable_get() was removed intentionally.  The 'page' timer
is always started.  It's cheap.
Ber: I'm aware of the possible confusion and I agree that we need to
clarify this. Unfortunately, I couldn't think of better column names. 
The problem is that these names should be short (or the table will look
ugly).  Suggestions?
Ber: the devel.module does not provide this functionality.  It doesn't
track page generation times for other users, it doesn't compute
averages, nor does it compute the total time spent generating a
particular page.  The devel.module doesn't have an overview.  This is
important information for anyone who has to tune his or her Drupal
site.  Without this patch, you simply don't know which pages are eating
your resources.
Ber, Kieran: the performance impact is small.  The timer values are
stored as integers and fit in 2 bytes.  So if there are 200.000 entries
in your accesslog-table like on drupal.org, the database overhead is
400kb (a fraction of the table's total size).  No additional SQL
queries are introduced.  As such, performance remains comparable with
the current implementation.  I can benchmark this if you want but I'm
confident it is OK.
I could easily compile a list of the users/IPs that eat most resources.
 This can help us track evil crawlers.  I'm not sure yet how or where to
integrate this UI-wise so I haven't (yet).  One step at a time.
Making it easy to inspect the performance characteristics of your
Drupal site is a good thing.  It creates awareness which in turn will
encourage people to tune and tweak Drupal's performance.  It also lets
you identify problems -- like a module performing badly.  Compare it
with your operating system's "activity/process monitor" (or "top"). 
You don't really need it, but you can't without it either.
------------------------------------------------------------------------
May 11, 2005 - 17:35 : Jeremy at kerneltrap.org
I love the idea very much, and think this will prove extremely useful. 
What about making an option that enables/disables these columns?  The
only reason I suggest this is because they are inherently confusing
without using a lot of verbage.  "Average time" needs to be something
like "Average time for Drupal engine to generate this page", and "Total
time" needs to be something like "Total time to generate".  Or, it could
be fully explained at the configurable that turns the display on and
off...?
""I could easily compile a list of the users/IPs that eat most
resources."
"
Yes, do it!  :)  That would be very useful.
""Would it be possible to support batching the log entries and
submitting them to the database as batch query at set increments
instead of for every page."
"
Where would the data be stored until it was committed to the table?
It would be possible to do something similar to what the banner module
does, writing to multiple cache files.  But this isn't friendly for
distributed web servers.
Do writes to the accesslog slow down measurably when the acceslog table
grows?  If so, then there could be an  accesslog_cache table to which
data is first written, and which is flushed to the main accesslog every
n minutes, or every time it reaches a certain size, etc.  This would
only be useful if writing to a very large table is measurably slower
than writing to a very small table.
    
    
More information about the drupal-devel
mailing list