[drupal-devel] [feature] Filtering of statistics ("real" visitors only)

Jeremy Epstein jazepstein at gmail.com
Tue Aug 23 06:38:15 UTC 2005


A resounding big +1 from me.

One of the first things I noticed when I began using Drupal was that
it logged useless statistics (e.g. hits from crawlers such as
Googlebot). This was a huge problem for me, since my site displays a
hit counter for every node, and I wanted these counters to be
accurate.

That's why one of my first-ever Drupal hacks was to the statistics
module: to check AT THE POINT OF LOGGING whether the hit is from a
crawler, and if so, to not log the visit. I also did a hack that
checks if a particular visitor has already been logged for hitting
that page, thus making the hit counters show only UNIQUE, NON-BOT
VISITS. I've been using these hacks forever, and I would never
consider going back to a statistics system that lacks point-of-logging
filtering.
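The point-of-logging logic described above can be sketched roughly as follows. This is an illustrative Python sketch, not Drupal's actual PHP code; the bot pattern, function name, and in-memory "seen" set are all assumptions standing in for a real user-agent check and a persistent visitor log.

```python
import re

# Assumed pattern for common crawler user agents -- a real check would
# use a proper browser-capability database (e.g. browscap data).
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

# Hypothetical record of (visitor_ip, node_id) pairs already counted.
seen = set()

def should_count_hit(user_agent, visitor_ip, node_id):
    """Decide at the point of logging whether this hit increments the counter."""
    if user_agent is None or BOT_PATTERN.search(user_agent):
        return False  # crawler: never log the visit
    key = (visitor_ip, node_id)
    if key in seen:
        return False  # repeat visit: count unique visitors only
    seen.add(key)
    return True  # unique, non-bot visit: safe to increment
```

The key property is that the decision happens before the counter is touched, so a bot hit or a repeat visit never reaches the database at all.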

The big thing with Drupal's statistics system is that the
node_counter table just gets its values incremented whenever a new hit
is logged. This is a counter, not a list of hits (for performance
reasons, I assume). So once that value has been incremented, the
damage is done. It's just a raw aggregate number. There's no way to
'filter' a raw number at display time - the number has to be kept
accurate by only being incremented when it really should be, and never
for useless hits.
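A minimal sketch of why that is, using an in-memory SQLite table. The schema here is a simplification of Drupal's node_counter table (not its exact definition), but it shows the essential point: only a running total is stored, so once a bot hit is counted there is no per-hit record left to filter out.

```python
import sqlite3

# Simplified stand-in for Drupal's node_counter table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node_counter (nid INTEGER PRIMARY KEY, totalcount INTEGER)")
db.execute("INSERT INTO node_counter VALUES (1, 0)")

def log_hit(nid):
    # Every hit just bumps a raw aggregate -- no per-hit row is stored,
    # so there is nothing to 'filter' at display time.
    db.execute("UPDATE node_counter SET totalcount = totalcount + 1 WHERE nid = ?", (nid,))

log_hit(1)  # a human visitor
log_hit(1)  # a crawler -- indistinguishable once counted
count = db.execute("SELECT totalcount FROM node_counter WHERE nid = 1").fetchone()[0]
print(count)  # 2 -- the bot hit is baked into the total for good
```

This is why the filtering has to happen before log_hit() is ever called, as the patch under discussion does.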

I've been planning to write a 'botstopper' module for some time, which
would basically be a cleaned-up version of my dirty hacks. But mike's
already released some modules that go towards that end (you always
seem to be on the same wavelength as me, mike - you beat me to
pathauto, and now to this! :-) ). And I think having this in core
would be a great boon to all Drupal users.

I suspect that the number of Drupal users interested in crawler hits
is minimal. And I can't see why anyone would want their node counters
to reflect crawler hits.

Jeremy Epstein
www.greenash.net.au - a site where bots are never logged!

On 8/22/05, mikeryan <drupal-devel at drupal.org> wrote:
> Issue status update for
> http://drupal.org/node/29328
> Post a follow up:
> http://drupal.org/project/comments/add/29328
> 
>  Project:      Drupal
>  Version:      cvs
>  Component:    statistics.module
>  Category:     feature requests
>  Priority:     normal
>  Assigned to:  Anonymous
>  Reported by:  mikeryan
>  Updated by:   mikeryan
>  Status:       patch (code needs review)
>  Attachment:   http://drupal.org/files/issues/statistics.module_2.patch (6.73 KB)
> 
> Justification:
> 
> 
> For all but the most heavily-trafficked sites, the statistics reported
> by Drupal are severely skewed by visits from crawlers, and from the
> administrators themselves. Assuming that the purpose of the statistics
> is to inform administrators about visits from human beings other than
> themselves, it is highly desirable to do our best to ignore other
> visits. To that end, I developed the statistics_filter module [1] (and
> its spinoff, the browscap module [2]).
> 
> 
> Why core?
> 
> 
> There's enough concern over the logging the statistics module does in
> the exit hook that its performance impact is detailed in the help. To
> work as a contributed module, the statistics_filter module needs to undo
> what the statistics module did, essentially doubling the overhead for
> accesses that are meant to be ignored. If incorporated into the
> statistics module directly, the filtering functionality will actually
> reduce the database overhead (no database queries at all for ignored
> roles).
> 
> 
> Open issue
> 
> 
> Ignoring crawlers (which are the biggest part of the issue for most
> sites - my own site, with modest volume, gets 40% of its raw traffic
> from the Google crawler) requires the browscap database to identify
> crawlers. Currently I have maintenance of the browscap data (as well as
> provision for browser/crawler statistics) encapsulated in a separate
> module. Should this support be submitted to core as a separate module,
> or integrated into the statistics module?
> 
> 
> Attached is a patch to statistics.module implementing filtering by
> roles, with filtering out crawlers dependent on an external browscap
> module. I hope this patch can be accepted into Drupal 4.7 - if the
> feeling is that the browscap code should be incorporated into
> statistics.module, I can do that.
> 
> 
> Thanks.
> [1] http://drupal.org/node/18013
> [2] http://drupal.org/node/26569
> 
> 
> 
> 
> mikeryan
> 
>
