A resounding big +1 from me. One of the first things I noticed when I began using Drupal was that it logged useless statistics (e.g. hits from crawlers such as Googlebot). This was a huge problem for me, since my site displays a hit counter for every node, and I wanted those counters to be accurate. That's why one of my first Drupal hacks ever was to the statistics module: it checks AT THE POINT OF LOGGING whether the hit comes from a crawler, and if so, doesn't log the visit. I also wrote a hack that checks whether a particular visitor has already been logged for that page, so the hit counters show only UNIQUE, NON-BOT VISITS. I've been using these hacks forever, and I would never consider going back to a statistics system that lacks point-of-logging filtering.

The big thing with Drupal's statistics system is that the node_counter table simply has its values incremented whenever a new hit is logged. It's a counter, not a list of hits (for performance reasons, I assume). So once that value has been incremented, the damage is done: it's just a raw aggregate number. There's no way to 'filter' a raw number at display time. The number has to be kept accurate by being incremented only when it really should be, and never for useless hits.

I've been planning to write a 'botstopper' module for some time, which would basically be a cleaned-up version of my dirty hacks. But mike's already released some modules that go towards that end (you always seem to be on the same wavelength as me, mike - you beat me to pathauto, and now to this! :-) ). And I think having this in core would be a great boon to all Drupal users. I suspect the number of Drupal users interested in crawler hits is minimal, and I can't see why anyone would want their node counters to reflect crawler hits.

Jeremy Epstein
www.greenash.net.au - a site where bots are never logged!

On 8/22/05, mikeryan <drupal-devel@drupal.org> wrote:
Issue status update for http://drupal.org/node/29328 Post a follow up: http://drupal.org/project/comments/add/29328
Project: Drupal
Version: cvs
Component: statistics.module
Category: feature requests
Priority: normal
Assigned to: Anonymous
Reported by: mikeryan
Updated by: mikeryan
Status: patch (code needs review)
Attachment: http://drupal.org/files/issues/statistics.module_2.patch (6.73 KB)
Justification:
For all but the most heavily-trafficked sites, the statistics reported by Drupal are severely skewed by visits from crawlers, and from the administrators themselves. Assuming that the purpose of the statistics is to inform administrators about visits from human beings other than themselves, it is highly desirable to do our best to ignore other visits. To that end, I developed the statistics_filter module [1] (and its spinoff, the browscap module [2]).
Why core?
The logging that the statistics module does in the exit hook is enough of a performance concern that the module's help text documents it. To work as a contributed module, statistics_filter has to undo what the statistics module has already done, essentially doubling the overhead for accesses that are meant to be ignored. If the filtering is incorporated directly into the statistics module, it will actually reduce database overhead (no database queries at all for ignored roles).
Open issue
Ignoring crawlers (the biggest part of the issue for most sites - my own modest-volume site gets 40% of its raw traffic from the Google crawler) requires the browscap database to identify them. Currently, maintenance of the browscap data (along with browser/crawler statistics) is encapsulated in a separate module. Should this support be submitted to core as a separate module, or integrated into the statistics module?
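For readers unfamiliar with browscap: browscap.ini is an INI file whose section names are user-agent patterns using `*` and `?` wildcards, with properties such as a `Crawler` flag. Below is a minimal Python sketch of a crawler lookup against a tiny hand-written excerpt, not the real file. `BROWSCAP_SAMPLE`, `wildcard_to_regex`, `load_patterns`, and `is_crawler` are hypothetical names. Real files also rely on `Parent=` inheritance between sections, which this sketch ignores, and preferring the longest matching pattern is this sketch's own simplification, not necessarily how browscap consumers pick a match.

```python
import configparser
import re

# Hypothetical browscap.ini-style excerpt; the real file is far larger
# and uses Parent= inheritance, which this sketch ignores.
BROWSCAP_SAMPLE = """
[*Googlebot/2.1*]
Browser=Googlebot
Crawler=true

[Mozilla/5.0 (Windows*) Gecko/* Firefox/*]
Browser=Firefox
Crawler=false
"""


def wildcard_to_regex(pattern):
    """Turn a browscap-style wildcard pattern into a compiled regex."""
    # Escape every regex metacharacter, then restore the two wildcards.
    escaped = re.escape(pattern)
    return re.compile(escaped.replace(r"\*", ".*").replace(r"\?", "."))


def load_patterns(ini_text):
    """Return a list of (regex, pattern, is_crawler) tuples."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    entries = []
    for section in parser.sections():
        crawler = parser.getboolean(section, "Crawler", fallback=False)
        entries.append((wildcard_to_regex(section), section, crawler))
    return entries


def is_crawler(user_agent, entries):
    """Report whether the best-matching pattern is flagged as a crawler."""
    matches = [e for e in entries if e[0].fullmatch(user_agent)]
    if not matches:
        return False  # unknown agents are treated as human visitors
    # Simplification: treat the longest pattern as the most specific match.
    _, _, crawler = max(matches, key=lambda e: len(e[1]))
    return crawler
```

Unknown user agents fall through as non-crawlers here, which errs on the side of counting a visit rather than silently dropping it.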
Attached is a patch to statistics.module that implements filtering by role, with crawler filtering dependent on an external browscap module. I hope this patch can be accepted into Drupal 4.7; if the feeling is that the browscap code should be incorporated into statistics.module, I can do that.
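To illustrate the point-of-logging approach described above, here is a minimal sketch in Python rather than Drupal's PHP, and not the patch's actual code. The role check runs before any counter write, so a hit from an ignored role never touches the counter. Crawler filtering only applies when a detector callable is supplied, standing in for the optional external browscap module. `StatisticsFilter`, `should_log`, `exit_hook`, the in-memory `node_counter` dict, and the `writes` tally are all hypothetical. In a real site the browscap lookup may itself cost a query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class StatisticsFilter:
    # Roles whose hits should never be counted (e.g. site administrators).
    ignored_roles: Set[str]
    # Optional crawler detector standing in for an external browscap module;
    # None means crawler filtering is simply unavailable.
    is_crawler: Optional[Callable[[str], bool]] = None
    # In-memory stand-in for the node_counter table: node id -> hit count.
    node_counter: Dict[int, int] = field(default_factory=dict)
    # Number of simulated counter writes, to show ignored hits cost none.
    writes: int = 0

    def should_log(self, roles: Iterable[str], user_agent: str) -> bool:
        """Decide, before anything is written, whether a hit counts."""
        if self.ignored_roles & set(roles):
            return False  # cheap in-memory check, no storage access
        if self.is_crawler is not None and self.is_crawler(user_agent):
            return False
        return True

    def exit_hook(self, nid: int, roles: Iterable[str], user_agent: str) -> bool:
        """Increment the node's counter only for hits that pass the filter."""
        if not self.should_log(roles, user_agent):
            return False
        self.writes += 1  # stands in for the UPDATE on the counter table
        self.node_counter[nid] = self.node_counter.get(nid, 0) + 1
        return True
```

A usage sketch: `StatisticsFilter({"administrator"}, is_crawler=lambda ua: "Googlebot" in ua)` ignores both administrator hits and Googlebot hits, while passing no detector keeps role filtering working on its own, mirroring the patch's optional dependency on browscap.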
Thanks.

[1] http://drupal.org/node/18013
[2] http://drupal.org/node/26569
mikeryan