Issue status update for http://drupal.org/node/28924 Post a follow up: http://drupal.org/project/comments/add/28924 Project: Drupal Version: cvs Component: statistics.module Category: bug reports Priority: minor Assigned to: Jeremy@kerneltrap.org Reported by: Jeremy@kerneltrap.org Updated by: drumm Status: patch (code needs review) MySQL's insert delayed extension would be perfect for this. Unfortunately it is not ANSI SQL. The inserts are done in the exit hook so the extra time is not usually passed on to the user (when drupal_goto() is used the exit hook execution happens before the redirect is sent). Although I'm guessing the total index maintenance time may be larger than the savings on the statistics pages. drumm Previous comments: ------------------------------------------------------------------------ Mon, 15 Aug 2005 17:26:07 +0000 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/statistics.module-cvs_0.patch (615 bytes) Displaying statistics pages can be slow. The attached patch removes one of the "GROUP BY" columns to increase the speed of generating the "Top pages in the past n days" page. (Grouping by 'title' does not work, as there are many different paths that can have the same title. Instead, 'path' is unique for each page, so it is a more logical column to group by. I tested this on my active webpage to verify that the resulting page was what I expected.) ------------------------------------------------------------------------ Tue, 16 Aug 2005 18:08:55 +0000 : drumm How about a key on the path column? ------------------------------------------------------------------------ Tue, 16 Aug 2005 19:48:46 +0000 : Dries Committed to HEAD. Marking this active until we clarifie the path index thing. ------------------------------------------------------------------------ Wed, 17 Aug 2005 00:18:51 +0000 : Jeremy@kerneltrap.org In which case several of the columns should have keys. I was going to add it to my earlier patch, but haven't had time. ------------------------------------------------------------------------ Sun, 21 Aug 2005 11:36:43 +0000 : Cvbge SQL code as in the patch won't work with postgresql: dt=> SELECT COUNT(path) AS hits, path, title, AVG(timer) AS average_time, SUM(timer) AS total_time FROM accesslog GROUP BY path; ERROR: column "accesslog.title" must appear in the GROUP BY clause or be used in an aggregate function ------------------------------------------------------------------------ Sat, 27 Aug 2005 16:12:59 +0000 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/statistics_9.patch (1.16 KB) The attached adds three keys that I confirmed are used. The keys are on path, url and uid. Before adding a key for "path": EXPLAIN SELECT a.aid, a.timestamp, a.url, a.uid, u.name FROM accesslog a LEFT JOIN users u ON a.uid = u.uid WHERE a.path LIKE 'node/1%%' table type possible_keys key key_len ref rows Extra a ALL 75 Using where u eq_ref PRIMARY PRIMARY 4 a.uid 1 After: table type possible_keys key key_len ref rows Extra a range path path 256 3 Using where u eq_ref PRIMARY PRIMARY 4 a.uid 1 And another query that gains from the "path" key: EXPLAIN SELECT COUNT(DISTINCT(path)) FROM accesslog; table type possible_keys key key_len ref rows Extra accesslog index path 256 75 Using index Before adding the key for "url": EXPLAIN SELECT COUNT(DISTINCT(url)) FROM accesslog WHERE url '' AND url NOT LIKE '%%node%%'; le type possible_keys key key_len ref rows Extra accesslog ALL 75 Using where And after: table type possible_keys key key_len ref rows Extra accesslog index url 256 75 Using where; Using index Before adding the key for "uid": EXPLAIN SELECT COUNT(DISTINCT(uid)) FROM accesslog; Result table type possible_keys key key_len ref rows Extra accesslog ALL 75 After: table type possible_keys key key_len ref rows Extra accesslog index uid 5 75 Using index ------------------------------------------------------------------------ Fri, 02 Sep 2005 15:37:05 +0000 : moshe weitzman sure, these indices speed up the admin pages. but remember that every index needsb to be maintained for every insert. since accesslog is inserted into on every view, this is potentially harmful to a lot more people than admins. i'm not sure how to measure this tradeoff. one approach is to copy the access log table to a read only table and do admin pages off of the copy. that means we have a copy dedicated to reading a different one dedicated to writing.