Drupal.org slow queries analyzed with explain: CVS tables
Hi, here are 3 queries that are showing up in the MySQL slow query log. Worst case: you get a "using filesort" which means is writing* a special table to disk to execute the query. MySQL on Drupal.org: Tmp Table Size : 536870912 The query type is all which means it's not using the index and reading every row from the table for some reason. Cause: query on CVS_messages is against all 29468 rows and then it's probably done for both repositories listed in cvs_repositories. What are the two entries in the cvs_repository table, is a table with 2 rows? mysql> EXPLAIN SELECT DISTINCT m.*, r.*, u.name, u.uid FROM cvs_messages m INNER JOIN cvs_repositories r ON m.rid = r.rid INNER JOIN users u ON m.uid = u.uid ORDER BY m.created DESC LIMIT 20280, 10; +-------+--------+---------------+---------+---------+-------+------- +---------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+--------+---------------+---------+---------+-------+------- +---------------------------------+ | m | ALL | uid,rid | NULL | NULL | NULL | 29468 | Using temporary; Using filesort | | r | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using where | | u | eq_ref | PRIMARY | PRIMARY | 4 | m.uid | 1 | | +-------+--------+---------------+---------+---------+-------+------- +---------------------------------+ 3 rows in set (0.00 sec) Worst case: using filesort. The first query is of type index which means the full index is being read, which is better than all but still very bad. Cause: Group By causes the where. Temp table can't be done in memory so it results in file sort. mysql> EXPLAIN SELECT COUNT(DISTINCT(m.cid)) AS count, m.uid, u.name FROM cvs_messages m INNER JOIN users u ON m.uid = u.uid AND m.uid != 0 AND m.created > 1150344858 GROUP BY m.uid ORDER BY count DESC LIMIT 0, 15; +-------+--------+---------------+---------+---------+-------+------- +----------------------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+--------+---------------+---------+---------+-------+------- +----------------------------------------------+ | m | index | uid | uid | 4 | NULL | 29468 | Using where; Using temporary; Using filesort | | u | eq_ref | PRIMARY | PRIMARY | 4 | m.uid | 1 | | +-------+--------+---------------+---------+---------+-------+------- +----------------------------------------------+ 2 rows in set (0.00 sec) mysql> EXPLAIN SELECT COUNT(DISTINCT m.cid) FROM cvs_messages m INNER JOIN users u ON m.uid = u.uid; +-------+--------+---------------+---------+---------+-------+------- +-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+--------+---------------+---------+---------+-------+------- +-------------+ | m | ALL | uid | NULL | NULL | NULL | 29468 | | | u | eq_ref | PRIMARY | PRIMARY | 4 | m.uid | 1 | Using index | +-------+--------+---------------+---------+---------+-------+------- +-------------+ 2 rows in set (0.00 sec) Any suggestions? Kieran TABLE SCHEMAS desc cvs_repositories; +------------+---------------------+------+-----+--------- +----------------+ | Field | Type | Null | Key | Default | Extra | +------------+---------------------+------+-----+--------- +----------------+ | rid | int(10) unsigned | | PRI | NULL | auto_increment | | name | varchar(255) | | UNI | | | | root | varchar(255) | | | | | | modules | varchar(255) | | | | | | diffurl | varchar(255) | | | | | | newurl | varchar(255) | | | | | | trackerurl | varchar(255) | | | | | | method | tinyint(2) unsigned | | | 0 | | | updated | int(11) unsigned | | | 0 | | +------------+---------------------+------+-----+--------- +----------------+ 9 rows in set (0.00 sec) desc cvs_messages; +---------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------+------------------+------+-----+---------+-------+ | cid | int(10) unsigned | | PRI | 0 | | | rid | int(10) unsigned | | MUL | 0 | | | created | int(10) unsigned | | | 0 | | | user | varchar(255) | | | | | | message | text | YES | | NULL | | | uid | int(10) unsigned | | MUL | 0 | | +---------+------------------+------+-----+---------+-------+ 6 rows in set (0.10 sec)
Kieran Lal wrote:
Hi, here are 3 queries that are showing up in the MySQL slow query log.
Worst case: you get a "using filesort" which means is writing* a special table to disk to execute the query. MySQL on Drupal.org: Tmp Table Size : 536870912 The query type is all which means it's not using the index and reading every row from the table for some reason. Cause: query on CVS_messages is against all 29468 rows and then it's probably done for both repositories listed in cvs_repositories. What are the two entries in the cvs_repository table, is a table with 2 rows?
mysql> EXPLAIN SELECT DISTINCT m.*, r.*, u.name, u.uid FROM cvs_messages m INNER JOIN cvs_repositories r ON m.rid = r.rid INNER JOIN users u ON m.uid = u.uid ORDER BY m.created DESC LIMIT 20280, 10; +-------+--------+---------------+---------+---------+-------+-------+---------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+--------+---------------+---------+---------+-------+-------+---------------------------------+
| m | ALL | uid,rid | NULL | NULL | NULL | 29468 | Using temporary; Using filesort | | r | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using where | | u | eq_ref | PRIMARY | PRIMARY | 4 | m.uid | 1 | | +-------+--------+---------------+---------+---------+-------+-------+---------------------------------+
3 rows in set (0.00 sec)
I am not sure we need the DISTINCT here. At least, we probably do not need if to act on all of the selects.
Worst case: using filesort. The first query is of type index which means the full index is being read, which is better than all but still very bad. Cause: Group By causes the where. Temp table can't be done in memory so it results in file sort.
mysql> EXPLAIN SELECT COUNT(DISTINCT(m.cid)) AS count, m.uid, u.name FROM cvs_messages m INNER JOIN users u ON m.uid = u.uid AND m.uid != 0 AND m.created > 1150344858 GROUP BY m.uid ORDER BY count DESC LIMIT 0, 15; +-------+--------+---------------+---------+---------+-------+-------+----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+--------+---------------+---------+---------+-------+-------+----------------------------------------------+
| m | index | uid | uid | 4 | NULL | 29468 | Using where; Using temporary; Using filesort | | u | eq_ref | PRIMARY | PRIMARY | 4 | m.uid | 1 | | +-------+--------+---------------+---------+---------+-------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Shouldn't m.cid be distinct on its own?
mysql> EXPLAIN SELECT COUNT(DISTINCT m.cid) FROM cvs_messages m INNER JOIN users u ON m.uid = u.uid; +-------+--------+---------------+---------+---------+-------+-------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+--------+---------------+---------+---------+-------+-------+-------------+
| m | ALL | uid | NULL | NULL | NULL | 29468 | | | u | eq_ref | PRIMARY | PRIMARY | 4 | m.uid | 1 | Using index | +-------+--------+---------------+---------+---------+-------+-------+-------------+
2 rows in set (0.00 sec)
Same here.
Any suggestions?
Kieran
TABLE SCHEMAS
desc cvs_repositories; +------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra | +------------+---------------------+------+-----+---------+----------------+
| rid | int(10) unsigned | | PRI | NULL | auto_increment | | name | varchar(255) | | UNI | | | | root | varchar(255) | | | | | | modules | varchar(255) | | | | | | diffurl | varchar(255) | | | | | | newurl | varchar(255) | | | | | | trackerurl | varchar(255) | | | | | | method | tinyint(2) unsigned | | | 0 | | | updated | int(11) unsigned | | | 0 | | +------------+---------------------+------+-----+---------+----------------+
9 rows in set (0.00 sec)
desc cvs_messages; +---------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------+------------------+------+-----+---------+-------+ | cid | int(10) unsigned | | PRI | 0 | |
Yeah, it is the primary key. Cheers, Gerhard
On Jul 30, 2006, at 9:41 PM, Kieran Lal wrote:
mysql> EXPLAIN SELECT DISTINCT m.*, r.*, u.name, u.uid FROM cvs_messages m INNER JOIN cvs_repositories r ON m.rid = r.rid INNER JOIN users u ON m.uid = u.uid ORDER BY m.created
Here's what I think is happening. cvs_messages has 25K rows and is joining on cvs_repositories which has two rows, one for core and one for contrib. Then that is joined against users table which has approximately 80K users. 25K joined on 2 joined on 80K. I am wondering if we can just get the values from the cvs_repositories into a PHP array and then join cvs_messages to users on uid. Any ideas would be welcome. Cheers, Kieran
On Jul 31, 2006, at 2:57 PM, Kieran Lal wrote:
Here's what I think is happening.
cvs_messages has 25K rows and is joining on cvs_repositories which has two rows, one for core and one for contrib.
Then that is joined against users table which has approximately 80K users.
25K joined on 2 joined on 80K. I am wondering if we can just get the values from the cvs_repositories into a PHP array and then join cvs_messages to users on uid. Any ideas would be welcome.
i had killes run this query w/ and w/o the INNER JOIN on {cvs_repositories}. he posted the results in http://drupal.org/node/ 74238 (the issue in the cvs.module's queue about fixing all this stuff). the results are that we end up with a tablesort on the query in both cases. so, i don't think re-organizing the code to do the equivalent of the INNER JOIN in php code is going to help. i think this is just a big, nasty, slow query, and i'm not sure there's much we can do about it. furthermore, killes already added caching, so if we're trying to do the identical query, we just lookup the results from the cache. apparently there are something like 320 cached queries in d.o's DB in the last day. so, at least that much is working. any further insight into this problem would be most appreciated (i'm still relatively new to all this DB query optimization stuff). feel free to post ideas (and patches!) directly into http://drupal.org/ node/74238. thanks, -dww
Derek Wright wrote:
On Jul 31, 2006, at 2:57 PM, Kieran Lal wrote:
Here's what I think is happening.
cvs_messages has 25K rows and is joining on cvs_repositories which has two rows, one for core and one for contrib.
Then that is joined against users table which has approximately 80K users.
25K joined on 2 joined on 80K. I am wondering if we can just get the values from the cvs_repositories into a PHP array and then join cvs_messages to users on uid. Any ideas would be welcome.
i had killes run this query w/ and w/o the INNER JOIN on {cvs_repositories}. he posted the results in http://drupal.org/node/74238 (the issue in the cvs.module's queue about fixing all this stuff). the results are that we end up with a tablesort on the query in both cases. so, i don't think re-organizing the code to do the equivalent of the INNER JOIN in php code is going to help. i think this is just a big, nasty, slow query, and i'm not sure there's much we can do about it.
furthermore, killes already added caching,
Well, turns out my caching didn't cache a single query. :p The cache keys for insert and retrieval did not match. :p But it does now work, so I think we can put this case to rest.
so if we're trying to do the identical query, we just lookup the results from the cache. apparently there are something like 320 cached queries in d.o's DB in the last day. so, at least that much is working.
That number was the number of executed _and_ logged queries. We only log only every tenth query, though. Cheers, Gerhard
On 01 Aug 2006, at 02:39, Derek Wright wrote:
i had killes run this query w/ and w/o the INNER JOIN on {cvs_repositories}. he posted the results in http://drupal.org/ node/74238 (the issue in the cvs.module's queue about fixing all this stuff). the results are that we end up with a tablesort on the query in both cases. so, i don't think re-organizing the code to do the equivalent of the INNER JOIN in php code is going to help. i think this is just a big, nasty, slow query, and i'm not sure there's much we can do about it.
Did you try to change the order of the JOINs? I don't think it will make a big difference but it is worth a try. -- Dries Buytaert :: http://www.buytaert.net/
participants (4)
-
Derek Wright -
Dries Buytaert -
Gerhard Killesreiter -
Kieran Lal