Bug #47914 | Performance Issue with Subquery | ||
---|---|---|---|
Submitted: | 8 Oct 2009 9:10 | Modified: | 26 Mar 2012 19:30 |
Reporter: | Xinhong Pu | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 4.1 and 5.1 | OS: | Windows (Windows Server 2003) |
Assigned to: | CPU Architecture: | Any | |
Tags: | subquery |
[8 Oct 2009 9:10]
Xinhong Pu
[8 Oct 2009 10:48]
Valeriy Kravchuk
Thank you for the problem report. Please, send exact CREATE TABLE statement(s) and EXPLAIN results for the exact query from recent MySQL 5.1 (5.1.39).
[9 Oct 2009 3:02]
Xinhong Pu
To repeat the result by doing the following steps in mysql command line client: use test create table X (ID int auto_increment primary key, A int, key A (A)) engine=MyISAM; delimiter // create procedure insrecs (n int) begin set @i = n; repeat insert into X (A) values (@i); set @i = @i - 1; until @i <= 0 end repeat; end // delimiter ; call insrecs(100000); drop procedure insrecs; insert into X (A) values (10); select ID, A from X where A in (select A from X group by A having count(*) > 1); The last query will hang the system. CPU usage will go up to 100% for single processor machine and no result is returned after waiting for quite a long time. The query "explain extended select ID, A from X where A in (select A from X group by A having count(*) > 1)" will return the following results: +----+--------------------+-------+-------+---------------+------+---------+------+--------+-------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+-------+---------------+------+---------+------+--------+-------------+-------------+ | 1 | PRIMARY | x | ALL | NULL | NULL | NULL | NULL | 100001 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | x | index | NULL | A | 5 | NULL | 1 | 10000100.00 | Using index | +----+--------------------+-------+-------+---------------+------+---------+------+--------+-------------+-------------+ 2 rows in set, 1 warning (0.00 sec)
[9 Oct 2009 3:51]
Valeriy Kravchuk
I've got the same results with recent 5.1.40. It is a known problem of MySQL that considers some subqueries as dependent while they are NOT. The problem was fixed long time ago in 6.0.x and should be fixed in recent 5.4.x. Optimizer feature is called "materialization". Look: 77-52-242-160:6.0-codebase openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 6.0.14-alpha-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create table X (ID int auto_increment primary key, A int, key A (A)) engine=MyISAM; Query OK, 0 rows affected (0.42 sec) mysql> delimiter // mysql> create procedure insrecs (n int) -> begin -> set @i = n; -> repeat -> insert into X (A) values (@i); -> set @i = @i - 1; -> until @i <= 0 -> end repeat; -> end -> // Query OK, 0 rows affected (0.01 sec) mysql> delimiter ; mysql> call insrecs(100000); Query OK, 0 rows affected (55.00 sec) mysql> drop procedure insrecs; Query OK, 0 rows affected (0.00 sec) mysql> insert into X (A) values (10); Query OK, 1 row affected (0.01 sec) mysql> explain -> extended select ID, A from X where A in (select A from X group by A having count(*) > 1)\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: X type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 100001 filtered: 100.00 Extra: Using where *************************** 2. row *************************** id: 2 select_type: SUBQUERY table: X type: index possible_keys: NULL key: A key_len: 5 ref: NULL rows: 100001 filtered: 100.00 Extra: Using index 2 rows in set, 1 warning (0.00 sec) mysql> show variables like 'optimizer_sw%'\G *************************** 1. row *************************** Variable_name: optimizer_switch Value: firstmatch=on,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materialization=on,semijoin=on 1 row in set (0.00 sec) mysql> select ID, A from X where A in (select A from X group by A having count(*) > 1)\G *************************** 1. row *************************** ID: 99991 A: 10 *************************** 2. row *************************** ID: 100001 A: 10 2 rows in set (0.74 sec) It is unlikely that the fix will be backported to 5.1. So, please, wait for new releases of 5.4.
[9 Oct 2009 8:50]
Roy Lyseng
There are possible workarounds that will let this query run more efficiently: 1. Add a WHERE clause in the subquery: select ID, A from X where A in (select A from X as Y where X.A=Y.A group by A having count(*) > 1); 2. or rewrite as EXISTS: select ID, A from X where exists (select * from X as Y where X.A=Y.A group by A having count(*) > 1); In both cases, the where clause in the subquery will direct the optimizer to use the index in each subquery evaluation. It is almost as fast as the materialization technique.
[26 Mar 2012 19:30]
Paul DuBois
Noted in 5.6.5 changelog. Several subquery performance issues were resolved through the implementation of semi-join subquery optimization strategies.