MySQL Bugs: #24972: Join and sub-query execution differences

Bug #24972	Join and sub-query execution differences
Submitted:	11 Dec 2006 21:55	Modified:	5 Sep 2012 16:35
Reporter:	Jim Bullington	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.0.27, 6.0.14	OS:	Any (Windows)
Assigned to:		CPU Architecture:	Any

Description:
There appears to be a large differential between a join query and a sub-query that produce equivalent results:

t1 has 1000 records, t2 has 100000 records

select b
from t1
join t2 on t2.c = t1.a
where d = 1000
order by b;

Completes in 0.1 seconds

However,

select b
from t1
where a in (select c from t2 where d = 1000)
order by b;

Completes in 44 seconds

Should there be such a large performance difference between these 2 equivalent queries?

How to repeat:
create table t1 (
    a int,
    b int,
    primary key(a)
);

delimiter $

create procedure build_t1(p1 int)
begin
    set @x = 0;
    repeat 
        set @x = @x + 1; 
        insert into t1 values(@x, floor(1+(rand()*10000)));
    until @x > p1 
    end repeat;
end $

delimiter ;

call build_t1(1000);

create table t2 (
    a int,
    c int,
    d int,
    primary key(a)
);

delimiter $

create procedure build_t2(p1 int)
begin
    set @x = 0;
    repeat 
        set @x = @x + 1; 
        insert into t2 values(@x, floor(1+(rand()*1000)), floor(1+(rand()*10000)));
    until @x > p1 
    end repeat;
end $

delimiter ;

call build_t2(100000);

select b
from t1
join t2 on t2.c = t1.a
where d = 1000
order by b;

select b
from t1
where a in (select c from t2 where d = 1000)
order by b;

Thank you for the bug report. This is a well known problem with sub-queries. Will be fixed in MySQL 5.2 (work in progress already).

My goodness!!  5.2 is a long ways off!  Does the commercial MySQL version suffer from the same problem?

The problem is, actually, even worse with 6.0.14:

mysql> select b
    -> from t1
    -> where a in (select c from t2 where d = 1000)
    -> order by b;

+------+
| b    |
+------+
| 1517 |
| 2187 |
| 2417 |
| 2556 |
| 2644 |
| 2796 |
| 3106 |
| 3220 |
| 3785 |
| 4782 |
| 5975 |
| 8733 |
+------+
12 rows in set (8 min 11.35 sec)

mysql> explain select b
    -> from t1
    -> where a in (select c from t2 where d = 1000)
    -> order by b;
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | PRIMARY     | t1    | ALL  | PRIMARY       | NULL | NULL    | NULL |   1001 | Using temporary; Using filesort |
|  1 | PRIMARY     | t2    | ALL  | NULL          | NULL | NULL    | NULL | 100001 | Using where; FirstMatch(t1)     |
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
2 rows in set (0.00 sec)

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 6.0.14-alpha-debug |
+--------------------+
1 row in set (0.00 sec)

mysql> explain select b
    -> from t1
    -> join t2 on t2.c = t1.a
    -> where d = 1000
    -> order by b;
+----+-------------+-------+--------+---------------+---------+---------+-----------+--------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows   | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+-----------+--------+----------------------------------------------+
|  1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL      | 100001 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | test.t2.c |      1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+-----------+--------+----------------------------------------------+
2 rows in set (0.01 sec)

mysql> select b from t1 join t2 on t2.c = t1.a where d = 1000 order by b;
+------+
| b    |
+------+
| 1517 |
| 2187 |
| 2417 |
| 2556 |
| 2644 |
| 2796 |
| 3106 |
| 3220 |
| 3785 |
| 4782 |
| 5975 |
| 8733 |
+------+
12 rows in set (0.07 sec)

Fixed with subquery optimizations in 5.6.5.