| Bug #20199 | Join slower than exists | ||
|---|---|---|---|
| Submitted: | 1 Jun 2006 13:11 | Modified: | 8 Dec 2009 16:04 |
| Reporter: | Andre Timmer | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
| Version: | 5.0.18, 6.0.14 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[1 Jun 2006 17:41]
Jorge del Conde
Thanks for your bug report. I tested this under FC5:
mysql> select count(*) from tab1;
+----------+
| count(*) |
+----------+
| 529911 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from tab2;
+----------+
| count(*) |
+----------+
| 529910 |
+----------+
1 row in set (0.00 sec)
mysql>
mysql> explain
-> select count(*)
-> from tab1 aa
-> , tab2 bb
-> where aa.id = bb.id;
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
| 1 | SIMPLE | bb | index | PRIMARY | PRIMARY | 4 | NULL | 529910 | Using index |
| 1 | SIMPLE | aa | eq_ref | PRIMARY | PRIMARY | 4 | test.bb.id | 1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
2 rows in set (0.00 sec)
mysql> explain
-> select count(*)
-> from tab1 aa
-> where exists (
-> select ''
-> from tab2 bb
-> where aa.id = bb.id
-> );
+----+--------------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
| 1 | PRIMARY | aa | index | NULL | PRIMARY | 4 | NULL | 613689 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | bb | eq_ref | PRIMARY | PRIMARY | 4 | test.aa.id | 1 | Using index |
+----+--------------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
2 rows in set (0.01 sec)
mysql>
mysql> select count(*)
-> from tab1 aa
-> , tab2 bb
-> where aa.id = bb.id;
+----------+
| count(*) |
+----------+
| 621794 |
+----------+
1 row in set (7.06 sec)
mysql> select count(*)
-> from tab1 aa
-> where exists (
-> select ''
-> from tab2 bb
-> where aa.id = bb.id
-> );
+----------+
| count(*) |
+----------+
| 718545 |
+----------+
1 row in set (8.51 sec)
mysql>
[1 Jun 2006 17:42]
Jorge del Conde
I reproduced this using 5.0.23bk
[1 Jun 2006 17:43]
Jorge del Conde
my results have a ~1.5 sec difference, but the difference is still considerable between both queries.
[31 Aug 2006 16:44]
Igor Babaev
This performance problem will be fixed in the frame of WL #2980 "Subquery optimizations".
[1 Dec 2009 4:26]
Valeriy Kravchuk
With 6.0.14 I've got the same results as Jorge long time ago:
mysql> select count(*) from tab1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (1.00 sec)
mysql> select count(*) from tab2;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.19 sec)
mysql>
mysql> explain
-> select count(*)
-> from tab1 aa
-> , tab2 bb
-> where aa.id = bb.id;
+----+-------------+-------+--------+---------------+---------+---------+------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------+---------+-------------+
| 1 | SIMPLE | aa | index | PRIMARY | PRIMARY | 4 | NULL | 1000000 | Using index |
| 1 | SIMPLE | bb | eq_ref | PRIMARY | PRIMARY | 4 | test.aa.id | 1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+------------+---------+-------------+
2 rows in set (0.00 sec)
mysql> explain
-> select count(*)
-> from tab1 aa
-> where exists (
-> select ''
-> from tab2 bb
-> where aa.id = bb.id
-> );
+----+--------------------+-------+--------+---------------+---------+---------+------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+------------+---------+--------------------------+
| 1 | PRIMARY | aa | index | NULL | PRIMARY | 4 | NULL | 1000000 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | bb | eq_ref | PRIMARY | PRIMARY | 4 | test.aa.id | 1 | Using index |
+----+--------------------+-------+--------+---------------+---------+---------+------------+---------+--------------------------+
2 rows in set (0.00 sec)
mysql> select count(*)
-> from tab1 aa
-> , tab2 bb
-> where aa.id = bb.id;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (22.04 sec)
mysql> select count(*)
-> from tab1 aa
-> where exists (
-> select ''
-> from tab2 bb
-> where aa.id = bb.id
-> );
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (27.63 sec)
mysql> select version();
+--------------------+
| version() |
+--------------------+
| 6.0.14-alpha-debug |
+--------------------+
1 row in set (0.00 sec)
So, join is faster(!). What exactly is a problem here, other than a request for further, more exotic optimization maybe?
[1 Dec 2009 13:49]
Andre Timmer
Thank you for testing. Let's say it's indeed a wish for an exotic tuning optimization. Maybe good idea to close this call.
[1 Dec 2009 14:13]
Valeriy Kravchuk
As join is faster than subquery with exists now, I'd say that optimizer should be made clever enough, eventually, to rewrite the query using join (or, at least, consider this way of optimization among others).
[8 Dec 2009 16:05]
Omer Barnir
Duplicate of bug#24972 (based on Valeriy's comment above

Description: Join slower than exists. It gets really worth for tables with many rows and many columns. The MySQL server could execute queries like this more efficient. How to repeat: delimiter ; drop table if exists tab1; drop table if exists tab2; create table tab1 (id integer, col2 varchar(200), primary key (id)); create table tab2 (id integer, col2 varchar(200), primary key (id)); drop procedure if exists fill_tab; delimiter // create procedure fill_tab() begin declare v_max int default 1000 * 1000; declare v_commit int default 1000; declare v_cnt int default 0; l:loop set v_cnt = v_cnt + 1; if (v_cnt > v_max) then leave l; end if; if (v_cnt/1000 = v_commit) then commit; end if; insert into tab1 (id, col2) values (v_cnt, 'ddddddddddddddkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkeeekkkkkkkkkkkkkk'); insert into tab2 (id, col2) values (v_cnt, 'ddddddddddddddkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkeeekkkkkkkkkkkkkk'); end loop; commit; end; // delimiter ; call fill_tab(); analyze table tab1; analyze table tab2; select count(*) from tab1; select count(*) from tab2; explain select count(*) from tab1 aa , tab2 bb where aa.id = bb.id; explain select count(*) from tab1 aa where exists ( select '' from tab2 bb where aa.id = bb.id ); select count(*) from tab1 aa , tab2 bb where aa.id = bb.id; select count(*) from tab1 aa where exists ( select '' from tab2 bb where aa.id = bb.id ); ------ The first query takes 30s, second 35s. With the same kind of tables but more data the difference is 4 minutes and 10 minutes on our system!!! It should not be necessairy to rewrite the queries. Suggested fix: Optimizer should know, based on meta data, that a joining one row will result in at most 1 row. So the plan could be: - recognizing it's a special join, at most one row - recognizing count(*) is the same as count(tab1.id) because this is a NOT null column - read rows from tab1 - search if there is a corresponding row in tab2 BUT don't add the row to the internal result, this is not necessairy