Description:
Server returns wrong result on a query of the form
SELECT count(*) FROM tb1 WHERE tb1.f1 IN (SELECT tb2.f1 FROM tb2 WHERE tb2.f2 IN (SELECT tb3.f2 FROM tb3));
if it is executed with semijoin=on (which is default), and execution time soars up even on small tables.
In the test case below, select count(*) with semijoin=on returns
+----------+
| count(*) |
+----------+
| 262144 |
+----------+
which is wrong, there are only 64 records in the table.
Explain extended says
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| 1 | PRIMARY | tb1 | ALL | NULL | NULL | NULL | NULL | 64 | 100.00 | |
| 1 | PRIMARY | tb2 | ALL | NULL | NULL | NULL | NULL | 64 | 100.00 | Using where |
| 1 | PRIMARY | tb3 | ALL | NULL | NULL | NULL | NULL | 64 | 100.00 | Using where; FirstMatch(tb1); Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
And show warnings
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select count(0) AS `count(*)` from `test`.`tb1` semi join (`test`.`tb3` join `test`.`tb2`) where ((`test`.`tb2`.`f1` = `test`.`tb1`.`f1`) and (`test`.`tb3`.`f2` = `test`.`tb2`.`f2`))
The second select count(*), with semijoin=off, produces correct result
+----------+
| count(*) |
+----------+
| 64 |
+----------+
explain extended SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | tb1 | ALL | NULL | NULL | NULL | NULL | 64 | 100.00 | Using where |
| 2 | SUBQUERY | tb2 | ALL | NULL | NULL | NULL | NULL | 64 | 100.00 | Using where |
| 3 | SUBQUERY | tb3 | ALL | NULL | NULL | NULL | NULL | 64 | 100.00 | |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select count(0) AS `count(*)` from `test`.`tb1` where <in_optimizer>(`test`.`tb1`.`f1`,`test`.`tb1`.`f1` in ( <materialize> (select `test`.`tb2`.`f1` AS `f1` from `test`.`tb2` where <in_optimizer>(`test`.`tb2`.`f2`,`test`.`tb2`.`f2` in ( <materialize> (select `test`.`tb3`.`f2` AS `f2` from `test`.`tb3` ), <primary_index_lookup>(`test`.`tb2`.`f2` in <temporary table> on distinct_key where ((`test`.`tb2`.`f2` = `materialized subselect`.`f2`))))) ), <primary_index_lookup>(`test`.`tb1`.`f1` in <temporary table> on distinct_key where ((`test`.`tb1`.`f1` = `materialized subselect`.`f1`)))))
1 row in set (0.00 sec)
As a consequence, when the query is executed with semijoin on slightly bigger tables, it takes too long -- with 256 record per table it's already ~12 seconds, and with 5000 records per table the box just stopped responding after a while.
How to repeat:
use test;
drop table if exists tb1;
drop table if exists tb2;
drop table if exists tb3;
CREATE TABLE `tb1` (
`f1` int(11) DEFAULT NULL,
`f2` char(5) DEFAULT NULL );
insert into tb1 values ( 1, 'abc' );
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
create table tb2 as select * from tb1;
create table tb3 as select * from tb1;
set optimizer_switch = 'semijoin=on';
SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
explain extended SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
show warnings \G
set optimizer_switch = 'semijoin=off';
SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
explain extended SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
show warnings \G