Bug #46072 Wrong result and nonviable execution time for subquery with semijoin
Submitted: 9 Jul 2009 2:32 Modified: 24 Sep 2009 13:14
Reporter: Elena Stepanova Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.4.4-alpha-debug-log OS:Any
Assigned to: CPU Architecture:Any
Tags: optimizer_switch, semijoin, subquery

[9 Jul 2009 2:32] Elena Stepanova
Description:
Server returns wrong result on a query of the form

SELECT count(*) FROM tb1 WHERE tb1.f1 IN (SELECT tb2.f1 FROM tb2 WHERE tb2.f2 IN (SELECT tb3.f2 FROM tb3));

if it is executed with semijoin=on (which is default), and execution time soars up even on small tables. 

In the test case below, select count(*) with semijoin=on returns

+----------+
| count(*) |
+----------+
|   262144 |
+----------+

which is wrong, there are only 64 records in the table.

Explain extended says

+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|  1 | PRIMARY     | tb1   | ALL  | NULL          | NULL | NULL    | NULL |   64 |   100.00 |                                                 |
|  1 | PRIMARY     | tb2   | ALL  | NULL          | NULL | NULL    | NULL |   64 |   100.00 | Using where                                     |
|  1 | PRIMARY     | tb3   | ALL  | NULL          | NULL | NULL    | NULL |   64 |   100.00 | Using where; FirstMatch(tb1); Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

And show warnings 

*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select count(0) AS `count(*)` from `test`.`tb1` semi join (`test`.`tb3` join `test`.`tb2`) where ((`test`.`tb2`.`f1` = `test`.`tb1`.`f1`) and (`test`.`tb3`.`f2` = `test`.`tb2`.`f2`))

The second select count(*), with semijoin=off, produces correct result

+----------+
| count(*) |
+----------+
|       64 |
+----------+

explain extended SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY     | tb1   | ALL  | NULL          | NULL | NULL    | NULL |   64 |   100.00 | Using where |
|  2 | SUBQUERY    | tb2   | ALL  | NULL          | NULL | NULL    | NULL |   64 |   100.00 | Using where |
|  3 | SUBQUERY    | tb3   | ALL  | NULL          | NULL | NULL    | NULL |   64 |   100.00 |             |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

show warnings \G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select count(0) AS `count(*)` from `test`.`tb1` where <in_optimizer>(`test`.`tb1`.`f1`,`test`.`tb1`.`f1` in ( <materialize> (select `test`.`tb2`.`f1` AS `f1` from `test`.`tb2` where <in_optimizer>(`test`.`tb2`.`f2`,`test`.`tb2`.`f2` in ( <materialize> (select `test`.`tb3`.`f2` AS `f2` from `test`.`tb3` ), <primary_index_lookup>(`test`.`tb2`.`f2` in <temporary table> on distinct_key where ((`test`.`tb2`.`f2` = `materialized subselect`.`f2`))))) ), <primary_index_lookup>(`test`.`tb1`.`f1` in <temporary table> on distinct_key where ((`test`.`tb1`.`f1` = `materialized subselect`.`f1`)))))
1 row in set (0.00 sec)

As a consequence, when the query is executed with semijoin on slightly bigger tables, it takes too long -- with 256 record per table it's already ~12 seconds, and with 5000 records per table the box just stopped responding after a while.

How to repeat:
use test;
drop table if exists tb1;
drop table if exists tb2;
drop table if exists tb3;
CREATE TABLE `tb1` (
  `f1` int(11) DEFAULT NULL,
  `f2` char(5) DEFAULT NULL );
insert into tb1 values ( 1, 'abc' );
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
insert into tb1 select * from tb1;
create table tb2 as select * from tb1;
create table tb3 as select * from tb1;

set optimizer_switch = 'semijoin=on'; 
SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
explain extended SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
show warnings \G

set optimizer_switch = 'semijoin=off'; 
SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
explain extended SELECT count(*) FROM tb1 WHERE f1 IN (SELECT f1 FROM tb2 WHERE f2 IN (SELECT f2 FROM tb3));
show warnings \G
[9 Jul 2009 4:41] Valeriy Kravchuk
Thank you for the bug report. Verified just as described.
[24 Sep 2009 12:59] Evgeny Potemkin
Looks like a duplicate to the bug#45191.
[24 Sep 2009 13:14] Evgeny Potemkin
Duplicate of the bug#45191.