MySQL Bugs: #96423: Optimizing subquery is good for SELECT, is not enough for DELETE.

Bug #96423	Optimizing subquery is good for SELECT, is not enough for DELETE.
Submitted:	5 Aug 2019 3:04	Modified:	31 Mar 2020 18:21
Reporter:	Meiji Kimura	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.6, 5.7, 8.0, 5.7.27, 8.0.17, 5.6.44	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
MySQL 5.6 or later, Subquery Optimization become better than old version.

https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html
https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html
https://dev.mysql.com/doc/refman/8.0/en/subquery-optimization.html

We can see improved plan for such a query.

SELECT * FROM t1
WHERE t1.a IN (SELECT t2.b FROM t2 WHERE where_condition);

It works for SELECT, but the same condition for DELETE still choose bad plan.

DELETE FROM t1
WHERE t1.a IN (SELECT t2.b FROM t2 WHERE where_condition);

SELECT query is transformed like this,

mysql> explain SELECT * FROM t1
    -> WHERE t1.a IN (SELECT t2.b FROM t2 WHERE c = 10);
+----+-------------+-------+------------+------+---------------+---------+------
---+-----------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_l
en | ref       | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------+------
---+-----------+--------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | ALL  | PRIMARY       | NULL    | NULL
   | NULL      | 998918 |    10.00 | Using where |
|  1 | SIMPLE      | t1    | NULL       | ref  | PRIMARY       | PRIMARY | 4
   | test.t2.b |      1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------+------
---+-----------+--------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

mysql> show warnings;
+-------+------+----------------------------------------------------------------
--------------------------------------------------------------------------------
------------------------------+
| Level | Code | Message

                              |
+-------+------+----------------------------------------------------------------
--------------------------------------------------------------------------------
------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS
 `b` from `test`.`t2` join `test`.`t1` where ((`test`.`t1`.`a` = `test`.`t2`.`b`
) and (`test`.`t2`.`c` = 10)) |
+-------+------+----------------------------------------------------------------
--------------------------------------------------------------------------------
------------------------------+
1 row in set (0.00 sec)

How to repeat:
create database if not exists test;
use test
drop table if exists t1;
create table t1(a int not null, b int not null, primary key (a,b));
drop table if exists t2;
create table t2(b int not null, c int not null, primary key(b));

drop procedure if exists test_insert_commit;
delimiter $
create procedure test_insert_commit(v_max int)
begin
declare v_id int default 0;
repeat
set v_id = v_id + 1;
insert into t1 values(v_id, v_id);
insert into t2 values(v_id, v_id);
if (mod(v_id,10000) = 0) then commit;
end if;
until v_id >= v_max
end repeat;
end$
delimiter ;

autocommit=0;
call test_insert_commit(1000000);
commit;

explain SELECT * FROM t1
WHERE t1.a IN (SELECT t2.b FROM t2 WHERE c = 10);

explain delete FROM t1
WHERE t1.a IN (SELECT t2.b FROM t2 WHERE c = 10);

mysql>  SELECT * FROM t1
    -> WHERE t1.a IN (SELECT t2.b FROM t2 WHERE c = 10);
+----+----+
| a  | b  |
+----+----+
| 10 | 10 |
+----+----+
1 row in set (1.17 sec)

mysql> delete FROM t1
    -> WHERE t1.a IN (SELECT t2.b FROM t2 WHERE c = 10);
Query OK, 1 row affected (18.20 sec)

We have to modify the above query like as below by hand :(
It become 20x faster than before.

delete t1 from `test`.`t2` join `test`.`t1` where ((`test`.`t1`.`a` = `test`.`t2`.`b`) and (`test`.`t2`.`c` = 10));
Query OK, 0 rows affected (0.95 sec)

Suggested fix:
Should be choose better plan like SELECT.

Hello Meiji-San,

Thank you for the report and test case.

regards,
Umesh

Hello. This is the same bug as: https://bugs.mysql.com/bug.php?id=72895.
A workaround is to change the DELETE to use the multi-table syntax:
instead of
DELETE FROM t1 etc;
DELETE t1.* FROM t1 etc;
then it will use the same optimization as SELECT.

Fixed in MySQL 8.0.21 as part of WL#6057. 

See same for docs info.

Closed.

Precision on the comment above: in version 8.0.21, a single-table DELETE will try to use the same optimizations as the multi-table DELETE already did: subquery materialization, or semijoin, for NOT EXISTS, and for NOT IN if nullability allows.