Bug #96334 subquery in "EXISTS (subquery)" does not use index properly
Submitted: 26 Jul 2019 4:39 Modified: 29 Jul 2019 7:06
Reporter: Seunguck Lee Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.16 OS:Any
Assigned to: CPU Architecture:Any

[26 Jul 2019 4:39] Seunguck Lee
Description:
With "EXISTS (SEELCT .. FROM ..) syntax,
when subquery condition use outer table's columns and subquery condition use equal compare, then index is used fully as expect.
But subquery condition use non-equal compare, then subquery does not use that part of index.

According to the execution plan, equal compare subquery use 11 bytes of key_length, but non-equal compare subquery only use 2 bytes(only is_target column) of key_length.

-- // This is equal compare in subquery
select * from pos p1 
where p1.is_target=0 
  and exists (select 1 from pos p2 where p2.is_target=1 and p2.x=p1.x and p2.y=p2.y)
+----+--------------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+--------------------------+
| id | select_type        | table | partitions | type | possible_keys | key       | key_len | ref                | rows   | filtered | Extra                    |
+----+--------------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+--------------------------+
|  1 | PRIMARY            | p1    | NULL       | ref  | ix_nearby     | ix_nearby | 2       | const              | 621500 |   100.00 | Using where              |
|  2 | DEPENDENT SUBQUERY | p2    | NULL       | ref  | ix_nearby     | ix_nearby | 11      | const,TB_USER.p1.x |      1 |    10.00 | Using where; Using index |
+----+--------------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+--------------------------+

-- // This is non-equal compare in subquery
select * from pos p1 
where p1.is_target=0 
  and exists (select 1 from pos p2 where p2.is_target=1 and p2.x between p1.x-9000 and p1.x+9000 and p2.y between p1.y-9000 and p1.y+9000)
+----+--------------------+-------+------------+------+---------------+-----------+---------+-------+--------+----------+--------------------------+
| id | select_type        | table | partitions | type | possible_keys | key       | key_len | ref   | rows   | filtered | Extra                    |
+----+--------------------+-------+------------+------+---------------+-----------+---------+-------+--------+----------+--------------------------+
|  1 | PRIMARY            | p1    | NULL       | ref  | ix_nearby     | ix_nearby | 2       | const | 621500 |   100.00 | Using where              |
|  2 | DEPENDENT SUBQUERY | p2    | NULL       | ref  | ix_nearby     | ix_nearby | 2       | const | 117816 |     1.23 | Using where; Using index |
+----+--------------------+-------+------------+------+---------------+-----------+---------+-------+--------+----------+--------------------------+

How to repeat:
-- // 1. Create test table
CREATE TABLE pos (
  uid char(32) COLLATE utf8mb4_general_ci DEFAULT NULL,
  x bigint(20) DEFAULT NULL,
  y bigint(20) DEFAULT NULL,
  is_target tinyint(4) DEFAULT NULL,
  KEY ix_nearby (is_target,x,y)
);

-- // 2. Insert sample data (I have tested about 1M row table)

-- // 3. Check execution plan for equal compare subquery
select * from pos p1 
where p1.is_target=0 
  and exists (select 1 from pos p2 where p2.is_target=1 and p2.x=p1.x and p2.y=p2.y);

-- // 3. Check execution plan for non-equal compare subquery
select * from pos p1 
where p1.is_target=0 
  and exists (select 1 from pos p2 where p2.is_target=1 and p2.x between p1.x-9000 and p1.x+9000 and p2.y between p1.y-9000 and p1.y+9000);

Suggested fix:
Execution plan of non-equal compare subquery also use index as equal compare subquery.
[26 Jul 2019 12:37] MySQL Verification Team
Hello Mr. Lee,

Thank you for your bug report.

However, this is not a bug.

There are two conditions in which the index can not be used.

First one is when you are using expressions, like "p2.x between p1.x-9000 and p1.x+9000" and second one is when number of rows matched is higher then 1/5 or 1/4 of the total number of rows.

Regarding the first query, what you were using is called "common column in the semi-join", for which the index can be used. This is all explained in our Reference Manual, chapter on "Optimizer".

Not a bug.
[29 Jul 2019 7:06] Seunguck Lee
Thanks Sinisa. 

>> First one is when you are using expressions, like "p2.x between p1.x-9000 and p1.x+9000" and second one is when number of rows matched is higher then 1/5 or 1/4 of the total number of rows.
You mean "-9000" and "+9000" in the condition by "using expression" ?

Regarding the first query, what you were using is called "common column in the semi-join", for which the index can be used. This is all explained in our Reference Manual, chapter on "Optimizer".
Would you please tell me the specific URL of manual ? (I have scanned manual, but I can't find).
[29 Jul 2019 12:19] MySQL Verification Team
Mr. Lee,

You are welcome.