Description:
Hi, MySQL Developers,
Please consider such two queries(SEMI/ANTI JOIN):
(1) SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);
(2) SELECT * FROM empty_table AS e WHERE NOT EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);
The ANTI JOIN query return empty set quickly(0.001), but the SEMI JOIN consumes much time(11 min 18.157 sec).
Obviously, the above tow queries always returns an empty set. I think it should return an empty set quickly. However, the SEMI JOIN query waste much time. This leads to unnecessary memory consumption and query cancellation, even though the result should be deterministically empty.
I think this is a common case in actual production scenarios. It's important to clarify that users might not intentionally perform SEMI JOIN operation on empty tables. Rather, they may be unaware that a table is empty. For example, when data has been deleted by another user or process. If MySQL can address this performance bug, it would significantly improve query efficiency and save users valuable time in such cases.
Thank you for your valuable time, looking forward to your reply!
Best regard,
Jinhui Lai
How to repeat:
mysql> use mysql;
mysql> CREATE TABLE empty_table AS SELECT * FROM time_zone_transition CROSS JOIN help_topic LIMIT 0;
mysql> CREATE TABLE not_empty_table AS SELECT * FROM time_zone_transition CROSS JOIN help_topic;
Query OK, 590075 rows affected (57.880 sec)
Records: 590075 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (0.765 sec)
mysql> SELECT * FROM empty_table AS e WHERE NOT EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (0.001 sec)
mysql> EXPLAIN SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id) \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: e
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: n
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 77523833
filtered: 10.00
Extra: Using where; FirstMatch(e); Using join buffer (hash join)
2 rows in set, 2 warnings (0.001 sec)
mysql> EXPLAIN SELECT * FROM empty_table AS e WHERE NOT EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id) \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: e
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: n
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 554537
filtered: 100.00
Extra: Using where; Not exists; Using join buffer (hash join)
2 rows in set, 2 warnings (0.001 sec)
-- the following cases is more obvious.
mysql> CREATE TABLE not_empty_table2 AS SELECT * FROM time_zone_transition CROSS JOIN help_topic;
Query OK, 82610500 rows affected (3 hours 33 min 48.432 sec)
Records: 82610500 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (11 min 18.157 sec)
mysql> SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (0.001 sec)