Bug #118264 Performance bug in SEMI JOIN with empty tables
Submitted: 25 May 15:04
Reporter: jinhui lai Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:9.3.0 OS:Ubuntu
Assigned to: CPU Architecture:Any

[25 May 15:04] jinhui lai
Description:
Hi, MySQL Developers,

Please consider such two queries(SEMI/ANTI JOIN): 

(1) SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);

(2) SELECT * FROM empty_table AS e WHERE NOT EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);

The ANTI JOIN query return empty set quickly(0.001), but the SEMI JOIN consumes much time(11 min 18.157 sec).

Obviously, the above tow queries always returns an empty set. I think it should return an empty set quickly. However, the SEMI JOIN query waste much time. This leads to unnecessary memory consumption and query cancellation, even though the result should be deterministically empty.

I think this is a common case in actual production scenarios. It's important to clarify that users might not intentionally perform SEMI JOIN operation on empty tables. Rather, they may be unaware that a table is empty. For example, when data has been deleted by another user or process. If MySQL can address this performance bug, it would significantly improve query efficiency and save users valuable time in such cases.

Thank you for your valuable time, looking forward to your reply!

Best regard,
Jinhui Lai

How to repeat:
mysql> use mysql;
mysql> CREATE TABLE empty_table AS SELECT * FROM time_zone_transition CROSS JOIN help_topic LIMIT 0;
mysql> CREATE TABLE not_empty_table AS SELECT * FROM time_zone_transition CROSS JOIN help_topic;
Query OK, 590075 rows affected (57.880 sec)
Records: 590075  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (0.765 sec)

mysql> SELECT * FROM empty_table AS e WHERE NOT EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (0.001 sec)

mysql> EXPLAIN SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id) \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: n
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 77523833
     filtered: 10.00
        Extra: Using where; FirstMatch(e); Using join buffer (hash join)
2 rows in set, 2 warnings (0.001 sec)

mysql> EXPLAIN SELECT * FROM empty_table AS e WHERE NOT EXISTS (SELECT 1 FROM not_empty_table AS n WHERE n.Time_zone_id = e.Time_zone_id) \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: n
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 554537
     filtered: 100.00
        Extra: Using where; Not exists; Using join buffer (hash join)
2 rows in set, 2 warnings (0.001 sec)

-- the following cases is more obvious.
mysql> CREATE TABLE not_empty_table2 AS SELECT * FROM time_zone_transition CROSS JOIN help_topic;
Query OK, 82610500 rows affected (3 hours 33 min 48.432 sec)
Records: 82610500  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (11 min 18.157 sec)

mysql> SELECT * FROM empty_table AS e WHERE EXISTS (SELECT 1 FROM not_empty_table2 AS n WHERE n.Time_zone_id = e.Time_zone_id);
Empty set (0.001 sec)