Description:
Hi, MySQL developers. I found a missed optimization in MySQL.
When a SELECT statement uses a HAVING clause without any GROUP BY or aggregate functions, the condition is semantically equivalent to a WHERE clause. However, the optimizer does not transform the HAVING condition into a table-access filter that can exploit indexes. Instead, it performs a full table scan and applies the filter afterward, leading to orders‑of‑magnitude worse performance compared to using an equivalent WHERE.
How to repeat:
Steps to Reproduce:
DELIMITER //
CREATE PROCEDURE batch_insert_numbers()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE batch_size INT DEFAULT 1000;
WHILE i <= 1000000 DO
START TRANSACTION;
WHILE i <= 1000000 AND batch_size > 0 DO
INSERT INTO t0(c0) VALUES (i);
SET i = i + 1;
SET batch_size = batch_size - 1;
END WHILE;
COMMIT;
SET batch_size = 1000;
END WHILE;
END //
DELIMITER ;
CREATE TABLE t0(c0 INT PRIMARY KEY, c1 FLOAT);
CALL batch_insert_numbers(); -- insert into t0(c0) with values: 1-1000000
SELECT * FROM t0 WHERE t0.c0 = 1; -- result: {1|0}; time: 0.001 sec
EXPLAIN SELECT * FROM t0 WHERE t0.c0 = 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Rows fetched before execution (cost=0..0 rows=1)
SELECT * FROM t0 HAVING t0.c0 = 1; -- result: {1|0}; time: 0.124 sec
EXPLAIN SELECT * FROM t0 HAVING t0.c0 = 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t0.c0 = 0) (cost=100290 rows=998968)
-> Table scan on t0 (cost=100290 rows=998968)
Suggested fix:
If a SELECT contains a HAVING clause but no GROUP BY clause and no aggregate functions in the HAVING condition, move all HAVING predicates to the WHERE clause. This is a safe, semantically correct transformation (SQL standard permits it; the result is unchanged). Once moved, the existing index selection logic will automatically choose the optimal access method (e.g., primary key lookup).
Description: Hi, MySQL developers. I found a missed optimization in MySQL. When a SELECT statement uses a HAVING clause without any GROUP BY or aggregate functions, the condition is semantically equivalent to a WHERE clause. However, the optimizer does not transform the HAVING condition into a table-access filter that can exploit indexes. Instead, it performs a full table scan and applies the filter afterward, leading to orders‑of‑magnitude worse performance compared to using an equivalent WHERE. How to repeat: Steps to Reproduce: DELIMITER // CREATE PROCEDURE batch_insert_numbers() BEGIN DECLARE i INT DEFAULT 1; DECLARE batch_size INT DEFAULT 1000; WHILE i <= 1000000 DO START TRANSACTION; WHILE i <= 1000000 AND batch_size > 0 DO INSERT INTO t0(c0) VALUES (i); SET i = i + 1; SET batch_size = batch_size - 1; END WHILE; COMMIT; SET batch_size = 1000; END WHILE; END // DELIMITER ; CREATE TABLE t0(c0 INT PRIMARY KEY, c1 FLOAT); CALL batch_insert_numbers(); -- insert into t0(c0) with values: 1-1000000 SELECT * FROM t0 WHERE t0.c0 = 1; -- result: {1|0}; time: 0.001 sec EXPLAIN SELECT * FROM t0 WHERE t0.c0 = 1 \G *************************** 1. row *************************** EXPLAIN: -> Rows fetched before execution (cost=0..0 rows=1) SELECT * FROM t0 HAVING t0.c0 = 1; -- result: {1|0}; time: 0.124 sec EXPLAIN SELECT * FROM t0 HAVING t0.c0 = 1 \G *************************** 1. row *************************** EXPLAIN: -> Filter: (t0.c0 = 0) (cost=100290 rows=998968) -> Table scan on t0 (cost=100290 rows=998968) Suggested fix: If a SELECT contains a HAVING clause but no GROUP BY clause and no aggregate functions in the HAVING condition, move all HAVING predicates to the WHERE clause. This is a safe, semantically correct transformation (SQL standard permits it; the result is unchanged). Once moved, the existing index selection logic will automatically choose the optimal access method (e.g., primary key lookup).