Bug #120683 Optimizer overestimates cost and rows for LIMIT 1 without ORDER BY on a primary key index
Submitted: 14 Jun 15:21
Reporter: jinhui lai Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:9.7.0 OS:Any
Assigned to: CPU Architecture:Any

[14 Jun 15:21] jinhui lai
Description:
Hi, MySQL developers, thanks for reading this report. I found a bug in the estimator.
When executing EXPLAIN SELECT * FROM t0 LIMIT 1 on a table with a primary key index, the optimizer reports a very high estimated cost (100277) and an estimated row count of 999000 (almost the full table size), even though the query only needs to return one row. In contrast, the same query with an explicit ORDER BY t0.c0 ASC LIMIT 1 produces a correct estimate (rows=1, cost=377e-6).

This indicates that the optimizer fails to recognize that a LIMIT 1 without an ORDER BY can stop scanning after fetching the first physical row (which is essentially the first record in the primary key order due to clustered index storage). The overestimation may lead to suboptimal query plans in more complex scenarios.

How to repeat:
DELIMITER //
CREATE PROCEDURE batch_insert_numbers()
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE batch_size INT DEFAULT 1000;
    
    WHILE i <= 1000000 DO
        START TRANSACTION;
        WHILE i <= 1000000 AND batch_size > 0 DO
            INSERT INTO t0(c0) VALUES (i);
            SET i = i + 1;
            SET batch_size = batch_size - 1;
        END WHILE;
        COMMIT;
        SET batch_size = 1000;
    END WHILE;
END //
DELIMITER ;

CREATE TABLE t0(c0 INT PRIMARY KEY);
CALL batch_insert_numbers();

EXPLAIN SELECT * FROM t0 LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=100277 rows=1)
    -> Covering index scan on t0 using PRIMARY  (cost=100277 rows=999000)

1 row in set (0.001 sec)

EXPLAIN SELECT * FROM t0 ORDER BY t0.c0 ASC LIMIT 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1 row(s)  (cost=377e-6 rows=1)
    -> Covering index scan on t0 using PRIMARY  (cost=377e-6 rows=1)

Both queries should show only 1 estimated row, because reading a single row from the primary key index should be cheap, regardless of whether ORDER BY is explicitly written.