Bug #115017 Wrong row estimate for range scan
Submitted: 15 May 2024 14:44 Modified: 15 May 2024 15:45
Reporter: Sveta Smirnova (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[15 May 2024 14:44] Sveta Smirnova
Description:
EXPLAIN in MySQL 8.0+ reports 1 row while SELECT actually returns 20. Status variable Handler_read_next is also 20 after the query execution.

Version 5.7 works as expected.

How to repeat:
Run attached test case, or:

CREATE TABLE t (
id         int PRIMARY KEY,
updated_at datetime NOT NULL,
foo        varchar(16),
UNIQUE INDEX (updated_at)
);
-- Insert data
INSERT INTO issue_375.t VALUES
   (1, '2009-09-03 14:15:00', 'a'),
   (2, '2009-09-03 14:15:10', 'b'),
   (3, '2009-09-03 14:15:20', 'c'),
   (4, '2009-09-03 14:15:40', 'd'),
...
  (97, '2009-09-06 15:01:20', 'cs'),
   (98, '2009-09-06 15:01:21', 'ct'),
   (99, '2009-09-06 15:01:22', 'cu'),
   (100, '2009-09-06 15:01:23', 'cv');
mysql> EXPLAIN SELECT * FROM t FORCE INDEX(PRIMARY) WHERE id >= 21 AND id <= 40\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0,00 sec)

mysql> FLUSH STATUS;
Query OK, 0 rows affected (0,00 sec)

mysql> SELECT * FROM t FORCE INDEX(PRIMARY) WHERE id >= 21 AND id <= 40;
+----+---------------------+------+
| id | updated_at          | foo  |
+----+---------------------+------+
| 21 | 2009-09-06 15:00:04 | u    |
| 22 | 2009-09-06 15:00:05 | v    |
| 23 | 2009-09-06 15:00:06 | w    |
| 24 | 2009-09-06 15:00:07 | x    |
| 25 | 2009-09-06 15:00:08 | y    |
| 26 | 2009-09-06 15:00:09 | z    |
| 27 | 2009-09-06 15:00:10 | aa   |
| 28 | 2009-09-06 15:00:11 | ab   |
| 29 | 2009-09-06 15:00:12 | ac   |
| 30 | 2009-09-06 15:00:13 | ad   |
| 31 | 2009-09-06 15:00:14 | ae   |
| 32 | 2009-09-06 15:00:15 | af   |
| 33 | 2009-09-06 15:00:16 | ag   |
| 34 | 2009-09-06 15:00:17 | ah   |
| 35 | 2009-09-06 15:00:18 | ai   |
| 36 | 2009-09-06 15:00:19 | aj   |
| 37 | 2009-09-06 15:00:20 | ak   |
| 38 | 2009-09-06 15:00:21 | al   |
| 39 | 2009-09-06 15:00:22 | am   |
| 40 | 2009-09-06 15:00:23 | an   |
+----+---------------------+------+
20 rows in set (0,00 sec)

mysql> SHOW STATUS LIKE 'Handler_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 20    |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0,00 sec)

Suggested fix:
Return the correct number. For such a small table, the statistics should be correct.
[15 May 2024 14:44] Sveta Smirnova
Test case for MTR

Attachment: PT-2154.test (application/octet-stream, text), 4.10 KiB.

[15 May 2024 15:09] Sveta Smirnova
When tested with a larger number of rows and ranges, the estimate is always equal to 1. This is quite a serious optimizer bug because it can cause huge slowdowns for large tables.
[15 May 2024 15:45] MySQL Verification Team
Hi Mrs. Smirnova,

Thank you for your bug report.

We have managed to repeat it fully:

----------------------------------

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t	NULL	range	PRIMARY	PRIMARY	4	NULL	1	100.00	Using where
id	updated_at	foo
21	2009-09-06 15:00:04	u
22	2009-09-06 15:00:05	v
23	2009-09-06 15:00:06	w
24	2009-09-06 15:00:07	x
25	2009-09-06 15:00:08	y
26	2009-09-06 15:00:09	z
27	2009-09-06 15:00:10	aa
28	2009-09-06 15:00:11	ab
29	2009-09-06 15:00:12	ac
30	2009-09-06 15:00:13	ad
31	2009-09-06 15:00:14	ae
32	2009-09-06 15:00:15	af
33	2009-09-06 15:00:16	ag
34	2009-09-06 15:00:17	ah
35	2009-09-06 15:00:18	ai
36	2009-09-06 15:00:19	aj
37	2009-09-06 15:00:20	ak
38	2009-09-06 15:00:21	al
39	2009-09-06 15:00:22	am
40	2009-09-06 15:00:23	an
Variable_name	Value
Handler_commit	1
Handler_delete	0
Handler_discover	0
Handler_external_lock	2
Handler_mrr_init	0
Handler_prepare	0
Handler_read_first	0
Handler_read_key	1
Handler_read_last	0
Handler_read_next	20
Handler_read_prev	0
Handler_read_rnd	0
Handler_read_rnd_next	0
Handler_rollback	0
Handler_savepoint	0
Handler_savepoint_rollback	0
Handler_update	0
Handler_write	0
----------------------------------

We agree that this is a serious bug in our optimizer, affecting version 8.0 and higher.

Thank you.
[1 Jul 2024 8:46] super tom
8.0.25 can't repeat
[1 Jul 2024 11:21] MySQL Verification Team
Thank you,

That means that a bug was introduced after the release 8.0.25.