Bug #99997 Range estimates are usually off by a factor of 2 for large ranges
Submitted: 26 Jun 2020 7:58 Modified: 26 Jun 2020 12:23
Reporter: Øystein Grøvlen Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.20 OS:Any
Assigned to: CPU Architecture:Any

[26 Jun 2020 7:58] Øystein Grøvlen
Description:
While small ranges have very accurate, larger ranges are usually of by a factor of 2.  Using an DBT-3 database:

mysql> select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-17';
+----------+
| count(*) |
+----------+
|     6440 |
+----------+
1 row in set (0.01 sec)

mysql> explain select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-17';
+----+-------------+--------+------------+------+---------------+---------------+---------+-------+------+----------+--------------------------+
| id | select_type | table  | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra                    |
+----+-------------+--------+------------+------+---------------+---------------+---------+-------+------+----------+--------------------------+
|  1 | SIMPLE      | orders | NULL       | ref  | i_o_orderdate | i_o_orderdate | 4       | const | 6440 |   100.00 | Using where; Using index |
+----+-------------+--------+------------+------+---------------+---------------+---------+-------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-18';
+----------+
| count(*) |
+----------+
|    12677 |
+----------+
1 row in set (0.00 sec)

mysql> explain select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-18';
+----+-------------+--------+------------+-------+---------------+---------------+---------+------+-------+----------+--------------------------+
| id | select_type | table  | partitions | type  | possible_keys | key           | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+--------+------------+-------+---------------+---------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | orders | NULL       | range | i_o_orderdate | i_o_orderdate | 4       | NULL | 22946 |   100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+---------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

We see that while the estimate for 1 day is exact, the estimate for 2 days are almost twice as big as the reality.

How to repeat:
select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-17';
explain select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-17';
select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-18';
explain select count(*) from orders where o_orderdate between '1994-06-17' and '1994-06-18';

Suggested fix:
In btr_estimate_n_rows_in_range_low(), estimate is multiplied by 2 if range is large.  Stop that! (Or find a smaller more appropriate factor to use)
[26 Jun 2020 12:23] MySQL Verification Team
Hi Mr. Grovlen,

Thank you for your bug report.

This bug is now a duplicate of the following bug:

https://bugs.mysql.com/bug.php?id=73386