Bug #99994 Index range scan is chosen where table scan takes 40% less time
Submitted: 26 Jun 2020 7:57 Modified: 26 Jun 2020 12:10
Reporter: Øystein Grøvlen Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[26 Jun 2020 7:57] Øystein Grøvlen
Description:
Using the world database, we get the following query plan:

mysql> explain select avg(population) from city where countrycode IN ('CHN', 'IND');
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | city  | NULL       | range | CountryCode   | CountryCode | 3       | NULL |  704 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

However, if we compare to table scan we see that table scan takes 40% less time.
Here is the result as recorded by performance_schema:

mysql>  select digest_text, count_star, avg_timer_wait/1000000000.0 "time (ms)", sum_rows_examined/count_star from  performance_schema.events_statements_summary_by_digest order by last_seen desc limit 2;
+-------------------------------------------------------------------------------------------------+------------+-----------+------------------------------+
| digest_text                                                                                     | count_star | time (ms) | sum_rows_examined/count_star |
+-------------------------------------------------------------------------------------------------+------------+-----------+------------------------------+
| SELECT /*+ NO_INDEX ( `city` ) */ AVG ( `population` ) FROM `city` WHERE `countrycode` IN (...) |          5 |    1.2254 |                    4079.0000 |
| SELECT AVG ( `population` ) FROM `city` WHERE `countrycode` IN (...)                            |          5 |    2.0045 |                     952.0000 |
+-------------------------------------------------------------------------------------------------+------------+-----------+------------------------------+
2 rows in set (0.00 sec)

How to repeat:
Use the world database:

truncate performance_schema.events_statements_summary_by_digest;
# Run queries below multiple times
select avg(population) from city where countrycode IN ('CHN', 'IND', 'JPN');
select /*+ NO_INDEX(city) */ avg(population) from city where countrycode IN ('CHN', 'IND', 'JPN');

select digest_text, count_star, avg_timer_wait/1000000000.0 "time (ms)", sum_rows_examined/count_star from  performance_schema.events_statements_summary_by_digest order by last_seen desc limit 2;

Suggested fix:
Adjust cost model to in favor of table scan
[26 Jun 2020 12:10] MySQL Verification Team
Hi Mr. Øystein Grøvlen,

Thank you for your bug report.

I have managed to repeat this optimiser deficiency.

Verified as reported.