Bug #96958 MySQL optimizer wrongly chooses to use a bigger and inefficient index for query
Submitted: 20 Sep 2019 21:42 Modified: 22 Sep 2019 14:23
Reporter: Leonardo Fernandes Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[20 Sep 2019 21:42] Leonardo Fernandes
Description:
MySQL optimizer wrongly chooses to use a larger index for query even when a smaller and more efficient is available.

Table:
master [localhost:26022] {msandbox} (test) > show create table test \G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col1` int(11) NOT NULL,
  `col2` bigint(20) NOT NULL,
  `created_at` timestamp NOT NULL,
  PRIMARY KEY (`id`),
  KEY `index_test_on_col2` (`col2`),
  KEY `index_test_on_created_at` (`created_at`),
  KEY `index_test_on_col2_col1` (`col2`,`col1`),
  KEY `test_extended_index` (`col1`,`id`),
  KEY `index_test_on_col1` (`col1`),
  KEY `index_test_on_col1_col2` (`col1`,`col2`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci STATS_SAMPLE_PAGES=65535
1 row in set (0.00 sec)

Inserted 2 million random rows:
master [localhost:26022] {msandbox} (test) > select count(*) from test; 
+----------+
| count(*) |
+----------+
|  2000000 |
+----------+
1 row in set (0.03 sec)

Query:
SELECT col1 FROM test WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;

Explain:
master [localhost:26022] {msandbox} (test) > explain SELECT col1 FROM test WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;
+----+-------------+-------+------------+------+------------------------------------------------------------------------------------------------+-------------------------+---------+-------+-------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys                                                                                  | key                     | key_len | ref   | rows  | filtered | Extra                                    |
+----+-------------+-------+------------+------+------------------------------------------------------------------------------------------------+-------------------------+---------+-------+-------+----------+------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | PRIMARY,index_test_on_col2_col1,test_extended_index,index_test_on_col1,index_test_on_col1_col2 | index_test_on_col1_col2 | 4       | const | 77572 |    50.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+------+------------------------------------------------------------------------------------------------+-------------------------+---------+-------+-------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Notice that the optimizer chooses index_test_on_col1_col2 instead of index_test_on_col1, even if col2 is not used by the query and only the col1 part of the index is used (key_len=4). 

When forcing the index index_test_on_col1, the plan changes from type=ref to type=range, it doesn't use the filesort anymore, the query is around 10x faster, and both the field col1 and id from the index are used (key_len=8 vs key_len=4 when using index_test_on_col1_col2).

master [localhost:26022] {msandbox} (test) > explain SELECT col1 FROM test USE INDEX (index_test_on_col1) WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys      | key                | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | index_test_on_col1 | index_test_on_col1 | 8       | NULL | 51126 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

When looking at the optimizer for the query without the hint, we can see:

Evaluating the index index_test_on_col1 for range: 
"analyzing_range_alternatives": {
	"range_scan_alternatives": [
	[...]
		{
			"index": "index_test_on_col1",
			"ranges": [
			"50 <= col1 <= 50 AND 666666 < id"
			],
			"index_dives_for_eq_ranges": true,
			"rowid_ordered": true,
			"using_mrr": false,
			"index_only": true,
			"rows": 51126,
			"cost": 5163.5,
			"chosen": true
		}

We can see that index_test_on_col1 is chosen and it costs 5163.5.

Now, when evaluating the indexes for access_type=ref, the optimizer discards index_test_on_col1 because it considers the range access better, as it uses more keyparts ("range_uses_more_keyparts"), but it evaluates index_test_on_col1_col2 and chooses it, even though the cost is bigger (7909.4).
{
  "access_type": "ref",
  "index": "index_test_on_col1",
  "chosen": false,
  "cause": "range_uses_more_keyparts"
},
{
  "access_type": "ref",
  "index": "index_test_on_col1_col2",
  "rows": 77572,
  "cost": 7909.4,
  "chosen": true
}

Attaching the optimizer trace output and a dump of the table. 

How to repeat:
Create the table:
CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col1` int(11) NOT NULL,
  `col2` bigint(20) NOT NULL,
  `created_at` timestamp NOT NULL,
  PRIMARY KEY (`id`),
  KEY `index_test_on_col2` (`col2`),
  KEY `index_test_on_created_at` (`created_at`),
  KEY `index_test_on_col2_col1` (`col2`,`col1`),
  KEY `test_extended_index` (`col1`,`id`),
  KEY `index_test_on_col1` (`col1`),
  KEY `index_test_on_col1_col2` (`col1`,`col2`)
) ENGINE=InnoDB

Import the dump attached. 

Run the query:
explain SELECT col1 FROM test WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;

compare with:
explain SELECT col1 FROM test USE INDEX (index_test_on_col1) WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;
[20 Sep 2019 21:44] Leonardo Fernandes
optimizer_trace

Attachment: optimizer_trace (application/octet-stream, text), 15.58 KiB.

[21 Sep 2019 17:18] MySQL Verification Team
Attaching the optimizer trace output and a dump of the table. Trace Ok, dump ?.
[22 Sep 2019 1:26] MySQL Verification Team
d:\dbs>d:\dbs\8.0\bin\mysql -uroot --port=3580 -p --prompt="mysql 8.0 > " --default-character-set=utf8mb4
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 36
Server version: 8.0.19 Source distribution BUILT: 2019-SEP-20

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 8.0 > USE test
Database changed
mysql 8.0 > select count(*) from test;
+----------+
| count(*) |
+----------+
|  2000000 |
+----------+
1 row in set (1.21 sec)

mysql 8.0 > explain SELECT col1 FROM test WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;
+----+-------------+-------+------------+------+------------------------------------------------------------------------------------------------+-------------------------+---------+-------+-------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys                                                                                  | key                     | key_len | ref   | rows  | filtered | Extra                                    |
+----+-------------+-------+------------+------+------------------------------------------------------------------------------------------------+-------------------------+---------+-------+-------+----------+------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | PRIMARY,index_test_on_col2_col1,test_extended_index,index_test_on_col1,index_test_on_col1_col2 | index_test_on_col1_col2 | 4       | const | 68608 |    50.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+------+------------------------------------------------------------------------------------------------+-------------------------+---------+-------+-------+----------+------------------------------------------+
1 row in set, 1 warning (0.02 sec)

mysql 8.0 > explain SELECT col1 FROM test USE INDEX (index_test_on_col1) WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys      | key                | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | index_test_on_col1 | index_test_on_col1 | 8       | NULL | 35028 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.50 sec)
[22 Sep 2019 3:21] MySQL Verification Team
Which 5.7 version you tested?; 

d:\dbs>d:\dbs\5.7\bin\mysql -uroot --port=3570 -p  --prompt="mysql 5.7 > "
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.29-log Source distribution BUILT: 2019-SEP-20

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.7 > use test
Database changed
mysql 5.7 > explain SELECT col1 FROM test WHERE col1 = 50 AND id > 666666 ORDER BY id ASC LIMIT 1000;
+----+-------------+-------+------------+-------+------------------------------------------------------------------------+--------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys                                                          | key                | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+------------------------------------------------------------------------+--------------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | PRIMARY,test_extended_index,index_test_on_col1,index_test_on_col1_col2 | index_test_on_col1 | 8       | NULL | 35028 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+------------------------------------------------------------------------+--------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)

mysql 5.7 >
[22 Sep 2019 14:23] MySQL Verification Team
Thank you for the bug report. With most recent source only repeatable with version 8.0, so removing 5.7 from version reported.
[30 Apr 9:09] WANG GUANGYOU
Do we must insert 2 million rows to reproduce this?
Even one rows, it should use the index_test_on_col1. However, I populate 22000 rows into the table. It can not use the best index even in 5.7.26-29-log.

mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22011
Server version: 5.7.26-29-log Source distribution

Copyright (c) 2009-2023 Percona LLC and/or its affiliates
Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> explain  SELECT col1 FROM test  WHERE col1 = 50 AND id > 10 ORDER BY id ASC LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test
   partitions: NULL
         type: ref
possible_keys: PRIMARY,test_extended_index,index_test_on_col1,index_test_on_col1_col2
          key: index_test_on_col1_col2
      key_len: 4
          ref: const
         rows: 20
     filtered: 50.00
        Extra: Using where; Using index; Using filesort
1 row in set, 1 warning (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|    22000 |
+----------+
1 row in set (0.00 sec)