Bug #88127 Index not used for 'order by' query with utf8mb4 character set
Submitted: 17 Oct 2017 21:45 Modified: 19 Oct 2017 10:13
Reporter: John Morkel Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.19, 5.7.20 OS:Any
Assigned to: CPU Architecture:Any

[17 Oct 2017 21:45] John Morkel
Description:
A simple query on an indexed column encoded with the utf8mb4 character set with an "order by" clause, such as the query below, unexpectedly results in a filesort:

mysql> explain select * from test force index (B) where B = 'foo' order by B asc;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra                                    |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | B             | B    | 82      | const |    1 |   100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

This is only observed when the connection character set is also utf8mb4. If the connection character set is set to utf8 then a sort isn't performed. Forcing the index doesn't change the plan.

I would expect that a filesort would not be required when the character sets of the index and connection are the same.

Query optimizer trace showing index_provides_order is false is attached.

How to repeat:
Step 1: Create the table

CREATE TABLE `test` (
  `A` varchar(10) COLLATE utf8mb4_bin NOT NULL,
  `B` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY (`A`),
  KEY `B` (`B`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Step 2: Query with character_set_connection set to 'utf8mb4' and see filesort being used;

mysql> show variables like 'character\_set\_%';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8    |
+--------------------------+---------+
7 rows in set (0.00 sec)

mysql> explain select * from test where B = 'foo' order by B asc;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra                                    |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | B             | B    | 82      | const |    1 |   100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Forcing the index doesn't help either:

mysql> explain select * from test force index (B) where B = 'foo' order by B asc;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra                                    |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | B             | B    | 82      | const |    1 |   100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Step 3: Query with character_set_connection set to 'utf8' and see that filesort isn't used:

mysql> set character_set_connection='utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from test where (B = 'foo') order by B asc;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | B             | B    | 82      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
[17 Oct 2017 21:46] John Morkel
Query optimizer trace

Attachment: query-optimizer-trace.txt (text/plain), 7.21 KiB.

[19 Oct 2017 10:13] MySQL Verification Team
Hello John,

Thank you for the report and feedback.

Thanks,
Umesh
[23 Nov 2021 7:21] WANG GUANGYOU
we have the same problem in production. It may be caused by collation mismatch
In our case, when we SET NAMES utf8mb4; It choose wrong index;
If we SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci; It choose the same one

Our table
Create Table: CREATE TABLE `xx_task` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `xxid` varchar(128) COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'xxid',
  `name` varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),

) ENGINE=InnoDB AUTO_INCREMENT=37676690 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='aaa'
[23 Nov 2021 7:22] WANG GUANGYOU
sorry for the typo. 

we have the same problem in production. It may be caused by collation mismatch
In our case, when we SET NAMES utf8mb4; It choose wrong index;
If we SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci; It choose the right one

Our table
Create Table: CREATE TABLE `xx_task` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `xxid` varchar(128) COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'xxid',
  `name` varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),

) ENGINE=InnoDB AUTO_INCREMENT=37676690 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='aaa'