| Bug #104534 | Pattern matching using LIKE not working with UFT8MB4_BIN collation | ||
|---|---|---|---|
| Submitted: | 4 Aug 2021 13:11 | Modified: | 5 Aug 2021 13:55 |
| Reporter: | Christoph Mayr | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
| Version: | 8.0, 8.0.26 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | LIKE statement utf8mb4 index | ||
[5 Aug 2021 13:55]
MySQL Verification Team
Hello Christoph Mayr, Thank you for the report and test case. regards, Umesh
[12 Aug 2021 4:15]
Takuji Shimokawa
I've encountered a similar problem.
In my case, you don't need multi-byte characters to reproduce the problem.
```
CREATE TABLE table1(
id INT PRIMARY KEY AUTO_INCREMENT,
text1 TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
);
CREATE INDEX idx_text1 ON table1 (text1(100));
INSERT INTO table1 (text1) VALUES ('a\tb');
INSERT INTO table1 (text1) VALUES ('ab');
SELECT * FROM table1 WHERE text1 LIKE 'a%';
```
I expected that the both records will be found, but this results in:
```
+----+-------+
| id | text1 |
+----+-------+
| 2 | ab |
+----+-------+
1 row in set (0.00 sec)
```
This happened for MySQL 8.0.26 and 5.6.40.
By making any one of the changes below, the same SELECT query returned the both records:
- Removing the index on the text1 column
- Changing the collation to utf8mb4_generic_ci or utf8mb4_unicode_ci
Only for the version 5.6.40,
- Changing the column type to VARCHAR
[11 Oct 2023 6:49]
Tsubasa Tanaka
8.0.34 and 8.1.0 are affected too.

Description: If a varchar column in a table using collation UTF8MB4_BIN stores text with 4 byte characters and an index is defined for this column, then the LIKE operator in a select statement does not work. The issue is caused by the index, because if dropping the index, the LIKE works, if re-created the LIKE does not match again. How to repeat: Can be reproduced easily using following script. The text contains mostly chinese supplementary characters in 4 bytes. create table UTF8MB4_test( text_col1 varchar(255), text_col2 varchar(255)) default CHARSET=utf8mb4 COLLATE=utf8mb4_bin; create index text_col2_index on UTF8MB4_test(text_col2); insert into UTF8MB4_test values('Some text','Some text'); insert into UTF8MB4_test values('Some other text','Some other text'); insert into UTF8MB4_test values('この𠝹𠱓𠱸この𠲖𠳏𠳕𠴕𠵼𠵿𠸎𠸏𠹷', 'この𠝹𠱓𠱸この𠲖𠳏𠳕𠴕𠵼𠵿𠸎𠸏𠹷'); insert into UTF8MB4_test values('この𠝹𠱓𠱸この𠲖', 'この𠝹𠱓𠱸この𠲖'); Now try select on column 1, which has no index select * from UTF8MB4_test where text_col1 like 'この𠝹%' => returns 2 rows If used with LIKE on column 2, which has an index: select * from UTF8MB4_test where text_col2 like 'この𠝹%' => no data returned Dropping the index drop index text_col2_index on UTF8MB4_test; => the same select returns the correct 2 rows