Bug #104534 Pattern matching using LIKE not working with UFT8MB4_BIN collation
Submitted: 4 Aug 2021 13:11 Modified: 5 Aug 2021 13:55
Reporter: Christoph Mayr Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0, 8.0.26 OS:Any
Assigned to: CPU Architecture:Any
Tags: LIKE statement utf8mb4 index

[4 Aug 2021 13:11] Christoph Mayr
Description:
If a varchar column in a table using collation UTF8MB4_BIN stores text with 4 byte characters and an index is defined for this column, then the LIKE operator in a select statement does not work. 
The issue is caused by the index, because if dropping the index, the LIKE works, if re-created the LIKE does not match again. 

How to repeat:
Can be reproduced easily using following script. The text contains mostly chinese supplementary characters in 4 bytes. 

create table UTF8MB4_test( text_col1 varchar(255), text_col2 varchar(255)) 
    default CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
create index text_col2_index on UTF8MB4_test(text_col2);

insert into UTF8MB4_test values('Some text','Some text');
insert into UTF8MB4_test values('Some other text','Some other text');
insert into UTF8MB4_test values('この𠝹𠱓𠱸この𠲖𠳏𠳕𠴕𠵼𠵿𠸎𠸏𠹷', 'この𠝹𠱓𠱸この𠲖𠳏𠳕𠴕𠵼𠵿𠸎𠸏𠹷');
insert into UTF8MB4_test values('この𠝹𠱓𠱸この𠲖', 'この𠝹𠱓𠱸この𠲖');

Now try select on column 1, which has no index
select * from UTF8MB4_test where text_col1 like 'この𠝹%'
=> returns 2 rows

If used with LIKE on column 2, which has an index:
select * from UTF8MB4_test where text_col2 like 'この𠝹%'
=> no data returned

Dropping the index
drop index text_col2_index on UTF8MB4_test;
=> the same select returns the correct 2 rows
[5 Aug 2021 13:55] MySQL Verification Team
Hello Christoph Mayr,

Thank you for the report and test case.

regards,
Umesh
[12 Aug 2021 4:15] Takuji Shimokawa
I've encountered a similar problem.

In my case, you don't need multi-byte characters to reproduce the problem.

```
CREATE TABLE table1(
  id INT PRIMARY KEY AUTO_INCREMENT,
  text1 TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
);

CREATE INDEX idx_text1 ON table1 (text1(100));

INSERT INTO table1 (text1) VALUES ('a\tb');
INSERT INTO table1 (text1) VALUES ('ab');

SELECT * FROM table1 WHERE text1 LIKE 'a%';
```

I expected that the both records will be found, but this results in:

```
+----+-------+
| id | text1 |
+----+-------+
|  2 | ab    |
+----+-------+
1 row in set (0.00 sec)
```

This happened for MySQL 8.0.26 and 5.6.40.

By making any one of the changes below, the same SELECT query returned the both records:
- Removing the index on the text1 column
- Changing the collation to utf8mb4_generic_ci or utf8mb4_unicode_ci
Only for the version 5.6.40,
- Changing the column type to VARCHAR
[11 Oct 2023 6:49] Tsubasa Tanaka
8.0.34 and 8.1.0 are affected too.