Bug #113537 | False match of multibyte chars when checking existence of a control character | ||
---|---|---|---|
Submitted: | 2 Jan 2024 16:00 | Modified: | 4 Jan 2024 7:57 |
Reporter: | Michael Olšavský | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 8.0.32 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | charset, nul byte, UTF-8 |
[2 Jan 2024 16:00]
Michael Olšavský
[3 Jan 2024 13:31]
Bernt Marius Johnsen
1: CHAR('๐2' USING utf8) does not give the string '๐2', since CHAR() expects an integer argument. (In this case it is equivalent to CHAR(0 USING utf8).) See the documentation of CHAR(): https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char

2: POSITION/LOCATE does not return TRUE/FALSE but a position. See https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_locate

But: there is a bug here. A simpler example:

mysql> select LOCATE(x'00','abcdef');
+------------------------+
| LOCATE(x'00','abcdef') |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0.00 sec)

mysql> select LOCATE(x'00','abcdéf');
+-------------------------+
| LOCATE(x'00','abcdéf')  |
+-------------------------+
|                       5 |
+-------------------------+
1 row in set (0.00 sec)

The first select is correct, the second is not: neither string contains a NUL byte, so both should return 0.
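
For comparison, here is a minimal Python sketch of the result one would expect from a character-based search: a 1-based position, or 0 when the substring is absent. The helper name `locate` is hypothetical and only mimics the documented return convention of LOCATE(); it is not MySQL's implementation and says nothing about where the server goes wrong.

```python
def locate(needle: str, haystack: str) -> int:
    """Return the 1-based character position of needle in haystack, or 0 if absent.

    Hypothetical helper that mirrors the return convention of LOCATE().
    """
    # str.find() returns -1 when the substring is missing, so adding 1 yields 0.
    return haystack.find(needle) + 1


print(locate("\x00", "abcdef"))  # 0: no NUL character present
print(locate("\x00", "abcdéf"))  # 0: still no NUL character present
print(locate("é", "abcdéf"))     # 5: position of the only multibyte character
```

Note that 5, the value the server returned for the NUL search, is the character position of 'é' in 'abcdéf'.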
[4 Jan 2024 10:44]
Bernt Marius Johnsen
If one is only looking for values <= 7f, a workaround is to convert the string to binary. Bytes with value <= 7f are never part of a multibyte UTF-8 encoding. E.g.:

mysql> select LOCATE(x'00',convert('abcdéf' using binary));
+-----------------------------------------------+
| LOCATE(x'00',convert('abcdéf' using binary))  |
+-----------------------------------------------+
|                                             0 |
+-----------------------------------------------+
1 row in set (0,01 sec)
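
The reason the binary workaround is safe can be checked outside the server. The Python sketch below (the helper name `multibyte_bytes` is hypothetical) collects every byte that belongs to a multibyte UTF-8 sequence and confirms that each one is >= 0x80, so a byte-wise search for a value <= 0x7f can only ever match a genuine single-byte character.

```python
def multibyte_bytes(text: str) -> list[int]:
    """Collect every byte that belongs to a multibyte UTF-8 sequence in text."""
    collected = []
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:  # single-byte (ASCII) characters are skipped
            collected.extend(encoded)
    return collected


sample = "abcdéf"
raw = sample.encode("utf-8")

print([hex(b) for b in multibyte_bytes(sample)])         # ['0xc3', '0xa9']
print(all(b >= 0x80 for b in multibyte_bytes(sample)))   # True
# Byte-wise search, 1-based with 0 meaning "not found", as in the workaround.
print(raw.find(b"\x00") + 1)                             # 0
```

The same check holds for longer sequences, for example the three-byte '€' or four-byte emoji, because UTF-8 lead and continuation bytes all have the high bit set.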