Description:
When utf8mb4_0900_ai_ci is used, LOWER() may truncate a simple string.
txsql> select lower('aaaȾbbb');
+-------------------+
| lower('aaaȾbbb') |
+-------------------+
| aaaⱦ |
+-------------------+
1 row in set (2 min 29.08 sec)
mysql> show variables like "%colla%";
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.028 sec)
With utf8mb4_general_ci the string is not truncated, but the character is not converted to lowercase.
mysql> set names 'utf8mb4' collate 'utf8mb4_general_ci';
Query OK, 0 rows affected (0.002 sec)
mysql> show variables like "%colla%";
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.005 sec)
mysql> select lower('aaaȾbbb');
+-------------------+
| lower('aaaȾbbb') |
+-------------------+
| aaaȾbbb |
+-------------------+
1 row in set (0.001 sec)
How to repeat:
See Description.
Suggested fix:
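In function my_casedn_utf8mb4, src and dst point to the same memory, but my_tolower_utf8mb4 may change the byte length of a character. Here Ⱦ (U+023E) takes 2 bytes in UTF-8 while its lowercase form ⱦ (U+2C66) takes 3, so the in-place write runs past the source character.
gdb: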
```
(gdb) n
(gdb) p src
$9 = 0x7f66e2fa8ce3 "��bbb"
(gdb) p dst
$10 = 0x7f66e2fa8ce3 "��bbb"
(gdb) n
(gdb) p srcres
$11 = 2
(gdb) p dstres
$12 = 3
(gdb)
```
We can see that after my_tolower_utf8mb4, dstres is 3 but srcres is 2.
After using gdb to force srcres to 3, we get the expected result:
```
(gdb) p srcres
$14 = 3
(gdb) p dstres
$15 = 3
```
```
txsql> select lower('aaaȾbbb');
+-------------------+
| lower('aaaȾbbb') |
+-------------------+
| aaaⱦbb |
+-------------------+
```
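The aliasing problem described above can be reproduced outside the server. The following is a minimal Python sketch, not the MySQL code: `decode_one` and `casedn_sketch` are hypothetical helpers, and the loop only assumes the general shape of a decode / lowercase / encode loop that advances src by srcres and dst by dstres. Python's `str.lower()` stands in for my_tolower_utf8mb4.

```python
def decode_one(buf, pos, end):
    """Decode one UTF-8 character at buf[pos]; return (char, byte_length) or (None, 0)."""
    if pos >= end:
        return None, 0
    lead = buf[pos]
    if lead < 0x80:
        size = 1
    elif 0xC2 <= lead <= 0xDF:
        size = 2
    elif 0xE0 <= lead <= 0xEF:
        size = 3
    elif 0xF0 <= lead <= 0xF4:
        size = 4
    else:
        return None, 0  # continuation byte or invalid lead byte
    if pos + size > end:
        return None, 0
    try:
        return bytes(buf[pos:pos + size]).decode("utf-8"), size
    except UnicodeDecodeError:
        return None, 0


def casedn_sketch(text, in_place, force_srcres=False):
    """Lowercase text byte-wise; in_place=True makes src and dst share one buffer."""
    data = text.encode("utf-8")
    length = len(data)
    src_buf = bytearray(data)
    # A separate output buffer gets extra room because the lowercase form may be longer.
    dst_buf = src_buf if in_place else bytearray(4 * length)
    src = dst = 0
    while src < length:
        char, srcres = decode_one(src_buf, src, length)
        if srcres == 0:
            break  # invalid sequence: the loop stops and the result is cut short
        lowered = char.lower().encode("utf-8")
        dstres = len(lowered)
        if dst + dstres > len(dst_buf):
            break  # no room left in the output buffer
        dst_buf[dst:dst + dstres] = lowered
        # force_srcres imitates the gdb hack of setting srcres to dstres.
        src += dstres if force_srcres else srcres
        dst += dstres
    return bytes(dst_buf[:dst]).decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "aaa\u023ebbb"  # 'aaaȾbbb'
    print(casedn_sketch(sample, in_place=True))
    print(casedn_sketch(sample, in_place=True, force_srcres=True))
    print(casedn_sketch(sample, in_place=False))
```

In this sketch the in-place run stops at the third byte of ⱦ, which src lands on after advancing only 2 bytes, and that matches the truncated output. Forcing srcres to 3 avoids the stop but still loses the first b, which was overwritten. Only the run with a separate destination buffer keeps all three b characters, which suggests the fix needs either a distinct output buffer or a check that dstres does not exceed srcres when converting in place.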