Bug #119463 Some characters may be truncated when using utf8mb4_0900_ai_ci as collation_connection
Submitted: 26 Nov 6:23
Reporter: zhang xiaojian
Status: Open
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 9.5.0
OS: Any
Assigned to:
CPU Architecture: Any
Tags: utf8mb4_0900_ai_ci

[26 Nov 6:23] zhang xiaojian
Description:
When collation_connection is utf8mb4_0900_ai_ci, LOWER() may truncate a simple string.

txsql> select lower('aaaȾbbb');
+-------------------+
| lower('aaaȾbbb')  |
+-------------------+
| aaaⱦ              |
+-------------------+
1 row in set (2 min 29.08 sec)

mysql> show variables like "%colla%";
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_ai_ci |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.028 sec)

With utf8mb4_general_ci the string is not truncated, but the character is not converted to lowercase.

mysql> set names 'utf8mb4' collate 'utf8mb4_general_ci';
Query OK, 0 rows affected (0.002 sec)

mysql> show variables like "%colla%";
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_general_ci |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.005 sec)

mysql> select lower('aaaȾbbb');
+-------------------+
| lower('aaaȾbbb')  |
+-------------------+
| aaaȾbbb           |
+-------------------+
1 row in set (0.001 sec)

How to repeat:
See Description.

Suggested fix:
In function my_casedn_utf8mb4, src and dst point to the same memory, but my_tolower_utf8mb4 may change the encoded size of the character.
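
To illustrate the size change, here is a quick check outside MySQL. It is only a sketch that relies on Python's own Unicode tables, not on MySQL's case mapping: the uppercase character from the test string encodes to 2 bytes in UTF-8, and its lowercase form encodes to 3.

```python
# Compare the UTF-8 encoded sizes of the character before and after lowercasing.
upper = "\u023E"        # LATIN CAPITAL LETTER T WITH DIAGONAL STROKE
lower = upper.lower()   # Python's Unicode tables map it to U+2C66

upper_len = len(upper.encode("utf-8"))
lower_len = len(lower.encode("utf-8"))

print(f"U+{ord(upper):04X} -> {upper_len} bytes")  # U+023E -> 2 bytes
print(f"U+{ord(lower):04X} -> {lower_len} bytes")  # U+2C66 -> 3 bytes
```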

gdb session:

```
(gdb) n
(gdb) p src
$9 = 0x7f66e2fa8ce3 "Ⱦbbb"
(gdb) p dst
$10 = 0x7f66e2fa8ce3 "Ⱦbbb"
(gdb) n
(gdb) p srcres
$11 = 2
(gdb) p dstres
$12 = 3
(gdb)   
```

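We can see that after the character is converted, dstres is 3 but srcres is 2. The 3-byte result is written over the 2-byte source character plus the first byte of the following 'b', and src then advances by only 2 bytes, so the next read starts in the middle of the character that was just written.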
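
The effect can be reproduced outside the server with a small simulation. This is only a sketch of an in-place lowercasing loop, assuming src and dst share one buffer; `utf8_char_len` and `casedn_in_place` are hypothetical names, and the code is not the MySQL source.

```python
def utf8_char_len(lead: int) -> int:
    """Length of a UTF-8 sequence from its lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte (0x80-0xBF) or invalid lead byte


def casedn_in_place(buf: bytearray) -> int:
    """Lowercase buf while reading and writing the same buffer; return bytes written."""
    srclen = len(buf)
    src = dst = 0
    while src < srclen:
        srcres = utf8_char_len(buf[src])
        if srcres == 0 or src + srcres > srclen:
            break  # the reader landed on a byte that cannot start a character
        try:
            ch = buf[src:src + srcres].decode("utf-8")
        except UnicodeDecodeError:
            break
        out = ch.lower().encode("utf-8")
        dstres = len(out)
        if dst + dstres > srclen:
            break  # no room left in the shared buffer
        buf[dst:dst + dstres] = out  # may overwrite bytes that were not read yet
        src += srcres
        dst += dstres
    return dst


buf = bytearray("aaa\u023Ebbb".encode("utf-8"))  # the test string from the report
written = casedn_in_place(buf)
result = buf[:written].decode("utf-8")
print(ascii(result))  # 'aaa\u2c66'
```

The loop stops right after the converted character, which matches the truncated `aaaⱦ` output in the description.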

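After using gdb to force srcres to 3, the result is no longer truncated after the converted character. Note that one 'b' is still missing in the output below, because it was already overwritten by the 3-byte write.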
```
(gdb) p srcres
$14 = 3
(gdb) p dstres
$15 = 3
```

```
txsql> select lower('aaaȾbbb');
+-------------------+
| lower('aaaȾbbb')  |
+-------------------+
| aaaⱦbb            |
+-------------------+
```
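
One possible direction, shown here only as a sketch and not as a patch for MySQL, is to avoid converting in place when the lowercase form can be longer than the source, and to write into a separate destination buffer instead. `utf8_char_len` and `casedn_separate` are hypothetical names used for illustration.

```python
def utf8_char_len(lead: int) -> int:
    """Length of a UTF-8 sequence from its lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead byte


def casedn_separate(src: bytes) -> bytes:
    """Lowercase src into a separate buffer so writes never touch unread input."""
    dst = bytearray()
    pos = 0
    while pos < len(src):
        n = utf8_char_len(src[pos])
        if n == 0 or pos + n > len(src):
            break  # malformed input: stop, as the in-place loop would
        try:
            ch = src[pos:pos + n].decode("utf-8")
        except UnicodeDecodeError:
            break
        dst += ch.lower().encode("utf-8")  # the output may be longer than the input
        pos += n
    return bytes(dst)


src = "aaa\u023Ebbb".encode("utf-8")
out = casedn_separate(src)
print(len(src), len(out))           # 8 9
print(ascii(out.decode("utf-8")))   # 'aaa\u2c66bbb'
```

With separate buffers all three 'b' characters survive, and the output is one byte longer than the input.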