Bug #105668 rpad function is not multibyte safe
Submitted: 23 Nov 2021 3:19
Modified: 23 Nov 2021 4:40
Reporter: x j
Status: Verified
Category: MySQL Server: DML
Severity: S3 (Non-critical)
Version: 8.0.26, 5.7, 8.0.27
OS: Any
Assigned to:
CPU Architecture: Any

[23 Nov 2021 3:19] x j
Description:
When I use the rpad function with multibyte characters, it returns a confusing result.

How to repeat:
create table t (a char(20) charset utf8mb4);
insert into t values ('一');
mysql> select hex(rpad(a, 5, 0xe4ba8c)), rpad(a, 5, 0xe4ba8c) from t;
+---------------------------+----------------------+
| hex(rpad(a, 5, 0xe4ba8c)) | rpad(a, 5, 0xe4ba8c) |
+---------------------------+----------------------+
| E4B880E4BA8CE4            | 一二�                 |
+---------------------------+----------------------+
4 rows in set (0.02 sec)

0xe4ba8c is a valid utf8mb4 character (二), but rpad pads with E4BA8CE4, which is 4 bytes: one complete character followed by the first byte of the next one.

mysql> select charset(rpad(a, 5, 0xe4ba8c)) from t;
+-------------------------------+
| charset(rpad(a, 5, 0xe4ba8c)) |
+-------------------------------+
| utf8mb4                       |
+-------------------------------+
4 rows in set (0.02 sec)

The charset of the rpad result is utf8mb4, so I think it should pad with 4 characters, not 4 bytes.
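
To make the byte/character mismatch concrete, here is a minimal Python sketch that reproduces both hex strings seen in this report. It is only a model of the observed output, not the server's implementation: the function names `rpad_chars` and `rpad_mixed` are hypothetical, and the assumption in `rpad_mixed` (the input is measured in characters while the padding is appended in bytes) is inferred from the result above.

```python
def rpad_chars(s: str, length: int, pad: str) -> str:
    """Character-based padding: the result is `length` characters long."""
    need = length - len(s)
    if need <= 0:
        return s[:length]
    reps = -(-need // len(pad))          # ceiling division
    return s + (pad * reps)[:need]


def rpad_mixed(s: str, length: int, pad: str) -> bytes:
    """Assumed model of the reported behaviour: the missing length is
    computed in characters, but that many pad *bytes* are appended."""
    raw = s.encode("utf-8")
    pad_bytes = pad.encode("utf-8")
    need = length - len(s)               # counted in characters
    if need <= 0:
        return s[:length].encode("utf-8")
    reps = -(-need // len(pad_bytes))    # ceiling division
    return raw + (pad_bytes * reps)[:need]   # sliced in bytes


reported = rpad_mixed("一", 5, "二").hex().upper()
expected = rpad_chars("一", 5, "二").encode("utf-8").hex().upper()
print(reported)   # E4B880E4BA8CE4
print(expected)   # E4B880E4BA8CE4BA8CE4BA8CE4BA8C
```

The first value ends in a lone E4 lead byte, which is why the client displays a replacement character; the second is what padding with 4 whole characters would produce.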
[23 Nov 2021 4:40] MySQL Verification Team
Hello x j,

Thank you for the report and test case.

regards,
Umesh
[24 Nov 2021 12:05] Tor Didriksen
Posted by developer:
 
Fixed by the patch for
Bug#32668730: Change resolving and execution for LPAD and RPAD 
Bug#33238711: heap-buffer-overflow in Item_func_rpad::val_str

mysql> select hex(rpad(a, 5, 0xe4ba8c)), rpad(a, 5, 0xe4ba8c) from t;
+--------------------------------+----------------------+
| hex(rpad(a, 5, 0xe4ba8c))      | rpad(a, 5, 0xe4ba8c) |
+--------------------------------+----------------------+
| E4B880E4BA8CE4BA8CE4BA8CE4BA8C | 一二二二二           |
+--------------------------------+----------------------+
1 row in set (0,00 sec)