Bug #100841 query with rpad function using Chinese string leads to messy code
Submitted: 14 Sep 12:16 Modified: 28 Sep 18:56
Reporter: Brian Yue (OCA)
Status: Closed
Category: MySQL Server: DML   Severity: S3 (Non-critical)
Version: MySQL 8.0.18, 8.0.21   OS: Any (rhel-7.4)
Assigned to:   CPU Architecture: Any (intel x86)
Tags: Chinese, messy code, rpad

[14 Sep 12:16] Brian Yue
Description:
Dear verification team,
  The result of a query using the rpad function unexpectedly contains garbled characters ("messy code") when rpad_str contains a Chinese string.
  Please refer to the `How to repeat` section for details.

How to repeat:
1. character sets on my server are configured like this

mysql> show variables like '%char%';
+--------------------------+-------------------------------+
| Variable_name            | Value                         |
+--------------------------+-------------------------------+
| character_set_client     | utf8mb4                       |
| character_set_connection | utf8mb4                       |
| character_set_database   | utf8mb4                       |
| character_set_filesystem | binary                        |
| character_set_results    | utf8mb4                       |
| character_set_server     | utf8mb4                       |
| character_set_system     | utf8                          |
| character_sets_dir       | /home/yxx_git/share/charsets/ |
+--------------------------+-------------------------------+
8 rows in set (0.04 sec)

2. create a table `t1` with utf8mb4 charset

CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `c1` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

mysql> insert into t1 values (111,'111');
Query OK, 1 row affected (0.01 sec)

3. query fields of table `t1` with pad functions (pad with Chinese string)

mysql> select rpad(id, 11, '中文'), rpad(c1, 11, '中文') from t1;
+-------------------------------------------------------------+-----------------------------+
| rpad(id, 11, '中文')                                        | rpad(c1, 11, '中文')        |
+-------------------------------------------------------------+-----------------------------+
| 111中文中文中文中文                                 | 111中文中文中文中文         |
+-------------------------------------------------------------+-----------------------------+
1 row in set (0.01 sec)

mysql> select lpad(id, 11, '中文'), lpad(c1, 11, '中文') from t1;
+-----------------------------+-----------------------------+
| lpad(id, 11, '中文')        | lpad(c1, 11, '中文')        |
+-----------------------------+-----------------------------+
| 中文中文中文中文111         | 中文中文中文中文111         |
+-----------------------------+-----------------------------+
1 row in set (0.00 sec)

4. problem

Given that I have configured the character set as utf8mb4 everywhere, why is the output of rpad(id, 11, '中文') garbled?

Suggested fix:
None
[15 Sep 5:45] MySQL Verification Team
Hello Brian Yue,

Thank you for the report and feedback.

regards,
Umesh
[28 Sep 18:56] Paul Dubois
Posted by developer:
 
Fixed in 8.0.23.

The RPAD() function did not correctly set the character set of the
result.
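
One way to see the effect described above on a given server is to ask which character set the server assigns to each padded expression. The following is only a sketch, assuming the table `t1` and the row from `How to repeat`; the column aliases are made up for illustration, and the output is not shown because it depends on the server version.

```sql
-- Assumes table t1 and the row (111, '111') from "How to repeat".
-- CHARSET() reports the character set the server assigns to an expression's result.
-- HEX() shows the raw result bytes, independent of how the client renders them.
SELECT CHARSET(RPAD(id, 11, '中文')) AS rpad_id_charset,
       CHARSET(RPAD(c1, 11, '中文')) AS rpad_c1_charset,
       HEX(RPAD(id, 11, '中文'))     AS rpad_id_bytes,
       HEX(RPAD(c1, 11, '中文'))     AS rpad_c1_bytes
FROM t1;

-- For comparison only: convert the integer to a utf8mb4 string explicitly before padding.
SELECT RPAD(CAST(id AS CHAR CHARACTER SET utf8mb4), 11, '中文') AS rpad_id_cast
FROM t1;
```

If the two character set columns differ on a version before 8.0.23, that would be consistent with the note above. Whether the explicit cast avoids the garbled output on affected versions is an assumption that has not been verified in this report.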