MySQL Bugs: #113531: the convert function seems not work as expected

Bug #113531	the convert function seems not work as expected
Submitted:	31 Dec 2023 6:52	Modified:	3 Jan 2024 7:41
Reporter:	z yz
Status:	Not a Bug
Category:	MySQL Server: Charsets	Severity:	S3 (Non-critical)
Version:	8.0.35	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
We want to display a latin1 string in a utf8 client, and we use the CONVERT function as shown below:

mysql> set names utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT CONVERT(_latin1'中文' USING utf8);
+-------------------------------------+
| CONVERT(_latin1'中文' USING utf8)   |
+-------------------------------------+
| ä¸æ–‡                              |
+-------------------------------------+
1 row in set, 1 warning (0.01 sec)

As we understand it, CONVERT should transcode the string from latin1 to utf8, and since the client character set is also utf8, the result should display as the normal Chinese text '中文'. Instead, the output is mojibake.

When we set the client character set back to latin1, the output looks normal, as if the string had not been converted to utf8 at all:

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT CONVERT(_latin1'中文' USING utf8);
+-------------------------------------+
| CONVERT(_latin1'中文' USING utf8)   |
+-------------------------------------+
| 中文                                |
+-------------------------------------+
1 row in set, 1 warning (0.01 sec)

Is there a way to display a latin1 string when the client character set is utf8?

How to repeat:
Run the queries shown above.

Hello z yz,

Thank you for the report and feedback.

regards,
Umesh

Posted by developer:
 
This is actually how the MySQL CONVERT function is implemented, so it is not a bug.

The UTF8 string '中文', which is 2 characters and 6 bytes wide, is interpreted as
a LATIN1 string of 6 characters. This is possible, because every one-byte value is
a valid LATIN1 character. Then, each of the characters is converted to UTF8,
which gives a string of 6 characters that is 14 bytes wide.
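
The byte arithmetic above can be checked outside the server. The following is a minimal Python sketch, not MySQL code. It assumes that MySQL's latin1 behaves like Python's cp1252 codec for these six byte values, and the variable names are purely illustrative.

```python
# Bytes that a UTF-8 terminal sends for the literal '中文'.
raw = "中文".encode("utf-8")
print(len(raw), raw.hex())              # 6 e4b8ade69687

# The _latin1 introducer makes the server read each byte as one character.
# Assumption: MySQL latin1 matches Python's cp1252 for these byte values.
as_latin1 = raw.decode("cp1252")
print(len(as_latin1))                   # 6

# CONVERT(... USING utf8) re-encodes each of those 6 characters as UTF-8.
converted = as_latin1.encode("utf-8")
print(len(converted), converted.hex())  # 14 c3a4c2b8c2adc3a6e28093e280a1
```

Four of the six characters need 2 bytes in UTF-8 and two need 3 bytes, which accounts for the 14 bytes.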

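The same conversion happens with the connection character set as either UTF8 or
LATIN1. With SET NAMES LATIN1, however, the server converts the result back to
LATIN1 before sending it to the client, which restores the original 6 bytes.
A UTF8 terminal then renders those bytes as '中文', so the result only appears
reasonable.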
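
One way to picture the display effect described above is the following Python sketch. It is a model, not server code: it assumes the result is transcoded to the connection character set before it is sent, that the terminal always decodes incoming bytes as UTF-8, and that cp1252 stands in for MySQL's latin1.

```python
# The 14-byte UTF-8 value that CONVERT produces (6 characters).
converted = "中文".encode("utf-8").decode("cp1252").encode("utf-8")

# SET NAMES utf8: the 14 bytes reach the terminal unchanged, so it
# shows the 6 latin1-derived characters (mojibake).
shown_utf8 = converted.decode("utf-8")
print(len(shown_utf8), shown_utf8 == "中文")      # 6 False

# SET NAMES latin1: the result is converted back to latin1 first
# (modeled here with cp1252), which restores the original 6 bytes.
sent_latin1 = converted.decode("utf-8").encode("cp1252")

# The terminal still decodes as UTF-8, so the text looks correct.
shown_latin1 = sent_latin1.decode("utf-8")
print(len(sent_latin1), shown_latin1 == "中文")   # 6 True
```

In both cases the value computed by CONVERT is the same; only the bytes delivered to the terminal differ.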

Note also that what you are trying to do is impossible: only the lower 256 code
points of the Unicode repertoire can successfully be converted to LATIN1.
The code points in '中文', each of which occupies 3 bytes in UTF8, are outside
that range. If you really want to store such strings as LATIN1, consider
converting them to hexadecimal notation.
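
As a rough illustration of both points, the Python sketch below first shows that '中文' has no LATIN1 representation and then round-trips it through hexadecimal notation. The variable names are hypothetical, and Python's latin-1 codec is used as a stand-in for the LATIN1 character set.

```python
text = "中文"

# Neither character has a LATIN1 equivalent, so a direct conversion fails.
try:
    text.encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)                    # False

# The hexadecimal notation of the UTF-8 bytes is plain ASCII,
# so it can be stored in a LATIN1 column without loss.
hex_form = text.encode("utf-8").hex().upper()
print(hex_form)                         # E4B8ADE69687
hex_form.encode("latin-1")              # succeeds: ASCII characters only

# Decoding the hex digits restores the original string.
restored = bytes.fromhex(hex_form).decode("utf-8")
print(restored == text)                 # True
```

In MySQL, the corresponding functions are HEX() and UNHEX().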