Bug #93626 Unicode string incorrectly marked as invalid
Submitted: 15 Dec 2018 21:22
Modified: 30 Jan 15:14
Reporter: Daniël van Eeden (OCA)
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: 8.0.11
OS: Any
Assigned to:
CPU Architecture: Any
Tags: Unicode, utf8, utf8mb4

[15 Dec 2018 21:22] Daniël van Eeden
Description:
If a 4-byte Unicode character is used as a column name (alias), MySQL complains that it is not valid.

How to repeat:
mysql> select 'x' as '🐬' ;
+---+
| ? |
+---+
| x |
+---+
1 row in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+------------------------------------------------------+
| Level   | Code | Message                                              |
+---------+------+------------------------------------------------------+
| Warning | 1300 | Invalid utf8mb4 character string: '\xF0\x9F\x90\xAC' |
+---------+------+------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT 0xF09F90AC;
+------------+
| 0xF09F90AC |
+------------+
| 🐬           |
+------------+
1 row in set (0.00 sec)

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.11    |
+-----------+
1 row in set (0.00 sec)
[15 Dec 2018 21:24] Daniël van Eeden
MySQL  localhost:33060+ ssl  SQL > select 'x' as '🐬' ;
+---+
| ? |
+---+
| x |
+---+
1 row in set, 1 warning (0.0012 sec)
Warning (code 1300): Invalid utf8mb4 character string: '\xF0\x9F\x90\xAC'

 MySQL  localhost:33060+ ssl  SQL > \py
Switching to Python mode...

 MySQL  localhost:33060+ ssl  Py > print('\xF0\x9F\x90\xAC');
🐬
[15 Dec 2018 21:28] Daniël van Eeden
Looks like Connector/C++ with X Protocol does return the emoji as column name correctly.
[16 Dec 2018 0:50] Miguel Solorzano
Thank you for the bug report.
[16 Dec 2018 12:59] Peter Laursen
I am not so sure it is a server-side charset problem as categorized. It may rather be an issue with the terminal where the client is running, or maybe even simply missing glyph(s) in the font used by the client for display.

In a Windows GUI tool (SQLyog) I get as expected when using the Microsoft font "Courier New": 

SELECT VERSION() -- returns '8.0.13'

SELECT 'x' AS '🐬';

/*RETURNS
🐬    
--------
X       
*/
[16 Dec 2018 13:02] Peter Laursen
Query result correctly displaying in SQLyog

Attachment: dolphin.PNG (image/png, text), 12.83 KiB.

[16 Dec 2018 13:30] Peter Laursen
I forgot to mention one detail that might matter here:

SQLyog uses MariaDB's Connector/C ("libmariadb") and not "libmysql" or any other connector from Oracle for connection to MySQL.
[16 Dec 2018 13:39] Peter Laursen
.. but the 1300 error/warning is listed as a server error here:
https://dev.mysql.com/doc/refman/8.0/en/server-error-reference.html

.. hhmmmm ...
[17 Dec 2018 2:34] Xing Zhang
Posted by developer:
 
This is not a problem with the character set, but with the silly error reporting. On Linux, the 'system character set' has not been switched to utf8mb4 yet, and MySQL maps the Linux charset to utf8_general_ci. When parsing the statement, MySQL tries to convert this dolphin character to the system character set (utf8_general_ci), but that character set supports only characters in the BMP (Basic Multilingual Plane). The error reporting always says the problem is in the 'from_cs', but this time the problem is in the 'to_cs'. I think the correct behavior would be to give the warning message: "Invalid utf8 character string: '\xF0\x9F\x90\xAC'".
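
To illustrate the BMP limitation described above, here is a minimal Python 3 sketch. It is not MySQL's conversion code; the helper name fits_in_utf8mb3 is hypothetical and only models the rule that the 3-byte utf8 character set can hold code points up to U+FFFF. It shows that the bytes from the warning decode to a single valid character outside the BMP, so the conversion target is what cannot represent it.

```python
# Hypothetical helper (not part of MySQL): models the rule that the
# 3-byte utf8 (utf8mb3) character set covers only the BMP, U+0000..U+FFFF.
def fits_in_utf8mb3(text: str) -> bool:
    return all(ord(ch) <= 0xFFFF for ch in text)


# The byte sequence quoted in warning 1300, decoded as UTF-8.
dolphin = b"\xF0\x9F\x90\xAC".decode("utf-8")

print(hex(ord(dolphin)))             # code point of the decoded character
print(len(dolphin.encode("utf-8")))  # number of bytes in its UTF-8 encoding
print(fits_in_utf8mb3(dolphin))      # False: outside the BMP
print(fits_in_utf8mb3("x"))          # True: plain ASCII fits
```

The bytes decode cleanly as UTF-8, which is consistent with the observation that the source string is valid and only the 3-byte target character set is too narrow.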
[30 Jan 15:14] Paul Dubois
Posted by developer:
 
Fixed in 8.0.17.

Some supplemental Unicode characters could incorrectly be flagged with
a warning message as invalid.