Bug #25666 UTF-8 support beyond the BMP
Submitted: 17 Jan 2007 5:10    Modified: 21 Aug 2008 5:07
Reporter: [ name withheld ]
Status: Not a Bug
Category: MySQL Server: Charsets    Severity: S4 (Feature request)
Version:    OS: Any
Assigned to: Alexander Barkov    CPU Architecture: Any

[17 Jan 2007 5:10] [ name withheld ]
MySQL's UTF-8 support is currently restricted to the Basic Multilingual Plane of Unicode.

Later versions of Unicode have defined far more characters than the BMP can hold for some years. MediaWiki, and doubtless many other projects, need full UTF-8 support. Currently we achieve this by storing UTF-8 encoded text in binary fields.
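
The workaround described above amounts to handing the database opaque bytes and doing the UTF-8 encoding and decoding in the application. A minimal Python sketch of that round trip follows, with a dict standing in for a binary column. The names `binary_column`, `save_text` and `load_text` are hypothetical and are not taken from MediaWiki.

```python
# A dict stands in for a table with a binary (BLOB/VARBINARY) column.
binary_column = {}

def save_text(row_id, text):
    # Encode in the application; the store only ever sees raw bytes.
    binary_column[row_id] = text.encode("utf-8")

def load_text(row_id):
    # Decode on the way out; the store never interprets the bytes.
    return binary_column[row_id].decode("utf-8")

supplementary = "\U0002F800"  # a code point outside the BMP
save_text(1, supplementary)
restored = load_text(1)
```

The trade-off is that the database can no longer apply character-aware operations (collation, character-length limits) to such a field, since it sees only bytes.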

How to repeat:
Try to store a character from the Supplementary Ideographic Plane and observe that it is converted to a literal question mark.
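
The symptom can be imitated outside the server. The sketch below is not MySQL's conversion code; it only mimics the reported behaviour by replacing every code point above U+FFFF with a question mark, as a character set limited to the BMP would have to. The function name `store_as_bmp_only` is hypothetical.

```python
def store_as_bmp_only(text):
    """Mimic a BMP-only character set: anything above U+FFFF becomes '?'."""
    return "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)

# Latin letter, BMP ideograph (U+4E3D), supplementary ideograph (U+2F800)
sample = "A\u4e3d\U0002F800"
stored = store_as_bmp_only(sample)
```

Here `stored` keeps the first two characters and ends in `?`, which matches the data loss described in this report.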

Suggested fix:
For compatibility reasons, one solution might be to support two UTF-8 encodings under different names, since full UTF-8 support requires four bytes to encode characters outside the BMP, rather than at most three.
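
To make the byte-count difference concrete, this short Python sketch encodes one BMP character and one supplementary character and prints the length and hex form of each. It uses only Python's built-in UTF-8 codec and says nothing about how a server would lay out storage.

```python
bmp_char = "\u4e3d"                 # U+4E3D, inside the BMP
supplementary_char = "\U0002F800"   # U+2F800, Supplementary Ideographic Plane

bmp_bytes = bmp_char.encode("utf-8")
supplementary_bytes = supplementary_char.encode("utf-8")

print(len(bmp_bytes), bmp_bytes.hex())                      # 3 e4b8bd
print(len(supplementary_bytes), supplementary_bytes.hex())  # 4 f0afa080
```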
[19 Jan 2007 11:22] [ name withheld ]
See also bug 14052.
[19 Jan 2007 16:26] MySQL Verification Team
Thank you for the feature request.
[21 Aug 2008 5:07] Alexander Barkov
Support for supplementary characters was added to mysql-6.0.
Please upgrade.

Closing as not a bug.
[21 Aug 2008 15:48] Paul DuBois
For more information about supplementary-character support:
[23 Apr 2019 10:01] Antoine Mottier
I did a quick test on a 5.5 server trying to insert the character 丽 (U+2F800) in a VARCHAR(1) of a schema with character set configured to utf8mb4 and it was successful.

As far as I understand, utf8mb4 covers all Unicode code points, so I cannot think of any limitation related to UTF-8 usage in a database configured with the utf8mb4 character set.

So, based on the following documentation page, I guess that support for supplementary characters was added in mysql-5.5 and not mysql-6.0: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
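
The claim that four bytes per character are enough for every Unicode code point can be checked against Python's UTF-8 codec. This is a rough sketch that verifies the encoding itself, not any particular server's behaviour; the helper name `utf8_length` is hypothetical.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 needs for a single code point."""
    return len(chr(code_point).encode("utf-8"))

# Unicode scalar values: U+0000..U+10FFFF minus the surrogate range,
# which is reserved for UTF-16 and cannot be encoded in UTF-8.
scalar_values = (cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
longest = max(utf8_length(cp) for cp in scalar_values)

print(longest)               # 4
print(utf8_length(0xFFFF))   # 3, last code point of the BMP
print(utf8_length(0x10000))  # 4, first supplementary code point
```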