MySQL Bugs: #25666: UTF-8 support beyond the BMP

Bug #25666	UTF-8 support beyond the BMP
Submitted:	17 Jan 2007 5:10	Modified:	21 Aug 2008 5:07
Reporter:	[ name withheld ]
Status:	Not a Bug
Category:	MySQL Server: Charsets	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:	Alexander Barkov	CPU Architecture:	Any

Description:
MySQL's UTF-8 support is currently restricted to the Basic Multilingual Plane of Unicode.

Unicode has defined far more characters than this for some years. MediaWiki, and doubtless many other projects, need full UTF-8 support. Currently we achieve this by storing UTF-8 encoded text in binary fields.

How to repeat:
Try to store characters from the Supplementary Ideographic Plane, such as U+20000 through U+2000F, and see that they are converted to literal question marks.
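The affected characters can be identified by code point alone. As a minimal sketch in Python (the helper name `supplementary_chars` is hypothetical and not part of MySQL or any client library), the following lists the characters in a string that lie outside the BMP and therefore need four bytes in UTF-8:

```python
def supplementary_chars(text):
    """Return (char, code point label, UTF-8 byte count) for each character above U+FFFF."""
    found = []
    for ch in text:
        if ord(ch) > 0xFFFF:  # outside the Basic Multilingual Plane
            found.append((ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))))
    return found

# Sixteen code points from the Supplementary Ideographic Plane: U+20000 through U+2000F.
sample = "".join(chr(cp) for cp in range(0x20000, 0x20010))
hits = supplementary_chars("abc" + sample)
print(len(hits), hits[0][1], hits[0][2])  # prints: 16 U+20000 4
```

A check like this could run on the application side before text is written to a column whose character set is limited to three bytes per character.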

Suggested fix:
For compatibility reasons, one solution might be to support two UTF-8 encodings under different names, since full UTF-8 support requires up to four bytes per character rather than three.
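To make the byte-count point concrete, here is a small Python sketch of how many bytes UTF-8 needs per code point range. The function name `utf8_length` is an illustrative assumption; the ranges themselves come from the UTF-8 definition. An encoding capped at three bytes per character stops at U+FFFF, while four bytes reach U+10FFFF, the last Unicode code point.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 uses to encode a single code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3  # upper limit of a three-byte-per-character encoding
    return 4      # supplementary planes, U+10000 through U+10FFFF

# Cross-check the range table against Python's own encoder at the boundaries.
for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x20000, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

The boundary at U+FFFF is why a separate, differently named character set is a plausible way to add the fourth byte without changing storage assumptions for existing columns.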

See also bug 14052.

Thank you for the feature request.

Support for supplementary characters was added to mysql-6.0.
Please upgrade.

Closing as not a bug.

For more information about supplementary-character support:
http://dev.mysql.com/doc/refman/6.0/en/charset-unicode.html

I did a quick test on a 5.5 server: I inserted the character 丽 (U+2F800) into a VARCHAR(1) column in a schema whose character set is utf8mb4, and the insert succeeded.

As far as I understand, utf8mb4 covers all Unicode code points, so I cannot think of any limitation on UTF-8 usage in a database configured with the utf8mb4 character set.

So I guess that support for supplementary characters was added in mysql-5.5, not mysql-6.0, based on the following documentation page: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html