Bug #25666 UTF-8 support beyond the BMP
Submitted: 17 Jan 2007 5:10 Modified: 21 Aug 2008 5:07
Reporter: [ name withheld ]
Status: Not a Bug
Category: MySQL Server: Charsets Severity: S4 (Feature request)
Version: OS: Any
Assigned to: Alexander Barkov CPU Architecture: Any
Triage: D5 (Feature request)

[17 Jan 2007 5:10] [ name withheld ]
Description:
MySQL's UTF-8 support is currently restricted to the Basic Multilingual Plane of Unicode.

Later versions of Unicode have defined many characters outside the BMP for some years. MediaWiki, and doubtless many other projects, need full UTF-8 support. Currently we work around this by storing UTF-8 encoded text in binary fields.

How to repeat:
Try to store characters from the Supplementary Ideographic Plane, such as these, and see that they are converted to literal question marks:

U+ 	0 	1 	2 	3 	4 	5 	6 	7 	8 	9 	A 	B 	C 	D 	E 	F
20000 	
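
The behaviour described above can be illustrated outside the server. The following Python sketch builds the sixteen code points U+20000 through U+2000F from the table row and mimics a store that only accepts BMP characters. The helper names (is_supplementary, simulate_bmp_only_store) are hypothetical and exist only for this illustration; this is not MySQL code, and the replacement with "?" simply mirrors what the report says happens.

```python
# Illustration only: hypothetical helpers, not part of MySQL or any library.
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_supplementary(ch: str) -> bool:
    """Return True if the character lies outside the BMP."""
    return ord(ch) > BMP_MAX


def simulate_bmp_only_store(text: str) -> str:
    """Mimic the reported behaviour: non-BMP characters become '?'."""
    return "".join("?" if is_supplementary(ch) else ch for ch in text)


# The table row above: U+20000 .. U+2000F.
sample = "".join(chr(cp) for cp in range(0x20000, 0x20010))

stored = simulate_bmp_only_store("abc" + sample)
print(stored)

# Each of these characters needs four bytes in UTF-8.
print(len(chr(0x20000).encode("utf-8")))
```

Running the sketch shows the BMP text surviving while every supplementary character is replaced, which is the symptom reported here.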

Suggested fix:
For compatibility reasons, one solution might be to support two UTF-8 encodings with different names, since full UTF-8 support requires more bytes to encode some characters.
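
To make the "more bytes" point concrete, here is a minimal Python sketch of how many bytes UTF-8 needs per code point. The function name utf8_length is an assumption made for this example; the ranges follow the standard UTF-8 encoding rules, and the loop checks them against Python's own encoder. BMP characters fit in at most three bytes, while anything from U+10000 upward needs four.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point (hypothetical helper).

    Surrogate code points (U+D800..U+DFFF) are not valid scalar values;
    this sketch does not special-case them.
    """
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:  # up to the end of the BMP
        return 3
    return 4  # supplementary planes


# Compare the rule with Python's built-in UTF-8 encoder.
for cp in (0x41, 0xE9, 0x20AC, 0xFFFF, 0x10000, 0x20000):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_length(cp)
    print(f"U+{cp:04X}: {len(encoded)} byte(s)")
```

An encoding limited to three bytes per character can therefore represent the whole BMP but nothing beyond it, which is why a second, wider encoding name is one way to keep existing column sizes compatible.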
[19 Jan 2007 11:22] [ name withheld ]
See also bug 14052.
[19 Jan 2007 16:26] Miguel Solorzano
Thank you for the feature request.
[21 Aug 2008 5:07] Alexander Barkov
Support for supplementary characters was added to mysql-6.0.
Please upgrade.

Closing as not a bug.
[21 Aug 2008 15:48] Paul Dubois
For more information about supplementary-character support:
http://dev.mysql.com/doc/refman/6.0/en/charset-unicode.html