Bug #28002 information_schema/character_sets/utf8_general_ci maxlen wrong
Submitted: 21 Apr 2007 15:39 Modified: 23 Apr 2007 4:18
Reporter: Kai Hofmann
Status: Not a Bug
Category: MySQL Server: Information schema Severity: S1 (Critical)
Version: OS:Windows
Assigned to: CPU Architecture:Any
Tags: UTF-8

[21 Apr 2007 15:39] Kai Hofmann
Description:
The maximum character length for utf-8 is given as 3, which is wrong to the best of my knowledge, because the maximum is 4. Please verify against the standard!
This is critical because applications that depend on this value might crash.

How to repeat:
SELECT * FROM information_schema.character_sets;
[23 Apr 2007 4:18] Valeriy Kravchuk
Thank you for the problem report. Indeed, 3 bytes are reported for UTF-8 in MySQL, but this is a documented limitation of our current implementation. Please read the manual, http://dev.mysql.com/doc/refman/5.1/en/charset-unicode.html:

"RFC 3629 describes encoding sequences that take from one to four bytes. Currently, MySQL support for UTF-8 does not include four-byte sequences."
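To illustrate the byte lengths the manual refers to, here is a minimal Python sketch (not MySQL code) that encodes one sample character for each UTF-8 sequence length. The helper names utf8_len and fits_in_three_bytes are hypothetical and chosen only for this example; the second one shows how an application might check whether a string stays within the three-byte limit reported as maxlen.

```python
# Illustrative sketch only: shows UTF-8 sequence lengths from 1 to 4 bytes.
# The helper names below are hypothetical, not part of MySQL or any library.

def utf8_len(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text: str) -> bool:
    """True if every character in text needs at most three UTF-8 bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)


# One sample character per sequence length.
samples = [
    "A",           # U+0041, one byte
    "\u00e9",      # U+00E9, two bytes
    "\u20ac",      # U+20AC, three bytes
    "\U0001d11e",  # U+1D11E, four bytes (outside the Basic Multilingual Plane)
]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")
```

Under the limitation quoted above, only the last sample, which needs a four-byte sequence, falls outside what the server supports; fits_in_three_bytes returns False for any string containing such a character.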