Bug #8943 | utf8 character set error | ||
---|---|---|---|
Submitted: | 4 Mar 2005 9:19 | Modified: | 9 Mar 2005 8:31 |
Reporter: | Peter Chen | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.1.7 | OS: | Windows (Win 2000 Pro) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[4 Mar 2005 9:19]
Peter Chen
[4 Mar 2005 9:33]
Peter Chen
my operation steps
Attachment: MySQL.jpg (image/pjpeg, text), 55.44 KiB.
[4 Mar 2005 13:10]
Heikki Tuuri
Peter, 1-byte character codes where the highest bit is 1 are not valid UTF-8 character codes. You should use the appropriate 2-byte UTF-8 code for such characters. Regards, Heikki
[4 Mar 2005 17:54]
Jose Alejandro Guizar
Then why do they all work except for the 4 he mentions? You can see in his test string that all of the characters are 1-byte highest bit set kind, but he only has trouble with 4 of them. Me and others are having similar problems, check bugs #8949, 8730 and 8672. MySQL is eating those chars up.
[7 Mar 2005 8:12]
Marko Mäkelä
I tested this with 4.1.11-bk. The MySQL "latin1" is actually Windows Code Page 1252, which makes use of some code positions that are undefined in ISO 8859-1. To my knowledge, ISO 8859-1 leaves the code positions 0x80..0x9f undefined or reserved for control characters. To my knowledge, Unicode is a superset of latin1, meaning that Latin1 codes 0x00..0x7f and 0xa0..0xff correspond to the codes U+0000..U+007f and U+00a0..U+00ff. I'm not sure, but I would tend to believe that Unicode leaves U+0080..U+00bf reserved for control characters. In my tests, the CHAR() function seemed to be interpreted as MySQL latin1 character codes. For example, CHAR(244,241,255) corresponds to F4F1FF in a latin1 column, and C3B4C3B1C3BF in a UTF-8 encoded Unicode column. It seems that the high-order bits of CHAR() arguments are ignored: char(288,289) will not correspond to U+0120 and U+0121 but " !". That could be viewed as a bug of MySQL, or as a feature request. That's why I'm assigning this bug to our character set expert Alexander Barkov.
[9 Mar 2005 8:31]
Alexander Barkov
This is not a bug. mysql.exe is a DOS-alike program which uses cp850 character set, not cp1251. You should run "SET NAMES cp850;" after connect, or start mysql with: mysql.exe --default-character-set=cp850 -uroot -p test or write default-character-set=850 into your my.cnf file in [mysql] or [client] sections.
[9 Mar 2005 21:46]
Marko Mäkelä
Bar, are you saying that the behavior of the CHAR() function depends on the character_set_% variables? I think that it should be more clearly documented what CHAR() does with arguments greater than 126. I also think that CHAR(288) in UTF-8 context (insert into UTF-8 column values(char(288)) and mysql --default-character-set=utf8) should mean 0xc4 0xa0 and not 0x01 0x20, as it appears to be (in 5.0.3-bk).