Bug #8199 | mysqldump dumps mismached utf-8 characters | ||
---|---|---|---|
Submitted: | 30 Jan 2005 0:38 | Modified: | 31 Jan 2005 8:55 |
Reporter: | [ name withheld ] | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: mysqldump Command-line Client | Severity: | S2 (Serious) |
Version: | 4.1.7-4.1.8a | OS: | Linux (GNU/Linux) |
Assigned to: | Sergei Golubchik | CPU Architecture: | Any |
[30 Jan 2005 0:38]
[ name withheld ]
[30 Jan 2005 12:40]
Aleksey Kishkin
Was not be able to reproduce it. I attached ("Files" section) a small perl script that I used for testing. Could you please to test it on your computer?
[30 Jan 2005 12:41]
Aleksey Kishkin
test case. Actually I used sv_SE.utf8 locale when tested it.
Attachment: bug8199.pl (application/octet-stream, text), 309 bytes.
[30 Jan 2005 15:01]
[ name withheld ]
i tried the script and here my analysis session: {232} root@borrowed /tmp # mysqldump t | od -t x1 -a 0001560 60 74 60 20 56 41 4c 55 45 53 20 28 27 c3 85 6e ` t ` sp V A L U E S sp ( ' C enq n 0001600 67 73 74 72 c3 b6 6d 27 29 3b 0a 55 4e 4c 4f 43 g s t r C 6 m ' ) ; nl U N L O C which is correct. but: {233} root@borrowed /tmp # mysql -e "select * from t" t | od -t x1 -a 0000000 68 0a c5 6e 67 73 74 72 f6 6d 0a h nl E n g s t r v m nl which is not correct. i reversed your script to read things out from table t: #!/usr/bin/perl use utf8; use open ":utf8"; use DBI; $db = DBI->connect("dbi:mysql:test","root","") or die "Cannot connect"; $st = $db->prepare("SELECT * FROM t"); $st->execute(); while(@row=$st->fetchrow()) { print "$row[0]\n"; } $st->finish(); and executed it: {242} root@borrowed /tmp # perl n | od -t x1 -a 0000000 c5 6e 67 73 74 72 f6 6d 0a E n g s t r v m nl i have no idea what is, 0xc5,0x6e but i know the translation of latin-1 ascii 0xc5 (Å) into utf-8 should be 0xc3,0x85. in any case this is a serious problem. utf-8 handling must be consistent. thomax
[31 Jan 2005 8:55]
Sergei Golubchik
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php Additional info: This is expected behaviour. Your client's character set is latin1 (default), so when you insert 0xc3,0xbc, MySQL thinks it's two characters in latin1 charset, and converts them to utf8 to store in the table. Use SET NAMES utf8 before connecting (or see the manual, as there are many ways to make your script working)