Bug #65669 Mysqldump adds weird bytes to utf8 characters stored in a field encoded as latin1
Submitted: 19 Jun 2012 9:47 Modified: 20 Jun 2012 19:10
Reporter: Santilín lín
Status: Unsupported
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: mysql Ver 14.14 Distrib 5.5.24
OS: Linux (debian-linux-gnu)
Assigned to:
CPU Architecture: Any
Tags: utf8 mysqldump latin1

[19 Jun 2012 9:47] Santilín lín
Description:
Storing a utf8 string in a field of a table encoded as latin1 makes mysqldump add weird bytes to the utf8 representation of its characters, making them unrecoverable.

As the example under "How to repeat" shows, when the Unicode character 'á' (UTF-8 hex c3 a1) is inserted, mysqldump outputs it as hex c3 83 c2 a1.
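The byte change is consistent with double encoding. Below is a minimal Python sketch, assuming two things not stated in the report: the INSERT went over a latin1 client connection, so the two UTF-8 bytes were stored unchanged as two latin1 characters, and mysqldump connected with utf8, its documented default character set. Each stored byte is read as one latin1 character and that text is re-encoded as UTF-8. The function name double_encode is hypothetical. Python's "latin-1" codec is ISO-8859-1 while MySQL's latin1 is cp1252, but the two agree on bytes 0xC3 and 0xA1.

```python
def double_encode(utf8_bytes: bytes) -> bytes:
    """Read UTF-8 bytes as latin1 characters, then encode those characters as UTF-8.

    Hypothetical helper illustrating the assumed latin1 -> utf8 conversion.
    """
    # Each byte becomes one latin1 character: c3 -> 'Ã' (U+00C3), a1 -> '¡' (U+00A1).
    as_latin1_text = utf8_bytes.decode("latin-1")
    # Encoding that text as UTF-8 turns every byte >= 0x80 into two bytes.
    return as_latin1_text.encode("utf-8")


stored = "á".encode("utf-8")   # b'\xc3\xa1', the bytes found in test.MYD
dumped = double_encode(stored)
print(stored.hex(" "), "->", dumped.hex(" "))  # c3 a1 -> c3 83 c2 a1
```

Under these assumptions the four bytes in the dump are exactly what a latin1-to-utf8 conversion of the stored bytes produces.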

How to repeat:
In the mysql client:

mysql> CREATE DATABASE `mytests` DEFAULT CHARSET 'latin1';
mysql> USE `mytests`;
mysql> CREATE TABLE `test` ( `field` CHAR(100) ) ENGINE=MyISAM;
mysql> INSERT INTO `test` VALUES ('á');

On the shell:
# hexdump -C /var/lib/mysql/mytests/test.MYD
00000000  fd c3 a1 20 20 20 20 20  20 20 20 20 20 20 20 20  |...             |
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |

# mysqldump -u user mytests | hexdump -C | grep -C 3 VALUES
000004c0  4c 54 45 52 20 54 41 42  4c 45 20 60 74 65 73 74  |LTER TABLE `test|
000004d0  60 20 44 49 53 41 42 4c  45 20 4b 45 59 53 20 2a  |` DISABLE KEYS *|
000004e0  2f 3b 0a 49 4e 53 45 52  54 20 49 4e 54 4f 20 60  |/;.INSERT INTO `|
000004f0  74 65 73 74 60 20 56 41  4c 55 45 53 20 28 27 c3  |test` VALUES ('.|
00000500  83 c2 a1 27 29 3b 0a 2f  2a 21 34 30 30 30 30 20  |...');./*!40000 |
00000510  41 4c 54 45 52 20 54 41  42 4c 45 20 60 74 65 73  |ALTER TABLE `tes|
00000520  74 60 20 45 4e 41 42 4c  45 20 4b 45 59 53 20 2a  |t` ENABLE KEYS *|
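The same conclusion can be checked against the dump output itself. The minimal sketch below rebuilds raw bytes from the two `hexdump -C` lines that contain the inserted value, extracts the bytes between `('` and `')`, and compares them with the stored c3 a1 read as latin1 and re-encoded as UTF-8. The helper names hexdump_bytes and quoted_value are hypothetical, and quoted_value assumes a single-row INSERT whose value contains no escaped quotes.

```python
# The two lines of the mysqldump hexdump above that hold the INSERT value.
DUMP_LINES = [
    "000004f0  74 65 73 74 60 20 56 41  4c 55 45 53 20 28 27 c3  |test` VALUES ('.|",
    "00000500  83 c2 a1 27 29 3b 0a 2f  2a 21 34 30 30 30 30 20  |...');./*!40000 |",
]


def hexdump_bytes(lines: list[str]) -> bytes:
    """Rebuild raw bytes from `hexdump -C` lines (offset, hex columns, ASCII gutter)."""
    out = bytearray()
    for line in lines:
        hex_part = line.split("|", 1)[0]  # drop the ASCII gutter
        fields = hex_part.split()[1:]     # drop the offset column
        out.extend(int(field, 16) for field in fields)
    return bytes(out)


def quoted_value(data: bytes) -> bytes:
    """Return the bytes between the first "('" and the following "')"."""
    start = data.index(b"('") + 2
    end = data.index(b"')", start)
    return data[start:end]


stored = bytes.fromhex("c3a1")                    # value bytes seen in test.MYD
dumped = quoted_value(hexdump_bytes(DUMP_LINES))  # value bytes seen in the dump
print(dumped.hex(" "))                            # c3 83 c2 a1
print(dumped == stored.decode("latin-1").encode("utf-8"))  # True
```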
[20 Jun 2012 19:10] Sveta Smirnova
Thank you for the report.

> Storing a utf8 string in a field of a table encoded as latin1 makes mysqldump add weird bytes to the utf8 representation of the characters making them unrecoverable.

This has not been supported since version 4.1. Although you can insert such a value, what happens to it afterwards is unpredictable.