Bug #55056 mysql command line character set problems
Submitted: 7 Jul 2010 13:03 Modified: 20 Jul 2010 6:03
Reporter: Bogdan Degtyariov
Status: Not a Bug
Category: MySQL Server: Charsets Severity: S3 (Non-critical)
Version: 5.0.x, 5.1.x OS: Windows
Assigned to: CPU Architecture: Any

[7 Jul 2010 13:03] Bogdan Degtyariov
Description:
The mysql command line client corrupts data when importing a simple SQL dump that contains UTF-8 characters. To repeat the bug, the database default charset has to be latin1 and the table charset has to be utf8.

This does not depend on the server version. The problem only occurs when the --default-character-set=utf8 parameter is passed to mysql.exe.

The most interesting fact is that with --default-character-set=latin1 the problem never happens. It seems that things should be the other way around, or at least that the import should not fail with the utf8 charset and UTF-8 data in the file.

How to repeat:
We try to insert two records into the table. One record contains the plain Latin string "NA" and the other contains the UTF-8 string "NÁ". Because the column is a primary key and "NÁ" is converted to "NA", a duplicate key error is displayed.

CREATE DATABASE `test_bug` DEFAULT CHARACTER SET latin1;

Import the SQL file (czechutf8.sql) uploaded below as follows:

mysql -u*** -p*** --default-character-set=utf8 test_bug < czechutf8.sql

See the output:
ERROR 1062 (23000) at line 19: Duplicate entry '1-N├Б' for key 'code_ind'
[7 Jul 2010 13:04] Bogdan Degtyariov
sql file to repeat the bug

Attachment: czechutf8.sql (application/octet-stream, text), 457 bytes.

[13 Jul 2010 1:37] Omer Barnir
Does using 'set names utf8' in the script address the issue?
[13 Jul 2010 8:16] Bogdan Degtyariov
Omer,

Putting "set names utf8;" as the first line in the .sql file does not solve the problem.
[20 Jul 2010 6:03] Alexander Barkov
This is not a bug.

utf8_general_ci is an accent-insensitive collation,
so 'NA' and 'NÁ' are treated as the same value.
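
The collation behaviour can be checked directly, independent of the client character set, by writing both strings as hex literals. This is a minimal sketch, not the contents of czechutf8.sql: the table name collation_demo and its column definitions are hypothetical and only reuse the key name code_ind from the error message. The second part assumes that switching the column to utf8_bin is acceptable for the data, which also makes comparisons case-sensitive.

```sql
-- 'NA' is the bytes 4E 41; 'NÁ' in UTF-8 is the bytes 4E C3 81.
-- Hex literals avoid any dependence on --default-character-set.
SELECT
  _utf8 X'4E41' = _utf8 X'4EC381' COLLATE utf8_general_ci AS equal_general_ci,
  _utf8 X'4E41' = _utf8 X'4EC381' COLLATE utf8_bin        AS equal_bin;
-- equal_general_ci is 1 (accent ignored), equal_bin is 0 (byte comparison)

-- Hypothetical table: with a binary collation on the key column,
-- both values can coexist under the same unique key.
CREATE TABLE collation_demo (
  id   INT NOT NULL,
  code VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  UNIQUE KEY code_ind (id, code)
);

INSERT INTO collation_demo (id, code)
VALUES (1, _utf8 X'4E41'), (1, _utf8 X'4EC381');
```

With the column left at utf8_general_ci, the same INSERT would be expected to fail with a duplicate entry error on code_ind, which matches the report above.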