MySQL Bugs: #55056: mysql command line character set problems

Bug #55056	mysql command line character set problems
Submitted:	7 Jul 2010 13:03	Modified:	20 Jul 2010 6:03
Reporter:	Bogdan Degtyariov
Status:	Not a Bug
Category:	MySQL Server: Charsets	Severity:	S3 (Non-critical)
Version:	5.0.x, 5.1.x	OS:	Windows
Assigned to:		CPU Architecture:	Any

Description:
The mysql command-line client corrupts data when importing a simple SQL dump that contains UTF-8 characters. To reproduce the bug, the database default character set has to be latin1 and the table character set has to be utf8.

This does not depend on the server version. The problem occurs only when the --default-character-set=utf8 option is passed to mysql.exe.

The most interesting fact is that with --default-character-set=latin1 the problem never happens. It seems that things should be the other way around, or at least that the import should not fail with the utf8 character set and UTF-8 data in the file.

How to repeat:
We try to insert two records into the table. One record is the plain Latin string "NA" and the other contains the UTF-8 string "NÁ". Because the column is a primary key and the UTF-8 value is converted to "NA", a duplicate key error is displayed.

CREATE DATABASE `test_bug` DEFAULT CHARACTER SET latin1;

Import the SQL file (czechutf8.sql) uploaded below as follows:

mysql -u*** -p*** --default-character-set=utf8 test_bug < czechutf8.sql

See the output:
ERROR 1062 (23000) at line 19: Duplicate entry '1-N├Б' for key 'code_ind'
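
The odd-looking '1-N├Б' in the message is consistent with the UTF-8 bytes of "NÁ" being printed by a Windows console that uses an OEM code page. The sketch below assumes code page 866; the report does not say which code page the console used, so treat that choice as a guess that happens to reproduce the output.

```python
# Assumption: the console renders bytes using OEM code page 866.
# The bug report does not state the actual console code page.
utf8_bytes = "NÁ".encode("utf-8")    # 'Á' (U+00C1) is the two bytes C3 81 in UTF-8
shown = utf8_bytes.decode("cp866")   # what a cp866 console would display for those bytes

print(utf8_bytes)  # b'N\xc3\x81'
print(shown)       # N├Б
```

Under that assumption the stored value is an intact "NÁ" and only its display in the console is distorted.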

SQL file to reproduce the bug

Attachment: czechutf8.sql (application/octet-stream, text), 457 bytes.

Does using 'set names utf8' in the script address the issue?

Omer,

putting "set names utf8;" as the first line of the .sql file does not solve the problem.

This is not a bug.

utf8_general_ci is an accent-insensitive collation,
so 'NA' and 'NÁ' are treated as the same value.
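
A rough way to see this outside MySQL is the sketch below. The helper accent_insensitive_key is hypothetical and only approximates an accent- and case-insensitive comparison; it is not the algorithm utf8_general_ci actually uses. The second helper illustrates one plausible reason, not confirmed in this report, why the import succeeds with --default-character-set=latin1: the two UTF-8 bytes of 'Á' are then read as two separate latin1 characters, so the value no longer compares equal to 'NA'.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Hypothetical helper: approximate an accent- and case-insensitive key.

    This is NOT MySQL's utf8_general_ci implementation, only the same idea.
    """
    decomposed = unicodedata.normalize("NFD", text)  # split 'Á' into 'A' + combining acute
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def misread_as_latin1(text: str) -> str:
    """Decode the UTF-8 bytes of `text` as if they were latin1 (assumed failure mode)."""
    return text.encode("utf-8").decode("latin-1")


# Read as UTF-8, both values yield the same key, so a unique index rejects the second row.
same_as_utf8 = accent_insensitive_key("NA") == accent_insensitive_key("NÁ")

# Read as latin1, 'Á' turns into two characters and the keys differ, so no duplicate arises.
misread = misread_as_latin1("NÁ")
differs_as_latin1 = accent_insensitive_key("NA") != accent_insensitive_key(misread)

print(same_as_utf8, len(misread), differs_as_latin1)
```

If the two values really must coexist in a unique key, the column would need a collation that distinguishes accents, such as utf8_bin.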