Bug #58493 Rows get skipped as primary duplicates, when they are in fact not.
Submitted: 25 Nov 2010 13:32 Modified: 26 Nov 2010 8:17
Reporter: Toshiro Mifune
Status: Not a Bug
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1.53 OS:Any
Assigned to: CPU Architecture:Any
Tags: CJK, primary key, radicals supplement

[25 Nov 2010 13:32] Toshiro Mifune
Description:
I use "LOAD DATA LOCAL INFILE" to import a text file, and 5 rows are skipped as supposed duplicates of the primary key. I have checked, and they are certainly not duplicates. The skipped values, and the existing values they are reported to duplicate, are:

⺌ duplicate of 小
⺮ duplicate of 竹
⺪ duplicate of 疋
⻊ duplicate of 足
⺶ duplicate of 羊

The collation I use is utf8_unicode_ci. The characters "⺌, ⺮, ⺪, ⻊, ⺶" belong to the CJK Radicals Supplement Unicode block (CJK_RADICALS_SUPPLEMENT). The block was introduced in version 3.0 of the Unicode Standard, spans U+2E80 to U+2EFF, and contains 115 characters. They can be seen here: http://www.unicode.org/charts/PDF/U2E80.pdf
These characters are variant forms of their full-form characters (e.g., ⺌ is a variant of 小).
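The claim that the pairs are different characters can be checked at the byte level. The sketch below assumes the client sends UTF-8 (hence `SET NAMES utf8`); `HEX()` then shows the raw utf8 bytes of each literal, so each radical variant and its full-form character should come out as different byte sequences (for example, ⺌ is U+2E8C while 小 is U+5C0F). The column aliases are illustrative only.

```sql
-- Assumption: the client connection really sends UTF-8.
SET NAMES utf8;

-- Show the utf8 bytes of each reported pair side by side.
SELECT '⺌' AS variant, HEX('⺌') AS variant_hex, '小' AS full_form, HEX('小') AS full_hex
UNION ALL SELECT '⺮', HEX('⺮'), '竹', HEX('竹')
UNION ALL SELECT '⺪', HEX('⺪'), '疋', HEX('疋')
UNION ALL SELECT '⻊', HEX('⻊'), '足', HEX('足')
UNION ALL SELECT '⺶', HEX('⺶'), '羊', HEX('羊');
```

Different bytes only show the values are distinct as stored; whether the primary key treats them as equal is decided by the key column's collation, not by the bytes.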

How to repeat:
LOAD DATA LOCAL INFILE 'C:/path/test.txt' INTO TABLE test
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n';
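The report does not include the table definition, so the sketch below uses a hypothetical one (the column names `kanji` and `meaning` and their types are assumptions) to make the repro self-contained. It also adds the `CHARACTER SET utf8` clause, because without it `LOAD DATA` interprets the file using `character_set_database`. With `LOCAL`, duplicate-key rows are skipped rather than aborting the load (the same as specifying `IGNORE`), and the client reports them in the "Skipped" count.

```sql
-- Hypothetical table: the real definition was not posted.
CREATE TABLE test (
  kanji   VARCHAR(4)   CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  meaning VARCHAR(255) CHARACTER SET utf8 NOT NULL,
  PRIMARY KEY (kanji)
) ENGINE=InnoDB;

-- Declare the file encoding explicitly instead of relying on the database default.
LOAD DATA LOCAL INFILE 'C:/path/test.txt' INTO TABLE test
CHARACTER SET utf8
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n';

-- May list duplicate-entry messages for the skipped rows
-- (whether they appear depends on the server version).
SHOW WARNINGS;
```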
[25 Nov 2010 13:33] Toshiro Mifune
test file for import

Attachment: test.txt (text/plain), 110 bytes.

[25 Nov 2010 14:25] Susanne Ebrecht
We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not the result of a bug.

Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/

If you want these characters to be treated as different characters, use the utf8_bin collation instead.

Thank you for your interest in MySQL.
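The suggestion to switch to utf8_bin rests on the idea that utf8_unicode_ci compares these pairs as equal. That can be checked directly with one of the reported pairs. An explicit `COLLATE` clause overrides the connection collation for the comparison; a result of 1 means "equal" and 0 means "different". If the skipping is collation-driven, the first column should be 1 and the second 0, but that is the hypothesis being tested, not a recorded result.

```sql
SET NAMES utf8;

-- Compare the same pair under the UCA-based collation and the binary collation.
SELECT '⺌' = '小' COLLATE utf8_unicode_ci AS same_unicode_ci,
       '⺌' = '小' COLLATE utf8_bin        AS same_bin;
```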
[25 Nov 2010 14:35] Toshiro Mifune
Using utf8_bin instead does not solve the problem. They are being skipped as well.
[25 Nov 2010 14:44] Toshiro Mifune
Changing collation does not solve the issue.
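One thing worth checking, offered as an assumption rather than a diagnosis: `ALTER TABLE ... COLLATE` (or `DEFAULT COLLATE`) changes only the table's default, which applies to columns added later. Existing columns keep their own collation, so the primary key could still be compared with utf8_unicode_ci. The sketch below inspects the per-column collation and changes the key column itself. The column name `kanji` and type `VARCHAR(4)` are hypothetical, as in the repro sketch.

```sql
-- The Collation column shows what each column actually uses.
SHOW FULL COLUMNS FROM test;

-- Change the key column itself (hypothetical name/type); MODIFY keeps the PRIMARY KEY.
ALTER TABLE test
  MODIFY kanji VARCHAR(4) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;

-- Alternative: convert every character column of the table at once.
-- ALTER TABLE test CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

-- Empty the table so rows from the earlier load do not collide on reload.
TRUNCATE TABLE test;

LOAD DATA LOCAL INFILE 'C:/path/test.txt' INTO TABLE test
CHARACTER SET utf8
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n';
```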
[26 Nov 2010 7:27] Susanne Ebrecht
This isn't a bug.

The behaviour you describe is expected.

Please ask at http://forums.mysql.com/ for more information.
[26 Nov 2010 8:17] Toshiro Mifune
How can this be expected behaviour when the character "⺦", which is also from the CJK_RADICALS_SUPPLEMENT block, is not skipped like the other five? I do not understand...
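A row is skipped only if its key compares equal to a key that is already in the table. So a radical such as ⺦ is skipped only if some other row in the same file compares equal to it under the key's collation. Being in the CJK_RADICALS_SUPPLEMENT block is not enough by itself. To see exactly which values collide, the sketch below loads the file into a hypothetical keyless staging table (`test_staging`, with the same assumed column names) and groups the rows under utf8_unicode_ci. Every group with more than one member would collide on the primary key.

```sql
-- Hypothetical staging table with no key, so every line of the file is kept.
CREATE TABLE test_staging (
  kanji   VARCHAR(4)   CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  meaning VARCHAR(255) CHARACTER SET utf8 NOT NULL
);

LOAD DATA LOCAL INFILE 'C:/path/test.txt' INTO TABLE test_staging
CHARACTER SET utf8
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n';

-- Group values that compare equal under utf8_unicode_ci and keep only collisions.
SELECT GROUP_CONCAT(kanji SEPARATOR ' ') AS colliding_values,
       COUNT(*) AS n
FROM test_staging
GROUP BY kanji COLLATE utf8_unicode_ci
HAVING COUNT(*) > 1;
```

If ⺦ does not show up in any group, then under that collation no other value in this file compares equal to it.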