Bug #34362 Loaded data appears to be duplicate but should not be
Submitted: 6 Feb 2008 19:59
Modified: 6 Feb 2008 21:21
Reporter: Troy Pearson
Status: Closed
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 5.0.27
OS: Windows
Assigned to:
CPU Architecture: Any
Tags: ucs2, ucs2_bin

[6 Feb 2008 19:59] Troy Pearson
Description:
When attempting to load the data in the attached text file, 2 of the 6 records fail to load.

On further investigation, they appear to fail because of the UNIQUE constraint on the ENTRY column.  The values are not the same, but something is causing the unique index to be violated.

Is there a way to get the error details reported back?  I tried the --debug option without success.
Is there another character set or collation that needs to be specified to load this type of data?
Why do the records appear to be duplicates to the database?  The database is utf8 by default, but here I am specifying the type and collation for the column.

Any insight would be greatly appreciated.

How to repeat:
Unzip the attached ZIP file into a directory.
Edit the TESTTABLE.bat file to set your server, username, etc. in the variables at the top.
Run TESTTABLE.bat from the command prompt.

D:\French\TESTTABLE
D:\French>mysqlimport -h mysqlserver -P 3306  -u troyp -pxxx troyp  --fields-terminated-by=","  --lines-terminated-by="\n"  --default-character-set=ucs2 --local D:\French\TESTTABLE.txt
troyp.TESTTABLE: Records: 6  Deleted: 0  Skipped: 2  Warnings: 5

Remove
  CONSTRAINT DC_FR_PRIM_UK UNIQUE(ENTRY)
from the table definition in TESTTABLE_SEED.sql and all the records load.

[6 Feb 2008 20:00] Troy Pearson
Script to duplicate the error

Attachment: TestTable.ZIP (application/x-zip-compressed, text), 877 bytes.

[6 Feb 2008 21:21] Troy Pearson
The cause was found to be a comma embedded in the entry data; the comma was also being used as the field separator.
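
For anyone hitting the same symptom, a quick pre-load check of the data file can show whether the field separator also occurs inside a value. The following is only a minimal sketch and is not part of the original report: the function names, the two-column layout, and the sample rows are all hypothetical, and the attached TESTTABLE.txt is not reproduced here. It flags lines whose field count differs from the most common one, and lists the first-field values that a plain split on the separator would treat as identical.

```python
from collections import Counter


def find_suspect_lines(lines, separator=",", expected_fields=None):
    """Return (line_number, field_count, text) for lines with an unexpected field count.

    If expected_fields is None, the most common field count in the input
    is assumed to be the correct one.
    """
    rows = [(n, line.rstrip("\r\n")) for n, line in enumerate(lines, start=1)]
    rows = [(n, text) for n, text in rows if text]  # ignore blank lines
    counts = [text.count(separator) + 1 for _, text in rows]
    if expected_fields is None and counts:
        expected_fields = Counter(counts).most_common(1)[0][0]
    return [(n, c, text) for (n, text), c in zip(rows, counts) if c != expected_fields]


def first_fields(lines, separator=","):
    """First field of each non-blank line, as a plain split on the separator sees it."""
    return [line.rstrip("\r\n").split(separator)[0] for line in lines if line.strip()]


# Hypothetical sample data (NOT the attached file): two entries contain a
# comma, so a split on "," cuts both of them down to the same first field.
sample = [
    "alpha,1\n",
    "beta,2\n",
    "gamma, first,3\n",
    "gamma, second,4\n",
    "delta,5\n",
    "epsilon,6\n",
]

suspects = find_suspect_lines(sample)
key_counts = Counter(first_fields(sample))
duplicates = [key for key, n in key_counts.items() if n > 1]

for line_number, field_count, text in suspects:
    print(f"line {line_number}: {field_count} fields: {text!r}")
print("first-field values that collide after splitting:", duplicates)
```

If a check like this flags lines, the usual remedies are to pick a separator that never occurs in the data, or to quote the fields in the file and pass mysqlimport the matching --fields-enclosed-by or --fields-optionally-enclosed-by option.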