Bug #27463 collation returns wrong results
Submitted: 27 Mar 2007 9:09 Modified: 28 Mar 2007 10:41
Reporter: Florian Rissner Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.24a OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Tags: collation, latin1_german1_ci, latin1_german2_ci

[27 Mar 2007 9:09] Florian Rissner
Description:

some collations return multiple results even if a unique result is expected

some collations used on PK forbid insert of 2 obviously different values
 

How to repeat:
first test:
CREATE TABLE `coltest` (
  `ID` int(10) unsigned NOT NULL auto_increment,
  `Name_german1` varchar(45) collate latin1_german1_ci NOT NULL,
  `Name_german2` varchar(45) character set latin1 collate latin1_german2_ci NOT NULL,
  `Name_general_ci` varchar(45) character set latin1 collate latin1_bin NOT NULL,
  PRIMARY KEY  (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

INSERT INTO `coltest` (`ID`,`Name_german1`,`Name_german2`,`Name_general_ci`) VALUES 
 (1,'boern','boern','boern'),
 (2,'börn','börn','börn'),
 (3,'born','born','born');

SELECT * FROM coltest c where Name_german1 = 'börn';
>> returns rows 2 and 3; expected only row 2;

SELECT * FROM coltest c where Name_german2 = 'börn';
>> returns rows 1 and 2; expected only row 2;

SELECT * FROM coltest c where Name_general_ci = 'börn';
>> returns row 2; expected only row 2;  

second test:
create table coltest2 (Name varchar(45) character set latin1 collate latin1_german2_ci NOT NULL, primary Key(Name));

insert into coltest2 (Name), value('börn');
>> is OK
insert into coltest2 (Name), value('boern');
>> returns error (duplicate Key)

Suggested fix:
collation should only affect sorting
[27 Mar 2007 9:25] Florian Rissner
sorry: createstatement is not optimal:

change this line 

`Name_general_ci` varchar(45) character set latin1 collate latin1_bin NOT
NULL,

to

`Name_general_ci` varchar(45) character set latin1 collate latin1_general_ci NOT
NULL,

difference:
if You use the first one, You'll have all chars of this column represented as Hex in your database-dumps
[27 Mar 2007 22:25] MySQL Verification Team
Thank you for the bug report. This is an expected behavior and documented
at:

10.10.2. West European Character Sets
http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html
[28 Mar 2007 10:41] Florian Rissner
sorry, 
I translated "collation" as "sorting" (which is wrong)
"collation" means something like "comparision" 

again:
if you need unique results in the above example use some collation of
latin1_general_ci, latin1_general_cs, latin1_swedish_ci
(though i don't get the difference between those)