Bug #6786 accented and non-accented character can't be stored is a unique key on utf8 dat
Submitted: 24 Nov 2004 1:13 Modified: 24 Nov 2004 11:06
Reporter: Jorge del Conde Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.7 OS:
Assigned to: CPU Architecture:Any

[24 Nov 2004 1:13] Jorge del Conde
Description:
When using latin1 I am able to store both of these email addresses in a unique key:

cédric.xxxxxx@vvvvvv.uunet.be.xxx
cedric.xxxxxx@vvvvvv.uunet.be.xxx

If you run "alter table contacts modify email varchar(80) character set utf8 collate utf8_unicode_ci;" after loading this dump you will get a duplicate key error.

The comparison algorithm seems to be identifying the accented e (é) (latin1 0xE9) as identical to a standard e (latin1 0x65, utf8 0x65) rather than the utf-8 accented e (UTF-8 0xC3 0xA9, U+00E9).

I was unable to repeat this in 4.1.2 & 4.1.4

How to repeat:
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE="NO_AUTO_VALUE_ON_ZERO" */;

DROP TABLE IF EXISTS `contacts`;
CREATE TABLE `contacts` (
  `c_id` int(11) NOT NULL default '0',
  `last_name` varchar(80) default NULL,
  `first_name` varchar(80) default NULL,
  `email` varchar(80) character set latin1 collate latin1_bin default NULL,
  UNIQUE KEY `contacts$c_id` (`c_id`),
  UNIQUE KEY `contacts$email` (`email`),
  KEY `contacts$last_name` (`last_name`),
  KEY `contacts$first_name` (`first_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

/*!40000 ALTER TABLE `contacts` DISABLE KEYS */;
LOCK TABLES `contacts` WRITE;
INSERT INTO `contacts` VALUES (1076958202,'xxxxxx','cédric','cédric.xxxxxx@vvvvvv.uunet.be.xxx'),(1076958203,'xxxxxx','cÂŽédric','cedric.xxxxxx@vvvvvv.uunet.be.xxx');
UNLOCK TABLES;
/*!40000 ALTER TABLE `contacts` ENABLE KEYS */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;

Then run the following ALTER TABLE command:

ALTER TABLE contacts modify email varchar(80) character set utf8 collate utf8_unicode_ci
[24 Nov 2004 11:06] Sergei Golubchik
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

Different collations have different rules how to compare characters - it's what collations are all about.