Description:
When using latin1 I am able to store both of these email addresses in a unique key:
cédric.xxxxxx@vvvvvv.uunet.be.xxx
cedric.xxxxxx@vvvvvv.uunet.be.xxx
If you run "alter table contacts modify email varchar(80) character set utf8 collate utf8_unicode_ci;" after loading this dump you will get a duplicate key error.
The comparison algorithm seems to be identifying the accented e (é) (latin1 0xE9) as identical to a standard e (latin1 0x65, utf8 0x65) rather than the utf-8 accented e (UTF-8 0xC3 0xA9, U+00E9).
I was unable to repeat this in 4.1.2 & 4.1.4
How to repeat:
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE="NO_AUTO_VALUE_ON_ZERO" */;
DROP TABLE IF EXISTS `contacts`;
CREATE TABLE `contacts` (
`c_id` int(11) NOT NULL default '0',
`last_name` varchar(80) default NULL,
`first_name` varchar(80) default NULL,
`email` varchar(80) character set latin1 collate latin1_bin default NULL,
UNIQUE KEY `contacts$c_id` (`c_id`),
UNIQUE KEY `contacts$email` (`email`),
KEY `contacts$last_name` (`last_name`),
KEY `contacts$first_name` (`first_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40000 ALTER TABLE `contacts` DISABLE KEYS */;
LOCK TABLES `contacts` WRITE;
INSERT INTO `contacts` VALUES (1076958202,'xxxxxx','cédric','cédric.xxxxxx@vvvvvv.uunet.be.xxx'),(1076958203,'xxxxxx','cÂŽédric','cedric.xxxxxx@vvvvvv.uunet.be.xxx');
UNLOCK TABLES;
/*!40000 ALTER TABLE `contacts` ENABLE KEYS */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
Then run the following ALTER TABLE command:
ALTER TABLE contacts modify email varchar(80) character set utf8 collate utf8_unicode_ci