Bug #32181 Some utf8 characters overlap
Submitted: 8 Nov 2007 9:43 Modified: 8 Nov 2007 12:35
Reporter: Bogdan Kecman
Status: Not a Bug
Category: MySQL Server: Charsets Severity: S3 (Non-critical)
Version: 5.0, 5.1 OS: Any
Assigned to: Ramil Kalimullin CPU Architecture: Any
Tags: utf8

[8 Nov 2007 9:43] Bogdan Kecman
Description:
Some utf8 characters overlap each other (each pair below is treated as a duplicate in a unique key):
(x'e285a0', x'e285b0')
(x'e285a1', x'w285b1')
(x'e285a2', x'w285b2')
(x'e285a3', x'w285b3')
(x'e285a4', x'w285b4')
(x'e285a5', x'w285b5')
(x'e285a6', x'w285b6')
(x'e285a7', x'w285b7')
(x'e285a8', x'w285b8')
(x'e285a9', x'w285b9')

How to repeat:
CREATE TABLE t1 ( k char(1) NOT NULL, note varchar(16) NULL, PRIMARY KEY(k));
INSERT INTO t1 VALUES (x'e285b0', "S1");

Query OK, 1 row affected

INSERT INTO t1 VALUES (x'e285a0', "L1");

ERROR 1062 (23000): Duplicate entry for key 1

Suggested fix:
n/a
[8 Nov 2007 10:14] Bogdan Kecman
Overlapping characters again (there was a typo in the previous comment):
(x'e285a0', x'e285b0')
(x'e285a1', x'e285b1')
(x'e285a2', x'e285b2')
(x'e285a3', x'e285b3')
(x'e285a4', x'e285b4')
(x'e285a5', x'e285b5')
(x'e285a6', x'e285b6')
(x'e285a7', x'e285b7')
(x'e285a8', x'e285b8')
(x'e285a9', x'e285b9')
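
To see which characters these hex literals stand for, the bytes can be decoded as UTF-8 outside of MySQL. The following is a minimal Python sketch, not part of the original report; the helper name `describe` is hypothetical, and the character names come from Python's bundled Unicode database.

```python
import unicodedata

def describe(hex_bytes: str) -> str:
    """Decode a hex string as UTF-8 and return its code point and Unicode name."""
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The ten pairs from the list above: x'e285a0'..x'e285a9' and x'e285b0'..x'e285b9'.
pairs = [(f"e285a{i}", f"e285b{i}") for i in range(10)]

for first_hex, second_hex in pairs:
    print(f"x'{first_hex}' = {describe(first_hex)}; "
          f"x'{second_hex}' = {describe(second_hex)}")
```

Each pair decodes to a Roman numeral character and its small (lowercase) counterpart, for example U+2160 and U+2170 for the first pair.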
[8 Nov 2007 12:35] Ramil Kalimullin
Note: in the utf8_general_ci collation, the ROMAN NUMERAL ONE (TWO, THREE, etc.) and
SMALL ROMAN NUMERAL ONE (TWO, THREE, etc.) symbols compare as equal.

Use the utf8_bin collation to differentiate them.

For example:
mysql> create table t1 (a char(1) not null, primary key(a)) character set=utf8 collate utf8_bin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values (x'e285a0');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (x'e285b0');
Query OK, 1 row affected (0.00 sec)
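
The difference between the two kinds of comparison can be illustrated outside of MySQL. The following Python sketch is only an analogy, under the assumption that Unicode case folding stands in for a case-insensitive collation and raw byte comparison stands in for a binary one; it does not reproduce MySQL's collation implementation, and both function names are hypothetical.

```python
# The two values from the example above.
upper = bytes.fromhex("e285a0").decode("utf-8")  # U+2160 ROMAN NUMERAL ONE
lower = bytes.fromhex("e285b0").decode("utf-8")  # U+2170 SMALL ROMAN NUMERAL ONE

def equal_ignoring_case(a: str, b: str) -> bool:
    """Analogy for a case-insensitive collation: compare case-folded strings."""
    return a.casefold() == b.casefold()

def equal_binary(a: str, b: str) -> bool:
    """Analogy for a binary collation: compare the encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")

print(equal_ignoring_case(upper, lower))  # case-folded comparison
print(equal_binary(upper, lower))         # byte-wise comparison
```

Under case folding the two characters are equal, which corresponds to the duplicate-key error on a unique column, while the byte-wise comparison keeps them distinct.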