Bug #85989 Danish collations latin1 and utf8 treats 'å' and 'aa' differently
Submitted: 18 Apr 2017 18:57 Modified: 20 Apr 2017 11:37
Reporter: Martin Larsson Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.5/5.6/5.7/8.0 OS:Any
Assigned to: CPU Architecture:Any

[18 Apr 2017 18:57] Martin Larsson
Description:
Creating a database with collation 'utf8_danish_ci' makes MySQL treat 'aa' as equal to 'å'. But creating the database with collation 'latin1_danish_ci' makes 'aa' and 'å' different. This is not about to sorting, where 'aa' and 'å' should be sorted as equals, but about primary keys and unique indexes where 'aa' and 'å' should be considered as two different strings.

How to repeat:
Consider this simple script:
create table n(
  t varchar(5),
  primary key(t)
);
insert into n values ('å');
insert into n values ('aa');

If the database was created with:
create database db CHARACTER SET latin1 COLLATE latin1_danish_ci;
this will work fine.

However, if the database was created with:
create database db CHARACTER SET utf8 COLLATE utf8_danish_ci;
the script will fail with:
Error: Duplicate entry 'aa' for key 'PRIMARY'
SQLState:  23000
ErrorCode: 1062

Suggested fix:
There's actually no real fix for this. Using the lating1_danish_c1 means 'aa' is sorted as 'a' (at the beginning of the alphabet), and not as 'å' (at the end), as it should. 
However, using the corresponding utf8-collation, means 'aa' and 'å' are treated as equal strings. This makes MySQL impossible to use for us.
[18 Apr 2017 22:33] MySQL Verification Team
Thank you for the bug report. Looks like duplicate\related to bug:

https://bugs.mysql.com/bug.php?id=37571

mysql 5.7 > SHOW VARIABLES LIKE "%VERSION%";
+-------------------------+---------------------------------------+
| Variable_name           | Value                                 |
+-------------------------+---------------------------------------+
| innodb_version          | 5.7.19                                |
| protocol_version        | 10                                    |
| slave_type_conversions  |                                       |
| tls_version             | TLSv1,TLSv1.1                         |
| version                 | 5.7.19                                |
| version_comment         | Source distribution pull: 2017-apr-05 |
| version_compile_machine | x86_64                                |
| version_compile_os      | Linux                                 |
+-------------------------+---------------------------------------+
8 rows in set (0,02 sec)

mysql 5.7 > create database db CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 1 row affected (0,00 sec)

mysql 5.7 > use db
Database changed
mysql 5.7 > create table n(
    ->   t varchar(5),
    ->   primary key(t)
    -> );
Query OK, 0 rows affected (0,31 sec)

mysql 5.7 > insert into n values ('å');
Query OK, 1 row affected (0,05 sec)

mysql 5.7 > insert into n values ('aa');
Query OK, 1 row affected (0,07 sec)

mysql 5.7 > select * from n;
+----+
| t  |
+----+
| aa |
| å  |
+----+
2 rows in set (0,00 sec)

mysql 5.7 > drop database db;
Query OK, 1 row affected (0,39 sec)

mysql 5.7 > create database db CHARACTER SET utf8 COLLATE utf8_danish_ci;
Query OK, 1 row affected (0,01 sec)

mysql 5.7 > use db
Database changed
mysql 5.7 > create table n(
    ->   t varchar(5),
    ->   primary key(t)
    -> );
Query OK, 0 rows affected (0,37 sec)

mysql 5.7 > insert into n values ('å');
Query OK, 1 row affected (0,06 sec)

mysql 5.7 > insert into n values ('aa');
ERROR 1062 (23000): Duplicate entry 'aa' for key 'PRIMARY'

mysql 5.6 > SHOW VARIABLES LIKE "%VERSION%";
+-------------------------+---------------------------------------+
| Variable_name           | Value                                 |
+-------------------------+---------------------------------------+
| innodb_version          | 5.6.37                                |
| protocol_version        | 10                                    |
| slave_type_conversions  |                                       |
| version                 | 5.6.37                                |
| version_comment         | Source distribution pull: 2017-apr-05 |
| version_compile_machine | x86_64                                |
| version_compile_os      | Linux                                 |
+-------------------------+---------------------------------------+
7 rows in set (0,01 sec)

mysql 5.6 > create database db CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 1 row affected (0,02 sec)

mysql 5.6 > use db
Database changed
mysql 5.6 > create table n(   t varchar(5),   primary key(t) );
Query OK, 0 rows affected (0,35 sec)

mysql 5.6 > insert into n values ('å');
Query OK, 1 row affected (0,05 sec)

mysql 5.5 > SHOW VARIABLES LIKE "%VERSION%";
+-------------------------+---------------------------------------+
| Variable_name           | Value                                 |
+-------------------------+---------------------------------------+
| innodb_version          | 5.5.56                                |
| protocol_version        | 10                                    |
| slave_type_conversions  |                                       |
| version                 | 5.5.56                                |
| version_comment         | Source distribution pull: 2017-apr-05 |
| version_compile_machine | x86_64                                |
| version_compile_os      | Linux                                 |
+-------------------------+---------------------------------------+
7 rows in set (0,00 sec)

mysql 5.5 > create database db CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 1 row affected (0,00 sec)

mysql 5.5 > use db
Database changed
mysql 5.5 > create table n(   t varchar(5),   primary key(t) );
Query OK, 0 rows affected (0,09 sec)

mysql 5.5 > insert into n values ('å');
Query OK, 1 row affected (0,05 sec)

mysql 5.5 > insert into n values ('aa');
Query OK, 1 row affected (0,05 sec)

mysql 5.5 > drop database db;
Query OK, 1 row affected (0,05 sec)

mysql 5.5 > create database db CHARACTER SET utf8 COLLATE utf8_danish_ci;
Query OK, 1 row affected (0,00 sec)

mysql 5.5 > use db
Database changed
mysql 5.5 > create table n(   t varchar(5),   primary key(t) );
Query OK, 0 rows affected (0,07 sec)

mysql 5.5 > insert into n values ('å');
Query OK, 1 row affected (0,06 sec)

mysql 5.5 > insert into n values ('aa');
ERROR 1062 (23000): Duplicate entry 'aa' for key 'PRIMARY'
mysql 5.5 > 

mysql 5.6 > insert into n values ('aa');
Query OK, 1 row affected (0,04 sec)

mysql 5.6 > select * from n;
+----+
| t  |
+----+
| aa |
| å  |
+----+
2 rows in set (0,00 sec)

mysql 5.6 > drop database db;
Query OK, 1 row affected (0,21 sec)

mysql 5.6 > create database db CHARACTER SET utf8 COLLATE utf8_danish_ci;
Query OK, 1 row affected (0,00 sec)

mysql 5.6 > use db
Database changed
mysql 5.6 > create table n(   t varchar(5),   primary key(t) );
Query OK, 0 rows affected (0,31 sec)

mysql 5.6 > insert into n values ('å');
Query OK, 1 row affected (0,04 sec)

mysql 5.6 > insert into n values ('aa');
ERROR 1062 (23000): Duplicate entry 'aa' for key 'PRIMARY'
[19 Apr 2017 5:16] Martin Larsson
While this is similar to https://bugs.mysql.com/bug.php?id=37571, 37571 concerns the inconvenience that latin1_danish_ci sorts 'aa' wrong. This report is about inserts failing with a 'duplicate key' exception when using utf8_danish_ci. Having the sort order wrong is terrible UX, but the solution is to use utf8 throughout. This would be the most sensible option for new development anyway. However, having that fail with duplicate key errors for unique indexes and primary keys is rather worse.
[19 Apr 2017 6:44] Martin Larsson
Done criteria for both this bug and https://bugs.mysql.com/bug.php?id=37571 would be that, given my table definition above, the following inserts:

insert into n values ('å');
insert into n values ('aa');
insert into n values ('ø');
insert into n values ('aab');

should succeed. And then this select:

select * from n order by t;

should return 

+-----+
| t   |
+-----+
| ø   |
| aa  |
| å   |
| aab |
+-----+
4 rows in set (0.00 sec)

The consequence is that the city of Aachen will be sorted wrong. This is unavoidable.
[19 Apr 2017 13:35] MySQL Verification Team
Thank you for the feedback.

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 8.0.2-dmr Source distribution - pull: 2017-apr-05

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 8.0 > create database db CHARACTER SET latin1 COLLATE latin1_danish_ci;
Query OK, 1 row affected (0.08 sec)

mysql 8.0 > use db
Database changed
mysql 8.0 > create table n(
    -> t varchar(5),
    -> primary key(t)
    -> );
Query OK, 0 rows affected (0.39 sec)

mysql 8.0 > insert into n values ('å');
Query OK, 1 row affected (0.06 sec)

mysql 8.0 > insert into n values ('aa');
Query OK, 1 row affected (0.02 sec)

mysql 8.0 > select * from n;
+----+
| t  |
+----+
| aa |
| å  |
+----+
2 rows in set (0.02 sec)

mysql 8.0 > drop database db;
Query OK, 1 row affected (0.42 sec)

mysql 8.0 > create database db CHARACTER SET utf8 COLLATE utf8_danish_ci;
Query OK, 1 row affected (0.03 sec)

mysql 8.0 > create table n(
    ->   t varchar(5),
    ->   primary key(t)
    -> );
Query OK, 0 rows affected (0.24 sec)

mysql 8.0 > insert into n values ('å');
Query OK, 1 row affected (0.02 sec)

mysql 8.0 > insert into n values ('aa');
ERROR 1062 (23000): Duplicate entry 'aa' for key 'PRIMARY'
mysql 8.0 >
[20 Apr 2017 9:11] Steinar Gunderson
Posted by developer:
 
This is, quite simply, not a bug. latin1_danish_ci and utf8_danish_ci are both accent- and case-insensitive (comparing first level only), and as such, aa and a-with-ring compare equal (they differ only on second level).

In MySQL 8.0, you have utf8mb4_da_0900_as_cs, which is accent- and case-sensitive, making them compare different for the purposes of UNIQUE.
[20 Apr 2017 11:37] Martin Larsson
Good to hear you have fixed this non-bug i 8.0. When can we expect 8.0 to be generally available?
[20 Apr 2017 12:17] Steinar Gunderson
There is no published release date for 8.0 GA, but there are milestone releases available.