Bug #41328 Changing Collation Order of an existing Table fails, latin1_german2_ci
Submitted: 9 Dec 2008 13:25 Modified: 7 Apr 2009 6:02
Reporter: bodo karpowski Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DDL Severity:S4 (Feature request)
Version:5.0, 5.1, 6.0 OS:Windows (All platforms)
Assigned to: CPU Architecture:Any

[9 Dec 2008 13:25] bodo karpowski
Description:
The following ALTER TABLE Statement takes no effect in an existing table
with more ore less German umlauts. 

 'ä' must be treated as 'ae', etc. 

ALTER TABLE TPerson
 CHARACTER SET=latin1,
 COLLATE=latin1_german2_ci;

The SHOW CREATE TABLE TPerson shows the correct collation value, but
as reported.

B.K.

How to repeat:
CREATE DATABASE IF NOT EXISTS DBeispiel;

USE DBeispiel;

DROP TABLE IF EXISTS TPerson;

CREATE TABLE TPerson (
  Name VARCHAR(35),
  Vorname VARCHAR (25)
)ENGINE=INNODB
-- COLLATE=latin1_german2_ci
;

INSERT INTO TPerson
VALUES ('Kuhn','Fritz');

INSERT INTO TPerson
VALUES ('Kowalski','Werner');

INSERT INTO TPerson
VALUES ('Kähn','Harry');

INSERT INTO TPerson
VALUES ('Meßmer','Udo');

INSERT INTO TPerson
VALUES ('Kaehn','Harry');

INSERT INTO TPerson
VALUES ('Kabel','Heidi');

INSERT INTO TPerson
VALUES ('Messer','Angelique');

INSERT INTO TPerson
VALUES ('Kahn','Oliver');

INSERT INTO TPerson
VALUES ('König','Frank');

INSERT INTO TPerson
VALUES ('Koenig','Frank');

INSERT INTO TPerson
VALUES ('König','Frank');

INSERT INTO TPerson
VALUES ('Messmer','Erika');

INSERT INTO TPerson
VALUES ('Mesmer','Karl');

INSERT INTO TPerson
VALUES ('Meßner','Paula');

INSERT INTO TPerson
VALUES ('Kong','King');

INSERT INTO TPerson
VALUES ('Kühn','Harry');

ALTER TABLE TPerson
 CHARACTER SET=latin1,
 COLLATE=latin1_german2_ci;

-- --------------------------------------
/*
 Test with :

SELECT * FROM TPerson
ORDER BY Name;

*/
[10 Dec 2008 10:07] Susanne Ebrecht
Hello Bodo,

mysql> SELECT * FROM TPerson
    -> ORDER BY Name;
+----------+-----------+
| Name     | Vorname   |
+----------+-----------+
| Kabel    | Heidi     | 
| Kähn    | Harry     | 
| Kaehn    | Harry     | 
| Kahn     | Oliver    | 
| König   | Frank     | 
| König   | Frank     | 
| Koenig   | Frank     | 
| Kong     | King      | 
| Kowalski | Werner    | 
| Kühn    | Harry     | 
| Kuhn     | Fritz     | 
| Mesmer   | Karl      | 
| Messer   | Angelique | 
| Messmer  | Erika     | 
| Meßmer  | Udo       | 
| Meßner  | Paula     | 
+----------+-----------+

As German I agree with you that this looks wrong on the first view.

But it is right.

German2 also is known as phone book. I just took here the phone book of Aachen and figured out there is non order for 'ö' and 'oe'.

Some examples:
Koch
Körfer
Koerfer
Körfer
Körling

Jäger
Jäger
Jaeger
Jäger
Jaeger
Jäger
...

According to DIN-5007-2 there is non rule given for ordering:

http://de.wikipedia.org/wiki/DIN_5007

Be careful the given link is German!

So German sorting rules doesn't say that ä should be sorted before or after ae.

Here the second sort condition is important.

MySQL is sorting:
Förster L.
Foerster L.

Phone book of Aachen is sorting (printed version of 2008):
Foerster L.
Förster L.

I agree with you that we should be similiar here.
That is not a bug in MySQL because the DIN doesn't say something according to this topic. It is just more familiar because German phone book is doing it that way.
[13 Dec 2008 22:23] bodo karpowski
To http://bugs.mysql.com/?id=41328&edit=2

Dear Susanne,

Thank you for prompt response on my complaint. I was busy in the last 
days, so my reply is a little bit late.

By the way, our dialogue is made difficult by the fact, that my email-
online-client is not yet changed to utf8.

1.) My opinion is, that I described my problem clearly:

I've got about 1 and a half thousand existing data records containing 
German surnames.
I recognized, that the normal sorting order will produce odd results. Now
I tried to correct this to launch the command (on an exististing table):

ALTER TABLE TPerson
 CHARACTER SET=latin1,
 COLLATE=latin1_german2_ci; // Phone book order

Afterwards there was no effect to be seen on the query result. although
the following command showed the correct configuration :  

SHOW CREATE TABLE TSchueler;

Original Result:

CREATE TABLE `tschueler` (
  `SchNr` int(11) NOT NULL auto_increment,
  `Name` varchar(30) character set latin1 NOT NULL,
  `Vorname` varchar(35) character set latin1 default NULL,
  `G` enum('m','w') character set latin1 default 'm',
  `GebDatum` date default NULL,
  `FBerBez` varchar(16) character set latin1 default NULL,
  `EinSchulung` date default NULL,
  `Ausgeschieden` date default NULL,
  `FBetrNr` int(11) default NULL,
  PRIMARY KEY  (`SchNr`),
  UNIQUE KEY `UN_TSchueler` (`Name`,`Vorname`,`G`,`GebDatum`),
  KEY `FK_TBerufe` (`FBerBez`),
  KEY `FK_TBetriebe` (`FBetrNr`),
  CONSTRAINT `FK_TBerufe` FOREIGN KEY (`FBerBez`) REFERENCES `tberufe` (`BerBez`),
  CONSTRAINT `FK_TBetriebe` FOREIGN KEY (`FBetrNr`) REFERENCES `tbetriebe` (`BetrNr`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german2_ci;

In spite of the (changed) configuration the ordering didn't change, it did't take any
effect !

Therefore I tried to develop a script, that will show it clearly. First a
db is created, the data will be inserted and at the end the collation order
will be changed by an ALTER TABLE command (I sent it to you).

The following command :

SELECT * 
FROM TPerson // The example !
ORDER BY Name; 

produces the following result :

Name	Vorname
-------------------
Kabel	Heidi
Kaehn	Harry
Kahn	Oliver
Koenig	Frank
Kong	King
Kowalski	Werner
Kuhn	Fritz
Kühn	Harry
Kähn	Harry
König	Frank
König	Frank
Mesmer	Karl
Messer	Angelique
Messmer	Erika
Meßmer	Udo
Meßner	Paula

It is clear to see, that the ALTER TABLE - command did't take any effect, and 
I'm dreadful sorry,

I call it a bug !
=================

2.) As I understand your reply you want to explain what phone book ordering
means. But that wasn't the problem.

You wrote :

"I agree with you that we should be similiar here.
That is not a bug in MySQL because the DIN doesn't say something
according to this topic. It is just more familiar because German phone
book is doing it that way."

You are right, if I would have get the same DB- response as you got, I must have
been satisfied.

The rule is :  a German umlaut is (exactly) treated like a vowel + 'e'.

I could (or better say, I must :-) ) live with that. But let's look to
your resultset. 

+----------+-----------+
| Name     | Vorname   |
+----------+-----------+
| Kabel    | Heidi     |
| Kähn    | Harry     |
| Kaehn    | Harry     |
| Kahn     | Oliver    |
| König   | Frank     |
| König   | Frank     |
| Koenig   | Frank     |
| Kong     | King      |
| Kowalski | Werner    |
| Kühn    | Harry     |
| Kuhn     | Fritz     |
| Mesmer   | Karl      |
| Messer   | Angelique |
| Messmer  | Erika     |
| Meßmer  | Udo       |
| Meßner  | Paula     |
+----------+-----------+

Isn't it a little bit strange to see that the two Königs are kept together,
while the insert order was König - Koenig - König ? Does the software
really fulfill the rule ?

It's only a lint, yes I know.

3.) I had a look into my Berlin phone book, because I haven't any from Aachen.

The foresters are really seemingly mixed up in the column. But in every case
Foerster comes before Förster dependant on the following first name.

I'd like to suggest you to develop a more professional collation order, because
every German secretary will do it in this way.

Let's call it :    "real phone book"   in syntax: "latin1_german3_ci"

//------------------------------------------

Best Wishes, Bodo
[7 Apr 2009 6:02] Alexander Barkov
This is not a bug.

This query only changes the default character set and collation for
the table:

ALTER TABLE t1 CHARACTER SET latin1 COLLATE latin1_german2_ci;

and affects only newly added columns. It does not change the existing
columns.

This is documented in the manual.

If you need to change the existing columns you need to use
another query:

ALTER TABLE t1 CONVERT TO CHARACTER SET latin1 COLLATE latin1_german2_ci;