Bug #45517 Searching with umlaut (latin1_german2_ci) with LIKE
Submitted: 16 Jun 2009 8:27 Modified: 30 Jun 2009 6:09
Reporter: Richard Teubel Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.45 5.0.83 5.1.35 OS:Linux
Assigned to: CPU Architecture:Any
Tags: latin1_german2_ci, like, umlaut

[16 Jun 2009 8:27] Richard Teubel
Description:
I have a wordlisttable with a unique key on this words. The charset is latin1_german2_ci. If I insert the Word "Blaubär" and "Blaubaer" I get error with duplicate key. Ok. If I Select this two words with equal sign, I get the same ID. That is ok. But, if I select this with LIKE, i get only the ID with "Blaubär". For "Blaubaer" i get an empty set.

How to repeat:
create database test_latin1_german2_ci;
use test_latin1_german2_ci;

CREATE TABLE IF NOT EXISTS `Wordlist` (
  `ID` int(11) unsigned NOT NULL auto_increment,
  `word` varchar(50) collate latin1_german2_ci NOT NULL ,
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `word` (`word`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 COLLATE=latin1_german2_ci AUTO_INCREMENT=1 ;

INSERT INTO Wordlist SET word = "Blaubär";
-- Query OK, 1 row affected (0.05 sec)

INSERT INTO Wordlist SET word = "Blaubaer";
-- ERROR 1062 (23000): Doppelter Eintrag 'Blaubaer' für Schlüssel 2

SELECT ID FROM Wordlist WHERE word = "Blaubär";
+----+
| ID |
+----+
|  1 |
+----+
1 row in set (0.00 sec)

SELECT ID FROM Wordlist WHERE word = "Blaubaer";
+----+
| ID |
+----+
|  1 |
+----+
1 row in set (0.00 sec)

SELECT ID FROM Wordlist WHERE word LIKE "Blaubär";
+----+
| ID |
+----+
|  1 |
+----+
1 row in set (0.00 sec)

SELECT ID FROM Wordlist WHERE word LIKE "Blaubaer";
Empty set (0.00 sec)
[16 Jun 2009 8:54] Sveta Smirnova
Thank you for the report.

But version 5.0.45 is old. Please try with current version 5.0.83 and if problem still exists provide output of:

show variables like 'char%';
show variables like 'coll%';
[16 Jun 2009 10:49] Richard Teubel
The same problem also in version 5.0.83 and 5.1.35.

mysql> select version();
+------------+
| version()  |
+------------+
| 5.0.83-log |
+------------+
1 row in set (0.00 sec)

mysql> show variables like 'char%';
+--------------------------+----------------------------------------------------------------------+
| Variable_name            | Value                                                                |
+--------------------------+----------------------------------------------------------------------+
| character_set_client     | latin1                                                               |
| character_set_connection | latin1                                                               |
| character_set_database   | latin1                                                               |
| character_set_filesystem | binary                                                               |
| character_set_results    | latin1                                                               |
| character_set_server     | latin1                                                               |
| character_set_system     | utf8                                                                 |
| character_sets_dir       | /usr/local/mysql-5.0.83-linux-i686-icc-glibc23/share/mysql/charsets/ |
+--------------------------+----------------------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> show variables like 'coll%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_german2_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

mysql> set collation_connection =latin1_german2_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER DATABASE `test_latin1_german2_ci` DEFAULT CHARACTER SET latin1 COLLATE latin1_german2_ci;
Query OK, 1 row affected (0.00 sec)

mysql> show variables like 'coll%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_german2_ci |
| collation_database   | latin1_german2_ci |
| collation_server     | latin1_german2_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

mysql> SELECT ID FROM Wordlist WHERE word LIKE "Blaubaer";
Empty set (0.00 sec)
[30 Jun 2009 6:09] Susanne Ebrecht
Many thanks for writing a bug report.

Unfortunately, this is not a bug.

DIN 5007-2 says, for Germany 'ä' and 'ae' are equal (http://de.wikipedia.org/wiki/DIN_5007)

This means:

SELECT 'ä' = 'ae';

=> TRUE

The SQL Standard says that the command 'LIKE' should perform matching on a per-character basis.

So for the command 'LIKE' 'ä' and 'ae' are different.

http://dev.mysql.com/doc/refman/5.1/en/string-comparison-functions.html