Bug #12191 select * from x where col like '%ä%' doesn't work with UTF8
Submitted: 26 Jul 2005 23:24 Modified: 27 Jul 2005 1:50
Reporter: Markus Popp Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1.13, 5.0.10 OS:Windows (Windows, Linux)
Assigned to: CPU Architecture:Any

[26 Jul 2005 23:24] Markus Popp
Description:
If I've got a table with charset UTF8, I can't use the LIKE clause for special characters like e.g. German umlauts.

The server default is utf8 and collation utf8_general_ci.

There's a second problem with PHP (don't know whether it's really a MySQL bug, but it might relate to this problem). I've tried following script against the same data in the 'how to repeat' section:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf8" />
</head>
<body>
<?php
$link = mysql_connect('db4free.org', 'xxx', 'yyy');
mysql_select_db('zzz', $link) or die ('Keine Verbindung!');

$sql = "select land, waehrung from waehrungen where land = 'Ägypten' order by land";

$query = mysql_query($sql, $link);

while ($data = mysql_fetch_row($query))
	{
	echo $data[0].": ".$data[1]."<br />\n";
	}
	
mysql_close($link);
?>
</body>
</html>

On my local server with PHP 5.0.4 and API client version 4.1.7 I get the correct result: 'Ägypten'. If I run the same script on my webhoster's server, who uses PHP 5.0.2 and API client version 4.0.18, I don't get any result. However, I do get the correct results, if I leave the WHERE clause away. The MySQL server was in both cases a remote server, running 5.0.10 on Linux.

How to repeat:
Create following table:

CREATE TABLE `waehrungen` (
  `bezeichnung` varchar(50) NOT NULL default '',
  `land` varchar(50) NOT NULL default '',
  `iso` varchar(3) NOT NULL default '',
  `waehrung` varchar(100) NOT NULL default '',
  PRIMARY KEY  (`bezeichnung`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `waehrungen` VALUES ('afghanistan', 'Afghanistan', 'AFA', 'Afghani');
INSERT INTO `waehrungen` VALUES ('albania', 'Albanien', 'ALL', 'Lek');
INSERT INTO `waehrungen` VALUES ('egypt', 'Ägypten', 'EGP', 'Ägyptisches Pfund');
INSERT INTO `waehrungen` VALUES ('estonia', 'Estland', 'EEK', 'Estnische Krone');
INSERT INTO `waehrungen` VALUES ('ethiopia', 'Äthiopien', 'ETB', 'Birr');
INSERT INTO `waehrungen` VALUES ('euro', 'Europäische Union', 'EUR', 'Euro');
INSERT INTO `waehrungen` VALUES ('fiji', 'Fidschi', 'FJD', 'Fidschi Dollar');

The query

select land, waehrung from waehrungen where land like '%x%' order by land

returns correctly 'Mexiko'. However, the query 

select land, waehrung from waehrungen where land like '%ä%' order by land

returns all rows except the one with 'Fidschi'.

Suggested fix:
Queries with special characters like umlauts should work the same way as with 'normal' characters.
[26 Jul 2005 23:52] Markus Popp
Right now, I had the same results when I used charset latin1 and collation latin1_german1_ci (which I also used for the client connection).
[27 Jul 2005 0:59] MySQL Verification Team
Looks here is terminal issue:

c:\mysql\bin>mysql -uroot --default_character_set=utf8 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 5.0.11-beta-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> set names cp1251;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `waehrungen` (
    ->   `bezeichnung` varchar(50) NOT NULL default '',
    ->   `land` varchar(50) NOT NULL default '',
    ->   `iso` varchar(3) NOT NULL default '',
    ->   `waehrung` varchar(100) NOT NULL default '',
    ->   PRIMARY KEY  (`bezeichnung`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.05 sec)

mysql>
mysql> INSERT INTO `waehrungen` VALUES ('afghanistan', 'Afghanistan', 'AFA',
    -> 'Afghani');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO `waehrungen` VALUES ('albania', 'Albanien', 'ALL', 'Lek');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `waehrungen` VALUES ('egypt', 'Ägypten', 'EGP', 'Ägyptisches
    '> Pfund');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `waehrungen` VALUES ('estonia', 'Estland', 'EEK', 'Estnische
    '> Krone');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `waehrungen` VALUES ('ethiopia', 'Äthiopien', 'ETB', 'Birr');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `waehrungen` VALUES ('euro', 'Europäische Union', 'EUR', 'Euro');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `waehrungen` VALUES ('fiji', 'Fidschi', 'FJD', 'Fidschi Dollar');
Query OK, 1 row affected (0.00 sec)

mysql> select land, waehrung from waehrungen where land like '%x%' order by land;
Empty set (0.00 sec)

mysql> select land, waehrung from waehrungen where land like '%ä%' order by land
    -> ;
+-------------------+----------+
| land              | waehrung |
+-------------------+----------+
| Europäische Union  | Euro     |
+-------------------+----------+
1 row in set (0.00 sec)
[27 Jul 2005 1:09] Markus Popp
Here's what I get in the windows terminal:

mysql> CREATE TABLE `waehrungen` (
    ->   `bezeichnung` varchar(50) NOT NULL default '',
    ->   `land` varchar(50) NOT NULL default '',
    ->   `iso` varchar(3) NOT NULL default '',
    ->   `waehrung` varchar(100) NOT NULL default '',
    ->   PRIMARY KEY  (`bezeichnung`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

mysql> insert into waehrungen values ('turkey', 'Türkei', 'TRN', 'Türkische Lira
');
Query OK, 1 row affected, 2 warnings (0.02 sec)

mysql> select land from waehrungen where land like '%ü%';
Empty set (0.00 sec)

mysql> show variables like '%character%';
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_client     | utf8                     |
| character_set_connection | utf8                     |
| character_set_database   | utf8                     |
| character_set_results    | utf8                     |
| character_set_server     | utf8                     |
| character_set_system     | utf8                     |
| character_sets_dir       | D:\mysql\share\charsets/ |
+--------------------------+--------------------------+
7 rows in set (0.00 sec)

mysql>
[27 Jul 2005 1:17] Markus Popp
The umlaut and everything behind it has been completely ignored, when I did the insertion:

mysql> select * from waehrungen;
+-------------+------+-----+----------+
| bezeichnung | land | iso | waehrung |
+-------------+------+-----+----------+
| turkey      | T    | TRN | T        |
+-------------+------+-----+----------+
1 row in set (0.00 sec)

mysql>
[27 Jul 2005 1:22] MySQL Verification Team
Try to use set names cp1251:

c:\mysql\bin>mysql -uroot --default_character_set=utf8 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5 to server version: 5.0.11-beta-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> set names cp1251;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `waehrungen` (
    -> `bezeichnung` varchar(50) NOT NULL default '',
    -> `land` varchar(50) NOT NULL default '',
    -> `iso` varchar(3) NOT NULL default '',
    -> `waehrung` varchar(100) NOT NULL default '',
    ->  PRIMARY KEY  (`bezeichnung`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into waehrungen values ('turkey',
    -> 'Türkei', 'TRN', 'TürkischeLira');
Query OK, 1 row affected (0.00 sec)

mysql> select land from waehrungen where land like '%ü%';
+--------+
| land   |
+--------+
| Türkei  |
+--------+
1 row in set (0.00 sec)
[27 Jul 2005 1:29] Markus Popp
This works the same way with me.
[27 Jul 2005 1:40] MySQL Verification Team
I assume set names cp1251 worked for you or I am wrong?
[27 Jul 2005 1:49] Markus Popp
That's correct.
[27 Jul 2005 2:02] Markus Popp
Why is this not a bug?
[9 Sep 2005 20:54] Renich Bon Ciric
I have a problem just like this one.

The diference is that I use php5's mysqli extension. I need my server to accept UTF-8 permanently over the php connector.

When I try this:

SELECT * 
FROM `Dirección General`;

It gives me a "Table 'directorio.Dirección General' doesn't exist" error

How can I change this?