Bug #6040 | can't retrieve records with umlaut characters in case insensitive manner | ||
---|---|---|---|
Submitted: | 12 Oct 2004 8:22 | Modified: | 18 Oct 2004 10:28 |
Reporter: | Denitsa Pelova | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 4.1.5 | OS: | Linux (Red Hat 9) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[12 Oct 2004 8:22]
Denitsa Pelova
[12 Oct 2004 22:20]
MySQL Verification Team
I wasn't able to repeat with a simple table I built. However please verify if the bug: http://bugs.mysql.com/bug.php?id=6043 can applied to your case. Otherwise can you send a test case with your my.cnf ? Thanks
[13 Oct 2004 10:45]
Denitsa Pelova
Hello, here's an easy test that will help you reproduce the problem: CREATE TABLE Person ( id bigint(20) NOT NULL, name varchar(255) default NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO Person VALUES (1, 'Günter André'); INSERT INTO Person VALUES (2, 'Don Überlingen'); INSERT INTO Person VALUES (3, 'Annika Kruse'); SELECT LOWER(name) FROM Person; SELECT UPPER(name) FROM Person; SELECT name FROM Person WHERE LOWER(name) LIKE lower('annika%'); SELECT name FROM Person WHERE LOWER(name) LIKE lower('GÜNTER%'); SELECT name FROM Person WHERE UPPER(name) LIKE UPPER('GÜNTER%'); SELECT UPPER(name) FROM Person WHERE UPPER(name) LIKE UPPER('Günter%'); - The result of each query is as follows: mysql> SELECT LOWER(name) FROM Person; +-----------------+ | LOWER(name) | +-----------------+ | günter andré | | don Überlingen | | annika kruse | +-----------------+ 3 rows in set (0.00 sec) - Observe line 2. No lower on Ü applied. mysql> SELECT UPPER(name) FROM Person; +-----------------+ | UPPER(name) | +-----------------+ | GüNTER ANDRé | | DON ÜBERLINGEN | | ANNIKA KRUSE | +-----------------+ 3 rows in set (0.00 sec) - Observe line 1. No upper on letters ü & é mysql> SELECT name FROM Person WHERE LOWER(name) LIKE lower('annika%'); +--------------+ | name | +--------------+ | Annika Kruse | +--------------+ 1 row in set (0.00 sec) - OK mysql> SELECT name FROM Person WHERE LOWER(name) LIKE lower('GÜNTER%'); Empty set (0.00 sec) - no match as lower cannot turn Ü into ü in the LIKE part mysql> SELECT name FROM Person WHERE UPPER(name) LIKE UPPER('GÜNTER%'); Empty set (0.00 sec) - no match as lower cannot turn ü into Ü in the UPPER(name) part mysql> SELECT UPPER(name) FROM Person WHERE UPPER(name) LIKE UPPER('Günter%'); +----------------+ | UPPER(name) | +----------------+ | GüNTER ANDRé | +----------------+ 1 row in set (0.00 sec) - worked only because both UPPER(name) and LIKE UPPER('Günter%') couldn't turn ü into Ü and therefore the query looked like: where 'GüNTER ANDRé' like 'GüNTER%' Now, no matter if I have a my.cnf or not, the problem persist. But may be I am overlooking something. Here's my my.cnf content cat /etc/my.cnf ----------------------- [mysqld] datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock [mysql.server] user=mysql basedir=/var/lib [safe_mysqld] err-log=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid ------------------------------ The tests have been done on Red Hat 9 and on Fedora Core 2 using mysql v. 3.23, 4.1.4, 4.1.5, 5.0.1 To your question: no, this problem is not same as http://bugs.mysql.com/bug.php?id=6043. Changing database and table charset to utf8 doen't help. Regards Denitsa Pelova
[15 Oct 2004 7:37]
Alexander Keremidarski
Äåíèöà, By default 4.1 and 5.0 set all character sets to latin1 with latin1_swedish_ci What you need is either one of german collations latin1_german1_ci or latin1_german2_ci or utf8 with utf8_general_ci (ci suffix stands for Case Insensitive) Fedora itself is all utf8 as you know. > Changing database and table charset to utf8 doen't help. Database charset is just a default as well as table charset. How did you changed teh table charset? ALTER TABLE Person DEFAULT CHARSET utf8; this syntax only changes the default and does *not* convert exisitng data. You can see that with: mysql> SHOW CREATE TABLE Person; CREATE TABLE `Person` ( `id` bigint(20) NOT NULL default '0', `name` varchar(255) character set latin1 default NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8; If you want to convert the data to a new charset the proper syntax is: ALTER TABLE Person CONVERT TO CHARSET utf8; Now it becomes: CREATE TABLE `Person` ( `id` bigint(20) NOT NULL default '0', `name` varchar(255) default NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8; Now if client charsets are also utf8 UPPER() and LOWER() will work (change all client character sets to utf8 with SET NAMES utf8) : mysql> SELECT name FROM Person WHERE LOWER(name) LIKE -> lower('GÜNTER%'); +----------------+ | name | +----------------+ | Günter André | | Günter André | +----------------+ 2 rows in set (0.00 sec) mysql> SELECT name FROM Person WHERE UPPER(name) LIKE -> UPPER('GÜNTER%'); +----------------+ | name | +----------------+ | Günter André | | Günter André | +----------------+ 2 rows in set (0.00 sec) However case insensitive match still does not work: mysql> SELECT name FROM Person WHERE name LIKE 'GÜNTER%'; Empty set (0.00 sec) This is because both name and 'GÜNTER%' have charset utf8, collation utf8_general_ci Let's make it working: mysql> SELECT name FROM Person WHERE name LIKE 'GÜNTER%' COLLATE utf8_unicode_ci; +----------------+ | name | +----------------+ | Günter André | | Günter André | +----------------+ 2 rows in set (0.00 sec) Whether it's a bug with utf8_general_co collation is to be investigated by us. Meanwhile I suggest you reading about Collations and set utf8 + utf8_unicode_ci as defaults in both mysqld and client sections of your my.cnf mysql> ALTER TABLE Person CONVERT TO CHARACTER SET latin1 COLLATE latin1_german1_ci; Query OK, 6 rows affected (0.04 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> SET NAMES latin1; Query OK, 0 rows affected (0.00 sec) mysql> SELECT name FROM Person WHERE name LIKE 'GÜNTER%' COLLATE latin1_german1_ci; Empty set (0.00 sec) However when setting all character sets to latin1 with both collations latin1_german1_ci and latin1_german2_ci I get the same results as yours: mysql> ALTER TABLE Person CONVERT TO CHARACTER SET latin1 COLLATE latin1_german1_ci; mysql> select * from Person; +----+-----------------+ | id | name | +----+-----------------+ | 1 | Günter André | | 2 | Don Überlingen | | 3 | Annika Kruse | +----+-----------------+ 3 rows in set (0.00 sec) mysql> SELECT LOWER(name) FROM Person; +-----------------+ | LOWER(name) | +-----------------+ | g��nter andr�� | | don ��berlingen | | annika kruse | +-----------------+ 3 rows in set (0.00 sec) mysql> SELECT UPPER(name) FROM Person; +-----------------+ | UPPER(name) | +-----------------+ | GüNTER ANDRé | | DON ÜBERLINGEN | | ANNIKA KRUSE | +-----------------+ 3 rows in set (0.00 sec) mysql> SELECT name FROM Person WHERE LOWER(name) LIKE lower('annika%'); +--------------+ | name | +--------------+ | Annika Kruse | +--------------+ 1 row in set (0.02 sec) mysql> SELECT name FROM Person WHERE LOWER(name) LIKE lower('GÜNTER%'); Empty set (0.02 sec) mysql> SELECT name FROM Person WHERE UPPER(name) LIKE UPPER('GÜNTER%'); Empty set (0.01 sec) mysql> SELECT UPPER(name) FROM Person WHERE UPPER(name) LIKE -> UPPER('Günter%'); +----------------+ | UPPER(name) | +----------------+ | GüNTER ANDRé | +----------------+ 1 row in set (0.00 sec) This definitely seems wrong. Thank you a lot for this perfect test case! Easiest way
[16 Oct 2004 18:26]
Alexander Barkov
Denitsa, can you please start 4.1.x again, create data in "Person" table from the beginning using the same script, and then run these queries: SELECT HEX(name) FROM Person; SHOW VARIABLES LIKE 'character_set%' Thank you very much!
[16 Oct 2004 18:40]
Denitsa Pelova
Hello Alexander, here are the queries: mysql> SELECT HEX(name) FROM Person; +--------------------------------+ | HEX(name) | +--------------------------------+ | 47C3BC6E74657220416E6472C3A9 | | 446F6E20C39C6265726C696E67656E | | 416E6E696B61204B72757365 | +--------------------------------+ 3 rows in set (0.00 sec) mysql> SHOW VARIABLES LIKE 'character_set%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 7 rows in set (0.11 sec)
[16 Oct 2004 19:09]
Alexander Barkov
It looks like the problem is that the client sends everything in UTF8, but the server thinks it is latin1. Can you please add "SET NAMES utf8;" in the beginning of the script, then run it from the beginning once again. Do SELECTs now return the expected results? If you run SELECTs in a separate session, please execute "SET NAMES utf8;" in this session too. thanks.
[16 Oct 2004 19:30]
Denitsa Pelova
Hi again :), here are the results: mysql> set names utf8; Query OK, 0 rows affected (0.00 sec) mysql> SELECT HEX(name) FROM Person; +--------------------------------+ | HEX(name) | +--------------------------------+ | 47C3BC6E74657220416E6472C3A9 | | 446F6E20C39C6265726C696E67656E | | 416E6E696B61204B72757365 | +--------------------------------+ 3 rows in set (0.00 sec) mysql> SHOW VARIABLES LIKE 'character_set%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 7 rows in set (0.01 sec) mysql> select upper(name) from Person; +--------------------+ | upper(name) | +--------------------+ | GüNTER ANDRé | | DON Ã?BERLINGEN | | ANNIKA KRUSE | +--------------------+ 3 rows in set (0.00 sec) mysql> select lower(name) from Person; +--------------------+ | lower(name) | +--------------------+ | gã¼nter andrã© | | don ã?berlingen | | annika kruse | +--------------------+ 3 rows in set (0.00 sec) mysql> select name from Person; +--------------------+ | name | +--------------------+ | Günter André | | Don Ã?berlingen | | Annika Kruse | +--------------------+ 3 rows in set (0.00 sec) This time I get these strange looking symbols...
[16 Oct 2004 19:45]
Denitsa Pelova
I also tried that: mysql> set character_set_database= utf8; Query OK, 0 rows affected (0.00 sec) mysql> set character_set_server= utf8; Query OK, 0 rows affected (0.00 sec) mysql> SHOW VARIABLES LIKE 'character_set%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 7 rows in set (0.00 sec) mysql> select lower(name) from Person; +--------------------+ | lower(name) | +--------------------+ | gã¼nter andrã© | | don ã?berlingen | | annika kruse | +--------------------+ 3 rows in set (0.00 sec) mysql> select upper(name) from Person; +--------------------+ | upper(name) | +--------------------+ | GüNTER ANDRé | | DON Ã?BERLINGEN | | ANNIKA KRUSE | +--------------------+ 3 rows in set (0.00 sec) But still not working..
[16 Oct 2004 19:46]
Alexander Barkov
Denitsa, You added "SET NAMES utf8;" before running SELECTs and SHOW. This is done correctly. But you need to recreate data with "SET NAMES utf8;" too. Please add "SET NAMES utf8;" in the beginnin of the script which create the table and inserts the records. And run it again. The run SELECTs once again, with "SET NAMES utf8;" in the beginning.
[17 Oct 2004 7:52]
Denitsa Pelova
Hi Alexander, I did it again the way you said and here are the results: mysql> set names utf8; Query OK, 0 rows affected (0.07 sec) mysql> create table Person (name VARCHAR(255) DEFAULT NULL) ENGINE=MyISAM DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.09 sec) mysql> INSERT INTO Person VALUES ('Günter André'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO Person VALUES ('Don Überlingen'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO Person VALUES ('Annika Kruse'); Query OK, 1 row affected (0.00 sec) mysql> SELECT LOWER(name) FROM Person; +-----------------+ | LOWER(name) | +-----------------+ | günter andré | | don überlingen | | annika kruse | +-----------------+ 3 rows in set (0.06 sec) mysql> SELECT UPPER(name) FROM Person; +-----------------+ | UPPER(name) | +-----------------+ | GÜNTER ANDRÉ | | DON ÜBERLINGEN | | ANNIKA KRUSE | +-----------------+ 3 rows in set (0.03 sec) mysql> SELECT name FROM Person WHERE LOWER(name) LIKE lower('annika%'); +--------------+ | name | +--------------+ | Annika Kruse | +--------------+ 1 row in set (0.00 sec) mysql> SELECT name FROM Person WHERE LOWER(name) LIKE lower('GÜNTER%'); +----------------+ | name | +----------------+ | Günter André | +----------------+ 1 row in set (0.00 sec) mysql> SELECT name FROM Person WHERE UPPER(name) LIKE UPPER('günter%'); +----------------+ | name | +----------------+ | Günter André | +----------------+ 1 row in set (0.00 sec) So far everything is OK, but the case insensitive search still won't return anything: mysql> SELECT name FROM Person WHERE name like 'don ü%'; Empty set (0.01 sec) mysql> select name from Person where name like 'GÜNTER%'; Empty set (0.00 sec) mysql> select name from Person where name like 'günter%'; +----------------+ | name | +----------------+ | Günter André | +----------------+ 1 row in set (0.00 sec) Anyway the lower() and upper() functions will solve my problem, so thank you very much for your help! If you need me to do any other tests I'll be glad to help you :) Best regards, Denitsa
[17 Oct 2004 7:58]
Denitsa Pelova
I forgot to try with that: mysql> SELECT name FROM Person WHERE name like 'don ü%' collate utf8_unicode_ci; +-----------------+ | name | +-----------------+ | Don Überlingen | +-----------------+ 1 row in set (0.10 sec) mysql> select name from Person where name like 'günter%' collate utf8_unicode_ci; +----------------+ | name | +----------------+ | Günter André | +----------------+ 1 row in set (0.00 sec) So to make the case insensitive search work I have to add the collation at the end of the query... Thanks again!
[17 Oct 2004 15:30]
Alexander Barkov
Denitsa, thank you very much for your tests. It is something wring with utf8_general_ci, whith is the default collation for utf8. I'll try to fix it on Monday. Sorry for inconvenience. As a temporary solution, not to write COLLATE in each SELECT, you can specify collation during CREATE TABLE, like this: create table Person (name VARCHAR(255) DEFAULT NULL) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE utf8_unicode_ci; After that, all comparisons and LIKEs between column "name" and a string constant will be done according to utf8_unicode_ci, which seems to work fine with diactrics.
[18 Oct 2004 6:45]
Alexander Barkov
Denitsa, I found the reason of this bug in the code, and will try to fix it. Did COLLATE utf8_unicode_ci in CREATE TABLE help you? I can suggest another solution. If you need ONLY German language, then UTF8 is not really necessary and it is possible to use latin1 in the database. latin1 is usually faster. However, if you want to use many languages at the same time, then latin1 will not work. If you are interested, we can tell how to configure to store data in latin1, but at the same time use utf8 for client-server comminication.
[18 Oct 2004 10:28]
Alexander Barkov
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html
[18 Oct 2004 11:38]
Denitsa Pelova
Hello Alexander, actualy I never wanted to use utf-8, I did it because the latin1 didn't work. Before that I was working with the default character-set/collation but because of the problem and after the bug report, Miguel Solorzano answered to me and proposed to check bug #6043 if it is the same as mine, where utf8 was used, and then you answered me with explanations on how to do it with utf-8. I will be happy to know how it works with latin1. Now I have configured my my.cnf file so that the mysqld and mysql are always started using utf8. That way I don't have to change the CREATE TABLE statements one by one, because none of them specifies character set, but they just use the default. Regards, Denitsa