Bug #21061 | Wrong rows returned when using unicode | ||
---|---|---|---|
Submitted: | 14 Jul 2006 14:07 | Modified: | 19 Jul 2006 11:02 |
Reporter: | [ name withheld ] | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 5.0.22 | OS: | Windows (Windows XP) |
Assigned to: | CPU Architecture: | Any | |
Tags: | Unicode, utf8 |
[14 Jul 2006 14:07]
[ name withheld ]
[14 Jul 2006 14:33]
MySQL Verification Team
Thank you for the bug report. Please try to use set names, i.e: C:\mysql\bin>mysql -uroot test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 6 to server version: 5.0.23-community-nt Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> set names cp1251; Query OK, 0 rows affected (0.11 sec) mysql> CREATE TABLE `person` ( -> `name` text NOT NULL -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.61 sec) mysql> INSERT INTO `person` (`name`) VALUES -> ('Hugo'), -> ('Hügö'); Query OK, 2 rows affected (0.11 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM person WHERE name='Hugo'; +------+ | name | +------+ | Hugo | +------+ 1 row in set (0.05 sec) mysql> SELECT * FROM person WHERE name='Hügö'; +------+ | name | +------+ | Hügö | +------+ 1 row in set (0.00 sec) mysql> Thanks in advance.
[14 Jul 2006 21:48]
[ name withheld ]
MySQL Query Browser Screen Shot
Attachment: QB.png (image/png, text), 10.63 KiB.
[14 Jul 2006 21:51]
[ name withheld ]
Hmm, but I do not want to use cp1251. I would like to use utf8, and when using SET NAMES utf8; it does not work again. When I try your suggestion cp1251, MySQL Query Browser shows badly encoded characters (see attachment QB.png). Am I right, that when using utf8, ü and u are indistinguishable in SELECTs when using MySQL?
[17 Jul 2006 14:55]
MySQL Verification Team
Thank you for the feedback. What I tried to show you that isn't a server issue instead a terminal issue which not accepts the correct encoding for to send to the server. Below using a terminal which supports utf8 I don't have problems even I don't need to use set names. miguel@hegel:~/dbs/5.0> bin/mysql -umiguel -p -h192.168.0.33 Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 to server version: 5.0.23-community-nt Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> create database dbw; Query OK, 1 row affected (0.04 sec) mysql> use dbw Database changed mysql> CREATE TABLE `person` ( -> `name` text NOT NULL -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.13 sec) mysql> mysql> INSERT INTO `person` (`name`) VALUES -> ('Hugo'), -> ('Hügö'); Query OK, 2 rows affected (0.07 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM person WHERE name='Hugo'; +------+ | name | +------+ | Hugo | +------+ 1 row in set (0.00 sec) mysql> SELECT * FROM person WHERE name='Hügö'; +--------+ | name | +--------+ | Hügö | +--------+ 1 row in set (0.00 sec) mysql>
[17 Jul 2006 15:06]
[ name withheld ]
Thank you for your response. When I use the GUI client "MySQL Query Browser" I get the resultset shown in the attachment QB_utf8.png. I thought this client is fully utf8 aware. Was this assumption wrong?
[17 Jul 2006 15:07]
[ name withheld ]
Screen Shot of MySQL Query Browser
Attachment: QB_utf8.png (image/png, text), 11.51 KiB.
[19 Jul 2006 11:02]
[ name withheld ]
Problem solved: * First, you are right that it is not a bug. * Second, I do not think the problem was related to using a terminal which is or is not able so send unicode, since the Windows GUI program "MySQL Query Browser" is IMO perfectly unicode aware. The problem of treating e.g. 'ü' and 'u' as the same characters in SELECTs is related to the COLLATE used. When SELECTing using a collate, it works ok: SELECT * FROM person WHERE name='Hugo' COLLATE utf8_swedish_ci; The same can be achived when giving a collate at table creation: CREATE TABLE `person` ( `name` text NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE utf8_swedish_ci; I found, that the following utf8 collations treat 'ü' and 'u' as different characters: * utf8_danish_ci * utf8_estonian_ci * utf8_hungarian_ci * utf8_icelandic_ci * utf8_roman_ci * utf8_swedish_ci * utf8_turkish_ci Note: Please do not think I write this down to teach you. I am posting my findings so that others who might read this report have a solution. Kind regards, and be well, Markus P.S.: Worth to read: http://dev.mysql.com/doc/refman/5.0/en/charset.html