Bug #7558 | UTF8 and unique/primary keys | ||
---|---|---|---|
Submitted: | 28 Dec 2004 16:10 | Modified: | 30 Dec 2004 14:10 |
Reporter: | Thilo Raufeisen | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 4.1.8/4.1.9 | OS: | Linux (Linux Gentoo) |
Assigned to: | CPU Architecture: | Any |
[28 Dec 2004 16:10]
Thilo Raufeisen
[28 Dec 2004 18:36]
MySQL Verification Team
I tested on Slackware with latest BK 4.1 source tree and got a different behavior because the first insert was truncated: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 to server version: 4.1.9-debug-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> show variables like "%character%"; +--------------------------+--------------------------------------------+ | Variable_name | Value | +--------------------------+--------------------------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /home/miguel/dbs/4.1/share/mysql/charsets/ | +--------------------------+--------------------------------------------+ 7 rows in set (0.00 sec) mysql> show variables like "%collation%"; +----------------------+-----------------+ | Variable_name | Value | +----------------------+-----------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci | +----------------------+-----------------+ 3 rows in set (0.01 sec) mysql> CREATE TABLE `testtab` ( -> `eintrag` varchar(255) NOT NULL default '', -> PRIMARY KEY (`eintrag`) -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.00 sec) mysql> mysql> INSERT INTO `testtab` VALUES ('tésteintrag'); Query OK, 1 row affected, 1 warning (0.00 sec) mysql> INSERT INTO `testtab` VALUES ('testeintrag'); Query OK, 1 row affected (0.00 sec) mysql> select * from testtab; +-------------+ | eintrag | +-------------+ | t | | testeintrag | +-------------+ 2 rows in set (0.00 sec) mysql>
[29 Dec 2004 7:36]
Heikki Tuuri
Hi! I guess that e and accent-e SHOULD be identified in the general UTF-8 collation. Thus, the original problem is not a bug. But I do not see why Miguel got the value truncated. What kind of character sets are you using in the computer, Miguel? In hundin, I was able to repeat the original behavior. Regards, Heikki heikki@hundin:~/mysql-4.1/client> ./mysql test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 to server version: 4.1.9-debug-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> mysql> mysql> mysql> CREATE TABLE `testtab` ( -> `eintrag` varchar(255) NOT NULL default '', -> PRIMARY KEY (`eintrag`) -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.13 sec) mysql> INSERT INTO `testtab` VALUES ('tésteintrag'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO `testtab` VALUES ('testeintrag'); ERROR 1062 (23000): Duplicate entry 'testeintrag' for key 1 mysql>
[29 Dec 2004 8:19]
Sergei Golubchik
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php Additional info: This is not a bug. How characters are compared are defined by the collation. In utf8_general_ci "é" = "e". Use a proper collation to get the results you need. http://dev.mysql.com/doc/mysql/en/Charset-Unicode-sets.html (As for Miguel's results it's apparently a problem of the terminal settings. In any case it's a different issue)
[29 Dec 2004 9:54]
MySQL Verification Team
Heikki, I though at first glance was my terminal but then I tested with default character set and got the original behavior reported like you got. So it is a terminal issue when using the same terminal and changing the character-set the behavior are different ? miguel@hegel:~/dbs/4.1$ bin/mysql -uroot test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 to server version: 4.1.9-debug-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> show variables like "%character%"; +--------------------------+--------------------------------------------+ | Variable_name | Value | +--------------------------+--------------------------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /home/miguel/dbs/4.1/share/mysql/charsets/ | +--------------------------+--------------------------------------------+ 7 rows in set (0.00 sec) mysql> show variables like "%collation%"; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 3 rows in set (0.00 sec) mysql> CREATE TABLE `testtab` ( -> `eintrag` varchar(255) NOT NULL default '', -> PRIMARY KEY (`eintrag`) -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.06 sec) mysql> INSERT INTO `testtab` VALUES ('tésteintrag'); Query OK, 1 row affected (0.02 sec) mysql> INSERT INTO `testtab` VALUES ('testeintrag'); ERROR 1062 (23000): Duplicate entry 'testeintrag' for key 1 mysql> exit Bye
[30 Dec 2004 14:10]
Sergei Golubchik
Ah, I see. Then it's "not a bug". You know, when you set character_set_client to utf8, the server expects you to send correctly formed utf8 strings. But you terminal is not utf8 and the string 'tésteintrag' is sent in latin1 charset. As it's not valid utf8 string, it is truncated on the first invalid character.