Bug #43810 collation problem between 5.0.22 and 5.1.32 version
Submitted: 23 Mar 2009 19:42 Modified: 1 Apr 2009 6:27
Reporter: Toni Sanavullah Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Charsets Severity:S1 (Critical)
Version:5.1.32 OS:Linux (x86_64 x86_64 x86_64 GNU/Linux)
Assigned to: Assigned Account CPU Architecture:Any
Tags: utf8, utf8_general_ci

[23 Mar 2009 19:42] Toni Sanavullah
Description:
For utf8, ut8_general_ci, & mysq-5.0.22

mysql> select version();
+------------+
| version()  |
+------------+
| 5.0.22-log |
+------------+
1 row in set (0.01 sec)

mysql> select * from xyz where query = 'wesling';
+---------+
| query   |
+---------+
| wesling |
+---------+
1 row in set (0.00 sec)

For: utf8 & utf8_general_ci & mysql-5.1.32

mysql> select version();
+------------+
| version()  |
+------------+
| 5.1.32-log |
+------------+
1 row in set (0.00 sec)

mysql> select * from xyz;
+----------+
| query    |
+----------+
| wesling  |
| weÃling  |
+----------+
2 rows in set (0.00 sec)

I can reproduce this problem again and again.  What is changed in 5.1.32 that it is considering these two as same characters while 5.0.22 differenciate them?  I understand the effect of character collation, but the outcome is different between 5.0.22 and 5.1.32.  This is crucial for our environment because query is a primary key, and we are trying to upgrade from mysql-5.0.22 to mysql-5.1.32.

Thanks,

Toni

How to repeat:
From mysql-5.0.22

mysql> select * from xyz where query = 'wesling';
+---------+
| query   |
+---------+
| wesling |
+---------+
1 row in set (0.00 sec)

mysql> select * from xyz;
+----------+
| query    |
+----------+
| wesling  |
| weÃling  |
+----------+
2 rows in set (0.00 sec)

CREATE TABLE `xyz` (
  `query` varchar(200) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 

mysqldump --tables xyz --default-character-set=utf8 > xyz.sql
(it did not matter if I used mysqldump binary from mysql-5.0.22 or mysql-5.1.32)

Load it to mysql-5.1.32

mysql-5.1.32/bin/mysql -u -p --default-character-set=utf8

use mydb;

source xyz.sql;

mysql> select * from xyz where query = 'wesling';
+----------+
| query    |
+----------+
| wesling  |
| weÃling  |
+----------+
2 rows in set (0.00 sec)
[23 Mar 2009 19:44] Toni Sanavullah
In the description, the output of 5.1.32 is

mysql> select * from xyz where query = 'wesling';
+----------+
| query    |
+----------+
| wesling  |
| weÃling  |
+----------+
2 rows in set (0.00 sec)
[23 Mar 2009 21:00] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior. Please provide your dump file.
[24 Mar 2009 12:53] Toni Sanavullah
-- MySQL dump 10.10
--
-- Host: localhost    Database: dbutil
-- ------------------------------------------------------
-- Server version       5.0.22-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `xyz`
--

DROP TABLE IF EXISTS `xyz`;
CREATE TABLE `xyz` (
  `query` varchar(200) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

--
-- Dumping data for table `xyz`
--

/*!40000 ALTER TABLE `xyz` DISABLE KEYS */;
LOCK TABLES `xyz` WRITE;
INSERT INTO `xyz` VALUES ('wesling'),('weÃling');
UNLOCK TABLES;
/*!40000 ALTER TABLE `xyz` ENABLE KEYS */;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
[25 Mar 2009 11:05] Sveta Smirnova
Thank you for the feedback.

I still can not repeat described behavior.

Please indicate accurate package version (filename) of MySQL 5.1.32 you use.
[25 Mar 2009 17:18] Toni Sanavullah
Include the following in my.cnf for both 5.0.22 and 5.1.32

default-character-set = utf8
default-collation     = utf8_general_ci

Set in mysql session:

set names utf8 collate utf8_general_ci;

MySQL binaries:

mysql-5.1.32-linux-x86_64-icc-glibc23.tar.gz
mysql-standard-5[1].0.22-linux-x86_64-glibc23.tar.gz

uname -a:
Linux xxx.xxx.xxx.com 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[27 Mar 2009 7:35] Sveta Smirnova
Thank you for the feedback.

Please also provide output of \s (status) command of mysql command line client you use.
[27 Mar 2009 12:32] Toni Sanavullah
--------------
mysql  Ver 14.12 Distrib 5.0.22, for unknown-linux-gnu (x86_64) using readline 5.0

Connection id:          3
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.22-standard-log
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysql.sock
Uptime:                 1 min 6 sec

Threads: 1  Questions: 10  Slow queries: 0  Opens: 0  Flush tables: 1  Open tables: 6  Queries per second avg: 0.152
--------------

--------------
mysql  Ver 14.14 Distrib 5.1.32, for unknown-linux-gnu (x86_64) using readline 5.1

Connection id:          17
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.32-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysqlB.sock
Uptime:                 3 days 16 hours 58 min 14 sec

Threads: 1  Questions: 99  Slow queries: 0  Opens: 24  Flush tables: 1  Open tables: 13  Queries per second avg: 0.0
--------------
[30 Mar 2009 17:41] MySQL Verification Team
Thank you for the feedback. Could you please provide the output of:

select hex(query) from xyz;

Thanks in advance.
[31 Mar 2009 15:15] Toni Sanavullah
mysql> select version();
+---------------------+
| version()           |
+---------------------+
| 5.0.22-standard-log |
+---------------------+
1 row in set (0.00 sec)

mysql> select * from xyz where query = 'wesling';
+---------+
| query   |
+---------+
| wesling |
+---------+
1 row in set (0.00 sec)

mysql> select * from xyz;
+----------+
| query    |
+----------+
| wesling  |
| weÃing  |
+----------+
2 rows in set (0.00 sec)

mysql> select hex(query) from xyz;
+------------------+
| hex(query)       |
+------------------+
| 7765736C696E67   |
| 7765C39F6C696E67 |
+------------------+
2 rows in set (0.01 sec)

mysql> select version();
+------------+
| version()  |
+------------+
| 5.1.32-log |
+------------+
1 row in set (0.00 sec)

mysql> select * from xyz where query = 'wesling';
+----------+
| query    |
+----------+
| wesling  |
| weÃing  |
+----------+
2 rows in set (0.00 sec)

mysql> select hex(query)  from xyz;
+------------------+
| hex(query)       |
+------------------+
| 7765736C696E67   |
| 7765C39F6C696E67 |
+------------------+
2 rows in set (0.00 sec)
[1 Apr 2009 6:27] Susanne Ebrecht
Many thanks for writing a bug report.

We fixed utf8_general_ci and the German ß now is treated as s in 5.1.

Because this has some more unexpected side effects we have had a longer discussion of bug #43306.

This is a duplicate of bug #43306.

This is a duplicate of bug #43593