Bug #13233 select distinct char(column) fails with utf8
Submitted: 15 Sep 2005 16:51 Modified: 23 Oct 2005 0:46
Reporter: Richard Wesley Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.14-nt, 4.1.10-standard OS:Windows (Windows/Linux)
Assigned to: Alexander Barkov CPU Architecture:Any

[15 Sep 2005 16:51] Richard Wesley
Description:
The results of a select distinct query involving a CHAR() function on a column depends on the character set of the connection.  Connecting from a mysql client under Linux gives correct answers (as determined by comparing the results to similar queries under SQL Server, Jet and Oracle), but if you SET NAMES to utf8, only two distinct values are returned, and they are NULL and some sort of blank string.  This happens both from the mysql command line tool under Linux and using the Connector/ODBC 3.51.11 tool.

How to repeat:
1. Create a table called 'Calcs' with a single column:

+--------+
| num4   |
+--------+
|   NULL |
|  10.85 |
| -13.47 |
|  -6.05 |
|   8.32 |
|  10.71 |
|   NULL |
| -10.24 |
|   4.77 |
|   NULL |
|  19.39 |
|   3.82 |
|   3.38 |
|   NULL |
| -14.21 |
|   6.75 |
|   NULL |
+--------+

2.  Issue the query

mysql> select distinct if((num4*7 >= 0) AND (num4*7 < 256), CHAR(num4*7), NULL) as z_char_num from Calcs;

This should produce the results:

+------------+
| z_char_num |
+------------+
| NULL       |
| K          |
| :          |
| J          |
| !          |
| ?          |
|           |
|           |
| /          |
+------------+
9 rows in set (0.06 sec)

3. Now change to utf8 for communication:

mysql>  SET NAMES utf8;

4. Issue the same query as in Step 2.

Expected:  The same results as in Step 2.
Actual:

+------------+
| z_char_num |
+------------+
| NULL       |
|            |
+------------+
2 rows in set (0.00 sec)
[15 Sep 2005 17:02] Richard Wesley
The problem does not occur with constants:

mysql> select distinct char(10.85*7) from Calcs;

Produces 'L', as expected.

The same problem also happens with a Linux 4.1.10-standard server.
[16 Sep 2005 8:21] Vasily Kishkin
Thanks for bug report. I was able to reproduce the bug. On 4.1.14 I found the follow strange results:

mysql> select distinct char(75.95) from Calcs;
+-------------+
| char(75.95) |
+-------------+
| L           |
+-------------+
1 row in set (0.00 sec)

mysql> select distinct char(10.85*7) from Calcs;
+---------------+
| char(10.85*7) |
+---------------+
| K             |
+---------------+
1 row in set (0.00 sec)

I checked on Windows 2003
[23 Sep 2005 7:56] Vasily Kishkin
I've checked on Linux SUSE 9.3. There is same result.
[10 Oct 2005 8:48] Alexander Barkov
An easier test demonstrating the same problem:

mysql> create table t1 (a int);
mysql> insert into t1 values (48),(49),(50);
mysql> set names utf8;
mysql> select distinct char(a) from t1;
+---------+
| char(a) |
+---------+
|         |
+---------+
1 row in set (0.00 sec)

The same work fine with latin1:

mysql> create table t1 (a int);
mysql> insert into t1 values (48),(49),(50);
mysql> set names latin1;
mysql> select distinct char(a) from t1;
+---------+
| char(a) |
+---------+
| 0       |
| 1       |
| 2       |
+---------+
3 rows in set (0.01 sec)
[10 Oct 2005 8:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/30852
[21 Oct 2005 12:10] Alexander Barkov
Fixed in 4.1.16 and 5.0.16
[23 Oct 2005 0:46] Paul DuBois
Noted in 4.1.16, 5.0.16 changelogs.