Bug #24838 group by compares strings case-insensitively
Submitted: 5 Dec 2006 20:59 Modified: 5 Dec 2006 21:27
Reporter: Ces Ium Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.22-standard OS:Linux (linux)
Assigned to: CPU Architecture:Any
Tags: case-insensitive, GROUP BY

[5 Dec 2006 20:59] Ces Ium
Description:
mysql  Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (i686) using readline 5.0

using innodb tables with utf-8 character set and varchar(1024) columns:

mysql> select name,count(*) from temp_global_names where unique_feature_id in (-3752352,-2770971) group by name;
+------------------------------+----------+
| name                         | count(*) |
+------------------------------+----------+
|                              |        1 | 
| Islamic Republic of Pakistan |        2 | 
| JAMAICA                      |        3 | 
| P?kist?n                     |        5 | 
| PAKISTAN ISLAMICREPUBLICOF   |        1 | 
+------------------------------+----------+
5 rows in set (0.03 sec)
mysql> select name from temp_global_names where unique_feature_id in (-3752352,
-2770971) order by name;
+------------------------------+
| name                         |
+------------------------------+
|                              | 
| Islamic Republic of Pakistan | 
| Islamic Republic of Pakistan | 
| JAMAICA                      | 
| Jamaica                      | 
| Jamaica                      | 
| Pakistan                     | 
| P?kist?n                     | 
| PAKISTAN                     | 
| Pakistan                     | 
| P?kist?n                     | 
| PAKISTAN ISLAMICREPUBLICOF   | 
+------------------------------+
12 rows in set (0.00 sec)

"JAMAICA" and "Jamaica" are different strings and the "group by" clause should not, by default, treat them as the same.

--Additional bug report:  if there is some reasonable way to force the group by clause to treat strings as if they are case-sensitive, it is not well documented.  It doesn't seem to be described on the select syntax page, the string functions page, or the group by functions pages.

How to repeat:
Create a table containing a varchar(1024) table using innodb and utf-8.  Populate it with upper and lower case strings.  Count the number of occurences of each string.

mysql> create table gub(name varchar(1024)) engine=innodb default charset=utf8; 
Query OK, 0 rows affected (0.45 sec)

mysql> insert gub values("Jamaica"), ("Jamaica"), ("JAMAICA");
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select name,count(*) from gub group by name;
+---------+----------+
| name    | count(*) |
+---------+----------+
| Jamaica |        3 | 
+---------+----------+
1 row in set (0.07 sec)

mysql> 

Hmmm....  Look at that.  The results you get back depend on the order the data was entered.

mysql> create table gub(name varchar(1024)) engine=innodb default charset=utf8;Query OK, 0 rows affected (0.21 sec)

mysql>  insert gub values("JAMAICA"),("Jamaica"), ("Jamaica");
Query OK, 3 rows affected (0.07 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select name,count(*) from gub group by name;
+---------+----------+
| name    | count(*) |
+---------+----------+
| JAMAICA |        3 | 
+---------+----------+
1 row in set (0.00 sec)

In both of the above cases, two rows of output should be produced.  The first row should be JAMAICA 1, and the second row Jamaica 2.
[5 Dec 2006 21:27] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Please read about collations at http://dev.mysql.com/doc/refman/4.1/en/charset.html