MySQL Bugs: #24838: group by compares strings case-insensitively

Bug #24838	group by compares strings case-insensitively
Submitted:	5 Dec 2006 20:59	Modified:	5 Dec 2006 21:27
Reporter:	Ces Ium	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	5.0.22-standard	OS:	Linux (linux)
Assigned to:		CPU Architecture:	Any
Tags:	case-insensitive, GROUP BY

Description:
mysql  Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (i686) using readline 5.0

using innodb tables with utf-8 character set and varchar(1024) columns:

mysql> select name,count(*) from temp_global_names where unique_feature_id in (-3752352,-2770971) group by name;
+------------------------------+----------+
| name                         | count(*) |
+------------------------------+----------+
|                              |        1 | 
| Islamic Republic of Pakistan |        2 | 
| JAMAICA                      |        3 | 
| P?kist?n                     |        5 | 
| PAKISTAN ISLAMICREPUBLICOF   |        1 | 
+------------------------------+----------+
5 rows in set (0.03 sec)
mysql> select name from temp_global_names where unique_feature_id in (-3752352,
-2770971) order by name;
+------------------------------+
| name                         |
+------------------------------+
|                              | 
| Islamic Republic of Pakistan | 
| Islamic Republic of Pakistan | 
| JAMAICA                      | 
| Jamaica                      | 
| Jamaica                      | 
| Pakistan                     | 
| P?kist?n                     | 
| PAKISTAN                     | 
| Pakistan                     | 
| P?kist?n                     | 
| PAKISTAN ISLAMICREPUBLICOF   | 
+------------------------------+
12 rows in set (0.00 sec)

"JAMAICA" and "Jamaica" are different strings and the "group by" clause should not, by default, treat them as the same.

--Additional bug report:  if there is some reasonable way to force the group by clause to treat strings as if they are case-sensitive, it is not well documented.  It doesn't seem to be described on the select syntax page, the string functions page, or the group by functions pages.

How to repeat:
Create a table containing a varchar(1024) table using innodb and utf-8.  Populate it with upper and lower case strings.  Count the number of occurences of each string.

mysql> create table gub(name varchar(1024)) engine=innodb default charset=utf8; 
Query OK, 0 rows affected (0.45 sec)

mysql> insert gub values("Jamaica"), ("Jamaica"), ("JAMAICA");
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select name,count(*) from gub group by name;
+---------+----------+
| name    | count(*) |
+---------+----------+
| Jamaica |        3 | 
+---------+----------+
1 row in set (0.07 sec)

mysql> 

Hmmm....  Look at that.  The results you get back depend on the order the data was entered.

mysql> create table gub(name varchar(1024)) engine=innodb default charset=utf8;Query OK, 0 rows affected (0.21 sec)

mysql>  insert gub values("JAMAICA"),("Jamaica"), ("Jamaica");
Query OK, 3 rows affected (0.07 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select name,count(*) from gub group by name;
+---------+----------+
| name    | count(*) |
+---------+----------+
| JAMAICA |        3 | 
+---------+----------+
1 row in set (0.00 sec)

In both of the above cases, two rows of output should be produced.  The first row should be JAMAICA 1, and the second row Jamaica 2.

Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Please read about collations at http://dev.mysql.com/doc/refman/4.1/en/charset.html