Description:
mysql Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (i686) using readline 5.0
using innodb tables with utf-8 character set and varchar(1024) columns:
mysql> select name,count(*) from temp_global_names where unique_feature_id in (-3752352,-2770971) group by name;
+------------------------------+----------+
| name | count(*) |
+------------------------------+----------+
| | 1 |
| Islamic Republic of Pakistan | 2 |
| JAMAICA | 3 |
| P?kist?n | 5 |
| PAKISTAN ISLAMICREPUBLICOF | 1 |
+------------------------------+----------+
5 rows in set (0.03 sec)
mysql> select name from temp_global_names where unique_feature_id in (-3752352,
-2770971) order by name;
+------------------------------+
| name |
+------------------------------+
| |
| Islamic Republic of Pakistan |
| Islamic Republic of Pakistan |
| JAMAICA |
| Jamaica |
| Jamaica |
| Pakistan |
| P?kist?n |
| PAKISTAN |
| Pakistan |
| P?kist?n |
| PAKISTAN ISLAMICREPUBLICOF |
+------------------------------+
12 rows in set (0.00 sec)
"JAMAICA" and "Jamaica" are different strings and the "group by" clause should not, by default, treat them as the same.
--Additional bug report: if there is some reasonable way to force the group by clause to treat strings as if they are case-sensitive, it is not well documented. It doesn't seem to be described on the select syntax page, the string functions page, or the group by functions pages.
How to repeat:
Create a table containing a varchar(1024) table using innodb and utf-8. Populate it with upper and lower case strings. Count the number of occurences of each string.
mysql> create table gub(name varchar(1024)) engine=innodb default charset=utf8;
Query OK, 0 rows affected (0.45 sec)
mysql> insert gub values("Jamaica"), ("Jamaica"), ("JAMAICA");
Query OK, 3 rows affected (0.08 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select name,count(*) from gub group by name;
+---------+----------+
| name | count(*) |
+---------+----------+
| Jamaica | 3 |
+---------+----------+
1 row in set (0.07 sec)
mysql>
Hmmm.... Look at that. The results you get back depend on the order the data was entered.
mysql> create table gub(name varchar(1024)) engine=innodb default charset=utf8;Query OK, 0 rows affected (0.21 sec)
mysql> insert gub values("JAMAICA"),("Jamaica"), ("Jamaica");
Query OK, 3 rows affected (0.07 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select name,count(*) from gub group by name;
+---------+----------+
| name | count(*) |
+---------+----------+
| JAMAICA | 3 |
+---------+----------+
1 row in set (0.00 sec)
In both of the above cases, two rows of output should be produced. The first row should be JAMAICA 1, and the second row Jamaica 2.