Bug #29850 session character encoding vs. group_concat
Submitted: 17 Jul 2007 17:34 Modified: 27 Jul 2007 4:25
Reporter: Jozsef Marton
Status: Closed
Category: MySQL Server: Charsets Severity: S2 (Serious)
Version: 5.0.32, 4.1 bk, 5.0 bk, 5.1 bk OS: Any
Assigned to: Evgeny Potemkin CPU Architecture: Any
Tags: character_encoding, group_concat, Latin1, latin2, non-utf8, utf8

[17 Jul 2007 17:34] Jozsef Marton
Description:
In a session whose character set is latin1 (set names latin1; the same applies to latin2), group_concat returns its result in utf8 when used in a self-join query.

The query in the steps below is artificial, but it demonstrates the problem.

How to repeat:
set names latin1;

create table group_concat_test (id int, name varchar(20));

insert into group_concat_test (id, name) values (1, "óra");
insert into group_concat_test (id, name) values (2, "óra");

-- this gives the result in correct encoding
mysql> select * from group_concat_test;
+------+------+
| id   | name |
+------+------+
|    1 | óra  |
|    2 | óra  |
+------+------+
2 rows in set (0.00 sec)

-- this is also correct
mysql> select group_concat(name) from group_concat_test;
+--------------------+
| group_concat(name) |
+--------------------+
| óra,óra            |
+--------------------+
1 row in set (0.00 sec)

-- but this is NOT correct: the result comes back in UTF8 encoding.
mysql> select b.id, group_concat(b.name) from group_concat_test a, group_concat_test b group by b.id;
+------+----------------------+
| id   | group_concat(b.name) |
+------+----------------------+
|    1 | Ăłra,Ăłra            |
|    2 | Ăłra,Ăłra            |
+------+----------------------+
2 rows in set (0.00 sec)
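
The garbled "Ăłra" in the last result is what UTF-8 bytes look like when a client displays them in a single-byte character set. The following is a minimal Python sketch of that effect, not part of the original report. It assumes the reporter's terminal rendered bytes as latin2 (ISO-8859-2), which is an inference from the characters shown; the helper name misread_utf8_as is hypothetical.

```python
def misread_utf8_as(text: str, display_charset: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were
    in display_charset. This imitates a client that expects a
    single-byte charset but receives UTF-8 from the server."""
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(display_charset)


if __name__ == "__main__":
    original = "\u00f3ra"  # "óra", the value stored in the test table
    # Assumption: the terminal uses latin2 (ISO-8859-2).
    print(misread_utf8_as(original, "iso8859_2"))
    # For comparison, the same bytes shown by a latin1 terminal.
    print(misread_utf8_as(original, "latin_1"))
```

Under the latin2 assumption the first line printed matches the garbled value in the report, which supports the reading that the server sent utf8 bytes despite the latin1 session setting.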
[17 Jul 2007 19:00] Sveta Smirnova
test case

Attachment: bug29850.test (application/octet-stream, text), 494 bytes.

[17 Jul 2007 19:00] Sveta Smirnova
Thank you for the report.

Verified as described using attached test case.
[17 Jul 2007 19:36] Sveta Smirnova
The real problem is not the session encoding, but the way group_concat works together with a join.

To repeat, change the CREATE TABLE statement to "create table group_concat_test (id int, name varchar(20)) DEFAULT CHARSET=utf8;"
[19 Jul 2007 16:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/31182

ChangeSet@1.2536, 2007-07-19 20:21:23+04:00, evgen@moonbone.local +3 -0
  Bug#29850: Wrong charset of GROUP_CONCAT result when the select employs
  a temporary table.
  
  The result string of the Item_func_group_concat wasn't initialized in the 
  copying constructor of the Item_func_group_concat class. This led to a
  wrong charset of GROUP_CONCAT result when the select employs a temporary
  table.
  
  The copying constructor of the Item_func_group_concat class now correctly
  initializes the charset of the result string.
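
The class of bug described in the changeset comment can be illustrated with a toy model. This is a sketch in Python, not the MySQL C++ source: the names ResultString, GroupConcatItem, buggy_copy, fixed_copy and the DEFAULT_CHARSET value are all hypothetical, chosen only to show how a copy that forgets to carry over the result charset leaves the copy with a different charset than the original.

```python
from dataclasses import dataclass

# Hypothetical default that an uninitialized result string falls back to.
DEFAULT_CHARSET = "default"


@dataclass
class ResultString:
    charset: str = DEFAULT_CHARSET


class GroupConcatItem:
    """Toy stand-in for an aggregate item that owns a result string."""

    def __init__(self, collation: str) -> None:
        self.collation = collation
        # The normal constructor sets the result charset explicitly.
        self.result = ResultString(charset=collation)

    def buggy_copy(self) -> "GroupConcatItem":
        # Copies the collation but leaves the result string at its default,
        # mirroring the missing initialization described in the changeset.
        clone = GroupConcatItem.__new__(GroupConcatItem)
        clone.collation = self.collation
        clone.result = ResultString()
        return clone

    def fixed_copy(self) -> "GroupConcatItem":
        # Same copy, but the result charset is initialized from the collation.
        clone = self.buggy_copy()
        clone.result.charset = self.collation
        return clone


if __name__ == "__main__":
    item = GroupConcatItem("latin1")
    print(item.buggy_copy().result.charset)
    print(item.fixed_copy().result.charset)
```

In this model the copy is only made on some code paths, which is analogous to the report: the plain query was correct, while the query that needed a temporary table went through the copy and returned the wrong charset.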
[20 Jul 2007 23:46] Bugs System
Pushed into 5.1.21-beta
[20 Jul 2007 23:49] Bugs System
Pushed into 5.0.48
[27 Jul 2007 4:25] Paul DuBois
Noted in 5.0.48, 5.1.21 changelogs.

If query execution involved a temporary table, GROUP_CONCAT() could
return a result with an incorrect character set.
[9 Aug 2007 17:39] Sveta Smirnova
The bug still exists in the current 4.1 sources.

Bug #30040 was marked as duplicate of this one.