| Bug #23451 | GROUP_CONCAT truncates a multibyte utf8 character | ||
|---|---|---|---|
| Submitted: | 19 Oct 2006 2:50 | Modified: | 17 Nov 2006 18:19 |
| Reporter: | Steven Wong | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server | Severity: | S2 (Serious) |
| Version: | 5.0.27-BK, 5.0.22 | OS: | Linux (Linux) |
| Assigned to: | Alexander Barkov | CPU Architecture: | Any |
| Tags: | GROUP_CONCAT multibyte utf8 | ||
[19 Oct 2006 7:33]
Valeriy Kravchuk
Thank you for a bug report. Verified just as described with 5.0.27-BK on Linux.
[20 Oct 2006 11:49]
Alexander Barkov
mysql-4.1 is affected as well.
[23 Oct 2006 7:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/14151 ChangeSet@1.2562, 2006-10-23 12:38:18+05:00, bar@mysql.com +4 -0 Bug#23451 GROUP_CONCAT truncates a multibyte utf8 character Problem: GROUP_CONCAT on a multi-byte column can truncate in the middle of a multibyte character when applying group_concat_max_len limit. It produces an invalid multi-byte character in the result string. Fix: allow to truncate only on character boundaries.
[7 Nov 2006 8:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/14930 ChangeSet@1.2536, 2006-11-07 12:45:48+04:00, bar@mysql.com +3 -0 Bug#23451 GROUP_CONCAT truncates a multibyte utf8 character Problem: GROUP_CONCAT on a multi-byte column can truncate in the middle of a multibyte character when applying group_concat_max_len limit. It produces an invalid multi-byte character in the result string. The second, easier version - reusing old "warning_for_row" flag, instead of introducing of "result_is_full" - which was added in the previous commit.
[16 Nov 2006 16:48]
Alexander Barkov
Appeared in 4.1.13 common Appeared in 5.0.32 common Appeared in 5.1.14 common
[17 Nov 2006 18:19]
Paul DuBois
Noted in 4.1.13, 5.0.32, 5.1.14 changelogs. When applying the group_concat_max_len limit, GROUP_CONCAT() could truncate multi-byte characters in the middle.

Description: GROUP_CONCAT of a utf8 column truncates in the middle of a multibyte character, if the character is at the boundary where GROUP_CONCAT cuts off the concatenated string according to the setting group_concat_max_len. This produces an invalid utf8 character in the result string. How to repeat: The following simple script reproduces the problem. -- Prerequisite: MySQL server's group_concat_max_len must not be configured, so -- that the default value (1024) is in use. SET NAMES utf8; DROP TABLE IF EXISTS TEST_GROUP_CONCAT; CREATE TABLE TEST_GROUP_CONCAT ( X VARCHAR(2048) CHARACTER SET utf8 NOT NULL, Y INTEGER NOT NULL ) ENGINE=InnoDB; INSERT INTO TEST_GROUP_CONCAT VALUES (REPEAT('a', 1022), 0), (_utf8 x'c3b7', 0); SELECT X, CHAR_LENGTH(X), LENGTH(X), CHARSET(X) FROM TEST_GROUP_CONCAT; SET group_concat_max_len = 1024; -- also try 1023 and 1025 for comparison SELECT GROUP_CONCAT(X), CHAR_LENGTH(GROUP_CONCAT(X)), LENGTH(GROUP_CONCAT(X)), CHARSET(GROUP_CONCAT(X)), RIGHT(HEX(GROUP_CONCAT(X)), 10) FROM TEST_GROUP_CONCAT GROUP BY Y; DROP TABLE TEST_GROUP_CONCAT; Suggested fix: GROUP_CONCAT does not truncate any multibyte character in any character encoding. Also, group_concat_max_len should be interpreted as having units of characters (not bytes), if it is not already.