Bug #55188 GROUP BY, GROUP_CONCAT and TEXT - inconsistent results
Submitted: 12 Jul 2010 21:01 Modified: 15 Oct 2010 13:43
Reporter: Lig Isler-Turmelle Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.1.46sp1 OS:Any
Assigned to: Georgi Kodinov CPU Architecture:Any

[12 Jul 2010 21:01] Lig Isler-Turmelle
Description:
From the manual - http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_sort_length

"The number of bytes to use when sorting BLOB or TEXT values. Only the first max_sort_length bytes of each value are used; the rest are ignored. "

mysql> set session max_sort_length = 1024;
Query OK, 0 rows affected (0.00 sec)

mysql> select substring(bigvalue,1,10), length(bigvalue), group_concat(smallvalue) from test group by substring(bigvalue, 1024);
+--------------------------+------------------+--------------------------+
| substring(bigvalue,1,10) | length(bigvalue) | group_concat(smallvalue) |
+--------------------------+------------------+--------------------------+
| 1111111111               |             1033 | one,two                  |
+--------------------------+------------------+--------------------------+
1 row in set (0.00 sec)

mysql> select substring(bigvalue,1,10), length(bigvalue), group_concat(smallvalue) from test group by bigvalue;
+--------------------------+------------------+--------------------------+
| substring(bigvalue,1,10) | length(bigvalue) | group_concat(smallvalue) |
+--------------------------+------------------+--------------------------+
| 1111111111               |             1033 | one                      |
| 1111111111               |             1033 | two                      |
+--------------------------+------------------+--------------------------+
2 rows in set (0.00 sec)

mysql> select substring(bigvalue,1,10), length(bigvalue), group_concat(smallvalue) from test group by substring(bigvalue, 1033);
+--------------------------+------------------+--------------------------+
| substring(bigvalue,1,10) | length(bigvalue) | group_concat(smallvalue) |
+--------------------------+------------------+--------------------------+
| 1111111111               |             1033 | one,two                  |
+--------------------------+------------------+--------------------------+
1 row in set (0.00 sec)

mysql> select substring(bigvalue,1,10), length(bigvalue), group_concat(smallvalue) from test group by substring(bigvalue, 1050);
+--------------------------+------------------+--------------------------+
| substring(bigvalue,1,10) | length(bigvalue) | group_concat(smallvalue) |
+--------------------------+------------------+--------------------------+
| 1111111111               |             1033 | one,two                  |
+--------------------------+------------------+--------------------------+
1 row in set (0.00 sec)

mysql> select version();
+-----------------------------------+
| version()                         |
+-----------------------------------+
| 5.1.46sp1-enterprise-gpl-advanced |
+-----------------------------------+
1 row in set (0.00 sec)

If the extra characters are ignored - why am I not getting the different result set for the middle query?  According to the documentation they should be the same.  

To me there is a possible explanation for these results - 
When the GROUP BY is used with the implicit limit used from max_sort_length - any data outside the max_sort_length is automatically assumed not to match causing the group_concat to assume it is differing values and not grouped.  Which is different then ignoring the rest.  If however I explicitly tell it to compare the values after the max_sort_length - then it does "match" things up for the group_concat.

So my bug report is for either a bug fix for the inconsistent result sets based on the GROUP BY or if this is expected behavior - updating the documentation to clarify what is happening.

How to repeat:
dump script to be attached for tables.

Suggested fix:
Work Around: increasing sort_max_length 

Fix: have the same results return or update documentation for why it does not.
[12 Jul 2010 21:02] MySQL Verification Team
test.sql

Attachment: test.sql (application/octet-stream, text), 4.33 KiB.

[27 Jul 2010 13:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114451

3456 Georgi Kodinov	2010-07-27
      Bug #55188: GROUP BY, GROUP_CONCAT and TEXT - inconsistent results
      
      In order to be able to check if the set of the grouping fields in a 
      GROUP BY has changed (and thus to start a new group) the optimizer
      caches the current values of these fields in a set of Cached_item 
      derived objects.
      The Cached_item_str, used for caching varchar and TEXT columns,
      is limited in length by the max_sort_length variable.
      A String buffer to store the value with an alloced length of either
      the max length of the string or the value of max_sort_length 
      (whichever is smaller) in Cached_item_str's constructor.
      Then, at compare time the value of the string to compare to was 
      truncated to the alloced length of the string buffer inside 
      Cached_item_str.
      This is all fine and valid, but only if you're not assigning 
      values near or equal to the alloced length of this buffer.
      Because when assigning values like this the alloced length is 
      rounded up and as a result the next set of data will not match the
      group buffer, thus leading to wrong results because of the changed
      alloced_length.
      Fixed by preserving the original maximum length in the 
      Cached_item_str's constructor and using this instead of the 
      alloced_length to limit the string to compare to.
      Test case added.
[30 Jul 2010 13:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114741

3456 Georgi Kodinov	2010-07-30
      Bug #55188: GROUP BY, GROUP_CONCAT and TEXT - inconsistent results
      
      In order to be able to check if the set of the grouping fields in a 
      GROUP BY has changed (and thus to start a new group) the optimizer
      caches the current values of these fields in a set of Cached_item 
      derived objects.
      The Cached_item_str, used for caching varchar and TEXT columns,
      is limited in length by the max_sort_length variable.
      A String buffer to store the value with an alloced length of either
      the max length of the string or the value of max_sort_length 
      (whichever is smaller) in Cached_item_str's constructor.
      Then, at compare time the value of the string to compare to was 
      truncated to the alloced length of the string buffer inside 
      Cached_item_str.
      This is all fine and valid, but only if you're not assigning 
      values near or equal to the alloced length of this buffer.
      Because when assigning values like this the alloced length is 
      rounded up and as a result the next set of data will not match the
      group buffer, thus leading to wrong results because of the changed
      alloced_length.
      Fixed by preserving the original maximum length in the 
      Cached_item_str's constructor and using this instead of the 
      alloced_length to limit the string to compare to.
      Test case added.
[30 Jul 2010 13:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114742

3466 Georgi Kodinov	2010-07-30
      Bug #55188: GROUP BY, GROUP_CONCAT and TEXT - inconsistent results
      
      In order to be able to check if the set of the grouping fields in a 
      GROUP BY has changed (and thus to start a new group) the optimizer
      caches the current values of these fields in a set of Cached_item 
      derived objects.
      The Cached_item_str, used for caching varchar and TEXT columns,
      is limited in length by the max_sort_length variable.
      A String buffer to store the value with an alloced length of either
      the max length of the string or the value of max_sort_length 
      (whichever is smaller) in Cached_item_str's constructor.
      Then, at compare time the value of the string to compare to was 
      truncated to the alloced length of the string buffer inside 
      Cached_item_str.
      This is all fine and valid, but only if you're not assigning 
      values near or equal to the alloced length of this buffer.
      Because when assigning values like this the alloced length is 
      rounded up and as a result the next set of data will not match the
      group buffer, thus leading to wrong results because of the changed
      alloced_length.
      Fixed by preserving the original maximum length in the 
      Cached_item_str's constructor and using this instead of the 
      alloced_length to limit the string to compare to.
      Test case added.
[3 Aug 2010 17:36] Paul DuBois
Noted in 5.1.50.

GROUP BY operations used max_sort_length inconsistently. 

Setting report to Need Merge pending further pushes.
[18 Aug 2010 7:20] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@sun.com-20100818071819-2lu46b0mm3cs34rf) (version source revid:alik@sun.com-20100818071732-g682fg1v0nnrrutx) (merge vers: 5.6.1-m4) (pib:20)
[18 Aug 2010 7:21] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100818071923-4ounwbhiium2met1) (version source revid:alik@sun.com-20100818071743-lrzordai06i2crty) (pib:20)
[18 Aug 2010 7:22] Bugs System
Pushed into mysql-5.5 5.5.6-m3 (revid:alik@sun.com-20100818071719-dktnkvt8zvidj0sy) (version source revid:alik@sun.com-20100818071719-dktnkvt8zvidj0sy) (merge vers: 5.5.6-m3) (pib:20)
[18 Aug 2010 14:57] Paul DuBois
Noted in 5.5.6, 5.6.1 changelogs.
[19 Aug 2010 15:39] Bugs System
Pushed into mysql-5.1 5.1.51 (revid:build@mysql.com-20100819151858-muaaor6jojb5ouzj) (version source revid:build@mysql.com-20100819151858-muaaor6jojb5ouzj) (merge vers: 5.1.51) (pib:20)
[14 Oct 2010 8:25] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[14 Oct 2010 8:40] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[14 Oct 2010 8:55] Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (merge vers: 5.1.51-ndb-6.2.19) (pib:21)
[15 Oct 2010 13:43] Jon Stephens
Already documented in the 5.1.50 changelog. Reverting to Closed state.