Bug #26815 Unexpected built-in function behavior: group_concat(distinct substring_index())
Submitted: 3 Mar 2007 3:09 Modified: 17 Apr 2007 19:52
Reporter: Ryan Nevell Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: General Severity:S2 (Serious)
Version:5.0.38-BK, 5.0.37, 5.1.16, 5.1.15-beta-win32 OS:Linux (Linux, Windows XP Service Pack 2, 32 bit)
Assigned to: Georgi Kodinov CPU Architecture:Any

[3 Mar 2007 3:09] Ryan Nevell
Description:
When using GROUP_CONCAT(DISTINCT SUBSTRING_INDEX()), distinct values returned by SUBSTRING_INDEX() are ignored.

It seems to take only the first result, and ignore everything after that. If you put an ORDER BY a DESC in there, it only gives you "two".

How to repeat:
mysql> SELECT * FROM test;
+------+
| a    |
+------+
| one  |
| two  |
| one  |
+------+

mysql> SELECT SUBSTRING_INDEX(a, '.', 1) FROM test;
+----------------------------+
| SUBSTRING_INDEX(a, '.', 1) |
+----------------------------+
| one                        |
| two                        |
| one                        |
+----------------------------+

-- It works without DISTINCT
mysql> SELECT GROUP_CONCAT(SUBSTRING_INDEX(a, '.', 1)) FROM test;
+------------------------------------------+
| GROUP_CONCAT(SUBSTRING_INDEX(a, '.', 1)) |
+------------------------------------------+
| one,two,one                              |
+------------------------------------------+

-- And it works without SUBSTRING_INDEX()
mysql> SELECT GROUP_CONCAT(DISTINCT a) FROM test;
+--------------------------+
| GROUP_CONCAT(DISTINCT a) |
+--------------------------+
| one,two                  |
+--------------------------+

-- But doesn't work with both of them
mysql> SELECT GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) FROM test;
+---------------------------------------------------+
| GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) |
+---------------------------------------------------+
| one                                               |
+---------------------------------------------------+

Suggested fix:
My guess would be in the way that DISTINCT determines if a result has been seen already or not.
[3 Mar 2007 7:53] Valeriy Kravchuk
Thank you for a problem report. Sorry, but I can not repeat the behaviour described with latest 5.0.38-BK on Linux:

openxs@suse:~/dbs/5.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.0.38 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table test(a char(20));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into test value ('one', 'two', 'one');
ERROR 1136 (21S01): Column count doesn't match value count at row 1
mysql> insert into test value ('one'), ('two'), ('one');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from test;
+------+
| a    |
+------+
| one  |
| two  |
| one  |
+------+
3 rows in set (0.01 sec)

mysql> SELECT SUBSTRING_INDEX(a, '.', 1) FROM test;
+----------------------------+
| SUBSTRING_INDEX(a, '.', 1) |
+----------------------------+
| one                        |
| two                        |
| one                        |
+----------------------------+
3 rows in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(SUBSTRING_INDEX(a, '.', 1)) FROM test;
+------------------------------------------+
| GROUP_CONCAT(SUBSTRING_INDEX(a, '.', 1)) |
+------------------------------------------+
| one,two,one                              |
+------------------------------------------+
1 row in set (0.01 sec)

mysql> SELECT GROUP_CONCAT(DISTINCT a) FROM test;
+--------------------------+
| GROUP_CONCAT(DISTINCT a) |
+--------------------------+
| one,two                  |
+--------------------------+
1 row in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) FROM test;
+---------------------------------------------------+
| GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) |
+---------------------------------------------------+
| one,two                                           |
+---------------------------------------------------+
1 row in set (0.01 sec)
[4 Mar 2007 1:57] Ryan Nevell
I forgot to include important version/OS information.

MySQL version: 5.1.15-beta-win32
OS: Window XP, Service Pack 2. 32-bit
[4 Mar 2007 8:35] Valeriy Kravchuk
Still can not repeat with latest 5.1.17-BK on Linux:

openxs@suse:~/dbs/5.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.17-beta Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table test(a char(20));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test value ('one'), ('two'), ('one');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from test;
+------+
| a    |
+------+
| one  |
| two  |
| one  |
+------+
3 rows in set (0.00 sec)

mysql> SELECT SUBSTRING_INDEX(a, '.', 1) FROM test;
+----------------------------+
| SUBSTRING_INDEX(a, '.', 1) |
+----------------------------+
| one                        |
| two                        |
| one                        |
+----------------------------+
3 rows in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(SUBSTRING_INDEX(a, '.', 1)) FROM test;
+------------------------------------------+
| GROUP_CONCAT(SUBSTRING_INDEX(a, '.', 1)) |
+------------------------------------------+
| one,two,one                              |
+------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(DISTINCT a) FROM test;
+--------------------------+
| GROUP_CONCAT(DISTINCT a) |
+--------------------------+
| one,two                  |
+--------------------------+
1 row in set (0.01 sec)

mysql> SELECT GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) FROM test;
+---------------------------------------------------+
| GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) |
+---------------------------------------------------+
| one,two                                           |
+---------------------------------------------------+
1 row in set (0.00 sec)

So, please, check with a newer version, 5.1.16 for Windows, and inform about the results.
[4 Mar 2007 18:43] Ryan Nevell
I've managed to reproduce this behavior on another Windows machine, on a fresh installation and different version of MySQL. So far, Linux and OS X don't seem to have this bug. But it looks like Windows versions 5.0 and 5.1 are both affected.

MySQL version: 5.0.27-community
OS version: Windows XP, Service Pack 2, 32-bit

Your MySQL connection id is 2 to server version: 5.0.27-community

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use test
Database changed
mysql> show tables;
Empty set (0.00 sec)

mysql> create table test (a text);
Query OK, 0 rows affected (0.09 sec)

mysql> insert into test VALUES ('one.1'), ('two.2'), ('one.3');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) FROM test;
+---------------------------------------------------+
| GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(a, '.', 1)) |
+---------------------------------------------------+
| one                                               |
+---------------------------------------------------+
1 row in set (0.02 sec)
[13 Mar 2007 14:06] Valeriy Kravchuk
Please, try to repeat with a newer version, 5.1.16 and/or 5.0.37, and inform about the results.
[13 Mar 2007 17:39] Ryan Nevell
5.1.16-beta-community and 5.0.37-community both have this bug, too.

***************************************************************************

Server version: 5.1.16-beta-community MySQL Community Server (GPL)

mysql> create table t (a text);
Query OK, 0 rows affected (0.03 sec)

mysql> insert into t values ("one.1"),("two.2"),("one.3");
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select group_concat(substring_index(a, '.', 1)) from t;
+------------------------------------------+
| group_concat(substring_index(a, '.', 1)) |
+------------------------------------------+
| one,two,one                              |
+------------------------------------------+
1 row in set (0.00 sec)

mysql> select group_concat(distinct substring_index(a, '.', 1)) from t;
+---------------------------------------------------+
| group_concat(distinct substring_index(a, '.', 1)) |
+---------------------------------------------------+
| one                                               |
+---------------------------------------------------+
1 row in set (0.00 sec)

**************************************************************************

Server version: 5.0.37-community MySQL Community Edition (GPL)

mysql> create table t (a text);
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t values ("one.1"),("two.2"),("one.3");
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select group_concat(substring_index(a, '.', 1)) from t;
+------------------------------------------+
| group_concat(substring_index(a, '.', 1)) |
+------------------------------------------+
| one,two,one                              |
+------------------------------------------+
1 row in set (0.00 sec)

mysql> select group_concat(distinct substring_index(a, '.', 1)) from t;
+---------------------------------------------------+
| group_concat(distinct substring_index(a, '.', 1)) |
+---------------------------------------------------+
| one                                               |
+---------------------------------------------------+
1 row in set (0.00 sec)
[14 Mar 2007 14:40] Valeriy Kravchuk
Verified just as described also with latest 5.0.38-BK on Linux:

openxs@suse:~/dbs/5.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.38 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t(a text);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t values ("one.1"),("two.2"),("one.3");
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select group_concat(substring_index(a, '.', 1)) from t;
+------------------------------------------+
| group_concat(substring_index(a, '.', 1)) |
+------------------------------------------+
| one,two,one                              |
+------------------------------------------+
1 row in set (0.01 sec)

mysql> select group_concat(distinct substring_index(a, '.', 1)) from t;
+---------------------------------------------------+
| group_concat(distinct substring_index(a, '.', 1)) |
+---------------------------------------------------+
| one                                               |
+---------------------------------------------------+
1 row in set (0.00 sec)

Not repeatable with current 5.1.17-BK, as I already noted.
[27 Mar 2007 16:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23043

ChangeSet@1.2421, 2007-03-27 19:28:04+03:00, gkodinov@magare.gmz +4 -0
  Bug #26815:
   When creating a temporary table the concise column type
   of a string expression is decided based on its length:
   - if its length is under 512 it is stored as either 
     varchar or char.
   - otherwise it is stored as a BLOB.
   
   There is a flag (convert_blob_length) to create_tmp_field 
   that, when >0 allows to force creation of a varchar if the
   max blob length is under convert_blob_length.
   However it must be verified that convert_blob_length 
   (settable through a SQL option in some cases) is 
   under the maximum that can be stored in a varchar column.
   While performing that check for expressions in 
   create_tmp_field_from_item the max length of the blob was
   used instead. This causes blob columns to be created in the
   heap temp table used by GROUP_CONCAT (where blobs must not
   be created in the temp table because of the constant 
   convert_blob_length that is passed to create_tmp_field() ).
   And since these blob columns are not expected in that place
   we get wrong results.
   Fixed by checking that the value of the flag variable is 
   in the limits that fit into VARCHAR instead of the max length
   of the blob column.
[31 Mar 2007 8:39] Bugs System
Pushed into 5.1.18-beta
[31 Mar 2007 8:44] Bugs System
Pushed into 5.0.40
[17 Apr 2007 19:52] Paul DuBois
Noted in 5.0.40, 5.1.18 changelogs.

Duplicates were not properly identified among (potentially) long
strings used as arguments for GROUP_CONCAT(DISTINCT).