Bug #16705 User variable in GROUP BY causes unpredictable results
Submitted: 22 Jan 2006 1:09 Modified: 1 Jun 2006 12:06
Reporter: Kolbe Kegel Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0.19 OS:Linux (Linux)
Assigned to: Georgi Kodinov CPU Architecture:Any

[22 Jan 2006 1:09] Kolbe Kegel
Description:
Using a user variable in a GROUP BY clause can cause unexpected and erroneous results.

How to repeat:
Load dump.sql

Execute these queries:

    SELECT col1,                                                                                                                               
           @var:=col2 AS var                                                                                                                   
      FROM t1                                                                                                                                  
      GROUP BY col1, @var;                                                                                                                     

Note the duplication of the value "117" in the "var" column.

+------+-----+
| col1 | var |
+------+-----+
| A    | 117 |
| A    | 117 |
| A    | 117 |
| B    | 117 |
| B    | 118 |
| B    | 119 |
| C    | 117 |
| C    | 118 |
| C    | 119 |
| D    | 117 |
| D    | 118 |
| D    | 119 |
| E    | 117 |
| E    | 118 |
| E    | 119 |
+------+-----+

    SELECT col1,                                                                                                                               
           @var:=col2 AS var                                                                                                                   
      FROM t1                                                                                                                                  
      GROUP BY col1, var;                                                                                                                     

Using the alias instead of the variable itself in the GROUP BY clause leads to the expected resultset.

+------+-----+
| col1 | var |
+------+-----+
| A    | 117 |
| A    | 118 |
| A    | 119 |
| B    | 117 |
| B    | 118 |
| B    | 119 |
| C    | 117 |
| C    | 118 |
| C    | 119 |
| D    | 117 |
| D    | 118 |
| D    | 119 |
| E    | 117 |
| E    | 118 |
| E    | 119 |
+------+-----+

    SELECT col1,                                                                                                                               
           @var:=col2 AS var,
           count(*)                                                                                                                            
      FROM t1                                                                                                                                  
      GROUP BY col1, @var;                                                                                                                                                                                                                                                                    

Not sure how to explain this one... adding count(*) sends everything into disarray.

+------+-----+----------+
| col1 | var | count(*) |
+------+-----+----------+
| A    | 117 |      599 |
| A    | 117 |        1 |
| B    | 117 |      600 |
| C    | 117 |      600 |
| D    | 117 |      600 |
| E    | 117 |      600 |
+------+-----+----------+

In other test datasets, using @var in the GROUP BY clause led to the var column being unsorted in the resultset.

Suggested fix:
Using a user variable in the GROUP BY clause should function as expected.
[22 Jan 2006 1:10] Kolbe Kegel
SQL Dump of test data

Attachment: dump.sql (application/octet-stream, text), 30.82 KiB.

[29 Apr 2006 13:08] Sergey Petrunya
It is not clear how should the queries like
SELECT col1, @var:=col2 AS var FROM t1 GROUP BY col1, @var;
be executed. I can think of two possible interpetations: 
 1) "@var:=col2" is performed "before" the GROUP BY operation (like aliasing). 
    In the above query "GROUP BY col1, @var" will be treated in the same way as
    "GROUP BY col1, col2".
 2) "@var:=col2" is performed "after" the GROUP BY operation, and "GROUP BY col1, @var"
    will have the same effect as "GROUP BY col1, some-constant".

I'm inclined towards #1, but I'll need to check it with PeterG.
[29 Apr 2006 13:11] Sergey Petrunya
While analyzing, I've noticed the following:
1. start the server
2. Run "SELECT col1, @var:=col2 AS var FROM t1 GROUP BY col1, @var" 
    The resultset has 16 rows, MyISAM temporary table is used
3. Run "SELECT col1, @var:=col2 AS var FROM t1 GROUP BY col1, @var"  again
    The resultset has 16 rows, HEAP temporary table is used.

We should find the cause of this difference and eliminate it.
[4 May 2006 20:36] David Bagnara
I have similar problems when using a field alias. I have two identical tables in different versions of mysql one being 4.1.12-nt and the other 5.0.20a-standard.

The summary of the problem is if I execute an identical query on both platforms I get differing results. The results for 5.0.20a-standard are wrong. I think the problem stems from using an alias for a field then using that in the group by clause.

i.e.  SELECT 'sometext' as A ..... GROUP BY A.......

I experienced the problem in another context with a field aliased. I fixed the problem by using the real field name, not the aliased field name in the group by clause.
e.g. Select `a really bad column name` AS A ..... GROUP BY A  ......
didn't work while
e.g. Select `a really bad column name` AS A ..... GROUP BY `a really bad column name`
worked as I expected.

 
Following is a real example of the wrong and correct operation.

============================
Wrong results from version 5
============================

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 353 to server version: 5.0.20a-standard

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use data ;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> SELECT 'All Stores' AS DealerCode, Period, Status, count(ID) AS Total
FROM tbldcr t group by DealerCode, Period, Status order by DealerCode, Period desc
limit 10 ;
+------------+--------+--------+-------+
| DealerCode | Period | Status | Total |
+------------+--------+--------+-------+
| All Stores | 200603 | Cancel | 3     |
| All Stores | 200603 | Active | 9     |
| All Stores | 200603 | Active | 284   |
| All Stores | 200603 | Suspen | 21    |
| All Stores | 200603 | Active | 42    |
| All Stores | 200603 | Suspen | 22    |
| All Stores | 200603 | Active | 140   |
| All Stores | 200603 | Suspen | 2     |
| All Stores | 200603 | Cancel | 41    |
| All Stores | 200603 | Cancel | 10    |
+------------+--------+--------+-------+
10 rows in set, 1 warning (0.69 sec)

==============================
Correct results from version 4
==============================

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 820 to server version: 4.1.12-nt

mysql> use data ;
Database changed
mysql> SELECT 'All Stores' AS DealerCode, Period, Status, count(ID) AS Total
FROM tbldcr t group by DealerCode, Period, Status order by DealerCode, Period desc
limit 10 ;
+------------+--------+--------+-------+
| DealerCode | Period | Status | Total |
+------------+--------+--------+-------+
| All Stores | 200603 | Active |  2028 |
| All Stores | 200603 | Cancel |   558 |
| All Stores | 200603 | Pendin |     1 |
| All Stores | 200603 | Suspen |   149 |
| All Stores | 200602 | Active |  1742 |
| All Stores | 200602 | Cancel |   554 |
| All Stores | 200602 | Pendin |     1 |
| All Stores | 200602 | Suspen |   208 |
| All Stores | 200601 | Active |  2285 |
| All Stores | 200601 | Cancel |   665 |
+------------+--------+--------+-------+
10 rows in set (0.58 sec)
[1 Jun 2006 12:06] Georgi Kodinov
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

According to section 9.3 of the Reference Manual : 
"Note: In a SELECT statement, each expression is evaluated only when sent to the client. This means that in a HAVING, GROUP BY, or ORDER BY clause, you cannot refer to an expression that involves variables that are set in the SELECT list."