Bug #38217 GROUP BY with subquery fails.
Submitted: 17 Jul 2008 21:35 Modified: 9 Nov 2009 20:14
Reporter: Adam Dixon Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.60, 5.0.64, 5.1.28 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Triage: Triaged: D2 (Serious) / R4 (High) / E3 (Medium)

[17 Jul 2008 21:35] Adam Dixon
Description:
Looks like a bug with using sub query result column to group by. Placing the full sub query in the group by instead of alias seems to work around this.

How to repeat:
Load data:

CREATE TABLE `incidents` (
`i_id` int(11) NOT NULL default '0',
`prod_lvl1_id` smallint(6) default NULL,
UNIQUE KEY `incidents$i_id` (`i_id`)
) ENGINE=InnoDB;

INSERT INTO `incidents` VALUES (439457,1),(443044,13),(480829,13),(493066,13),(535633,2),(539801,895),(541843,2),(543433,895),(546699,895),(550893,13),(553707,895),(557589,895),(559215,2),(561682,895),(563442,895),(564681,895),(564685,895),(567049,895),(570629,895),(572890,895);

CREATE TABLE `labels` (
`tbl` smallint(6) NOT NULL default '0',
`label_id` int(11) NOT NULL default '0',
`lang_id` smallint(6) NOT NULL default '0',
`fld` smallint(6) NOT NULL default '0',
`label` mediumtext collate utf8_unicode_ci,
UNIQUE KEY `labels$id$tbl$lang_id$fld` (`label_id`,`tbl`,`lang_id`,`fld`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `labels` VALUES (65,1,1,1,'PRODA'),(65,2,1,1,'PRODB'),(65,13,1,1,'PRODC'),(65,14,1,1,'PRODD'),(65,17,1,1,'PRODE'),(65,18,1,1,'PRODF'),(65,19,1,1,'PRODG'),(65,20,1,1,'PRODH'),(65,26,1,1,'PRODI'),(65,179,1,1,'PRODJ'),(65,205,1,1,'PRODK'),(65,895,1,1,'PRODL');

This query gives correct results:

SELECT (SELECT label FROM labels WHERE label_id = incidents.prod_lvl1_id AND tbl = 65 AND fld = 1 AND lang_id = 1 ) col1 , COUNT(*) col2, COUNT((incidents.i_id)) col3 FROM incidents GROUP BY col1;
+-------+------+------+
| col1 | col2 | col3 |
+-------+------+------+
| PRODA | 1 | 1 |
| PRODB | 3 | 3 |
| PRODC | 4 | 4 |
| PRODL | 12 | 12 |
+-------+------+------+

Now let me add a DISTINCT to col3 which results in invalid results:

SELECT (SELECT label FROM labels WHERE label_id = incidents.prod_lvl1_id AND tbl = 65 AND fld = 1 AND lang_id = 1 ) col1 , COUNT(*) col2, COUNT(DISTINCT(incidents.i_id)) col3 FROM incidents GROUP BY col1;
+-------+------+------+
| col1 | col2 | col3 |
+-------+------+------+
| PRODL | 1 | 1 |
| PRODL | 19 | 19 |
+-------+------+------+

Now let's paste the details of col1 into the group by instead of using the alias col1 which will also give correct results:

SELECT (SELECT label FROM labels WHERE label_id = incidents.prod_lvl1_id AND tbl = 65 AND fld = 1 AND lang_id = 1 ) col1 , COUNT(*) col2, COUNT(DISTINCT(incidents.i_id)) col3 FROM incidents GROUP BY (SELECT label FROM labels WHERE label_id = incidents.prod_lvl1_id AND tbl = 65 AND fld = 1 AND lang_id = 1 );
+-------+------+------+
| col1 | col2 | col3 |
+-------+------+------+
| PRODA | 1 | 1 |
| PRODB | 3 | 3 |
| PRODC | 4 | 4 |
| PRODL | 12 | 12 |
+-------+------+------+

Can get correct results again if you add the full sub query in the group by...;
SELECT (SELECT label FROM labels WHERE label_id = incidents.prod_lvl1_id AND tbl = 65 AND fld = 1 AND lang_id = 1 ) col1 , COUNT(*) col2, COUNT(DISTINCT(incidents.i_id)) col3 FROM incidents GROUP BY (SELECT label FROM labels WHERE label_id = incidents.prod_lvl1_id AND tbl = 65 AND fld = 1 AND lang_id = 1);
+-------+------+------+
| col1  | col2 | col3 |
+-------+------+------+
| PRODA |    1 |    1 | 
| PRODB |    3 |    3 | 
| PRODC |    4 |    4 | 
| PRODL |   12 |   12 | 
+-------+------+------+
4 rows in set (0.00 sec)
[24 Jul 2008 0:01] Adam Dixon
Verified in 5.1 bzr too.
[12 Nov 2008 15:55] Georgi Kodinov
The problem here is as follows :
A subquery's external references are compiled differently dependent on where the subquery is referenced from the outer query when the outer query is a grouping one :
 - Item_outer_ref if the subquery is referenced from the HAVING or the SELECT list
 - Item_field otherwise.

Now consider queries that reference a subquery from the SELECT list by name. These references are not copying the subquery, but use a single subquery (with Item_outer_ref for the external reference) instead and evaluate it in both a SELECT list context and GROUP BY context.
This leads to wrong data being used for the outer references in the subquery.
[23 Mar 2009 15:21] Georgi Kodinov
This bug is a regression from the fix for bug #27321
[3 Apr 2009 14:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71326

2743 Georgi Kodinov	2009-04-03
      Bug #38217: GROUP BY with subquery fails.
      
      When a subquery references fields from the outer context 
      the server creates either an Item_outer_ref that expands to 
      either Item_ref (pointing to the last row of the previous 
      group in a GROUP BY context) or Item_direct_ref (pointing 
      to the first row of the new group in a GROUP BY context).
      This causes wrong results with scalar subqueries in the SELECT 
      list that are referenced e.g. by name from the GROUP BY list.
      For these subqueries it's not possible to make both 
      Item_ref (because the sub-query is in the SELECT list)
      and Item_direct_ref (because the same subquery is referenced
      through an Item_field from the GROUP BY field) for a single 
      copy of Item_outer_ref in the subquery.
      No wrong results are returned when temporary table is used.
      Temporary table approach works because the scalar
      subquery from the GROUP BY is evaluated only once and 
      stored in the temporary table.
      We have two approaches to fix this:
      1. Make sure we copy the entire tree of the subquery when it's
      referenced in GROUP BY, thus effectively transforming :
      SELECT (SELECT a FROM t2 WHERE b = t1.a) c1, COUNT(*) FROM t1 
        GROUP BY c1
      to
      SELECT (SELECT a FROM t2 WHERE b = t1.a) c1, COUNT(*) FROM t1 
        GROUP BY (SELECT a FROM t2 WHERE b = t1.a)
      and processing this as usual.
      
      2. Make sure scalar subqueries in GROUP BY/ORDER BY always 
      trigger using temporary table.
      
      This fix implements 2).
      This is done to avoid multiple re-calculation of the scalar
      subquery during the calculation of GROUP BY/ORDER BY.
      MySQL is already doing similar thing for UDFs and 
      stored procedures.
     @ mysql-test/r/subselect.result
        Bug #38217: 
         - test case
         - fixed a non-sorted query result 
           (caused by a query that has ORDER BY <const>)
     @ mysql-test/t/subselect.test
        Bug #38217: 
         - test case
         - fixed a non-sorted query result 
           (caused by a query that has ORDER BY <const>)
     @ sql/item_subselect.h
        Bug #38217: subqueries are expensive expressions
     @ sql/sql_select.cc
        Bug #38217: check GROUP BY for expensive expressions as well.
[9 Nov 2009 20:14] Gleb Shchepa
This bug duplicates the bug #45640.