Bug #58782 Missing rows with SELECT .. WHERE .. IN subquery with full GROUP BY and no aggr
Submitted: 7 Dec 2010 10:46 Modified: 13 Jan 2011 9:45
Reporter: John Embretsen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:bzr_WL1393 OS:Any
Assigned to: Tor Didriksen CPU Architecture:Any
Tags: wl1393

[7 Dec 2010 10:46] John Embretsen
Description:
When executing a query with GROUP BY on all selected columns, where the query is part of a larger SELECT .. WHERE .. IN () query, the server returns fewer rows than expected with the implementation for WL#1393 (Filesort optimization) than without.

Consider the following base query:

SELECT pk AS field1, col_int_nokey AS field2 
FROM t1 
WHERE col_int_nokey > 0
GROUP BY field1, field2;

Let's say it returns the following data:

+--------+--------+
| field1 | field2 |
+--------+--------+
|     10 |      7 |
|     11 |      1 |
|     12 |      5 |
|     13 |      3 |
+--------+--------+

If we create a table "where_subselect" containing exactly this data and then SELECT * from that table using the original query in a WHERE IN() construct, we should get the same results:

mysql> SELECT * FROM where_subselect;
+--------+--------+
| field1 | field2 |
+--------+--------+
|     10 |      7 |
|     11 |      1 |
|     12 |      5 |
|     13 |      3 |
+--------+--------+
4 rows in set (0.00 sec)

mysql> SELECT * 
    -> FROM where_subselect
    -> WHERE (field1, field2) IN (
    ->   SELECT pk AS field1, col_int_nokey AS field2
    ->   FROM t1
    ->   WHERE col_int_nokey > 0
    ->   GROUP BY field1, field2
    -> );
+--------+--------+
| field1 | field2 |
+--------+--------+
|     10 |      7 |
|     11 |      1 |
|     12 |      5 |
|     13 |      3 |
+--------+--------+
4 rows in set (0.00 sec)

However, with the implementation of WL#1393 we get (from the final SELECT):

+--------+--------+
| field1 | field2 |
+--------+--------+
|     10 |      7 |
+--------+--------+
1 row in set (0.00 sec)

that is, only the first matching row.

If using SELECT DISTINCT and removing the GROUP BY clause, the results are as expected (all distinct rows are returned). The same goes for similar queries with one aggregate column and one GROUP BY column.

Tested code:

revision-id: tor.didriksen@oracle.com-20101206094649-dwt7fkm6m0esgyzv
date: 2010-12-06 10:46:49 +0100
revno: 3254
branch-nick: next-mr-opt-team-wl1393-merge

The issue seems to be isolated to the WL#1393 implementation - mysql-next-mr-opt-team and mysql-trunk-bugfixing are not affected.

How to repeat:
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS where_subselect;

CREATE TABLE t1 (
  pk INT NOT NULL,
  col_int_nokey INT,
  PRIMARY KEY (pk)
);

INSERT INTO t1 VALUES (10,7);
INSERT INTO t1 VALUES (11,1);
INSERT INTO t1 VALUES (12,5);
INSERT INTO t1 VALUES (13,3);

-- original query:

SELECT pk AS field1, col_int_nokey AS field2 
FROM t1 
WHERE col_int_nokey > 0
GROUP BY field1, field2;

-- store query results in a new table:

CREATE TABLE where_subselect
  SELECT pk AS field1, col_int_nokey AS field2
  FROM t1
  WHERE col_int_nokey > 0
  GROUP BY field1, field2
;

-- query the new table and compare to original using WHERE ... IN():

SELECT * 
FROM where_subselect
WHERE (field1, field2) IN (
  SELECT pk AS field1, col_int_nokey AS field2
  FROM t1
  WHERE col_int_nokey > 0
  GROUP BY field1, field2
);

-- note that you get fewer rows than expected
[9 Dec 2010 12:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/126411

3258 Tor Didriksen	2010-12-09
      Bug #58782 Missing rows with SELECT .. WHERE .. IN subquery with full GROUP BY and no aggr
      
      The GROUP BY is optimized away earlier than any example queries seen so far:
      In JOIN::optimize()
              We have found that grouping can be removed since groups correspond to
              only one row anyway, but we still have to guarantee correct result
              order. The line below effectively rewrites the query from GROUP BY
              <fields> to ORDER BY <fields>.
      So, the handling of GROUP BY to disable PQ in JOIN::exec does not catch it.
      This is a DEPENDENT SUBQUERY, which is executed once for each outer row.
      This means we have to return all rows each time we do filesort,
      i.e. we must set limit == HA_POS_ERROR, and disable PQ.
     @ mysql-test/r/group_by.result
        New test case.
     @ mysql-test/r/order_by_icp_mrr.result
        New (correct) result.
     @ mysql-test/r/order_by_none.result
        New (correct) result.
     @ mysql-test/t/group_by.test
        New test case.
     @ sql/sql_select.cc
        If the query as a GROUP BY, then remember that fact in JOIN::prepare.
        Use this fact in JOIN::exec, to disable PQ.
     @ sql/sql_select.h
        Rename group to has_group_by, for readability, add had_group_by.
[10 Dec 2010 15:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/126532

3258 Tor Didriksen	2010-12-10
      Bug #58782 Missing rows with SELECT .. WHERE .. IN subquery with full GROUP BY and no aggr
      
      The missing results were caused by a 'group by' which was transformed to an 'order by'.
     @ mysql-test/r/explain.result
        New explain result, the group by can be eliminated.
     @ mysql-test/r/group_by.result
        New test case.
     @ mysql-test/r/myisam_mrr.result
        New explain result, the group by can be eliminated.
     @ mysql-test/r/myisam_mrr_cost.result
        New explain result, the group by can be eliminated.
     @ mysql-test/r/myisam_mrr_cost_icp.result
        New explain result, the group by can be eliminated.
     @ mysql-test/r/myisam_mrr_icp.result
        New explain result, the group by can be eliminated.
     @ mysql-test/r/myisam_mrr_none.result
        New explain result, the group by can be eliminated.
     @ mysql-test/r/order_by_icp_mrr.result
        New (correct) result.
     @ mysql-test/r/order_by_none.result
        New (correct) result.
     @ mysql-test/t/group_by.test
        New test case.
     @ sql/sql_select.cc
        If the query as a GROUP BY, which can be converted to an ORDER BY,
        we can eliminate it for subqueries.