Bug #47650 using group by with rollup without indexes returns incorrect results with where
Submitted: 25 Sep 2009 16:53 Modified: 12 Mar 2010 17:20
Reporter: Derrek Leute Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:5.0.X, 5.1.39, next-mr OS:Any (Mac OS X Snow leopard, Linux)
Assigned to: Martin Hansson CPU Architecture:Any
Tags: count, GROUP, rollup, where
Triage: Triaged: D2 (Serious)

[25 Sep 2009 16:53] Derrek Leute
Description:

The below assumes all innodb tables. Have not tested with other table types.

Using the following query:

SELECT e.employer_id AS employer_id, COUNT(distinct u.user_id) AS cnt FROM e_employer e LEFT JOIN e_user u ON e.employer_id = u.employer_id WHERE e.employer_id = 10006 GROUP BY e.employer_id with rollup LIMIT 100;

+-------------+------+
| employer_id |  cnt |
+-------------+------+
|       10006 | 9967 |
|        NULL | 9967 |
+-------------+------+

9967 is the total number of rows in u. The result should have been 71.

This problem only occurs when where = specifies a single row in e. "where in (10006,10000)" works as expected with or without indexes.

Adding an index on u.employer_id solved the problem. However, it doesn't seem like indexes should be required to make group by with rollup work correctly.

I noticed this because I have developed a graphical reporting tool for any data structure. Not every table is indexed at all, let alone properly.

It should also be noted that other ways to get the data i.e. count(*) where id = 10006 do work as expected with or without indexes.

How to repeat:

Tables A, B, and C.

Table A (At least two records)

id (primary key)

Table B (populated with records linking to all rows of table A)

id (primary key autoincrement)
a_id (id from table a *not* indexed.) 

Table C (Populated with references to all rows of table A)

id (primary key)
a_id (id from table A. indexed)

The following is pseudo code. See the description above for code that actually reproduced the issue.

SELECT A.id AS a_id, COUNT(distinct B.id) AS cnt FROM A A LEFT JOIN B B ON A.id = B.a_id WHERE A.id = 1 GROUP BY a.id with rollup LIMIT 100;

results for that join with table b = count of total records in table b.

however doing the same thing with table c (with the index) results in the intended result.

Suggested fix:
No idea.

Workaround is to add indexes for all joins. However, this seems like it should still work without indexes even if it is slower.
[25 Sep 2009 17:35] Godofredo Miguel Solorzano
Thank you for the bug report. Could you please provide the exactly server version which are you getting the bad results, complete test case create table and insert data statements, the result you got and the expected ones. Thanks in advance.
[25 Sep 2009 18:07] Derrek Leute
Below is a structure and data dump for this problem. It should be noted that the tables are MyISAM. So it isn't an innodb issue. Complete version numbers in the mysql dump below. This happened on 5.0 and 5.1 for me.

The difference between the two queries below is table b does not have an index on a_id and table c does. See the mysqldump5 below for details.

SELECT A.id AS a_id, COUNT(distinct B.id) AS cnt FROM A A LEFT JOIN B B ON A.id = B.a_id WHERE A.id = 1 GROUP BY a.id with rollup LIMIT 100;

+------+-----+
| a_id | cnt |
+------+-----+
|    1 |   8 |
| NULL |   8 |
+------+-----+

SELECT A.id AS a_id, COUNT(distinct C.id) AS cnt FROM A A LEFT JOIN C C ON A.id = C.a_id WHERE A.id = 1 GROUP BY a.id with rollup LIMIT 100;

+------+-----+
| a_id | cnt |
+------+-----+
|    1 |   3 |
| NULL |   3 |
+------+-----+

-- MySQL dump 10.13  Distrib 5.1.39, for apple-darwin10.0.0 (i386)
--
-- Host: localhost    Database: bugreport
-- ------------------------------------------------------
-- Server version	5.1.39

--
-- Table structure for table `a`
--

DROP TABLE IF EXISTS `a`;
CREATE TABLE `a` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;

--
-- Dumping data for table `a`
--

LOCK TABLES `a` WRITE;
INSERT INTO `a` VALUES (1),(2),(3),(4),(5),(6),(7),(8);
UNLOCK TABLES;

--
-- Table structure for table `b`
--

DROP TABLE IF EXISTS `b`;
CREATE TABLE `b` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `a_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;

--
-- Dumping data for table `b`
--

LOCK TABLES `b` WRITE;
INSERT INTO `b` VALUES (1,1),(2,1),(3,1),(4,2),(5,2),(6,2),(7,3),(8,3);
UNLOCK TABLES;

--
-- Table structure for table `c`
--

DROP TABLE IF EXISTS `c`;
CREATE TABLE `c` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `a_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ctable` (`a_id`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;

--
-- Dumping data for table `c`
--

LOCK TABLES `c` WRITE;
INSERT INTO `c` VALUES (1,1),(2,1),(3,1),(4,2),(5,2),(6,2),(7,3),(8,3);
UNLOCK TABLES;
[26 Sep 2009 20:02] Sveta Smirnova
Thank you for the feedback.

Verified as described.
[10 Nov 2009 14:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/89960

3191 Martin Hansson	2009-11-10
      Bug#47650: using group by with rollup without indexes
                 returns incorrect results with where
      
      A result set for a ROLLUP query outer-joining tables t1 and
      t2 turned t1 into a const table after applying the join
      condition, which in turn caused GROUP BY to be optimized
      away. A filtered table scan was then used on t2 in the
      join. The const access and lack of GROUP BY caused the
      optimizer to use filesort without temporary table to access
      t2. But the join condition, translated into a WHERE is lost
      in the optimization step. Filesort does no handle filtered
      table scans.
      
      Fixed by always sorting with a temporary table if a query
      contains ROLLUP and there is no key to access the joined
      table.
     @ mysql-test/r/func_gconcat.result
        Bug#47650: Expected change in test result
     @ mysql-test/r/group_by.result
        Bug#47650: Test result
     @ mysql-test/r/olap.result
        Bug#47650: Expected change in test result
     @ mysql-test/t/group_by.test
        Bug#47650: Test case
     @ sql/sql_select.cc
        Bug#47650: Fix and organizing surrounding code according to
        coding standard
[16 Nov 2009 11:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90498

3200 Martin Hansson	2009-11-16
      Bug#47650: using group by with rollup without indexes returns
          incorrect results with where : worked on fix.
      
      An ugly hack that serves to prove a point. If this ugly hack 
      turns out to do the job, we could base the fix on it once we
      implement it properly.
[11 Dec 2009 14:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93695

3200 Martin Hansson	2009-12-11
      Bug#47650: using group by with rollup without indexes
      returns incorrect results with where
      
      An outer join of a const table and a normal table with GROUP
      BY would optimize away GROUP BY, and thus trigger the
      optimization to do away with a temporary table if grouping
      was performed on columns from the const table, executing
      the query with filesort without temporary table. But this
      should not be done if there is a non-indexed access to the
      joined table, since filesort expects either ref access, range
      ditto or table scan. Any WHERE or join condition is lost
      after optimization and it is re-created only for the
      aforementioned key-based access methods.
      
      Fixed by always forcing execution with temporary table in
      the case of ROLLUP with a query involving outer join. This
      is a slightly broader class of queries than need fixing, but
      it is hard to ascertain the position of a ROLLUP field wrt
      outer join with current query representation.
     @ mysql-test/r/join_outer.result
        Bug#47650: Test result
     @ mysql-test/t/join_outer.test
        Bug#47650: Test case
     @ sql/sql_select.cc
        Bug#47650: Fix
[14 Dec 2009 10:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93878

3200 Martin Hansson	2009-12-14
      Bug#47650: using group by with rollup without indexes
      returns incorrect results with where
      
      An outer join of a const table (outer) and a normal table
      (inner) with GROUP BY on a field from the outer table would
      optimize away GROUP BY, and thus trigger the optimization to
      do away with a temporary table if grouping was performed on
      columns from the const table, hence executing the query with
      filesort without temporary table. But this should not be
      done if there is a non-indexed access to the inner table,
      since filesort does not handle joins. It expects either ref
      access, range ditto or table scan. The join condition will
      thus not be applied.
      
      Fixed by always forcing execution with temporary table in
      the case of ROLLUP with a query involving an outer join. This
      is a slightly broader class of queries than need fixing, but
      it is hard to ascertain the position of a ROLLUP field wrt
      outer join with current query representation.
     @ mysql-test/r/join_outer.result
        Bug#47650: Test result
     @ mysql-test/t/join_outer.test
        Bug#47650: Test case
     @ sql/sql_select.cc
        Bug#47650: Fix
[17 Dec 2009 9:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94694

3282 Martin Hansson	2009-12-17
      Bug#47650: using group by with rollup without indexes
      returns incorrect results with where
      
      An outer join of a const table (outer) and a normal table
      (inner) with GROUP BY on a field from the outer table would
      optimize away GROUP BY, and thus trigger the optimization to
      do away with a temporary table if grouping was performed on
      columns from the const table, hence executing the query with
      filesort without temporary table. But this should not be
      done if there is a non-indexed access to the inner table,
      since filesort does not handle joins. It expects either ref
      access, range ditto or table scan. The join condition will
      thus not be applied.
      
      Fixed by always forcing execution with temporary table in
      the case of ROLLUP with a query involving an outer join. This
      is a slightly broader class of queries than need fixing, but
      it is hard to ascertain the position of a ROLLUP field wrt
      outer join with current query representation.
     @ mysql-test/r/join_outer.result
        Bug#47650: Test result
     @ mysql-test/t/join_outer.test
        Bug#47650: Test case
     @ sql/sql_select.cc
        Bug#47650: Fix
[17 Dec 2009 10:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94702

3787 Martin Hansson	2009-12-17 [merge]
      Merge of fix for Bug#47650
[19 Dec 2009 8:27] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091219082307-f3i4fn0tm8trb3c0) (version source revid:alik@sun.com-20091217193315-s3ck6ltg8m1zz6r3) (merge vers: 6.0.14-alpha) (pib:15)
[19 Dec 2009 8:31] Bugs System
Pushed into 5.5.1-m2 (revid:alik@sun.com-20091219082021-f34nq4jytwamozz0) (version source revid:alexey.kopytov@sun.com-20091217150904-ugetvos7v0bo83qe) (merge vers: 5.5.0-beta) (pib:15)
[19 Dec 2009 8:35] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20091219082213-nhjjgmphote4ntxj) (version source revid:alik@sun.com-20091217191654-rb9sci536v96viaz) (pib:15)
[15 Jan 2010 8:58] Bugs System
Pushed into 5.1.43 (revid:joro@sun.com-20100115085139-qkh0i0fpohd9u9p5) (version source revid:jimw@mysql.com-20091217191514-t4zu0fd4z9534u1q) (merge vers: 5.1.42) (pib:16)
[22 Jan 2010 1:06] Paul Dubois
Noted in 5.1.43, 5.5.1, 6.0.14 changelogs.

Queries containing GROUP BY ... WITH ROLLUP that did not use indexes
could return incorrect results.  

Setting report to NDI pending push to Celosia.
[12 Mar 2010 14:10] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:26] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:40] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[12 Mar 2010 17:20] Paul Dubois
Fixed in earlier 5.1.x, 5.5.x.