Bug #41994 Optimizer do full table scan when use distinct with group by in subquery
Submitted: 9 Jan 2009 11:15 Modified: 9 Mar 2015 18:58
Reporter: Ben Li Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1.30, 5.1.31, 6.0.8 OS:Any (test on linux/windows)
Assigned to: Evgeny Potemkin CPU Architecture:Any
Tags: distinct subquery groupby

[9 Jan 2009 11:15] Ben Li
Description:
A simple table:
CREATE TABLE `distinct_bug` (
  `defid` int(10) unsigned NOT NULL,
  `day` date NOT NULL,
  `uid` bigint(20) unsigned NOT NULL,
  `start_min` smallint(5) unsigned NOT NULL,
  `ipid` int(10) unsigned NOT NULL,
  PRIMARY KEY (`defid`,`day`,`uid`,`start_min`)
) ENGINE=InnoDB;

When do a query like:

select count(distinct uid) from distinct_bug where defid = 3 and day between between '2008-10-01' and '2008-10-31';

MySQL use index correctlly, and it's a ranged scan.

But when put the query int a subquery, MySQL do full table scan.

select * from (select count(distinct uid) from distinct_bug where defid = 3 and day between between '2008-10-01' and '2008-10-31') tmp;

This bug exists in innodb/myisam/falcon, in 5.1.30/5.1.31/6.0.5-windows.

How to repeat:
load the attachment distinct_bug.sql into test database

mysql> select count(1) from distinct_bug \G
count(1): 6474

mysql> select count(1) from distinct_bug where defid = 3 and day between '2008-10-01' and '2008-10-31' \G
count(1): 235

Only 235 line in range

mysql> desc select ipid, count(distinct uid) from distinct_bug where defid = 3 and day between '2008-10-01' and '2008-10-31' group by ipid \G
           id: 1
  select_type: SIMPLE
        table: distinct_bug
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 7
          ref: NULL
         rows: 235
        Extra: Using where; Using filesort

mysql> desc select * from
(
select ipid, count(distinct uid) from distinct_bug
where defid = 3 and day between '2008-10-01' and '2008-10-31'
group by ipid
) tmp \G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 198
        Extra:
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: distinct_bug
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 6931
        Extra: Using filesort
2 rows in set (0.00 sec)

It seems MySQL do a full table scan!

mysql> desc select * from
(
select ipid, count(uid) from distinct_bug
where defid = 3 and day between '2008-10-01' and '2008-10-31'
group by ipid
) tmp \G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 198
        Extra:
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: distinct_bug
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 7
          ref: NULL
         rows: 235
        Extra: Using where; Using temporary; Using filesort
2 rows in set (0.00 sec)

Without distinct, it's all right.

I've to use sql below as a workaround

desc select * from
(
select ipid, count(uid) as uv from
  (
   select ipid, uid from distinct_bug
   where defid = 3 and day between '2008-10-01' and '2008-10-31'
   group by uid
  )  uidg
group by ipid having uv > 1
) tmp \G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 13
        Extra:
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: <derived3>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 193
        Extra: Using temporary; Using filesort
*************************** 3. row ***************************
           id: 3
  select_type: DERIVED
        table: distinct_bug
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 7
          ref: NULL
         rows: 235
        Extra: Using where; Using temporary; Using filesort
3 rows in set (0.00 sec)

At least it don't do a full table scan.
[9 Jan 2009 11:16] Ben Li
mysqldump file

Attachment: distinct_bug.sql (text/plain), 326.06 KiB.

[9 Jan 2009 16:46] Valeriy Kravchuk
Thank you for a problem report. Verified just as described on your test data, also - with 6.0.8 on Windows:

C:\Program Files\MySQL\MySQL Server 6.0\bin>mysql -uroot -proot -P3311 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 28
Server version: 6.0.8-alpha-community MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> explain select ipid, count(distinct uid) from distinct_bug where defid =
3 and day
    -> between '2008-10-01' and '2008-10-31' group by ipid \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: distinct_bug
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 7
          ref: NULL
         rows: 235
        Extra: Using where; Using filesort
1 row in set (0.14 sec)

mysql> explain select * from (
    -> select ipid, count(distinct uid) from distinct_bug where defid = 3 and da
y
    -> between '2008-10-01' and '2008-10-31' group by ipid) tmp\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 198
        Extra:
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: distinct_bug
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 6982
        Extra: Using filesort
2 rows in set (0.06 sec)

Table's engine does NOT matter.
[22 Aug 2014 20:11] Justin Swanhart
Not reproducible on 5.6.17
[22 Aug 2014 20:16] Justin Swanhart
Not reproducible on 5.5.37 either
[9 Mar 2015 18:58] Roy Lyseng
Bug is no longer reproducible.