Bug #75435 MAX() causes a full table scan when used in a sub-query
Submitted: 7 Jan 2015 14:59 Modified: 12 Jan 2015 18:42
Reporter: Van Stokes Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.21 OS:Any
Assigned to: CPU Architecture:Any

[7 Jan 2015 14:59] Van Stokes
Description:
MySQL Server: 5.6.21
OS: Ubuntu 12.04 / 14.04

MAX() causes a full table scan when used in a sub-query.

How to repeat:
Regular query - works as expected.

mysql> EXPLAIN EXTENDED
    -> SELECT
    -> MAX( fcl_bl.Bol )
    -> FROM eci_lw_prod.fcl_bl
    -> WHERE fcl_bl.sail_date BETWEEN DATE_FORMAT( '20141201', '%Y-%m-%d 00:00:00' )
    -> AND DATE_FORMAT( '20141231', '%Y-%m-%d 23:59:59' )
    -> AND fcl_bl.filetype != 'I'
    -> AND ( fcl_bl.void IS NULL OR fcl_bl.void = 'N' )
    -> GROUP BY fcl_bl.file_no
    -> \g
+----+-------------+--------+-------+---------------------------+--------------+---------+------+------+----------+---------------------------------------------------------------------+
| id | select_type | table  | type  | possible_keys             | key          | key_len | ref  | rows | filtered | Extra                                                               |
+----+-------------+--------+-------+---------------------------+--------------+---------+------+------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | fcl_bl | range | fcl_bl_idx01,fcl_bl_idx03 | fcl_bl_idx03 | 6       | NULL | 1536 |   100.00 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+--------+-------+---------------------------+--------------+---------+------+------+----------+---------------------------------------------------------------------+
1 row in set, 1 warning (0.02 sec)

When used in a sub-query, PRIMARY is a full-table scan. Why?

mysql> EXPLAIN EXTENDED
    -> SELECT
    -> fcl_bl.BolId,
    -> fcl_bl.file_number
    -> FROM fcl_bl
    -> WHERE
    -> fcl_bl.Bol IN (
    -> SELECT
    -> MAX( fcl_bl.Bol )
    -> FROM fcl_bl
    -> WHERE
    -> fcl_bl.sail_date BETWEEN DATE_FORMAT( '20141201', '%Y-%m-%d 00:00:00' )
    -> AND DATE_FORMAT( '20141231', '%Y-%m-%d 23:59:59' )
    -> AND fcl_bl.filetype != 'I'
    -> AND ( fcl_bl.void IS NULL OR fcl_bl.void = 'N' )
    -> GROUP BY fcl_bl.file_no
    -> )
    -> \g
+----+-------------+--------+-------+---------------------------+--------------+---------+------+-------+----------+---------------------------------------------------------------------+
| id | select_type | table  | type  | possible_keys             | key          | key_len | ref  | rows  | filtered | Extra                                                               |
+----+-------------+--------+-------+---------------------------+--------------+---------+------+-------+----------+---------------------------------------------------------------------+
|  1 | PRIMARY     | fcl_bl | ALL   | NULL                      | NULL         | NULL    | NULL | 49098 |   100.00 | Using where                                                         |
|  2 | SUBQUERY    | fcl_bl | range | fcl_bl_idx01,fcl_bl_idx03 | fcl_bl_idx03 | 6       | NULL |  1536 |   100.00 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+--------+-------+---------------------------+--------------+---------+------+-------+----------+---------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

Suggested fix:
The outer query WHERE IN should only be using the results from the inner sub-query.

SHOW CREATE TABLE fcl_bl;

CREATE TABLE `fcl_bl`
......
  PRIMARY KEY (`Bol`),
  UNIQUE KEY `fcl_bl_idx02` (`BolId`),
  KEY `fcl_bl_idx01` (`file_no`),
  KEY `fcl_bl_idx03` (`sail_date`),
  KEY `fcl_bl_idx04` (`file_number`),
  KEY `fcl_bl_idx05` (`ready_to_post`,`manifested_date`),
  KEY `fcl_bl_idx06` (`Shipper_No`),
  KEY `fcl_bl_idx07` (`House_Shipper_No`),
  KEY `fcl_bl_idx08` (`consignee_No`),
  KEY `fcl_bl_idx09` (`houseConsignee`),
  KEY `fcl_bl_idx10` (`notify_party`),
  KEY `fcl_bl_idx11` (`house_notify_party_no`),
  KEY `fcl_bl_idx12` (`Forward_agent_No`),
  KEY `fcl_bl_idx13` (`ssline_no`),
  KEY `fcl_bl_idx14` (`billTrdPrty`),
  KEY `fcl_bl_idx15` (`agent_no`),
  KEY `fcl_bl_idx16` (`eta`)
) ENGINE=InnoDB AUTO_INCREMENT=103835 DEFAULT CHARSET=latin1 |
[12 Jan 2015 15:06] MySQL Verification Team
This is expected behavior. If you explain both EXPLAINs you shall see that query node as a nested query and standalone query have 100 % identical execution plan.

Your problem is that highest level query gets a full table scan.  That is actually a cause of the design problem that was introduced when nested queries were developed, which affects a large number of queries that use nested queries. There are plans to solve this problem, but scheduling is yet unknown.

The only thing that you can try is to have an index on the column `Bol` and to  avoid nested query by using prepared statements or stored function that will create a string with values in brackets returned by your inner query as standalone one.
[12 Jan 2015 18:16] Van Stokes
Thank you for your reply.

I am intrigued by this:

"..stored function that will create a string with values in brackets.."

Would you please provide an example SELECT statement showing how to use string values in brackets?
[12 Jan 2015 18:42] Van Stokes
And to anyone else who finds this, I re-wrote the query like this and it vastly improved performance:

SELECT
 fcl_bl.BolId,
 fcl_bl.file_number
FROM
 (
  SELECT
   MAX( fcl_bl.Bol ) AS 'bol'
  FROM fcl_bl
  WHERE
   fcl_bl.sail_date BETWEEN DATE_FORMAT( '20141201', '%Y-%m-%d 00:00:00' )
   AND DATE_FORMAT( '20141231', '%Y-%m-%d 23:59:59' )
   AND fcl_bl.filetype != 'I'
   AND ( fcl_bl.void IS NULL OR fcl_bl.void = 'N' )
  GROUP BY fcl_bl.file_no
 ) t,
 fcl_bl
WHERE
 fcl_bl.Bol = t.bol
;
[14 Jan 2015 15:54] MySQL Verification Team
SELECT ...... WHERE Bol IN (1,2,3,4,5);
[14 Jan 2015 15:56] MySQL Verification Team
Actually, you would have two strings, one with "SELECT .... IN " and one with "(1,2,3,4,5)", you would concatenate those two strings and execute the statement.