Bug #32942 now() - interval '7200' second is NOT pre-calculated, causing "full table scan"
Submitted: 3 Dec 2007 19:20 Modified: 15 Mar 2008 7:35
Reporter: Valeriy Kravchuk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.45 OS:Any
Assigned to: Alexey Botchkov CPU Architecture:Any
Tags: regression
Triage: D3 (Medium)

[3 Dec 2007 19:20] Valeriy Kravchuk
Description:
The following query became less efficient in 5.0.45 comparing to 4.1.18. The explain plan is still showing index range scan but SHOW INNODB STATUS is showing 16 million record selected while in 4.1.18 the same query shows 39K rows selected in engine status.

Here is the query:

SELECT DISTINCT PARTNER_LOCATION_CODE AS name
FROM RC_REQUEST r
LEFT JOIN RC_STAT s USING(SESSION_ID)
WHERE r.CREATION_DATE_TIME >= now() - INTERVAL '7200' SECOND AND substring_index(s.SOURCE,'_',-1) = 'NOVUS78'
AND r.PARTNER_CODE = 'ZA'

Explain plan in 4.1
-------------------
mysql> explain SELECT DISTINCT PARTNER_LOCATION_CODE AS name FROM RC_REQUEST r LEFT JOIN RC_STAT s USING(SESSION_ID) WHERE r.CREATION_DATE_TIME >= now() - INTERVAL '7200' SECOND AND substring_index(s.SOURCE,'_',-1) = 'NOVUS78' AND r.PARTNER_CODE = 'ZA';
+----+-------------+-------+-------+------------------------+------------------------+---------+------------------+-------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+------------------------+---------+------------------+-------+------------------------------+
| 1 | SIMPLE | r | range | IDX_CREATION_DATE_TIME | IDX_CREATION_DATE_TIME | 4 | NULL | 44840 | Using where; Using temporary |
| 1 | SIMPLE | s | ref | IDX_RC_STAT_SESSION_ID | IDX_RC_STAT_SESSION_ID | 255 | BAM.r.SESSION_ID | 4 | Using where; Distinct |
+----+-------------+-------+-------+------------------------+------------------------+---------+------------------+-------+------------------------------+
2 rows in set (0.00 sec)

Explain plan in 5.0
-------------------
mysql> explain SELECT DISTINCT PARTNER_LOCATION_CODE AS name FROM RC_REQUEST r LEFT JOIN RC_STAT s USING(SESSION_ID) WHERE r.CREATION_DATE_TIME >= now() - INTERVAL '7200' SECOND AND substring_index(s.SOURCE,'_',-1) = 'NOVUS78' AND r.PARTNER_CODE = 'ZA';
+----+-------------+-------+-------+------------------------+------------------------+---------+------------------+-------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+------------------------+---------+------------------+-------+------------------------------+
| 1 | SIMPLE | r | range | IDX_CREATION_DATE_TIME | IDX_CREATION_DATE_TIME | 4 | NULL | 31740 | Using where; Using temporary |
| 1 | SIMPLE | s | ref | IDX_RC_STAT_SESSION_ID | IDX_RC_STAT_SESSION_ID | 257 | BAM.r.SESSION_ID | 4 | Using where; Distinct |
+----+-------------+-------+-------+------------------------+------------------------+---------+------------------+-------+------------------------------+
2 rows in set (0.00 sec)

In 4.1.x:

Before:
Number of rows inserted 508379614, updated 26069, deleted 492909835, read 1224147483
After:
Number of rows inserted 508387007, updated 26069, deleted 492909835, read 1224186812

Delta in read = 1224186812-1224147483 = 39329

In 5.0.x:

Before:
Number of rows inserted 172600496, updated 0, deleted 149796007, read 588748246
After:
Number of rows inserted 172600496, updated 0, deleted 149796007, read 605492489
Delta in read = 605492489-588748246 = 16744243

All the data and results are the same. The difference is explained, I think, by the following EXPLAIN EXTENDED + SHOW WARNINGS results:

From 4.1.x instance ..

mysql> show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select distinct sql_no_cache BAM.r.PARTNER_LOCATION_CODE AS
`name` from BAM.RC_REQUEST r join BAM.RC_STAT s where
((BAM.r.CREATION_DATE_TIME >= 20071128105537) and
(substr_index(BAM.s.SOURCE,_latin1'_',-(1)) = _latin1'NOVUS78') and
(BAM.r.PARTNER_CODE = _latin1'ZA') and (BAM.r.SESSION_ID =
BAM.s.SESSION_ID))
1 row in set (0.00 sec)

From 5.0.45 Instance

mysql> show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select distinct `BAM`.`r`.`PARTNER_LOCATION_CODE` AS `name`
from `BAM`.`RC_REQUEST` `r` join `BAM`.`RC_STAT` `s` where
((`BAM`.`s`.`SESSION_ID` = `BAM`.`r`.`SESSION_ID`) and
(`BAM`.`r`.`PARTNER_CODE` = _latin1'ZA') and
(`BAM`.`r`.`CREATION_DATE_TIME` >= (now() - interval _latin1'7200'
second)) and (substring_index(`BAM`.`s`.`SOURCE`,_latin1'_',-(1)) =
_latin1'NOVUS78'))
1 row in set (0.00 sec)

How to repeat:
Tables are created as follows:

CREATE TABLE `RC_REQUEST` (
`RECORD_ID` int(20) NOT NULL auto_increment,
`REQUEST_TYPE` char(1) default NULL,
`AFFILIATE` varchar(255) default NULL,
`SESSION_ID` varchar(255) default NULL,
`PARTNER_CODE` varchar(20) default NULL,
`PARTNER_LOCATION_CODE` varchar(32) default NULL,
`DROPOFF_PARTNER_LOCATION_CODE` varchar(32) default NULL,
`ELAPSED_TIME` int(11) default NULL,
`CREATION_DATE_TIME` timestamp NOT NULL default CURRENT_TIMESTAMP on
update CURRENT_TIMESTAMP,
PRIMARY KEY (`RECORD_ID`),
KEY `IDX_CREATION_DATE_TIME` (`CREATION_DATE_TIME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `RC_STAT` (
`RECORD_ID` bigint(20) unsigned NOT NULL auto_increment,
`SESSION_ID` varchar(255) NOT NULL default '',
`ARM_MAILBOX_ID` varchar(255) NOT NULL default '',
`BENCHMARK_TYPE` varchar(64) NOT NULL default '',
`BENCHMARK_LOC` varchar(64) NOT NULL default '',
`ELAPSED_TIME` int(11) NOT NULL default '0',
`SOURCE` varchar(64) NOT NULL default '',
`CREATION_DATE_TIME` timestamp NOT NULL default CURRENT_TIMESTAMP on
update CURRENT_TIMESTAMP,
PRIMARY KEY (`RECORD_ID`),
KEY `IDX_RC_STAT_SESSION_ID` (`SESSION_ID`),
KEY `IDX_CREATION_DATE_TIME` (`CREATION_DATE_TIME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

As for tables' data, see the following comments.

Suggested fix:
Restore behaviour of 4.1.x?
[11 Dec 2007 13:28] Timour Katchaounov
Suggestion:
Check if the problem is related to item_cmpfunc::convert_constant_item().
[15 Feb 2008 7:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/42330

ChangeSet@1.2584, 2008-02-15 11:43:55+04:00, holyfoot@mysql.com +1 -0
  Bug #32942 now() - interval '7200' second NOT pre-calculated, causing "full table scan"
  
  Problem is not about intervals and doesn't actually cause 'full table scan'.
  We have an optimization for DISTINCT when we have
  'DISTINCT field_from_first_join_table' we don't need to read all the
  rows from the JOIN-ed table if we found one conforming row.
  It stopped working in 5.0 as we return NESTED_LOOP_OK if we came upon
  that case in the evaluate_join_record() and that doesn't break the
  recordreading loop in sub_select().
  
  Fixed by returning NESTED_LOOP_NO_MORE_ROWS in this case.
[15 Feb 2008 19:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/42395

ChangeSet@1.2584, 2008-02-15 22:58:56+04:00, holyfoot@mysql.com +3 -0
  Bug #32942 now() - interval '7200' second NOT pre-calculated, causing "full table scan"
  
  Problem is not about intervals and doesn't actually cause 'full table scan'.
  We have an optimization for DISTINCT when we have
  'DISTINCT field_from_first_join_table' we don't need to read all the
  rows from the JOIN-ed table if we found one conforming row.
  It stopped working in 5.0 as we return NESTED_LOOP_OK if we came upon
  that case in the evaluate_join_record() and that doesn't break the
  recordreading loop in sub_select().
  
  Fixed by returning NESTED_LOOP_NO_MORE_ROWS in this case.
[17 Feb 2008 11:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/42451

ChangeSet@1.2584, 2008-02-17 15:48:17+04:00, holyfoot@mysql.com +3 -0
  Bug #32942 now() - interval '7200' second NOT pre-calculated, causing "full table scan"
  
  Problem is not about intervals and doesn't actually cause 'full table scan'.
  We have an optimization for DISTINCT when we have
  'DISTINCT field_from_first_join_table' we don't need to read all the
  rows from the JOIN-ed table if we found one conforming row.
  It stopped working in 5.0 as we return NESTED_LOOP_OK if we came upon
  that case in the evaluate_join_record() and that doesn't break the
  recordreading loop in sub_select().
  
  Fixed by returning NESTED_LOOP_NO_MORE_ROWS in this case.
[13 Mar 2008 19:28] Bugs System
Pushed into 6.0.5-alpha
[13 Mar 2008 19:35] Bugs System
Pushed into 5.1.24-rc
[13 Mar 2008 19:42] Bugs System
Pushed into 5.0.60
[15 Mar 2008 7:35] Jon Stephens
Documented in the 5.0.60, 5.1.24, and 6.0.5 changelogs as follows (as requested by James Day):

        For DISTINCT queries, 4.0 and 4.1 stopped reading
        joined tables as soon as the first matching row was found. However, this
        optimzation was lost in MySQL 5.0, which instead read all matching rows.
        This fix for this regression may result in a major improvement in
        performance for DISTINCT queries in cases where many
        rows match.
[31 Mar 2008 19:52] Jon Stephens
Pushed to 5.1-telco-6.3, documented in the 5.1.23-ndb-6.3.11 changelog.
[27 Jan 2011 7:46] Shane Bester
see bug #59736