Bug #50190 MAX() and MIN() failing for time values with Innodb
Submitted: 8 Jan 2010 16:39 Modified: 8 Feb 2010 11:33
Reporter: Patrick Crews Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0-codebase,pe,next-mr OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: Aggregate Function, innodb, MAX, regression

[8 Jan 2010 16:39] Patrick Crews
Description:
The server (6.0-codebase) with Innodb ICP enabled is returning incorrect values for MAX() on a time column with the Innodb engine.  This does not occur with MyISAM.

For a table with these values (from the attached test case):

INSERT INTO `CC` VALUES (8,'01:27:35','v');
INSERT INTO `CC` VALUES (4,'06:11:01','v');

This query:
SELECT  MAX( `col_time_key`  )  , `col_varchar_key` field3  
FROM CC  
GROUP  BY field3   ;

Is returning this row (incorrectly):
01:27:35	v

How to repeat:
MTR test case.
* Record with ./mtr --record --mysqld=--loose-innodb
* Comment out SET statements as needed for use with other systems.
* Try the test without --loose-innodb and observe the change in result sets
* Diff produced via comparison to mysql-5.1

#/* Server0: MySQL 6.0.14-alpha-debug-log */
#/* Server1: MySQL 5.1.40-gcov-debug-log */

#/* The value of optimizer_switch is distinct between the two servers: */
SET SESSION optimizer_switch = 'firstmatch=off,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=off,materialization=off,semijoin=off' ;
#/* Server 1 : SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on' */;

#/* The value of optimizer_use_mrr is distinct between the two servers: */

#/* The value of debug is distinct between the two servers: */
# SET SESSION debug = 'd,optimizer_no_icp,optimizer_innodb_ds_mrr' ;
#/* Server 1 : SET SESSION debug = '' */;

#/* Begin test case for query 0 */

--disable_warnings
DROP TABLE /*! IF EXISTS */ CC;
--enable_warnings

CREATE TABLE `CC` (
  `col_int_key` int(11) DEFAULT NULL,
  `col_time_key` time DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  KEY `col_int_key` (`col_int_key`),
  KEY `col_time_key` (`col_time_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (8,'01:27:35','v');
INSERT INTO `CC` VALUES (9,'19:48:31','r');
INSERT INTO `CC` VALUES (9,'00:00:00','a');
INSERT INTO `CC` VALUES (186,'19:53:05','m');
INSERT INTO `CC` VALUES (NULL,'19:18:56','y');
INSERT INTO `CC` VALUES (2,'10:55:12','j');
INSERT INTO `CC` VALUES (3,'00:25:00','d');
INSERT INTO `CC` VALUES (0,'12:35:47','z');
INSERT INTO `CC` VALUES (133,'19:53:03','e');
INSERT INTO `CC` VALUES (1,'17:53:30','h');
INSERT INTO `CC` VALUES (8,'11:35:49','b');
INSERT INTO `CC` VALUES (5,NULL,'s');
INSERT INTO `CC` VALUES (5,'06:01:40','e');
INSERT INTO `CC` VALUES (8,'05:45:11','j');
INSERT INTO `CC` VALUES (6,'00:00:00','e');
INSERT INTO `CC` VALUES (51,'00:00:00','f');
INSERT INTO `CC` VALUES (4,'06:11:01','v');
INSERT INTO `CC` VALUES (7,'13:02:46','x');
INSERT INTO `CC` VALUES (6,'21:44:25','m');
INSERT INTO `CC` VALUES (4,'22:43:58','c');

 
SELECT  MAX( `col_time_key`  )  , `col_varchar_key` field3  
FROM CC  
GROUP  BY field3   ;

/* Diff: */

/* --- /tmp//randgen4222-1262965378-server0.dump	2010-01-08 10:42:58.000000000 -0500
# +++ /tmp//randgen4222-1262965378-server1.dump	2010-01-08 10:42:58.000000000 -0500
# @@ -1,8 +1,8 @@
#  00:00:00	a
#  00:00:00	f
#  00:25:00	d
# -01:27:35	v
# -05:45:11	j
# +06:11:01	v
# +10:55:12	j
#  11:35:49	b
#  12:35:47	z
#  13:02:46	x
# @@ -10,6 +10,6 @@
#  19:18:56	y
#  19:48:31	r
#  19:53:03	e
# -19:53:05	m
# +21:44:25	m
#  22:43:58	c
#  NULL	s */
[8 Jan 2010 16:43] Patrick Crews
EXPLAIN output:
* Innodb:
EXPLAIN SELECT  MAX( `col_time_key`  )  , `col_varchar_key` field3  
FROM CC  
GROUP  BY field3   ;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	CC	index	NULL	col_varchar_key	9	NULL	20	

* MyISAM:
EXPLAIN SELECT  MAX( `col_time_key`  )  , `col_varchar_key` field3  
FROM CC  
GROUP  BY field3   ;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	CC	ALL	NULL	NULL	NULL	NULL	20	Using temporary; Using filesort
[26 Jan 2010 12:26] Jørgen Løland
Simplified test case:

--source include/have_innodb.inc

CREATE TABLE t1 (
  col_time time,
  i int,
  INDEX (i)
) engine=innodb;
INSERT INTO t1 VALUES ('01:27:35',1); # (1)
INSERT INTO t1 VALUES ('06:11:01',1);
INSERT INTO t1 VALUES ('19:53:05',2); # (2)
INSERT INTO t1 VALUES ('21:44:25',2);
INSERT INTO t1 VALUES ('10:55:12',3); # (3)
INSERT INTO t1 VALUES ('05:45:11',3);
INSERT INTO t1 VALUES ('00:25:00',4);
INSERT INTO t1 VALUES ('19:53:03',5);
INSERT INTO t1 VALUES ('06:01:40',5);
INSERT INTO t1 VALUES ('01:27:36',5); # (4)

SELECT MAX(col_time), i
FROM t1 
GROUP BY i;

EXPLAIN
SELECT MAX(col_time), i
FROM t1 
GROUP BY i;

DROP TABLE t1;

# (1) If this record is removed, the entire result set becomes 
#     correct
# (2) Reordering the two records with i=2 produces wrong result
# (3) Reordering the two records with i=3 produces correct result
# (4) Note that time is 1 second after (1). If time is 1 second
#     earlier, 06:01:40 (which is also wrong) is shown instead.
[26 Jan 2010 12:28] Jørgen Løland
MyISAM query output:
--------------------
SELECT MAX(col_time), i
FROM t1 
GROUP BY i;
MAX(col_time)	i
06:11:01	1
21:44:25	2
10:55:12	3
00:25:00	4
19:53:03	5

InnoDB query output (diff vs MyISAM):
-------------------------------------
 MAX(col_time)	i
 06:11:01	1
 21:44:25	2
-10:55:12	3
+05:45:11	3
 00:25:00	4
-19:53:03	5
+01:27:36	5
[4 Feb 2010 14:52] Jørgen Løland
Also fails with datetime:

CREATE TABLE t1 (
col_time datetime,
i int,
INDEX (i)
) engine=innodb;

INSERT INTO t1 VALUES ('2009-11-11 01:27:35',1);
INSERT INTO t1 VALUES ('2009-11-11 06:11:01',1);
INSERT INTO t1 VALUES ('2009-11-11 19:53:05',2);
INSERT INTO t1 VALUES ('2009-11-11 21:44:25',2);
INSERT INTO t1 VALUES ('2009-11-11 10:55:12',3);
INSERT INTO t1 VALUES ('2009-11-11 05:45:11',3);
INSERT INTO t1 VALUES ('2009-11-11 00:25:00',4);
INSERT INTO t1 VALUES ('2009-11-11 19:53:03',5);
INSERT INTO t1 VALUES ('2009-11-11 06:01:40',5);
INSERT INTO t1 VALUES ('2009-11-11 01:27:36',5);

SELECT MAX(col_time), i
FROM t1 
GROUP BY i;
MAX(col_time)	i
2009-11-11 06:11:01	1
2009-11-11 21:44:25	2
2009-11-11 05:45:11	3
2009-11-11 00:25:00	4
2009-11-11 01:27:36	5
[4 Feb 2010 15:36] Jørgen Løland
The problem is this:
 * Item_sum_hybrid (which Item_sum_max inherits) has two items:
   the Item representing the column to aggregate over, and an
   Item_cache_datetime to store the max value (or min value, in
   case of MIN() aggregate function, which also fails BTW)
 * Item_sum_hybrid points to a comparison function, which is
   Arg_comparator::compare_datetime
 * The Arg_comparator, created and set up by Item_sum_hybrid in
   setup(), takes two important arguments: the Item and the
   Item_cache_datetime of Item_sum_hybrid. These are stored as
   Item *a and Item *b, respectively.
 * end_send_group() calls Item_sum_max::add(), which evaluates
   the column value of the current row, and which updates the
   Item_cache_datetime if the current value is higher than the
   cached value. 
 * Item_sum_max::add() calls the comparison function
   compare_datetime(), which calls get_time_value().
   get_time_value() creates an Item_cache_int, used to
   *overwrites the b pointer of Arg_comparator*. However, the
   Item_cache_datetime of the Item_sum_hybrid remains unchanged.
   Now we have two caches for the same Item.
 * In Item_sum_max::add, if cmp->compare()>0 ("this row has ha
   higher value than the cached one"), the cache item of
   Item_sum_hybrid is updated with then new highest value.
   However, the cache pointed to by the b pointer in
   Arg_comparator is not updated since this is a different
   object. Thus, the cached value in b will never change; it will
   contain the value from the first row evaluated throughout
   query execution (in this case, 01:27:35).
[5 Feb 2010 9:57] Jørgen Løland
This problem was introduced when BUG#43668 was merged from 5.1 into 6.0. In 5.1, the Item_sum_max object caches the highest value in an Item_cache_string (Item_cache_datetime in 6.0). Further, in 5.1, the comparison function chosen is compare_binary_string(). This comparison function does not create it's own cache item that overwrites b (in contrast to compare_datetime which is used in 6.0).
[5 Feb 2010 16:31] Jørgen Løland
Depends on BUG#49771.
[8 Feb 2010 7:55] Jørgen Løland
Suggested solution: 

There's no point in creating a cache on top of a cache, so get_time_value() and get_datetime_value() should not create the Item_cache_int on top of the Item_cache_datetime in this case. 

The trick of setting the Item_cache_datetime::used_table_map to 1 (makes const_item() return false) can be used.
[8 Feb 2010 11:33] Evgeny Potemkin
A duplicate of the bug#49771.