| Bug #50190 | MAX() and MIN() failing for time values with Innodb | ||
|---|---|---|---|
| Submitted: | 8 Jan 2010 16:39 | Modified: | 8 Feb 2010 11:33 |
| Reporter: | Patrick Crews | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 6.0-codebase,pe,next-mr | OS: | Any |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
| Tags: | Aggregate Function, innodb, MAX, regression | ||
[8 Jan 2010 16:39]
Patrick Crews
[8 Jan 2010 16:43]
Patrick Crews
EXPLAIN output: * Innodb: EXPLAIN SELECT MAX( `col_time_key` ) , `col_varchar_key` field3 FROM CC GROUP BY field3 ; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE CC index NULL col_varchar_key 9 NULL 20 * MyISAM: EXPLAIN SELECT MAX( `col_time_key` ) , `col_varchar_key` field3 FROM CC GROUP BY field3 ; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE CC ALL NULL NULL NULL NULL 20 Using temporary; Using filesort
[26 Jan 2010 12:26]
Jørgen Løland
Simplified test case:
--source include/have_innodb.inc
CREATE TABLE t1 (
col_time time,
i int,
INDEX (i)
) engine=innodb;
INSERT INTO t1 VALUES ('01:27:35',1); # (1)
INSERT INTO t1 VALUES ('06:11:01',1);
INSERT INTO t1 VALUES ('19:53:05',2); # (2)
INSERT INTO t1 VALUES ('21:44:25',2);
INSERT INTO t1 VALUES ('10:55:12',3); # (3)
INSERT INTO t1 VALUES ('05:45:11',3);
INSERT INTO t1 VALUES ('00:25:00',4);
INSERT INTO t1 VALUES ('19:53:03',5);
INSERT INTO t1 VALUES ('06:01:40',5);
INSERT INTO t1 VALUES ('01:27:36',5); # (4)
SELECT MAX(col_time), i
FROM t1
GROUP BY i;
EXPLAIN
SELECT MAX(col_time), i
FROM t1
GROUP BY i;
DROP TABLE t1;
# (1) If this record is removed, the entire result set becomes
# correct
# (2) Reordering the two records with i=2 produces wrong result
# (3) Reordering the two records with i=3 produces correct result
# (4) Note that time is 1 second after (1). If time is 1 second
# earlier, 06:01:40 (which is also wrong) is shown instead.
[26 Jan 2010 12:28]
Jørgen Løland
MyISAM query output: -------------------- SELECT MAX(col_time), i FROM t1 GROUP BY i; MAX(col_time) i 06:11:01 1 21:44:25 2 10:55:12 3 00:25:00 4 19:53:03 5 InnoDB query output (diff vs MyISAM): ------------------------------------- MAX(col_time) i 06:11:01 1 21:44:25 2 -10:55:12 3 +05:45:11 3 00:25:00 4 -19:53:03 5 +01:27:36 5
[4 Feb 2010 14:52]
Jørgen Løland
Also fails with datetime:
CREATE TABLE t1 (
col_time datetime,
i int,
INDEX (i)
) engine=innodb;
INSERT INTO t1 VALUES ('2009-11-11 01:27:35',1);
INSERT INTO t1 VALUES ('2009-11-11 06:11:01',1);
INSERT INTO t1 VALUES ('2009-11-11 19:53:05',2);
INSERT INTO t1 VALUES ('2009-11-11 21:44:25',2);
INSERT INTO t1 VALUES ('2009-11-11 10:55:12',3);
INSERT INTO t1 VALUES ('2009-11-11 05:45:11',3);
INSERT INTO t1 VALUES ('2009-11-11 00:25:00',4);
INSERT INTO t1 VALUES ('2009-11-11 19:53:03',5);
INSERT INTO t1 VALUES ('2009-11-11 06:01:40',5);
INSERT INTO t1 VALUES ('2009-11-11 01:27:36',5);
SELECT MAX(col_time), i
FROM t1
GROUP BY i;
MAX(col_time) i
2009-11-11 06:11:01 1
2009-11-11 21:44:25 2
2009-11-11 05:45:11 3
2009-11-11 00:25:00 4
2009-11-11 01:27:36 5
[4 Feb 2010 15:36]
Jørgen Løland
The problem is this:
* Item_sum_hybrid (which Item_sum_max inherits) has two items:
the Item representing the column to aggregate over, and an
Item_cache_datetime to store the max value (or min value, in
case of MIN() aggregate function, which also fails BTW)
* Item_sum_hybrid points to a comparison function, which is
Arg_comparator::compare_datetime
* The Arg_comparator, created and set up by Item_sum_hybrid in
setup(), takes two important arguments: the Item and the
Item_cache_datetime of Item_sum_hybrid. These are stored as
Item *a and Item *b, respectively.
* end_send_group() calls Item_sum_max::add(), which evaluates
the column value of the current row, and which updates the
Item_cache_datetime if the current value is higher than the
cached value.
* Item_sum_max::add() calls the comparison function
compare_datetime(), which calls get_time_value().
get_time_value() creates an Item_cache_int, used to
*overwrites the b pointer of Arg_comparator*. However, the
Item_cache_datetime of the Item_sum_hybrid remains unchanged.
Now we have two caches for the same Item.
* In Item_sum_max::add, if cmp->compare()>0 ("this row has ha
higher value than the cached one"), the cache item of
Item_sum_hybrid is updated with then new highest value.
However, the cache pointed to by the b pointer in
Arg_comparator is not updated since this is a different
object. Thus, the cached value in b will never change; it will
contain the value from the first row evaluated throughout
query execution (in this case, 01:27:35).
[5 Feb 2010 9:57]
Jørgen Løland
This problem was introduced when BUG#43668 was merged from 5.1 into 6.0. In 5.1, the Item_sum_max object caches the highest value in an Item_cache_string (Item_cache_datetime in 6.0). Further, in 5.1, the comparison function chosen is compare_binary_string(). This comparison function does not create it's own cache item that overwrites b (in contrast to compare_datetime which is used in 6.0).
[5 Feb 2010 16:31]
Jørgen Løland
Depends on BUG#49771.
[8 Feb 2010 7:55]
Jørgen Løland
Suggested solution: There's no point in creating a cache on top of a cache, so get_time_value() and get_datetime_value() should not create the Item_cache_int on top of the Item_cache_datetime in this case. The trick of setting the Item_cache_datetime::used_table_map to 1 (makes const_item() return false) can be used.
[8 Feb 2010 11:33]
Evgeny Potemkin
A duplicate of the bug#49771.
