Bug #47631 Inconsistent reporting of date and time when subtracting MIN/MAX
Submitted: 24 Sep 2009 18:19 Modified: 11 Jan 2010 16:48
Reporter: Matt Walton Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S2 (Serious)
Version:4.1, 5.0, 5.1, next OS:Any (MS Windows 2003, Linux)
Assigned to: Paul DuBois CPU Architecture:Any
Tags: data mismatch, difference, INDEX, innodb, MAX, min

[24 Sep 2009 18:19] Matt Walton
Description:
The creation of an index on an InnoDB table is causing MySQL to return the incorrect results for a query that takes the difference of a max and min on a timestamp column.  Without the index or if the tables use engine=MyISAM the query will return the correct result of the difference between two columns.

How to repeat:
THE FOLLOWING STEPS ASSUME A DEFAULT STORAGE ENGINE OF INNODB

TEST TABLE SETUP:

CREATE TABLE TEST01
(COL1 INTEGER,
CUR_TRN_DT TIMESTAMP)

INSERT INTO TEST01 VALUES (1, '1993-12-1 12:00:00')
INSERT INTO TEST01 VALUES (2, '1993-12-31 12:00:00')

CREATE TABLE TEST02
(COL1 INTEGER,
CUR_TRN_DT TIMESTAMP)

INSERT INTO TEST02 VALUES (1, '1994-12-1 12:00:00')
INSERT INTO TEST02 VALUES (2, '1994-12-31 12:00:00')

REPRO SQL PASS 1: 
CREATE TABLE TEMP01(
	COLA	INTEGER(4), 
	WJXBFS1	DATE, 
	WJXBFS2	DATE)

Pass 2:
insert into TEMP01
select	0  COLA,
	max(a11.CUR_TRN_DT)  WJXBFS1,
	min(a11.CUR_TRN_DT)  WJXBFS2
from	TEST01	a11

Pass 3:
insert into TEMP01 
select	1  COLA,
	max(a11.CUR_TRN_DT)  WJXBFS1,
	min(a11.CUR_TRN_DT)  WJXBFS2
from	TEST02	a11

Pass 4: 
create index TEMP01_i on TEMP01 (COLA) 

Pass 5:	
select	pa11.COLA  COL_ID,
	max(pa11.WJXBFS1)  WJXBFS1,
	min(pa11.WJXBFS2)  WJXBFS2,
	(max(pa11.WJXBFS1) - min(pa11.WJXBFS2))  WJXBFS3
from	TEMP01	pa11
GROUP BY COLA

THE RESULTS FOR WJXBFS3 FROM PASS #5 SHOW UP AS 0 BUT SHOULD BE 30.  IF YOU REPEAT THESE STEPS 
BUT SKIP THE INDEX CREATION YOU WILL SEE THE CORRECT RESULTS.  ALSO, IF YOU CHANGE THE STORAGE 
ENGINE=MYISAM AND RUN ALL OF THE STEPS YOU WILL SEE THE CORRECT RESULTS.

I will also attach scripts (one with index creation and one without)

Suggested fix:
Have InnoDB to return the correct results for query.
[24 Sep 2009 18:20] Matt Walton
ddl, dml, and select using index

Attachment: ddl_dml.sql (text/plain), 963 bytes.

[24 Sep 2009 18:20] Matt Walton
ddl, dml, and select using no index

Attachment: ddl_dml_noindex.sql (text/plain), 916 bytes.

[24 Sep 2009 18:20] Valeriy Kravchuk
Thank you for the problem report. What exact versions, 5.0.x and 5.1.y, you had used for testing this?
[24 Sep 2009 18:27] Matt Walton
Updated the OS as I've only tested this with MySQL Server on Windows.
[24 Sep 2009 18:29] Matt Walton
Hi,

I've tested this with community editions 5.0.85 and 5.1.30.

thank you,
Matt
[24 Sep 2009 22:06] Sveta Smirnova
Thank you for the report.

Verified as described.
[24 Sep 2009 22:09] Sveta Smirnova
Most likely duplicate of bug #47280
[25 Sep 2009 14:13] Mikhail Izioumtchenko
Sveta, thank you for looking at it. Could you investigate if it's indeed a duplicate, in particular if the patch for bug#47280 helps here, if it doesn't, post the EXPLAIN output with and without the index.
[25 Sep 2009 17:26] Matt Walton
I will take a look at this today.  Thank you.
[25 Sep 2009 17:30] Matt Walton
Actually, I see that the patch is still pending.  Should I expect the patch to be delivered soon?  I will subscribe for updates to that patch status.

Thanks,
Matt
[26 Sep 2009 19:56] Sveta Smirnova
Michael,

not, this is not duplicate: patch does not fix the problem.

Here is EXPLAIN output:

without index

explain select  pa11.COLA  COL_ID,
max(pa11.WJXBFS1)  WJXBFS1,
min(pa11.WJXBFS2)  WJXBFS2,
(max(pa11.WJXBFS1) - min(pa11.WJXBFS2))  WJXBFS3
from    TEMP01  pa11
GROUP BY COLA;
id      1
select_type     SIMPLE
table   pa11
type    ALL
possible_keys   NULL
key     NULL
key_len NULL
ref     NULL
rows    2
Extra   Using temporary; Using filesort

with index

explain select  pa11.COLA  COL_ID,
max(pa11.WJXBFS1)  WJXBFS1,
min(pa11.WJXBFS2)  WJXBFS2,
(max(pa11.WJXBFS1) - min(pa11.WJXBFS2))  WJXBFS3
from    TEMP01  pa11
GROUP BY COLA;
id      1
select_type     SIMPLE
table   pa11
type    index
possible_keys   NULL
key     TEMP01_i
key_len 5
ref     NULL
rows    2
Extra
[28 Sep 2009 14:25] Mikhail Izioumtchenko
Thank you, Svetlana. Jimmy, could you look at it? Apparently without the index MySQL calculates the expression itself, and does a good job. It's not clear what's going on with the index.
[28 Sep 2009 20:41] Jimmy Yang
Hi, this problem seems can be reproduced with a much simply query without any innodb involvement:

1) Create a table 'test'

mysql> CREATE TABLE test(
    -> COLA INTEGER(4),
    -> WJXBFS1 DATE,
    -> WJXBFS2 DATE);
Query OK, 0 rows affected (0.02 sec)

2)Insert a value:

mysql> insert into test values (0, '1994-12-31', '1994-12-1');
Query OK, 1 row affected (0.01 sec)

3) Select, to substract '1994-12-1' from '1994-12-31', and you got 0 instead of 30 when Max/Min is used:

mysql> select (max(pa11.WJXBFS1) - min(pa11.WJXBFS2))   WJXBFS3
    -> from test pa11;
+---------+
| WJXBFS3 |
+---------+
|       0 |   <=== Should be 30
+---------+
1 row in set (0.00 sec)

Without the max/min call, we got the correct answer:

mysql> select ((pa11.WJXBFS1) - (pa11.WJXBFS2))   WJXBFS3
    -> from test  pa11;
+---------+
| WJXBFS3 |
+---------+
|      30 |  <== correct result
+---------+
1 row in set (0.00 sec)

The different comes from what datatype the values are assigned to before the substraction.

Put a break point right at the minus operation happens:

(gdb) break  Item_func_minus::real_op

Step through the functions, for incorrect result case, here is the stack:

my_strntod_8bit()
Item_sum_hybrid::val_real
Item::val_result
Item_ref::val_real
Item_func_minus::real_op
Item_func_numhybrid::val_real
...

In this case, the datatime datatype is incorrectly treated as string, and convert to double lost its accuracy. After the my_strntod_8bit() call, value "1994-12-31" becomes 1994:

(gdb) print str
$2 = 0x3bd81c8 "1994-12-31"
...
(gdb) n
Item_ref::val_real (this=0x3bf0070) at item.cc:6149
(gdb) print tmp
$4 = 1994

So "1994-12-31" -  "1994-12-1" becomes 1994 - 1994, and you get 0.

In the case for getting correct result, the values are treated as datatime, and here is the stack:

Field_newdate::val_int  <== datatype is 'Field_newdate' 
Field_newdate::val_real
Item_ref::val_real
Item_func_minus::real_op
Item_func_numhybrid::val_real
...

Breakpoint 2, Field_newdate::val_int (this=0x3be60f0) at field.cc:5861

(gdb) print j
$6 = 19941231

The value is correctly set in this case, and the value is 19941231 - 19941201 = 30, which is the correct result.

This behavior (due to the different type for the value before minus operation) applies to the case described in original Bug report, however, more likely it is due to the datatype was not correct set in the plan generation time when different plan is chosen for the select operation. It is more attributed to the type of the datetime value assigned to than anything with innodb or index etc. As shown in the script above, it can be reproduced without innodb involvement at all. 

Investigation should continue on the value type setup for these datatime values.

Thanks
Jimmy
[28 Sep 2009 21:03] Jimmy Yang
Add 'show create table' to confirm the table type in previous script:

mysql>
mysql> CREATE TABLE test(
    -> COLA INTEGER(4),
    -> WJXBFS1 DATE,
    -> WJXBFS2 DATE) engine =MyISAM;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test values (0, '1994-12-31', '1994-12-1');
Query OK, 1 row affected (0.00 sec)

mysql> select (max(pa11.WJXBFS1) - min(pa11.WJXBFS2))   WJXBFS3
    -> from test  pa11;
+---------+
| WJXBFS3 |
+---------+
|       0 |
+---------+
1 row in set (4.56 sec)

mysql> show create table test;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                            |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `COLA` int(4) DEFAULT NULL,
  `WJXBFS1` date DEFAULT NULL,
  `WJXBFS2` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Thanks
Jimmy
[15 Oct 2009 3:54] Kevin Lewis
I looked into this bug and found it to be just as Jimmy Yang has described.  When the max() and min() functions are used in the SQL query, the value for Item_sum_max::hybrid_type is set to STRING_RESULT (see enum Item_result STRING_RESULT=0).  I think that this happens during construction of that object with a memset or something.  I cannot see the actual constructor ever being called.  

So Item_sum_max::add() and Item_sum_min::add() convert their results to strings like "1994-12-31" which are not converted properly back to a number in order to subtract later.

I found that if I add the following virtual functions to the Item_sum_max and Item_sum_min classes, then the hybrid_type is set to *int*, which is something that can be added or subtracted.

bool Item_sum_max::setup(THD *thd)
{
  switch (hybrid_field_type)
  {
  case MYSQL_TYPE_TIMESTAMP:
  case MYSQL_TYPE_DATE:
  case MYSQL_TYPE_TIME:
  case MYSQL_TYPE_DATETIME:
  case MYSQL_TYPE_YEAR:
  case MYSQL_TYPE_NEWDATE:
    hybrid_type = INT_RESULT;
  }

  return FALSE;
}

bool Item_sum_min::setup(THD *thd)
{
  switch (hybrid_field_type)
  {
  case MYSQL_TYPE_TIMESTAMP:
  case MYSQL_TYPE_DATE:
  case MYSQL_TYPE_TIME:
  case MYSQL_TYPE_DATETIME:
  case MYSQL_TYPE_YEAR:
  case MYSQL_TYPE_NEWDATE:
    hybrid_type = INT_RESULT;
  }

  return FALSE;
}

This solution seems to solve the conversion problems noted in this bug.  But I need a second opinion from someone who knows the server code better.
[16 Nov 2009 23:20] Kevin Lewis
This bug really should not be a P2 since there is a workaround that aught to be used even after the bug is fixed.  In MySQL, if you want to subtract dates and times, you need to use the appropriate function.  For DATE, DATETIME, and TIMESTAMP, you should use TIMESTAMPDIFF.  For TIME, use SUBTIME.  The following SQL shows the current bug for all four of these data types when mixed with MIN/MAX and subtraction.  It also shows that the correct result can currently be obtained using the appropriate function.

mysql> DROP TABLE t1;
Query OK, 0 rows affected (0.13 sec)

mysql> CREATE TABLE t1(a DATE, b DATETIME, c TIMESTAMP, d TIME);
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO t1 VALUES ('2000-01-01', '2000-01-01 00:00:00', '2000-01-01 00:00:00', '10:10:10');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO t1 VALUES ('2010-10-09', '2010-10-09 07:06:05', '2010-10-09 07:06:05', '50:40:30');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1;
+------------+---------------------+---------------------+----------+
| a          | b                   | c                   | d        |
+------------+---------------------+---------------------+----------+
| 2000-01-01 | 2000-01-01 00:00:00 | 2000-01-01 00:00:00 | 10:10:10 |
| 2010-10-09 | 2010-10-09 07:06:05 | 2010-10-09 07:06:05 | 50:40:30 |
+------------+---------------------+---------------------+----------+
2 rows in set (0.00 sec)

mysql> SELECT MIN(a), MAX(a), MAX(a) - MIN(a),
    ->        TIMESTAMPDIFF(day, MIN(a), MAX(a)) DiffDay FROM t1;
+------------+------------+-----------------+---------+
| MIN(a)     | MAX(a)     | MAX(a) - MIN(a) | DiffDay |
+------------+------------+-----------------+---------+
| 2000-01-01 | 2010-10-09 |              10 |    3934 |
+------------+------------+-----------------+---------+
1 row in set (0.00 sec)

mysql> SELECT MIN(b), MAX(b), MAX(b) - MIN(b),
    ->        TIMESTAMPDIFF(day, MIN(b), MAX(b)) DiffDay FROM t1;
+---------------------+---------------------+-----------------+---------+
| MIN(b)              | MAX(b)              | MAX(b) - MIN(b) | DiffDay |
+---------------------+---------------------+-----------------+---------+
| 2000-01-01 00:00:00 | 2010-10-09 07:06:05 |       10.000000 |    3934 |
+---------------------+---------------------+-----------------+---------+
1 row in set (0.00 sec)

mysql> SELECT MIN(c), MAX(c), MAX(c) - MIN(c),
    ->        TIMESTAMPDIFF(day, MIN(c), MAX(c)) DiffDay FROM t1;
+---------------------+---------------------+-----------------+---------+
| MIN(c)              | MAX(c)              | MAX(c) - MIN(c) | DiffDay |
+---------------------+---------------------+-----------------+---------+
| 2000-01-01 00:00:00 | 2010-10-09 07:06:05 |              10 |    3934 |
+---------------------+---------------------+-----------------+---------+
1 row in set (0.00 sec)

mysql> SELECT MIN(d), MAX(d), MAX(d) - MIN(d), SUBTIME(MAX(d), MIN(d)) FROM t1;
+----------+----------+-----------------+-------------------------+
| MIN(d)   | MAX(d)   | MAX(d) - MIN(d) | SUBTIME(MAX(d), MIN(d)) |
+----------+----------+-----------------+-------------------------+
| 10:10:10 | 50:40:30 |              40 | 40:30:20                |
+----------+----------+-----------------+-------------------------+

After this bug is fixed, MAX(x) - MIN(x) will be doing math on a double integer that is representative of the DATE and/or TIME, but it is not a number that aught to have math done on it since the result is most likely meaningless.  

For example, 2010-10-09 07:06:05 would be converted internally to this integer; 20101009070605.  Returning 20101009070605 - 20000101000000 = 100908070605 is useless as a datetime interval because it does not consider leap years or the actual length of a 'month'.  The appropriate DATE/TIME functions do consider leap years and do not attempt to return an interval in years or months, which are inexact periods of time.  (Please note that 3934 is the real number of days between 2000-01-01 and 2010-10-09 including Feb 29 in 2000, 2004 and 2008.)

I am working on a fix that consistently converts string DATE/TIME types to these large integers even when used with MIN/MAX and other aggregate functions, but that is not the recommended way to do math on DATE and TIME types.
[23 Nov 2009 21:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91355

3208 Kevin Lewis	2009-11-23
      Bug#47631 - Math on DATETIME types is now consistently done as a longlong.
     @ mysql-test/r/cast.result
        Expanded and improved this test while checking for similarities to Bug#47631. All of these results were correct before the fix for 47631.
     @ mysql-test/r/func_time.result
        Bug#47631 - Math on DATETIME types is now consistently done as a longlong.  This afected the results of tests that referred to now()-now().  
        Added an array of tests that were used while debugging 47631.
     @ mysql-test/r/sysdate_is_now.result
        Bug#47631 - Math on DATETIME types is now consistently done as a longlong.
     @ mysql-test/t/cast.test
        Expanded and improved this test while checking for similarities to Bug#47631. All of these results were correct before the fix for 47631.
     @ mysql-test/t/func_time.test
        Bug#47631 - Math on DATETIME types is now consistently done as a longlong.  This afected the results of tests that referred to now()-now().  
        Added an array of tests that were used while debugging 47631.
     @ sql/item.cc
        Bug#47631 - Added Item::is_time(), similar to Item::is_datetime()
     @ sql/item.h
        Bug#47631 - Added Item::is_time(), similar to Item::is_datetime()
     @ sql/item_func.cc
        Bug#47631 - Math on date and time types is now consistently done as a longlong.  This is accomplished by setting the Item_func_minus::hybrid_type to INT_RESULT in calls to fix_length_and_desc and by doing the correct conversion to this longlong by calling get_datetime_value() or get_time_value().
     @ sql/item_func.h
        Bug#47631 - Math on DATETIME types is now consistently done as a longlong.
     @ sql/mysql_priv.h
        Bug#47631 - Allow get_time_value() to be called from item_func.cc like get_datetime_value() is.
[26 Nov 2009 16:58] Evgeny Potemkin
Our manual describes the "-" operator as a numeric one, see http://dev.mysql.com/doc/refman/5.1/en/numeric-functions.html. This means that the correct result isn't guaranteed for any argument types but the numeric ones.
In order to subtract DATE/DATETIME values you should use appropriate functions, see http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html.
Because of this it's not a bug.
[11 Jan 2010 16:44] Paul DuBois
Reclassifying as docs bug, assigning to myself.
[11 Jan 2010 16:48] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Added a note to http://dev.mysql.com/doc/refman/5.1/en/numeric-functions.html:

Arithmetic operators apply to numbers. For other types of values,
alternative operations may be available. For example, to add date
values, use DATE_ADD(); see
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html.