Bug #47631 | Inconsistent reporting of date and time when subtracting MIN/MAX | ||
---|---|---|---|
Submitted: | 24 Sep 2009 18:19 | Modified: | 11 Jan 2010 16:48 |
Reporter: | Matt Walton | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S2 (Serious) |
Version: | 4.1, 5.0, 5.1, next | OS: | Any (MS Windows 2003, Linux) |
Assigned to: | Paul DuBois | CPU Architecture: | Any |
Tags: | data mismatch, difference, INDEX, innodb, MAX, min |
[24 Sep 2009 18:19]
Matt Walton
[24 Sep 2009 18:20]
Matt Walton
ddl, dml, and select using index
Attachment: ddl_dml.sql (text/plain), 963 bytes.
[24 Sep 2009 18:20]
Matt Walton
ddl, dml, and select using no index
Attachment: ddl_dml_noindex.sql (text/plain), 916 bytes.
[24 Sep 2009 18:20]
Valeriy Kravchuk
Thank you for the problem report. What exact versions, 5.0.x and 5.1.y, you had used for testing this?
[24 Sep 2009 18:27]
Matt Walton
Updated the OS as I've only tested this with MySQL Server on Windows.
[24 Sep 2009 18:29]
Matt Walton
Hi, I've tested this with community editions 5.0.85 and 5.1.30. thank you, Matt
[24 Sep 2009 22:06]
Sveta Smirnova
Thank you for the report. Verified as described.
[24 Sep 2009 22:09]
Sveta Smirnova
Most likely duplicate of bug #47280
[25 Sep 2009 14:13]
Mikhail Izioumtchenko
Sveta, thank you for looking at it. Could you investigate if it's indeed a duplicate, in particular if the patch for bug#47280 helps here, if it doesn't, post the EXPLAIN output with and without the index.
[25 Sep 2009 17:26]
Matt Walton
I will take a look at this today. Thank you.
[25 Sep 2009 17:30]
Matt Walton
Actually, I see that the patch is still pending. Should I expect the patch to be delivered soon? I will subscribe for updates to that patch status. Thanks, Matt
[26 Sep 2009 19:56]
Sveta Smirnova
Michael, not, this is not duplicate: patch does not fix the problem. Here is EXPLAIN output: without index explain select pa11.COLA COL_ID, max(pa11.WJXBFS1) WJXBFS1, min(pa11.WJXBFS2) WJXBFS2, (max(pa11.WJXBFS1) - min(pa11.WJXBFS2)) WJXBFS3 from TEMP01 pa11 GROUP BY COLA; id 1 select_type SIMPLE table pa11 type ALL possible_keys NULL key NULL key_len NULL ref NULL rows 2 Extra Using temporary; Using filesort with index explain select pa11.COLA COL_ID, max(pa11.WJXBFS1) WJXBFS1, min(pa11.WJXBFS2) WJXBFS2, (max(pa11.WJXBFS1) - min(pa11.WJXBFS2)) WJXBFS3 from TEMP01 pa11 GROUP BY COLA; id 1 select_type SIMPLE table pa11 type index possible_keys NULL key TEMP01_i key_len 5 ref NULL rows 2 Extra
[28 Sep 2009 14:25]
Mikhail Izioumtchenko
Thank you, Svetlana. Jimmy, could you look at it? Apparently without the index MySQL calculates the expression itself, and does a good job. It's not clear what's going on with the index.
[28 Sep 2009 20:41]
Jimmy Yang
Hi, this problem seems can be reproduced with a much simply query without any innodb involvement: 1) Create a table 'test' mysql> CREATE TABLE test( -> COLA INTEGER(4), -> WJXBFS1 DATE, -> WJXBFS2 DATE); Query OK, 0 rows affected (0.02 sec) 2)Insert a value: mysql> insert into test values (0, '1994-12-31', '1994-12-1'); Query OK, 1 row affected (0.01 sec) 3) Select, to substract '1994-12-1' from '1994-12-31', and you got 0 instead of 30 when Max/Min is used: mysql> select (max(pa11.WJXBFS1) - min(pa11.WJXBFS2)) WJXBFS3 -> from test pa11; +---------+ | WJXBFS3 | +---------+ | 0 | <=== Should be 30 +---------+ 1 row in set (0.00 sec) Without the max/min call, we got the correct answer: mysql> select ((pa11.WJXBFS1) - (pa11.WJXBFS2)) WJXBFS3 -> from test pa11; +---------+ | WJXBFS3 | +---------+ | 30 | <== correct result +---------+ 1 row in set (0.00 sec) The different comes from what datatype the values are assigned to before the substraction. Put a break point right at the minus operation happens: (gdb) break Item_func_minus::real_op Step through the functions, for incorrect result case, here is the stack: my_strntod_8bit() Item_sum_hybrid::val_real Item::val_result Item_ref::val_real Item_func_minus::real_op Item_func_numhybrid::val_real ... In this case, the datatime datatype is incorrectly treated as string, and convert to double lost its accuracy. After the my_strntod_8bit() call, value "1994-12-31" becomes 1994: (gdb) print str $2 = 0x3bd81c8 "1994-12-31" ... (gdb) n Item_ref::val_real (this=0x3bf0070) at item.cc:6149 (gdb) print tmp $4 = 1994 So "1994-12-31" - "1994-12-1" becomes 1994 - 1994, and you get 0. In the case for getting correct result, the values are treated as datatime, and here is the stack: Field_newdate::val_int <== datatype is 'Field_newdate' Field_newdate::val_real Item_ref::val_real Item_func_minus::real_op Item_func_numhybrid::val_real ... Breakpoint 2, Field_newdate::val_int (this=0x3be60f0) at field.cc:5861 (gdb) print j $6 = 19941231 The value is correctly set in this case, and the value is 19941231 - 19941201 = 30, which is the correct result. This behavior (due to the different type for the value before minus operation) applies to the case described in original Bug report, however, more likely it is due to the datatype was not correct set in the plan generation time when different plan is chosen for the select operation. It is more attributed to the type of the datetime value assigned to than anything with innodb or index etc. As shown in the script above, it can be reproduced without innodb involvement at all. Investigation should continue on the value type setup for these datatime values. Thanks Jimmy
[28 Sep 2009 21:03]
Jimmy Yang
Add 'show create table' to confirm the table type in previous script: mysql> mysql> CREATE TABLE test( -> COLA INTEGER(4), -> WJXBFS1 DATE, -> WJXBFS2 DATE) engine =MyISAM; Query OK, 0 rows affected (0.01 sec) mysql> insert into test values (0, '1994-12-31', '1994-12-1'); Query OK, 1 row affected (0.00 sec) mysql> select (max(pa11.WJXBFS1) - min(pa11.WJXBFS2)) WJXBFS3 -> from test pa11; +---------+ | WJXBFS3 | +---------+ | 0 | +---------+ 1 row in set (4.56 sec) mysql> show create table test; +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `COLA` int(4) DEFAULT NULL, `WJXBFS1` date DEFAULT NULL, `WJXBFS2` date DEFAULT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1 | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) Thanks Jimmy
[15 Oct 2009 3:54]
Kevin Lewis
I looked into this bug and found it to be just as Jimmy Yang has described. When the max() and min() functions are used in the SQL query, the value for Item_sum_max::hybrid_type is set to STRING_RESULT (see enum Item_result STRING_RESULT=0). I think that this happens during construction of that object with a memset or something. I cannot see the actual constructor ever being called. So Item_sum_max::add() and Item_sum_min::add() convert their results to strings like "1994-12-31" which are not converted properly back to a number in order to subtract later. I found that if I add the following virtual functions to the Item_sum_max and Item_sum_min classes, then the hybrid_type is set to *int*, which is something that can be added or subtracted. bool Item_sum_max::setup(THD *thd) { switch (hybrid_field_type) { case MYSQL_TYPE_TIMESTAMP: case MYSQL_TYPE_DATE: case MYSQL_TYPE_TIME: case MYSQL_TYPE_DATETIME: case MYSQL_TYPE_YEAR: case MYSQL_TYPE_NEWDATE: hybrid_type = INT_RESULT; } return FALSE; } bool Item_sum_min::setup(THD *thd) { switch (hybrid_field_type) { case MYSQL_TYPE_TIMESTAMP: case MYSQL_TYPE_DATE: case MYSQL_TYPE_TIME: case MYSQL_TYPE_DATETIME: case MYSQL_TYPE_YEAR: case MYSQL_TYPE_NEWDATE: hybrid_type = INT_RESULT; } return FALSE; } This solution seems to solve the conversion problems noted in this bug. But I need a second opinion from someone who knows the server code better.
[16 Nov 2009 23:20]
Kevin Lewis
This bug really should not be a P2 since there is a workaround that aught to be used even after the bug is fixed. In MySQL, if you want to subtract dates and times, you need to use the appropriate function. For DATE, DATETIME, and TIMESTAMP, you should use TIMESTAMPDIFF. For TIME, use SUBTIME. The following SQL shows the current bug for all four of these data types when mixed with MIN/MAX and subtraction. It also shows that the correct result can currently be obtained using the appropriate function. mysql> DROP TABLE t1; Query OK, 0 rows affected (0.13 sec) mysql> CREATE TABLE t1(a DATE, b DATETIME, c TIMESTAMP, d TIME); Query OK, 0 rows affected (0.02 sec) mysql> INSERT INTO t1 VALUES ('2000-01-01', '2000-01-01 00:00:00', '2000-01-01 00:00:00', '10:10:10'); Query OK, 1 row affected (0.02 sec) mysql> INSERT INTO t1 VALUES ('2010-10-09', '2010-10-09 07:06:05', '2010-10-09 07:06:05', '50:40:30'); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM t1; +------------+---------------------+---------------------+----------+ | a | b | c | d | +------------+---------------------+---------------------+----------+ | 2000-01-01 | 2000-01-01 00:00:00 | 2000-01-01 00:00:00 | 10:10:10 | | 2010-10-09 | 2010-10-09 07:06:05 | 2010-10-09 07:06:05 | 50:40:30 | +------------+---------------------+---------------------+----------+ 2 rows in set (0.00 sec) mysql> SELECT MIN(a), MAX(a), MAX(a) - MIN(a), -> TIMESTAMPDIFF(day, MIN(a), MAX(a)) DiffDay FROM t1; +------------+------------+-----------------+---------+ | MIN(a) | MAX(a) | MAX(a) - MIN(a) | DiffDay | +------------+------------+-----------------+---------+ | 2000-01-01 | 2010-10-09 | 10 | 3934 | +------------+------------+-----------------+---------+ 1 row in set (0.00 sec) mysql> SELECT MIN(b), MAX(b), MAX(b) - MIN(b), -> TIMESTAMPDIFF(day, MIN(b), MAX(b)) DiffDay FROM t1; +---------------------+---------------------+-----------------+---------+ | MIN(b) | MAX(b) | MAX(b) - MIN(b) | DiffDay | +---------------------+---------------------+-----------------+---------+ | 2000-01-01 00:00:00 | 2010-10-09 07:06:05 | 10.000000 | 3934 | +---------------------+---------------------+-----------------+---------+ 1 row in set (0.00 sec) mysql> SELECT MIN(c), MAX(c), MAX(c) - MIN(c), -> TIMESTAMPDIFF(day, MIN(c), MAX(c)) DiffDay FROM t1; +---------------------+---------------------+-----------------+---------+ | MIN(c) | MAX(c) | MAX(c) - MIN(c) | DiffDay | +---------------------+---------------------+-----------------+---------+ | 2000-01-01 00:00:00 | 2010-10-09 07:06:05 | 10 | 3934 | +---------------------+---------------------+-----------------+---------+ 1 row in set (0.00 sec) mysql> SELECT MIN(d), MAX(d), MAX(d) - MIN(d), SUBTIME(MAX(d), MIN(d)) FROM t1; +----------+----------+-----------------+-------------------------+ | MIN(d) | MAX(d) | MAX(d) - MIN(d) | SUBTIME(MAX(d), MIN(d)) | +----------+----------+-----------------+-------------------------+ | 10:10:10 | 50:40:30 | 40 | 40:30:20 | +----------+----------+-----------------+-------------------------+ After this bug is fixed, MAX(x) - MIN(x) will be doing math on a double integer that is representative of the DATE and/or TIME, but it is not a number that aught to have math done on it since the result is most likely meaningless. For example, 2010-10-09 07:06:05 would be converted internally to this integer; 20101009070605. Returning 20101009070605 - 20000101000000 = 100908070605 is useless as a datetime interval because it does not consider leap years or the actual length of a 'month'. The appropriate DATE/TIME functions do consider leap years and do not attempt to return an interval in years or months, which are inexact periods of time. (Please note that 3934 is the real number of days between 2000-01-01 and 2010-10-09 including Feb 29 in 2000, 2004 and 2008.) I am working on a fix that consistently converts string DATE/TIME types to these large integers even when used with MIN/MAX and other aggregate functions, but that is not the recommended way to do math on DATE and TIME types.
[23 Nov 2009 21:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/91355 3208 Kevin Lewis 2009-11-23 Bug#47631 - Math on DATETIME types is now consistently done as a longlong. @ mysql-test/r/cast.result Expanded and improved this test while checking for similarities to Bug#47631. All of these results were correct before the fix for 47631. @ mysql-test/r/func_time.result Bug#47631 - Math on DATETIME types is now consistently done as a longlong. This afected the results of tests that referred to now()-now(). Added an array of tests that were used while debugging 47631. @ mysql-test/r/sysdate_is_now.result Bug#47631 - Math on DATETIME types is now consistently done as a longlong. @ mysql-test/t/cast.test Expanded and improved this test while checking for similarities to Bug#47631. All of these results were correct before the fix for 47631. @ mysql-test/t/func_time.test Bug#47631 - Math on DATETIME types is now consistently done as a longlong. This afected the results of tests that referred to now()-now(). Added an array of tests that were used while debugging 47631. @ sql/item.cc Bug#47631 - Added Item::is_time(), similar to Item::is_datetime() @ sql/item.h Bug#47631 - Added Item::is_time(), similar to Item::is_datetime() @ sql/item_func.cc Bug#47631 - Math on date and time types is now consistently done as a longlong. This is accomplished by setting the Item_func_minus::hybrid_type to INT_RESULT in calls to fix_length_and_desc and by doing the correct conversion to this longlong by calling get_datetime_value() or get_time_value(). @ sql/item_func.h Bug#47631 - Math on DATETIME types is now consistently done as a longlong. @ sql/mysql_priv.h Bug#47631 - Allow get_time_value() to be called from item_func.cc like get_datetime_value() is.
[26 Nov 2009 16:58]
Evgeny Potemkin
Our manual describes the "-" operator as a numeric one, see http://dev.mysql.com/doc/refman/5.1/en/numeric-functions.html. This means that the correct result isn't guaranteed for any argument types but the numeric ones. In order to subtract DATE/DATETIME values you should use appropriate functions, see http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html. Because of this it's not a bug.
[11 Jan 2010 16:44]
Paul DuBois
Reclassifying as docs bug, assigning to myself.
[11 Jan 2010 16:48]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products. Added a note to http://dev.mysql.com/doc/refman/5.1/en/numeric-functions.html: Arithmetic operators apply to numbers. For other types of values, alternative operations may be available. For example, to add date values, use DATE_ADD(); see http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html.