Bug #38455 Timezone shifts cause bad query results for TIMESTAMP fields
Submitted: 30 Jul 2008 12:06 Modified: 23 Mar 2009 6:27
Reporter: Domas Mituzas Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Data Types Severity:S3 (Non-critical)
Version:4.1, 5.0, 5.1, etc OS:Any
Assigned to: Tatiana Azundris Nuernberg CPU Architecture:Any
Tags: timestamp, timezones
Triage: Triaged: D2 (Serious)

[30 Jul 2008 12:06] Domas Mituzas
Description:
When selecting TIMESTAMP data, WHERE conditions don't care about timezone information in table, thus leading to bad results.

If time is shifted one hour backwards, timestamps in WHERE conditions may mean two absolute time references, but MySQL doesn't care about either, and applies it just on displayed text, not actual time value.

How to repeat:
CREATE TABLE `timetable2` (
  `a` timestamp ,
  `b` int,
  PRIMARY KEY (`a`)
)

mysql> select * from timetable2;
+---------------------+------+
| a                   | b    |
+---------------------+------+
| 2008-10-26 00:00:00 |    1 | 
| 2008-10-26 00:30:00 |    2 | 
| 2008-10-26 01:00:00 |    3 | 
| 2008-10-26 01:30:00 |    4 | 
| 2008-10-26 02:00:00 |    5 | 
| 2008-10-26 02:30:00 |    6 | 
+---------------------+------+
6 rows in set (0.00 sec)

mysql> set time_zone='Europe/Vilnius';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from timetable2;
+---------------------+------+
| a                   | b    |
+---------------------+------+
| 2008-10-26 03:00:00 |    1 | 
| 2008-10-26 03:30:00 |    2 | 
| 2008-10-26 03:00:00 |    3 | 
| 2008-10-26 03:30:00 |    4 | 
| 2008-10-26 04:00:00 |    5 | 
| 2008-10-26 04:30:00 |    6 | 
+---------------------+------+
6 rows in set (0.00 sec)

BAD BAD BAD:
mysql>  select * from timetable2 where a>'2008-10-26 03:00:00';
+---------------------+------+
| a                   | b    |
+---------------------+------+
| 2008-10-26 03:30:00 |    2 | <-- this entry shouldn't show up in case of UT C01:00 offset
| 2008-10-26 03:30:00 |    4 | <-- UTC 01:00 entry missing, if UTC 00:00 offset chosen
| 2008-10-26 04:00:00 |    5 | 
| 2008-10-26 04:30:00 |    6 | 
+---------------------+------+
4 rows in set (0.00 sec)

Suggested fix:
First of all, offset specification in datetime literals is needed, to specify exact reference time in local timezones. 

Then, comparison should be done against underlying type, not display value

And then, 'truncated' warning has to be thrown if time is ambiguous.
[30 Jul 2008 12:09] Domas Mituzas
Other related problem: 

mysql> select * from timetable2 where a='2008-10-26 03:30:00';
+---------------------+------+
| a                   | b    |
+---------------------+------+
| 2008-10-26 03:30:00 |    2 | 
+---------------------+------+
1 row in set (0.00 sec)

Only one value is returned.
[31 Jul 2008 9:50] Sergei Golubchik
I'd say that it is, at best, questionable.

You have two rows with 2008-10-26 03:30:00 - both are shown. I'm not sure it's a bug. Of course, according to this logic the select in your second comment ("other related problem") should have returned two rows.

So, there's an obvious inconsistency.
[17 Feb 2009 11:30] Tatiana Azundris Nuernberg
NO INDEX

As we know, the on-disk format is GMT, so when we val_int() a
Field_timestamp, it factors in TZ:

#0  find_transition_type (t=1224982800, sp=0x7f1fc001ccf8) at tztime.cc:669
#1  0x00000000007f733c in gmt_sec_to_TIME (tmp=0x413e51c0, sec_in_utc=1224982800, sp=0x7f1fc001ccf8) at tztime.cc:723
#2  0x00000000007f74ef in Time_zone_db::gmt_sec_to_TIME (this=0x7f1fc001e588, tmp=0x413e51c0, t=1224982800) at tztime.cc:1262
#3  0x00000000006142da in Field_timestamp::val_int (this=0x16e69c8) at field.cc:4487
#4  0x000000000058935c in Item_field::val_int (this=0x16fea90) at item.cc:1944
#5  0x00000000005d6662 in get_datetime_value (thd=0x7f1fc0008a98, item_arg=0x170e678, cache_arg=0x170e6c0, warn_item=0x170e738, is_null=0x413e531f) at item_cmpfunc.cc:899
#6  0x00000000005c87c3 in Arg_comparator::compare_datetime (this=0x170e678) at item_cmpfunc.cc:985
#7  0x00000000005d9119 in Arg_comparator::compare (this=0x170e678) at item_cmpfunc.h:71
#8  0x00000000005cd70b in Item_func_eq::val_int (this=0x170e5c8) at item_cmpfunc.cc:1568
#9  0x00000000006b4ef8 in evaluate_join_record (join=0x16fee50, join_tab=0x170e1f0, error=0, report_error=0x7f1fc0008e00 "") at sql_select.cc:10707
#10 0x00000000006b5324 in sub_select (join=0x16fee50, join_tab=0x170e1f0, end_of_records=false) at sql_select.cc:10667
#11 0x00000000006bdef7 in do_select (join=0x16fee50, fields=0x7f1fc000a3e0, table=0x0, procedure=0x0) at sql_select.cc:10424
#12 0x00000000006d74a5 in JOIN::exec (this=0x16fee50) at sql_select.cc:2096
#13 0x00000000006d2b16 in mysql_select (thd=0x7f1fc0008a98, rref_pointer_array=0x7f1fc000a500, tables=0x16fe7b8, wild_num=1, fields=@0x7f1fc000a3e0, conds=0x16fec40, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2156153344, result=0x16fee30, unit=0x7f1fc0009f28, select_lex=0x7f1fc000a2d8) at sql_select.cc:2275
#14 0x00000000006d77d3 in handle_select (thd=0x7f1fc0008a98, lex=0x7f1fc0009e98, result=0x16fee30, setup_tables_done_option=0) at sql_select.cc:257
#15 0x000000000065c457 in mysql_execute_command (thd=0x7f1fc0008a98) at sql_parse.cc:2741
#16 0x00000000006645a5 in mysql_parse (thd=0x7f1fc0008a98, inBuf=0x16fe5e8 "SELECT * FROM t1 WHERE f2 = '2008-10-26 02:30:00'", length=49, found_semicolon=0x413e6e60) at sql_parse.cc:6206
#17 0x0000000000667119 in dispatch_command (command=COM_QUERY, thd=0x7f1fc0008a98, packet=0x7f1fc003ff99 "", packet_length=50) at sql_parse.cc:1890
#18 0x000000000066888b in do_command (thd=0x7f1fc0008a98) at sql_parse.cc:1580
#19 0x0000000000669bdd in handle_one_connection (arg=0x7f1fc0008a98) at sql_parse.cc:1186
#20 0x00007f1fc6b6c040 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f1fc62340cd in clone () from /lib64/libc.so.6

That's OK, because this way around, both GMT-values will get
folded into the same local-time value, which is the same as
our string-literal.  Matching succeeds for both lines.
[17 Feb 2009 11:31] Tatiana Azundris Nuernberg
INDEX

#0  TIME_to_gmt_sec (t=0x41bedd60, sp=0x171b128, in_dst_time_gap=0x41bedd93 "") at tztime.cc:894
#1  0x00000000007f4b51 in Time_zone_db::TIME_to_gmt_sec (this=0x171c9b8, t=0x41bedd60, in_dst_time_gap=0x41bedd93 "") at tztime.cc:1246
#2  0x000000000072ccb2 in TIME_to_timestamp (thd=0x1706f78, t=0x41bedd60, in_dst_time_gap=0x41bedd93 "") at time.cc:233
#3  0x0000000000621389 in Field_timestamp::store (this=0x1752750, from=0x1742a38 "2008-10-26 02:30:00", len=19, cs=0xf8eb00) at field.cc:4370
#4  0x0000000000592342 in Item::save_str_value_in_field (this=0x1742a50, field=0x1752750, result=0x1742a60) at item.cc:367
#5  0x0000000000592398 in Item_string::save_in_field (this=0x1742a50, field=0x1752750, no_conversions=true) at item.cc:4732
#6  0x00000000006d8085 in store_key_item::copy_inner (this=0x41bedf20) at sql_select.h:619
#7  0x0000000000606056 in store_key::copy (this=0x41bedf20) at sql_select.h:558
#8  0x00000000006b919b in create_ref_for_key (join=0x1742d10, j=0x17522a0, org_keyuse=0x1741af8, used_tables=0) at sql_select.cc:5298
#9  0x00000000006cb52f in make_join_statistics (join=0x1742d10, tables=0x0, conds=0x1752138, keyuse_array=0x1743ef0) at sql_select.cc:2634
#10 0x00000000006ceab6 in JOIN::optimize (this=0x1742d10) at sql_select.cc:902
#11 0x00000000006d2a78 in mysql_select (thd=0x1706f78, rref_pointer_array=0x17089e0, tables=0x1742678, wild_num=1, fields=@0x17088c0, conds=0x1742b00, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2156153344, result=0x1742cf0, unit=0x1708408, select_lex=0x17087b8) at sql_select.cc:2261
#12 0x00000000006d77d3 in handle_select (thd=0x1706f78, lex=0x1708378, result=0x1742cf0, setup_tables_done_option=0) at sql_select.cc:257
#13 0x000000000065c457 in mysql_execute_command (thd=0x1706f78) at sql_parse.cc:2741
#14 0x00000000006645a5 in mysql_parse (thd=0x1706f78, inBuf=0x17424a8 "SELECT * FROM t1 WHERE f2 = '2008-10-26 02:30:00'", length=49, found_semicolon=0x41befe60) at sql_parse.cc:6206
#15 0x0000000000667119 in dispatch_command (command=COM_QUERY, thd=0x1706f78, packet=0x173e419 "SELECT * FROM t1 WHERE f2 = '2008-10-26 02:30:00'", packet_length=50) at sql_parse.cc:1890
#16 0x000000000066888b in do_command (thd=0x1706f78) at sql_parse.cc:1580
#17 0x0000000000669bdd in handle_one_connection (arg=0x1706f78) at sql_parse.cc:1186

Obviously, the other way around (transforming the localtime string
literal into a GMT value) breaks badly when the localtime is in "the
hour that happens twice" in Autumn (when the clocks are turned back),
because then technically, we'd need *two* GMT results.

(This makes it tempting to do an internal rewrite from FIELD=GMT to
FIELD=GMT_low OR FIELD=GMT_high, but it's not only that. Since we only
ever return GMT_low from the TZ-functions, WHERE FIELD<GMT also fails
because we'd need to scan to GMT_high, not just GMT_low. Of course to
add insult to injury, what looks like a perfectly fine ascending index
in GMT looks like 2:00, 2:30, 2:00, 2:30 in local. Either way, we'd need
to fix up WHERE ... <. and BETWEEN; WHERE ... > is safe.)
[17 Feb 2009 11:33] Tatiana Azundris Nuernberg
tztime.cc notes and formats

(gdb) print sp->revts[i-1]
$8 = 1206842400 <- start  2008-03-30 03:00:00 UTC
(gdb) print sp->revtis[i-1].rt_offset
$13 = 3600      <- UTC+1 => CET
(gdb) print sp->revtis[i-1].rt_type
$14 = 1         <- "hour that never happens"

(gdb) print sp->revts[i]
$7 = 1206846000 <- start  2008-03-30 04:00:00 UTC  (+ 3600 seconds)
(gdb) print sp->revtis[i].rt_offset
$12 = 7200      <- UTC+2 => CEST
(gdb) print sp->revtis[i].rt_type
$15 = 0         <- normal type

(gdb) print sp->revts[i+1]
$6 = 1224990000 <- start  2008-10-26 03:00:00 UTC
(gdb) print sp->revtis[i+1].rt_offset
$11 = 3600      <- UTC+1 => CET
(gdb) print sp->revtis[i+1].rt_type
$16 = 0         <- normal type

# that should make "are we in the evil repeating hour?" something like

if ((i < sp->revcnt - 1) && !sp->revtis[i].rt_type &&
    (sp->revtis[i+1].rt_offset < sp->revtis[i].rt_offset) && /* turning back */
    (sp->revts[i+1] + sp->revtis[i+1].rt_offset - sp->revtis[i].rt_offset))

# relative to TIME_to_gmt_sec() at tztime.cc:951
# half of that is probably implied, anyway (presumably if type
# is 0 for i & i+1, offset(i+1)-offset(i)<0)
[23 Mar 2009 6:31] Tatiana Azundris Nuernberg
THE PROBLEM

Imagine it's autumn, daylight saving time ends, the clocks are turned back, by one hour, say. The same hour happens twice in local time, while UTC remains unfazed. Let's also say localtime is UTC+2.

Are you imagining?

Good. Now consider a TIMESTAMP, which is local time as far as the user is concerned, but UTC on disk. Suppose 0230 localtime is in that magic hour, and we have two rows matching that, one really being 0030 UTC and the other 0130 (both folded into 0230 localtime thanks to the turning back of the clock).

If we search for those rows by local time, two things can happen.

a) We don't have an index.

We do a full table scan, for each row, the UTC value stored for the TIMESTAMP will be converted UTC->local, and the result compared to the literal given by the user.

Here, both UTC values are folded into the same local time value, and we get two resulting rows (which I would consider "least surprising behaviour", for what it's worth).

b) We do have an index.

Now things get sticky. We take the user supplied local-time literal, convert it to UTC, and of course get ONE time-value as result, not two. We then look up the corresponding row(s) using that value, and get ONE resulting wrong. This strikes me as a bug (also opened by Domas as Bug#38455, "affects customer").

POSSIBLE SOLUTIONS

1) Document the limitation (as with TIMEDIFF("2008-12-31 23:59:59", "2009-01-01 00:00:01") => -2s rather than -3 to account for leap second).

2) Detect the extremely rare case of INDEX ^ TIMESTAMP ^ MAGIC_HOUR, that is, a condition specifically referring to that one hour out of 8760 each year, and in that case IGNORE INDEX. This will of course ruin performance for that corner case.

2.r) Requirement: We need to be able to tell whether a date is in The Hour which happens twice.

3) Detect said rare case, work out what the OTHER result value should be (02:30 local time => { 00:30, 01:30 }), and rewrite the conditions accordingly ("key = local" becomes "(key = UTC_low OR key = UTC_high)", "key < local" becomes "key < UTC_high"; similar for <= > => BETWEEN) in the optimizer. This of course implies a lot of effort for what is probably an extremely rare case.

3.r) Requirement: We need to be able to tell whether a date is in The Hour which happens twice. Also, we need to be able to tell whether it actually is an hour, or what.

IMPLEMENTATION

Some of what we do is already done in sql/tztime.cc, but not exposed. The rest could be retooled from existing code. This would extend the interface by adding a few methods.

This of course only works when we have a db of TZ info since AFAIK the POSIX interface does not readily volunteer this information either.
[2 Mar 2010 22:46] Domas Mituzas
Hi! 

This seems to be "Won't fix", though it is epic bug that corrupts data every year for whole hour. 

Are you people in your right mind? :)