Bug #93199 | Timezone/DST and multi-column PRI w/ Timestamp have incorrect handling | ||
---|---|---|---|
Submitted: | 14 Nov 2018 23:29 | Modified: | 23 Aug 2019 12:22 |
Reporter: | Christopher Shumake | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Data Types | Severity: | S3 (Non-critical) |
Version: | 5.7.23 | OS: | Debian |
Assigned to: | CPU Architecture: | Any | |
Tags: | DST, timestamp, timezone |
[14 Nov 2018 23:29]
Christopher Shumake
[15 Nov 2018 13:28]
MySQL Verification Team
Hi, Thank you for your bug report. You write the following: " Setting the session's time zone to a location that follows DST seems to cause the comparison to be unable to compare timestamps within a DST transition hour. " The above is expected behaviour. That is a standard behaviour, because there is a non-existent time within that transition hour. Regardless if an hour is taken back or forth. The same happened when one second was added to some time zone one decade ago. You also write: " Further, joins appear to experience TZ conversions before matching. This fails least-surprise because joining table a against table a by PRI should just work. " The above is exactly the expected behaviour and is documented in our Reference Manual. Simply "12:00:00" in one timezone is not equal to "12:00:00" in another timezone. Now, what I fail to understand is when you write: "TZ conversions appear to happen asymmetrically when functions are applied" What are you trying to ask ??? Can you provide an example ??? If it is about functions dealing with UNIX time, then this is also expected behaviour.
[15 Nov 2018 20:15]
Christopher Shumake
While adding this comment, I found potential to cause data corruption. I'm going to increase severity to S2. Let's go observation by observation because I had too much content and I know I don't understand this perfectly yet. Looking forward to figuring it out with you. Please remember the setup is declared in the start of the original comment (up until Section 5) Section 1: assertion: system-based-PST vs location-based-PST didn't see the same result, so I suspect a DST transition isn't applied appropriately somewhere evidence: username@hostname [dbname]>set session time_zone = 'SYSTEM'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select ts, unix_timestamp(ts), data from foo where ts = '2016-11-06 01:30:00'; +---------------------+--------------------+------+ | ts | unix_timestamp(ts) | data | +---------------------+--------------------+------+ | 2016-11-06 01:30:00 | 1478424600 | b | +---------------------+--------------------+------+ 1 row in set (0.00 sec) username@hostname [dbname]>set session time_zone = 'America/Los_Angeles'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select ts, unix_timestamp(ts), data from foo where ts = '2016-11-06 01:30:00'; +---------------------+--------------------+------+ | ts | unix_timestamp(ts) | data | +---------------------+--------------------+------+ | 2016-11-06 01:30:00 | 1478421000 | a | +---------------------+--------------------+------+ 1 row in set (0.00 sec) Section 2 assertion: PRI timestamp fields are cartesianing themselves on a PRI based join due to TZ conversions into an ambiguous datetime format, so I think we shouldn't be doing a TZ conversion before joining evidence: username@hostname [dbname]>set session time_zone = 'America/Los_Angeles'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select * from foo f1 join foo f2 on f1.ts = f2.ts; +---------------------+------+---------------------+------+ | ts | data | ts | data | +---------------------+------+---------------------+------+ | 2016-11-06 01:30:00 | a | 2016-11-06 01:30:00 | a | | 2016-11-06 01:30:00 | b | 2016-11-06 01:30:00 | a | | 2016-11-06 01:30:00 | a | 2016-11-06 01:30:00 | b | | 2016-11-06 01:30:00 | b | 2016-11-06 01:30:00 | b | +---------------------+------+---------------------+------+ 4 rows in set (0.00 sec) Section 3: assertion: an asymmetric tz conversion is happening during joins: evidence: (this is after the reduction to a single row) username@hostname [dbname]>set session time_zone = 'America/Los_Angeles'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select * from foo f1 join foo f2 on f1.ts = f2.ts; Empty set (0.00 sec) Section 4: (this is where we get to potential data corruption) core assertion: multi-column primary keys with timestamps are mishandled Given: the PRI is multi-column and the timestamp is the second column assertion 1: internal conversions of data are passing through the session's timezone assertion 2: internal conversions make it to disk even on a failed alter evidence: username@hostname [dbname]>set session time_zone = '+0:00'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>create table bar (user_id int unsigned not null, ts timestamp not null, data char(1) not null); Query OK, 0 rows affected (0.03 sec) username@hostname [dbname]>insert into bar values (1, from_unixtime(1478421000), 'a'), (1, from_unixtime(1478421000 + 3600), 'b'); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 username@hostname [dbname]>alter table bar add primary key(user_id, ts); Query OK, 0 rows affected (0.07 sec) Records: 0 Duplicates: 0 Warnings: 0 username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar; +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 1 | 2016-11-06 08:30:00 | 1478421000 | a | | 1 | 2016-11-06 09:30:00 | 1478424600 | b | +---------+---------------------+--------------------+------+ 2 rows in set (0.00 sec) username@hostname [dbname]>set session time_zone = 'America/Los_Angeles'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>alter table bar drop primary key; Query OK, 2 rows affected (0.08 sec) Records: 2 Duplicates: 0 Warnings: 0 username@hostname [dbname]>alter table bar add primary key(user_id, ts); ERROR 1062 (23000): Duplicate entry '1-2016-11-06 01:30:00' for key 'PRIMARY' — assertion 1 username@hostname [dbname]>set session time_zone = '+0:00'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar; — assertion 2 +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 1 | 2016-11-06 08:30:00 | 1478421000 | a | | 1 | 2016-11-06 08:30:00 | 1478421000 | b | +---------+---------------------+--------------------+------+ 2 rows in set (0.00 sec) (section 5 and 6 in next comment)
[15 Nov 2018 20:15]
Christopher Shumake
Section 5: (partly a side effect from the bug in Section 4) assertion: principle of least surprise failed by using session timezone to determine what row you meant to request scenario: imagine you’re iterating over rows you just pulled. step 1: get row primary keys by user_id step 2: for each row, re-fetch based on user_id and timestamp primary key values surprising outcome: you can’t use the timestamp you just received. instead, you have to shift the session time zone in the middle of the for-loop in step 2 if you want to fetch the rows evidence: (these are new examples) username@hostname [dbname]>set session time_zone = '+0:00'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>create table bar (user_id int unsigned not null, ts timestamp not null default current_timestamp, data char(1) not null); Query OK, 0 rows affected (0.03 sec) username@hostname [dbname]>alter table bar add primary key (user_id, ts); Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 username@hostname [dbname]>insert into bar values (1, now(), 'a'), (1, now() - interval 17 minute, 'b'), (3, from_unixtime(1478421000 - 3600), 'c'), (3, from_unixtime(1478421000 + 3600), 'd'), (7, now(), 'e'), (8, now() + interval 9 minute, 'f'); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar; +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 1 | 2018-11-15 19:27:37 | 1542310057 | b | | 1 | 2018-11-15 19:44:37 | 1542311077 | a | | 3 | 2016-11-06 07:30:00 | 1478417400 | c | | 3 | 2016-11-06 09:30:00 | 1478424600 | d | | 7 | 2018-11-15 19:44:37 | 1542311077 | e | | 8 | 2018-11-15 19:53:37 | 1542311617 | f | +---------+---------------------+--------------------+------+ 6 rows in set (0.00 sec) username@hostname [dbname]>set session time_zone = 'America/Los_Angeles'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3; +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 3 | 2016-11-06 00:30:00 | 1478417400 | c | | 3 | 2016-11-06 01:30:00 | 1478424600 | d | +---------+---------------------+--------------------+------+ 2 rows in set (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3 and ts = '2016-11-06 00:30:00'; +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 3 | 2016-11-06 00:30:00 | 1478417400 | c | +---------+---------------------+--------------------+------+ 1 row in set (0.01 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3 and ts = '2016-11-06 01:30:00'; Empty set (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3 and ts + interval 0 hour = '2016-11-06 01:30:00'; — not using the index prevents TZ conversions? +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 3 | 2016-11-06 01:30:00 | 1478424600 | d | +---------+---------------------+--------------------+------+ 1 row in set (0.00 sec) — Next, trying to do a workaround so the user isn’t managing time zone in the middle of a for loop by switching to a not-following-dst tz doesn’t provide a workaround: username@hostname [dbname]>set session time_zone = '-7:00'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3 and ts = '2016-11-06 01:30:00'; Empty set (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3 and ts = '2016-11-06 00:30:00'; +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 3 | 2016-11-06 00:30:00 | 1478417400 | c | +---------+---------------------+--------------------+------+ 1 row in set (0.00 sec) username@hostname [dbname]>set session time_zone = '-8:00'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3 and ts = '2016-11-06 01:30:00'; +---------+---------------------+--------------------+------+ | user_id | ts | unix_timestamp(ts) | data | +---------+---------------------+--------------------+------+ | 3 | 2016-11-06 01:30:00 | 1478424600 | d | +---------+---------------------+--------------------+------+ 1 row in set (0.00 sec) username@hostname [dbname]>select user_id, ts, unix_timestamp(ts), data from bar where user_id = 3 and ts = '2016-11-06 00:30:00'; Empty set (0.00 sec) (section 6 in next comment)
[15 Nov 2018 20:15]
Christopher Shumake
Section 6: assertion: time zones that don’t follow DST do not suffer from the issue evidence: (continued from Section 5’s state) username@hostname [dbname]>set session time_zone = '-7:00'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select b1.user_id, b1.ts, unix_timestamp(b1.ts), b1.data, b2.data from bar b1 left join bar b2 on b1.user_id = b2.user_id and b1.ts = b2.ts where b1.user_id = 3; +---------+---------------------+-----------------------+------+------+ | user_id | ts | unix_timestamp(b1.ts) | data | data | +---------+---------------------+-----------------------+------+------+ | 3 | 2016-11-06 00:30:00 | 1478417400 | c | c | | 3 | 2016-11-06 02:30:00 | 1478424600 | d | d | +---------+---------------------+-----------------------+------+------+ 2 rows in set (0.00 sec) username@hostname [dbname]>set session time_zone = '-8:00'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select b1.user_id, b1.ts, unix_timestamp(b1.ts), b1.data, b2.data from bar b1 left join bar b2 on b1.user_id = b2.user_id and b1.ts = b2.ts where b1.user_id = 3; +---------+---------------------+-----------------------+------+------+ | user_id | ts | unix_timestamp(b1.ts) | data | data | +---------+---------------------+-----------------------+------+------+ | 3 | 2016-11-05 23:30:00 | 1478417400 | c | c | | 3 | 2016-11-06 01:30:00 | 1478424600 | d | d | +---------+---------------------+-----------------------+------+------+ 2 rows in set (0.00 sec) username@hostname [dbname]>set session time_zone = 'America/Los_Angeles'; Query OK, 0 rows affected (0.00 sec) username@hostname [dbname]>select b1.user_id, b1.ts, unix_timestamp(b1.ts), b1.data, b2.data from bar b1 left join bar b2 on b1.user_id = b2.user_id and b1.ts = b2.ts where b1.user_id = 3; +---------+---------------------+-----------------------+------+------+ | user_id | ts | unix_timestamp(b1.ts) | data | data | +---------+---------------------+-----------------------+------+------+ | 3 | 2016-11-06 00:30:00 | 1478417400 | c | c | | 3 | 2016-11-06 01:30:00 | 1478424600 | d | NULL | +---------+---------------------+-----------------------+------+------+ 2 rows in set (0.00 sec) In conclusion: * an alter on a timestamp-containing table can fail and silently corrupt the data on disk, given certain conditions * there's definitely a bug in the handling of timestamp fields when they're in a multicolumn primary key. I only tested a two-column PRI with the TS in the second column, but the scope may be wider. * there's probably a bug in the reading of timestamp content when the session is using a DST-following timezone declaration * the transparent TZ conversions fail least-surprise
[16 Nov 2018 13:54]
MySQL Verification Team
Hi, I do fail to understand much of the huge text that you wrote ...... 1. I suppose that by PST you mean "Pacific Standard Time". Hence, the sentence "system-based-PST vs location-based-PST didn't see the same result", makes no sense. Our server will compare the datetime types based on their current denominator in DST. Hence, I do not see what is local versus system PST ..... 2. There is not DST transition, just conversion ..... 3. How do you know how is system timezone set on the computer ?? 4. I guess that by PRI, you mean PK, which is primary key. Hence, what you write " PRI timestamp fields are cartesianing themselves on a PRI based join", does not make sense, since you have shown clearly that PK is fully matched. You get so many columns, because you used '*. 5. I do nto understand what you mean by "an asymmetric tz conversion is happening during joins". What is asymetric ???? 6. Section 4 is expected behaviour since a difference between two times is exactly 3600 seconds, which is one hour ..... However, section 4 proves your point, so I am verifying this bug .....
[1 Mar 2019 18:23]
Christopher Shumake
Would it be useful for me to restate this problem, or break it into component sub-bugs?
[4 Mar 2019 13:22]
MySQL Verification Team
HI, This bug is already verified, so it would be helpful if you would do both.
[21 Aug 2019 19:24]
Christopher Shumake
There's too much content here. I've split this into: https://bugs.mysql.com/bug.php?id=96608 https://bugs.mysql.com/bug.php?id=96609 I've also eliminated most of the other observations. Please close as duplicate.
[23 Aug 2019 12:22]
MySQL Verification Team
This bug is now duplicate of the bugs #96608 and #96609.