Bug #75439 MySQL 5.7 sql_mode problems breaks replication
Submitted: 7 Jan 2015 19:46 Modified: 15 Jul 2015 16:09
Reporter: Simon Mudd (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.5,5.7.6,5.7.7-rc OS:Any
Assigned to: CPU Architecture:Any

[7 Jan 2015 19:46] Simon Mudd
Description:
The sql_mode changes behaviour from 5.6 to 5.7.
I notice a create table statement that replicates fine on a 5.6 master breaks on the 5.7 downstream slave.

Last_SQL_Error: Error 'Invalid default value for 'last_change'' on query. Default database: 'db1'. Query: 'CREATE TABLE `mytable` ( 
`id` int(10) unsigned NOT NULL, 
`field2` tinyint(3) unsigned NOT NULL, 
`last_change` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', 
....

sql_mode on the 5.6 master is: sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
sql_mode on the 5.7 slave is: sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
(both the config in /etc/my.cnf)

Given the sql_mode change the value and this is being replicated the 5.7 stricter behaviour is being applied on the 5.7 slave by the SQL thread when in fact the 5.6 behaviour should be applied.

How to repeat:
see above.

Suggested fix:
Make the SQL thread aware of the master version being 5.6 and treat the sql_mode as if it were running on a 5.6 server to maintain the same behaviour.
[7 Jan 2015 22:25] Simon Mudd
To clarify: this a problem only when migrating a replication chain from MySQL 5.6 to MySQL 5.7.
[8 Jan 2015 5:39] Shane Bester
The issue is that 5.6 and 5.7 have different ideas about what strict mode means.  This works on 5.6 but not on 5.7:

---
set sql_mode='NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES';
drop table if exists t1;
create table t1(a timestamp not null default '0000-00-00 00:00:00')engine=innodb;
show warnings;
---

Relevant documentation:
http://dev.mysql.com/doc/refman/5.7/en/timestamp-initialization.html

"The following examples that use DEFAULT 0 do not work if the SQL mode is set to cause “zero” date values (specified, for example, as 0 '0000-00-00 00:00:00') to be rejected. Such dates are rejected in strict SQL mode (as of MySQL 5.7.4)"

http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_strict_all_tables

"As of MySQL 5.7.4, the deprecated ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE SQL modes do nothing. Instead, their previous effects are included in the effects of strict SQL mode (STRICT_ALL_TABLES or STRICT_TRANS_TABLES). In other words, strict mode means the same thing as the pre-5.7.4 meaning of strict mode plus the ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE modes. "
[8 Jan 2015 7:20] Simon Mudd
Hi Shane,

I'm aware of the change in sql_mode behaviour and that the same "mode" behaves differently in 5.7 vs 5.6.

However, for anyone upgrading from a MySQL 5.6 box and using a 5.7 slave, they/we expect that replication will work "normally" even though the major version changes.
Also the sql_mode is provided as part of the binlog stream so setting the global value on the 5.7 server has no affect whatsoever as the SQL thread is using the value provided _inside_ the binlog for that specific event effectively as a session variable.

If (in this case) the CREATE TABLE succeeds on the master it _must_ also succeed on the slave.
In a similar vein, if I were to insert a zero timestamp "0000-00-00 00:00:00" into the 5.6 master for columns defined this way and this succeeds there (because of the 5.6 sql_mode setting) then this MUST also succeed on the 5.7 slave.

Failure to apply this logic means that it's either impossible to put a 5.7 slave under a 5.6 server (as the 5.7 slave's global/local sql_mode setting is irrelevant) should a situation such as this occur.

The relaylog stream has info about the upstream master version. I guess that could be used but that's only available as the first event in the relaylog. It's also possible that we have a replication chain similar to the following:
5.6 --> 5.7 ---> 5.7

where the second intermediate master is also running 5.7, and in this case the issue needs to be resolved too. I'm not 100% sure if the upstream master version is signalled all the way down to the leaf slaves in a situation like this but it's also the recommended upgrade path if daisy chaining servers and this sort of usage is common where perhaps you run with multiple datacentres and have a "spare master" in the secondary datacentre.

I understand what's happening but the replication breakage is I think not acceptable and a work around which doesn't prevent us from inserting data that was valid in 5.6 via replication on a 5.7 slave.

Also if using STRICT_ALL_TRANS in 5.6 the documentation doesn't tell you what equivalent mode is needed in 5.7 once the 5.7 server is the primary master. I'd guess something like adding ALLOW_INVALID_DATES but that allows more than just the zero date.

Also to note here http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html is there's a lot of talk about dates but no clear mention that this also refers to DATETIME and TIMESTAMP columns. That may be inferred but I don't see it explicitly stated.  I also tend to think due to the complexity of the problem and the fact that it only hits you while migrating servers between the 2 versions that this should be discussed in more detail somewhere else: "Migrating between versions" or "Upgrading a replication topology" and this section of the docs referenced where sql_mode mode is discussed.   This would reduce the size of the section on sql_mode (the explanation of changes) and thus probably be easier, and when migration you'd look at the new section (as there may be other things that are relevant such as referencing explicit_defaults_for_timestamp which has similar gotchas and thus be clearer.

For now from what I can see the server concerned doesn't actually insert the zero timestamp, even though it is a default (the application provides a default value normally) so we're not seeing issues so replication is working now.  I've also suggested to the developers that the remove this usage of a zero timestamp to prevent this being an issue.  This works and is possible for this new table but would not be reasonable on existing systems as it would require a lot of code and table definition changes.
[16 Jan 2015 5:35] Shane Bester
copybug script malfunctioned.  the internal bug is:
 Bug 20367829 - MYSQL 5.7 SQL_MODE PROBLEMS BREAKS REPLICATION
[16 Jan 2015 8:56] Shane Bester
I guess changing the meaning of something existing is always going to be painful. IMHO we should never change something once it has been implemented.

Same thing when old/new datatypes had the same name but underlying implementation changes.  Some collations have had the same issue too, years ago.  InnoDB data files had same problem. .frm version was not maintained correctly too. 

Things need to declared "final", as they say in java terms.  A strict versioning of these things (datatype, collations, variables) would be nice.
[11 Feb 2015 18:33] Simon Mudd
Indeed. Bumped into this again, and ended up changing DEFAULT timestamp values to a value other than '0000-00-00 00:00:00'. [1]

Perhaps changing the meaning of sql_mode was not such a good idea. It's great in isolation but not if you want to migration from existing systems, and no doubt overly complicates replication configurations which is where you get the <old_version> --> <new_version> compatibility issues.

So let's see if 5.7.6 comes up with a solution to this.
[1] Handling any timestamp value other than the zero timestamp is a pain because  you have to enter a "local timezone value" which then gets converted to a GMT value, and whenever you pull out the value you have to pull out the time_t value and then it gets converted back to a local time again.

It would be great to have a way to be able to define a DEFAULT which is timezone agnostic (so probably UTC/GMT) so there's never any confusion about the value that's being stored, and to perhaps cast it to a UTC/GMT value without going through a conversion function.

I made some sort of FR for this years ago (I could go and look for it) and really it would make things much easier for us all.

In the meantime waiting for 5.7.6...
[13 Feb 2015 9:39] Ståle Deraas
Posted by developer:
 
Hi Simon,

One workaround is to set ERROR_FOR_DIVISION_BY_ZERO | NO_ZERO_DATE | NO_ZERO_IN_DATE on the 5.6 server, to get the same behaviour as on the 5.7 server. Would that help you?

/Staale
[13 Feb 2015 15:42] Simon Mudd
No, I don't think so.

The sql_mode is inherited from the upstream replication stream by the SQL thread.

So while I can change the sql_mode on the server this is useless on a slave as the problem is with executing the statements on the slave run by the sql_thread.

So I see in the relay logs:

# at 236
#150213 16:25:55 server id 123456047  end_log_pos 419436684 CRC32 0x16af5e64    Query   thread_id=309035134     exec_time=0     error_code=0
use `mydb`/*!*/;
SET TIMESTAMP=1423841155/*!*/;
SET @@session.pseudo_thread_id=309035134/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
-- real SQL as I'm in SBR mode
INSERT  IGNORE INTO  my_table (a, b, c, d)
VALUES ('324949', '32494901', '1128813', '2015-07-23'), 
...

This sql_mode is the problem as I explained before. It's the sql_mode of the upstream
master that's running 5.6.X. The 5.7 slave interprets this as if it were a 5.7 setting and
that is wrong.

What I'm not sure however is if the slave is able to know for sure the upstream master's
version so that it is able to interpret the required 5.6 "meaning" of this sql_mode and 
apply it to the 5.7 server (inside the SQL thread).

I hope that clarifies the problem.
[13 Feb 2015 16:07] Simon Mudd
And to clarify: The 5.6 upstream master has the settings set which work.  Changing those is not really an option.

Setting: ERROR_FOR_DIVISION_BY_ZERO | NO_ZERO_DATE | NO_ZERO_IN_DATE would be great. It would break my database, and the application that talks to it as the current MySQL behaviour would change and the application is not prepared for that.

Adjusting the application to make it stricter might be a good goal and indeed I tried some time ago to make things stricter. However, as I'm sure is the case for other people, the amount of work required to make an existing application "adapt" to a stricter environment, even if we want that, is hard, intrusive and time-consuming, so on many existing replication chains this is not a short term possibility.  New systems are configured better but also developers get used to using '0000-00-00 00:00:00' as a magic constant so they even ask for this now when perhaps it might not be such a good idea.  What other magic constant(s) can I easily use?

I also think it's the wrong "solution" to push the problem back to us. We did not decide to change the _meaning_ of the sql_mode in 5.7. That's perhaps not such a good idea.
Making the default settings stricter to include the values suggested above is much less of a concern. 5.7 would understand the upstream settings (from 5.6) and apply them even if the rest
of the server is configured differently, and those of us who might need a lazier configuration can configure the server appropriately.

So maybe using a new name (as suggested by one of the posters on Morgan's post about the sql_mode chage) for the sql_mode which is a hard-coded combination of existing modes might be better. It would not break replication and would allow us to replicate safely from 5.6.

The messy setup with zero dates and zero timestamps is self-imposed (from MySQL old days). It's hard to work with other values in MySQL (as defaults or constants), especially with timestamps
as your timezone may not be the same as mine and timestamps default values are not GMT/UTC.

I'd suggest you fix that problem first and then give people time to use defaults that are not zero dates or zero timestamps.
In the end 00-00-00 00:00:00 is really 1970-01-01 00:00:00 UTC but I can't make MySQL store that value or show it to me.
I don't think I use dates with zeros in them much. What bites me are the timestamps.

However, in my opinion you probably do not have time to make such a change now when I guess 5.7 is starting to solidify.
[8 Apr 2015 21:50] Simon Mudd
Also repeatable in 5.7.7-rc which ha been released today.

A heavily mutilated error message shows:

2015-04-08T21:30:28.339983Z 2 [ERROR] Slave SQL for channel '': Error 'Incorrect datetime value: '0000-00-00 00:00:00' for column 'col1' at row 1' on query. Default database: 'mydb'. Query: 'INSERT INTO Table1 (col2, col3, col4, col5, col6, col7, col8, col1, col9, col10, col11, col12, col13, col14, col15) VALUES ('2015-04-01 16:22:11', NULL, '1', '1', '46', '64059', '1', '0000-00-00 00:00:00', 'xxxxxxxx', 'xxxxxxxxx', '1', '2', '3', '0', 'xxxxxxx')', Error_code: 1292
2015-04-08T21:30:28.340075Z 2 [Warning] Slave: Incorrect datetime value: '0000-00-00 00:00:00' for column 'col1' at row 1 Error_code: 1292
2015-04-08T21:30:28.340100Z 2 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000492' position 51252835

col1 is defined as:

  `col1` timestamp NOT NULL DEFAULT '1970-01-01 01:00:01',  -- CET timezoner = epoch + 1 second // this is horribly ugly as zero timestamps fail and I can't specify a UTC epoch+1 value any other way.

> select @@sql_mode;  -- on the slave
+--------------------------------------------+
| @@sql_mode                                 |
+--------------------------------------------+
| STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION |
+--------------------------------------------+
1 row in set (0.00 sec)

The uptream 5.6 master runs 5.6.17 and has sql_mode configured the same.

Not fixing this issue means that replicated 5.7 slaves when the upstream 5.6 master uses a sql_mode which the 5.7 treats differently will break replication.
This completely breaks the upgrade path to 5.7 from 5.6.  My 5.6 slaves replicate perfectly happily.

Perhaps I'm missing something obvious but I fail to see what it is. The proposed workarounds to change the behaviour of the application on the current live servers to "enforce" strict mode, is not really a realistic option.

So seeing that 5.7.7 is marked as rc seems to imply the behaviour of the GA version may be the same.
[28 May 2015 0:00] Nakoa Mccullough
This bug is also a problem for us.  Is it possible to add a startup flag '--old-strict-mode'  that will revert to the old behavior and produce a warning in the error log that the option is depricated and will be removed in 5.8?  That will eventually get us to the desired behavior and provide a warning to people that the change is coming rather then just have strict mean one thing in the old version and another in the new one.

--Koa
[1 Jun 2015 7:41] Ståle Deraas
Posted by developer:
 

We are working on a solution that will fix this issue.
[15 Jul 2015 16:09] Paul Dubois
Noted in 5.7.8 changelog.

In MySQL 5.7.4, the ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and
NO_ZERO_IN_DATE SQL modes were changed so that they did nothing when
named explicitly. Instead, their effects were included in the effects
of strict SQL mode. The intent was to reduce the number of SQL modes
with an effect dependent on strict mode and make them part of strict
mode itself.

However, the change to make strict mode more strict by including
ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE caused
some problems. For example, in MySQL 5.6 with strict mode but not
NO_ZERO_DATE enabled, TIMESTAMP columns can be defined with DEFAULT
'0000-00-00 00:00:00'. In MySQL 5.7.4 with the same mode settings,
strict mode includes the effect of NO_ZERO_DATE and TIMESTAMP columns
cannot be defined with DEFAULT '0000-00-00 00:00:00'. This causes
replication of CREATE TABLE statements from 5.6 to 5.7.4 to fail if
they contain such TIMESTAMP columns.

The long term plan is still to have the three affected modes be 
included in strict SQL mode and to remove them as explicit modes in a
future MySQL release. But to restore compatibility in MySQL 5.7 with
5.6 strict mode and to provide additional time for affected
applications to be modified, the following changes have been made:

* ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE are no
  longer part of strict SQL mode. This reverts a change made in MySQL
  5.7.4.

* ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE once
  again have an effect when named explicitly. This reverts a change
  made in MySQL 5.7.4.

* ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE are now
  included in the default sql_mode value, which as a result includes
  these modes: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES,
  NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO,
  NO_AUTO_CREATE_USER, and NO_ENGINE_SUBSTITUTION.
            
With the preceding changes, stricter data checking is still enabled
by default, but the individual modes can be disabled in environments
where it is currently desirable or necessary to do so.
            
Although ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and
NO_ZERO_IN_DATE can be used separately from strict mode, it is
intended that they be used together. As a reminder, a warning now
occurs if they are enabled without also enabling strict mode or vice
versa.
[26 Jan 7:57] Umesh Shastry
Bug #94062 marked as duplicate of this one