Description:
It is impossible to transition from 5.5 to 5.8 using chained replication and rolling upgrades without risking that slaves diverge.
In 5.5 and earlier, TIMESTAMP columns defaulted to NOT NULL.
In 5.6, TIMESTAMP columns were changed so that by default the column becomes nullable.
This changed the semantics, which broke cross-version replication and in turn made rolling upgrades impossible.
A workaround was introduced in 5.6, where the variable explicit_defaults_for_timestamp controls whether the new or old semantics is used.
However, the workaround was broken by design. The value of explicit_defaults_for_timestamp was not replicated. Instead, the slave sets the value based on the server version found in the Format_description_log_event. This works only for simple master-slave replication. In chained replication 5.5 -> 5.6 -> 5.6, the first slave will correctly use the 5.5 behavior, but the second slave will use 5.6 behavior since Format_description_log_event in a relay log only contains the version of the immediate master. The 5.5 -> 5.6 -> 5.6 topology is a necessary step in a rolling upgrade of a chained replication topology.
Therefore, the only way to do a rolling upgrade of chained replication from 5.5 to 5.6 without risking that slaves diverge is to force explicit_defaults_for_timestamp=0 on all 5.6 servers. This is not optimal, but it is acceptable since it does not break upgrades or replication.
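The divergence described above, and the effect of forcing the value to 0, can be illustrated with a small toy model. This is only a sketch, not server code: the names Server, applier_setting and replicate are hypothetical, and the model assumes that a slave reading from a 5.6 master simply uses its own configured value of explicit_defaults_for_timestamp.

```python
from dataclasses import dataclass

OLD, NEW = 0, 1  # possible values of explicit_defaults_for_timestamp


@dataclass
class Server:
    name: str
    version: tuple   # e.g. (5, 5) or (5, 6)
    configured: int  # local explicit_defaults_for_timestamp (unused on 5.5)


def column_definition(explicit_defaults):
    """Nullability of a bare `ts TIMESTAMP` column under each semantics."""
    if explicit_defaults == OLD:
        return "ts TIMESTAMP NOT NULL"
    return "ts TIMESTAMP NULL"


def applier_setting(slave, master_version):
    """Toy inference: the slave only sees its immediate master's version."""
    if master_version < (5, 6):
        return OLD            # master is pre-5.6, so old semantics are assumed
    return slave.configured   # assumption: otherwise the local value is used


def replicate(chain):
    """Apply CREATE TABLE t (ts TIMESTAMP), issued on chain[0], down the chain.

    Returns {server name: resulting column definition}.
    """
    origin = chain[0]
    origin_setting = OLD if origin.version < (5, 6) else origin.configured
    result = {origin.name: column_definition(origin_setting)}
    for master, slave in zip(chain, chain[1:]):
        setting = applier_setting(slave, master.version)
        result[slave.name] = column_definition(setting)
    return result


if __name__ == "__main__":
    # 5.5 -> 5.6 -> 5.6 with the new semantics enabled on both 5.6 servers:
    # the second slave ends up with a different table definition.
    mixed = [Server("A", (5, 5), OLD),
             Server("B", (5, 6), NEW),
             Server("C", (5, 6), NEW)]
    print(replicate(mixed))

    # Same topology with the value forced to 0 on every 5.6 server.
    forced = [Server("A", (5, 5), OLD),
              Server("B", (5, 6), OLD),
              Server("C", (5, 6), OLD)]
    print(replicate(forced))
```

In the first topology, server C sees only B's version (5.6) and therefore applies the statement with the new semantics, while A and B used the old ones. In the second topology all three servers agree.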
But even after all servers are 5.6, there is no way to transition from explicit_defaults_for_timestamp=0 to explicit_defaults_for_timestamp=1 in chained replication without risking that slaves diverge.
In 5.7, explicit_defaults_for_timestamp is deprecated, and there are plans to remove explicit_defaults_for_timestamp in 5.8 and make the server behave as if explicit_defaults_for_timestamp=1.
So altogether this means that there is no way to transition from 5.5 to 5.8 for a chained replication using rolling upgrades, without risking that slaves diverge.
Note: this is really a Runtime bug, but most changes will have to appear in replication code, therefore setting category replication.
How to repeat:
-
Suggested fix:
In 5.7, make explicit_defaults_for_timestamp a dynamic variable. It should be settable only by SUPER, outside transactions, outside stored functions/triggers.
In 5.7, include the value of explicit_defaults_for_timestamp in Gtid_log_event. Make Gtid_log_event::do_apply_event() set the session value according to the value stored in the event, if the value is present in the event (if the value is not present, do not change explicit_defaults_for_timestamp). (A 5.6 server reading such an extended Gtid_log_event will silently skip the extra field; this is ok.)
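The intended apply logic can be sketched as follows. This is a toy model of the proposal, not the server implementation: GtidEvent, ApplierSession and relay are hypothetical names, and an absent field is represented by None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GtidEvent:
    gtid: str
    # None models an event written without the extra field
    # (for example, by a server that predates the proposed change).
    explicit_defaults: Optional[int] = None


class ApplierSession:
    def __init__(self, initial):
        # Initial value, e.g. as inferred from Format_description_log_event.
        self.explicit_defaults = initial

    def apply_gtid_event(self, event):
        """Override the session value only when the event carries the field."""
        if event.explicit_defaults is not None:
            self.explicit_defaults = event.explicit_defaults


def relay(event, sessions):
    """Pass the same event down a chain of appliers; return their settings."""
    for session in sessions:
        session.apply_gtid_event(event)
    return [session.explicit_defaults for session in sessions]


if __name__ == "__main__":
    # Both slaves start with 1; the originating server wrote the event with 0.
    chain = [ApplierSession(1), ApplierSession(1)]
    print(relay(GtidEvent("uuid:1", explicit_defaults=0), chain))

    # An event without the field leaves the current value untouched.
    print(relay(GtidEvent("uuid:2"), chain))
```

Because the value travels inside the event rather than being inferred from the immediate master's version, every server in the chain applies the transaction with the setting used where it was originally executed.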
In 5.7, make mysqlbinlog print SET @@SESSION.EXPLICIT_DEFAULTS_FOR_TIMESTAMP statements for Gtid_log_events that have the extra field. Make mysqlbinlog revert to the default value after each binlog.
Do not change the current behavior where it sets explicit_defaults_for_timestamp based on Format_description_log_event (note that subsequent Gtid_log_events having the extra field will override this). (It would still be good to fix BUG#20529891.)
With these changes, it will be possible to transition from explicit_defaults_for_timestamp=0 to 1 in 5.7, so that a rolling upgrade from 5.7 to 5.8 will be possible.