Bug #118165 Ensure MySQL automatic connection failover handling can handle mixed MySQL "upstream" versions
Submitted: 12 May 8:39 Modified: 17 Jun 16:12
Reporter: Simon Mudd (OCA)
Status: Verified
Category:MySQL Server: Replication Severity:S4 (Feature request)
Version:8.0+ OS:Any
Assigned to: CPU Architecture:Any

[12 May 8:39] Simon Mudd
Description:
This relates to "Automatic connection failover for Async Replication" and also to the similar feature for replicating from an upstream Group Replication (GR) group.

- https://dev.mysql.com/worklog/task/?id=12649
- https://dev.mysql.com/worklog/task/?id=14019

The new facilities provided in MySQL to move away from home-grown technology (e.g. orchestrator) for handling failover scenarios are welcome.  However, they do not seem to contemplate the scenario of upgrading a cluster from one major version to another.  In such a case, potential upstream replacement servers may not be on the same version as the downstream replica, and specifically Oracle does not (officially) support a MySQL instance replicating from an instance with a higher version.

In systems I manage (which are publicly known), the topology is typically one of:

- a 2-tier GR group with async replicas behind the group
- a 3-tier setup with a primary, intermediate servers (per region) and then leaf nodes below that.

Generally, upgrades start at the leaf nodes and work up to the async master or GR primary.

However, this leaves us with situations where we may have:
- mixed versions of GR members
- mixed versions of intermediate masters

Other failure scenarios may complicate this further.

The selection of the upstream node to use is in theory completely under the control of the "user" with intermediate masters, but with a GR cluster all nodes appear to be treated similarly.

Either way, it looks like the current logic in these new features does not provide any way to handle or consider mixed MySQL major versions, e.g. 8.0 / 8.4 or 9.X.

I consider this behaviour buggy as we know that 8.0 is EOL in April next year. So this could affect people using this new functionality when moving from 8.0 to 8.4.

It could also affect people wanting to move to the new 9.X LTS version which will appear around the same date.

Note: it might be argued that we have to upgrade leaf nodes first. However, we also need to test GR members. If we cannot test a higher version of GR without upgrading all leaf nodes first, then if we find an issue/bug there is no way to downgrade again. This increases the risks of an upgrade.  So it makes sense to be able to upgrade a single GR member and have async leaf nodes of the same major version behind it.

How to repeat:
See description above.

Suggested fix:
Suggestions for improvement are:
- add a policy configuration to allow an instance attempting automatic failover to avoid replicating from a higher version of MySQL, choosing an alternative instead [ default policy? ]: never_replicate_from_higher_version
- add a policy configuration to allow connecting to any version: allow_replication_from_higher_version
- add a policy configuration to allow connection to a higher version if no alternative is possible. [ best effort basis ]
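
A minimal sketch of how the three policies above could behave when picking a new source. This is an illustration only, not MySQL's implementation: the `Policy`, `Source`, `parse_series` and `choose_source` names are hypothetical, the third policy name (`prefer_not_higher_version`) is invented here for the "best effort" case, and "higher version" is assumed to mean a higher (major, minor) release series. Candidates are assumed to be ranked by a weight where a higher value is preferred.

```python
from enum import Enum
from typing import NamedTuple, Optional, Sequence, Tuple

# (major, minor) release series, e.g. (8, 4). Assumption: patch level is ignored.
Version = Tuple[int, int]


class Policy(Enum):
    NEVER_REPLICATE_FROM_HIGHER_VERSION = "never_replicate_from_higher_version"
    ALLOW_REPLICATION_FROM_HIGHER_VERSION = "allow_replication_from_higher_version"
    # Hypothetical name for the "best effort" policy; not named in the report.
    PREFER_NOT_HIGHER_VERSION = "prefer_not_higher_version"


class Source(NamedTuple):
    host: str
    version: Version
    weight: int  # assumption: higher weight means more preferred


def parse_series(version_string: str) -> Version:
    """Turn a version string such as '8.0.36' into its (major, minor) series."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def choose_source(
    replica_version: Version,
    candidates: Sequence[Source],
    policy: Policy,
) -> Optional[Source]:
    """Pick the failover source for a replica, or None if the policy forbids all."""
    ranked = sorted(candidates, key=lambda s: s.weight, reverse=True)
    if policy is Policy.ALLOW_REPLICATION_FROM_HIGHER_VERSION:
        return ranked[0] if ranked else None

    # Sources on the same or a lower release series than the replica.
    compatible = [s for s in ranked if s.version <= replica_version]
    if compatible:
        return compatible[0]

    # No compatible source: best effort falls back to any source, strict does not.
    if policy is Policy.PREFER_NOT_HIGHER_VERSION:
        return ranked[0] if ranked else None
    return None
```

For example, an 8.0 replica offered an 8.4 source with weight 80 and an 8.0 source with weight 60 would pick the 8.0 source under the strict and best-effort policies, and the 8.4 source under allow_replication_from_higher_version.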

Adapt documentation to describe cluster upgrade scenarios and the potential risks with the current setup.
[12 May 11:10] MySQL Verification Team
Hi Simon,

I've updated the severity to S4 - Feature Request, as this behavior aligns with the system's current design and documentation, and does not appear to be a bug.

That said, I reviewed your report and agree with your assessment. Your suggestion highlights a valid area for improvement in how the system handles this scenario.

Thank you for your continued contributions and valuable insights.
[22 May 10:18] Pedro Pinheiro
Posted by developer:
 
Hello,
The discussion does translate to an FR:
Add to async automatic connection failover the option to take the MySQL version into consideration when selecting a new source in case of connection failover, to prevent incompatible replication chains.

https://dev.mysql.com/doc/refman/8.4/en/replication-asynchronous-connection-failover.html
[26 May 6:19] Simon Mudd
ok, I'll accept that this is a feature request.

However, can this potential problem be mentioned in documentation so people are aware that there's no explicit handling of this situation in the current versions of MySQL?

This is important as people will be moving from 8.0 to 8.4 (as 8.0 goes EOL in less than 1 year) and also when moving from 8.4 to 9.X (as 9.X goes LTS).