Bug #110785 | Wrong docs on implementing replication failover | ||
---|---|---|---|
Submitted: | 24 Apr 2023 16:43 | Modified: | 26 Apr 2023 15:55 |
Reporter: | Mikael Ronström | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | OS: | Any | |
Assigned to: | CPU Architecture: | Any |
[24 Apr 2023 16:43]
Mikael Ronström
[25 Apr 2023 0:17]
MySQL Verification Team
Thanks for the report Mikael, all best
[26 Apr 2023 15:32]
Frazer Clement
Posted by developer: Docs currently describe two use cases related to replication positioning : 1) Failover https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-replication-failover.html 2) Deployment from backup : https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-replication-backups.html In the failover case it is expected that the mysql.ndb_apply_status table contains a valid epoch number, which can be looked up exactly to find the correct next binlog file + position. Looking it up exactly (using = epoch) avoids ambiguity introduced by using inequality operators which can silently skip over epoch gaps. This bug report is concerned with the second Deployment from backup case. --- When restoring a backup using ndb_restore and the --restore-epoch option, a pseudo epoch is added to the mysql.ndb_apply_status of the form <gci>/<0xffffffff>. This represents a logical position : - *after* any sane epoch values <gci>/0 ... <gci>/0xfffffffe - *before* <gci+1>/0 Note that it is not an actual epoch value that a cluster would generate or use as the microGCI is too high. Any actual stored epoch number will be lower or higher than this, therefore an equality comparison will fail. This means that it is guaranteed not to be present in the mysql.ndb_binlog_index table(s) of the source site. Therefore it is required to use an inequality of some sort to find a start file and position. Cases : 1) Source cluster has some post-backup binlog epochs, mysql.ndb_binlog_index contains entries > pseudo epoch. 2) Source cluster has no post-backup binlog epochs, mysql.ndb_binlog_index contains no entries > pseudo epoch 3) Source cluster has no binlog epochs. 4) Source cluster has purged binlog epochs up to some point after pseudo epoch - gap. Ideal positioning in each case : 1) Position at start of first following epoch ({File, Position} > epoch) 2) Position after end of last preceding epoch ({next_file, next_position} < epoch) 3) Position at start of source Binlog files. 4) Reject positioning attempt as pseudo epoch is not usable for positioning. Given that the pseudo epoch can be arbitrarily distant from any preceding or succeeding epoch, it is difficult in the deployment case to determine whether there may be an epoch gap between it and the epochs available (case 4). Cases 1 and 2 are trying to compute the same position, but case 1 requires a recorded epoch > the pseudo epoch and case 2 requires an epoch < the pseudo epoch. In both cases such an epoch may not exist. Therefore changing the deployment use case documentation to use a statement of form 2) is not guaranteed to solve all potential problems, but an argument could be made that it might be more or less likely to succeed than the existing statement of form 1). The need for more 'decision making' in the positioning led to the filing of the following bug on which some work has been done : Bug#34190136 ADDS A STORED ROUTINE TO FETCH BINLOG POSITION AND FILE PATH --- The statement proposed : SELECT -> @file:=SUBSTRING_INDEX(next_file, '/', -1), -> @pos:=next_position -> FROM mysql.ndb_binlog_index -> WHERE epoch <= @latest -> ORDER BY epoch DESC LIMIT 1; is of form 2). If there are no epoch entries preceding the Backup time, but there are entries afterwards then it will not position correctly. The use of <= rather than < indicates either that the nature of the pseudo epoch is misunderstood or that this is intended to also be used as an alternative general channel cutover positioning statement. Currently we document a separate statement for the cutover use case using equality rather than an inequality. This is done in an attempt to improve the 'strictness' of the cutover positioning, at the cost of having > 1 statement form to position replication which probably confuses. --- Summary - Proposed form may be confusing cutover + B+R deployment use cases or merging them on purpose. - Proposal works for the case where epochs exist prior to pseudo epoch and not after. - Proposal does not work for the case where epochs exist after the pseudo epoch and not before. - Proposal does not work for the case where there are no epochs available. Need to consider further how documentation can be improved to best inform users without overwhelming with details.
[26 Apr 2023 15:55]
Mikael Ronström
Thx for a thorough explanation. Wasn't aware that the epoch generated by the Backup wasn't a real epoch. Helps explain why the MTR test case goes through a bit more complex scheme to handle things.