Bug #101876 The manual is overly conservative in parameters for replication crash safety.
Submitted: 5 Dec 2020 4:05 Modified: 5 Dec 2020 6:55
Reporter: Jean-François Gagné Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.7, 8.0 OS:Any
Assigned to: CPU Architecture:Any

[5 Dec 2020 4:05] Jean-François Gagné
Description:
Hi,

while preparing my talk for the MinervaDB Athena 2020 Conference [1], I was reviewing old bugs and the way they were fixed.  I should have done that before, sorry for the delay.  I am interested in [2] and [3]: Handling an Unexpected Halt of a Replica in 5.7 and 8.0 respectively.

[1]: https://minervadbathena.com/

[2]: https://dev.mysql.com/doc/refman/5.7/en/replication-solutions-unexpected-replica-halt.html

[3]: https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-unexpected-replica-halt.html

Beside that sync_binlog = 1 is missing in [2] for GTID with log-slave-updates and and [3] for up to 8.0.16 (I reported this in Bug#101874), I believe that [2] and [3] are overly conservative in the parameters they. specify for replication crash safety.  IMHO, the following are not needed in all cases:

- MASTER_AUTO_POSITION = 1
- sync_relay_log = 1
- innodb_flush_log_at_trx_commit = 1
- and the missing sync_binlog = 1

I would suggest restructuring this document to separate parameters needed all the time and parameters needed only in some situations.  The parameters needed all the time are, according to my understanding, the following:

- relay_log_info_repository = TABLE
- relay_log_recovery = ON

And the parameters only needed in some situations are:

- MASTER_AUTO_POSITION = 1
- sync_relay_log = 1
- innodb_flush_log_at_trx_commit = 1
- sync_binlog = 1

I give details in Suggested fix on when I think the 4 parameters above are needed.

Many thanks for looking into this,

Jean-François Gagné

How to repeat:
Does not apply to this documentation bug.

Suggested fix:
As I write in the description, I think the following parameters are always needed for replication crash safety:

- relay_log_info_repository = TABLE
- relay_log_recovery = ON

And that these are only needed in some situations:

- MASTER_AUTO_POSITION = 1
- sync_relay_log = 1
- innodb_flush_log_at_trx_commit = 1
- sync_binlog = 1

Below is my understanding of each.

MASTER_AUTO_POSITION = 1
------------------------

This is needed when gtid_mode = ON and parallel replication is used (slave_parallel_workers > 1) with out-of-order committing.  Out-of-order committing can happen either when slave_parallel_type = DATABASE, or when slave_parallel_type = LOGICAL_CLOCK and slave_preserve_commit_order = OFF.

sync_relay_log = 1
------------------

This is needed when gtid_mode = OFF and parallel replication is used (slave_parallel_workers > 1) with out-of-order committing.  Out-of-order committing can happen either when slave_parallel_type = DATABASE, or when slave_parallel_type = LOGICAL_CLOCK and slave_preserve_commit_order = OFF.

innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1
------------------------------------------------------

These should be set when gtid_mode = ON and log-slave-updates is ON (except maybe from 8.0.17 because of WL#9211).
[5 Dec 2020 6:55] MySQL Verification Team
Hello Jean-François,

Thank you for the report!

regards,
Umesh