MySQL Bugs: #82667: Please clarify resiliency introduced by master_info

Bug #82667	Please clarify resiliency introduced by master_info_repository to TABLE.
Submitted:	20 Aug 2016 11:19	Modified:	18 Aug 2018 18:40
Reporter:	Jean-François Gagné	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	5.6 and 5.7 documentations	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hi,

in both links below, I can read "setting relay_log_info_repository and master_info_repository to TABLE can improve resilience to unexpected halts":

-http://dev.mysql.com/doc/refman/5.6/en/slave-logs.html
-http://dev.mysql.com/doc/refman/5.7/en/slave-logs.html

I understand the resilience improvement introduced by setting relay_log_info_repository to TABLE (in combination with relay-log-recovery to TRUE).  However, I do not understand the resilience improvement introduced by setting master_info_repository to TABLE.

Setting master_info_repository to TABLE does not improve replication crash safety.  It does not either improve the level of trust one can have in the consistency of the master_info data and the relay logs (more info in [1]).

[1]: http://blog.booking.com/better_crash_safe_replication_for_mysql.html

I think the documentation should be clearer about this subject.

Many thanks,

JFG

How to repeat:
Documentation bug.

Suggested fix:
Maybe the added resiliency from FILE to TABLE is about the situation introduced by a crash exactly at the moment the file is being updated (which might leave the file in an unreadable state).  However, if this is the case, this could easily be fixed by a file dance: create and _new file, move the master_info file to _old, and move _new to the master_info file.

Posted by developer:
 
Thanks for this comment, sorry you have not had a response yet. In the recent MySQL Server 8.0.2 milestone release, the default for both master_info_repository and relay_log_info_repository has been changed to TABLE. The master info log contains metadata for replication channels, and the developers handling the change of defaults noted that using InnoDB tables provides ACID and MVCC characteristics for the data, therefore the choice was made to change both defaults. I will look at the documentation changes around this and see if the reasons can be clarified more.

Posted by developer:

Thanks again for your comments.

I have now added the following clarification to http://dev.mysql.com/doc/refman/5.7/en/slave-logs.html and also in the 5.6 version. I did not add this to 8.0 because the TABLE setting is now the default.

In MySQL 5.7, setting relay_log_info_repository and master_info_repository to TABLE can improve resilience to unexpected halts. The master info log stores information required for the recovery of the slave's I/O thread, and the relay log info log stores information required for the recovery of the SQL thread. The updates to the tables are committed together with the transaction, meaning that the information in them is always consistent with what has been applied to the database, even in the event of a server halt. The --relay-log-recovery option must be enabled on the slave to guarantee resilience. For more details, see "Handling an Unexpected Halt of a Replication Slave".

I have also added this information to the topic on the slave status logs, http://dev.mysql.com/doc/refman/5.7/en/slave-logs-status.html

I have also clarified in https://dev.mysql.com/doc/refman/5.7/en/replication-solutions-unexpected-slave-halt.html "Handling an Unexpected Halt of a Replication Slave":

The information required for recovery was traditionally stored in files that were updated after the transaction had been applied. This held the risk of losing synchrony with the master depending at which stage of processing a transaction the slave halted at, or even corruption of the files themselves. In MySQL 5.7 you can instead use tables to store recovery information. The updates to the tables are committed together with the transaction, meaning that the information in them is always consistent with what has been applied to the database, even in the event of a server halt. These tables are created using InnoDB, and by using this transactional storage engine the information is always recoverable upon restart.

These changes should be visible online within a couple of days.

Hi, sorry for re-opening this many months after it being closed.

About "the developers handling the change of defaults noted that using InnoDB tables provides ACID and MVCC characteristics for the data", I agree that this is true in the case of relay_log_info_repository but it is not true in the case of master_info_repository.  As explained in [1], the IO thread has no way to synchronize its progress (appending events to the relay logs) and its state (saving data in a FILE or a TABLE).  So the update of master_info_repository cannot be atomic with appending data to the relay logs.

[1]: https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336...

It is a common misconception that crash safe replication needs master_info_repository to TABLE, but this is not the case.  I am not saying that I disagree with having master_info_repository to TABLE, I just believe that is does not introduce any additional resiliency.

Please consider removing the reference to master_info_repository in "setting relay_log_info_repository and master_info_repository to TABLE can improve resilience to unexpected halts" found in [2] and [3].

[2]: http://dev.mysql.com/doc/refman/5.6/en/slave-logs.html

[3]: http://dev.mysql.com/doc/refman/5.7/en/slave-logs.html

Related Bug#92064.

Thanks for looking again into that.

Hi - Thanks for following up on this. Please note the writing team do not get automatically notified about comments on closed bugs, and making a comment on a closed bug does not re-open the bug for us. It's okay to do this for continuity, but please ensure you communicate the comment in another way as well, as you did in this case by mentioning it in a recently opened bug.

I have reworked for Bug 92064 to remove the claim that master_info_repository=TABLE improves resilience in the introduction to the "Handling an Unexpected Halt of a Replication Slave" topic and the "Replication Relay and Status Logs" and "Slave Status Logs" topics. The updates should be visible online soon.
https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-unexpected-slave-halt.html
https://dev.mysql.com/doc/refman/8.0/en/slave-logs.html
https://dev.mysql.com/doc/refman/8.0/en/slave-logs-status.html
and the 5.7 and 5.6 versions.