MySQL Bugs: #89987: super_read_only with master_info_repository=TABLE breaks replication on restart

Bug #89987	super_read_only with master_info_repository=TABLE breaks replication on restart
Submitted:	9 Mar 2018 13:00	Modified:	28 Mar 2019 16:24
Reporter:	Clemens Weiß	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.7.21	OS:	Ubuntu (16.04)
Assigned to:		CPU Architecture:	x86 (amd64)
Tags:	master_info_repository, replication, super_read_only

Description:
When using a master slave setup with the following slave settings:

master_info_repository=TABLE
super_read_only=ON

replication breaks after a service restart. This is due to old master info being used when the slave starts. Obviously the mysql.slave_master_info can't be updated and outdated master_log_file/master_log_pos information is used.

Here's an excerpt from the error.log, where old and new master_log_file/master_log_pos information of the Slave I/O thread and the resulting replication error can be seen:

2018-03-08T13:52:47.194940Z 0 [Note] Shutting down slave threads
2018-03-08T13:52:47.194959Z 1 [Note] Error reading relay log event for channel '': slave SQL thread was killed
2018-03-08T13:52:47.195329Z 2 [Note] Slave I/O thread killed while reading event for channel ''
2018-03-08T13:52:47.195347Z 2 [Note] Slave I/O thread exiting for channel '', read up to log 'database-01-p1-bin.005036', position 663915679
[...]
2018-03-08T13:53:08.457673Z 2 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'database-01-p1-bin.005036' at position 663915679, relay log '/var/lib/mysql/database-03-p1-relay.000017' position: 91864862
2018-03-08T13:53:08.458437Z 1 [Note] Slave I/O thread for channel '': connected to master 'replic@database-01-p1:3306',replication started in log 'database-01-p1-bin.005036' at position 654086756
2018-03-08T13:53:08.462082Z 0 [Note] Event Scheduler: Loaded 0 events
2018-03-08T13:53:08.462277Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.21-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2018-03-08T13:53:08.478366Z 2 [ERROR] Slave SQL for channel '': Could not execute Write_rows event on table payment.cake_sessions; Duplicate entry 'c30s1vk4cpuh26j14nbqpgmud2' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log database-01-p1-bin.005036, end_log_pos 654106861, Error_code: 1062
2018-03-08T13:53:08.478379Z 2 [Warning] Slave: Duplicate entry 'c30s1vk4cpuh26j14nbqpgmud2' for key 'PRIMARY' Error_code: 1062
2018-03-08T13:53:08.478383Z 2 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'database-01-p1-bin.005036' position 654086756

This does not happen if you set master_info_repository to FILE or "SET GLOBAL super_read_only=0;" before shutting down.

How to repeat:
Create a master/slave setup with the slave having set

master_info_repository=TABLE
and
super_read_only=ON

Start the slave, and for example create a database on master. Make sure this is replicated on the slave.

Restart the slave service.

START SLAVE and check SHOW SLAVE STATUS, it should now show the old master_log_file, master_log_pos and an replication error indicating that the database already exists.

Hello Clemens,

Thank you for the report.

Thanks,
Umesh

5.7.21 - test results

Attachment: 89987_5.7.21.results (application/octet-stream, text), 14.99 KiB.

Small mtr test case:

rpl_bug-slave.opt:
--super-read-only=ON --master-info-repository=TABLE

rpl_bug.test:
--source include/master-slave.inc
CREATE DATABASE foo;
--source include/sync_slave_sql_with_master.inc
--let $status_items = Read_Master_Log_Pos
--source include/show_slave_status.inc
--let $rpl_server_number = 2
--source include/rpl_restart_server.inc
--source include/show_slave_status.inc

Reproduced on both 5.7 and 8.0.

Posted by developer:
 
The reason is that:
- Normally, the slave position tables are updated by replication threads.
- Replication threads are excluded from read-only checks.
- The receiver thread runs with non-durable settings (sync_relay_log_info=0).
- On server shutdown, a non-replication thread tries to update the slave position tables. It uses a special THD object for the session, which is not excluded from the read-only check.
- Hence the positions are not persisted during server shut-down.

This fixes it (patch against 8.0), by disabling the read-only check also when writing the slave position tables during server shutdown. We should also handle the return value from terminate_slave_threads better in end_slave, so that any failure to write the repositories results in an error in the log.

diff --git a/sql/rpl_info_table_access.cc b/sql/rpl_info_table_access.cc
index 2aaa992f108..5e18e6f1f88 100644
--- a/sql/rpl_info_table_access.cc
+++ b/sql/rpl_info_table_access.cc
@@ -355,6 +355,7 @@ THD *Rpl_info_table_access::create_thd() {
   if (!thd) {
     thd = System_table_access::create_thd();
     thd->system_thread = SYSTEM_THREAD_INFO_REPOSITORY;
+    thd->set_skip_readonly_check();
     thd_created = true;
   }
 
diff --git a/sql/rpl_slave.cc b/sql/rpl_slave.cc
index fdff9e6a953..b9e4ed54699 100644
--- a/sql/rpl_slave.cc
+++ b/sql/rpl_slave.cc
@@ -1961,6 +1961,10 @@ void end_slave() {
     mi = it->second;
 
     if (mi)
+      /**
+        @todo: check the return value and print error to the log as needed.
+        See stop_slave for example of how to do it.
+      */
       terminate_slave_threads(mi, SLAVE_FORCE_ALL, rpl_stop_slave_timeout);
   }
   channel_map.unlock();

Posted by developer:
 
Changelog entry added for MySQL 8.0.17:

When a slave server logs master status and connection information to a table (master_info_repository=TABLE), which is the default in MySQL 8.0, the mysql.slave_master_info table was not being updated on shutdown if the server was in super read only mode (super_read_only=ON). No error was written to the error log at this time, but replication failed after server startup because the master log file and master log position information was out of date. The thread that updates the master info log at shutdown is now excluded from read-only checks like other replication threads are, so it can update the table even if the server is in super read only mode. Error handling for a slave that is shutting down has also been improved so that any failure to write to the slave status logs results in an error in the error log.

hi, was it fixed in 5.7 by any chance?

Mikhail, this looked fixed in 5.7.27 as in [1], I can read:

Replication: When a slave server logs master status and connection information to a table (master_info_repository=TABLE), which is the default in MySQL 8.0, the mysql.slave_master_info table was not being updated on shutdown if the server was in super read only mode (super_read_only=ON). No error was written to the error log at this time, but replication failed after server startup because the master log file and master log position information was out of date. The thread that updates the master info log at shutdown is now excluded from read-only checks like other replication threads are, so it can update the table even if the server is in super read only mode. Error handling for a slave that is shutting down has also been improved so that any failure to write to the slave status logs results in an error in the error log. (Bug #27675107, Bug #89987)

[1]: https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-27.html

The release notes of 5.7.27 and bug linking was not very poorly.  This is the 3rd fixed bug I find not mentioned in the 5.7.27 release notes, more details in a comment in Bug#93395.

Posted by developer:
 
Yes, this is fixed in 5.7.27, I missed that when adding the changelog entry in the comment above. Sorry for the confusion. It is in the 5.7.27 release notes email and online version.