Bug #93397: Replication does not start if restart MySQL after init without start slave
Submitted: 29 Nov 2018 8:43
Modified: 3 Sep 2019 11:06
Reporter: Jean-François Gagné
Status: Closed
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.7.24
OS: Any
CPU Architecture: Any

[29 Nov 2018 8:43] Jean-François Gagné
Description:
Hi,

after initialising a slave without starting it, if I restart MySQL, replication will not start after mysqld comes back up.  See "How to repeat" for details.

Many thanks for looking into that,

JFG

How to repeat:
-- Create a sandbox with 2 nodes.

~$ dbdeployer deploy multiple -n 2 --gtid --concurrent mysql_5.7.24

-- Initialise the master.

node1 [localhost] {msandbox} ((none)) > select version();
+------------+
| version()  |
+------------+
| 5.7.24-log |
+------------+
1 row in set (0.00 sec)

node1 [localhost] {msandbox} ((none)) > reset master;
Query OK, 0 rows affected (0.02 sec)

node1 [localhost] {msandbox} ((none)) > select password('repl');
+-------------------------------------------+
| password('repl')                          |
+-------------------------------------------+
| *A424E797037BF97C19A2E88CF7891C5C2038C039 |
+-------------------------------------------+
1 row in set, 1 warning (0.00 sec)

node1 [localhost] {msandbox} ((none)) > CREATE USER IF NOT EXISTS 'repl'@'127.0.0.1';
Query OK, 0 rows affected (0.00 sec)

node1 [localhost] {msandbox} ((none)) > ALTER USER 'repl'@'127.0.0.1' IDENTIFIED WITH 'mysql_native_password' AS '*A424E797037BF97C19A2E88CF7891C5C2038C039' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
Query OK, 0 rows affected (0.00 sec)

-- Note that the statement below is run as root, because of the sandbox GRANTs.

node1 [localhost] {root} ((none)) > GRANT REPLICATION SLAVE ON *.* TO 'repl'@'127.0.0.1';
Query OK, 0 rows affected (0.00 sec)

node1 [localhost] {msandbox} ((none)) > select @@port;
+--------+
| @@port |
+--------+
|  24125 |
+--------+
1 row in set (0.00 sec)

-- Now initialise the slave.

node2 [localhost] {msandbox} ((none)) > reset master;
Query OK, 0 rows affected (0.01 sec)

node2 [localhost] {msandbox} ((none)) > change master to master_host = '127.0.0.1', master_port = 24125;
Query OK, 0 rows affected (0.03 sec)

node2 [localhost] {msandbox} ((none)) > change master to master_user = 'repl', master_password = 'repl';
Query OK, 0 rows affected, 2 warnings (0.02 sec)

node2 [localhost] {msandbox} ((none)) > change master to master_auto_position = 1;
Query OK, 0 rows affected (0.02 sec)

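For reference, the three statements above could equally be issued as a single statement, since CHANGE MASTER TO accepts several options at once; a sketch using the same values as above (the bug reproduces either way, since neither form specifies MASTER_LOG_FILE / MASTER_LOG_POS):

```sql
-- Equivalent one-shot initialisation of the slave (same values as above):
CHANGE MASTER TO
  MASTER_HOST = '127.0.0.1',
  MASTER_PORT = 24125,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl',
  MASTER_AUTO_POSITION = 1;
```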
-- Here, if I run "start slave;", everything is fine.  But if I instead restart MySQL, I get the following in the error log.

~/sandboxes/multi_msb_mysql_5_7_24$ ./node2/restart
stop /home/jgagne/sandboxes/multi_msb_mysql_5_7_24/node2
. sandbox server started
~/sandboxes/multi_msb_mysql_5_7_24$ tail -n 8 node2/data/msandbox.err 
2018-11-29T08:23:38.741445Z 0 [ERROR] Error during --relay-log-recovery: Could not locate rotate event from the master.
2018-11-29T08:23:38.744004Z 0 [ERROR] Slave: Failed to initialize the master info structure for channel ''; its record may still be present in 'mysql.slave_master_info' table, consider deleting it.
2018-11-29T08:23:38.744029Z 0 [ERROR] Failed to create or recover replication info repositories.
2018-11-29T08:23:38.744033Z 0 [Note] Failed to start slave threads for channel ''
2018-11-29T08:23:38.744046Z 0 [Note] Some of the channels are not created/initialized properly. Check for additional messages above. You will not be able to start replication on those channels until the issue is resolved and the server restarted.
2018-11-29T08:23:38.748547Z 0 [Note] Event Scheduler: Loaded 0 events
2018-11-29T08:23:38.748901Z 0 [Note] /home/jgagne/opt/mysql/mysql_5.7.24/bin/mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/tmp/mysql_sandbox24126.sock'  port: 24126  MySQL Community Server (GPL)

-- And replication is borked.

node2 [localhost] {msandbox} ((none)) > start slave;
ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository
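Before resetting, the leftover state can be inspected in the replication repositories that the error messages point at; a sketch (this assumes master_info_repository=TABLE and relay_log_info_repository=TABLE, as suggested by the error's reference to mysql.slave_master_info):

```sql
-- Inspect the leftover replication metadata the error message refers to:
SELECT * FROM mysql.slave_master_info\G
SELECT * FROM mysql.slave_relay_log_info\G
```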

-- It is easy to fix though.

node2 [localhost] {msandbox} ((none)) > reset slave;
Query OK, 0 rows affected (0.03 sec)

node2 [localhost] {msandbox} ((none)) > start slave;
Query OK, 0 rows affected (0.02 sec)
[30 Nov 2018 11:57] MySQL Verification Team
Hello Jean-François,

Thank you for the report and steps.
Observed this with 5.7.24 build.

regards,
Umesh
[30 Nov 2018 11:58] MySQL Verification Team
test results - 5.7.24

Attachment: 93397_5.7.24.results (application/octet-stream, text), 8.28 KiB.

[14 Dec 2018 11:14] MySQL Verification Team
Bug #93615 marked as duplicate of this one
[22 Mar 2019 5:57] MySQL Verification Team
Bug #94730 marked as duplicate of this one
[25 Mar 2019 1:43] kfpanda kf
In which version will this problem be fixed?
[29 May 2019 12:08] Simon Mudd
Related to https://bugs.mysql.com/83713
[29 May 2019 12:14] Simon Mudd
So I am adding a comment from the other related Bug#83713:

* Seen recently on 5.7.25-1.el7 rpm (CentOS 7)  (my /etc/my.cnf config file is available privately in that bug for reference)

root@myhost [(none)]> start slave io_thread;  
ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository    
  
And we saw this in the log:  
  
2019-05-23T10:00:19.306164Z 288564 [ERROR] Slave SQL for channel '': Slave failed to initialize relay log info structure from the repository, Error_code: 1872 

So, given that there is no reporting of the actual error that happened here (only that an error occurred), can we at least improve logging to provide more information?
Then we could determine what the problem is and work out how to avoid it.

Oracle seems unable to reproduce this but a similar issue has also been reported by Derek Perkins in 8.0.

If there are multiple possible problems that can occur when the current error is reported then distinguishing them would be good, as would providing a more detailed error report of the problem seen.

Thanks.
[29 May 2019 12:19] Jean-François Gagné
Maybe related: Bug#92882.
[30 May 2019 4:49] MySQL Verification Team
Bug #83713 marked as duplicate of this one.
[30 May 2019 6:53] Jean-François Gagné
Hi Umesh,

I am not sure Bug#83713 is a duplicate of this bug.  I think it is more related to Bug#92882.

Bug#83713 involves a crash and MTS, while this bug involves neither a crash nor MTS.

However, Bug#92882 involves both a crash and MTS.

I agree the error messages are very similar though, so I guess all of this is related in some way that I do not understand.

Cheers, JFG
[3 Sep 2019 11:06] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.19 and 5.7.29:

If a replication slave was set up using a CHANGE MASTER TO statement that did not specify the master log file name and master log position, then shut down before START SLAVE was issued, then restarted with the option --relay-log-recovery set, replication did not start. This happened because the receiver thread had not been started before relay log recovery was attempted, so no log rotation event was available in the relay log to provide the master log file name and master log position. In this situation, the slave now skips relay log recovery and logs a warning, then proceeds to start replication.
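For versions without the fix, the report above suggests two ways around the failed recovery; a hedged sketch (option 1 is an assumption based on the changelog's explanation that recovery needs a rotate event in the relay log, option 2 is the workaround shown in the report):

```sql
-- Option 1 (before shutting down): briefly start the slave so the receiver
-- thread writes a rotate event to the relay log, which recovery needs.
START SLAVE;
STOP SLAVE;

-- Option 2 (after hitting ERROR 1872 on restart): clear the broken
-- replication repositories and start again.
RESET SLAVE;
START SLAVE;
```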