MySQL Bugs: #47161: rpl_backup_shutdown fails sporadically on PB2

Bug #47161	rpl_backup_shutdown fails sporadically on PB2
Submitted:	7 Sep 2009 0:09	Modified:	11 Jan 2010 16:14
Reporter:	Luis Soares	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	Tests: Replication	Severity:	S3 (Non-critical)
Version:	5.4	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	experimental, mysql-next-bugfixing, pb2, test failure

Description:
rpl_backup_shutdown failures have been observed lately, in PB2, with the symptom posted below. For a full failure description check howto repeat section.

rpl.rpl_backup_shutdown 'mix'            [ fail ]
        Test ended at 2009-09-05 15:46:40

CURRENT_TEST: rpl.rpl_backup_shutdown
--- /export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/suite/rpl/r/rpl_backup_shutdown.result	2009-09-05 12:55:00.000000000 +0300
+++ /export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/suite/rpl/r/rpl_backup_shutdown.reject	2009-09-05 16:46:40.000000000 +0300
@@ -116,31 +116,38 @@
 #
 # Slave will be stopped as result of restore incident event
 # Connecting to slave...
-# Check the slave status
+**** ERROR: timeout after  seconds while waiting for slave parameter Slave_SQL_Running = No ****
+Message: Failed while waiting for slave SQL thread to stop
+Current connection is 'slave'
+Note: the following output may have changed since the failure was detected
+
+[on slave]
+
+**** SHOW SLAVE STATUS on slave ****
 SHOW SLAVE STATUS;
-Slave_IO_State	#
+Slave_IO_State	
 Master_Host	127.0.0.1
 Master_User	root
-Master_Port	MASTER_PORT
+Master_Port	13000
 Connect_Retry	1
-Master_Log_File	#
-Read_Master_Log_Pos	#
-Relay_Log_File	#
-Relay_Log_Pos	#
-Relay_Master_Log_File	master-bin.000002
-Slave_IO_Running	Yes
-Slave_SQL_Running	No
+Master_Log_File	master-bin.000001
+Read_Master_Log_Pos	2578
+Relay_Log_File	slave-relay-bin.000003
+Relay_Log_Pos	2725
+Relay_Master_Log_File	master-bin.000001
+Slave_IO_Running	No
+Slave_SQL_Running	Yes
 Replicate_Do_DB	
 Replicate_Ignore_DB	
 Replicate_Do_Table	
 Replicate_Ignore_Table	
 Replicate_Wild_Do_Table	
 Replicate_Wild_Ignore_Table	
-Last_Errno	1590
-Last_Error	The incident RESTORE_ON_MASTER occured on the master. Message: A restore operation was initiated on the master.
+Last_Errno	0
+Last_Error	
 Skip_Counter	0
-Exec_Master_Log_Pos	#
-Relay_Log_Space	#
+Exec_Master_Log_Pos	2578
+Relay_Log_Space	2881
 Until_Condition	None
 Until_Log_File	
 Until_Log_Pos	0
@@ -150,105 +157,114 @@
 Master_SSL_Cert	
 Master_SSL_Cipher	
 Master_SSL_Key	
-Seconds_Behind_Master	#
+Seconds_Behind_Master	NULL
 Master_SSL_Verify_Server_Cert	No
-Last_IO_Errno	0
-Last_IO_Error	
-Last_SQL_Errno	1590
-Last_SQL_Error	The incident RESTORE_ON_MASTER occured on the master. Message: A restore operation was initiated on the master.
+Last_IO_Errno	1765
+Last_IO_Error	The master is not allowing slave connections.
+Last_SQL_Errno	0
+Last_SQL_Error	
 Replicate_Ignore_Server_Ids	
 Master_Server_Id	1
-SET global sql_slave_skip_counter=1;
-# Note that slave is stopped with an error
-STOP SLAVE;
-# Perform restore in slave to have sync in master and slave state
-RESTORE FROM 'db12s.bak';
-backup_id
-#
-START SLAVE;
-# Check databases are restored in slave
-SHOW DATABASES;
-Database
-information_schema
-db1
-db2
-mtr
-mysql
-test
-# Connecting to master 
-
-# Exercise objects
-CALL test.show_objects('db1');
-TABLE_NAME	TABLE_TYPE
-t11	BASE TABLE
-t12	BASE TABLE
-v1	VIEW
-routine_name	routine_type
-f1	FUNCTION
-CALL test.show_objects('db2');
-TABLE_NAME	TABLE_TYPE
-t21	BASE TABLE
-routine_name	routine_type
-e1	EVENT
-p1	PROCEDURE
-trgi	TRIGGER
-INSERT INTO db2.t21 VALUES(200,'b');
-CALL db2.p1();
-details
-trigger trg1 fired
-trigger trg1 fired
-SELECT db1.f1();
-db1.f1()
-2
-SELECT * FROM db2.t21;
-id	fname
-100	a
-200	b
 
[...]

mysqltest: In included file "./include/wait_for_slave_param.inc": Result content mismatch

 - saving '/export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/var/log/rpl.rpl_backup_shutdown-mix/' to '/export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/var/log/rpl.rpl_backup_shutdown-mix/'

Retrying test, attempt(2/3)...

How to repeat:
http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_failure_id=2306563

Suggested fix:
n/a

Hi Omer,
the issue is with the slave I/O thread shutdown that hangs and does not execute

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82881

2814 Li-Bing.Song@sun.com	2009-09-10
      Bug #47161  	rpl_backup_shutdown fails sporadically on PB2
      
      The master is shutdowned and restarted before "RESTORE FROM 'db12m.bak' is executed" ;
      Thus will result the slave I/O thread to reconnect to master automatically.
      It sometimes happens after 'RESTORE' command has started on master.
      Master will send an error to slave and then close the connection
      if 'RESTORE' command is running, 
      and slave I/O thread will stop after receiving the error.
      So the incident RESTORE_ON_MASTER which leads to stop of the slave SQL thread 
      can not be sent from master to slave and the slave SQL thread will keep on runing.
      As a result, wait_for_slave_sql_to_stop.inc will timeout.
      
      This patch write code to stop the slave I/O thread before 'RESTORE' command starting on master 
      and restart it after 'RESTORE' command has done.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83421

2818 Li-Bing.Song@sun.com	2009-09-16
      Bug #47161 rpl_backup_shutdown fails sporadically on PB2
      
      The master is shutdowned and restarted before 
      "RESTORE FROM 'db12m.bak' is executed".
      
      At that point, slave I/O thread notices that connection has 
      been broken, and attempt to automatically reconnect. 
      Sometimes, connection succeeds before the 'RESTORE' operation 
      is issued on the master. This is no problem, because slave 
      I/O thread will read the INCIDENT EVENT from master binlog 
      and SQL thread will eventually stop. However, if slave I/O 
      thread attempts connection when the 'RESTORE' command already 
      ongoing on the master, then master will refuse it and 
      connection will not be retried. As a consequence, Slave I/O 
      thread will not be able to read the INCIDENT EVENT, 
      ultimately causing slave SQL thread not be aware of the 
      incident event. This will make in slave SQL thread not to stop 
      and ultimately test will time out
      ("source include/wait_for_slave_sql_to_stop.inc").
      
      This patch write code to stop the slave I/O thread 
      before 'RESTORE' command starting on master and restart it 
      after 'RESTORE' command has done.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83795

2817 Li-Bing.Song@sun.com	2009-09-19
      Bug #47161 rpl_backup_shutdown fails sporadically on PB2
            
      The master is shutdowned and restarted before 
      "RESTORE FROM 'db12m.bak' is executed".
            
      At that point, slave I/O thread notices that connection has 
      been broken, and attempt to automatically reconnect. 
      Sometimes, connection succeeds before the 'RESTORE' operation 
      is issued on the master. This is no problem, because slave 
      I/O thread will read the INCIDENT EVENT from master binlog 
      and SQL thread will eventually stop. However, if slave I/O 
      thread attempts connection when the 'RESTORE' command already 
      ongoing on the master, then master will refuse it and 
      connection will not be retried. As a consequence, Slave I/O 
      thread will not be able to read the INCIDENT EVENT, 
      ultimately causing slave SQL thread not be aware of the 
      incident event. This will make in slave SQL thread not to stop 
      and ultimately test will time out
      ("source include/wait_for_slave_sql_to_stop.inc").
            
      This patch write code to stop the slave I/O thread 
      before 'RESTORE' command starting on master and restart it 
      after 'RESTORE' command has done.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83940

2861 Li-Bing.Song@sun.com	2009-09-21
      Bug #47161 rpl_backup_shutdown fails sporadically on PB2
                  
      The master is shutdowned and restarted before 
      "RESTORE FROM 'db12m.bak' is executed".
                  
      At that point, slave I/O thread notices that connection has 
      been broken, and attempt to automatically reconnect. 
      Sometimes, connection succeeds before the 'RESTORE' operation 
      is issued on the master. This is no problem, because slave 
      I/O thread will read the INCIDENT EVENT from master binlog 
      and SQL thread will eventually stop. However, if slave I/O 
      thread attempts connection when the 'RESTORE' command already 
      ongoing on the master, then master will refuse it and 
      connection will not be retried. As a consequence, Slave I/O 
      thread will not be able to read the INCIDENT EVENT, 
      ultimately causing slave SQL thread not be aware of the 
      incident event. This will make in slave SQL thread not to stop 
      and ultimately test will time out
      ("source include/wait_for_slave_sql_to_stop.inc").
                  
      This patch write code to stop the slave I/O thread 
      before 'RESTORE' command starting on master and restart it 
      after 'RESTORE' command has done.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84579

2839 Li-Bing.Song@sun.com	2009-09-25
      Bug #47161 rpl_backup_shutdown fails sporadically on PB2
            
      The master is shutdowned and restarted before 
      "RESTORE FROM 'db12m.bak' is executed".
                  
      At that point, slave I/O thread notices that connection has 
      been broken, and attempt to automatically reconnect. 
      Sometimes, connection succeeds before the 'RESTORE' operation 
      is issued on the master. This is no problem, because slave 
      I/O thread will read the INCIDENT EVENT from master binlog 
      and SQL thread will eventually stop. However, if slave I/O 
      thread attempts connection when the 'RESTORE' command already 
      ongoing on the master, then master will refuse it and 
      connection will not be retried. As a consequence, Slave I/O 
      thread will not be able to read the INCIDENT EVENT, 
      ultimately causing slave SQL thread not be aware of the 
      incident event. This will make in slave SQL thread not to stop 
      and ultimately test will time out
      ("source include/wait_for_slave_sql_to_stop.inc").
                  
      This patch write code to stop the slave I/O thread 
      before 'RESTORE' command starting on master and restart it 
      after 'RESTORE' command has done.

Pushed to mysql-6.0-codebase-bugfixing

Pushed into 6.0.14-alpha (revid:alik@sun.com-20090929093622-1mooerbh12e97zux) (version source revid:alik@sun.com-20090927203924-087s36mrs0uxepwb) (merge vers: 6.0.14-alpha) (pib:11)

Testing only, no end-user changes to document. Closed.

It fails again with the following symptom:
@@ -177,6 +177,7 @@
 mtr
 mysql
 test
+test
 # Connecting to master 

Marking the test experimental.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87726

3644 Alexander Nozdrin	2009-10-22
      Mark rpl_backup_shutdown experimental due to Bug#47161.

Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (merge vers: 6.0.14-alpha) (pib:13)

LiBing, I setting your bug optimistically as a dup of Bug #50176.
I suggest you to try on http://lists.mysql.com/commits/96475 and if it suits indeed then do nothing as the fixes are about to be pushed to mysql-backup-backport.

Thanks,

Andrei