| Bug #47161 | rpl_backup_shutdown fails sporadically on PB2 | ||
|---|---|---|---|
| Submitted: | 7 Sep 2009 0:09 | Modified: | 11 Jan 2010 16:14 |
| Reporter: | Luis Soares | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | Tests: Replication | Severity: | S3 (Non-critical) |
| Version: | 5.4 | OS: | Any |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
| Tags: | experimental, mysql-next-bugfixing, pb2, test failure | ||
[10 Sep 2009 6:10]
Libing Song
Hi Omer, the issue is with the slave I/O thread shutdown that hangs and does not execute
[10 Sep 2009 7:05]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82881 2814 Li-Bing.Song@sun.com 2009-09-10 Bug #47161 rpl_backup_shutdown fails sporadically on PB2 The master is shutdowned and restarted before "RESTORE FROM 'db12m.bak' is executed" ; Thus will result the slave I/O thread to reconnect to master automatically. It sometimes happens after 'RESTORE' command has started on master. Master will send an error to slave and then close the connection if 'RESTORE' command is running, and slave I/O thread will stop after receiving the error. So the incident RESTORE_ON_MASTER which leads to stop of the slave SQL thread can not be sent from master to slave and the slave SQL thread will keep on runing. As a result, wait_for_slave_sql_to_stop.inc will timeout. This patch write code to stop the slave I/O thread before 'RESTORE' command starting on master and restart it after 'RESTORE' command has done.
[16 Sep 2009 7:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/83421 2818 Li-Bing.Song@sun.com 2009-09-16 Bug #47161 rpl_backup_shutdown fails sporadically on PB2 The master is shutdowned and restarted before "RESTORE FROM 'db12m.bak' is executed". At that point, slave I/O thread notices that connection has been broken, and attempt to automatically reconnect. Sometimes, connection succeeds before the 'RESTORE' operation is issued on the master. This is no problem, because slave I/O thread will read the INCIDENT EVENT from master binlog and SQL thread will eventually stop. However, if slave I/O thread attempts connection when the 'RESTORE' command already ongoing on the master, then master will refuse it and connection will not be retried. As a consequence, Slave I/O thread will not be able to read the INCIDENT EVENT, ultimately causing slave SQL thread not be aware of the incident event. This will make in slave SQL thread not to stop and ultimately test will time out ("source include/wait_for_slave_sql_to_stop.inc"). This patch write code to stop the slave I/O thread before 'RESTORE' command starting on master and restart it after 'RESTORE' command has done.
[19 Sep 2009 11:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/83795 2817 Li-Bing.Song@sun.com 2009-09-19 Bug #47161 rpl_backup_shutdown fails sporadically on PB2 The master is shutdowned and restarted before "RESTORE FROM 'db12m.bak' is executed". At that point, slave I/O thread notices that connection has been broken, and attempt to automatically reconnect. Sometimes, connection succeeds before the 'RESTORE' operation is issued on the master. This is no problem, because slave I/O thread will read the INCIDENT EVENT from master binlog and SQL thread will eventually stop. However, if slave I/O thread attempts connection when the 'RESTORE' command already ongoing on the master, then master will refuse it and connection will not be retried. As a consequence, Slave I/O thread will not be able to read the INCIDENT EVENT, ultimately causing slave SQL thread not be aware of the incident event. This will make in slave SQL thread not to stop and ultimately test will time out ("source include/wait_for_slave_sql_to_stop.inc"). This patch write code to stop the slave I/O thread before 'RESTORE' command starting on master and restart it after 'RESTORE' command has done.
[21 Sep 2009 13:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/83940 2861 Li-Bing.Song@sun.com 2009-09-21 Bug #47161 rpl_backup_shutdown fails sporadically on PB2 The master is shutdowned and restarted before "RESTORE FROM 'db12m.bak' is executed". At that point, slave I/O thread notices that connection has been broken, and attempt to automatically reconnect. Sometimes, connection succeeds before the 'RESTORE' operation is issued on the master. This is no problem, because slave I/O thread will read the INCIDENT EVENT from master binlog and SQL thread will eventually stop. However, if slave I/O thread attempts connection when the 'RESTORE' command already ongoing on the master, then master will refuse it and connection will not be retried. As a consequence, Slave I/O thread will not be able to read the INCIDENT EVENT, ultimately causing slave SQL thread not be aware of the incident event. This will make in slave SQL thread not to stop and ultimately test will time out ("source include/wait_for_slave_sql_to_stop.inc"). This patch write code to stop the slave I/O thread before 'RESTORE' command starting on master and restart it after 'RESTORE' command has done.
[25 Sep 2009 2:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/84579 2839 Li-Bing.Song@sun.com 2009-09-25 Bug #47161 rpl_backup_shutdown fails sporadically on PB2 The master is shutdowned and restarted before "RESTORE FROM 'db12m.bak' is executed". At that point, slave I/O thread notices that connection has been broken, and attempt to automatically reconnect. Sometimes, connection succeeds before the 'RESTORE' operation is issued on the master. This is no problem, because slave I/O thread will read the INCIDENT EVENT from master binlog and SQL thread will eventually stop. However, if slave I/O thread attempts connection when the 'RESTORE' command already ongoing on the master, then master will refuse it and connection will not be retried. As a consequence, Slave I/O thread will not be able to read the INCIDENT EVENT, ultimately causing slave SQL thread not be aware of the incident event. This will make in slave SQL thread not to stop and ultimately test will time out ("source include/wait_for_slave_sql_to_stop.inc"). This patch write code to stop the slave I/O thread before 'RESTORE' command starting on master and restart it after 'RESTORE' command has done.
[25 Sep 2009 3:10]
Libing Song
Pushed to mysql-6.0-codebase-bugfixing
[30 Sep 2009 8:16]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20090929093622-1mooerbh12e97zux) (version source revid:alik@sun.com-20090927203924-087s36mrs0uxepwb) (merge vers: 6.0.14-alpha) (pib:11)
[1 Oct 2009 8:26]
Jon Stephens
Testing only, no end-user changes to document. Closed.
[22 Oct 2009 6:30]
Alexander Nozdrin
It fails again with the following symptom: @@ -177,6 +177,7 @@ mtr mysql test +test # Connecting to master Marking the test experimental.
[22 Oct 2009 6:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/87726 3644 Alexander Nozdrin 2009-10-22 Mark rpl_backup_shutdown experimental due to Bug#47161.
[22 Oct 2009 6:35]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (merge vers: 6.0.14-alpha) (pib:13)
[11 Jan 2010 16:14]
Andrei Elkin
LiBing, I setting your bug optimistically as a dup of Bug #50176. I suggest you to try on http://lists.mysql.com/commits/96475 and if it suits indeed then do nothing as the fixes are about to be pushed to mysql-backup-backport. Thanks, Andrei

Description: rpl_backup_shutdown failures have been observed lately, in PB2, with the symptom posted below. For a full failure description check howto repeat section. rpl.rpl_backup_shutdown 'mix' [ fail ] Test ended at 2009-09-05 15:46:40 CURRENT_TEST: rpl.rpl_backup_shutdown --- /export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/suite/rpl/r/rpl_backup_shutdown.result 2009-09-05 12:55:00.000000000 +0300 +++ /export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/suite/rpl/r/rpl_backup_shutdown.reject 2009-09-05 16:46:40.000000000 +0300 @@ -116,31 +116,38 @@ # # Slave will be stopped as result of restore incident event # Connecting to slave... -# Check the slave status +**** ERROR: timeout after seconds while waiting for slave parameter Slave_SQL_Running = No **** +Message: Failed while waiting for slave SQL thread to stop +Current connection is 'slave' +Note: the following output may have changed since the failure was detected + +[on slave] + +**** SHOW SLAVE STATUS on slave **** SHOW SLAVE STATUS; -Slave_IO_State # +Slave_IO_State Master_Host 127.0.0.1 Master_User root -Master_Port MASTER_PORT +Master_Port 13000 Connect_Retry 1 -Master_Log_File # -Read_Master_Log_Pos # -Relay_Log_File # -Relay_Log_Pos # -Relay_Master_Log_File master-bin.000002 -Slave_IO_Running Yes -Slave_SQL_Running No +Master_Log_File master-bin.000001 +Read_Master_Log_Pos 2578 +Relay_Log_File slave-relay-bin.000003 +Relay_Log_Pos 2725 +Relay_Master_Log_File master-bin.000001 +Slave_IO_Running No +Slave_SQL_Running Yes Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table -Last_Errno 1590 -Last_Error The incident RESTORE_ON_MASTER occured on the master. Message: A restore operation was initiated on the master. +Last_Errno 0 +Last_Error Skip_Counter 0 -Exec_Master_Log_Pos # -Relay_Log_Space # +Exec_Master_Log_Pos 2578 +Relay_Log_Space 2881 Until_Condition None Until_Log_File Until_Log_Pos 0 @@ -150,105 +157,114 @@ Master_SSL_Cert Master_SSL_Cipher Master_SSL_Key -Seconds_Behind_Master # +Seconds_Behind_Master NULL Master_SSL_Verify_Server_Cert No -Last_IO_Errno 0 -Last_IO_Error -Last_SQL_Errno 1590 -Last_SQL_Error The incident RESTORE_ON_MASTER occured on the master. Message: A restore operation was initiated on the master. +Last_IO_Errno 1765 +Last_IO_Error The master is not allowing slave connections. +Last_SQL_Errno 0 +Last_SQL_Error Replicate_Ignore_Server_Ids Master_Server_Id 1 -SET global sql_slave_skip_counter=1; -# Note that slave is stopped with an error -STOP SLAVE; -# Perform restore in slave to have sync in master and slave state -RESTORE FROM 'db12s.bak'; -backup_id -# -START SLAVE; -# Check databases are restored in slave -SHOW DATABASES; -Database -information_schema -db1 -db2 -mtr -mysql -test -# Connecting to master - -# Exercise objects -CALL test.show_objects('db1'); -TABLE_NAME TABLE_TYPE -t11 BASE TABLE -t12 BASE TABLE -v1 VIEW -routine_name routine_type -f1 FUNCTION -CALL test.show_objects('db2'); -TABLE_NAME TABLE_TYPE -t21 BASE TABLE -routine_name routine_type -e1 EVENT -p1 PROCEDURE -trgi TRIGGER -INSERT INTO db2.t21 VALUES(200,'b'); -CALL db2.p1(); -details -trigger trg1 fired -trigger trg1 fired -SELECT db1.f1(); -db1.f1() -2 -SELECT * FROM db2.t21; -id fname -100 a -200 b [...] mysqltest: In included file "./include/wait_for_slave_param.inc": Result content mismatch - saving '/export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/var/log/rpl.rpl_backup_shutdown-mix/' to '/export/home4/pb2/build/sb_3-None-1252144305.27/mysql-next-bugfixing-gcov/mysql-test/var/log/rpl.rpl_backup_shutdown-mix/' Retrying test, attempt(2/3)... How to repeat: http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_failure_id=2306563 Suggested fix: n/a