Bug #57918 | rpl_get_master_version_and_clock times out sporadically | ||
---|---|---|---|
Submitted: | 2 Nov 2010 8:54 | Modified: | 6 Jan 2011 4:42 |
Reporter: | Alexander Nozdrin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Tests: Replication | Severity: | S3 (Non-critical) |
Version: | 5.6 (Dahlia, Erica) | OS: | Any |
Assigned to: | Daogang Qu | CPU Architecture: | Any |
Tags: | experimental |
[2 Nov 2010 8:54]
Alexander Nozdrin
[18 Nov 2010 8:12]
Alexander Nozdrin
Another symptom: ------------------------------------------------------------ rpl.rpl_get_master_version_and_clock [ fail ] Found warnings/errors in server log file! Test ended at 2010-11-18 00:49:46 line 101118 2:49:33 [ERROR] Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs 101118 2:49:33 [ERROR] Failed on request_dump() 101118 2:49:33 [Warning] Slave I/O: Master command COM_BINLOG_DUMP failed: failed dump request, reconnecting to try again, log 'master-bin.000002' at postion 109, Error_code: 1597 ^ Found warnings in /export/home2/pb2/test/sb_2-2544708-1290036552.5/mysql-5.6.1-m5-linux-x86_64-test/mysql-test/var-n_mix/log/mysqld.2.err ok - saving '/export/home2/pb2/test/sb_2-2544708-1290036552.5/mysql-5.6.1-m5-linux-x86_64-test/mysql-test/var-n_mix/log/rpl.rpl_get_master_version_and_clock/' to '/export/home2/pb2/test/sb_2-2544708-1290036552.5/mysql-5.6.1-m5-linux-x86_64-test/mysql-test/var-n_mix/log/rpl.rpl_get_master_version_and_clock/' Retrying test rpl.rpl_get_master_version_and_clock, attempt(2/3)... rpl.rpl_get_master_version_and_clock [ retry-pass ] 23680 Retrying test rpl.rpl_get_master_version_and_clock, attempt(3/3)... rpl.rpl_get_master_version_and_clock [ retry-pass ] 24193 ------------------------------------------------------------ Log: http://pb2.norway.sun.com/?action=archive_download&archive_id=2546897&pretty=please Tree: http://pb2.norway.sun.com/?template=show_pushes&branch=daily-trunk-bugfixing
[18 Nov 2010 9:03]
Alexander Nozdrin
The test case has been made experimental due to this bug.
[23 Nov 2010 20:17]
Alfranio Tavares Correia Junior
Please, check if the problem is not in . extra/rpl_tests/rpl_get_master_version_and_clock.test and could be easily fixed by the following patch: --- mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test 2010-10-21 13:24:31 +0000 +++ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test 2010-11-23 20:01:42 +0000 @@ -60,6 +60,7 @@ # Send shutdown to the connected server and give # it 10 seconds to die before zapping it shutdown_server 10; +--source include/wait_until_disconnected.inc connection slave; --echo slave is unblocked Any test case that crashes or shuts down a server must follow the following pattern: --exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect # CRASH OR SHUTDOWN --source include/wait_until_disconnected.inc --enable_reconnect --exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect --source include/wait_until_connected_again.inc
[24 Nov 2010 15:04]
Bjørn Munch
Since this test restarts the server, a timeout could potentially have been caused by some test failure combined with Bug #58424. Although this only triggers if the server that's expected to die dies within 100 ms *after* the test client has failed.
[9 Dec 2010 8:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126392 3427 Dao-Gang.Qu@sun.com 2010-12-09 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally inster of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on untill invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment. while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will retrurn normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and try to connect master while the master sever is really shutdown and the test will not have a chance to start master, because during the period the test will stall there to wait for a network error until it exits with a timeout error. To fix these problems to source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[10 Dec 2010 6:08]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126496 3427 Dao-Gang.Qu@sun.com 2010-12-10 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally instead of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on until invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment, while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will return normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and exit with error when fail to reconnect master, which is really shutdown at that time. Then the test will not have a chance to start master, because during the period it will stall there to wait for a network error until it exits with a timeout error. To fix these problems, we source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[17 Dec 2010 9:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127151 3443 Dao-Gang.Qu@sun.com 2010-12-17 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally instead of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on until invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment, while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will return normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and exit with error when fail to reconnect master, which is really shutdown at that time. Then the test will not have a chance to start master, because during the period it will stall there to wait for a network error until it exits with a timeout error. To fix these problems, we source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[17 Dec 2010 10:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127152 3444 Dao-Gang.Qu@sun.com 2010-12-17 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally instead of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on until invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment, while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will return normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and exit with error when fail to reconnect master, which is really shutdown at that time. Then the test will not have a chance to start master, because during the period it will stall there to wait for a network error until it exits with a timeout error. To fix these problems, we source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[17 Dec 2010 12:57]
Bugs System
Pushed into mysql-trunk 5.6.1 (revid:georgi.kodinov@oracle.com-20101217125013-y8pb3az32rtbplc9) (version source revid:georgi.kodinov@oracle.com-20101217112525-t71wcarh16fii9bz) (merge vers: 5.6.1) (pib:24)
[18 Dec 2010 13:11]
Jon Stephens
Testing only; no changelog entry needed. Closed.
[27 Dec 2010 9:36]
Anitha Gopi
Re opening since this test is still in the experimental group. See http://trollheim.norway.sun.com/archive/2712721.disabled_tests.html. If the test is passing now please remove it from collections/default.experimental
[27 Dec 2010 10:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127594 3472 Dao-Gang.Qu@sun.com 2010-12-27 Bug #57918 rpl_get_master_version_and_clock times out sporadically Post fix Get rid of 'rpl_get_master_version_and_clock' entry from collections/default.experimental. @ mysql-test/collections/default.experimental Get rid of 'rpl_get_master_version_and_clock' entry.
[29 Dec 2010 12:52]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:alexander.nozdrin@oracle.com-20101229113850-5rhksndrwz5dwcd7) (version source revid:alexander.nozdrin@oracle.com-20101229113149-c4hnmghox8obr75c) (merge vers: 5.6.2) (pib:24)
[6 Jan 2011 4:42]
Jon Stephens
Testing changes only; no changelog entry needed. Closed.