| Bug #57918 | rpl_get_master_version_and_clock times out sporadically | ||
|---|---|---|---|
| Submitted: | 2 Nov 2010 8:54 | Modified: | 6 Jan 2011 4:42 |
| Reporter: | Alexander Nozdrin | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | Tests: Replication | Severity: | S3 (Non-critical) |
| Version: | 5.6 (Dahlia, Erica) | OS: | Any |
| Assigned to: | Daogang Qu | CPU Architecture: | Any |
| Tags: | experimental | ||
[18 Nov 2010 8:12]
Alexander Nozdrin
Another symptom:
------------------------------------------------------------
rpl.rpl_get_master_version_and_clock [ fail ] Found warnings/errors in server log file!
Test ended at 2010-11-18 00:49:46
line
101118 2:49:33 [ERROR] Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs
101118 2:49:33 [ERROR] Failed on request_dump()
101118 2:49:33 [Warning] Slave I/O: Master command COM_BINLOG_DUMP failed: failed dump request, reconnecting to try again, log 'master-bin.000002' at postion 109, Error_code: 1597
^ Found warnings in /export/home2/pb2/test/sb_2-2544708-1290036552.5/mysql-5.6.1-m5-linux-x86_64-test/mysql-test/var-n_mix/log/mysqld.2.err
ok
- saving '/export/home2/pb2/test/sb_2-2544708-1290036552.5/mysql-5.6.1-m5-linux-x86_64-test/mysql-test/var-n_mix/log/rpl.rpl_get_master_version_and_clock/' to '/export/home2/pb2/test/sb_2-2544708-1290036552.5/mysql-5.6.1-m5-linux-x86_64-test/mysql-test/var-n_mix/log/rpl.rpl_get_master_version_and_clock/'
Retrying test rpl.rpl_get_master_version_and_clock, attempt(2/3)...
rpl.rpl_get_master_version_and_clock [ retry-pass ] 23680
Retrying test rpl.rpl_get_master_version_and_clock, attempt(3/3)...
rpl.rpl_get_master_version_and_clock [ retry-pass ] 24193
------------------------------------------------------------
Log: http://pb2.norway.sun.com/?action=archive_download&archive_id=2546897&pretty=please
Tree: http://pb2.norway.sun.com/?template=show_pushes&branch=daily-trunk-bugfixing
[18 Nov 2010 9:03]
Alexander Nozdrin
The test case has been made experimental due to this bug.
[23 Nov 2010 20:17]
Alfranio Tavares Correia Junior
Please, check if the problem is not in . extra/rpl_tests/rpl_get_master_version_and_clock.test and could be easily fixed by the following patch: --- mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test 2010-10-21 13:24:31 +0000 +++ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test 2010-11-23 20:01:42 +0000 @@ -60,6 +60,7 @@ # Send shutdown to the connected server and give # it 10 seconds to die before zapping it shutdown_server 10; +--source include/wait_until_disconnected.inc connection slave; --echo slave is unblocked Any test case that crashes or shuts down a server must follow the following pattern: --exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect # CRASH OR SHUTDOWN --source include/wait_until_disconnected.inc --enable_reconnect --exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect --source include/wait_until_connected_again.inc
[24 Nov 2010 15:04]
Bjørn Munch
Since this test restarts the server, a timeout could potentially have been caused by some test failure combined with Bug #58424. Although this only triggers if the server that's expected to die dies within 100 ms *after* the test client has failed.
[9 Dec 2010 8:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126392 3427 Dao-Gang.Qu@sun.com 2010-12-09 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally inster of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on untill invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment. while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will retrurn normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and try to connect master while the master sever is really shutdown and the test will not have a chance to start master, because during the period the test will stall there to wait for a network error until it exits with a timeout error. To fix these problems to source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[10 Dec 2010 6:08]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126496 3427 Dao-Gang.Qu@sun.com 2010-12-10 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally instead of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on until invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment, while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will return normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and exit with error when fail to reconnect master, which is really shutdown at that time. Then the test will not have a chance to start master, because during the period it will stall there to wait for a network error until it exits with a timeout error. To fix these problems, we source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[17 Dec 2010 9:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127151 3443 Dao-Gang.Qu@sun.com 2010-12-17 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally instead of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on until invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment, while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will return normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and exit with error when fail to reconnect master, which is really shutdown at that time. Then the test will not have a chance to start master, because during the period it will stall there to wait for a network error until it exits with a timeout error. To fix these problems, we source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[17 Dec 2010 10:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127152 3444 Dao-Gang.Qu@sun.com 2010-12-17 Bug #57918 rpl_get_master_version_and_clock times out sporadically The reason is that sometimes the slave is unblocked before the master server is really shutdown. Symptom 1, found warnings/errors in server log file. In the situation, the 'get_master_version_and_clock' func will return normally instead of network error expected. Then the 'handle_slave_io' func will go to invoke 'get_master_uuid' func, which will return normally if the master server is still alive. Then the process will go on until invoke 'request_dump' func, which will print the "Error on COM_BINLOG_DUMP: 2003 Can't connect to MySQL server on '127.0.0.1' (111), will retry in 1 secs" error as symptom reported if the server is really shutdown in the moment, while the test is waiting for the network error. Symptom 2, timeout after 900 seconds In the situation, the 'get_master_version_and_clock', get_master_uuid, and 'request_dump' funcs will return normally if the master server is alive during the period, then the 'handle_slave_io' func will enter 'while (!io_slave_killed(thd,mi))' block and exit with error when fail to reconnect master, which is really shutdown at that time. Then the test will not have a chance to start master, because during the period it will stall there to wait for a network error until it exits with a timeout error. To fix these problems, we source wait_until_disconnected.inc after shutdown master server for guaranteeing the master server is really shutdown before the slave is unblocked. @ mysql-test/extra/rpl_tests/rpl_get_master_version_and_clock.test Update test to guarantee the master server is really shutdown before the slave is unblocked.
[17 Dec 2010 12:57]
Bugs System
Pushed into mysql-trunk 5.6.1 (revid:georgi.kodinov@oracle.com-20101217125013-y8pb3az32rtbplc9) (version source revid:georgi.kodinov@oracle.com-20101217112525-t71wcarh16fii9bz) (merge vers: 5.6.1) (pib:24)
[18 Dec 2010 13:11]
Jon Stephens
Testing only; no changelog entry needed. Closed.
[27 Dec 2010 9:36]
Anitha Gopi
Re opening since this test is still in the experimental group. See http://trollheim.norway.sun.com/archive/2712721.disabled_tests.html. If the test is passing now please remove it from collections/default.experimental
[27 Dec 2010 10:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127594 3472 Dao-Gang.Qu@sun.com 2010-12-27 Bug #57918 rpl_get_master_version_and_clock times out sporadically Post fix Get rid of 'rpl_get_master_version_and_clock' entry from collections/default.experimental. @ mysql-test/collections/default.experimental Get rid of 'rpl_get_master_version_and_clock' entry.
[29 Dec 2010 12:52]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:alexander.nozdrin@oracle.com-20101229113850-5rhksndrwz5dwcd7) (version source revid:alexander.nozdrin@oracle.com-20101229113149-c4hnmghox8obr75c) (merge vers: 5.6.2) (pib:24)
[6 Jan 2011 4:42]
Jon Stephens
Testing changes only; no changelog entry needed. Closed.

Description: rpl.rpl_get_master_version_and_clock [ fail ] timeout after 900 seconds Test ended at 2010-10-30 22:43:38 Test case timeout after 900 seconds == /export/home/pb2/test/sb_1-2471220-1288466864.57/mysql-5.6.99-m5-linux-x86_64-test/mysql-test/var-n_mix/log/rpl_get_master_version_and_clock.log == **** SHOW BINLOG EVENTS on slave **** SHOW BINLOG EVENTS IN 'slave-bin.000001'; Log_name Pos Event_type Server_id End_log_pos Info slave-bin.000001 4 Format_desc 2 109 Server ver: 5.6.99-m5-debug-log, Binlog ver: 4 slave-bin.000001 109 Query 2 176 BEGIN slave-bin.000001 176 Query 2 402 use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'Slave I/O: Master command COM_REGISTER_SLAVE failed: .*' COLLATE 'latin1_swedish_ci')) slave-bin.000001 402 Query 2 470 COMMIT slave-bin.000001 470 Query 2 537 BEGIN slave-bin.000001 537 Query 2 816 use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'Slave I/O: .* failed with error: Lost connection to MySQL server at \'reading initial communication packet\'' COLLATE 'latin1_swedish_ci')) slave-bin.000001 816 Query 2 884 COMMIT slave-bin.000001 884 Query 2 951 BEGIN slave-bin.000001 951 Query 2 1218 use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; .*' COLLATE 'latin1_swedish_ci')) slave-bin.000001 1218 Query 2 1286 COMMIT slave-bin.000001 1286 Query 2 1353 BEGIN slave-bin.000001 1353 Query 2 1562 use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'Slave I/O thread .* register on master' COLLATE 'latin1_swedish_ci')) slave-bin.000001 1562 Query 2 1630 COMMIT How to repeat: Log: http://pb2.norway.sun.com/?action=archive_download&archive_id=2472613&pretty=please XRef: http://pb2.norway.sun.com/?template=mysql_show_test_failure&test_failure_id=&test_output_i...