Bug #28144 | "Slave_IO_Running" differs in replication tests | ||
---|---|---|---|
Submitted: | 27 Apr 2007 19:00 | Modified: | 27 Jun 2007 20:04 |
Reporter: | Joerg Bruehe | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.0.40 | OS: | Other (Sol10x86 + Linux) |
Assigned to: | Ramil Kalimullin | CPU Architecture: | Any |
[27 Apr 2007 19:00]
Joerg Bruehe
[11 May 2007 16:42]
Jonathan Miller
Are there any errors in the slave error logs? I can not see why it would be failing in rpl_ssl, but rpl_log_pos is poorly done and probably should be rewritten. Other issues is that I do not have a Solaris to test with. /Jeb
[12 May 2007 15:15]
Joerg Bruehe
When we do a release build, we run a series (once for each configuration) of configure ; make ; test actions, and for each such action we copy back the generated package and the log. The tests are run using the "--force" option, so they continue after a failure. When we analyze the log and extract the failure information (if any), the next configuration is already being built and tested - maybe even more. So any trace files have likely been overwritten by the continuing test, they are already gone when we analyze, and even the archive of this test build has been deleted in the three weeks that passed since the run. As regards the machine: It will be possible to repeat the test on that build host, the tools will support parallel runs. From my build experience, "Slave IO running" is a field that every now and then produces a different result - it is mentioned in several different bug reports. So IMO either we find a better way to synchronize master and slave for such a report, to guarantee stable results, or we use the "--replace-result" mechanism to ignore this field (provided its value is not critical ti judge the test success).
[14 May 2007 19:46]
BJ Dierkes
I have the same issue on Redhat Enterprise Linux ES 3 - x86_64 rpl_ssl [ fail ] Errors are (from /var/rpebuildfarm/build/mysql-5.0.41-1.rs/BUILD/mysql-5.0.41/mysql-test/var/log/mysqltest-time) : mysqltest: Result length mismatch (the last lines may be the most important ones) Below are the diffs between actual and expected results: ------------------------------------------------------- *** r/rpl_ssl.result Wed May 2 18:10:47 2007 --- r/rpl_ssl.reject Mon May 14 22:29:40 2007 *************** *** 68,74 **** Relay_Log_File # Relay_Log_Pos # Relay_Master_Log_File master-bin.000001 ! Slave_IO_Running Yes Slave_SQL_Running Yes Replicate_Do_DB Replicate_Ignore_DB --- 68,74 ---- Relay_Log_File # Relay_Log_Pos # Relay_Master_Log_File master-bin.000001 ! Slave_IO_Running No Slave_SQL_Running Yes Replicate_Do_DB Replicate_Ignore_DB ------------------------------------------------------- Please follow the instructions outlined at http://www.mysql.com/doc/en/Reporting_mysqltest_bugs.html to find the reason to this problem and how to report this. Result from queries before failure can be found in r/rpl_ssl.log Aborting: rpl_ssl failed in ps-protocol mode. To continue, re-run with '--force'. Stopping All Servers make: *** [test-ps] Error 1
[30 May 2007 14:40]
Lars Thalmann
See BUG#28772.
[8 Jun 2007 12:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28395 ChangeSet@1.2529, 2007-06-08 17:51:29+05:00, ramil@mysql.com +5 -0 Fix for bug #28144: "Slave_IO_Running" differs in replication tests Problem: Slave_IO_Running values depend on circumstances while they are not important, actually. Fix: replace them with #
[8 Jun 2007 13:58]
Jonathan Miller
Hi, Please note I disagree with proposed patch. We need to understand why it differs, not mask it! Once we understand why it differs then we could patch it correctly. Just my $.02 /Jeb
[8 Jun 2007 14:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28403 ChangeSet@1.2529, 2007-06-08 19:01:39+05:00, ramil@mysql.com +5 -0 Fix for bug #28144: "Slave_IO_Running" differs in replication tests Problem: Slave_IO_Running values depend on circumstances while they are not important, actually. Fix: replace them with #, remove SHOW SLAVE STATUS if we don't really need it.
[13 Jun 2007 8:40]
Ramil Kalimullin
Jeb, the reason why it differs is clear: we don't wait for an explicit (desirable?) state of the slave, just use some incoherent combinations of delays etc. instead, so we get some uncertain slave's parameters (e.g Slave_IO_Running) sometimes as them may depend on external factors such as machine/network load, etc. Please also see emails from Magnus and Mats.
[13 Jun 2007 19:17]
Jonathan Miller
Relay_Log_Pos can be nondeterministic (I do agree that not all items always come out the same), but Slave_IO_State is not nondeterministic, it is very deterministic, and if it has stopped when we don't expect it then there is an issue. And if it does not stop when expected there is an issue. The slave IO Thread is deterministic just as is sql thread. the should run and stop on all systems just them same. If not, please show me in the docs where we say that they act differently on different platforms. My suggestion would be to figure out what the state of it should be during that test. If different on some platforms then it is not expected and should be investigated as why it behaves the way it does. I disagree with masking a deterministic field because the value is not what was expect and is not exactly what we are testing for. In addition, we do a lot of things in testing maybe one thing and we should expect that all these things we use to act the same when they are deterministic, other wise it becomes a nightmare for QA and for customer that are trying to use the product. Best wishes,
[15 Jun 2007 5:47]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28834 ChangeSet@1.2529, 2007-06-15 10:47:10+05:00, ramil@mysql.com +8 -0 Fix for bug #28144: "Slave_IO_Running" differs in replication tests Problem: show slave status may return different Slave_IO_Running values running some tests. Fix: wait for a certain slave state if needed to get tests more predictable.
[15 Jun 2007 11:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28858 ChangeSet@1.2536, 2007-06-15 16:09:28+05:00, ramil@mysql.com +5 -0 Fix for bug #28144: "Slave_IO_Running" differs in replication tests Problem: show slave status may return different Slave_IO_Running values running some tests. Fix: wait for a certain slave state if needed to get tests more predictable.
[22 Jun 2007 18:07]
Bugs System
Pushed into 5.1.20-beta
[22 Jun 2007 18:09]
Bugs System
Pushed into 5.0.46
[27 Jun 2007 20:04]
Paul DuBois
Test case changes. No changelog entry needed.