Bug #58718 | Second rpl test sporadically fails with error 1220 | ||
---|---|---|---|
Submitted: | 3 Dec 2010 18:53 | Modified: | 8 Feb 2011 10:19 |
Reporter: | Nirbhay Choubey | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Tools: MTR / mysql-test-run | Severity: | S3 (Non-critical) |
Version: | mysql-5.5, 5.6.1 | OS: | Any |
Assigned to: | Bjørn Munch | CPU Architecture: | Any |
Tags: | regression |
[3 Dec 2010 18:53]
Nirbhay Choubey
[3 Dec 2010 18:56]
Nirbhay Choubey
Testcase for this bug.
Attachment: bug.tar.gz (application/x-gzip, text), 509 bytes.
[6 Dec 2010 15:42]
Bjørn Munch
Some rpl tests have to restart the server before and/or after they run. The reason this doesn't fail when the "internal check" for the first test fails is that a failed check results in a server restart. If I try the supplied example, I have to use --nocheck-testcase, and then the second test just hangs, so I can't say what the exact cause is in this case.

Bug #49978 is adding some cleanup to rpl tests. But if a test still needs the server restarted after it has run, it can add this at any point in the test:

call mtr.force_restart();

If a test has to start on a fresh server for some reason, add this to the <test>-master.opt file:

--force-restart
[7 Dec 2010 9:23]
Bjørn Munch
When this happens, it's usually the first test that leaves the DB in a state which affects the next test. There is nothing mtr/mysqltest can do about that. In this example, experiments show that it's the CHANGE MASTER in the first test that does it; if I comment that out, the second test also works. In general, each test should reset the state if possible. Bug #49978 is fixing some of that. If that's not possible, a restart may be forced by "call mtr.force_restart();" as mentioned previously. Changing category to Tests/Replication.
[15 Dec 2010 13:04]
Sveta Smirnova
Thank you for the report. Verified as described. Not repeatable with 5.1
[3 Jan 2011 16:53]
Luis Soares
I don't think there is a bug here. In the test presented by Nirbhay there is a change in replication setup data, in particular the following:

1. a different rpl user ('rpl'), used to connect to the master, is created
2. replication slave threads are stopped
3. IO thread connection details are changed so that it now uses the 'rpl' user: CHANGE MASTER
4. replication is started again
5. the 'rpl' user is dropped on the master
6. replication slave is stopped
7. test file ends

This means that the replication test did not reset the connection data, and when MTR starts the second test, it will fail to start the IO thread. Had the test writer reset the slave data used to connect to the master in rpl_1, there would be no issue at all when MTR sets rpl_2 to execute.

In fact, this probably has nothing to do with MTR running a second test case, rpl_2, after rpl_1. For instance, things could go awfully wrong if we included a subtest in rpl_1 after the existing test instructions without resetting the replication connection data, for example by just adding the following two lines after the last 'stop slave;':

start slave;
-- source include/wait_for_slave_io_to_start.inc

I think that, as Bjorn states, for such functional/structural rpl changes, either the test writer deals with the need to reset the slave's state or forces a server restart (in such a way that it implicitly resets the slave's data).

Nirbhay, were you thinking of something more specific that I failed to spot?
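The seven steps in the comment above can be replayed in a toy model to show why the second test fails. This is only a sketch in Python, not mysqltest code. The class names, the 'root' default user, and the rule that the IO thread starts only when the stored user still exists on the master are simplifying assumptions, not a description of server internals.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    # Accounts that may connect for replication; 'root' is an assumed default.
    users: set = field(default_factory=lambda: {"root"})


@dataclass
class Slave:
    master_user: str = "root"  # user stored by CHANGE MASTER (assumed default)
    io_running: bool = False

    def change_master(self, user: str) -> None:
        self.master_user = user

    def start(self, master: Master) -> bool:
        # Toy rule: the IO thread comes up only if the stored user still exists.
        self.io_running = self.master_user in master.users
        return self.io_running

    def stop(self) -> None:
        self.io_running = False


def run_rpl_1(master: Master, slave: Slave, reset_at_end: bool) -> None:
    master.users.add("rpl")        # 1. create the 'rpl' user
    slave.stop()                   # 2. stop the slave threads
    slave.change_master("rpl")     # 3. CHANGE MASTER to use 'rpl'
    slave.start(master)            # 4. start replication again
    master.users.discard("rpl")    # 5. drop the 'rpl' user on the master
    slave.stop()                   # 6. stop the slave
    if reset_at_end:               # cleanup that the reported test lacks
        slave.change_master("root")
    # 7. test file ends


def run_rpl_2(master: Master, slave: Slave) -> bool:
    # In this model the next test only needs the IO thread to start.
    return slave.start(master)


def second_test_passes(reset_at_end: bool) -> bool:
    master, slave = Master(), Slave()
    slave.start(master)
    run_rpl_1(master, slave, reset_at_end)
    return run_rpl_2(master, slave)
```

In this model `second_test_passes(False)` reports a failure and `second_test_passes(True)` does not, which mirrors the point that the state left behind by rpl_1, not MTR, decides the outcome of rpl_2.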
[4 Jan 2011 15:17]
Nirbhay Choubey
8<8<8<8<8<
<nirbhay> And now I see a 'Timeout in include/wait_for_slave_param.inc' failure in the 2nd test.
<luis> yes
<nirbhay> Possibly due to after-effects of the fix for bug#49978.
<luis> possibly
<luis> but in the 1st rpl test, the replication topology is effectively broken and there is no way that MTR will notice that
<luis> restoring the topology is the responsibility of the test writer, so that further tests are not affected
<nirbhay> I see.
<luis> the problem here is that the replication setup was broken by the tester and there is no way to recover unless the test writer resets it or MTR is instructed to restart servers from scratch (with defaults)
<nirbhay> I was wondering whether there is a way for MTR to sense such a broken topology and automatically force_restart before the next test.
<luis> right, but then you would have to ask bjorn to implement something like checktestcase for MTR, so that it would check the SHOW SLAVE STATUS output and restart the servers if it was not as expected
<luis> not sure how feasible it is though..
8<8<8<8<8<

Bjorn, is there a way for MTR to sense a broken topology and force a restart for the subsequent test, if it executes a test written in a way that breaks the replication topology (as rpl_1.test does)? (Instead of making the test writer conform to some correct rpl test format.)
[4 Jan 2011 15:24]
Bjørn Munch
To the last comment: no, that would require MTR itself to have much more detailed knowledge than I think it ought to have, and it would have to do that for every test. What might be possible is to write some test code for checking the topology and then do "call mtr.force_restart()" only if necessary.
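As a rough illustration of the "check the topology, restart only if necessary" idea, the decision could look like the sketch below. It is written in Python only to show the logic; in a real test the check would live in the test file or a shared include file and end with "call mtr.force_restart();". The function names and the expected values are hypothetical, and the sketch assumes a row from SHOW SLAVE STATUS has already been fetched into a dictionary.

```python
# Expected values are assumptions for a default master-slave setup.
EXPECTED_STATUS = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Master_User": "root",
}


def topology_problems(status: dict, expected: dict = EXPECTED_STATUS) -> list:
    """Return one message per column that differs from its expected value."""
    return [
        f"{column}: expected {want!r}, got {status.get(column)!r}"
        for column, want in expected.items()
        if status.get(column) != want
    ]


def needs_restart(status: dict) -> bool:
    """True when the test should request a restart before the next test."""
    return bool(topology_problems(status))
```

A status row matching the expected values yields no problems, so no restart is requested. A row left behind by a test like rpl_1 (for example with Master_User still set to 'rpl') would trigger one.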
[5 Jan 2011 9:23]
Bjørn Munch
It shouldn't be MTR's responsibility to check the database for any possible inconsistency or change after a test run. If this is something that's not covered by the general "check-testcase" then it would need to be coded in the test itself (or in some common include file used by several tests). Any test that is thought likely to cause trouble for the following tests even when successful, can avoid the trouble by forcing a restart, as explained in a previous comment.
[8 Feb 2011 10:19]
Bjørn Munch
Closing to get it off my list