MySQL Bugs: #78922: Autotest testNodeRestart -n Bug18612SR sometimes did not clear error.

Bug #78922	Autotest testNodeRestart -n Bug18612SR sometimes did not clear error.
Submitted:	22 Oct 2015 11:59	Modified:	3 Mar 2016 15:23
Reporter:	Mauritz Sundell
Status:	Closed
Category:	Tests: Cluster	Severity:	S3 (Non-critical)
Version:	7.4.8	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Autotest testNodeRestart -n Bug18612SR sometimes did not clear error.

Occasionally only half of the cluster stopped during the test instead of
the whole cluster.

In these cases the errors injected into the surviving partition were
still set when the cluster was started again.

This could cause test failures, both in the test itself and in
subsequent tests run on the same cluster.

Test testIndex -n DeferredError has been seen failing due to this.

How to repeat:
Code inspection:
In function runBug18612SR one waits for all nodes to stop:
    g_err << "Waiting cluster/nodes no-start" << endl;
    if (restarter.waitClusterNoStart(30))
      if (restarter.waitNodesNoStart(partition0, cnt/2, 10))
        if (restarter.waitNodesNoStart(partition1, cnt/2, 10))
          return NDBT_FAILED;
But the above reports failure only if all three waits fail, so it can succeed when only one partition has stopped.
In that case the error inserts might not have been cleared for all nodes in the other partition.
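
The control flow of the nested checks above can be illustrated with a small standalone sketch. It does not use the real test framework: MockRestarter, nestedCheckPasses, StopOutcome and classifyStop are hypothetical names invented for illustration, and the timeouts are left out. The only thing taken from the snippet is the convention that a wait returns 0 on success and non-zero on failure.

```cpp
#include <vector>

// Hypothetical stand-in for the restarter object used by the test.
// Convention assumed from the snippet: 0 = success, non-zero = failure.
struct MockRestarter {
  std::vector<bool> stopped;  // stopped[i]: node i has reached no-start state

  int waitClusterNoStart() const {
    for (bool s : stopped) if (!s) return -1;
    return 0;
  }
  int waitNodesNoStart(const std::vector<int>& nodes) const {
    for (int n : nodes) if (!stopped[n]) return -1;
    return 0;
  }
};

// Mirrors the nested ifs: the check fails only when all three waits fail.
bool nestedCheckPasses(const MockRestarter& r,
                       const std::vector<int>& p0,
                       const std::vector<int>& p1) {
  if (r.waitClusterNoStart())
    if (r.waitNodesNoStart(p0))
      if (r.waitNodesNoStart(p1))
        return false;
  return true;
}

// Same waits, but the caller learns which case occurred and can react to it.
enum class StopOutcome { WholeCluster, OnlyPartition0, OnlyPartition1, NothingStopped };

StopOutcome classifyStop(const MockRestarter& r,
                         const std::vector<int>& p0,
                         const std::vector<int>& p1) {
  if (r.waitClusterNoStart() == 0) return StopOutcome::WholeCluster;
  if (r.waitNodesNoStart(p0) == 0) return StopOutcome::OnlyPartition0;
  if (r.waitNodesNoStart(p1) == 0) return StopOutcome::OnlyPartition1;
  return StopOutcome::NothingStopped;
}
```

With nodes 0 and 2 stopped and nodes 1 and 3 still running, nestedCheckPasses returns true even though half of the cluster survived, while classifyStop reports OnlyPartition0, which tells the caller that partition 1 still needs cleanup.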

Also one can see data nodes crashing on error 932 in subsequent tests that do not even use error 932, such as testIndex -n DeferredError in 7.4.

Suggested fix:
Make sure to clear the injected errors in cases where only half of the cluster stops.
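
One possible shape of such a cleanup step, as a sketch only. This is not the pushed fix: NodeState, Cluster and clearErrorsOnSurvivors are hypothetical names, and the cluster is modelled as a plain map rather than accessed through the test framework. The assumption is that an error insert value of 0 means no error is active.

```cpp
#include <map>

// Hypothetical per-node model: whether the node stopped, and which
// error insert (if any) is still active on it. 0 = no error insert.
struct NodeState {
  bool stopped = false;
  int errorInsert = 0;
};

using Cluster = std::map<int, NodeState>;  // node id -> state

// Clear error inserts on every node that did not stop.
// Returns the number of nodes on which an error insert was cleared.
int clearErrorsOnSurvivors(Cluster& cluster) {
  int cleared = 0;
  for (auto& entry : cluster) {
    NodeState& state = entry.second;
    if (!state.stopped && state.errorInsert != 0) {
      state.errorInsert = 0;
      ++cleared;
    }
  }
  return cleared;
}
```

Running this after the wait, before the cluster is started again, would leave no surviving node with a stale error such as 932 for later tests to trip over.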

Posted by developer:
 
Pushed to 7.2.23, 7.3.12, 7.4.9, 7.5.0.
Test case fix.
No documentation needed.

Testing only, no changelog entry needed. Fixed in NDB 7.2.23, 7.3.12, 7.4.9, 7.5.0. Closed.