Bug #82050 Setting LCP status to IDLE when LCP is ongoing crashes in System restart
Submitted: 29 Jun 2016 13:48 Modified: 8 Jul 2016 11:30
Reporter: Mikael Ronström
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 7.5.3 OS: Any
Assigned to: CPU Architecture: Any

[29 Jun 2016 13:48] Mikael Ronström
Description:
While performing a System Restart we normally restore all nodes from the REDO log and LCPs.
However, in some situations a node might need a copy phase before it is done with the system
restart. In this case the node waits for the other nodes to complete their startup before
performing the copy phase.

This means that even in a System Restart an LCP could start up before a node reaches phase 4
of the node startup in DIH. Thus it isn't correct to initialise the LCP status to IDLE in all
cases for a System Restart.

How to repeat:
Run the autotest suite. It occasionally hits this problem.

Suggested fix:
Avoid initialising LCP status when performing this special variant of a System restart.
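
The suggested fix can be pictured with a minimal sketch. This is a simplified model, not the actual DBDIH code: every name in it (LcpStatus, RestartType, NodeStartState, needsCopyPhase, initLcpStatusAtPhase4) is hypothetical and chosen only for illustration. It assumes the only decision that matters is whether the restarting node still needs a copy phase, in which case an LCP may already be running and the status must be left alone.

```cpp
// Hypothetical, simplified model of the fix. None of these names are
// taken from the NDB source code.
enum class LcpStatus { Idle, Active };
enum class RestartType { SystemRestart, NodeRestart };

struct NodeStartState {
    LcpStatus lcpStatus = LcpStatus::Idle;
    // True when the node must copy data from other nodes before it is
    // finished with the system restart (the special variant in this report).
    bool needsCopyPhase = false;
};

// Called when the node reaches start phase 4.
// Returns true if the LCP status was reset to Idle.
bool initLcpStatusAtPhase4(NodeStartState& state, RestartType type) {
    if (type != RestartType::SystemRestart) {
        return false;  // this sketch only models the system restart path
    }
    if (state.needsCopyPhase) {
        // The node waited for the other nodes to start, so an LCP may
        // already be ongoing. Resetting to Idle here is the reported bug.
        return false;
    }
    state.lcpStatus = LcpStatus::Idle;
    return true;
}
```

In this model, a node that has needsCopyPhase set and an Active LCP keeps its Active status, while a node restored purely from the REDO log and LCPs is initialised to Idle as before.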
[8 Jul 2016 11:30] Jon Stephens
Documented fix in the NDB 7.5.4 changelog as follows:

    Usually, when performing a system restart, all nodes are
    restored from redo logs and local checkpoints (LCPs), but in
    some cases some node might require a copy phase before it is
    finished with the system restart. When this happens, the node in
    question waits for all other nodes to start up completely before
    performing the copy phase. Notwithstanding the fact that it is
    thus possible to begin a local checkpoint before reaching start
    phase 4 in DBDIH, LCP status was initialized to IDLE in all
    cases for a system restart. Now, when performing this variant of
    a system restart, the LCP status is no longer initialized.

Closed.