Bug #74154 Node restart blocked by DICT LOCK held by schema transaction
Submitted: 30 Sep 2014 13:58 Modified: 4 Nov 2014 18:05
Reporter: Ole John Aske Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:7.3.7 OS:Any
Assigned to: CPU Architecture:Any

[30 Sep 2014 13:58] Ole John Aske
Description:

Found by analyzing the logs from the AutoTest: 'testDict -n SchemaTrans'

That test gets stuck for a while in test case 'y1' while it is
waiting for a node to restart. The test client finally times out
- *then* the restarting node is allowed to proceed.

However, this is not acceptable behavior, as a node restart should
not be blocked by a client.

From the logs:

....
CASE v2 st_test_snf_parse+1 - slave node fail in parse phase, abort
CASE w1 st_test_sr_parse+0 - system restart in parse phase, commit
CASE w2 st_test_sr_parse+1 - system restart in parse phase, abort
CASE x1 st_test_mnf_parse+0 - master node fail in parse phase, commit
CASE x2 st_test_mnf_parse+1 - master node fail in parse phase, abort
CASE x3 st_test_mnf_prepare+0 - master node fail in prepare phase
CASE x4 st_test_mnf_commit1+0 - master node fail in start of commit phase
CASE x5 st_test_mnf_commit2+0 - master node fail in end of commit phase
CASE y1 st_test_mnf_start_partial+0 - master node fail in start phase, partial rollback
waitNodesState(STARTED, -1) resetting number of attempts 0
waitNodesState(STARTED, -1) resetting number of attempts 1
waitNodesState(STARTED, -1) resetting number of attempts 2
....
waitNodesState(STARTED, -1) timeout after 121 attemps
FAIL 6866 c.restarter->waitClusterStarted() == 0
FAIL 7345 (*test.func)(c, test.arg) == NDBT_OK
FAIL 7416 st_test(c, test) == NDBT_OK
nodes up:3,4,5 down: unknown:2
  |- runSchemaTrans FAILED [2014-09-20 21:15:58]
- SchemaTrans FAILED [2014-09-20 21:15:58]

---------And the mgmd log: ---------------

2014-09-30 15:28:08 [MgmtSrvr] INFO     -- Node 3: GCP Take over completed   <<--- 2) Node 3 becomes new master
...
2014-09-30 15:28:08 [MgmtSrvr] INFO     -- Node 3: DICT : SCHEMA_TRANS_BEGIN_REQ trylock 0 Success  <<-- 3) Test client grabs DICT lock
2014-09-30 15:28:08 [MgmtSrvr] ALERT    -- Node 5: Node 2 Disconnected
2014-09-30 15:28:08 [MgmtSrvr] INFO     -- Nodeid 2 allocated for NDB at 127.0.0.1
2014-09-30 15:28:08 [MgmtSrvr] INFO     -- Node 1: Node 2 Connected
...
2014-09-30 15:28:10 [MgmtSrvr] INFO     -- Node 2: Start phase 0 completed 
...
2014-09-30 15:28:10 [MgmtSrvr] INFO     -- Node 2: Start phase 1 completed 
...
2014-09-30 15:28:11 [MgmtSrvr] INFO     -- Node 2: Start phase 2 completed (node restart)
2014-09-30 15:28:11 [MgmtSrvr] INFO     -- Node 3: DICT : DICT_LOCK_REQ lock 5 Queued      <<-- 4) Node 2 needs the DICT lock; its request is queued
2014-09-30 15:28:11 [MgmtSrvr] INFO     -- Node 3: DICT: lock request by node queued 2 for NodeRestart
2014-09-30 15:28:11 [MgmtSrvr] INFO     -- Node 3: - sender: 0xfa0003 data: 1 X granted extra: 3
2014-09-30 15:28:11 [MgmtSrvr] INFO     -- Node 3: - sender: 0xf60002 data: 0 S  extra: 1
2014-09-30 15:28:11 [MgmtSrvr] INFO     -- Node 2: Receive arbitrator node 1 [ticket=69630001d116ca44]
...
< Nothing happens for a long time, then the testDict client times out >
...
2014-09-30 15:33:44 [MgmtSrvr] ALERT    -- Node 4: Node 6 Disconnected
2014-09-30 15:33:44 [MgmtSrvr] INFO     -- Node 4: Communication to Node 6 closed
2014-09-30 15:33:44 [MgmtSrvr] INFO     -- Node 2: Communication to Node 6 closed  <<-- 5) DICT lock is released when the client terminates
2014-09-30 15:33:44 [MgmtSrvr] INFO     -- Node 2: Start phase 3 completed (node restart)
2014-09-30 15:33:44 [MgmtSrvr] INFO     -- Node 2: Start phase 4 completed (node restart)
...
< Node 2 then starts normally ...>

So what seems to happen is (annotated in the logs above):

1) testDict restarts the master (node 2) while test case y1 tries to start a schema transaction.
2) Node 3 is elected as the new master (node 2 is still down).

3) The test client retries 'start schema transaction' against the new master (node 3),
   this time successfully, and thus acquires the DICT lock.
4) Node 2 restarts from phase 0 and reaches phase 3, where it needs the DICT lock.
   As the lock is still held by the test client, the lock request is queued.

... We are then stalled until:

5) The test client is terminated, and the resources held by it are cleaned up.
   As the DICT lock is released, the node 2 restart is allowed to continue.

However, the test has already failed, as it did not observe the node restarting in time.
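The queueing behavior in steps 3-5 can be sketched as a single-holder lock with a FIFO wait queue. This is only an illustrative toy model (the names `DictLock`, `client-schema-trans`, and `node2-restart` are invented here), not NDB's actual DICT lock implementation:

```python
from collections import deque

class DictLock:
    """Toy model of the DICT lock: one holder at a time, FIFO wait queue.
    Illustrative only -- not NDB's actual lock-queue implementation."""
    def __init__(self):
        self.holder = None
        self.queue = deque()
        self.granted = []          # order in which requests were granted

    def request(self, who):
        if self.holder is None:    # lock free: grant immediately
            self.holder = who
            self.granted.append(who)
            return "Granted"
        self.queue.append(who)     # lock held: queue the request
        return "Queued"

    def release(self, who):
        assert self.holder == who
        self.holder = None
        if self.queue:             # grant to the longest waiter
            nxt = self.queue.popleft()
            self.holder = nxt
            self.granted.append(nxt)

# Scenario from the logs:
lock = DictLock()
assert lock.request("client-schema-trans") == "Granted"  # step 3
assert lock.request("node2-restart") == "Queued"         # step 4
lock.release("client-schema-trans")                      # step 5
assert lock.holder == "node2-restart"                    # restart continues
```

The point of the model is that the restarting node makes no progress for as long as the client holds the lock; nothing but the release in step 5 can unblock it.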

How to repeat:
Run 'testDict -n SchemaTrans'
[30 Sep 2014 14:36] Ole John Aske
Posted by developer:
 
This DICT lock holdup is caused by a race condition in the ::dictSignal() retry logic,
and in how early failures of TRANS_BEGIN_REQ are handled:

Completion of TRANS_BEGIN_REQ is normally signaled by a TRANS_BEGIN_REF
or a TRANS_BEGIN_CONF. However, if the master node of a schema
transaction fails (NODE_FAILURE), API clients are informed of this
by the new master sending a TRANS_END_REP. As the new master can't
possibly know whether the failed master ever sent a CONF, a
TRANS_END_REP signal may arrive without the API ever having seen the
CONF of the starting transaction.

This causes a race in the ::dictSignal() retry logic, which repeats
a TRANS_BEGIN_REQ when certain ignorable errors are seen, such as
'not a master', 'busy', and 'node failure', until the operation is
either REFed or CONFed, or the transaction is reported failed by
TRANS_END_REP.

However, as TRANS_END_REP is an asynchronous signal sent by the new
master, it may arrive after a new TRANS_BEGIN_REQ has been sent.
The END_REP is then interpreted as a failure of that REQ, while the
REQ may actually succeed and return a CONF later.

As the DICT API has already completed polling and incorrectly
reported a failure, the later-arriving CONF is never seen.
We then hold a DICT lock which we are not aware of, and
this will not be released until the client terminates.
[4 Nov 2014 18:05] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

Fixed in MySQL Cluster NDB 7.1.34, 7.2.19, 7.3.8. Documented fix in these changelogs as follows:

        When a client retried a schema transaction against a new master
        after the transaction had failed against the previous master
        while that master was restarting, the lock obtained by this
        transaction on the new master prevented the previous master
        from progressing past start phase 3 until the client was
        terminated and the resources held by it were cleaned up.

Closed.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html