Bug #90709 DBTC (Line: 10345) Check i != end failed.
Submitted: 1 May 2018 7:48 Modified: 19 Jun 2018 16:20
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:7.6.4 OS:Any
Assigned to: CPU Architecture:Any

[1 May 2018 7:48] Daniël van Eeden
Description:
When restarting node 48 node 25 failed.

2018-05-01 09:37:21 [ndbd] INFO     -- findNeighbours from: 5168 old (left: 17 right: 30) new (17 30)
2018-05-01 09:37:21 [ndbd] INFO     -- NR Status: node=48,OLD=Initial state,NEW=Node failed, fail handling ongoing
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Started failure handling for node 48
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Step NF_BLOCK_HANDLE completed, failure handling for node 48 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Inserting failed node 48 into takeover queue, length 1
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: GCP completion 481634/1 waiting for node failure handling (1) to complete. Seizing record for GCP.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Step NF_CHECK_TRANSACTION completed, failure handling for node 48 waiting for NF_TAKEOVER, NF_CHECK_SCAN.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Step NF_CHECK_SCAN completed, failure handling for node 48 waiting for NF_TAKEOVER.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Started failure handling for node 48
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Step NF_BLOCK_HANDLE completed, failure handling for node 48 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: GCP completion 481634/1 waiting for node failure handling (1) to complete. Seizing record for GCP.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Inserting failed node 48 into takeover queue, length 1
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Step NF_CHECK_TRANSACTION completed, failure handling for node 48 waiting for NF_TAKEOVER, NF_CHECK_SCAN.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Step NF_CHECK_SCAN completed, failure handling for node 48 waiting for NF_TAKEOVER.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Started failure handling for node 48
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Step NF_BLOCK_HANDLE completed, failure handling for node 48 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: GCP completion 481634/1 waiting for node failure handling (1) to complete. Seizing record for GCP.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Inserting failed node 48 into takeover queue, length 1
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Removed node 48 from takeover queue, 0 failed nodes remaining
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Step NF_TAKEOVER completed, failure handling for node 48 complete.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 2: Completing GCP 481634/1 on node failure takeover completion.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Removed node 48 from takeover queue, 0 failed nodes remaining
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Step NF_TAKEOVER completed, failure handling for node 48 complete.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 4: Completing GCP 481634/1 on node failure takeover completion.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Removed node 48 from takeover queue, 0 failed nodes remaining
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Step NF_TAKEOVER completed, failure handling for node 48 waiting for NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Completing GCP 481634/1 on node failure takeover completion.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Step NF_CHECK_TRANSACTION completed, failure handling for node 48 waiting for NF_CHECK_SCAN.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 1: Step NF_CHECK_SCAN completed, failure handling for node 48 complete.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 3: Started failure handling for node 48
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 3: Step NF_BLOCK_HANDLE completed, failure handling for node 48 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC 3: GCP completion 481634/1 waiting for node failure handling (1) to complete. Seizing record for GCP.
2018-05-01 09:37:21 [ndbd] INFO     -- /export/home2/pb2/build/sb_1-28202170-1523555460.9/rpm/BUILD/mysql-cluster-gpl-7.6.4/mysql-cluster-gpl-7.6.4/storage/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp
2018-05-01 09:37:21 [ndbd] INFO     -- DBTC (Line: 10345) 0x00000002 Check i != end failed
2018-05-01 09:37:21 [ndbd] INFO     -- Error handler shutting down system
2018-05-01 09:37:21 [ndbd] INFO     -- Error handler shutdown completed - exiting
2018-05-01 09:37:36 [ndbd] ALERT    -- Node 25: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please
 report a bug). Temporary error, restart node'.

Time: Tuesday 1 May 2018 - 09:37:21
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 10345) 0x00000002 Check i != end failed
Program: ndbmtd
Pid: 4020 thr: 14
Version: mysql-5.7.20 ndb-7.6.4
Trace file name: ndb_25_trace.log.14_t14
Trace file path: /var/lib/mysql-cluster/ndb_25_trace.log.14 [t1..t17]
***EOM***

How to repeat:
See description
[1 May 2018 21:40] Mikael Ronström
The problem is that DBTC number 5 (where presumably the crash occurs) receives
TAKE_OVERTCCONF before it received TAKE_OVERTCREQ. Will investigate how this
can happen and take some action to handle this race condition.
[1 May 2018 22:04] Mikael Ronström
Looked a bit deeper and found that there exists code to ensure that this race cannot happen.
Most likely this code needs some adaption to multiple TCs.

So would be great to get an error report to see the signal sendings from all involved threads at the
time of the crash.
[5 May 2018 1:41] MySQL Verification Team
Hi Daniel,

I cannot reproduce this. 

Can you reproduce this on-will? Can you please upload the full ndb_error_log when you hit this issue again?

kind regards
Bogdan
[22 May 2018 12:59] Mikael Ronström
The bug is fixed and a patch is pushed to the 7.6 tree, will be part of 7.6.7.
[19 Jun 2018 16:20] Jon Stephens
Fixed in NDB 7.6.7 as part of the work done for WL#9638.

Closed.