MySQL Bugs: #98779: DBACC (Line: 3886) 0x00000002 Check opPtrP->m_key_or_scan_info.m

Bug #98779	DBACC (Line: 3886) 0x00000002 Check opPtrP->m_key_or_scan_info.m_scanOpDeleteCou
Submitted:	28 Feb 2020 15:07	Modified:	17 Mar 2020 17:30
Reporter:	Daniel Hope
Status:	Duplicate
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	ndb-7.6.9	OS:	Ubuntu
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
Everything is running fine, then the following appears in the data node log:

2020-02-28 14:49:54 [ndbd] INFO     -- /export/home/pb2/build/sb_0-32108591-1546544912.28/release/mysql-cluster-gpl-7.6.9/storage/ndb/src/kernel/blocks/dbacc/DbaccMain.cpp
2020-02-28 14:49:54 [ndbd] INFO     -- DBACC (Line: 3886) 0x00000002 Check opPtrP->m_key_or_scan_info.m_scanOpDeleteCountOpRef != 0 failed
2020-02-28 14:49:54 [ndbd] INFO     -- Error handler shutting down system
2020-02-28 14:49:54 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

How to repeat:
No idea; it appears to be random.

Just to be clear, this crashes the cluster and requires a restart to get it back online. It's only a 2-node cluster; tonight I will add a third node to see if it helps prevent the error.

Hi,

I am not able to reproduce this. You cannot add "one more node"; you need to add "two more", as I assume you are running with NoOfReplicas=2. Any other number is "unsupported" (or "beta" or "do not use in production").

You can upload the full ndb_error_reporter log; we might extract some additional information from it. So far this looks like an improperly sized cluster, but without being able to reproduce it I can't say more.

Kind regards
Bogdan

Hey 

So I didn't add a node in the end anyway, thanks for clearing up that I can't!

Could you just clear up "looks like improperly sized cluster" for me?

I have attached the ndb_error_reporter output for completeness.

Hi,

Data nodes run in groups, so you always need N * NoOfReplicas data nodes. Assuming you use the default NoOfReplicas=2, you need an even number of data nodes, so you add two nodes at a time.
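
The sizing rule above can be sketched as a small check. This is only an illustration of the arithmetic, not part of any MySQL Cluster tool; the function name `node_groups` and its parameters are hypothetical.

```python
def node_groups(data_nodes: int, no_of_replicas: int = 2) -> int:
    """Return how many node groups a layout yields.

    Hypothetical helper: it only encodes the rule that the number of
    data nodes must be a multiple of NoOfReplicas.
    """
    if data_nodes < 1 or no_of_replicas < 1:
        raise ValueError("both counts must be positive")
    if data_nodes % no_of_replicas != 0:
        raise ValueError(
            f"{data_nodes} data nodes cannot be split into groups of "
            f"{no_of_replicas}"
        )
    return data_nodes // no_of_replicas


if __name__ == "__main__":
    print(node_groups(2))  # the reporter's current 2-node cluster
    print(node_groups(4))  # after adding two more data nodes
    try:
        node_groups(3)     # adding a single node is rejected
    except ValueError as exc:
        print("invalid layout:", exc)
```

With NoOfReplicas=2, going from 2 to 4 data nodes adds one node group, while 3 data nodes leaves one node without a partner.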

Improperly sized means that your setup (hardware and configuration) does not cover your load. Properly configured, the same hardware might be able to handle the load. This is something the MySQL Support team can help you with. Properly configuring MySQL Cluster is not a simple task.

On the other hand, it could be a bug, but I need a way to reproduce it.

all best
Bogdan

Documented fix as follows in the NDB 7.5.17, 7.6.13, and 8.0.19 changelogs:

    A transaction which inserts a child row may run concurrently
    with a transaction which deletes the parent row for that child.
    One of the transactions should be aborted in this case, lest an
    orphaned child row result.

    Before committing an insert on a child row, a read of the parent
    row is triggered to confirm that the parent exists. Similarly,
    before committing a delete on a parent row, a read or scan is
    performed to confirm that no child rows exist. When insert and
    delete transactions were run concurrently, their prepare and
    commit operations could interact in such a way that both
    transactions committed. This occurred because the triggered
    reads were performed using CommittedRead locks (see
    NdbOperation::LockMode), which are not strong enough to prevent
    such error scenarios.

    This problem is fixed by using the stronger SimpleRead lock mode
    for both triggered reads. The use of SimpleRead locks ensures
    that at least one transaction aborts in every possible scenario
    involving concurrent child-insertion and parent-deletion
    transactions.
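
The interleaving described in the changelog can be illustrated with a toy model. This is a minimal sketch under strong assumptions, not NDB's implementation: the `Row` class, the transaction names, and the rule that a lock conflict aborts the reader immediately are all invented for illustration (a real data node would normally wait on the lock and time out). The model only shows why a read that ignores locks lets both transactions pass their checks, while a read that respects the other transaction's exclusive lock does not.

```python
from dataclasses import dataclass
from typing import Optional


class LockConflict(Exception):
    """Raised when a read meets a row locked by another transaction."""


@dataclass
class Row:
    exists: bool                     # last committed state
    locked_by: Optional[str] = None  # holder of an exclusive lock, if any
    pending: Optional[bool] = None   # uncommitted new state


def committed_read(row: Row, txn: str) -> bool:
    # Toy CommittedRead: ignore locks, return the last committed state.
    return row.exists


def simple_read(row: Row, txn: str) -> bool:
    # Toy SimpleRead: needs a shared lock, so it conflicts with another
    # transaction's exclusive lock. Assumption: conflict means abort.
    if row.locked_by not in (None, txn):
        raise LockConflict(row.locked_by)
    return row.exists


def run(read):
    parent = Row(exists=True)
    child = Row(exists=False)
    # Prepare: each transaction locks the row it is about to change.
    child.locked_by, child.pending = "insert_child", True
    parent.locked_by, parent.pending = "delete_parent", False

    # (transaction, row it changes, row it checks, state the check expects)
    plan = [
        ("insert_child", child, parent, True),    # parent must exist
        ("delete_parent", parent, child, False),  # no child may exist
    ]
    passed = []
    for txn, own, other, expected in plan:
        try:
            ok = read(other, txn) == expected
        except LockConflict:
            ok = False
        if ok:
            passed.append((txn, own))
        else:
            own.locked_by = own.pending = None  # abort, release the lock
    # Commit every transaction whose triggered check passed.
    for txn, own in passed:
        own.exists = own.pending
        own.locked_by = own.pending = None
    committed = [txn for txn, _ in passed]
    orphan = child.exists and not parent.exists
    return committed, orphan


if __name__ == "__main__":
    print("CommittedRead:", run(committed_read))
    print("SimpleRead:   ", run(simple_read))
```

In this model the lock-ignoring read lets both transactions commit and leaves an orphaned child row, whereas the lock-respecting read aborts the child insert and only the parent delete commits.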

So upgrading the cluster will prevent the error?

Yes.