Bug #92955 NDB cluster crashed during DROP DATABASE operation
Submitted: 26 Oct 2018 7:42
Modified: 6 Nov 2018 14:12
Reporter: Eduardo Ortega
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.6.7
OS: CentOS (7.5.1804)
Assigned to:
CPU Architecture: x86 (Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz)
Tags: crash, ndb

[26 Oct 2018 7:42] Eduardo Ortega
Description:
We have a cluster with 48 data nodes and a few SQL nodes. One of the SQL nodes replicates from an upstream InnoDB-based chain into the NDB cluster. We were in the process of dropping the database when the cluster segfaulted (extract of the logs below; full error report attached):
...
2018-10-26 09:01:04 [ndbd] INFO     -- DROP_TAB_REQ: tab: 357, tabLcpStatus: 3
2018-10-26 09:01:04 [ndbd] INFO     -- DROP_TAB_REQ: tab: 356, tabLcpStatus set to 3
2018-10-26 09:01:04 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 165 (ms)
2018-10-26 09:01:04 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 166 (ms)
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
2018-10-26 09:01:04 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 200 (ms)
2018-10-26 09:01:05 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 172 (ms)
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
2018-10-26 09:01:05 [ndbd] INFO     -- Received signal 11. Running error handler.
2018-10-26 09:01:05 [ndbd] WARNING  -- Ndb kernel thread 11 is stuck in: JobHandling in block: 247, gsn: 353 elapsed=100
2018-10-26 09:01:05 [ndbd] INFO     -- Watchdog: User time: 71835849  System time: 18438457
...

At the time of the crash, the SQL replication thread on the SQL node was stopped.
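
For triage, the scheduling delays and the fatal signal in a log extract like the one above can be pulled out with a short script. The following is a minimal sketch in Python, assuming the ndbd log lines look exactly like those quoted in this report; the function name `summarize_ndbd_log` and both patterns are hypothetical helpers, not part of any NDB tooling.

```python
import re

# Both patterns are derived only from the log lines quoted in this report;
# treating them as the general ndbd log format is an assumption.
DELAY_RE = re.compile(
    r"timerHandlingLab, expected (\d+)ms sleep, not scheduled for: (\d+) \(ms\)"
)
SIGNAL_RE = re.compile(r"Received signal (\d+)\. Running error handler\.")


def summarize_ndbd_log(lines):
    """Return (scheduling delays in ms, first fatal signal number or None)."""
    delays = []
    signal = None
    for line in lines:
        match = DELAY_RE.search(line)
        if match:
            # Group 2 is the time the thread actually went unscheduled.
            delays.append(int(match.group(2)))
            continue
        match = SIGNAL_RE.search(line)
        if match and signal is None:
            signal = int(match.group(1))
    return delays, signal


# Two lines copied from the extract above.
sample = [
    "2018-10-26 09:01:04 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 165 (ms)",
    "2018-10-26 09:01:05 [ndbd] INFO     -- Received signal 11. Running error handler.",
]
delays, signal = summarize_ndbd_log(sample)
print(delays, signal)
```

Running this over the full node log would show whether the scheduling delays grew steadily before signal 11 was received.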

How to repeat:
The cluster is still restarting, so we have not yet been able to test whether the behavior is repeatable. That said, this seems serious enough to report right away.
[26 Oct 2018 12:55] Eduardo Ortega
The issue has been reproduced, but only when dropping this particular database. We created others and then dropped them, and that worked without issue.
[29 Oct 2018 14:00] Eduardo Ortega
This is how the backtrace looks in gdb at the time of the crash:

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4f937fa700 (LWP 1045)]
0x00000000005e58f6 in seize (ptr=<synthetic pointer>, this=0x15878f0)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/ArrayPool.hpp:755
755     /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/ArrayPool.hpp: No such file or directory.
(gdb) bt
#0  0x00000000005e58f6 in seize (ptr=<synthetic pointer>, this=0x15878f0)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/ArrayPool.hpp:755
#1  Dbtup::scanProcedure (this=this@entry=0x1579fc0, signal=signal@entry=0x7f4f937ed1c0, regOperPtr=regOperPtr@entry=0x7f4f6065c190, handle=handle@entry=0x7f4f937ec7a0, 
    isCopy=isCopy@entry=false)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/blocks/dbtup/DbtupStoredProcDef.cpp:151
#2  0x00000000005e5cf9 in Dbtup::execSTORED_PROCREQ (this=0x1579fc0, signal=signal@entry=0x7f4f937ed1c0)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/blocks/dbtup/DbtupStoredProcDef.cpp:73
#3  0x000000000054d2dc in Dblqh::accScanConfScanLab (this=0x13d99c0, signal=0x7f4f937ed1c0, tcConnectptr=...)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp:12639
#4  0x000000000054e1d9 in Dblqh::execSCAN_FRAGREQ (this=0x13d99c0, signal=0x7f4f937ed1c0)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp:12339
#5  0x000000000077a5f8 in executeFunction (f=<optimized out>, signal=0x7f4f937ed1c0, gsn=<optimized out>, this=0x13d99c0)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/SimulatedBlock.hpp:1585
#6  executeFunction_async (signal=0x7f4f937ed1c0, gsn=<optimized out>, this=0x13d99c0)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/SimulatedBlock.hpp:1552
#7  execute_signals (max_signals=262080, sig=0x7f4f937ed1c0, r=0xa75c, h=0x163c, q=0x153c, selfptr=0x14bc)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/mt.cpp:5593
#8  run_job_buffers (selfptr=selfptr@entry=0x7f6554115b00, sig=sig@entry=0x7f4f937ed1c0, send_sum=@0x7f4f937ecb80: 1, flush_sum=@0x7f4f937ecb90: 0, 
    pending_send=@0x7f4f937ecb70: false)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/mt.cpp:5641
#9  0x000000000077cb0e in mt_job_thread_main (thr_arg=<optimized out>)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/vm/mt.cpp:6613
#10 0x000000000070fd6f in ndb_thread_wrapper (_ss=0xe8d0f0)
    at /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/common/portlib/NdbThread.c:258
#11 0x00007f6555e60e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f6554f5ebad in clone () from /lib64/libc.so.6
(gdb) info locals
ff = 0
(gdb)

We are generating a core dump file from it.
[6 Nov 2018 14:12] Jon Stephens
Documented fix as follows in the NDB 7.6.9 and 8.0.15 changelogs:

    A DROP DATABASE operation involving very large tables could lead
    to an unplanned shutdown of the cluster.

Closed.