Description:
2025-02-28 07:14:12 [ndbd] INFO -- Transporter 2 to node 3 disconnected in state: 0
2025-02-28 07:14:12 [ndbd] INFO -- findNeighbours from: 5587 old (left: 3 right: 3) new (65535 65535)
2025-02-28 07:14:12 [ndbd] ALERT -- Network partitioning - arbitration required
2025-02-28 07:14:12 [ndbd] INFO -- President restarts arbitration thread [state=7]
2025-02-28 07:14:12 [ndbd] ALERT -- Arbitration won - positive reply from node 1
2025-02-28 07:14:12 [ndbd] INFO -- NR Status: node=3,OLD=Initial state,NEW=Node failed, fail handling ongoing
2025-02-28 07:14:12 [ndbd] INFO -- Master takeover started from 3
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Started failure handling for node 3
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Starting take over of node 3
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Step NF_BLOCK_HANDLE completed, failure handling for node 3 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2025-02-28 07:14:12 [ndbd] INFO -- start_resend(1,
2025-02-28 07:14:12 [ndbd] INFO -- empty bucket (7189111/13 7189111/12) -> active
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Step NF_CHECK_SCAN completed, failure handling for node 3 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION.
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: GCP completion 7189111/13 waiting for node failure handling (1) to complete. Seizing record for GCP.
2025-02-28 07:14:12 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart ongoing
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Step NF_CHECK_TRANSACTION completed, failure handling for node 3 waiting for NF_TAKEOVER.
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Completed take over of failed node 3
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Step NF_TAKEOVER completed, failure handling for node 3 complete.
2025-02-28 07:14:12 [ndbd] INFO -- DBTC 0: Completing GCP 7189111/13 on node failure takeover completion.
2025-02-28 07:14:12 [ndbd] INFO -- Started arbitrator node 1 [ticket=9bf65594a472dc87]
2025-02-28 07:14:13 [ndbd] INFO -- NR Status: node=3,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2025-02-28 07:14:13 [ndbd] INFO -- Node 3 has completed node fail handling
2025-02-28 07:14:25 [ndbd] INFO -- Adjusting disk write speed bounds due to : Node restart finished
For help with below stacktrace consult:
https://dev.mysql.com/doc/refman/en/using-stack-trace.html
Also note that stack_bottom and thread_stack will always show up as zero.
2025-02-28 07:14:42 [ndbd] INFO -- Received signal 8. Running error handler.
Base address/slide: 0x56248e7c2000
With use of addr2line, llvm-symbolizer, or, atos, subtract the addresses in
stacktrace with the base address before passing them to tool.
For tools that have options for slide use that, e.g.:
llvm-symbolizer --adjust-vma=0x56248e7c2000 ...
atos -s 0x56248e7c2000 ...
stack_bottom = 0 thread_stack 0x0
#0 0x7ff02aabc51f <unknown>
#1 0x56248ea9d02b _ZN5Dbspj13scanFrag_sendEP6Signal3PtrINS_7RequestEES2_INS_8TreeNodeEE
#2 0x56248ea91d81 _ZN5Dbspj16execSCAN_NEXTREQEP6Signal
#3 0x56248ed94b8f <unknown>
#4 0x56248ed059d4 _ZN13FastScheduler5doJobEj
#5 0x56248ed1f036 _ZN12ThreadConfig13ipControlLoopEP9NdbThread
#6 0x56248e8f360b _Z8ndbd_runbiPKciS0_bbbjiimS0_i
#7 0x56248e8f3dd4 _Z9real_mainiPPc
#8 0x56248e8f5477 _Z9angel_runPKcRK6VectorI10BaseStringES0_iS0_bbbiiS0_i
#9 0x56248e8f4159 _Z9real_mainiPPc
#10 0x56248e8e0881 main
#11 0x7ff02aaa3d8f <unknown>
#12 0x7ff02aaa3e3f __libc_start_main
#13 0x56248e8e9d24 _start
#14 0xffffffffffffffff <unknown>
2025-02-28 07:14:42 [ndbd] INFO -- Signal 8 received; Floating point exception
2025-02-28 07:14:42 [ndbd] INFO -- ./storage/ndb/src/kernel/ndbd.cpp
2025-02-28 07:14:42 [ndbd] INFO -- Error handler signal shutting down system
2025-02-28 07:14:42 [ndbd] INFO -- Error handler shutdown completed - exiting
2025-02-28 07:14:43 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Initiated by signal 8. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
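Note (not part of the original log): a minimal sketch of the slide adjustment the crash handler's instructions describe, assuming the base address printed above and an unstripped ndbd binary; the frame list and the addr2line invocation in the comment are illustrative, not output from our system.

# Python sketch: subtract the reported base address/slide from the raw
# stacktrace addresses so they can be passed to addr2line or llvm-symbolizer.
BASE = 0x56248E7C2000  # "Base address/slide" from the log above

frames = [
    0x56248EA9D02B,  # Dbspj::scanFrag_send
    0x56248EA91D81,  # Dbspj::execSCAN_NEXTREQ
    0x56248ED059D4,  # FastScheduler::doJob
]

for addr in frames:
    # e.g. feed the adjusted offset to: addr2line -e ndbd <offset>
    print(hex(addr - BASE))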
How to repeat:
This happened on our production node, which runs a number of databases; it occurred after we moved some additional databases onto it.
We have tried to recreate the issue on our acceptance system, but so far without luck.
We plan to restart the migration on production soon, but more slowly.
If there is anything we can provide to ensure this will not be an issue going forward, please let us know.
Suggested fix:
To get the cluster back up and running, all we did was revert the services using the database back to their old setups; all of the data that had been migrated stayed in place.