Description:
Setup: one management node on the first EC2 instance, and one data node plus one SQL node on the second EC2 instance. The problem is also reproducible with more data nodes, but we are keeping the setup simple for the sake of explanation. It is reproducible only when UseShm is set to true in config.ini. In all other cases, it works.
config.ini
[ndbd default]
# Options affecting ndbd processes on all data nodes:
# https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-params-ndbd.html
#DataMemory=384G
UseShm=true
NoOfReplicas=1
#LockPagesInMainMemory=1
AutomaticThreadConfig=1
#NumCPUs=32
[ndb_mgmd]
# Management process options:
hostname=10.90.252.99 # Hostname of the manager
datadir=/var/lib/mysql-cluster # Directory for the log files
[ndbd]
hostname=10.90.252.122 # Hostname/IP of the first data node
NodeId=2 # Node ID for this data node
datadir=/usr/local/mysql/data # Remote directory for the data files
[mysqld]
# SQL node options:
hostname=10.90.252.122
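For anyone triaging this, a quick way to confirm that a given config.ini hits the same condition is to check whether UseShm is enabled and a data node shares a host with an SQL/API node, which, as far as the NDB documentation describes it, is when a shared-memory connection is set up. The following is only a minimal sketch under those assumptions: the function names `parse_sections` and `shm_candidate_hosts` are hypothetical, hostnames are compared as plain strings, and the accepted boolean spellings are an assumption rather than the exact rules NDB applies.

```python
# Abridged copy of the config.ini from this report.
CONFIG = """\
[ndbd default]
UseShm=true
NoOfReplicas=1
AutomaticThreadConfig=1
[ndb_mgmd]
hostname=10.90.252.99 # Hostname of the manager
datadir=/var/lib/mysql-cluster # Directory for the log files
[ndbd]
hostname=10.90.252.122 # Hostname/IP of the first data node
NodeId=2 # Node ID for this data node
[mysqld]
hostname=10.90.252.122
"""


def parse_sections(text):
    """Parse config.ini text into a list of (section_name, options) pairs.

    A list is used instead of a dict because sections such as [ndbd]
    may legitimately appear more than once.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().lower(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip().lower()] = value.strip()
    return sections


def shm_candidate_hosts(text):
    """Return hosts that run a UseShm-enabled data node and an SQL/API node."""
    sections = parse_sections(text)
    defaults = {}
    for name, opts in sections:
        if name == "ndbd default":
            defaults.update(opts)

    truthy = ("1", "true", "y")  # assumed spellings of "enabled"
    data_hosts = set()
    for name, opts in sections:
        if name == "ndbd":
            use_shm = opts.get("useshm", defaults.get("useshm", "false"))
            if use_shm.lower() in truthy:
                data_hosts.add(opts.get("hostname"))

    api_hosts = {opts.get("hostname") for name, opts in sections
                 if name in ("mysqld", "api")}
    return sorted(h for h in data_hosts & api_hosts if h)


print(shm_candidate_hosts(CONFIG))  # ['10.90.252.122']
```

With the config from this report the check returns the second machine, which matches the setup in which we see the crash. Changing UseShm to false makes the list empty.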
ubuntu@ip-10-90-252-99:/var/lib/mysql-cluster$ ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> SHOW
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 1 node(s)
id=2 (not connected, accepting connect from 10.90.252.122)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @10.90.252.99 (mysql-8.0.39 ndb-8.0.39)
[mysqld(API)] 1 node(s)
id=3 (not connected, accepting connect from 10.90.252.122)
We observe this in the logs on the data node:
2024-08-01 07:11:53 [MgmtSrvr] ALERT -- Node 2: Forced node shutdown completed. Initiated by signal 11. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Detailed logs: https://docs.google.com/document/d/1HrNFt8ElbrzjqCBw5kzzKMHUTTXzUFMGKGuJ-gA2MWI/edit
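To pull the relevant fields out of the ALERT line above when scanning a longer cluster log, something like the following could be used. This is a rough sketch, not an official log parser: the regular expression is written only against the single line quoted in this report, the function name `parse_forced_shutdown` is hypothetical, and the signal name lookup assumes the script runs on a platform whose signal numbering matches the data node's.

```python
import re
import signal

# The ALERT line exactly as quoted in this report.
LOG_LINE = (
    "2024-08-01 07:11:53 [MgmtSrvr] ALERT -- Node 2: Forced node shutdown "
    "completed. Initiated by signal 11. Caused by error 6000: 'Error OS signal "
    "received(Internal error, programming error or missing error message, "
    "please report a bug). Temporary error, restart node'."
)

# Pattern derived from the line above; other shutdown messages may differ.
SHUTDOWN_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Initiated by signal (?P<signal>\d+)\. "
    r"Caused by error (?P<error>\d+): '(?P<message>.*)'\.$"
)


def parse_forced_shutdown(line):
    """Extract node id, signal, error code and message, or return None."""
    match = SHUTDOWN_RE.search(line.strip())
    if match is None:
        return None
    signum = int(match["signal"])
    try:
        signame = signal.Signals(signum).name
    except ValueError:
        signame = None  # signal number unknown on this platform
    return {
        "node": int(match["node"]),
        "signal": signum,
        "signal_name": signame,
        "error": int(match["error"]),
        "message": match["message"],
    }


info = parse_forced_shutdown(LOG_LINE)
print(info["node"], info["signal"], info["signal_name"], info["error"])
```

On Linux, signal 11 resolves to SIGSEGV, i.e. the data node process was terminated by a segmentation fault.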
How to repeat:
Set up the management node on one machine and the data node + SQL node on a second machine, using the same config.ini shown in the Description above (with UseShm=true). The crash is reproducible for us every time.
The detailed steps that I followed are here: https://docs.google.com/document/d/1HrNFt8ElbrzjqCBw5kzzKMHUTTXzUFMGKGuJ-gA2MWI/edit