Bug #102854 ndbd was killed because of Out of memory
Submitted: 8 Mar 2021 9:26
Modified: 14 Mar 2021 19:52
Reporter: Xianglei Zhu
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.7.22-ndb-7.5.10-cluster-gpl
OS: Linux (7)
Assigned to: MySQL Verification Team
CPU Architecture: x86

[8 Mar 2021 9:26] Xianglei Zhu
Description:
cluster log
-----------
2021-03-06 07:21:21 [MgmtSrvr] INFO     -- Node 11: LDM(0): Completed LCP, #frags = 74 #records = 4114, #bytes = 7017992
2021-03-06 07:21:21 [MgmtSrvr] INFO     -- Node 11: Local checkpoint 2689 completed
2021-03-06 14:01:02 [MgmtSrvr] ALERT    -- Node 2: Node 11 Disconnected
2021-03-06 14:01:02 [MgmtSrvr] ALERT    -- Node 11: Forced node shutdown completed. Occured during startphase 0. Initiated by signal 9.
2021-03-06 14:57:27 [MgmtSrvr] ALERT    -- Node 2: Node 1 Disconnected
2021-03-06 14:57:30 [MgmtSrvr] WARNING  -- Failed to allocate nodeid for API at 172.22.99.192. Returned error: 'No free node id found for mysqld(API).'

ndb log:
--------
2021-03-06 02:00:04 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 156 (ms)
2021-03-06 14:01:02 [ndbd] INFO     -- Child process terminated by signal 9
2021-03-06 14:01:02 [ndbd] ALERT    -- Node 11: Forced node shutdown completed. Occured during startphase 0. Initiated by signal 9.

message:
-------
Mar  6 14:01:02 HXVM-DNS-02 kernel: vmtoolsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar  6 14:01:02 HXVM-DNS-02 kernel: vmtoolsd cpuset=/ mems_allowed=0
....omitted
Mar  6 14:01:02 HXVM-DNS-02 kernel: Out of memory: Kill process 110009 (ndbd) score 386 or sacrifice child
Mar  6 14:01:02 HXVM-DNS-02 kernel: Killed process 110009 (ndbd) total-vm:6700076kB, anon-rss:6477888kB, file-rss:11364kB, shmem-rss:0kB

How to repeat:
Unknown.
[8 Mar 2021 9:30] Xianglei Zhu
System log (messages) from the time ndbd crashed

Attachment: messages (application/octet-stream, text), 11.47 KiB.

[10 Mar 2021 17:09] MySQL Verification Team
Hi,

This is not enough data to act on. Your node tried to allocate more RAM than the host had available, and the Linux kernel killed it.

What is your config? Which nodes are running on this host? What other applications run on this host? How much RAM does it have?

When the Linux kernel runs out of RAM, it kills the task that uses the most RAM. It is possible that you had 10 other processes eating RAM and the kernel OOM-killed the ndb node without it being its fault. It is also possible that your configuration is wrong. I cannot tell which from the information provided.
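
To see which processes the kernel would currently consider first, something like the following minimal sketch could be run on the data node host. It relies on the Linux /proc files oom_score and status (VmRSS); the function name read_oom_candidates and the proc_root parameter are only illustrative, and the score the kernel reports also depends on oom_score_adj, so this is a rough view rather than an exact replay of the kernel's decision.

```python
import os


def read_oom_candidates(proc_root="/proc"):
    """Return (oom_score, rss_kb, pid, name) tuples, highest oom_score first.

    proc_root is a hypothetical parameter so the function can also be
    pointed at a copy of /proc; on a live host the default is used.
    """
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            name, rss_kb = "?", 0
            with open(os.path.join(base, "status")) as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split(":", 1)[1].strip()
                    elif line.startswith("VmRSS:"):
                        rss_kb = int(line.split()[1])  # value is in kB
        except (OSError, ValueError):
            continue  # process exited meanwhile or is not readable
        rows.append((score, rss_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows
```

Printing the first ten entries of `read_oom_candidates()` shortly before memory runs out would show whether ndbd or some other process is the one actually growing.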

There are no known memory issues with ndbcluster, so this is most probably poor sizing and improper configuration. Our Support team can help you configure the system properly. If you still think this is a bug, please provide more data.

kind regards
[11 Mar 2021 10:53] Xianglei Zhu
config.ini

Attachment: config.ini (application/octet-stream, text), 2.39 KiB.

[11 Mar 2021 10:54] Xianglei Zhu
my.cnf

Attachment: my.cnf (application/octet-stream, text), 1016 bytes.

[11 Mar 2021 11:10] Xianglei Zhu
Thanks for your response.
Server RAM: 12 GB; 2 CPUs (Xeon(R) Gold 6140). The MySQL Cluster database is smaller than 200 MB and is used only for DNS. For the cluster parameters, see the attached config.ini and my.cnf.
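
As a rough cross-check, the "Killed process" line from the kernel log in the description can be compared with the server RAM. The sketch below is only an illustration: parse_killed_line is a made-up helper name, and it assumes that "12G" means 12 GiB.

```python
import re

# Copied from the kernel log in the description.
KILLED_LINE = (
    "Killed process 110009 (ndbd) total-vm:6700076kB, "
    "anon-rss:6477888kB, file-rss:11364kB, shmem-rss:0kB"
)


def parse_killed_line(line):
    """Return (pid, process name, {field: kB}) from an OOM 'Killed process' line."""
    m = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if m is None:
        raise ValueError("not a 'Killed process' line")
    fields = {k: int(v) for k, v in re.findall(r"([\w-]+):(\d+)kB", line)}
    return int(m.group(1)), m.group(2), fields


pid, name, fields = parse_killed_line(KILLED_LINE)
rss_kb = fields["anon-rss"] + fields["file-rss"] + fields["shmem-rss"]
ram_kb = 12 * 1024 * 1024  # assumption: "12G" is 12 GiB
print(f"{name} (pid {pid}) resident: {rss_kb / 1024**2:.2f} GiB, "
      f"{100 * rss_kb / ram_kb:.0f}% of RAM")
```

By this estimate ndbd held about 6.2 GiB resident when it was killed, roughly half of the server RAM and far more than the 200 MB of data.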
[12 Mar 2021 2:39] Xianglei Zhu
I found some additional, suspicious evidence that may be related to this problem:
mysql> show create table history;
+---------+-----------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                                                               |
+---------+-----------------------------------------------------------------+
| history | CREATE TABLE `history` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `msg` varchar(256) DEFAULT NULL,
  `detail` longtext,
  `created_by` varchar(128) DEFAULT NULL,
  `created_on` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=ndbcluster AUTO_INCREMENT=3073 DEFAULT CHARSET=utf8 |
+---------+-----------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from history;
+----------+
| count(*) |
+----------+
|      167 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT detail FROM history ORDER BY created_on DESC LIMIT 2;
ERROR 1296 (HY000): Got error 4350 'Transaction already aborted' from NDBCLUSTER
mysql> SELECT id FROM history ORDER BY created_on DESC LIMIT 2;
+------+
| id   |
+------+
| 3093 |
| 2626 |
+------+
2 rows in set (0.00 sec)

mysql> SELECT detail FROM history WHERE id=3093;
Empty set (0.00 sec)

mysql> SELECT length(detail) FROM history WHERE id=3093;
Empty set (0.00 sec)

mysql> SELECT id,msg,created_by,created_on FROM history WHERE id=3093;
+------+-------------------------------------------+------------+---------------------+
| id   | msg                                       | created_by | created_on          |
+------+-------------------------------------------+------------+---------------------+
| 3093 | Apply record changes to domain imipay.com | hanjinwei  | 2019-03-28 00:51:09 |
+------+-------------------------------------------+------------+---------------------+
1 row in set (0.00 sec)
mysql> SELECT detail FROM history WHERE id in(2626,3093) ORDER BY created_on DESC;
ERROR 1296 (HY000): Got error 4350 'Transaction already aborted' from NDBCLUSTER
mysql> SELECT detail FROM history WHERE id in(2626,3093) ORDER BY created_on; -- this one succeeds
[12 Mar 2021 2:44] Xianglei Zhu
Additional information:
mysql> SELECT length(detail) FROM history WHERE id=2626;
+----------------+
| length(detail) |
+----------------+
|          36677 |
+----------------+
1 row in set (0.00 sec)
[14 Mar 2021 19:52] MySQL Verification Team
Hi,

I cannot reproduce any of the issues you have shown, but you are using a version that is too old. You have to upgrade to at least 7.5.21. A number of bugs were fixed between 7.5.10 and 7.5.21; we are talking about 3 years of development and bug fixes here. If you can reproduce this with 7.5.21, then it would make sense to check what is going on, but whatever is affecting you on 7.5.10, we can't do anything to fix it unless you upgrade.
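
For checking a node against that minimum, a small sketch along these lines may help. It parses the NDB part of a version string such as the one in the Version field of this report; ndb_version and MIN_NDB are illustrative names, not part of any MySQL tool.

```python
import re

MIN_NDB = (7, 5, 21)  # minimum NDB version requested above


def ndb_version(version_string):
    """Extract the NDB version as a tuple, e.g. (7, 5, 10)."""
    m = re.search(r"ndb-(\d+)\.(\d+)\.(\d+)", version_string)
    if m is None:
        raise ValueError("no NDB version found in %r" % version_string)
    return tuple(int(part) for part in m.groups())


reported = ndb_version("5.7.22-ndb-7.5.10-cluster-gpl")
# Tuples compare element by element, so (7, 5, 10) < (7, 5, 21).
print(reported, "meets minimum:", reported >= MIN_NDB)
```

Comparing tuples of integers avoids the pitfall of comparing version strings as text.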

kind regards