Bug #54603 no storage nodes connected (timed out)
Submitted: 18 Jun 2010 3:57 Modified: 21 Jul 2010 11:37
Reporter: lee zhenhua Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:mysql-5.1-telco-7.0 OS:Other (centos 5.2 x86_64)
Assigned to: CPU Architecture:Any
Tags: mysql version 5.1.35, ndb-7.0.7, no storage nodes connected (timed out)

[18 Jun 2010 3:57] lee zhenhua
Description:
hi ,mysql cluster i have runing 7days,and i restart mgm node .then the api node con't connect the storage.I find a lot of information in Internet,but no result.

this is mgm node port status
[root@MGMM logs]# netstat -an |grep 1186
tcp        0      0 0.0.0.0:1186                0.0.0.0:*                   LISTEN      
tcp        0      0 192.168.1.159:1186          192.168.1.145:33803         ESTABLISHED 
tcp        0      0 192.168.1.159:1186          192.168.1.145:33804         ESTABLISHED 
tcp      100      0 192.168.1.159:1186          192.168.1.151:54590         ESTABLISHED 
tcp        0      0 127.0.0.1:59735             127.0.0.1:1186              ESTABLISHED 
tcp        0      0 127.0.0.1:1186              127.0.0.1:59735             ESTABLISHED 
tcp        0      0 192.168.1.159:55964         192.168.1.159:1186          ESTABLISHED 
tcp        0      0 192.168.1.159:55965         192.168.1.159:1186          ESTABLISHED 
tcp        0      0 192.168.1.159:55963         192.168.1.159:1186          ESTABLISHED 
tcp        0      0 192.168.1.159:1186          192.168.1.153:44929         ESTABLISHED 
tcp        0      0 192.168.1.159:1186          192.168.1.159:55964         ESTABLISHED 
tcp        0      0 192.168.1.159:1186          192.168.1.159:55965         ESTABLISHED 
tcp        0      0 192.168.1.159:1186          192.168.1.159:55963         ESTABLISHED 
tcp        0      0 192.168.1.159:1186          192.168.1.150:34933         ESTABLISHED 
tcp        0      0 192.168.1.159:1186          192.168.1.152:57210         ESTABLISHED 

next is mgm log

2010-06-18 11:13:26 [MgmSrvr] INFO     -- Got initial configuration from '/mysql/mysql-cluster/conf/config.ini', will try to set it 
when all ndb_mgmd(s) started
2010-06-18 11:13:26 [MgmSrvr] INFO     -- Mgmt server state: nodeid 1 reserved for ip 192.168.1.159, m_reserved_nodes 00000000000000
00000000000000000000000000000000000000000000000002.
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Node 1: Node 1 Connected
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Id: 1, Command port: *:1186
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Starting initial configuration change
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Configuration 1 commited
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Config change completed! New generation: 1
[root@MGMM logs]# tail -f ndb_1_cluster.log 
2010-06-18 11:13:26 [MgmSrvr] INFO     -- Got initial configuration from '/mysql/mysql-cluster/conf/config.ini', will try to set it when all ndb_mgmd(s) started
2010-06-18 11:13:26 [MgmSrvr] INFO     -- Mgmt server state: nodeid 1 reserved for ip 192.168.1.159, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Node 1: Node 1 Connected
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Id: 1, Command port: *:1186
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Starting initial configuration change
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Configuration 1 commited
2010-06-18 11:13:27 [MgmSrvr] INFO     -- Config change completed! New generation: 1
2010-06-18 11:14:32 [MgmSrvr] INFO     -- Mgmt server state: nodeid 3 reserved for ip 192.168.1.150, m_reserved_nodes 000000000000000000000000000000000000000000000000000000000000000a.
2010-06-18 11:14:32 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2010-06-18 11:14:33 [MgmSrvr] INFO     -- Mgmt server state: nodeid 3 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2010-06-18 11:14:39 [MgmSrvr] ALERT    -- Node 1: Node 3 Disconnected
2010-06-18 11:14:39 [MgmSrvr] ALERT    -- Node 1: Node 3 Disconnected
2010-06-18 11:14:41 [MgmSrvr] INFO     -- Mgmt server state: nodeid 4 reserved for ip 192.168.1.151, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000012.
2010-06-18 11:14:42 [MgmSrvr] INFO     -- Node 1: Node 4 Connected
2010-06-18 11:14:42 [MgmSrvr] INFO     -- Mgmt server state: nodeid 4 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2010-06-18 11:14:42 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2010-06-18 11:14:43 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 4, 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 no-wait:  ]
2010-06-18 11:14:46 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 4, 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 no-wait:  ]
2010-06-18 11:14:48 [MgmSrvr] ALERT    -- Node 1: Node 4 Disconnected
2010-06-18 11:14:49 [MgmSrvr] ALERT    -- Node 1: Node 4 Disconnected
2010-06-18 11:14:49 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 4, 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 no-wait:  ]
2010-06-18 11:14:51 [MgmSrvr] INFO     -- Node 3: Node 4 Connected
2010-06-18 11:14:51 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 and 4 no-wait:  ]
2010-06-18 11:14:52 [MgmSrvr] INFO     -- Node 1: Node 4 Connected
2010-06-18 11:14:52 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 and 4 no-wait:  ]
2010-06-18 11:14:55 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 and 4 no-wait:  ]
2010-06-18 11:14:57 [MgmSrvr] INFO     -- Mgmt server state: nodeid 5 reserved 

How to repeat:
for ip 192.168.1.152, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000022.
2010-06-18 11:14:57 [MgmSrvr] INFO     -- Node 1: Node 5 Connected
2010-06-18 11:14:58 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 and 4 no-wait:  ]
2010-06-18 11:14:59 [MgmSrvr] INFO     -- Mgmt server state: nodeid 5 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2010-06-18 11:15:00 [MgmSrvr] INFO     -- Mgmt server state: nodeid 6 reserved for ip 192.168.1.153, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000042.
2010-06-18 11:15:01 [MgmSrvr] INFO     -- Node 1: Node 6 Connected
2010-06-18 11:15:01 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 and 4 no-wait:  ]
2010-06-18 11:15:02 [MgmSrvr] INFO     -- Mgmt server state: nodeid 6 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2010-06-18 11:15:04 [MgmSrvr] ALERT    -- Node 1: Node 5 Disconnected
2010-06-18 11:15:04 [MgmSrvr] ALERT    -- Node 1: Node 5 Disconnected
2010-06-18 11:15:04 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 5 and 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3 and 4 no-wait:  ]
2010-06-18 11:15:06 [MgmSrvr] INFO     -- Node 3: Node 5 Connected
2010-06-18 11:15:06 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3, 4 and 5 no-wait:  ]
2010-06-18 11:15:06 [MgmSrvr] INFO     -- Node 4: Node 5 Connected
2010-06-18 11:15:07 [MgmSrvr] ALERT    -- Node 1: Node 6 Disconnected
2010-06-18 11:15:07 [MgmSrvr] INFO     -- Node 3: Initial start, waiting for 6 to connect,  nodes [ all: 3, 4, 5 and 6 connected: 3, 4 and 5 no-wait:  ]
2010-06-18 11:15:07 [MgmSrvr] INFO     -- Node 1: Node 5 Connected
2010-06-18 11:15:07 [MgmSrvr] ALERT    -- Node 1: Node 6 Disconnected
2010-06-18 11:15:09 [MgmSrvr] INFO     -- Node 4: Node 6 Connected
2010-06-18 11:15:09 [MgmSrvr] INFO     -- Node 3: Node 6 Connected
2010-06-18 11:15:09 [MgmSrvr] INFO     -- Node 5: Node 6 Connected
2010-06-18 11:15:09 [MgmSrvr] INFO     -- Node 3: Initial start with nodes 3, 4, 5 and 6 [ missing:  no-wait:  ]
2010-06-18 11:15:09 [MgmSrvr] INFO     -- Node 3: CM_REGCONF president = 3, own Node = 3, our dynamic id = 1
2010-06-18 11:15:10 [MgmSrvr] INFO     -- Node 1: Node 6 Connected
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 3: Node 4: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 4: CM_REGCONF president = 3, own Node = 4, our dynamic id = 2
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 4: Node 3: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 4: Start phase 1 completed 
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 3: Node 5: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 4: Node 5: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 5: CM_REGCONF president = 3, own Node = 5, our dynamic id = 3
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 5: Node 3: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 5: Node 4: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 5: Start phase 1 completed 
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 3: Node 6: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 3: Start phase 1 completed 
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 4: Node 6: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 5: Node 6: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 6: CM_REGCONF president = 3, own Node = 6, our dynamic id = 4
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 6: Node 3: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 6: Node 4: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 6: Node 5: API mysql-5.1.35 ndb-7.0.7
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 6: Start phase 1 completed 
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 4: Start phase 2 completed (initial start)
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 3: Start phase 2 completed (initial start)
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 5: Start phase 2 completed (initial start)
2010-06-18 11:15:11 [MgmSrvr] INFO     -- Node 6: Start phase 2 completed (initial start)
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 3: Start phase 3 completed (initial start)
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 3: Local redo log file initialization status:
#Total files: 4000, Completed: 0
#Total MBytes: 1024000, Completed: 0
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 4: Start phase 3 completed (initial start)
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 4: Local redo log file initialization status:
#Total files: 4000, Completed: 0
#Total MBytes: 1024000, Completed: 0
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 5: Start phase 3 completed (initial start)
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 5: Local redo log file initialization status:
#Total files: 4000, Completed: 0
#Total MBytes: 1024000, Completed: 0
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 6: Start phase 3 completed (initial start)
2010-06-18 11:15:12 [MgmSrvr] INFO     -- Node 6: Local redo log file initialization status:
#Total files: 4000, Completed: 0
#Total MBytes: 1024000, Completed: 0
2010-06-18 11:18:48 [MgmSrvr] INFO     -- Mgmt server state: nodeid 11 reserved for ip 192.168.1.145, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000802.
2010-06-18 11:18:48 [MgmSrvr] INFO     -- Node 11: mysqld --server-id=0

next is data node log

WOPool::init(61, 9)
RWPool::init(22, 13)
WARNING: timerHandlingLab now: 665245465 sent: 665245185 diff: 280
RWPool::init(42, 18)
RWPool::init(62, 13)
Using 1 fragments per node
RWPool::init(c2, 18)
RWPool::init(e2, 16)
WOPool::init(41, 8)
RWPool::init(82, 12)
RWPool::init(a2, 52)
WOPool::init(21, 10)
Time moved forward with 8085 milliseconds
WARNING: timerHandlingLab now: 665253836 sent: 665245761 diff: 8075
2010-06-18 11:15:10 [ndbd] INFO     -- findNeighbours from: 1954 old (left: 65535 right: 65535) new (4 4)
2010-06-18 11:15:10 [ndbd] INFO     -- findNeighbours from: 1954 old (left: 4 right: 4) new (5 4)
2010-06-18 11:15:11 [ndbd] INFO     -- findNeighbours from: 1954 old (left: 5 right: 4) new (6 4)

next is api node log
[root@SQL1 var]# tail -100 SQL1.err 
100618 11:18:47 mysqld_safe Starting mysqld daemon with databases from /mysql/var
100618 11:18:47  InnoDB: Started; log sequence number 0 46429
100618 11:18:47 [Note] NDB: NodeID is 11, management server '192.168.1.159:1186'
100618 11:18:48 [Note] NDB[0]: NodeID: 11, no storage nodes connected (timed out)
100618 11:18:48 [Warning] NDB: server id set to zero will cause any other mysqld with bin log to log with wrong server id
100618 11:18:48 [Note] Starting Cluster Binlog Thread
100618 11:18:48 [Note] Event Scheduler: Loaded 0 events
100618 11:18:48 [Note] /mysql/libexec/mysqld: ready for connections.
Version: '5.1.35-ndb-7.0.7'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution

next is api node log
[root@SQL1 var]# tail -100 SQL1.err 
100618 11:18:47 mysqld_safe Starting mysqld daemon with databases from /mysql/var
100618 11:18:47  InnoDB: Started; log sequence number 0 46429
100618 11:18:47 [Note] NDB: NodeID is 11, management server '192.168.1.159:1186'
100618 11:18:48 [Note] NDB[0]: NodeID: 11, no storage nodes connected (timed out)
100618 11:18:48 [Warning] NDB: server id set to zero will cause any other mysqld with bin log to log with wrong server id
100618 11:18:48 [Note] Starting Cluster Binlog Thread
100618 11:18:48 [Note] Event Scheduler: Loaded 0 events
100618 11:18:48 [Note] /mysql/libexec/mysqld: ready for connections.
Version: '5.1.35-ndb-7.0.7'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution

Suggested fix:
   in internet other people use mysqld_safe--user=mysql & start the mysqld service and disable the iptables 、sellinux .  i use these not result.
[21 Jun 2010 11:37] Hartmut Holzgraefe
According to the log snippets you've shown there was not only a management node restart but a full cluster restart, and the data nodes had not finished starting yet. So the mysql api nodes could not connect to the cluster as it was still starting, not up and running yet.

Can you confirm that this was indeed the case?
[21 Jul 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".