Description:
In testing DN failure and recovery for cluster, Data Node ID3 had just completed recovery when Data Node ID2 failed. I had issued a sig 11 of ID3 and then restarted it.
Cluster log as follows:
2006-07-12 21:12:13 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
2006-07-12 21:12:13 [MgmSrvr] ALERT -- Node 3: Node 5 Disconnected
2006-07-12 21:12:13 [MgmSrvr] INFO -- Node 3: Communication to Node 5 closed
2006-07-12 21:12:13 [MgmSrvr] INFO -- Node 3: Started arbitrator node 1 [ticket=1eff0005642750b0]
2006-07-12 21:12:14 [MgmSrvr] INFO -- Mgmt server state: nodeid 5 freed, m_reserved_nodes 0000000000000012.
2006-07-12 21:12:14 [MgmSrvr] INFO -- Mgmt server state: nodeid 4 freed, m_reserved_nodes 0000000000000002.
2006-07-12 21:12:15 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Error log for ID2:
Time: Wednesday 12 July 2006 - 21:12:12
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 13547) 0x0000000e
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 29686
Trace: /space/run/ndb_2_trace.log.1
Version: Version 5.1.12 (beta)
Trace File:
BLOCK ADDR ADDR ADDR ADDR ADDR ADDR ADDR ADDR
?009195 009206 009243 002971
DBTUX 060382
DBTUX 060285 060304 060306
DBTUX 060388 060434 010224 010224 010224 060781 060781 060781
060781 060781 060781
DBTUP 028109
DBTUX 040064 040577 040580 060796 060844 060852
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 459739878 gsn: 420 "DICT_UNLOCK_ORD" prio: 1
s.bn: 246 "DBDIH", s.proc: 3, s.sigId: 20556091 length: 2 trace: 1 #sec: 0 fragInf: 0
H'00000000 H'00000001
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 459739877 gsn: 517 "DROP_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: -1 length: 3 trace: 0 #sec: 0 fragInf: 0
H'00036319 H'00fa0002 H'00000010
--------------- Signal ----------------
r.bn: 258 "DBTUX", r.proc: 2, r.sigId: 459739876 gsn: 332 "NEXT_SCANREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 459739870 length: 3 trace: 1 #sec: 0 fragInf: 0
H'00000010 H'ffffffff H'00000002
--------------- Signal ----------------
r.bn: 258 "DBTUX", r.proc: 2, r.sigId: 459739875 gsn: 332 "NEXT_SCANREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 459739869 length: 3 trace: 1 #sec: 0 fragInf: 0
H'00000027 H'ffffffff H'00000002
--------------- Signal ----------------
r.bn: 248 "DBACC", r.proc: 2, r.sigId: 459739874 gsn: 332 "NEXT_SCANREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 459739868 length: 3 trace: 1 #sec: 0 fragInf: 0
H'00000006 H'000075c5 H'00000003
--------------- Signal ----------------
r.bn: 248 "DBACC", r.proc: 2, r.sigId: 459739873 gsn: 72 "ACC_CHECK_SCAN" prio: 1
s.bn: 248 "DBACC", s.proc: 2, s.sigId: 459739867 length: 2 trace: 1 #sec: 0 fragInf: 0
H'0000000b H'00000000
How to repeat:
This happened during an attempt to repeat bug#21015
I have not been able to repeat it yet.
Testing
Master Cluster
3 Hosts
N07: NDBD
N08: NDBD, MySQLD, ndb_mgm
N09: MySQLD
Slave Cluster
3 Hosts
N10: NDBD
N11: NDBD, ndb_mgm
N12: MySQLD
Test running on master:
N08: cid_ndb_dd.pl
N09: DBT2
I had issues a killall -11 ndbd on N07. No MySQLD crash this time as I think that I had missed the delete call. So I restarted NDBD on N07 and allowed it to recover. Then at the end of the recovery, NDBD on N08 failed and the program running on this host reported:
Table TESTER.t1 Dropped.
Database TESTER Dropped.
Disconnected from MySQLD
************ End Loop **************
********* Start New Loop ***********
Connected to MySQLD
Database Created.
Creating TESTER.t1, please stand by....
DBD::mysql::st execute failed: Table 't1' already exists at cid_ndb_dd.pl line 460.
Create Table Error: Table 't1' already exists at cid_ndb_dd.pl line 460.
This seems like another bug.