Bug #9994 NDB randomly crashes on one node
Submitted: 19 Apr 2005 8:51
Modified: 16 Sep 2005 11:17
Reporter: Isabel Garcia Lorenzo
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 4.1.11
OS: Linux (Red Hat Linux AS 3.0 (update 4))
Assigned to: Pekka Nousiainen
CPU Architecture: Any

[19 Apr 2005 8:51] Isabel Garcia Lorenzo
Description:
After working for hours under intense load, the NDB process on one of our two storage nodes goes down.

Our config.ini is:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=1000M
IndexMemory=200M
MaxNoOfConcurrentOperations=300000
MaxNoOfConcurrentTransactions=10000
MaxNoOfAttributes=15000
MaxNoOfTables=1600 
ArbitrationTimeout=10000
MaxNoOfOrderedIndexes=1000
MaxNoOfUniqueHashIndexes=1000
LockPagesInMainMemory=TRUE
StartFailureTimeout=0  # Unlimited 
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Management Server
[NDB_MGMD]
HostName=192.168.200.11         # the IP of THIS SERVER
DataDir= /data/db/mysql-cluster
LogDestination=SYSLOG:facility=local4
# Storage Engines
[NDBD]
HostName=192.168.200.9          # the IP of the FIRST SERVER
DataDir= /data/db/mysql-cluster
[NDBD]
HostName=192.168.200.10         # the IP of the SECOND SERVER
DataDir=/data/db/mysql-cluster
# 2 MySQL Clients
# I personally leave this blank to allow rapid changes of the mysql clients;
# you can enter the hostnames of the above two servers here. I suggest you dont.
[MYSQLD]
HostName=192.168.200.9
[MYSQLD]
HostName=192.168.200.10
[MYSQLD]
[MYSQLD]

The NDB log shows the following:

Date/Time: Tuesday 19 April 2005 - 10:00:03
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: Dbdict.cpp
Object of reference: DBDICT (Line: 9356) 0x0000000a
ProgramName: /usr/libexec/ndbd
ProcessID: 32594
TraceFile: /data/db/mysql-cluster/ndb_3_trace.log.18

Beginning and end of ndb_3_trace.log.18

JAM CONTENTS up->down left->right ?=not block entry
BLOCK   ADDR   ADDR   ADDR   ADDR   ADDR   ADDR   ADDR   ADDR   
       ?005653 005720 002780 004866 005781 002504 
DBLQH   005243 005264 005541 005612 
DBTUP   005276 005291 005300 007432 007435 005206 005186 005247 
DBACC   002344 005741 005699 005709 005718 006147 002381 
DBLQH   005628 005653 005720 002780 004866 005781 
DBLQH   002483 002504 
DBLQH   005243 005268 005736 002821 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 
DBLQH   002483 002504 
DBLQH   005243 005268 005736 002821 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 
DBLQH   002483 002504 
DBLQH   005243 005268 005736 002821 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 002504 
DBLQH   005243 005268 005736 002824 004866 005781 
DBLQH   002483 002504 
DBLQH   005243 005264 005541 005612 
DBTUP   005276 005291 005300 007432 007435 005206 005186 005247 
DBACC   002344 005741 005699 005709 005718 006147 002381 
DBLQH   005628 005653 005720 002777 004866 005781 002504 
DBLQH   005243 005264 005541 005612 
DBTUP   005276 005291 005300 007432 007435 005206 005186 005247 
DBACC   002344 005741 005699 005709 005718 006147 002381 
DBLQH   005628 005653 005720 002780 004866 005781 002504 
DBLQH   005243 005264 005541 005612 
DBTUP   005276 005291 005300 007432 007435 005206 005186 005247 
DBACC   002344 005741 005699 005709 005718 006147 002381 
DBLQH   005628 005653 005720 002780 004866 005781 002504 
DBLQH   005243 005264 005541 005612 
DBTUP   005276 005291 005300 007432 007435 005206 005186 005247 
DBACC   002344 005741 005699 005709 005718 006147 002381 
DBLQH   005628 005653 005720 002780 004866 005781 002504 
DBLQH   005243 005264 005541 005612 
DBTUP   005276 005291 005300 007432 007435 005206 005186 005247 
DBACC   002344 005741 005699 005709 005718 002381 
DBLQH   005628 005653 005720 002780 004866 005781 
DBLQH   002483 002504 
DBLQH   005243 005268 005736 002821 004866 005781 
DBLQH   002583 002587 002590 002594 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000674 000929 
DBTC    005968 006002 006026 
DBDICT  000331 000335 004394 
NDBFS   000915 000728 000934 
DBDIH   001074 
DBDICT  004302 
DBTC    000315 
DBDICT  004518 004409 
DBDICT  003698 003721 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  003698 003759 
DBLQH   002483 002538 
DBLQH   002562 
DBUTIL  002366 
DBDICT  031730 
DBDICT  003823 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000934 
DBDIH   000294 000484 007129 007131 007145 007147 
DBACC   000108 000115 
DBDICT  005668 005714 
DBTC    005968 006002 006026 
DBDICT  000171 000174 000280 000280 000280 000280 000280 000280 
        000280 000280 000280 000280 000280 000225 005790 
DBDICT  030858 030881 
DBDICT  030858 030881 030887 030889 030898 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  005668 005714 
DBDICT  000171 000174 000280 000280 000280 000280 000280 000280 
        000280 000280 000280 000280 000280 000225 005790 
DBDICT  030858 030881 
DBDICT  030858 030881 030887 030889 030898 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDIH   000294 000391 009110 009110 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000934 
DBDIH   000294 000464 007209 
DBTC    008295 
DBTC    005968 006002 006026 
DBDIH   009131 009135 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  006043 006049 006055 
DBDICT  006043 006049 006249 
DBDICT  006174 006192 006214 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDIH   009131 009152 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  006174 006192 006232 006256 006326 006333 006376 006382 
        006418 
DBDICT  002688 004587 004596 004599 001427 001430 004665 001504 
        001507 
DBDIH   006158 006176 006300 006308 006308 006311 006300 006308 
        006308 006311 
DBDICT  003610 000280 000280 
DBDICT  003854 003893 000759 000777 000840 
NDBFS   000152 000536 000541 000144 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000661 000929 
DBTC    005968 006002 006026 
DBDICT  000372 000380 
NDBFS   000397 000401 000144 
NDBFS   000915 000728 000934 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000686 000929 
DBTC    005968 006002 006026 
DBDICT  000497 000504 
NDBFS   000203 000227 000144 
NDBFS   000915 000728 000934 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBACC   000108 000149 012108 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000674 000929 
DBTC    005968 006002 006026 
DBDICT  000331 000335 000909 000840 
NDBFS   000152 000536 000541 000144 
NDBFS   000915 000728 000934 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  005668 005714 
DBDICT  000171 000174 000280 000280 000280 000280 000280 000280 
        000280 000280 000280 000280 000280 000225 005790 
DBDICT  030858 030881 
DBDICT  030858 030881 030887 030889 030898 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000661 000929 
DBTC    005968 006002 006026 
DBDICT  000372 000380 
NDBFS   000397 000401 000144 
NDBFS   000915 000728 000934 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000686 000929 
DBTC    005968 006002 006026 
DBDICT  000497 000504 
NDBFS   000203 000227 000144 
NDBFS   000915 000728 000934 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBTUP   024489 024538 
DBDICT  005668 005714 
DBDICT  000171 000174 000280 000280 000225 005790 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000674 000929 
DBTC    005968 006002 006026 
DBDICT  000331 000335 003962 000577 
NDBFS   000152 000536 000541 000144 
NDBFS   000915 000728 000934 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000661 000929 
DBTC    005968 006002 006026 
DBDICT  000372 000405 
NDBFS   000397 000401 000144 
NDBFS   000915 000728 000934 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  006590 006598 006600 
DBDICT  006590 006598 006768 
DBDICT  006700 006717 006739 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  006700 006717 006757 006798 
DBDICT  009111 009121 009129 
DBDICT  009111 009121 009362 
DBDICT  009215 009233 009300 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  009215 009233 009345 009590 
DBDICT  010753 010797 
DBDICT  010753 010797 010969 
DBDICT  010880 010897 010940 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  010880 010897 010958 010975 
DBDICT  011123 011131 011139 
DBDICT  011123 011131 011351 011542 011555 
DBDICT  011222 011239 011286 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  011222 011239 011305 
DBDICT  011123 011187 011434 
DBLQH   018552 
DBTUP   007166 007093 007321 007321 007323 
DBLQH   018562 
DBDICT  010880 010897 010923 011482 011542 011555 
DBDICT  011222 011239 011286 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  011222 011239 011323 
DBDICT  011123 011187 011434 011443 011542 011555 
DBDICT  011222 011239 011286 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  011222 011239 011314 
DBDICT  011123 011196 011502 011542 011555 
DBDICT  011222 011239 011286 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
QMGR    000088 000112 001449 001472 001486 
NDBFS   000915 000917 000728 000730 000656 000686 000929 
DBDICT  011222 011239 011291 011542 011558 
DBTC    005968 006002 006026 
DBDICT  000497 
NDBFS   000203 000227 000144 
NDBFS   000915 000728 000934 
DBDICT  011222 011239 011254 010992 
DBDICT  010753 010855 011016 011024 
DBDICT  010880 010897 010940 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  010880 010897 010945 
DBDICT  010880 010897 010902 009648 
DBDICT  009111 009190 009710 
DBDICT  009215 009233 009300 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  009215 009233 009305 
DBDICT  009215 009233 009248 006815 006828 
DBDICT  004964 004982 
DBDICT  005575 006841 006843 
DBDICT  006590 006673 006861 
DBDICT  006700 006717 006739 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  006700 006717 006744 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  006590 006598 006600 
DBDICT  006590 006598 006768 
DBDICT  006700 006717 006739 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  006700 006717 006757 006798 
DBDICT  009111 009121 009129 
DBDICT  009111 009121 009362 
DBDICT  009215 009233 009300 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  009215 009233 009345 009590 009639 
DBDICT  009111 009181 009432 009437 
DBDICT  009215 009233 009300 
DBLQH   002583 
DBTC    003973 
DBTUP   002032 
DBDICT  009215 009233 009356 

...
...
...
...

--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413519 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 394413518 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000003 H'00001800
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413518 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 394413517 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000003 H'00001400
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413517 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 394413516 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000003 H'00001000
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413516 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 394413515 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000003 H'00000c00
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413515 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 394413514 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000003 H'00000800
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413514 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 394413510 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000003 H'00000400
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 394413513 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 394413509 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 394413512 gsn: 257 "FSCLOSEREQ" prio: 0
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413511 length: 4 trace: 0 #sec: 0 fragInf: 0
 UserPointer: 0
 FilePointer: 12116
 UserReference: H'00fa0003
 Flags: H'00000000, Don't remove file
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413511 gsn: 270 "FSWRITECONF" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 394413509 length: 1 trace: 0 #sec: 0 fragInf: 0
 UserPointer: 0
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413510 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 3, s.sigId: 394413508 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 394413509 gsn: 164 "CONTINUEB" prio: 0
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 394413507 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel every 10ms
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 3, r.sigId: 394413508 gsn: 164 "CONTINUEB" prio: 0
s.bn: 252 "QMGR", s.proc: 3, s.sigId: 394413506 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413505 gsn: 511 "CREATE_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 5045590 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000506 H'00fa0003 H'000000f0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413504 gsn: 511 "CREATE_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413503 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000506 H'00fa0003 H'000000f0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413503 gsn: 510 "CREATE_INDX_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413502 length: 8 trace: 0 #sec: 0 fragInf: 0
 H'00000506 H'00fa0003 H'000000f0 H'00000006 H'00000006 H'13579753 H'13579753
 H'00000001
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413502 gsn: 588 "CREATE_TABLE_REF" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413501 length: 7 trace: 0 #sec: 0 fragInf: 0
 H'00000506 H'00fa0003 H'00000003 H'000002bd H'00000000 H'096c2e08 H'082748e0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413501 gsn: 587 "CREATE_TABLE_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413500 length: 2 trace: 0 #sec: 1 fragInf: 0
 H'00000506 H'00fa0003
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413500 gsn: 511 "CREATE_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 5045590 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000506 H'00fa0003 H'00000010
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413499 gsn: 511 "CREATE_INDX_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413498 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000506 H'00fa0003 H'00000010
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413498 gsn: 510 "CREATE_INDX_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413497 length: 9 trace: 0 #sec: 2 fragInf: 0
 H'00000000 H'80120005 H'00000001 H'00000006 H'00000006 H'13579753 H'13579753
 H'00000001 H'00000506
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413497 gsn: 510 "CREATE_INDX_REQ" prio: 1
s.bn: 32786 "API", s.proc: 5, s.sigId: 0 length: 8 trace: 0 #sec: 2 fragInf: 0
 H'00000000 H'80120005 H'00000001 H'00000006 H'00000006 H'13579753 H'13579753
 H'00000001
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 394413496 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 394413492 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 394413495 gsn: 272 "FSWRITEREQ" prio: 0
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 394413494 length: 8 trace: 0 #sec: 0 fragInf: 0
 UserPointer: 0
 FilePointer: 12116
 UserReference: H'00fa0003 Operation flag: H'00000011 (Sync, Format=Array of pages)
 varIndex: 1
 numberOfPages: 1
 pageData:  H'00000000, H'00000000

--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 394413494 gsn: 259 "FSOPENCONF" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 394413492 length: 3 trace: 0 #sec: 0 fragInf: 0
 UserPointer: 0
 FilePointer: 12116
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 394413493 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 3, s.sigId: 394413491 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
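
Several words in the signal dump above, such as H'00fa0003, look like packed block references: 0x00fa is 250, the block number printed for DBDICT, and 0x0003 matches the node id (proc: 3). The following Python sketch splits such a word under that assumption. The 16/16-bit layout and the function names are illustrative guesses drawn from this trace, not taken from the NDB source.

```python
from typing import Tuple


def parse_trace_word(word: str) -> int:
    """Convert a trace word such as "H'00fa0003" to an integer."""
    if not word.startswith("H'"):
        raise ValueError("expected a word of the form H'xxxxxxxx")
    return int(word[2:], 16)


def split_block_reference(ref: int) -> Tuple[int, int]:
    """Split a 32-bit value into (block number, node id).

    Assumption: upper 16 bits hold the block number, lower 16 bits the node id.
    """
    block_number = (ref >> 16) & 0xFFFF
    node_id = ref & 0xFFFF
    return block_number, node_id


if __name__ == "__main__":
    # UserReference value taken from the FSCLOSEREQ signal in the trace.
    print(split_block_reference(parse_trace_word("H'00fa0003")))
```

Under this reading, H'00fa0002 in the later node 2 trace would be the same block (250, DBDICT) on node 2.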

How to repeat:
Load the cluster heavily with inserts/updates. The crash does not seem to happen deterministically.
[19 Apr 2005 9:25] Pekka Nousiainen
Probably caused by simultaneous table/index drop
from 2 different mysql sessions.
[19 Apr 2005 10:29] Martin Skold
Looking at the trace, it is not a random event that causes the
data node failure. A failed CREATE TABLE seems to run into
a bug when trying to clean up (it looks like a problem
with dropping the table's related indexes).
Is this really a showstopper?
Did the node recover after the restart, and was it
possible to perform the CREATE TABLE later?
[19 Apr 2005 15:17] Isabel Garcia Lorenzo
Thanks for the quick response!

We are not sure which table caused the error, but after a cluster shutdown we can't bring up either of the 2 NDB nodes. The error we get is very similar to the one we reported before (maybe the first error belonged to the NDB node trying to restart?).

Right now, when we try to start an NDB node, this is what we get:

CLUSTER LOG
------------------
Apr 19 16:54:18 vtlinuxdes3 NDB[19393]: [MgmSrvr] NDB Cluster Management Server. Version 4.1.11
Apr 19 16:54:18 vtlinuxdes3 NDB[19393]: [MgmSrvr] Id: 1, Command port: 1187
Apr 19 16:54:18 vtlinuxdes3 ndb_mgmd: ndb_mgmd startup succeeded
Apr 19 16:54:37 vtlinuxdes3 NDB[19393]: [MgmSrvr] Mgmt server state: nodeid 2 reserved for ip 192.168.200.11, m_reserved_nodes 0000000000000006.
Apr 19 16:54:37 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 1: Node 2 Connected
Apr 19 16:55:10 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Start phase 1 completed 
Apr 19 16:56:10 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Start phase 2 completed (system restart)
Apr 19 16:56:10 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Start phase 3 completed (system restart)
Apr 19 16:56:34 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 1: Node 2 Disconnected
Apr 19 16:56:35 vtlinuxdes3 NDB[19393]: [MgmSrvr] Mgmt server state: nodeid 2 freed, m_reserved_nodes 0000000000000002.
Apr 19 16:59:05 vtlinuxdes3 NDB[19393]: [MgmSrvr] Mgmt server state: nodeid 3 reserved for ip 192.168.200.12, m_reserved_nodes 000000000000000a.
Apr 19 16:59:06 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 1: Node 3 Connected
Apr 19 16:59:19 vtlinuxdes3 NDB[19393]: [MgmSrvr] Mgmt server state: nodeid 2 reserved for ip 192.168.200.11, m_reserved_nodes 000000000000000e.
Apr 19 16:59:19 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 1: Node 2 Connected
Apr 19 16:59:20 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 3: Node 2 Connected
Apr 19 16:59:20 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Node 3 Connected
Apr 19 16:59:20 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Start phase 1 completed 
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 3: CM_REGCONF president = 2, own Node = 3, our dynamic id = 2
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Node 3: API version 4.1.11
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 3: Node 2: API version 4.1.11
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 3: Start phase 1 completed 
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Start phase 2 completed (system restart)
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 3: Start phase 2 completed (system restart)
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 2: Start phase 3 completed (system restart)
Apr 19 16:59:21 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 3: Start phase 3 completed (system restart)
Apr 19 16:59:45 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 1: Node 2 Disconnected
Apr 19 16:59:45 vtlinuxdes3 NDB[19393]: [MgmSrvr] Node 1: Node 3 Disconnected
Apr 19 16:59:45 vtlinuxdes3 NDB[19393]: [MgmSrvr] Mgmt server state: nodeid 2 freed, m_reserved_nodes 000000000000000a.
Apr 19 16:59:45 vtlinuxdes3 NDB[19393]: [MgmSrvr] Mgmt server state: nodeid 3 freed, m_reserved_nodes 0000000000000002.
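
The m_reserved_nodes values in the cluster log above read naturally as a hexadecimal bitmask with one bit per node id: 0x6 after node 2 is reserved (bits 1 and 2), and 0x2 after it is freed (bit 1, presumably the management server with Id 1). A minimal Python sketch of that reading follows. The interpretation is an assumption inferred from this log, and reserved_node_ids is a hypothetical helper name.

```python
from typing import List


def reserved_node_ids(mask_hex: str) -> List[int]:
    """Return the bit positions set in a hex mask, read as node ids.

    Assumption: bit N set means node id N is currently reserved.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Values copied from the cluster log, in the order they appear.
    for value in ("0000000000000006", "0000000000000002",
                  "000000000000000a", "000000000000000e"):
        print(value, reserved_node_ids(value))
```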

NDB LOG
------------
Date/Time: Tuesday 19 April 2005 - 16:59:44
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: Dbdict.cpp
Object of reference: DBDICT (Line: 2108) 0x00000002
ProgramName: ndbd
ProcessID: 3709
TraceFile: /data/db/mysql-cluster/ndb_2_trace.log.18
***EOM***

TRACE FILE
-------------
JAM CONTENTS up->down left->right ?=not block entry
BLOCK   ADDR   ADDR   ADDR   ADDR   ADDR   ADDR   ADDR   ADDR
DBDICT ?002243 002095 002237 002243 002095 002237 002243 002095
        002237 002243 002095 002237 002243 002095 002237 002243
        002095 002237 002243 002095 002237 002243 002095 002237
        002243 002095 002237 002243 002095 002237 002243 002095
        002237 002243 002095 002237 002243 002095 002237 002243
        002095 002237 002243 002095 002237 002243 002095 002237
        002243 002095 002237 002243 002095 002237 002243 002095
        002237 002243 002095 002237 002243 002095 002237 002243
        002095 002237 002243 002095 002237 002243 002095 002237
        002243 002095 002237 002243 002095 002237 002243 002095
        002237 002243 002095 002237 002243 002095 002237 002243
        002095 002237 002243 002095 002237 002243 002095 002237
        002243 002095 002237 002243 002095 002237 002243 002095
        002237 002243 002095 002237 002243 002095 002237 002243
        002095 002237 002243 002095 002237 002243 002095 002237
        002243 002095 002237 002243 002095 002237 002243 002095
        002237 002243 002095 002237 002243 002095 002237 002243

--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 675211 gsn: 403 "TC_SCHVERCONF" prio: 1
s.bn: 245 "DBTC", s.proc: 2, s.sigId: 675210 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000082 H'00000083
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 675210 gsn: 404 "TC_SCHVERREQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 675209 length: 6 trace: 0 #sec: 0 fragInf: 0
 H'00000082 H'00000006 H'00000000 H'00fa0002 H'00000006 H'00000083
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 675209 gsn: 396 "TAB_COMMITCONF" prio: 1
s.bn: 246 "DBDIH", s.proc: 2, s.sigId: 675208 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000083 H'00000002 H'00000082
--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 2, r.sigId: 675208 gsn: 398 "TAB_COMMITREQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 675207 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000083 H'00fa0002 H'00000082
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 675207 gsn: 396 "TAB_COMMITCONF" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 675206 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000083 H'00000002 H'00000082
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 675206 gsn: 398 "TAB_COMMITREQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 675205 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000083 H'00fa0002 H'00000082
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 675205 gsn: 185 "DIADDTABCONF" prio: 1
s.bn: 246 "DBDIH", s.proc: 2, s.sigId: 675204 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000083
--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 2, r.sigId: 675204 gsn: 109 "ADD_FRAGCONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 675203 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00018ea8 H'00000001
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 675203 gsn: 308 "LQHADDATTCONF" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 675202 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000083 H'ffffff00 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 675202 gsn: 674 "TUX_ADD_ATTRCONF" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 675201 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 675201 gsn: 674 "TUX_ADD_ATTRCONF" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 675200 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000

....
....
...
...

--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670288 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670287 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000001 H'00000000 H'00000009 H'0018023e H'0008000e
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670287 gsn: 415 "TUP_ADD_ATTCONF" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 670286 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670286 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670285 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000 H'00000008 H'00010331 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670285 gsn: 415 "TUP_ADD_ATTCONF" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 670284 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670284 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670283 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000001 H'00000000 H'00000008 H'00010331 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670283 gsn: 415 "TUP_ADD_ATTCONF" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 670282 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670282 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670281 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000 H'00000007 H'00010331 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670281 gsn: 415 "TUP_ADD_ATTCONF" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 670280 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670280 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670279 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000001 H'00000000 H'00000007 H'00010331 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670279 gsn: 415 "TUP_ADD_ATTCONF" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 670278 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670278 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670277 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000 H'00000006 H'00010331 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670277 gsn: 415 "TUP_ADD_ATTCONF" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 670276 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670276 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670275 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000001 H'00000000 H'00000006 H'00010331 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670275 gsn: 310 "LQHADDATTREQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 670274 length: 19 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000005 H'0000006d H'ffffff00 H'00000006 H'00010331 H'00000001
 H'00000007 H'00010331 H'00000001 H'00000008 H'00010331 H'00000001 H'00000009
 H'0018023e H'0008000e H'0000000a H'00010251 H'00000007
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 670274 gsn: 308 "LQHADDATTCONF" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670273 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000006d H'00000232 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 670273 gsn: 415 "TUP_ADD_ATTCONF" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 670272 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 670272 gsn: 417 "TUP_ADD_ATTRREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 670271 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000 H'00000005 H'00010351 H'00000007
[20 Apr 2005 12:45] Isabel Garcia Lorenzo
Thanks for the quick response!

We are not sure which table caused the error, but after a cluster shutdown we can't
bring up either of the 2 NDB nodes. The error we get is very similar to the one
we reported before (maybe the first error belonged to the NDB node trying to
restart?).
[21 Apr 2005 13:53] Martin Skold
Did you try starting the node with ndbd --initial?
[22 Apr 2005 8:49] Francisco Javier Garcia Humphries
We have started with --initial. We lose all our NDB tables (as expected) and restore/recreate them. The problem is that the ndbrequire issue keeps happening again and again, corrupting our nodes every time.

Any idea why this is happening so often?
[22 Apr 2005 8:56] Jonas Oreland
Hi, just checking...

Could it be that you have changed MaxNoOfTables (or similar)?

The ndbrequire indicates that a table is defined with a tableId larger than
  the currently allocated pool.
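
To illustrate the kind of check described here, a toy Python sketch of a fixed-size record pool that refuses an id beyond its allocated size. It is only an analogy: TablePool and its methods are hypothetical names, and the real check lives in Dbdict.cpp, which is not shown in this report.

```python
class TablePool:
    """Toy fixed-size pool, sized once at startup (hypothetical illustration)."""

    def __init__(self, max_tables: int) -> None:
        # The pool size is fixed from a configured limit, as with MaxNoOfTables.
        self._records = [None] * max_tables

    def get(self, table_id: int):
        # Stand-in for a hard assertion: an id outside the pool is fatal.
        if not 0 <= table_id < len(self._records):
            raise RuntimeError(
                "tableId %d is outside the allocated pool of %d records"
                % (table_id, len(self._records))
            )
        return self._records[table_id]


if __name__ == "__main__":
    pool = TablePool(max_tables=4)
    pool.get(3)      # last valid id
    try:
        pool.get(4)  # one past the end of the pool
    except RuntimeError as err:
        print(err)
```

In this picture, stored metadata that refers to an id at or above the configured limit would trip the check on every startup, which is why a lowered limit is a natural first suspect.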
[22 Apr 2005 9:38] Francisco Javier Garcia Humphries
Hi

We have used different values for MaxNoOfTables, but we haven't changed the value since the last --initial. Just now, I changed the parameter to the maximum (1600) and tried to start, but the nodes are still not coming up (they disconnect in phase 4).
[22 Apr 2005 9:43] Isabel Garcia Lorenzo
Trace from the last startup attempt with MaxNoOfTables=1600

Attachment: ndb_2_trace.log.zip (application/x-zip-compressed, text), 70.04 KiB.

[22 Apr 2005 9:43] Isabel Garcia Lorenzo
We have added a trace file from the last startup attempt.
[22 May 2005 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[15 Sep 2005 13:11] Pekka Nousiainen
Patched fixes from 5.0 bug#11355.
[16 Sep 2005 11:17] Jon Stephens
Thank you for your bug report. This issue has already been fixed
in the latest released version of that product, which you can download at 
http://www.mysql.com/downloads/

Additional info:

Documented fix in 5.0.10 changelog.