Bug #16481 DD: Cluster fails during load of TPC-B tables
Submitted: 13 Jan 2006 14:07 Modified: 14 Feb 2006 9:15
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.1.6-alpha
OS: Linux (Linux)
Assigned to: Jonas Oreland
CPU Architecture: Any

[13 Jan 2006 14:07] Jonathan Miller
Description:
Using a 3-host cluster: ndb08 = ndb_mgmd and mysqld; ndb07 = 2 data nodes; ndb09 = 2 data nodes.

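Logfile group (LG) and tablespace (TS) creation:
# Create the logfile group that holds the undo log.
$sth = $dbhM->prepare("CREATE LOGFILE GROUP TPCB_LOG
                       ADD UNDOFILE './tpcb_log/undofile.dat'
                       INITIAL_SIZE 150M
                       UNDO_BUFFER_SIZE = 1M
                       ENGINE=NDB;")
  or die "Prepare CREATE LOGFILE GROUP error: ", $dbhM->errstr;
# Create the tablespace that uses the logfile group above.
$sth = $dbhM->prepare("CREATE TABLESPACE TPCB_TS
                       ADD DATAFILE './tpcb_ts/datafile.dat'
                       USE LOGFILE GROUP TPCB_LOG
                       INITIAL_SIZE 350M
                       ENGINE=NDB;")
  or die "Prepare CREATE TABLESPACE error: ", $dbhM->errstr;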

NDB Errors:
Time: Friday 13 January 2006 - 14:45:51
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4009
Trace: /space/run/ndb_4_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 14:46:11
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4086
Trace: /space/run/ndb_4_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***

Current byte-offset of file-pointer is: 1067

Time: Friday 13 January 2006 - 14:45:54
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4012
Trace: /space/run/ndb_6_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4085
Trace: /space/run/ndb_6_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***

[ndbdev@ndb07 run]$ cat *err*
Current byte-offset of file-pointer is: 1067

Time: Friday 13 January 2006 - 14:45:53
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 376
Trace: /space/run/ndb_5_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 454
Trace: /space/run/ndb_5_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***

Current byte-offset of file-pointer is: 1067

Time: Friday 13 January 2006 - 14:45:54
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 379
Trace: /space/run/ndb_7_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
 
Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 455
Trace: /space/run/ndb_7_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
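
The error-log entries above all share a `Key: value` layout terminated by `***EOM***`. A minimal sketch, assuming that layout holds for every entry, of how such a dump could be tallied by error code, kernel block, and source line. The names `parse_ndb_error_log` and `summarize` are hypothetical helpers for illustration, not part of any NDB tool.

```python
import re
from collections import Counter

# Matches the start of e.g. "LGMAN (Line: 1605) 0x00000008"
_OBJECT_RE = re.compile(r"^(?P<block>\w+) \(Line: (?P<line>\d+)\)")


def parse_ndb_error_log(text):
    """Split an error-log dump into one dict per entry (hypothetical helper)."""
    entries, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":
            # End-of-message marker closes the current entry.
            if current:
                entries.append(current)
            current = {}
            continue
        if line.startswith("Current byte-offset"):
            # File header line, not part of any entry.
            continue
        key, sep, value = line.partition(": ")
        if sep:
            current[key] = value
    return entries


def summarize(entries):
    """Count entries per (error code, kernel block, source line)."""
    counts = Counter()
    for entry in entries:
        m = _OBJECT_RE.match(entry.get("Error object", ""))
        block, src_line = (m["block"], int(m["line"])) if m else ("?", 0)
        counts[(int(entry.get("Error", "0")), block, src_line)] += 1
    return counts
```

Calling `summarize(parse_ndb_error_log(text))` on the concatenated error files gives one counter per distinct failure site, which makes it easier to see which asserts fire on the first crash and which on the restart.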

Cluster log:

2006-01-13 14:45:52 [MgmSrvr] INFO     -- Node 7: GCP Take over completed
2006-01-13 14:45:53 [MgmSrvr] ALERT    -- Node 4: Forced node shutdown completed, restarting. Initiated by signal 6. Caused by error 2343: 'Internal program error (failed ndbassert)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-01-13 14:45:53 [MgmSrvr] WARNING  -- Allocate nodeid (4) failed. Connection from ip XX.XXX.1.94. Returned error string "Id 4 already allocated by another node."
2006-01-13 14:45:53 [MgmSrvr] INFO     -- Mgmt server state: node id's  4 5 6 7 connected but not reserved
2006-01-13 14:45:53 [MgmSrvr] INFO     -- Mgmt server state: node id's  1 not connected but reserved
2006-01-13 14:45:54 [MgmSrvr] INFO     -- Node 1: Node 5 Connected
2006-01-13 14:45:54 [MgmSrvr] ALERT    -- Node 6: Node 5 Disconnected
2006-01-13 14:45:54 [MgmSrvr] INFO     -- Node 6: Communication to Node 5 closed
2006-01-13 14:45:54 [MgmSrvr] ALERT    -- Node 7: Node 5 Disconnected
2006-01-13 14:45:54 [MgmSrvr] INFO     -- Node 7: Communication to Node 5 closed
2006-01-13 14:45:54 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed, restarting. Initiated by signal 6. Caused by error 2343: 'Internal program error (failed ndbassert)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-01-13 14:45:54 [MgmSrvr] WARNING  -- Allocate nodeid (5) failed. Connection from ip XX.XXX.1.92. Returned error string "Id 5 already allocated by another node."

2006-01-13 14:46:11 [MgmSrvr] ALERT    -- Node 7: Forced node shutdown completed, restarting. Occured during startphase 4. Initiated by signal 6. Caused by error 2343: 'Internal program error (failed ndbassert)(Internal error, programming error or missing error message, please report a bug). Temp
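
The cluster log lines above follow a `timestamp [source] SEVERITY -- message` shape. A rough sketch, assuming that shape, of extracting the forced-shutdown alerts together with their node id and error code. `parse_cluster_log` and `forced_shutdowns` are hypothetical helper names used only for illustration.

```python
import re
from datetime import datetime

# "2006-01-13 14:45:52 [MgmSrvr] INFO     -- Node 7: GCP Take over completed"
_LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<source>\w+)\] (?P<severity>\w+)\s+-- (?P<message>.*)$"
)
# "Node 4: Forced node shutdown completed, ... Caused by error 2343: ..."
_SHUTDOWN_RE = re.compile(
    r"^Node (?P<node>\d+): Forced node shutdown completed.*?"
    r"Caused by error (?P<error>\d+)"
)


def parse_cluster_log(text):
    """Return one dict per well-formed cluster log line; other lines are skipped."""
    records = []
    for line in text.splitlines():
        m = _LINE_RE.match(line.strip())
        if m:
            rec = m.groupdict()
            rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S")
            records.append(rec)
    return records


def forced_shutdowns(records):
    """List (timestamp, node id, error code) for each forced-shutdown ALERT."""
    result = []
    for rec in records:
        m = _SHUTDOWN_RE.match(rec["message"])
        if rec["severity"] == "ALERT" and m:
            result.append((rec["ts"], int(m["node"]), int(m["error"])))
    return result
```

Because the second pattern only needs the text up to the error number, it still matches a line whose tail has been cut off, as in the last entry above.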

How to repeat:
Set up a 3-host cluster and run the load_tpcb.pl script.
[13 Jan 2006 14:36] Jonathan Miller
Tried using two undo log files:

$sth = $dbhM->prepare("ALTER LOGFILE GROUP TPCB_LOG
                       ADD UNDOFILE './tpcb_log/undofile2.dat'
                       INITIAL_SIZE 150M
                       ENGINE=NDB;")
  or die "Prepare ALTER LOGFILE GROUP error: ", $dbhM->errstr;

The cluster actually failed much faster than before.

NDB Errors:

Time: Friday 13 January 2006 - 15:24:12
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 961
Trace: /space/run/ndb_5_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 1037
Trace: /space/run/ndb_5_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 15:24:12
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4651
Trace: /space/run/ndb_4_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4728
Trace: /space/run/ndb_4_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 15:24:14
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4654
Trace: /space/run/ndb_6_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4727
Trace: /space/run/ndb_6_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***

Time: Friday 13 January 2006 - 15:24:13
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 964
Trace: /space/run/ndb_7_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
 
Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 1038
Trace: /space/run/ndb_7_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
[3 Feb 2006 14:35] Jonas Oreland
I fixed a bunch of bugs in this area, so this has probably been fixed.

The problem is related to a "small" UNDO_BUFFER_SIZE.
[7 Feb 2006 5:53] Jonas Oreland
Pushed into 5.1.7.
[14 Feb 2006 9:15] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented fix in 5.1.7 changelog; closed bug.