| Bug #16481 | DD: Cluster fails during load of TPC-B tables | ||
|---|---|---|---|
| Submitted: | 13 Jan 2006 14:07 | Modified: | 14 Feb 2006 9:15 |
| Reporter: | Jonathan Miller | | |
| Status: | Closed | | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
| Version: | 5.1.6-alpha | OS: | Linux (Linux) |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[13 Jan 2006 14:36]
Jonathan Miller
Tried to use 2 undo logs:
$sth = $dbhM->prepare("ALTER LOGFILE GROUP TPCB_LOG
ADD UNDOFILE './tpcb_log/undofile2.dat'
INITIAL_SIZE 150M
ENGINE=NDB;")
Cluster actually failed much faster than before.
NDB Errors:
Time: Friday 13 January 2006 - 15:24:12
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 961
Trace: /space/run/ndb_5_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 1037
Trace: /space/run/ndb_5_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 15:24:12
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4651
Trace: /space/run/ndb_4_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4728
Trace: /space/run/ndb_4_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 15:24:14
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4654
Trace: /space/run/ndb_6_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4727
Trace: /space/run/ndb_6_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 15:24:13
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 964
Trace: /space/run/ndb_7_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 15:24:30
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 1038
Trace: /space/run/ndb_7_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
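The error-log entries above share a fixed layout: a run of `Field: value` lines closed by `***EOM***`. As a rough sketch (in Python rather than the Perl used by the load script, and not part of the original report), the records could be tallied by error code and failing block. The names `parse_error_log` and `summarize` are hypothetical helpers, and the sample text is abridged from the entries above.

```python
import re
from collections import Counter

# Field names as they appear in the error-log records quoted in this report.
FIELD_RE = re.compile(
    r"^(Time|Status|Message|Error|Error data|Error object|Program|Pid|Trace|Version):\s*(.*)$"
)


def parse_error_log(text):
    """Split an NDB error log into dict records, one per ***EOM*** marker."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "***EOM***":
            if current:
                records.append(current)
            current = {}
            continue
        match = FIELD_RE.match(line)
        if match:
            current[match.group(1)] = match.group(2)
    return records


def summarize(records):
    """Count records by (error code, error object)."""
    return Counter((r.get("Error"), r.get("Error object")) for r in records)


# Abridged sample: two records copied from the log above, some fields omitted.
SAMPLE = """\
Time: Friday 13 January 2006 - 15:24:12
Status: Temporary error, restart node
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Pid: 961
Trace: /space/run/ndb_5_trace.log.1
***EOM***
Time: Friday 13 January 2006 - 15:24:14
Status: Temporary error, restart node
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Pid: 4654
Trace: /space/run/ndb_6_trace.log.1
***EOM***
"""

if __name__ == "__main__":
    for (code, obj), count in summarize(parse_error_log(SAMPLE)).items():
        print(code, obj, count)
```

Lines that do not start with a known field name (for example `Current byte-offset of file-pointer is: 1067`) are simply skipped, so the same function should cope with the raw error files as well.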
[3 Feb 2006 14:35]
Jonas Oreland
I fixed a bunch of bugs in this area, so this has probably been fixed. The problem is related to a "small" undo_buffer_size.
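Since the comment above points at the undo buffer size, anyone retesting may want to vary `UNDO_BUFFER_SIZE` between runs. The sketch below (Python, not the reporter's Perl) only builds the `CREATE LOGFILE GROUP` statement in the shape used in this report. The helper name `create_logfile_group_sql` and the `8M` value in the usage example are assumptions for illustration. The report does not say what size counts as large enough, nor that a larger buffer avoids the assert on 5.1.6.

```python
import re

# Assumption: sizes are plain digits with an optional M or G suffix,
# matching the "150M" / "1M" forms that appear in the report.
SIZE_RE = re.compile(r"\d+[MG]?")
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_logfile_group_sql(name, undofile, initial_size, undo_buffer_size):
    """Build a CREATE LOGFILE GROUP statement like the one in the report."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"unexpected logfile group name: {name!r}")
    if "'" in undofile:
        raise ValueError("undo file path must not contain a single quote")
    for label, size in (("INITIAL_SIZE", initial_size),
                        ("UNDO_BUFFER_SIZE", undo_buffer_size)):
        if not SIZE_RE.fullmatch(size):
            raise ValueError(f"unexpected {label} value: {size!r}")
    return (
        f"CREATE LOGFILE GROUP {name} "
        f"ADD UNDOFILE '{undofile}' "
        f"INITIAL_SIZE {initial_size} "
        f"UNDO_BUFFER_SIZE = {undo_buffer_size} "
        f"ENGINE=NDB;"
    )


if __name__ == "__main__":
    # 1M is the value from the report; 8M is an arbitrary larger value.
    for buffer_size in ("1M", "8M"):
        print(create_logfile_group_sql(
            "TPCB_LOG", "./tpcb_log/undofile.dat", "150M", buffer_size))
```

The generated string can then be passed to whatever client runs the load, in the same way the reporter passes it to `$dbhM->prepare`.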
[7 Feb 2006 5:53]
Jonas Oreland
pushed into 5.1.7
[14 Feb 2006 9:15]
Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.
If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information
about accessing the source trees is available at
http://www.mysql.com/doc/en/Installing_source_tree.html
Additional info:
Documented fix in 5.1.7 changelog; closed bug.

Description:
Using a 3 host cluster
(ndb08 = ndb_mgmd, mysqld)
(ndb07 = 2 data nodes)
(ndb09 = 2 data nodes)
LG and TS create:
$sth = $dbhM->prepare("CREATE LOGFILE GROUP TPCB_LOG
ADD UNDOFILE './tpcb_log/undofile.dat'
INITIAL_SIZE 150M
UNDO_BUFFER_SIZE = 1M
ENGINE=NDB;")
$sth = $dbhM->prepare("CREATE TABLESPACE TPCB_TS
ADD DATAFILE './tpcb_ts/datafile.dat'
USE LOGFILE GROUP TPCB_LOG
INITIAL_SIZE 350M
ENGINE=NDB;")
or die "Prepare CREATE TABLESPACE error: ", $dbhM->errstr;
NDB Errors:
Time: Friday 13 January 2006 - 14:45:51
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4009
Trace: /space/run/ndb_4_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 14:46:11
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4086
Trace: /space/run/ndb_4_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Current byte-offset of file-pointer is: 1067
Time: Friday 13 January 2006 - 14:45:54
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4012
Trace: /space/run/ndb_6_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 4085
Trace: /space/run/ndb_6_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 14:45:53
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 376
Trace: /space/run/ndb_5_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 454
Trace: /space/run/ndb_5_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Current byte-offset of file-pointer is: 1067
Time: Friday 13 January 2006 - 14:45:54
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 379
Trace: /space/run/ndb_7_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 455
Trace: /space/run/ndb_7_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
[ndbdev@ndb07 run]$ cat *err*
Current byte-offset of file-pointer is: 1067
Time: Friday 13 January 2006 - 14:45:53
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1605) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 376
Trace: /space/run/ndb_5_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 454
Trace: /space/run/ndb_5_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Current byte-offset of file-pointer is: 1067
Time: Friday 13 January 2006 - 14:45:54
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 379
Trace: /space/run/ndb_7_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***
Time: Friday 13 January 2006 - 14:46:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 2092) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 455
Trace: /space/run/ndb_7_trace.log.2
Version: Version 5.1.6 (alpha)
***EOM***
Cluster log:
2006-01-13 14:45:52 [MgmSrvr] INFO -- Node 7: GCP Take over completed
2006-01-13 14:45:53 [MgmSrvr] ALERT -- Node 4: Forced node shutdown completed, restarting. Initiated by signal 6. Caused by error 2343: 'Internal program error (failed ndbassert)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-01-13 14:45:53 [MgmSrvr] WARNING -- Allocate nodeid (4) failed. Connection from ip XX.XXX.1.94. Returned error string "Id 4 already allocated by another node."
2006-01-13 14:45:53 [MgmSrvr] INFO -- Mgmt server state: node id's 4 5 6 7 connected but not reserved
2006-01-13 14:45:53 [MgmSrvr] INFO -- Mgmt server state: node id's 1 not connected but reserved
2006-01-13 14:45:54 [MgmSrvr] INFO -- Node 1: Node 5 Connected
2006-01-13 14:45:54 [MgmSrvr] ALERT -- Node 6: Node 5 Disconnected
2006-01-13 14:45:54 [MgmSrvr] INFO -- Node 6: Communication to Node 5 closed
2006-01-13 14:45:54 [MgmSrvr] ALERT -- Node 7: Node 5 Disconnected
2006-01-13 14:45:54 [MgmSrvr] INFO -- Node 7: Communication to Node 5 closed
2006-01-13 14:45:54 [MgmSrvr] ALERT -- Node 5: Forced node shutdown completed, restarting. Initiated by signal 6. Caused by error 2343: 'Internal program error (failed ndbassert)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-01-13 14:45:54 [MgmSrvr] WARNING -- Allocate nodeid (5) failed. Connection from ip XX.XXX.1.92. Returned error string "Id 5 already allocated by another node."
2006-01-13 14:46:11 [MgmSrvr] ALERT -- Node 7: Forced node shutdown completed, restarting. Occured during startphase 4. Initiated by signal 6. Caused by error 2343: 'Internal program error (failed ndbassert)(Internal error, programming error or missing error message, please report a bug). Temp
How to repeat:
Set up a 3 host cluster and run the load_tpcb.pl script.
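The cluster log in the description interleaves forced shutdowns with connection chatter. A small sketch like the following (Python, not from the original report) could pull out just the forced-shutdown events with their node id and error code. The name `forced_shutdowns` is a hypothetical helper, and the sample entries are abridged from the cluster log in the description. The pattern splits on timestamps rather than newlines, so it should also work when entries have been run together on one line.

```python
import re

# One cluster-log entry: timestamp, [MgmSrvr], level, then the message up to
# the next timestamp (or end of text).
ENTRY_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\[MgmSrvr\]\s+(\w+)\s+--\s+"
    r"(.*?)(?=\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+\[MgmSrvr\]|\Z)",
    re.S,
)
SHUTDOWN_RE = re.compile(
    r"Node (\d+): Forced node shutdown completed.*?Caused by error (\d+)", re.S
)


def forced_shutdowns(log_text):
    """Return (timestamp, node id, error code) for each forced shutdown."""
    events = []
    for timestamp, level, message in ENTRY_RE.findall(log_text):
        if level != "ALERT":
            continue
        match = SHUTDOWN_RE.search(message)
        if match:
            events.append((timestamp, int(match.group(1)), int(match.group(2))))
    return events


# Abridged sample: the error text inside the quotes is shortened here.
SAMPLE = (
    "2006-01-13 14:45:52 [MgmSrvr] INFO -- Node 7: GCP Take over completed\n"
    "2006-01-13 14:45:53 [MgmSrvr] ALERT -- Node 4: Forced node shutdown "
    "completed, restarting. Initiated by signal 6. Caused by error 2343: "
    "'Internal program error (failed ndbassert)'.\n"
    "2006-01-13 14:45:54 [MgmSrvr] ALERT -- Node 6: Node 5 Disconnected\n"
)

if __name__ == "__main__":
    for event in forced_shutdowns(SAMPLE):
        print(event)
```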