Bug #17594 Various NDB startup / restart problems
Submitted: 20 Feb 2006 19:41 Modified: 14 Apr 2006 7:35
Reporter: Joerg Bruehe
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine    Severity: S1 (Critical)
Version: 5.1.7-beta    OS: Various Unix
Assigned to:    CPU Architecture: Any

[20 Feb 2006 19:41] Joerg Bruehe
Description:
Build of 5.1.7-beta, based on ChangeSet
  1.2139 06/02/20 00:32:07 kent@mysql.com +3 -0
  mysql-test-run.pl:
    Added --restart-cleanup option
  drop-on-restart.inc:
    DROP commands to cleanup on restart
    new file
  mysqltest.c:
    Added option --include=<sql-file>

On various platforms, NDB either fails its initial start or fails to restart after a test failure:

Initial start fails:
=====
Installing ndbcluster master
Starting ndbd 1(2)
error=2350
2006-02-20 07:42:38 [ndbd] INFO     -- Error handler restarting system
2006-02-20 07:42:38 [ndbd] INFO     -- Error handler shutdown completed - aborting
sphase=0
exit=-1

ERROR: /export/home/mysqldev/butch-64bit/test/mysql-5.1.7-beta-solaris9-sparc-64bit/mysql-test/var/ndbcluster-9350/ndb_1.pid was not created in 120 seconds;  Aborting
mysql-test-run: *** ERROR: Error ndbcluster_install
=====
Seen on:
butch-64bit (log quoted above; Solaris 9),
sol10-sparc-a-64bit (Solaris 10),
sunfire100a-64bit (solaris8-sparc-64bit)

NDB restart (after test failure) fails:
=====
ndb_dd_dump                    [ fail ]

Errors are (from /usr/local/mysqldev/hp3750-64bit/test/mysql-5.1.7-beta-hpux11.00-hppa2.0w-64bit/mysql-test/var/log/mysqltest-time) :
ERROR 1506 (HY000) at line 24: Failed to create LOGFILE GROUP
mysqltest: At line 209: command "$MYSQL test < var/tmp/ndb_dd_dump.sql" failed
(the last lines may be the most important ones)

Ending Tests
Shutting-down MySQL daemon

Master(s) shutdown finished
Slave(s) shutdown finished
Resuming Tests

ndb_gis                        waitNodeState(STARTED, -1) timeout after 121 attemps
mysql-test-run: *** ERROR: Error ndbcluster_start
=====
hp3750-64bit (log quoted above),
hpita2-64bit (hpux11.23-ia64-64bit), same messages

=====
ndb_autodiscover2              [ fail ]

Errors are (from /home/mysqldev/ita2-rhas21/test/mysql-5.1.7-beta-linux-ia64-rhas21/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 10: query 'select * from t9 order by a' failed: 1146: Table 'test.t9' doesn't exist
(the last lines may be the most important ones)

Ending Tests
Shutting-down MySQL daemon

Master(s) shutdown finished
Slave(s) shutdown finished
Resuming Tests

ndb_basic                      waitNodeState(STARTED, -1) timeout after 121 attemps
mysql-test-run: *** ERROR: Error ndbcluster_start
=====
ita2-rhas21

=====
ndb_blob                       [ fail ]

Errors are (from /home/mysqldev/rx2620b-icc-glibc23/test/mysql-5.1.7-beta-linux-ia64-icc-glibc23/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 331: query 'alter table t1 add x int' failed: 1005: Can't create table 'test.#sql-3e7a_6' (errno: 156)
(the last lines may be the most important ones)

Ending Tests
Shutting-down MySQL daemon

Master(s) shutdown finished
Slave(s) shutdown finished
Resuming Tests

ndb_cache                      waitNodeState(STARTED, -1) timeout after 121 attemps
mysql-test-run: *** ERROR: Error ndbcluster_start
=====
rx2620b-icc-glibc23 (ia64)

Two further restart failures might be connected to more general replication problems:
osx-tiger-x86             :: Error ndbcluster_start_slave rpl_ndb_row_001
production-icc-glibc23    :: Error ndbcluster_start_slave rpl_ndb_charset

How to repeat:
Detected by running the test suite.
[21 Feb 2006 8:16] Tomas Ulin
Various rpl_ndb_multi failures seem to be of this kind:

rpl_ndb_multi                  [ fail ]

Errors are (from /home/mysqldev/pegasos3-glibc23/test/mysql-5.1.8-beta-linux-powerpc-glibc23/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 17: query 'SHOW TABLES' failed: 2013: Lost connection to MySQL server during query
(the last lines may be the most important ones)

[27 Feb 2006 11:04] Pekka Nousiainen
ChangeSet
  1.2164 06/02/25 17:25:22 pekka@mysql.com +2 -0
  ndb - sockaddr alignment fix, found on sunfire100a, can affect any non-x86

The misaligned sockaddr access caused core dumps in ndb_mgmd and ndbd.
Pushed to the release clone on Sat Feb 25.
[14 Apr 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".