MySQL Bugs: #18550: ndbd getting "node failure handling not complete..." after graceful restart

Bug #18550	ndbd getting "node failure handling not complete..." after graceful restart
Submitted:	27 Mar 2006 18:24	Modified:	28 Apr 2006 8:57
Reporter:	Serge Kozlov
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.0 ->	OS:	Linux (FC4)
Assigned to:	Tomas Ulin	CPU Architecture:	Any

Description:
Run a cluster with two ndbd nodes (2 replicas) on the same computer and try to restart the master node while an application is actively working through mysqld. The node crashed:

ndb_pid11802_error.log
===============
Current byte-offset of file-pointer is: 569

Time: Monday 27 Mars 2006 - 19:53:51
Status: Permanent error, external action needed
Message: Invalid configuration received from Management Server (Configuration error)
Error: 2350
Error data: Unable to alloc node id
Error object: Could not connect to socket : Could not alloc node id at ndb16 port 1186: Cluster refused allocation of id 2. Error: 1703 (Node failure handling not completed: Permanent error: Application error).
Program: /home/ndbdev/skozlov/builds/libexec/ndbd
Pid: 11802
Trace:

core:
====

GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

Core was generated by `/home/ndbdev/skozlov/builds/libexec/ndbd -c ndb16 --initial'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_dns.so.2...done.
Loaded symbols for /lib64/libnss_dns.so.2
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
#0 0x00000033b835bd3d in fflush () from /lib64/libc.so.6

How to repeat:
1. Start a cluster with 2 ndbd nodes and 2 replicas, with all nodes placed on the same server (including ndb_mgmd, mysqld, api)
2. Run ndb_mgm
3. Start ./load_tpcb.pl ndb16 3306 root BLANK ndb
4. Wait until 'Loading account table' appears.
5. Try to restart the master node via the RESTART command.
6. The node crashes.

The bug does not reproduce reliably.
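
Step 5 can also be issued non-interactively. The following is only a sketch: it assumes the management server is reachable under the connect string ndb16 (the host named in the error log) and that the master data node has id 2 (the id refused in the log); both values and the helper names are illustrative and not taken from the reporter's setup.

```python
import subprocess


def restart_command(connect_string, node_id):
    """Build an ndb_mgm invocation that sends '<node_id> RESTART'.

    -c is the connect string, -e executes one management command and exits.
    """
    return ["ndb_mgm", "-c", connect_string, "-e", f"{node_id} RESTART"]


def graceful_restart(connect_string, node_id):
    """Run the restart command; needs a running management server."""
    return subprocess.run(
        restart_command(connect_string, node_id),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Assumed values: host ndb16 and node id 2, as they appear in the log.
    print(" ".join(restart_command("ndb16", 2)))
```

Because the crash is intermittent, the restart would probably have to be repeated several times while the load script is running.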

It is expected that a node restart fails if node failure handling is not complete.
It should not, however, produce a core; this is a duplicate of http://bugs.mysql.com/bug.php?id=17677.

BTW: If you did a "graceful restart", i.e. using ndb_mgm, then it should be fixed to wait with the restart until the node is allowed to start.

Did you do a "graceful restart"?
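
The "wait until the node is allowed to start" behavior could look roughly like the retry loop below. This is a sketch of the idea only, not the actual fix that went into ndbd or ndb_mgmd: try_alloc is a hypothetical callable standing in for the node id allocation request, and treating error 1703 as a retryable condition is an assumption drawn from the comment above.

```python
import time

# Error code quoted in the error log: "Node failure handling not completed".
NODE_FAILURE_HANDLING_NOT_COMPLETE = 1703


def wait_for_node_id(try_alloc, attempts=10, delay=1.0, sleep=time.sleep):
    """Retry node id allocation while the cluster reports error 1703.

    try_alloc is a hypothetical callable returning (ok, error_code).
    Returns True once allocation succeeds, False if attempts run out.
    Any error other than 1703 is treated as fatal and raised.
    """
    for attempt in range(attempts):
        ok, error_code = try_alloc()
        if ok:
            return True
        if error_code != NODE_FAILURE_HANDLING_NOT_COMPLETE:
            raise RuntimeError(
                f"node id allocation failed with error {error_code}"
            )
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The sleep function is passed in as a parameter so the loop can be exercised without real delays.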

I did 'graceful restart' - 'X RESTART'

reviewed by Jonas

pushed to 5.0.22 and 5.1.10

Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.0.22/5.1.10 changelogs; closed.