Bug #18550 ndbd getting "node failure handling not complete..." after graceful restart
Submitted: 27 Mar 2006 18:24 Modified: 28 Apr 2006 8:57
Reporter: Serge Kozlov
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:5.0 -> OS:Linux (FC4)
Assigned to: Tomas Ulin CPU Architecture:Any

[27 Mar 2006 18:24] Serge Kozlov
Description:
Run a cluster with two ndbd nodes (2 replicas) on the same computer and try to restart the master node while an application is actively working through mysqld. Got a crash:

ndb_pid11802_error.log
===============
Current byte-offset of file-pointer is: 569

Time: Monday 27 Mars 2006 - 19:53:51
Status: Permanent error, external action needed
Message: Invalid configuration received from Management Server (Configuration error)
Error: 2350
Error data: Unable to alloc node id
Error object: Could not connect to socket : Could not alloc node id at ndb16 port 1186: Cluster refused allocation of id 2. Error: 1703 (Node failure handling not completed: Permanent error: Application error).
Program: /home/ndbdev/skozlov/builds/libexec/ndbd
Pid: 11802
Trace: 

core:
====

This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

Core was generated by `/home/ndbdev/skozlov/builds/libexec/ndbd -c ndb16 --initial'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_dns.so.2...done.
Loaded symbols for /lib64/libnss_dns.so.2
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
#0  0x00000033b835bd3d in fflush () from /lib64/libc.so.6

How to repeat:
1. Start a cluster with 2 ndbd nodes and 2 replicas, with all nodes placed on the same server (including ndb_mgmd, mysqld, api)
2. Run ndb_mgm
3. Start ./load_tpcb.pl ndb16 3306 root BLANK ndb
4. Wait until 'Loading account table' appears.
5. Try to restart the master node via the RESTART command.
6. That node crashes.
[28 Mar 2006 15:31] Serge Kozlov
The bug does not reproduce reliably.
[28 Mar 2006 16:15] Jonas Oreland
It is expected behavior that a node restart fails if node failure handling has not completed.
It should, however, not produce a core; this is a duplicate of http://bugs.mysql.com/bug.php?id=17677.

BTW: If you did a "graceful restart", i.e. using ndb_mgm, then it should be fixed to wait
with the restart until the node is allowed to start.
[28 Mar 2006 16:16] Jonas Oreland
Did you do "graceful restart" ?
[28 Mar 2006 16:49] Serge Kozlov
I did a 'graceful restart': 'X RESTART'
[27 Apr 2006 5:31] Tomas Ulin
reviewed by Jonas

pushed to 5.0.22 and 5.1.10
[28 Apr 2006 8:57] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.0.22/5.1.10 changelogs; closed.