Bug #45733 Cluster with more than 4 data nodes crashes
Submitted: 24 Jun 2009 22:56
Modified: 17 Jul 2009 9:17
Reporter: Sajjad Tariq
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1-telco-7.0
OS: Windows (XP)
Assigned to: jack andrews
CPU Architecture: Any
Tags: Storage Node 5.1.34-ndb-7.0.6-cluster-gpl

[24 Jun 2009 22:56] Sajjad Tariq
Description:
When ndb_mgmd is started on a Cluster with more than 4 storage nodes, ndb_mgm and the ndbd storage nodes cannot get the configuration, and ndb_mgmd crashes. The same setup works with 4 storage nodes.

How to repeat:
Config.ini
==========

[tcp default]
SendBufferMemory=2M
ReceiveBufferMemory=2M

# Increasing the sizes of these 2 buffers beyond the default values
# helps prevent bottlenecks due to slow disk I/O.

[ndb_mgmd default]
datadir=C:/mysql/mysql-cluster/ndb_mgmd

[ndb_mgmd]
id=1
hostname=mgmd1

#[ndb_mgmd]
#id=2
#hostname=mgmd2

[mysqld default]

[mysqld]
id=11
hostname=mysqld11

#[mysqld]
#id=12
#hostname=mysqld12

[api]
id=13

[api]
id=14

[ndbd default]
noofreplicas=2

MaxNoOfOrderedIndexes=10000
MaxNoOfAttributes=10000
TransactionDeadlockDetectionTimeout=12000
StopOnError=0
datadir=C:/mysql/mysql-cluster/ndbd

DataMemory=1280M
IndexMemory=150M

# The values provided for DataMemory and IndexMemory assume 4 GB RAM
# per data node. However, for best results, you should first calculate
# the memory that would be used based on the data you actually plan to
# store (you may find the ndb_size.pl utility helpful in estimating
# this), then allow an extra 20% over the calculated values. Naturally,
# you should ensure that each data node host has at least as much
# physical memory as the sum of these two values.

# ODirect=1

# Enabling this parameter causes NDBCLUSTER to try using O_DIRECT
# writes for local checkpoints and redo logs; this can reduce load on
# CPUs. We recommend doing so when using MySQL Cluster NDB 6.2.3 or
# newer on systems running Linux kernel 2.6 or later.

MaxNoOfConcurrentOperations=100000

#SchedulerSpinTimer=400
#SchedulerExecutionTimer=100
#RealTimeScheduler=1
# Setting these parameters allows you to take advantage of real-time scheduling
# of NDBCLUSTER threads (introduced in MySQL Cluster NDB 6.3.4) to get higher
# throughput.

UndoIndexBuffer=10M
TimeBetweenLocalCheckpoints=20

#TimeBetweenGlobalCheckpoints=1000
#TimeBetweenEpochs=200
#DiskCheckpointSpeed=10M
#DiskCheckpointSpeedInRestart=100M
#RedoBuffer=32M

# CompressedLCP=1
# CompressedBackup=1
# Enabling CompressedLCP and CompressedBackup causes, respectively, local checkpoint files and backup files to be compressed, which can result in a space savings of up to 50% over noncompressed LCPs and backups.

[ndbd]
id=21
hostname=ndbd214

[ndbd]
id=22
hostname=ndbd213

[ndbd]
id=23
hostname=ndbd212

[ndbd]
id=24
hostname=ndbd211

[ndbd]
id=25
hostname=ndbd210

[ndbd]
id=26
hostname=ndbd209

[ndbd]
id=27
hostname=ndbd208

[ndbd]
id=28
hostname=ndbd207

[ndbd]
id=29
hostname=ndbd206

[ndbd]
id=30
hostname=ndbd205

[ndbd]
id=31
hostname=ndbd204

[ndbd]
id=32
hostname=ndbd203

[ndbd]
id=33
hostname=ndbd202

[ndbd]
id=34
hostname=ndbd201

my.ini
==========
[mysql_cluster]
ndb-connectstring=MGMD1

[mysqld]
basedir="C:/Program Files/MySQL/MySQL Server 7.0/"
datadir=C:/mysql/mysql-cluster/data/

port=3306
default-character-set=latin1
default-storage-engine=ndbcluster
skip-innodb
ndbcluster
ndb-use-exact-count=0
ndb-index-stat-enable=0
ndb-force-send=1
engine-condition-pushdown=1

max_allowed_packet=16M
delayed_insert_timeout=10000
connect_timeout=100000

[ndb_mgmd]
config-file="C:/mysql/mysql-cluster/ndb_mgmd/config.ini"
configdir="C:/mysql/mysql-cluster/ndb_mgmd/"

[ndbd default]

[ndbd]

[ndb_mgm]
[25 Jun 2009 6:35] Sveta Smirnova
Thank you for the report.

Please provide the error you get.
[25 Jun 2009 15:10] Sajjad Tariq
I start up the management node 

C:\Program Files\MySQL\MySQL Server 7.0\bin>ndb_mgmd --initial
2009-06-25 08:59:57 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.34 ndb-7.0.6
2009-06-25 08:59:57 [MgmSrvr] INFO     -- Reading cluster configuration from 'C:/mysql/mysql-cluster/ndb_mgmd/config.ini'

=====================================================================

When I start the management console I get nine unhandled exception errors, which my Visual Studio Just-In-Time Debugger tries to debug. The message states:

An unhandled win32 exception occurred in ndb_mgmd.exe[1044]

When I debug it in VS2005 I get the following message in the Immediate window:

First-chance exception at 0x00414de2 in ndb_mgmd.exe: 0xC0000005: Access violation reading location 0x47774651.

When I break into the disassembly, execution stops at this line:

00414DC9  int         3    
00414DCA  int         3    
00414DCB  int         3    
00414DCC  int         3    
00414DCD  int         3    
00414DCE  int         3    
00414DCF  int         3    
00414DD0  push        esi  
00414DD1  mov         esi,ecx 
00414DD3  mov         ecx,dword ptr [esi+18h] 
00414DD6  test        ecx,ecx 
00414DD8  mov         dword ptr [esi],517B08h 
00414DDE  je          00414DE8 
00414DE0  mov         eax,dword ptr [ecx] 
00414DE2  mov         edx,dword ptr [eax]       <========== Break
00414DE4  push        1    
00414DE6  call        edx  
00414DE8  mov         ecx,dword ptr [esi+1Ch] 
00414DEB  test        ecx,ecx 
00414DED  je          00414DF7 
00414DEF  mov         eax,dword ptr [ecx] 
00414DF1  mov         edx,dword ptr [eax] 
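
A note on the faulting address above, offered as one possible reading rather than a confirmed diagnosis: 0x47774651 does not look like a typical pointer, and its four bytes are all printable ASCII. That would be consistent with string data having overwritten the object whose pointer is loaded at the marked instruction. The helper below (addr_to_ascii is a hypothetical name, not part of NDB) is a minimal sketch that splits a 32-bit value into bytes in x86 memory order so this can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: render a 32-bit value as the 4 bytes it occupies
   in memory on a little-endian (x86) machine. Printable ASCII bytes are
   kept as-is, anything else is shown as '.'. out must hold 5 chars. */
void addr_to_ascii(uint32_t addr, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        /* byte i in memory is bits 8*i .. 8*i+7 of the value */
        unsigned char b = (unsigned char)((addr >> (8 * i)) & 0xFF);
        out[i] = (b >= 0x20 && b < 0x7F) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

Calling addr_to_ascii(0x47774651, buf) fills buf with "QFwG", that is, the bytes 0x51, 0x46, 0x77, 0x47.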

=================================================================
On the management console I get this output:

C:\Program Files\MySQL\MySQL Server 7.0\bin>ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: MGMD1:1186
Could not get configuration
*  1006: Illegal reply from server
*
ndb_mgm>
[10 Jul 2009 14:03] jack andrews
D:\repo\more-than-4-ndbd-bug45733\storage\ndb\src\mgmsrv\debug\ndb_mgmd.exe -f D:\repo\more-than-4-ndbd-bug45733\ini\trimmed.ini --reload  --initial --nodaemon

then run ndb_mgm -e "show", which causes the crash.

trimmed.ini
===========

[ndb_mgmd]
id=1
datadir=D:\repo\more-than-4-ndbd-bug45733\data
hostname=127.0.0.1

[mysqld]
id=11
hostname=127.0.0.1

[ndbd default]
noofreplicas=2
datadir=D:\repo\more-than-4-ndbd-bug45733\data

[ndbd]
id=21
hostname=127.0.0.1

[ndbd]
id=22
hostname=127.0.0.1

[ndbd]
id=23
hostname=127.0.0.1

[ndbd]
id=24
hostname=127.0.0.1

[ndbd]
id=25
hostname=127.0.0.1

[ndbd]
id=26
hostname=127.0.0.1

[ndbd]
id=27
hostname=127.0.0.1

[ndbd]
id=28
hostname=127.0.0.1
[13 Jul 2009 10:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78511

2948 jack andrews	2009-07-13
      Bug #45733  	Cluster with more than 4 storage node 
        . fixed basestring_vsnprintf on windows
[13 Jul 2009 11:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78531

2948 jack andrews	2009-07-13
      Bug #45733  	Cluster with more than 4 storage node 
        . fixes basestring_vsnprintf() for windows
        . now, will run `ndb_mgm -e show` to completion
[13 Jul 2009 14:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78543

2948 jack andrews	2009-07-13
      Bug #45733  	Cluster with more than 4 storage node 
        . changed basestring_vsnprintf to check that clients
          don't try to write a string with length > max_size.
[14 Jul 2009 11:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78633

2948 jack andrews	2009-07-14
      Bug #45733  	Cluster with more than 4 storage node 
        . fixed basestring_vsprintf so it will always return
          the posix defined retval for vsnprintf.
      
          if the buffer can't hold the output string, the
          function will return the space needed.  you need
          to provide a buffer one larger than the retval
          so the terminating null will be written.
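
The return-value rule described in this commit message can be illustrated with the standard C99/POSIX vsnprintf. The sketch below is not the NDB code; format_alloc is a hypothetical helper that shows the usual two-pass idiom: ask for the needed length first, then allocate one byte more than the returned value so the terminating null fits.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format into a freshly allocated string.
   Pass 1 measures, pass 2 writes. Returns NULL on error.
   The caller must free() the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);   /* a va_list must not be reused after vsnprintf */

    /* With size 0 nothing is written; the return value is the length
       the full output needs, NOT counting the terminating null. */
    int needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (needed < 0) {
        va_end(ap2);
        return NULL;
    }

    /* One larger than the return value, so the null is written too. */
    char *buf = malloc((size_t)needed + 1);
    if (buf != NULL) {
        vsnprintf(buf, (size_t)needed + 1, fmt, ap2);
        assert(strlen(buf) == (size_t)needed);
    }
    va_end(ap2);
    return buf;
}
```

For example, format_alloc("node %d of %d", 5, 8) measures 11 characters, allocates 12 bytes, and returns "node 5 of 8".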
[15 Jul 2009 8:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78704

2948 jack andrews	2009-07-15
      Bug #45733        Cluster with more than 4 storage node
         . fixed basestring_vsprintf so it will always return
           the posix defined retval for vsnprintf.
      
           if the buffer can't hold the output string, the
           function will return the space needed.  you need
           to provide a buffer one larger than the retval
           so the terminating null will be written.
[15 Jul 2009 17:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78762

2949 jack andrews	2009-07-15
      Bug #45733  	Cluster with more than 4 storage node
        . fix for code review
[16 Jul 2009 11:03] Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jack@sun.com-20090716110203-o0zfad30d7tew97x) (version source revid:jack@sun.com-20090715172607-2nyjnhnhlin5dhro) (merge vers: 5.1.35-ndb-7.0.7) (pib:11)
[16 Jul 2009 16:48] jack andrews
i emailed Sajjad with this:

hi Sajjad,

> If you could kindly provide me the updated exe's, I can update my
> installed folders and give the setting a try.

thank you again for evaluating mysql cluster on
windows. your bug reports and fixes move cluster
on windows closer to an official release.

you can download a tgz from ivorykite.com.
username:  guest@ivorykite.com
password:  guest
filename:  bug45733.tgz

note that the username is not 'guest' but
'guest@ivorykite.com'

for example:
$ ftp ivorykite.com
Connected to ivorykite.com.
220 ProFTPD FTP Server ready.
Name (ivorykite.com:jack): guest@ivorykite.com
331 Password required for guest@ivorykite.com.
Password:
230 User guest@ivorykite.com logged in.
Remote system type is UNIX.
Using binary mode to transfer files.

so i hope that the binaries fix the bug you found.
it might even fix the second problem you found.

please let me know how you go with this.

ta, jack.

for my reference,
jack@asus /cygdrive/d/repo/more-than-4-ndbd-bug45733
$ tar cfz bug45733.tgz `find storage/ndb/ -name \*.exe |grep -v ndb/test |grep -v -- -t` storage/ndb/src/ndbapi/debug/ndbapi.lib

$ tar tf bug45733.tgz
storage/ndb/src/kernel/blocks/dbdict/debug/printSchemaFile.exe
storage/ndb/src/kernel/blocks/dbdih/debug/ndbd_sysfile_reader.exe
storage/ndb/src/kernel/blocks/dblqh/debug/ndbd_redo_log_reader.exe
storage/ndb/src/kernel/blocks/debug/ndb_print_file.exe
storage/ndb/src/kernel/debug/ndbd.exe
storage/ndb/src/mgmclient/debug/ndb_mgm.exe
storage/ndb/src/mgmsrv/debug/ndb_mgmd.exe
storage/ndb/tools/debug/ndb_config.exe
storage/ndb/tools/debug/ndb_delete_all.exe
storage/ndb/tools/debug/ndb_desc.exe
storage/ndb/tools/debug/ndb_drop_index.exe
storage/ndb/tools/debug/ndb_drop_table.exe
storage/ndb/tools/debug/ndb_restore.exe
storage/ndb/tools/debug/ndb_select_all.exe
storage/ndb/tools/debug/ndb_select_count.exe
storage/ndb/tools/debug/ndb_show_tables.exe
storage/ndb/tools/debug/ndb_test_platform.exe
storage/ndb/tools/debug/ndb_waiter.exe
storage/ndb/src/ndbapi/debug/ndbapi.lib

S T wrote:
> Jack,
>  
> I download the software from http://dev.mysql.com/downloads/cluster/7.0.html in a windows msi. If you could kindly provide me the updated exe's, I can update my installed folders and give the setting a try. I really appreciate you working on this to get these bugs resolved.
>  
> Thank you,
>  
> Sajjad
>  
>  > Date: Tue, 14 Jul 2009 20:43:38 +0200
>  > From: jack@sun.com
>  > Subject: Re: #45733: Cluster with more than 4 storage node
>  > To: st1980@hotmail.com
>  >
>  > hi sajjad,
>  >
>  > i think we've fixed the first bug you reported and i suspect it may
>  > have fixed the second one you filed.
>  >
>  > tomorrow, we will push the fix into the main branch. let me know
>  > how you would prefer to receive the fix. did you get a branch from
>  > launchpad? in that case, you just need to bzr pull. did you download
>  > exe's? in that case, we will provide exes. i'm not familiar with
>  > all the ways we deliver (and you can grab) our software, so let me
>  > know.
>  >
>  >
>  > ta, jack
>  >
>
[16 Jul 2009 19:25] Magnus Blåudd
Pushed to 7.0 and 7.1
[17 Jul 2009 9:17] Jon Stephens
Documented bugfix in the NDB-7.0.7 changelog as follows:

        On Windows, the internal basestring_vsprintf() function did not 
        return a POSIX-compliant value as expected, causing the 
        management server to crash when trying to start a MySQL Cluster 
        with more than 4 data nodes.
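
For background on the changelog entry: Microsoft's _vsnprintf returns a negative value when the output does not fit and may leave the buffer without a terminating null, whereas POSIX vsnprintf returns the length the full output would need. The following is a minimal sketch of a wrapper that gives the POSIX return value on both platforms. It is not the actual NDB patch; posix_vsnprintf and posix_snprintf are hypothetical names, and the Windows branch assumes a compiler that provides va_copy and _vscprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: always return the number of characters the
   complete output needs (excluding the terminating null), even when
   buf is too small, and always null-terminate when size > 0. */
int posix_vsnprintf(char *buf, size_t size, const char *fmt, va_list ap)
{
    assert(fmt != NULL);
#ifdef _WIN32
    va_list ap2;
    va_copy(ap2, ap);
    int needed = _vscprintf(fmt, ap2);  /* counts only, writes nothing */
    va_end(ap2);
    if (needed < 0)
        return needed;
    if (size > 0) {
        _vsnprintf(buf, size, fmt, ap); /* may truncate without a null */
        buf[size - 1] = '\0';           /* force termination */
    }
    return needed;
#else
    return vsnprintf(buf, size, fmt, ap); /* already POSIX behaviour */
#endif
}

/* Variadic convenience front end for the wrapper above. */
int posix_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = posix_vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With a 4-byte buffer, posix_snprintf(buf, 4, "%s", "ndbd21") returns 6 and leaves "ndb" in buf, so a caller can detect truncation by comparing the return value with the buffer size.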