Bug #10865 Slave Cluster Mysqld cores under heavy load
Submitted: 25 May 2005 19:13    Modified: 31 May 2005 21:04
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Server    Severity: S1 (Critical)
Version: 5.1.0    OS: Linux (Linux)
Assigned to: Mats Kindahl    CPU Architecture: Any

[25 May 2005 19:13] Jonathan Miller
Description:
I had set up a master cluster of 3 computers (ndb08, ndb09, ndb10) with the following configuration. (Note: there are two mysqld servers, master1 and master2.)

[DB DEFAULT]
NoOfReplicas: 1
IndexMemory: 100M
DataMemory: 300M
BackupMemory: 64M
MaxNoOfConcurrentScans: 100
DataDir: .
FileSystemPath: /space/autotest/run

[MGM DEFAULT]
PortNumber: 14000
ArbitrationRank: 1
DataDir: .
[ndb_mgmd]
Id: 1
HostName: ndb08

[ndbd]
Id: 2
HostName: ndb08

[ndbd]
Id: 3
HostName: ndb09

[ndbd]
Id: 4
HostName: ndb08

[ndbd]
Id: 5
HostName: ndb09

[api]
Id: 6
HostName: ndb08

[api]
Id: 7
HostName: ndb08

[api]
Id: 8
HostName: ndb08

[api]
Id: 9
HostName: ndb09

[api]
Id: 10
HostName: ndb09

[mysqld]
Id: 11
HostName: ndb10

[mysqld]
Id: 12
HostName: ndb10

mysqld start commands:
 ./mysqld_safe --server-id=2 --log-bin=/home/ndbdev/jmiller/builds/var/c2/master2 --log=/home/ndbdev/jmiller/builds/var/c2/master2.log --log-error=/home/ndbdev/jmiller/builds/var/c1/master2.err --socket=/tmp/mysql.sock2 --port=3307 --pid-file=/home/ndbdev/jmiller/builds/var/c2/hostname.pid2 --datadir=/home/ndbdev/jmiller/builds/var/c2 --language=/home/ndbdev/jmiller/builds/share/mysql/english --user=root --ndbcluster --ndb-connectstring=ndb08:14000 &
[1] 10704
[ndbdev@ndb10 bin]$ Starting mysqld daemon with databases from /home/ndbdev/jmiller/builds/var/c2


./mysqld_safe --server-id=1 --log-bin=/home/ndbdev/jmiller/builds/var/c1/master1 --log=/home/ndbdev/jmiller/builds/var/c1/master1.log --log-error=/home/ndbdev/jmiller/builds/var/c1/master1.err --socket=/tmp/mysql.sock1 --port=3308 --pid-file=/home/ndbdev/jmiller/builds/var/c1/hostname.pid1 --datadir=/home/ndbdev/jmiller/builds/var/c1 --language=/home/ndbdev/jmiller/builds/share/mysql/english --user=root --ndbcluster --ndb-connectstring=ndb08:14000 &
[2] 10770
[ndbdev@ndb10 bin]$ Starting mysqld daemon with databases from /home/ndbdev/jmiller/builds/var/c1

I then have a server (NDB11) running as a slave cluster under the following configuration:

[DB DEFAULT]
NoOfReplicas: 1
IndexMemory: 100M
DataMemory: 300M
BackupMemory: 64M
MaxNoOfConcurrentScans: 100
DataDir: .
FileSystemPath: /home/ndbdev/jmiller/builds/run

[MGM DEFAULT]
PortNumber: 14000
ArbitrationRank: 1
DataDir: .
[ndb_mgmd]
Id: 1
HostName: ndb11

[ndbd]
Id: 2
HostName: ndb11

[ndbd]
Id: 3
HostName: ndb11

[mysqld]
Id: 4
HostName: ndb11

The slave cluster's my.cnf:

[mysqld]
server-id=3
master-user=rep
master-connect-retry=1
master-host=ndb10
master-password=test
master-port=3308
ndbcluster
ndb-connectstring=ndb11:14000  # location of MGM node

I started out running replication_sample.txt up to the point where you switch the slave to point to master2, then started replicating again. I then created two databases, BANK and BANK2, with all the tables, and started two sets of the "Bank" test running, one against BANK and the other against BANK2. I checked both masters to ensure that updates were being logged, then checked the slave to ensure that replication was proceeding correctly. I then continued with the replication_sample.txt script. When I got to the point in the script where you do a select on the slave, the mysqld had already cored and restarted, but it was not connected to the cluster and was not replicating. If I did a mysqladmin -u root shutdown and then restarted the mysqld with --ndbcluster --skip-slave-start, then as soon as I issue START SLAVE; it cores again.
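The failing step on the slave can be sketched roughly as the following statements (a hedged reconstruction, not the exact contents of replication_sample.txt: the host, port, and credentials are taken from the configs and start commands above, and the binlog coordinates are placeholders):

```sql
-- Sketch of switching the slave from master1 (port 3308) to master2
-- (port 3307, per the mysqld_safe commands above). Log file/position
-- values are placeholders, not taken from the report.
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST='ndb10',
    MASTER_PORT=3307,
    MASTER_USER='rep',
    MASTER_PASSWORD='test';
START SLAVE;  -- this is the statement after which mysqld cores
```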

Core file location: ndb.mysql.com:/home/jonathan/core3133 .

Stack and stack trace saved to the same location, named core3133.txt and core3133st.txt.

core3133.txt
0x816cfcc
0x4005e5cd
0x40178a31
0x815723b
0x81e0309
0x81df8d5
0x81def21
0x824aff2
0x824919b
0x400586de
0x401d86c7

core3133st.txt

0x816cfcc handle_segfault + 392
0x4005e5cd _end + 934598969
0x40178a31 _end + 935755165
0x815723b _ZN12Field_string6unpackEPcPKc + 63
0x81e0309 _Z10unpack_rowP3THDP8st_tablePcPKcRK9bitvector + 149
0x81df8d5 _ZN20Write_rows_log_event14do_prepare_rowEP3THDP8st_tablePKc + 37
0x81def21 _ZN14Rows_log_event10exec_eventEP17st_relay_log_info + 609
0x824aff2 _Z20exec_relay_log_eventP3THDP17st_relay_log_info + 578
0x824919b handle_slave_sql + 1015
0x400586de _end + 934574666
0x401d86c7 _end + 936147507
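The demangled names in the trace above can be reproduced from the raw symbols with binutils' c++filt (shown here as an illustrative aside, not part of the original report):

```shell
# Demangle the mangled C++ symbols from the stack trace using c++filt.
c++filt _ZN12Field_string6unpackEPcPKc
# -> Field_string::unpack(char*, char const*)
c++filt _Z10unpack_rowP3THDP8st_tablePcPKcRK9bitvector
# -> unpack_row(THD*, st_table*, char*, char const*, bitvector const&)
```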

How to repeat:
Not 100% sure; NDB11 is still in this state currently.
[25 May 2005 19:19] Jonathan Miller
.
[28 May 2005 2:40] Jonathan Miller
Email from Mats

This looks very much like a bug I just fixed (yesterday). Could you get the latest bunch of changesets and try again?

Best wishes,
Mats Kindahl
------------------------------------------------------------------------------------------------
Completed another cycle of tests today. All tests passed correctly; this issue was not seen.
JBM
[31 May 2005 21:04] Jonathan Miller
Thought I had closed this already, but it still shows as being in the V state.