Bug #31748 Master crashes during execution of rpl_ndb_basic test case
Submitted: 22 Oct 2007 11:24
Modified: 3 Nov 2008 9:50
Reporter: Serge Kozlov
Status: Can't repeat
Category: MySQL Cluster: Replication
Severity: S2 (Serious)
Version: 5.1.23-beta
OS: Any
Assigned to: Serge Kozlov
CPU Architecture: Any

[22 Oct 2007 11:24] Serge Kozlov
Description:
The 5.1.22 slave crashes while running the rpl_ndb_basic test case when the master is 5.1-telco-6.1.

How to repeat:
0. Install both versions under the same parent directory (<builds>).
1. Install MySQL 5.1.22 (use mysql-5.1-new-rpl).
2. Install MySQL 5.1-telco-6.1.
3. Install the attached scripts (they replace the slave cluster binaries on the fly for test cases) into the <5.1.22> directory.
4. Go to <5.1.22>/test_tools/scripts/versional/
5. Start the test:
./ver-test.pl --builds-dir=<builds> --slave-version=5.1.23-beta-log --master-version=5.1.15-ndb-6.1.21-log --mysqld=--binlog-format=row rpl_ndb_basic
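
For repeated runs against different build pairs, the invocation above can be assembled programmatically. This is only a sketch: the helper name build_ver_test_command and the path /path/to/builds are hypothetical placeholders, and the flags are exactly those shown in step 5, nothing more is assumed about ver-test.pl.

```python
import shlex


def build_ver_test_command(builds_dir, slave_version, master_version,
                           test_name, mysqld_opts=()):
    """Assemble the ver-test.pl argument list used in step 5.

    Hypothetical helper: it only formats the flags shown in the report
    and does not run anything.
    """
    cmd = [
        "./ver-test.pl",
        f"--builds-dir={builds_dir}",
        f"--slave-version={slave_version}",
        f"--master-version={master_version}",
    ]
    # Each extra server option is passed through as --mysqld=<option>.
    cmd += [f"--mysqld={opt}" for opt in mysqld_opts]
    cmd.append(test_name)
    return cmd


# "/path/to/builds" stands in for the <builds> directory from step 0.
cmd = build_ver_test_command(
    "/path/to/builds",
    "5.1.23-beta-log",
    "5.1.15-ndb-6.1.21-log",
    "rpl_ndb_basic",
    mysqld_opts=["--binlog-format=row"],
)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run with cwd set to the versional/ directory from step 4.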
[22 Oct 2007 11:25] Serge Kozlov
script

Attachment: ver-test.tar.gz (application/x-gzip, text), 3.32 KiB.

[22 Oct 2007 12:15] Serge Kozlov
trace:

#0  0x002cd402 in __kernel_vsyscall ()
#1  0x0046164f in ?? ()
#2  0x0000000b in ?? ()
#3  0xb2826000 in ?? ()
#4  0xb2971f28 in ?? ()
#5  0x0835ee7b in init_relay_log_info (rli=0xb2972bb0,
    info_fname=0xb <Address 0xb out of bounds>) at rpl_rli.cc:287
#6  0x0835ee7b in init_relay_log_info (rli=0xb,
    info_fname=0x1 <Address 0x1 out of bounds>) at rpl_rli.cc:287
#7  0x08221250 in __static_initialization_and_destruction_0 (__initialize_p=Variable "__initialize_p" is not available.
)
    at set_var.h:54
#8  <signal handler called>
#9  0x00357477 in ?? ()
#10 0xb28057f0 in ?? ()
#11 0x081f4c25 in Field_decimal::store (this=0xb28057f0,
    from=0xb280576d "YZ1\004ABC1", len=2994735629, cs=0xb29722c8)
    at field.cc:1738
#12 0x082e9ed9 in filesort (thd=0xa001da0, table=0xb2802890, sortorder=0x3,
    s_length=2994755432, select=0xb2800a08, max_rows=7074444049006293549,
    sort_positions=72, examined_rows=0xb2802910) at field.h:197
#13 0x082e8a50 in merge_buffers (param=0xb28004e8, from_file=0x9ff3448,
    to_file=0xa001da0, sort_buffer=0xb2802890 "@/\200²\210G\200²",
    lastbuff=0xb2800a08, Fb=0xb297236c, Tb=0x1, flag=111) at filesort.cc:1041
#14 0x082e8cc4 in merge_buffers (param=0xb2800588, from_file=0xb28004e8,
    to_file=0xa001da0,
    sort_buffer=0x471c91b4 <Address 0x471c91b4 out of bounds>,
    lastbuff=0x198a1, Fb=0xa002ae4, Tb=0xb2972458, flag=137677533)
    at filesort.cc:1036
#15 0x082e9a91 in filesort (thd=0xb28004e8, table=0xa001da0,
    sortorder=0x9ff51f0, s_length=167783660, select=0xb2972414, max_rows=0,
    sort_positions=72, examined_rows=0xa002c24) at filesort.cc:936
#16 0x0834cadd in field_real::avg (this=0xa000b90, s=0xa002ae4,
    rows=4611542927296954369) at sql_analyse.h:207
#17 0x0045ebd4 in ?? ()
#18 0x0a000b90 in ?? ()
#19 0xb2972490 in ?? ()
#20 0xb2972490 in ?? ()
#21 0xb2972490 in ?? ()
#22 0xb2972490 in ?? ()
#23 0x00000000 in ?? ()
(gdb)
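
Several integer arguments in the frames below the signal handler are implausibly large. Converting them to hex shows that they fall in the same ranges as the pointers and code addresses printed elsewhere in the trace; for example, flag in frame #14 has the same value as the address shown for frame #16. This suggests, without proving it, that gdb is misreading the stack in these frames, so the function names there should be treated with caution. The values are copied from the trace; the labels are only for illustration.

```python
# Decimal argument values copied from the backtrace above.
suspect_args = {
    "frame #11 Field_decimal::store len": 2994735629,
    "frame #12 filesort s_length": 2994755432,
    "frame #14 merge_buffers flag": 137677533,
    "frame #15 filesort s_length": 167783660,
}

# Render each value as hex so it can be compared with the pointers
# and return addresses that gdb printed in the same trace.
as_hex = {name: hex(value) for name, value in suspect_args.items()}

for name, value in as_hex.items():
    print(f"{name}: {value}")

# Address gdb reported for frame #16 (field_real::avg).
frame16_address = 0x0834CADD
print(suspect_args["frame #14 merge_buffers flag"] == frame16_address)
```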
[23 Oct 2007 15:31] Serge Kozlov
I retested using a -debug build as the slave. It no longer crashes, and slave.err contains the following:

071023 18:20:41 [ERROR] /xxxxxxxx/builds/5.1-new-rpl/libexec/mysqld: Out of memory at line 5793, 'log_event.cc'
071023 18:20:41 [ERROR] /xxxxxxxx/builds/5.1-new-rpl/libexec/mysqld: needed 4294967229 byte (0k), memory in use: 1894753 bytes (1851k)
071023 18:20:41 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 58, event_type: 20
071023 18:20:41 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
...
071023 18:20:41 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1593
071023 18:20:41 [ERROR] Slave (additional info): Out of memory at line 5793, 'log_event.cc' Error_code: 5
071023 18:20:41 [Warning] Slave: Out of memory at line 5793, 'log_event.cc' Error_code: 5
071023 18:20:41 [Warning] Slave: needed 4294967229 byte (0k), memory in use: 1894753 bytes (1851k) Error_code: 5
071023 18:20:41 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 343
NDB: Found 2 NdbTransaction's that have not been released
NDB: Found 1 NdbReceiver that has not been released
NDB: Found 3 NdbTransaction's that have not been released
NDB: Found 1 NdbReceiver that has not been released
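
The allocation size in the log is just under 4 GiB, while the server reports less than 2 MB in use. One possible reading, which is an assumption and not confirmed anywhere in this report, is that a small negative length was converted to an unsigned 32-bit size before the allocation at log_event.cc line 5793. The arithmetic below only shows what signed value the logged number corresponds to.

```python
# Size the slave tried to allocate, copied from slave.err.
needed = 4294967229

UINT32_RANGE = 2 ** 32  # number of distinct 32-bit values

# Reinterpret the unsigned 32-bit value as a signed 32-bit integer.
if needed >= 2 ** 31:
    as_signed_32 = needed - UINT32_RANGE
else:
    as_signed_32 = needed

print(as_signed_32)
```

If that reading is right, the slave computed a length of -67 while parsing the event, which would be consistent with it misjudging the layout of the event it received from the master.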
[26 Nov 2007 15:28] Lars Thalmann
Mats, actually not sure what state to put it in.  Please check.
[3 Nov 2008 9:50] Serge Kozlov
Can't repeat with 5.1-rpl and 5.1-telco-6.2. Probably fixed.