Bug #31646 rpl_ndb_extraColMaster with row-based replication on sparc: slave crash
Submitted: 16 Oct 2007 18:34 Modified: 5 Mar 2008 14:47
Reporter: Timothy Smith Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Replication Severity:S2 (Serious)
Version:5.2 OS:Solaris (sol10-sparc-a)
Assigned to: Mats Kindahl CPU Architecture:Any
Tags: pbfail

[16 Oct 2007 18:34] Timothy Smith
Description:
mysqltest: In included file "./extra/rpl_tests/rpl_extraMaster_Col.test": At line 407: failed in 'select master_pos_wait('master-bin.000001', 10449)': 2006: MySQL server has gone away

The result from queries just before the failure was:
< snip >
/* f27 CHAR(255),  */         'three',
/* f28 DECIMAL(30,7),  */     3.333,
/* f29 DOUBLE,  */            3.333333333333333333333333333,
/* f30 ENUM ('a','b','c')*/   'c',
/* f31 FLOAT,  */             3.0,
/* f32 NUMERIC(17,9),  */     3.3333,
/* f33 SET ('a','b','c'),*/   'c',
/*f34 VARBINARY(1025),*/      '3333 minus 3',
/*f35 VARCHAR(257),*/         NULL
);
Warnings:
Warning	1264	Out of range value for column 'f5' at row 1
Warning	1264	Out of range value for column 'f24' at row 1
Warning	1264	Out of range value for column 'f5' at row 2
Warning	1264	Out of range value for column 'f24' at row 2
Warning	1264	Out of range value for column 'f24' at row 3

** Sync slave with master ** 
** Do selects from tables **

The following appears in the slave.err log file:
...
071016 15:06:05 [Note] NDB Binlog: logging ./test/t3
071016 15:06:07 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t4
071016 15:06:07 [Note] NDB Binlog: logging ./test/t4
071016 15:06:12 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t31
071016 15:06:12 [Note] NDB Binlog: logging ./test/t31
071016 15:06:14 [Note] NDB Binlog: RENAME Event: REPL$test/t2
071016 15:06:14 [Note] NDB Binlog: logging ./test/t2
071016 15:06:16 [Note] NDB Binlog: drop table ./test/#sql2-109e-2.
071016 15:06:18 [Note] NDB Binlog: RENAME Event: REPL$test/t3
071016 15:06:18 [Note] NDB Binlog: logging ./test/t3
071016 15:06:20 [Note] NDB Binlog: drop table ./test/#sql2-109e-2.
071016 15:06:21 [Note] NDB Binlog: RENAME Event: REPL$test/t4
071016 15:06:21 [Note] NDB Binlog: logging ./test/t4
071016 15:06:23 [Note] NDB Binlog: drop table ./test/#sql2-109e-2.
071016 15:06:25 [Note] NDB Binlog: RENAME Event: REPL$test/t31
071016 15:06:25 [Note] NDB Binlog: logging ./test/t31
071016 15:06:29 [Note] NDB Binlog: drop table ./test/#sql2-109e-2.
071016 15:06:29 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=2
max_threads=151
threads_connected=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 59859 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Writing a core file

How to repeat:
Pushbuild shows this failure occasionally since mid September 2007, in 5.2 trees on sol10-sparc-a platform only:

https://intranet.mysql.com/secure/pushbuild/xref.pl?startdate=&enddate=&dir=&plat=&testtyp...
[23 Oct 2007 8:50] Magnus Blåudd
$>dbx ../sql/mysqld var/slave-data/core 
(dbx) threads
<snip>
o    t@23  b l@23   handle_slave_sql()   signal SIGSEGV in  _lwp_kill() 
(dbx) thread t@23
t@23 (l@23) stopped in _lwp_kill at 0xfeebd5ec
0xfeebd5ec: _lwp_kill+0x0008:   bcc,a,pt  %icc,_lwp_kill+0x18   ! 0xfeebd5fc
(dbx) where
current thread: t@23
=>[1] _lwp_kill(0x0, 0xb, 0xfeee8bc0, 0xfed45000, 0xfeee83a8, 0xb), at 0xfeebd5ec 
  [2] write_core(0xb, 0x50, 0xb, 0x6a2c, 0x1a8b, 0x7b1af8), at 0x315204 
  [3] handle_segfault(0xb, 0x97, 0x5fff8, 0x57c774, 0x7b1af8, 0xd5860), at 0x1f3be4 
  [4] __sighndlr(0xb, 0x0, 0xfeadf7e0, 0x1f39e0, 0x0, 0x0), at 0xfeebc52c 
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [5] Field_blob::get_length(0xfeadfb60, 0x7203350a, 0x1, 0x7a1400, 0x647da4, 0x4), at 0x1d9bd4 
  [6] table_def::calc_field_size(0x9664a0, 0x16, 0x7203350a, 0x966478, 0xfeadfb60, 0x4), at 0x3104f8 
  [7] unpack_row(0xc49218, 0x10c3080, 0x3, 0xc3f664, 0xfd8f4c, 0xfd8fb0), at 0x2a9ec4 
  [8] Rows_log_event::write_row(0xfd8f10, 0xc991f8, 0x1, 0x1, 0x0, 0xf72c88), at 0x2a738c 
  [9] Write_rows_log_event::do_exec_row(0xfd8f10, 0xc991f8, 0xf69848, 0xf69848, 0xf6a5c8, 0x1), at 0x2a76b0 
  [10] Rows_log_event::do_apply_event(0xfd8f10, 0xc991f8, 0x5895f7, 0xf69848, 0x10c3080, 0x0), at 0x2a600c 
  [11] exec_relay_log_event(0xf69848, 0xc991f8, 0x0, 0x0, 0x1400, 0x155c), at 0x308e28 
  [12] handle_slave_sql(0x58f9e2, 0x7b1d08, 0x1, 0xc9a080, 0xc991f8, 0x1400), at 0x309cfc 
(dbx)
[24 Oct 2007 11:59] Magnus Blåudd
$> cat rpl_ndb_bug31646.test
-- source include/have_ndb.inc
-- source include/ndb_master-slave.inc
-- source include/have_binlog_format_row.inc

connection master;
CREATE TABLE t1 (f1 INT, f2 INT, f3 INT PRIMARY KEY, f4 CHAR(20),
                 /* extra columns only available on master*/
                 f5 FLOAT DEFAULT '2.00',
                 f6 CHAR(4) DEFAULT 'TEST',
                 f7 INT DEFAULT '0',
                 f8 TEXT,
                 f9 LONGBLOB,
                 f10 BIT(63),
                 f11 VARBINARY(64))ENGINE=NDB;
sync_slave_with_master;

# Drop columns f5-f11 from the table on _slave_
alter table t1 drop f5, drop f6, drop f7, drop f8, drop f9, drop f10, drop f11;

connection master;
INSERT INTO t1 VALUES (1, 1, 1, 'second', 2.0, 'kaks', 2,
                       'got stolen from the paradise',
                       'very fat blob', b'01010101010101',
                       0x123456);

sync_slave_with_master; # <= Crash occurs on slave mysqld

$> ./mtr rpl_ndb_bug31646 --mysqld=--binlog-format=row
<snip>
rpl_ndb.rpl_ndb_bug31646       [ fail ]

mysqltest: At line 26: failed in 'select master_pos_wait('master-bin.000001', 850)': 2013: Lost connection to MySQL server during query

analyzing the core gives same backtrace as earlier
[24 Oct 2007 12:00] Magnus Blåudd
mysqltest test file used to repeat bug

Attachment: rpl_ndb_bug31646.test (application/octet-stream, text), 957 bytes.

[6 Dec 2007 15:28] Jon Stephens
Please verify this bug's status and set to Documenting or Closed if it really has been fixed and pushed. Thanks!
[5 Mar 2008 14:47] Jon Stephens
Documented in the 5.1.23 and 6.0.4 changelogs as follows:

        Cluster replication with extra VARCHAR columns on the
        master caused the slave to fail.