MySQL Bugs: #37093: mysqld got signal 11 when slave sql_thread reads certain relay log entries

Bug #37093	mysqld got signal 11 when slave sql_thread reads certain relay log entries
Submitted:	30 May 2008 8:11	Modified:	11 Jun 2008 12:08
Reporter:	Rickard Gunnarsson	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	5.1.24	OS:	Linux (Ubuntu Server 6.06)
Assigned to:		CPU Architecture:	Any

Description:
When upgrading from 5.0.45 to 5.1.24 (also tried a build of 5.1.26 with same result) replication fails when the slave sql_thread is started. When starting the thread with "slave start sql_thread;" the relay log is being processed but then stops when mysqld segfault. This is the output of the error log:

080530  8:59:44 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000725' at position 450859142, relay log './mysqld-relay-bin.000002' position: 152163
/usr/local/mysql/bin/mysqld(print_stacktrace+0x1c)[0x82d0040]
/usr/local/mysql/bin/mysqld(handle_segfault+0x1cb)[0x81c84f3]
[0xffffe420]
/usr/local/mysql/bin/mysqld(_ZN3THD27restore_sub_statement_stateEP19Sub_statement_state+0xb4)[0x81be818]
/usr/local/mysql/bin/mysqld(_ZN19Table_triggers_list16process_triggersEP3THD14trg_event_type20trg_action_time_typeb+0xe5)[0x82ee461]
/usr/local/mysql/bin/mysqld(_Z12write_recordP3THDP8st_tableP12st_copy_info+0x431)[0x8232c8d]
/usr/local/mysql/bin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0xade)[0x823231a]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x1c36)[0x81d57ce]
/usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0xed)[0x81d9e89]
/usr/local/mysql/bin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x3ac)[0x8260904]
/usr/local/mysql/bin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_info+0x18)[0x8260554]
/usr/local/mysql/bin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_infob+0x11e)[0x82c29ea]
/usr/local/mysql/bin/mysqld[0x82c4e34]
/usr/local/mysql/bin/mysqld(handle_slave_sql+0x47e)[0x82c37fe]
/lib/tls/i686/cmov/libpthread.so.0[0xb7f17341]
/lib/tls/i686/cmov/libc.so.6(__clone+0x5e)[0xb7e444ee]
080530  8:59:44 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=3
max_threads=200
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 452839 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x95ad090
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x9828154 = INSERT IGNORE INTO tbl_week_ticks SET i_unit_id = 796,i_customer_id = 1074,d_lat = 123,d_lon = 123,d_velocity = 0,d_heading = 0,b_validpos = 0,i_sensor_input = 0,i_socket_id = 11,dt_timestamp = '2008-05-28 22:22:29',i_distance = 717930
thd->thread_id=1842
thd->killed=NOT_KILLED

Restarting the slave makes mysqld to die immediately, without continue processing the relay log. I therefore suspected that the statement itself would cause the crash... 

Looking at the relay log at the position where the crash occurred, I'm able to execute the statements in the log manually. I tried changing the relay_log_pos with "change master to relay_log_pos=xxx;" to set the relay log pos to the position just *after* the failed position. When starting the slave again, the processing of the relay log continues, only to die again a bit further down the log. Again at a similar "insert ignore into..." statement.
Important to notice is that between these two positions in the relay log there are several insert ignore into statements that regards other tables.

This is the table layout for the affected table:

mysql> show create table tbl_week_ticks\G
*************************** 1. row ***************************
       Table: tbl_week_ticks
Create Table: CREATE TABLE `tbl_week_ticks` (
  `i_id` int(10) unsigned zerofill NOT NULL AUTO_INCREMENT,
  `i_unit_id` int(11) DEFAULT '0',
  `i_customer_id` smallint(5) unsigned DEFAULT '0',
  `d_lat` double DEFAULT '0',
  `d_lon` double DEFAULT '0',
  `d_velocity` double DEFAULT '0',
  `d_heading` double DEFAULT '0',
  `b_validpos` tinyint(1) DEFAULT '0',
  `dt_timestamp` datetime DEFAULT '0000-00-00 00:00:00',
  `i_sensor_input` smallint(6) DEFAULT NULL,
  `i_socket_id` int(11) DEFAULT '0',
  `i_distance` int(10) unsigned NOT NULL DEFAULT '0',
  `i_sats_used` tinyint(1) unsigned DEFAULT '0',
  `d_altitude` double DEFAULT '0',
  PRIMARY KEY (`i_id`),
  KEY `Unit` (`i_unit_id`,`dt_timestamp`),
  KEY `Customer` (`i_customer_id`,`dt_timestamp`),
  KEY `Index_timestamp` (`dt_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=108851865 DEFAULT CHARSET=latin1 COMMENT='Temporary; InnoDB free: 7102464 kB; InnoDB free: 94208 kB'
1 row in set (0.00 sec)
 

Downgrading again to 5.0.45 and everything works fine.

The master server is still 5.0.45.

How to repeat:
Upgrade 5.0.45 to 5.1.24 or 5.1.26
Start replication from a 5.0.45 server.

Thank you for the report.

Please provide us your relay and/or binary log file[s].

I found out that what's causing this behavior is when there's a trigger on the table being updated. Removing the trigger completely will make this problem disappear. 
Trying drop trigger and re-create the trigger doesn't solve the problem.

Thank you for the report.

Could you please provide definition of trigger?

CREATE TRIGGER timediff AFTER INSERT ON tbl_week_ticks
FOR EACH ROW BEGIN
	IF ABS((unix_timestamp(now())-unix_timestamp(NEW.dt_timestamp))) > 20 THEN  
		INSERT INTO tbl_delayed_ticks SET i_id=NEW.i_id, i_timediff=(unix_timestamp(now())-unix_timestamp(NEW.dt_timestamp));
	END IF;
END;

Thank you for the feedback. Please provide output of SHOW CREATE TABLE tbl_delayed_ticks also.

mysql> show create table tbl_delayed_ticks\G
*************************** 1. row ***************************
       Table: tbl_delayed_ticks
Create Table: CREATE TABLE `tbl_delayed_ticks` (
  `i_id` int(10) unsigned NOT NULL DEFAULT '0',
  `i_timediff` int(11) DEFAULT '0',
  PRIMARY KEY (`i_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Thank you for the feedback.

I can not repeat the problem in test environment. Please provide configuration file also and describe your hardware (CPU, RAM).

This is the my.cnf file, and the output of uname -a, cat /proc/cpuinfo and cat /proc/meminfo. Please let me know if there's anything else you need.

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr/local/mysql
datadir         = /var/lib/mysql
tmpdir          = /tmp
language        = /usr/share/mysql/english
skip-external-locking
old_passwords   = 1
key_buffer              = 16M
max_allowed_packet      = 16M
thread_stack            = 128K
max_connections         = 200
query_cache_limit       = 1048576
query_cache_size        = 16777216
query_cache_type        = 1
server-id               = 1
log-bin                 = mysql-bin
expire-logs-days        = 7
binlog-do-db            = db_admin
binlog-do-db            = db_ticks
auto_increment_increment = 2
auto_increment_offset = 1
report-host=x.y.z.w
innodb_buffer_pool_size=500M
[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]

[isamchk]
key_buffer              = 16M

root@xxxxxxxx:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6320  @ 1.86GHz
stepping        : 6
cpu MHz         : 1862.063
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 255
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 3727.73

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6320  @ 1.86GHz
stepping        : 6
cpu MHz         : 1862.063
cache size      : 4096 KB
physical id     : 1
siblings        : 1
core id         : 255
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 3724.07

root@xxxxxxx:~# uname -a
Linux xxxxxx 2.6.15-26-server #1 SMP Fri Sep 8 21:00:37 UTC 2006 i686 GNU/Linux

root@xxxxxxxx:~# cat /proc/meminfo
MemTotal:      2074944 kB
MemFree:        101020 kB
Buffers:         74848 kB
Cached:        1209832 kB
SwapCached:       1200 kB
Active:         877900 kB
Inactive:      1039192 kB
HighTotal:     1178496 kB
HighFree:        18952 kB
LowTotal:       896448 kB
LowFree:         82068 kB
SwapTotal:     5960072 kB
SwapFree:      5950148 kB
Dirty:            1836 kB
Writeback:           4 kB
Mapped:         641364 kB
Slab:            42864 kB
CommitLimit:   6997544 kB
Committed_AS:   791364 kB
PageTables:       2184 kB
VmallocTotal:   118776 kB
VmallocUsed:      5392 kB
VmallocChunk:   112796 kB

i tried to repeat a crash, and also failed to do so.
now, suspect this bug may be caused by the fix for bug #33029
just guess it's this code crashing:

function THD::reset_sub_statement_state

if (rpl_master_erroneous_autoinc(this))
  {
    backup->auto_inc_intervals_forced= auto_inc_intervals_forced;
    auto_inc_intervals_forced.empty(); <---- this
  }

btw, i just opened bug #37311 while i was trying to reproduce this crash.

looks like bug #36443

marking as a duplicate of bug #36443