MySQL Bugs: #57105: server crashes when reading relay log

Bug #57105	server crashes when reading relay log
Submitted:	29 Sep 2010 16:03	Modified:	14 Dec 2010 17:25
Reporter:	Arnaud Gadal	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	5.1.22, 5.1.50, 5.1.51	OS:	Linux (centos, debian)
Assigned to:		CPU Architecture:	Any
Tags:	binlog, corruption, relay log, replication

Description:
Hi,

We frequently hit some random crashes on slaves. When a crash occurs, every slaves are concerned. It may be a binlog corruption althrough the binlog is still readable by mysqlbinlog, no weird content neither.

We use MIXED mode.

Perhaps is it similar to http://bugs.mysql.com/bug.php?id=38718 ?

On this case the architecture looks like :
We have a master-> a master/slave -> and then the slaves (the ones which crash randomly). Maybe a corruption took place between the master and the master/slave, then replicated on the slaves => segfault.

I can reproduced the behavior on every version/OS I've tested : 5.1.22 (centos/debian), 5.1.50 (centos/debian), 5.1.51 (debian).

I'm going to attach the relay log.

The crash log : (from debian / 5.1.51)

100929 11:53:37 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.1.51-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Server (GPL)
100929 11:54:00 [Note] Slave SQL thread initialized, starting replication in log '.015962' at position 40064871, relay log '/data/mysql/mysql123-relay-bin.000002' position: 353
100929 11:54:00 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=2147483648
read_buffer_size=1048576
max_used_connections=1
max_threads=600
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3332162 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0xfe75f60
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x40692108 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x2e)[0x8b413e]
/usr/local/mysql/bin/mysqld(handle_segfault+0x322)[0x5e36a2]
/lib/libpthread.so.0[0x7fa8619e8a80]
/usr/local/mysql/bin/mysqld(_Z10unpack_rowPK14Relay_log_infoP8st_tablejPKhPK9st_bitmapPS5_Pmbb+0x1c3)[0x6af003]
/usr/local/mysql/bin/mysqld(_ZN14Rows_log_event9write_rowEPK14Relay_log_infob+0x103)[0x6a59d3]
/usr/local/mysql/bin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEPK14Relay_log_info+0x20)[0x6a5d30]
/usr/local/mysql/bin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x1b5)[0x6a63e5]
/usr/local/mysql/bin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0xed)[0x71afcd]
/usr/local/mysql/bin/mysqld(handle_slave_sql+0x828)[0x71be88]
/lib/libpthread.so.0[0x7fa8619e0fc7]
/lib/libc.so.6(clone+0x6d)[0x7fa860e7f64d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil) is an invalid pointer
thd->thread_id=2
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
100929 11:54:01 mysqld_safe Number of processes running now: 0
100929 11:54:01 mysqld_safe mysqld restarted
100929 11:54:01 [Warning] '--log_slow_queries' is deprecated and will be removed in a future release. Please use ''--slow_query_log'/'--slow_query_log_file'' instead.
100929 11:54:01 [Warning] '--safe-show-database' is deprecated and will be removed in a future release. Please use 'GRANT SHOW DATABASES' instead.
100929 11:54:01 [Warning] '--default-character-set' is deprecated and will be removed in a future release. Please use '--character-set-server' instead.
100929 11:54:01 [Warning] '--default-collation' is deprecated and will be removed in a future release. Please use '--collation-server' instead.
100929 11:54:01 [Note] Plugin 'FEDERATED' is disabled.
100929 11:54:01 [Warning] option 'innodb-autoextend-increment': unsigned value 268435456 adjusted to 1000
100929 11:54:02  InnoDB: Started; log sequence number 0 64064
100929 11:54:02 [Note] Recovering after a crash using /data/logs/mysql123-binlog
100929 11:54:02 [Note] Starting crash recovery...
100929 11:54:02 [Note] Crash recovery finished.
100929 11:54:02 [Note] Event Scheduler: Loaded 0 events
100929 11:54:02 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.1.51-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Server (GPL)

Here is the general_log related to the crash :
                       
mysql123:/data/mysql# more mysql123.log.crash
/usr/local/mysql/bin/mysqld, Version: 5.1.51-log (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /tmp/mysql.sock
Time                 Id Command    Argument
100929 11:01:53     1 Query     CHANGE MASTER TO master_host='myhost', master_user='myuser', master_password='mypwd', MASTER_LOG_FILE='xxxxx', MASTER_LOG_POS=39890036
100929 11:02:04     1 Query     start slave
                    2 Connect Out       myuser@myhost:3306
                    3 Query     DROP TABLE IF EXISTS mytable_cat_tmp
                    3 Query     CREATE TABLE mytable_cat_tmp LIKE mytable_cat
                    3 Query     UPDATE mytableid SET NEXT_ID = LAST_INSERT_ID(SEQUENCED_ID) WHERE NEXT_ID=0 limit 1
                    3 Query     COMMIT /* implicit, from Xid_log_event */
                    
=>>>>>>>>>>>>>>>>>>>>CRASH HERE.

How to repeat:
To crash the slave I only need to execute a "CHANGE MASTER TO" near the crash zone I've studied, then START SLAVE and it crashes. The relay log looks like :

--------
# at 242
#100927  1:06:45 server id 1000  end_log_pos 39890147   Query   thread_id=66304839      exec_time=41    error_code=0
use mydb/*!*/;
SET TIMESTAMP=1285542405/*!*/;
SET @@session.pseudo_thread_id=66304839/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=192/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
DROP TABLE IF EXISTS mytable_cat_tmp
/*!*/;
# at 353
#100927  1:06:45 server id 1000  end_log_pos 39890274   Query   thread_id=66304839      exec_time=41    error_code=0
SET TIMESTAMP=1285542405/*!*/;
CREATE TABLE mytable_cat_tmp LIKE mytable_cat
/*!*/;
# at 480
# at 548
# at 1587
# at 2626
# at 3665
...
# at 170944
# at 171983
# at 173022
# at 174061
#100927  1:07:26 server id 10060  end_log_pos 40064035  Query   thread_id=89928443      exec_time=0     error_code=0
SET TIMESTAMP=1285542446/*!*/;
UPDATE mytable SET NEXT_ID = LAST_INSERT_ID(SEQUENCED_ID) WHERE NEXT_ID=0 limit 1
/*!*/;
# at 174241
#100927  1:07:26 server id 10060  end_log_pos 40064062  Xid = 689329798
COMMIT/*!*/;  <<<<<<<<<<<<<<<<<================================ CRASH HERE
# at 174268
#100927  1:06:55 server id 1000  end_log_pos 40064871   Write_rows: table id 23794 flags: STMT_END_F

BINLOG '
D9KfTBP...
---------

I can crash the server merely by pointing its master (as described before) or locally by editing manually the relay-log.info + START SLAVE SQL_THREAD in attempt to replay the relay-log at a precise moment (# at 242 for example).

Let me know if you need anything else.

Best Regards,
Arnaud

Suggested fix:
No idea, if a binlog corruption is confirmed, I'll setup replication on SSL.

The problem was fixed with an upgrade from 5.1.22 to 5.1.51 on the M/S server.

=> No more segfault
=> The M/S server doesn't write corrupted binlog any more => slave doesn't crash (that crash problem is fixed since (http://bugs.mysql.com/bug.php?id=38718 is closed I think).

Regards,
Arnaud