Description:
Hi,
We frequently hit some random crashes on slaves. When a crash occurs, every slaves are concerned. It may be a binlog corruption althrough the binlog is still readable by mysqlbinlog, no weird content neither.
We use MIXED mode.
Perhaps is it similar to http://bugs.mysql.com/bug.php?id=38718 ?
On this case the architecture looks like :
We have a master-> a master/slave -> and then the slaves (the ones which crash randomly). Maybe a corruption took place between the master and the master/slave, then replicated on the slaves => segfault.
I can reproduced the behavior on every version/OS I've tested : 5.1.22 (centos/debian), 5.1.50 (centos/debian), 5.1.51 (debian).
I'm going to attach the relay log.
The crash log : (from debian / 5.1.51)
100929 11:53:37 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.1.51-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)
100929 11:54:00 [Note] Slave SQL thread initialized, starting replication in log '.015962' at position 40064871, relay log '/data/mysql/mysql123-relay-bin.000002' position: 353
100929 11:54:00 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=2147483648
read_buffer_size=1048576
max_used_connections=1
max_threads=600
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3332162 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0xfe75f60
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x40692108 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x2e)[0x8b413e]
/usr/local/mysql/bin/mysqld(handle_segfault+0x322)[0x5e36a2]
/lib/libpthread.so.0[0x7fa8619e8a80]
/usr/local/mysql/bin/mysqld(_Z10unpack_rowPK14Relay_log_infoP8st_tablejPKhPK9st_bitmapPS5_Pmbb+0x1c3)[0x6af003]
/usr/local/mysql/bin/mysqld(_ZN14Rows_log_event9write_rowEPK14Relay_log_infob+0x103)[0x6a59d3]
/usr/local/mysql/bin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEPK14Relay_log_info+0x20)[0x6a5d30]
/usr/local/mysql/bin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x1b5)[0x6a63e5]
/usr/local/mysql/bin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0xed)[0x71afcd]
/usr/local/mysql/bin/mysqld(handle_slave_sql+0x828)[0x71be88]
/lib/libpthread.so.0[0x7fa8619e0fc7]
/lib/libc.so.6(clone+0x6d)[0x7fa860e7f64d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil) is an invalid pointer
thd->thread_id=2
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
100929 11:54:01 mysqld_safe Number of processes running now: 0
100929 11:54:01 mysqld_safe mysqld restarted
100929 11:54:01 [Warning] '--log_slow_queries' is deprecated and will be removed in a future release. Please use ''--slow_query_log'/'--slow_query_log_file'' instead.
100929 11:54:01 [Warning] '--safe-show-database' is deprecated and will be removed in a future release. Please use 'GRANT SHOW DATABASES' instead.
100929 11:54:01 [Warning] '--default-character-set' is deprecated and will be removed in a future release. Please use '--character-set-server' instead.
100929 11:54:01 [Warning] '--default-collation' is deprecated and will be removed in a future release. Please use '--collation-server' instead.
100929 11:54:01 [Note] Plugin 'FEDERATED' is disabled.
100929 11:54:01 [Warning] option 'innodb-autoextend-increment': unsigned value 268435456 adjusted to 1000
100929 11:54:02 InnoDB: Started; log sequence number 0 64064
100929 11:54:02 [Note] Recovering after a crash using /data/logs/mysql123-binlog
100929 11:54:02 [Note] Starting crash recovery...
100929 11:54:02 [Note] Crash recovery finished.
100929 11:54:02 [Note] Event Scheduler: Loaded 0 events
100929 11:54:02 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.1.51-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)
Here is the general_log related to the crash :
mysql123:/data/mysql# more mysql123.log.crash
/usr/local/mysql/bin/mysqld, Version: 5.1.51-log (MySQL Community Server (GPL)). started with:
Tcp port: 3306 Unix socket: /tmp/mysql.sock
Time Id Command Argument
100929 11:01:53 1 Query CHANGE MASTER TO master_host='myhost', master_user='myuser', master_password='mypwd', MASTER_LOG_FILE='xxxxx', MASTER_LOG_POS=39890036
100929 11:02:04 1 Query start slave
2 Connect Out myuser@myhost:3306
3 Query DROP TABLE IF EXISTS mytable_cat_tmp
3 Query CREATE TABLE mytable_cat_tmp LIKE mytable_cat
3 Query UPDATE mytableid SET NEXT_ID = LAST_INSERT_ID(SEQUENCED_ID) WHERE NEXT_ID=0 limit 1
3 Query COMMIT /* implicit, from Xid_log_event */
=>>>>>>>>>>>>>>>>>>>>CRASH HERE.
How to repeat:
To crash the slave I only need to execute a "CHANGE MASTER TO" near the crash zone I've studied, then START SLAVE and it crashes. The relay log looks like :
--------
# at 242
#100927 1:06:45 server id 1000 end_log_pos 39890147 Query thread_id=66304839 exec_time=41 error_code=0
use mydb/*!*/;
SET TIMESTAMP=1285542405/*!*/;
SET @@session.pseudo_thread_id=66304839/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=192/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
DROP TABLE IF EXISTS mytable_cat_tmp
/*!*/;
# at 353
#100927 1:06:45 server id 1000 end_log_pos 39890274 Query thread_id=66304839 exec_time=41 error_code=0
SET TIMESTAMP=1285542405/*!*/;
CREATE TABLE mytable_cat_tmp LIKE mytable_cat
/*!*/;
# at 480
# at 548
# at 1587
# at 2626
# at 3665
...
# at 170944
# at 171983
# at 173022
# at 174061
#100927 1:07:26 server id 10060 end_log_pos 40064035 Query thread_id=89928443 exec_time=0 error_code=0
SET TIMESTAMP=1285542446/*!*/;
UPDATE mytable SET NEXT_ID = LAST_INSERT_ID(SEQUENCED_ID) WHERE NEXT_ID=0 limit 1
/*!*/;
# at 174241
#100927 1:07:26 server id 10060 end_log_pos 40064062 Xid = 689329798
COMMIT/*!*/; <<<<<<<<<<<<<<<<<================================ CRASH HERE
# at 174268
#100927 1:06:55 server id 1000 end_log_pos 40064871 Write_rows: table id 23794 flags: STMT_END_F
BINLOG '
D9KfTBP...
---------
I can crash the server merely by pointing its master (as described before) or locally by editing manually the relay-log.info + START SLAVE SQL_THREAD in attempt to replay the relay-log at a precise moment (# at 242 for example).
Let me know if you need anything else.
Best Regards,
Arnaud
Suggested fix:
No idea, if a binlog corruption is confirmed, I'll setup replication on SSL.