Bug #77261 | 5.6.23 crash while replicating | ||
---|---|---|---|
Submitted: | 5 Jun 2015 15:12 | Modified: | 6 Jun 2015 13:26 |
Reporter: | Simon Mudd (OCA) | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.6.23 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[5 Jun 2015 15:12]
Simon Mudd
[5 Jun 2015 15:13]
Simon Mudd
configuration of the server concerned.
Attachment: etc_my.cnf_for_bug_report.txt (text/plain), 3.24 KiB.
[5 Jun 2015 15:18]
Simon Mudd
Note: one of the messages I saw prior to this as I was trying to skip statements with pt-slave-restart --error-numbers=1062,1032 was the following: Last_Error: Could not execute Update_rows event on table mydb.mytable; Got error 171 from storage engine, Error_code: 1030; handler error HA_ERR_CORRUPT_EVENT; the event's master log binlog.020378, end_log_pos 38062529 I was messing around with the binlog format trying to change back to SBR from RBR which finally I concluded is not possible, so wonder if somehow this triggered the other master to write something strange or simply this box picked up something it wasn't expecting to. I don't believe that the events were "corrupt per-se" but it's possible that this changing of binlog_format between ROW and STATEMENT may trigger something which isn't expected.
[5 Jun 2015 16:57]
Simon Mudd
A similar crash just happened on another box. I tried to change slave_exec_mode to IDEMPOTENT but it didn't seem very happy with that. 16:52:47 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=67108864 read_buffer_size=131072 max_used_connections=8 max_threads=7000 thread_count=7 connection_count=4 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 8225457 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7fc070028c30 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fc088c177e0 thread_stack 0x40000 /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8dce65] /usr/sbin/mysqld(handle_fatal_signal+0x494)[0x666234] /lib64/libpthread.so.0(+0xf710)[0x7fcff87cb710] /lib64/libc.so.6(+0x8431b)[0x7fcff74c831b] /usr/sbin/mysqld(_ZN12Field_string6unpackEPhPKhjb+0x7f)[0x7b600f] /usr/sbin/mysqld(_Z10unpack_rowPK14Relay_log_infoP5TABLEjPKhPK9st_bitmapPS5_PmS5_+0x4b1)[0x8996e1] /usr/sbin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEPK14Relay_log_info+0x94)[0x873554] /usr/sbin/mysqld(_ZN14Rows_log_event12do_apply_rowEPK14Relay_log_info+0x2f)[0x86cdef] /usr/sbin/mysqld(_ZN14Rows_log_event24do_index_scan_and_updateEPK14Relay_log_info+0x287)[0x874a27] /usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x8ba)[0x87a9aa] /usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x74)[0x8806f4] /usr/sbin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0x263)[0x8ab013] /usr/sbin/mysqld[0x8b0872] /usr/sbin/mysqld(handle_slave_sql+0xb0b)[0x8b234b] /usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xb0397a] /lib64/libpthread.so.0(+0x79d1)[0x7fcff87c39d1] /lib64/libc.so.6(clone+0x6d)[0x7fcff752c8fd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0): is an invalid pointer Connection ID (thread ID): 21246841 Status: NOT_KILLED Again on 5.6.23, which I'm going to upgrade to 5.6.24.
[5 Jun 2015 17:03]
Simon Mudd
On this second box I know I did this: root@server2 [(none)]> show global variables like 'slave_exec_mode'; +-----------------+--------+ | Variable_name | Value | +-----------------+--------+ | slave_exec_mode | STRICT | +-----------------+--------+ 1 row in set (0.00 sec) root@server2 [(none)]> set global slave_exec_mode = 'IDEMPOTENT'; Query OK, 0 rows affected (0.00 sec) For what it's worth I'd been skipping some errors using pt-slave-restart in another window: [root@server2 ~]# pt-slave-restart --error-numbers=1062,1032 --min-sleep=0.002 ... 2015-06-05T18:52:32 relaylog.029483 102589056 1062 2015-06-05T18:52:33 relaylog.029483 102633725 1062 2015-06-05T18:52:33 relaylog.029483 102843197 1610 Error 1610 is not in --error-numbers. [root@server2 ~]# pt-slave-restart --error-numbers=1062,1032,1610 --min-sleep=0.002 2015-06-05T18:52:47 relaylog.029483 102843197 1610 DBD::mysql::st execute failed: MySQL server has gone away [for Statement "SHOW SLAVE STATUS"] at /usr/bin/pt-slave-restart line 2924. [root@server2 ~]# and then I saw the server had crashed.
[5 Jun 2015 18:45]
MySQL Verification Team
The prettified stack from 5.6.23: 0x00000000008dce65: my_print_stacktrace at stacktrace.c:225 0x0000000000666234: handle_fatal_signal at signal_handler.cc:162 0x00000000007b600f: Field_string::unpack(unsigned char*, unsigned char const*, unsigned int, bool) at field.cc:7007 0x00000000008996e1: unpack_row(Relay_log_info const*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) at rpl_record.cc:349 0x0000000000873554: Update_rows_log_event::do_exec_row(Relay_log_info const*) at log_event.cc:13080 0x000000000086cdef: Rows_log_event::do_apply_row(Relay_log_info const*) at log_event.cc:10310 0x0000000000874a27: Rows_log_event::do_index_scan_and_update(Relay_log_info const*) at log_event.cc:10712 0x000000000087a9aa: Rows_log_event::do_apply_event(Relay_log_info const*) at log_event.cc:11411 0x00000000008806f4: Log_event::apply_event(Relay_log_info*) at log_event.cc:3185 0x00000000008ab013: apply_event_and_update_pos(Log_event**, THD*, Relay_log_info*) at rpl_slave.cc:3718 0x00000000008b0872: exec_relay_log_event at rpl_slave.cc:4226 0x00000000008b234b: handle_slave_sql at rpl_slave.cc:6051 0x0000000000b0397a: pfs_spawn_thread at pfs.cc:1863 memcpy(to, from, length); // Pad the string with the pad character of the fields charset field_charset->cset->fill(field_charset, (char*) to + length, field_length - length, field_charset->pad_char); <-------------crash return from+length; } I do have a reason to believe *5.6.25* might be a better choice to try.
[5 Jun 2015 22:16]
Simon Mudd
I'll see if I can download 5.6.25 then and see if that behaves better. Thanks.
[5 Jun 2015 22:41]
Simon Mudd
5.6.25 seems to fix this. I see no crash when setting IDEMPOTENT mode. Thanks.
[5 Jun 2015 22:43]
Simon Mudd
Please note: when you set IDEMPOTENT mode you lose _all_ visibility of the fact the slave may be skipping or ignoring broken comments. That's not ideal at all. In my case right now I don't care but I can't _see_ is the statements I'm skipping are now ok or not. So having counters (not in 5.6 as that would be an incompatible change) but in 5.7 which indicate the number of times a broken statement/event in IDEMPOTENT mode was skipped would be very good. I could use that counter to determine when the issue has gone away, thus allowing me to remove the mode and go back to STRICT mode (if wanted). That functionality would be most helpful.
[6 Jun 2015 13:26]
MySQL Verification Team
closing as can't repeat because not repeatable with current version.