Description:
Hi,
While trying to get Rafal a core for #21448 ([Com]: network outages can cause slave mysqld to core on reconnection), I got a core that is cluster related.
-> Thanks again for the work you put into repeating the crash and
-> collecting the evidence. It is going to be a great help.
->
-> It seems that we are hitting (at least) two issues here. One of them
-> is some problem with NDB engine. This is what was hit in the most
-> recent crash with the following stack:
->
-> #0 0x002cd402 in __kernel_vsyscall ()
-> (gdb) bt
-> #0 0x002cd402 in __kernel_vsyscall ()
-> #1 0x0046164f in pthread_kill () from /lib/libpthread.so.0
-> #2 0x0835d6b7 in write_core ()
-> #3 0x08214fe6 in handle_segfault ()
-> #4 <signal handler called>
-> #5 0x084af0f2 in NdbTransaction::getNdbOperation ()
-> #6 0x084af359 in NdbTransaction::getNdbOperation ()
-> #7 0x08302562 in ha_ndbcluster::write_row ()
-> #8 0x082ee875 in handler::ha_write_row ()
-> #9 0x082b8d16 in replace_record ()
-> #10 0x082b8d73 in Write_rows_log_event::do_exec_row ()
-> #11 0x082b68fd in Rows_log_event::exec_event ()
-> #12 0x0834cde9 in exec_relay_log_event ()
-> #13 0x0834d34c in handle_slave_sql ()
-> #14 0x0045ebd4 in start_thread () from /lib/libpthread.so.0
-> #15 0x003b64fe in clone () from /lib/libc.so.6
->
-> The two crashes you reported originally in the bug report were
-> however about a different issue, more related to our replication
-> code.
->
-> #0 0x002cd402 in __kernel_vsyscall ()
-> #1 0x0046164f in pthread_kill () from /lib/libpthread.so.0
-> #2 0x0834d78f in write_core ()
-> #3 0x081f8ad6 in handle_segfault ()
-> #4 <signal handler called>
-> #5 0x00357477 in memset () from /lib/libc.so.6
-> #6 0x081d9d30 in Field_string::unpack ()
-> #7 0x08299a63 in unpack_row ()
-> #8 0x0829be4c in Write_rows_log_event::do_prepare_row ()
-> #9 0x08299edd in Rows_log_event::exec_event ()
-> #10 0x0833ff77 in exec_relay_log_event ()
-> #11 0x08341afd in handle_slave_sql ()
-> #12 0x0045ebd4 in start_thread () from /lib/libpthread.so.0
-> #13 0x003b64fe in clone () from /lib/libc.so.6
I will save the core and the mysqld symbols file.
Also, I noticed in my first couple of attempts that after restoring the connection, I almost always have to stop and restart the slave to get it to connect to the cluster again; this problem area may be where the bulk of the issues in the code can be found.
How to repeat:
See #21448.
Suggested fix:
Don't crash ;-)