| Bug #68506 | Got SIGSEGV on MTS recovery + SQL thread error | ||
|---|---|---|---|
| Submitted: | 27 Feb 2013 7:23 | Modified: | 16 May 2013 17:14 |
| Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 5.6.10 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[27 Feb 2013 7:24]
Yoshinori Matsunobu
I mistyped title:)
[27 Feb 2013 8:29]
MySQL Verification Team
After some effort I was able to crash 5.6.10 here:
mysqld.exe!apply_event_and_update_pos()[rpl_slave.cc:3306]
mysqld.exe!exec_relay_log_event()[rpl_slave.cc:3707]
mysqld.exe!handle_slave_sql()[rpl_slave.cc:5516]
mysqld.exe!pfs_spawn_thread()[pfs.cc:1856]
mysqld.exe!pthread_start()[my_winthread.c:63]
mysqld.exe!_callthreadstartex()[threadex.c:314]
mysqld.exe!_threadstartex()[threadex.c:292]
rli was 0x00000000 here:
if (!(rli->is_mts_recovery() && bitmap_is_set(&rli->recovery_groups,
                                              rli->mts_recovery_index)))
{
  reason= ev->shall_skip(rli);
}
[27 Feb 2013 8:30]
MySQL Verification Team
And on debug build, I hit exact crash:
mysqld-debug.exe!bitmap_is_set()[my_bitmap.h:101]
mysqld-debug.exe!apply_event_and_update_pos()[rpl_slave.cc:3306]
mysqld-debug.exe!exec_relay_log_event()[rpl_slave.cc:3701]
mysqld-debug.exe!handle_slave_sql()[rpl_slave.cc:5516]
mysqld-debug.exe!pfs_spawn_thread()[pfs.cc:1855]
mysqld-debug.exe!pthread_start()[my_winthread.c:62]
mysqld-debug.exe!_callthreadstartex()[threadex.c:314]
mysqld-debug.exe!_threadstartex()[threadex.c:297]
[27 Feb 2013 8:37]
MySQL Verification Team
Also repeatable on the latest 5.6.11 from internal bzr.
[26 Mar 2013 16:58]
Manish Kumar
Hi Yoshinori,
Will the attached patch fix your problem? Let me know if
it does not. Thanks!
=== modified file 'sql/rpl_slave.cc'
--- sql/rpl_slave.cc revid:saikumar.v@oracle.com-20130318054358-77zqaztvuroujo5s
+++ sql/rpl_slave.cc revid:manish.4.kumar@oracle.com-20130318070143-ec45rxdkg37be0u2
@@ -5622,6 +5622,7 @@
   if (rli->recovery_groups_inited)
   {
     bitmap_free(&rli->recovery_groups);
+    rli->mts_recovery_group_cnt= 0;
     rli->recovery_groups_inited= false;
   }
Regards,
Manish Kumar
[16 May 2013 17:14]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.
If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at
http://dev.mysql.com/doc/en/installing-source.html
[16 May 2013 17:15]
Jon Stephens
Fixed in 5.6+. Documented as follows in the 5.6.12 and 5.7.2 changelogs:
An SQL thread error during MTS slave recovery caused the slave
to fail.
Closed.
[24 May 2013 6:59]
MySQL Verification Team
http://bugs.mysql.com/bug.php?id=69126 marked as duplicate of this one.

Description:
When I tried "how to repeat" steps, mysqld got SIGSEGV.

----

07:03:04 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=101
max_threads=5024
thread_count=313
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2007704 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xf290e80
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 593db0b0 thread_stack 0x40000
/data/mysql5610/bin/mysqld(my_print_stacktrace+0x35)[0x8f0c35]
/data/mysql5610/bin/mysqld(handle_fatal_signal+0x3e8)[0x66b0f8]
/lib64/libpthread.so.0[0x3b1280de70]
/data/mysql5610/bin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0xee)[0x8c595e]
/data/mysql5610/bin/mysqld[0x8c6518]
/data/mysql5610/bin/mysqld(handle_slave_sql+0xca9)[0x8c7829]
/data/mysql5610/bin/mysqld(pfs_spawn_thread+0x13b)[0x932cbb]
/lib64/libpthread.so.0[0x3b128062f7]
/lib64/libc.so.6(clone+0x6d)[0x3b120d1e3d]

-----

As far as digging into core files, mysqld crashed here.

---

apply_event_and_update_pos(Log_event** ptr_ev, THD* thd, Relay_log_info* rli)
if (!(rli->is_mts_recovery() && bitmap_is_set(&rli->recovery_groups,
                                              rli->mts_recovery_index)))

---

At #7 on below "how to repeat", rli->is_mts_recovery() was true but rli->recovery_groups.bitmap was 0x0. So bitmap_is_set() raised SIGSEGV.

$8 = {bitmap = 0x0, n_bits = 524280, last_word_mask = 4278190080, last_word_ptr = 0x34ce18c, mutex = 0x0}

When SQL thread terminates (including by error), rli->recovery_groups is freed.

handle_slave_sql()
if (rli->recovery_groups_inited)
{
  bitmap_free(&rli->recovery_groups);
  rli->recovery_groups_inited= false;
}

rli->recovery_groups looks allocated on Relay_log_info global instance creation phase, but recovery_groups looks never re-initialized after "bitmap_free(&rli->recovery_groups)".

How to repeat:
1. Enable MTS on slave (Set slave_parallel_workers large enough)
2. Insert into master databases from multiple clients (I tested 100 databases from 100 clients)
3. Kill the slave mysqld when the slave delays
4. Restart the slave with --skip-slave-start
5. Manually insert missing rows to slave (to cause #6 intentionally)
6. START SLAVE. Then SQL thread stops with duplicate key error
7. START SLAVE again.
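
The stale state behind the crash can be sketched with a small standalone model. This is not MySQL source code: `RliModel`, `init_recovery()`, `sql_thread_cleanup()` and `would_dereference_freed_bitmap()` are hypothetical names made up for illustration. The sketch also assumes, as the one-line patch in this report suggests, that `is_mts_recovery()` reports true whenever `mts_recovery_group_cnt` is nonzero. Only the field names are taken from the report.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for Relay_log_info, reduced to the fields
// mentioned in this report. Not the real class.
struct RliModel {
  // A null pointer models recovery_groups.bitmap == 0x0 after bitmap_free().
  std::unique_ptr<std::vector<bool>> recovery_groups;
  bool recovery_groups_inited = false;
  unsigned long mts_recovery_group_cnt = 0;

  // Assumption: recovery mode is derived from the group count alone.
  bool is_mts_recovery() const { return mts_recovery_group_cnt != 0; }

  // Models entering MTS recovery after a slave restart.
  void init_recovery(unsigned long groups) {
    recovery_groups.reset(new std::vector<bool>(groups, false));
    recovery_groups_inited = true;
    mts_recovery_group_cnt = groups;
  }

  // Models the cleanup at the end of handle_slave_sql().
  // reset_count == false mirrors the 5.6.10 code quoted in the description;
  // reset_count == true mirrors the proposed patch.
  void sql_thread_cleanup(bool reset_count) {
    if (recovery_groups_inited) {
      recovery_groups.reset();  // stands in for bitmap_free()
      if (reset_count) mts_recovery_group_cnt = 0;
      recovery_groups_inited = false;
    }
  }

  // True when the check in apply_event_and_update_pos() would go on to
  // read a bitmap that has already been freed.
  bool would_dereference_freed_bitmap() const {
    return is_mts_recovery() && !recovery_groups;
  }
};
```

In this model, calling `init_recovery()` followed by `sql_thread_cleanup(false)` leaves `is_mts_recovery()` true while the bitmap is gone, which matches the state seen at step 7. With `sql_thread_cleanup(true)` the recovery flag is cleared together with the bitmap, so the next START SLAVE would not take the bitmap lookup path.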