Bug #70430 | Race condition between purge coordinator and InnoDB monitor crash on shutdown | ||
---|---|---|---|
Submitted: | 26 Sep 2013 7:52 | Modified: | 28 Mar 2014 19:08 |
Reporter: | Laurynas Biveinis (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[26 Sep 2013 7:52]
Laurynas Biveinis
[26 Sep 2013 7:54]
Laurynas Biveinis
Sample Percona Server 5.6.13 stacktraces: crash: Thread 1 (LWP 23301): +bt #0 0x000000363720c69c in pthread_kill () from /lib64/libpthread.so.0 #1 0x000000000066449e in handle_fatal_signal (sig=6) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/signal_handler.cc:254 #2 <signal handler called> #3 0x0000003636e328a5 in raise () from /lib64/libc.so.6 #4 0x0000003636e34085 in abort () from /lib64/libc.so.6 #5 0x000000000096b648 in lock_print_info_summary (file=0x2beae90, nowait=<optimized out>) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/innobase/lock/lock0lock.cc:5212 #6 0x00000000009f6abb in srv_printf_innodb_monitor (file=0x2beae90, nowait=1, trx_start_pos=0x0, trx_end=0x0) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/innobase/srv/srv0srv.cc:1261 #7 0x00000000009f79a2 in srv_monitor_thread (arg=<optimized out>) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/innobase/srv/srv0srv.cc:1886 #8 0x0000003637207851 in start_thread () from /lib64/libpthread.so.0 #9 0x0000003636ee890d in clone () from /lib64/libc.so.6 shutdown: Thread 35 (LWP 23549): +bt #0 0x0000003636ee14f3 in select () from /lib64/libc.so.6 #1 0x000000000099345f in os_thread_sleep (tm=<optimized out>) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/innobase/os/os0thread.cc:255 #2 0x000000000097b58e in logs_empty_and_mark_files_at_shutdown () at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/innobase/log/log0log.cc:3274 #3 0x00000000009fb20f in innobase_shutdown_for_mysql () at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/innobase/srv/srv0start.cc:2869 #4 0x000000000093d444 in innobase_end (hton=<optimized out>, type=<optimized out>) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/innobase/handler/ha_innodb.cc:3627 #5 0x00000000005aa42e in ha_finalize_handlerton (plugin=0x2bc4eb0) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/handler.cc:636 #6 0x00000000006f08fe in plugin_deinitialize (ref_check=true, plugin=0x2bc4eb0) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/sql_plugin.cc:937 #7 reap_plugins () at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/sql_plugin.cc:1015 #8 0x00000000006f12d3 in plugin_shutdown () at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/sql_plugin.cc:1700 #9 0x000000000059ea34 in clean_up (print_message=true) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/mysqld.cc:1840 #10 0x000000000059ee0e in unireg_end () at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/mysqld.cc:1708 #11 0x00000000005a2038 in kill_server (sig_ptr=0x0) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/mysqld.cc:1636 #12 0x00000000005a223e in kill_server_thread (arg=<optimized out>) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/sql/mysqld.cc:1659 #13 0x00000000009236aa in pfs_spawn_thread (arg=0x7f1988001b00) at /bzr/5.6-QA_opt/Percona-Server-5.6.13-rc60.5/storage/perfschema/pfs.cc:1858 #14 0x0000003637207851 in start_thread () from /lib64/libpthread.so.0 #15 0x0000003636ee890d in clone () from /lib64/libc.so.6 And the purge coordinator thread has already stopped. Full stacktraces and additional data available at https://bugs.launchpad.net/percona-server/+bug/1230238
[12 Dec 2013 21:38]
Sveta Smirnova
Thank you for the report. I can not repeat described behavior if add condition to DECLARE_THREAD(srv_monitor_thread) waiting when purge_sys->state == PURGE_STATE_EXIT. Please provide repeatable test case, demonstrating the problem. Source code change I used: === modified file 'storage/innobase/srv/srv0srv.cc' --- storage/innobase/srv/srv0srv.cc 2013-11-29 12:56:54 +0000 +++ storage/innobase/srv/srv0srv.cc 2013-12-12 21:35:03 +0000 @@ -1536,6 +1536,7 @@ time_elapsed = difftime(current_time, last_monitor_time); if (time_elapsed > 15) { +while (purge_sys->state != PURGE_STATE_EXIT) {} last_monitor_time = ut_time(); if (srv_print_innodb_monitor) { Test case I used: --source include/have_innodb.inc create table innodb_monitor(f1 int) engine=innodb; --sleep 20 select 1; --shutdown_server
[13 Dec 2013 11:10]
Laurynas Biveinis
Sveta - In your testcase purge is idle, thus the coordinator waiting for activity by sleeping on a non-timingout event, which is never signalled, hence the purge state is never set to PURGE_STATE_EXIT. The easiest way to make the purge active there deterministically is this on the top of your patch: === modified file 'storage/innobase/srv/srv0srv.cc' --- storage/innobase/srv/srv0srv.cc 2013-07-29 09:18:44 +0000 +++ storage/innobase/srv/srv0srv.cc 2013-12-13 11:05:22 +0000 @@ -2742,6 +2743,8 @@ /* If there are no records to purge or the last purge didn't purge any records then wait for activity. */ + n_total_purged = 1; + if (purge_sys->state == PURGE_STATE_STOP || n_total_purged == 0) { Then running your testcase gives the expected result: Thread 1 (Thread 0x7f4742146700 (LWP 24969)): #0 0x00007f474fbd9f0c in __pthread_kill (threadid=<optimized out>, signo=6) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:62 #1 0x0000000000aa8ba9 in my_write_core (sig=6) at /home/laurynas/percona/src/5.6/mysys/stacktrace.c:422 #2 0x000000000071a4f3 in handle_fatal_signal (sig=6) at /home/laurynas/percona/src/5.6/sql/signal_handler.cc:248 #3 <signal handler called> #4 0x00007f474f01df77 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #5 0x00007f474f0215e8 in __GI_abort () at abort.c:90 #6 0x0000000000bee043 in lock_print_info_summary (file=0x7f474f3a91a0 <_IO_2_1_stderr_>, nowait=1) at /home/laurynas/percona/src/5.6/storage/innobase/lock/lock0lock.cc:5184 #7 0x0000000000cd55d8 in srv_printf_innodb_monitor (file=0x7f474f3a91a0 <_IO_2_1_stderr_>, nowait=1, trx_start_pos=0x0, trx_end=0x0) at /home/laurynas/percona/src/5.6/storage/innobase/srv/srv0srv.cc:1175 #8 0x0000000000cd62a2 in srv_monitor_thread (arg=0x0) at /home/laurynas/percona/src/5.6/storage/innobase/srv/srv0srv.cc:1555 #9 0x00007f474fbd4f6e in start_thread (arg=0x7f4742146700) at pthread_create.c:311 #10 0x00007f474f0e19cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
[19 Dec 2013 17:12]
Sveta Smirnova
Thank you for the feedback. Verified as described.
[16 Jan 2014 21:25]
Daniel Price
Fixed as of 5.6.17, 5.7.4 and here's the changelog entry: The "srv_monitor_thread" would crash in the "lock_print_info_summary()" function due to a race condition between the "srv_monitor_thread" and purge coordinator thread. Thank you for the bug report.
[28 Mar 2014 19:08]
Laurynas Biveinis
5.6$ bzr log -r 5753 -n0 ------------------------------------------------------------ revno: 5753 committer: Yasufumi Kinoshita <yasufumi.kinoshita@oracle.com> branch nick: mysql-5.6 timestamp: Thu 2014-01-16 11:34:22 +0900 message: Bug#17980590 : RACE CONDITION BETWEEN PURGE COORDINATOR AND INNODB MONITOR CRASH ON SHUTDOWN lock_print_info_summary() should treat purge_sys->state==PURGE_STATE_EXIT case, instead of ut_error crash. Because it might be called after purge_thread exited. Approved by Sunny Bains in rb#4209