MySQL Bugs: #62311: segfault in mysqld during early SIGHUP handling

Bug #62311	segfault in mysqld during early SIGHUP handling
Submitted:	31 Aug 2011 17:20	Modified:	9 Feb 2012 5:01
Reporter:	Volodymyr Lukiianyk	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Errors	Severity:	S3 (Non-critical)
Version:	5.1.55, 5.1.61	OS:	Linux (ALT Linux based)
Assigned to:		CPU Architecture:	Any
Tags:	Contribution

Description:
We are running mysqld in lxc-container and it crashes when started from init scripts but works OK when started from command line. According to strace output in the former case (unlike the latter one) it receives really early SIGHUP, which probably is generated on controlling terminal hangup.

After core-file examination and looking into the source code we believe there is a race condition when signal handler thread is trying to access objects/data not yet created/initialized by the main thread.

Segfault 

    mysqld[9142]: segfault at 28 ip 000000000070aa4b sp 00007f95b1703500
                                         error 4 in mysqld[400000+715000]

is caused by the signal handler thread. Here is its backtrace

    ---------------------------------------------------------------------
    (gdb) bt
    #0  0x000000000070aa4b in Event_scheduler::dump_internal_status() ()
    #1  0x00000000007105a1 in Events::dump_internal_status() ()
    #2  0x00000000006525e8 in mysql_print_status() ()
    #3  0x00000000005c865d in signal_hand ()
    #4  0x00007f1defbe38e0 in start_thread () from /lib64/libpthread.so.0
    #5  0x00007f1deed6674d in clone () from /lib64/libc.so.6
    #6  0x0000000000000000 in ?? ()
    ---------------------------------------------------------------------

and here is the backtrace of the main thread

    --------------------------------------------------
    (gdb) thread apply 2 bt

    Thread 2 (Thread 0x7f1df000d720 (LWP 698)):
    #0  0x00000000005b8620 in THD::cleanup() ()
    #1  0x00000000005bb25d in THD::~THD() ()
    #2  0x00000000005bb299 in THD::~THD() ()
    #3  0x0000000000607240 in mysql_rm_tmp_tables() ()
    #4  0x00000000005cba08 in main ()
    --------------------------------------------------

From the sources we get the following call diagram (only relevant calls are depicted)

                                main()
                                 |
                                 :
                                 |-start_signal_handler()
   signal_hand() <---clone()---  |  `-pthread_create(...signal_hand...)
       |                         |-mysql_rm_tmp_tables()
       |                         :
       :                         |-Events::init()
       :                         :  `-scheduler= new Event_scheduler(...)
       :
       |-mysql_print_status()
       :  `-Events::dump_internal_status()
             `-scheduler->dump_internal_status()

From the backtraces of the main thread we see that it was in the function mysql_rm_tmp_tables(). That means that method init() of the Events object wasn't called yet, and its member "scheduler" wasn't initialized. But at the same time signal handler thread tried to call method dump_internal_status() of the (not even yet created) Events_scheduler object.

After applying following patch mysqld ceased to crash during such signal handling.

diff --git a/sql/events.cc b/sql/events.cc
index afae512..91b10db 100644
--- a/sql/events.cc
+++ b/sql/events.cc
@@ -1099,8 +1099,15 @@ Events::dump_internal_status()
     puts("The Event Scheduler is disabled");
   else
   {
-    scheduler->dump_internal_status();
-    event_queue->dump_internal_status();
+    if (scheduler)
+      scheduler->dump_internal_status();
+    else
+      puts("The Event Scheduler isn't created yet.");
+
+    if (event_queue)
+       event_queue->dump_internal_status();
+    else
+      puts("The Event Queue isn't created yet.");
   }
 
   pthread_mutex_unlock(&LOCK_event_metadata);
diff --git a/sql/slave.cc b/sql/slave.cc
index 96319de..c716bcd 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -4487,6 +4487,9 @@ int rotate_relay_log(Master_info* mi)
   Relay_log_info* rli= &mi->rli;
   int error= 0;
 
+  if (!mi)
+     return error;
+
   /* We don't lock rli->run_lock. This would lead to deadlocks. */
   pthread_mutex_lock(&mi->run_lock);

How to repeat:
Well, as stated in the description, we reproduce it simply buy restarting lxc-container with mysqld. Since the reason of such an early SIGHUP isn't confirmed yet, not sure we can provide instruction for reproducing such situation on any other setup.

Suggested fix:
Do not fully enable signal handling thread until all objects/data used by it are fully created/initialized. Or make this thread aware of such a possibility.

Thank you for the problem report and patch contributed. Verified by code review.

is this rare condition repeatable on mysql 5.5 or 5.6 ?  I'm asking, due to the similar nature of  http://bugs.mysql.com/bug.php?id=47703

and related http://bugs.mysql.com/bug.php?id=48536

Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Fixed in 5.6. Documented as follows in the 5.6.5 changelog:

        If during server startup a signal such as SIGHUP was caught
        prior to full server initialization, the server could crash.
        This was due to a race condition between the signal handler
        thread and the main thread performing server initialization. To
        prevent this from happening, signal processing is now suspended
        until full initialization of all server components has been
        completed successfully.
      
Closed.

5.6$ bzr log -r 3402.50.94
------------------------------------------------------------
revno: 3402.50.94
committer: Dmitry Shulga <Dmitry.Shulga@oracle.com>
branch nick: mysql-trunk-bug13608371
timestamp: Mon 2012-01-30 11:34:32 +0600
message:
  This patch fixes bug#13608371 (formerly known as bug 62311):
  segfault in mysqld during early SIGHUP handling.
  
  If during server start up some signals like SIGHUP are caught before full
  server initialization has been done then server may crash.
  
  The reason for this bug is that there was a race condition between signal
  handler thread and main thread that was doing server initialization.
  For example, if SIGHUP signal was delivered to server and caught by signal
  handler thread before server had completed initialization of Event_scheduler
  then server crashed when try to access to uninitialized event_scheduler
  instance during signal handling.
  
  To avoid such race condition we suspend signal processing before full
  initialization of all server components has been completed successfully.
  It's achieved by waiting for on COND_server_started conditional variable
  until a flag mysqld_server_started is true.
  
  Moreover superfluous call of pthread_sigmask() to set thread signal mask
  was removed since all signal have been already blocked during execution
  of my_init_signal().
  
  Also call of mysql_cond_broadcast(&COND_thread_count) at signal_hand()
  was moved under protection of mutex LOCK_thread_count to provide deterministic
  schedule behaviour.

Is this fix going to be backported to 5.5 or older (since this was reported for 5.1). The fix seems to be only in 5.6 tree.