Bug #54118 Two signal handlers for SIGUSR1 and none for SIGALRM, resulting in a deadlock
Submitted: 31 May 2010 20:04
Modified: 18 Jun 2010 16:52
Reporter: Donna Harmon
Status: Not a Bug
Category: MySQL Server
Severity: S1 (Critical)
Version: 5.1.47
OS: Solaris
Assigned to: Alexey Kopytov
CPU Architecture: Any

[31 May 2010 20:04] Donna Harmon
Description:
On x86 Solaris 10, SIGALRM should be used instead of SIGUSR1, because SIGUSR1 should be used only on systems with RTS threads.

There should be two signal handlers: process_alarm() for SIGALRM and thread_alarm() for SIGUSR1. The provided pstack of the core shows two signal handlers for SIGUSR1 and none for SIGALRM, which results in a deadlock.

How to repeat:
1) Use our Solaris binary.
2) Put 2000+ threads in the "Sleep" state.
3) Set wait_timeout to 1.

Suggested fix:
Use SIGALRM instead of SIGUSR1 on x86 Solaris 10.

[17 Jun 2010 13:11] Alexey Kopytov
psig output on a running Solaris binary (mysql-advanced-gpl-5.1.47-solaris10-x86_64):

HUP     blocked,caught  print_signal_warning    0
INT     blocked,default
QUIT    blocked,default
ILL     caught  handle_segfault RESETHAND,NODEFER
TRAP    default
ABRT    caught  handle_segfault RESETHAND,NODEFER
EMT     default
FPE     caught  handle_segfault RESETHAND,NODEFER
KILL    default
BUS     caught  handle_segfault RESETHAND,NODEFER
SEGV    caught  handle_segfault RESETHAND,NODEFER
SYS     default
PIPE    blocked,ignored
ALRM    blocked,caught  print_signal_warning    0
TERM    blocked,caught  print_signal_warning    0
USR1    caught  thread_alarm    0
USR2    default
CLD     default                 NOCLDSTOP
PWR     default
WINCH   default
URG     default
POLL    default
STOP    default
TSTP    blocked,default
CONT    default
TTIN    default
TTOU    default
VTALRM  default
PROF    default
XCPU    default
XFSZ    default
WAITING default
LWP     default
FREEZE  default
THAW    default
CANCEL  default
LOST    default
XRES    default
JVM1    default
JVM2    default
RTMIN   default
RTMIN+1 default
RTMIN+2 default
RTMIN+3 default
RTMAX-3 default
RTMAX-2 default
RTMAX-1 default
RTMAX   default

As the psig output shows, SIGALRM and SIGUSR1 have different signal handlers. When SIGALRM is delivered, process_alarm() is actually called:

(dbx) status
 (2) trace in _signal
 (3) trace in _sigset
 (4) trace in _libc_sigaction
 (5) stop in init_thr_alarm
 (6) stop in process_alarm
 (7) stop at "mysqld.cc":2774

(after "kill -14 `pgrep mysqld`" in another session):

dbx: warning: File `mysqld.cc' has been modified more recently than `mysqld'
t@2 (l@2) stopped in signal_hand (optimized) at line 2774 in file "mysqld.cc"
 2774       if (cleanup_done)
(dbx) cont
dbx: warning: File `thr_alarm.c' has been modified more recently than `mysqld'
t@2 (l@2) stopped in process_alarm (optimized) at line 297 in file "thr_alarm.c"
  297   {
(dbx) where
current thread: t@2
=>[1] process_alarm(sig = -20582464) (optimized), at 0x9e2d18 (line ~297) in "thr_alarm.c"
  [2] signal_hand(arg = ???) (optimized), at 0x7148f6 (line ~2836) in "mysqld.cc"
  [3] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed7504b 
  [4] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed75280 
(dbx)

So, with regard to SIGALRM/SIGUSR1, signal handling works in exactly the same way as on Linux. I do not see any bugs here.
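
For readers unfamiliar with this arrangement, here is a minimal sketch of the general pattern the psig and dbx output is consistent with: SIGALRM is blocked in every thread and picked up synchronously by one dedicated thread, while SIGUSR1 has an ordinary asynchronous handler. This is not MySQL's code. It assumes the dedicated thread waits with sigwait(), and the names on_usr1(), handle_alarm(), signal_thread() and run_alarm_demo() are hypothetical stand-ins chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t usr1_seen = 0; /* set by the SIGUSR1 handler */
static int alarms_processed = 0;            /* written by the signal thread,
                                               read only after it is joined */

/* Hypothetical stand-in for a per-thread handler such as thread_alarm(). */
static void on_usr1(int sig)
{
    (void)sig;
    usr1_seen = 1;
}

/* Hypothetical stand-in for process_alarm(): runs in normal thread context,
   not in an asynchronous signal handler. */
static void handle_alarm(int sig)
{
    assert(sig == SIGALRM);
    alarms_processed++;
}

/* Dedicated signal thread: receives the blocked SIGALRM synchronously. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig = 0;

    if (sigwait(set, &sig) == 0 && sig == SIGALRM)
        handle_alarm(sig);
    return NULL;
}

/* Returns 0 if each signal reached its intended handler, nonzero otherwise. */
int run_alarm_demo(void)
{
    sigset_t set;
    struct sigaction sa;
    pthread_t tid;

    /* Block SIGALRM before creating threads so every thread inherits the
       mask; a tool like psig would then report the signal as blocked. */
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    /* SIGUSR1 stays unblocked and gets a conventional handler. */
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    if (pthread_create(&tid, NULL, signal_thread, &set) != 0)
        return -1;

    /* Process-directed SIGALRM, comparable to "kill -14 <pid>". It stays
       pending until the signal thread consumes it with sigwait(). */
    if (kill(getpid(), SIGALRM) != 0)
        return -1;
    pthread_join(tid, NULL);

    /* SIGUSR1 sent to the calling thread; raise() does not return until
       the handler has run. */
    if (raise(SIGUSR1) != 0)
        return -1;

    return (alarms_processed == 1 && usr1_seen == 1) ? 0 : 1;
}
```

Calling run_alarm_demo() from a small main() and checking for a zero return is enough to try it out; on most systems it needs to be built with thread support (for example the -pthread compiler flag). Because the alarm signal is consumed by sigwait(), the handler registered for it (if any) is never invoked asynchronously, which may explain why the function listed by psig for ALRM differs from the one that appears in the dbx stack.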