Bug #54082 handle_segfault makes use of unsafe functions
Submitted: 29 May 2010 6:12
Modified: 6 Dec 2011 1:12
Reporter: Shane Bester (Platinum Quality Contributor)
Status: Closed
Category: MySQL Server: Errors
Severity: S1 (Critical)
Version: 5.0, 5.1, 5.5
OS: Any (AIX, FC13)
Assigned to:
CPU Architecture: Any
Triage: Triaged: D2 (Serious) / R3 (Medium) / E3 (Medium)

[29 May 2010 6:12] Shane Bester
Description:
handle_segfault is mysqld's signal handler. However, it calls functions that are not safe to call from a signal handler: localtime_r, fprintf, and fflush.

We saw many ordinary SIGSEGV crashes produce core files with stack traces like this one,
indicating that handle_segfault itself crashed:

(dbx) where
_global_unlock_common
_rec_mutex_unlock
getenv
tzset
localtime_tz_r
handle_segfault

See the lists of "safe" (async-signal-safe) functions for use in a signal handler:
http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
http://linux.die.net/man/2/signal

How to repeat:
Hopefully never!

Suggested fix:
Replace localtime_r with time().
Replace the fprintf calls with write().

Although we saw this mostly on AIX, the fix could be applied on all platforms
for consistency.
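
A minimal sketch of what the suggested fix could look like, assuming a POSIX system. The names build_crash_line, put_ulong, put_str and crash_handler_sketch are hypothetical and are not taken from mysqld.cc. The message is assembled with plain integer arithmetic into a stack buffer and emitted with write(), so the handler only calls functions from the async-signal-safe list (time, write, signal, raise).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Append a NUL-terminated string to out at pos, never writing past cap. */
static size_t put_str(char *out, size_t pos, size_t cap, const char *s)
{
    while (*s != '\0' && pos < cap)
        out[pos++] = *s++;
    return pos;
}

/* Append v in decimal. Pure arithmetic: no locale, locks, or heap. */
static size_t put_ulong(char *out, size_t pos, size_t cap, unsigned long v)
{
    char tmp[24];               /* enough for a 64-bit value (20 digits) */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0 && pos < cap)  /* digits were produced in reverse order */
        out[pos++] = tmp[--n];
    return pos;
}

/* Build "<epoch> (epoch seconds) - got signal <n>\n"; returns its length. */
size_t build_crash_line(char *out, size_t cap, unsigned long now, int sig)
{
    size_t pos = 0;
    pos = put_ulong(out, pos, cap, now);
    pos = put_str(out, pos, cap, " (epoch seconds) - got signal ");
    pos = put_ulong(out, pos, cap, (unsigned long)sig);
    pos = put_str(out, pos, cap, "\n");
    return pos;
}

/* Hypothetical handler: report, then re-raise with the default action
   so the OS can still produce a core file. */
void crash_handler_sketch(int sig)
{
    char line[96];
    size_t len = build_crash_line(line, sizeof line,
                                  (unsigned long)time(NULL), sig);
    ssize_t ignored = write(STDERR_FILENO, line, len);
    (void)ignored;
    signal(sig, SIG_DFL);
    raise(sig);
}
```

The timestamp is printed as raw epoch seconds here because converting it to local time is exactly the step that pulls in tzset and its locks.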
[29 May 2010 6:15] Shane Bester
http://docs.sun.com/app/docs/doc/816-5137/6mba5vqbd?a=view#gen-95948
[29 May 2010 11:51] Sveta Smirnova
Thank you for the report.

Verified as described.
[29 May 2010 14:45] Shane Bester
I'm no expert on signal handlers, but wouldn't it be nicer to move a lot of the code out of signal_hand() and only set some global flag instead? Then some other thread could check the flag every X milliseconds and do the actual work there.
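
The flag idea can be sketched roughly as follows, assuming POSIX sigaction. All names here (pending_signal, install_flag_handler, take_pending_signal) are hypothetical. The handler only stores the signal number in a volatile sig_atomic_t, and ordinary code polls it later.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* The only state the handler touches. */
static volatile sig_atomic_t pending_signal = 0;

/* Handler body: a single store, nothing else. */
static void note_signal(int sig)
{
    pending_signal = sig;
}

/* Install the flag-setting handler for one signal; returns 0 on success. */
int install_flag_handler(int sig)
{
    struct sigaction sa;
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Called from a normal thread every X milliseconds: returns the last
   signal seen (0 if none) and clears the flag. */
int take_pending_signal(void)
{
    int sig = pending_signal;
    pending_signal = 0;
    return sig;
}
```

This pattern fits asynchronous signals such as SIGTERM or SIGHUP. For a hardware fault such as SIGSEGV it is not sufficient on its own: returning from the handler typically re-executes the faulting instruction, and the stack of the faulting thread is what a backtrace needs.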
[29 May 2010 17:39] Davi Arnaut
You want to generate a backtrace on the process that actually caused the problem.
[29 May 2010 20:45] Davi Arnaut
Yes, no disagreement there. The whole thing is in need of some love, it does too much stuff. Ideally, it should be as simple as possible -- separate Windows/Unix handling, move common code to mysys, do not call libc/mysys functions, handle portability locally, etc.
[17 Sep 2010 13:52] Shane Bester
I got this on my home machine: 5.1.50, Fedora Core 13 x64.

The original crash happened during shutdown of mysqld. Then the segfault handler itself crashed, and the OS caught the exception and saved a core file:

(gdb) bt
#0  _IO_vfscanf_internal (s=0x7f6620d4c910, format=0x30e1541bdf "%[A-Za-z]%n", argptr=0x7f6620d4ca30, errp=0x0) at vfscanf.c:219
#1  0x00000030e1469035 in _IO_vsscanf (string=0x29c6318 "SAST-2", format=0x30e1541bdf "%[A-Za-z]%n", args=0x7f6620d4ca30) at iovsscanf.c:45
#2  0x00000030e14632d8 in __sscanf (s=<value optimized out>, format=<value optimized out>) at sscanf.c:34
#3  0x00000030e1496eb3 in __tzset_parse_tz (tz=0x29c6318 "SAST-2") at tzset.c:184
#4  0x00000030e14981b0 in __tzfile_compute (timer=1284730991, use_localtime=<value optimized out>, leap_correct=0x7f6620d4cc10, leap_hit=0x7f6620d4cc1c, tp=0x7f6620d4cc60)
    at tzfile.c:646
#5  0x00000030e1497b17 in __tz_convert (timer=0x7f6620d4cc98, use_localtime=1, tp=0x7f6620d4cc60) at tzset.c:627
#6  0x00000000005e30bf in handle_segfault (sig=11) at mysqld.cc:2483
#7  <signal handler called>
#8  0x0000000000000000 in ?? ()
#9  0x0000000000000000 in ?? ()

This crashed in handle_segfault:

  curr_time= my_time(0);
  localtime_r(&curr_time, &tm); <------
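
One way to avoid the time zone machinery seen in frames #1 to #5 is to report the time of day in UTC, which needs only integer arithmetic on the time_t value. This is a sketch under the assumption that time_t counts seconds since the epoch without leap seconds, as POSIX specifies; utc_hms and utc_time_of_day are hypothetical names.

```c
#include <time.h>

struct utc_hms {
    int hour;
    int min;
    int sec;
};

/* Split an epoch timestamp into UTC hours, minutes, and seconds.
   No tzset(), no TZ lookup, no locks. */
struct utc_hms utc_time_of_day(time_t t)
{
    struct utc_hms r;
    long long secs_today = (long long)t % 86400;
    if (secs_today < 0)          /* timestamps before 1970 */
        secs_today += 86400;
    r.hour = (int)(secs_today / 3600);
    r.min  = (int)(secs_today / 60 % 60);
    r.sec  = (int)(secs_today % 60);
    return r;
}
```

For the timer value 1284730991 seen in the backtrace above, this yields 13:43:11 UTC.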
[1 Aug 2011 19:26] Shane Bester
On Windows, less work should be done in the unhandled exception filter.

For example, the symbol file paths in get_symbol_path() can be constructed at server startup and saved.
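
A portable sketch of the "build it at startup, save it" idea. It does not reproduce the real get_symbol_path() logic and uses no Windows API; save_symbol_path and get_saved_symbol_path are hypothetical names, and the two directory arguments are placeholders. The only detail borrowed from Windows is that symbol search paths are semicolon-separated.

```c
#include <stddef.h>
#include <stdio.h>

#define SYMBOL_PATH_MAX 1024

/* Filled once at startup; read-only afterwards. */
static char saved_symbol_path[SYMBOL_PATH_MAX];

/* Call during server startup, where snprintf() is fine to use.
   Returns 0 on success, -1 if the path does not fit. */
int save_symbol_path(const char *exe_dir, const char *extra_dir)
{
    int n = snprintf(saved_symbol_path, sizeof saved_symbol_path,
                     "%s;%s", exe_dir, extra_dir);
    if (n < 0 || (size_t)n >= sizeof saved_symbol_path) {
        saved_symbol_path[0] = '\0';
        return -1;
    }
    return 0;
}

/* Called from the crash path: no formatting or allocation, just a read. */
const char *get_saved_symbol_path(void)
{
    return saved_symbol_path;
}
```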
[6 Dec 2011 1:12] Paul Dubois
Noted in 5.5.20, 5.6.5 changelogs.

The handle_segfault() signal-handler code in mysqld could itself
crash due to calling unsafe functions.
[6 Dec 2011 1:13] Paul Dubois
Noted in 5.1.61 changelog.
[8 Dec 2011 2:35] Vladislav Vaintroub
Shane, it is not quite correct that the result of get_symbol_path needs to be constructed at startup time. That does not gain much; the work done by the stack walking itself is much bigger. If one does not want exceptions in exception filters, disable the crash handler and collect the dumps; Windows gives that opportunity. No matter what you do in the process, there is no "safe subset" of functions to use in the crash handler. It will most reliably crash in the crash handler on stack overflow, because some stack needs to be used, and it will crash in the CRT if CRT structures are overwritten (even write() will fail).  safe crash handler inside the same process. Doing anything in a crashing process is potentially unsafe. This is why debuggers and out-of-process handlers are great.
[8 Dec 2011 2:37] Vladislav Vaintroub
err, I meant that "safe crash handler inside the crashing process" is a myth not worth trying.
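
For illustration, the out-of-process approach can be sketched on POSIX with fork() and waitpid(): the work runs in a child, and the parent, whose memory is untouched by the crash, observes how the child died. run_supervised, crash_on_purpose and finish_cleanly are hypothetical names, and a real supervisor would collect a dump or backtrace rather than just a signal number.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run work() in a child process. Returns the fatal signal number if the
   child was killed by a signal, 0 if it exited normally, -1 on error. */
int run_supervised(void (*work)(void))
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {             /* child: do the risky work */
        work();
        _exit(0);
    }
    /* parent: wait for the child, retrying if interrupted */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Stand-ins for real work: one crashes, one does not. */
void crash_on_purpose(void) { raise(SIGSEGV); }
void finish_cleanly(void) { }
```

Because the reporting code runs in the parent, it can use any library function it likes; nothing in it depends on the state of the crashed process.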