Bug #40385 Crash on server shutdown
Submitted: 29 Oct 2008 7:55 Modified: 11 Mar 2011 10:57
Reporter: Rafal Somla
Status: Can't repeat
Category:MySQL Server: General Severity:S3 (Non-critical)
Version:6.0 source OS:Any
Assigned to: CPU Architecture:Any
Tags: crash, debug lib, shutdwon

[29 Oct 2008 7:55] Rafal Somla
Description:
I came across this crash while running tests from the backup suite, and I could not repeat it. The crash happens when the signal handler intercepts the signal terminating the server and prints a message with DBUG_PRINT(), due to uninitialized cs->stack->symbols (in the debug library).

Some more anomalies I observed:
- MTR took a very long time (a few minutes) to quit after running the tests
- the master.err file contained trace output (which normally goes to master.trace when the --debug option is passed to MTR). The trace output was present only for some tests.

> #3  <signal handler called>
> #4  0x00000033d3e721b0 in ?? () from /lib64/libc.so.6
> #5  0x0000000000d80cb4 in InList (linkp=0x8f8f8f8f157a3468, cp=0xea0f6f "quit") at dbug.c:1571
> #6  0x0000000000d812ba in _db_keyword_ (cs=0x1e57350, keyword=0xea0f6f "quit", strict=65538) at dbug.c:1821
> #7  0x0000000000d80630 in _db_doprnt_ (format=0xea4e90 "signal_handler: calling my_thread_end()") at dbug.c:1332
> #8  0x000000000070a53a in signal_hand (arg=0x0) at mysqld.cc:2882

Note the invalid linkp argument to InList().

in mysqld.cc:

> 2877        }
> 2878        else
> 2879          while ((error=my_sigwait(&set,&sig)) == EINTR) ;
> 2880        if (cleanup_done)
> 2881        {
> 2882          DBUG_PRINT("quit",("signal_handler: calling my_thread_end()"));
> 2883          my_thread_end();
> 2884          signal_thread_in_use= 0;
> 2885          pthread_exit(0);                          // Safety
> 2886        }

in _db_doprnt_:

> 1327      CODE_STATE *cs;
> 1328      get_code_state_or_return;
> 1329
> 1330      va_start(args,format);
> 1331
> 1332      if (_db_keyword_(cs, cs->u_keyword, 0))
> 1333      {
> 1334        int save_errno=errno;
> 1335        if (!cs->locked)
> 1336          pthread_mutex_lock(&THR_LOCK_dbug);

in _db_keyword_:

> 1816    BOOLEAN _db_keyword_(CODE_STATE *cs, const char *keyword, int strict)
> 1817    {
> 1818      get_code_state_if_not_set_or_return FALSE;
> 1819      strict=strict ? INCLUDE : INCLUDE|MATCHED;
> 1820
> 1821      return DEBUGGING && DoTrace(cs, 1) & DO_TRACE &&
> 1822             InList(cs->stack->keywords, keyword) & strict;
> 1823    }

> (gdb) p keyword
> $4 = 0xea0f6f "quit"
> (gdb) p cs
> $5 = (CODE_STATE *) 0x1e57350
> (gdb) p cs->stack
> $6 = (struct settings *) 0x1e1a620
> (gdb) p cs->stack->keywords
> $7 = (struct link *) 0x0

So, the first argument of InList is NULL, which is probably the cause of the problem.
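
For illustration only, here is a minimal sketch of a keyword-list lookup that tolerates a NULL head. The names `kw_link` and `kw_in_list` are hypothetical and are not taken from dbug.c; this is not the actual InList() implementation. It only shows how a loop guarded by a NULL check behaves: a NULL head is handled as an empty list, while any non-NULL but invalid pointer (such as the linkp=0x8f8f8f8f157a3468 seen in frame #5) would pass the check and be dereferenced.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a debug keyword list node
   (an assumption for this sketch, not dbug.c's struct link). */
struct kw_link {
  struct kw_link *next;
  const char *str;
};

/* Returns 1 if keyword is in the list, 0 otherwise.
   A NULL head is treated as an empty list: the loop body never
   runs, so nothing is dereferenced. A non-NULL garbage pointer
   would NOT be caught by this check. */
static int kw_in_list(const struct kw_link *linkp, const char *keyword)
{
  for (; linkp != NULL; linkp = linkp->next)
  {
    if (strcmp(linkp->str, keyword) == 0)
      return 1;
  }
  return 0;
}
```

With a lookup written this way, calling `kw_in_list(NULL, "quit")` simply returns 0, so a NULL list head on its own would not fault in this sketch.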

How to repeat:
I ran "./mtr --force --suite=backup". All tests passed. At the end, after a long shutdown time, I saw info about warnings in var/log/warnings. When examining master.err I noticed that the server had crashed with signal 11. Then I examined the core with gdb.

I could not repeat this.
[29 Oct 2008 11:01] Sveta Smirnova
Thank you for the report.

On which machine did you see this?
[29 Oct 2008 11:04] Rafal Somla
I saw the crash when running tests on one of the rpl team servers, which is a 64-bit Ubuntu Linux machine.
[28 Nov 2008 7:47] Sveta Smirnova
Thank you for the feedback.

I cannot repeat it with the current bzr sources, so I am closing the report as "Can't repeat". Feel free to reopen it if you are able to see it again.
[9 Mar 2011 18:51] Andrei Elkin
Well, feedback is there: Bug #58754.
Changing this bug's status to Dup (although it is the historical parent).
[11 Mar 2011 10:57] Andrei Elkin
Reverting it to Can't repeat, as the basis for the dup conclusion is not solid enough. Yes, the top part of the stack starting from _db_doprnt_() is the same, but I can't yet rule out that the callers further up the stack are the culprits.