Bug #38817 please make mtr analyze crashes better
Submitted: 15 Aug 2008 10:44 Modified: 30 Jan 18:29
Reporter: Sven Sandberg
Status: Closed
Category: Tests    Severity: S2 (Serious)
Version: 4.1+    OS: Any
Assigned to: Magnus Blaudd    Target Version: 5.1
Tags: 51rpl, server crash, pid, stack trace, mysqld, crash, mtr
Triage: D5 (Feature request)

[15 Aug 2008 10:44] Sven Sandberg
Description:
When mysqld crashes in a test, mtr gives very few clues about the error. Any of the
following problems may be caused by a server crash:

 - query X failed: 2013: lost connection to server
 - test case timeout
 - could not sync with master (a call to sync_slave_with_master or sync_with_master
failed)
 - could not open connection 'default' (on test startup)
 - failed to start mysqld
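Deciding when there is "a sign that mysqld may have crashed" could start from the symptoms listed above. A minimal Perl sketch (mtr is written in Perl); the regular expressions only approximate the messages in the list and are assumptions about mtr's wording, and `looks_like_crash` is a hypothetical name.

```perl
use strict;
use warnings;

# Patterns approximating the symptoms listed above. The exact wording of
# mtr's messages is an assumption here, not copied from mtr's source.
my @CRASH_SYMPTOMS = (
    qr/\b2013\b.*lost connection/i,    # client error 2013 (CR_SERVER_LOST)
    qr/test case timeout/i,
    qr/could not sync with master/i,
    qr/could not open connection/i,
    qr/failed to start mysqld/i,
);

# Hypothetical helper: true if an mtr failure message suggests that
# mysqld may have crashed, i.e. the pid and core checks are worth running.
sub looks_like_crash {
    my ($message) = @_;
    return scalar grep { $message =~ $_ } @CRASH_SYMPTOMS;
}
```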

The following three suggestions would simplify the analysis of crashes significantly:

 (1) When there is a sign that mysqld may have crashed, check whether its pid is still
running, and report whether it is running or not.

 (2) When there is a sign that mysqld may have crashed, look for a core file and, if one
is found, print a stack trace to stderr.

 (3) For *all* errors, produce a result diff, so that any unexpected results from
previous queries are included.
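Suggestion (1) needs little more than the server's pid file and a signal-0 probe. A minimal Perl sketch, assuming mtr knows the path of the pid file mysqld writes; `read_pid_file` and `mysqld_pid_status` are hypothetical names.

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# Hypothetical helper: read the pid that mysqld wrote to its pid file.
# Returns undef if the file is missing or does not start with a number.
sub read_pid_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return undef;
    my $line = <$fh>;
    close($fh);
    return undef unless defined $line && $line =~ /^\s*(\d+)/;
    return $1;
}

# Hypothetical helper: "running" or "not running" for the given pid.
# kill with signal 0 sends nothing; it only checks that the process
# exists and that we may signal it.
sub mysqld_pid_status {
    my ($pid) = @_;
    return "not running" unless defined $pid && $pid =~ /^\d+$/ && $pid > 0;
    return "running" if kill(0, $pid);
    # EPERM: the process exists but is owned by another user.
    return $! == EPERM ? "running" : "not running";
}
```

For example, after a "2013: lost connection" error mtr could print `mysqld_pid_status(read_pid_file($pid_file))` to stderr, where `$pid_file` is a hypothetical variable holding the pid file path. One caveat: if mysqld is mtr's own child, a server that has died but has not been reaped yet is a zombie and still passes the signal-0 check.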

How to repeat:
See, e.g., BUG#15399.

Cf. BUG#38181.
[15 Aug 2008 11:10] Vladislav Vaintroub
Additional nice-to-have:
dumping the call stacks on hangs would be very helpful for analyzing deadlocks
(e.g. with gdb, dbx or cdb).
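For hangs there is no core file, but gdb can attach to the live process. A minimal Perl sketch, assuming gdb is on the PATH and mtr is allowed to attach to the server process (some systems restrict ptrace); `gdb_attach_argv` and `dump_live_stacks` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: gdb arguments that attach to a running (possibly
# deadlocked) mysqld, print a backtrace of every thread, then detach so
# the server keeps running. --batch makes gdb exit when the commands end.
sub gdb_attach_argv {
    my ($binary, $pid) = @_;
    return ('gdb', '-q', '--batch', '-p', $pid,
            '--eval-command=thread apply all bt',
            '--eval-command=detach',
            $binary);
}

# Hypothetical driver: run gdb and copy its output to STDERR.
sub dump_live_stacks {
    my ($binary, $pid) = @_;
    print STDERR "== Stack traces of mysqld (pid $pid) ==\n";
    open(my $gdb, '-|', gdb_attach_argv($binary, $pid))
        or do { warn "cannot run gdb: $!\n"; return; };
    while (my $line = <$gdb>) {
        print STDERR $line;
    }
    close($gdb);
}
```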
[22 Aug 2008 13:58] Sven Sandberg
If you need to prioritize (I know producing a result diff needs a little refactoring),
please start with printing a stack trace from the core file. That's by far the most
important thing to add to mtr right now.

Please make it print a stack trace for each running thread (I think that's something like
"gdb -q --batch --eval-command='thread apply all bt' /path/to/mysqld /path/to/core"). Consider
whether any other debug info would be useful and easy to extract from the core.
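A minimal Perl sketch of the core-file step. It assumes the core is written to a known directory (such as the server's data directory) as `core` or `core.<pid>`; the real name and location depend on the OS and its core-file settings. It takes the mysqld path that mtr used to start the server instead of guessing it. `find_core_files`, `gdb_core_argv` and `report_core_stacks` are hypothetical names.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: core files in $dir, newest first. Only "core" and
# "core.<digits>" are recognised; other naming schemes are not handled.
sub find_core_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return ();
    my @cores = map { File::Spec->catfile($dir, $_) }
                grep { /^core(?:\.\d+)?$/ } readdir($dh);
    closedir($dh);
    return sort { (stat($b))[9] <=> (stat($a))[9] } @cores;
}

# Hypothetical helper: gdb arguments that print a backtrace of every thread
# stored in the core. --batch makes gdb exit instead of waiting for input.
sub gdb_core_argv {
    my ($binary, $core) = @_;
    return ('gdb', '-q', '--batch', '--eval-command=thread apply all bt',
            $binary, $core);
}

# Hypothetical driver: print stack traces for every core found to STDERR
# and return the number of cores.
sub report_core_stacks {
    my ($binary, $dir) = @_;
    my @cores = find_core_files($dir);
    for my $core (@cores) {
        print STDERR "== Stack traces from $core ==\n";
        open(my $gdb, '-|', gdb_core_argv($binary, $core))
            or do { warn "cannot run gdb: $!\n"; next; };
        while (my $line = <$gdb>) {
            print STDERR $line;
        }
        close($gdb);
    }
    return scalar @cores;
}
```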
[6 Sep 2008 8:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53413

2678 Magnus Svensson	2008-09-06
      Bug#38817  please make mtr analyze crashes better
[7 Sep 2008 3:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53438

2688 He Zhenxing	2008-09-07 [merge]
      Auto merge
[4 Oct 2008 9:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55301

2668 Sven Sandberg	2008-10-04
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the
output.
[4 Oct 2008 9:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55302

2668 Sven Sandberg	2008-10-04
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the
output.
[4 Oct 2008 11:48] Philip Stoev
Here are a few notes based on my experience with the Random Query Generator:

* Using if ($line =~ /Core was generated by `(\S+)/) to determine the name of the binary
does not work, because the path printed on that line is truncated. On PushBuild2 the paths
are already long enough to be cut off, so this method of determining the binary is not
reliable. Instead, mtr should use the same binary location that it calculated when the
server was started up.

* For Windows, cdb can be used as follows:

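my $cdb_cmd = "!sym prompts off; !analyze -v; .ecxr; !for_each_frame dv /t;~*k;q";
# $bindir: directory holding the mysqld binary and its symbols; $datadir: directory holding mysqld.dmp
my $cmd = 'cdb -i "' . $bindir . '" -y "' . $bindir
  . ';srv*C:\\cdb_symbols*http://msdl.microsoft.com/download/symbols"'
  . ' -z "' . $datadir . '\\mysqld.dmp" -lines -c "' . $cdb_cmd . '"';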

CDB is part of Debugging Tools for Windows and must be on the $PATH; it is located in
%PROGRAMFILES%\Debugging Tools for Windows\cdb. The machine should also have the Windows
Symbol Package installed.

The cdb command above was provided by Vlad. It prints the stack trace of the crashing
thread and also the stack traces of all other threads. The output is quite verbose and
includes local variables.

* For Solaris's mdb, the command is:

echo '::stack' | mdb $core | c++filt

This is useful for Sun Studio-compiled binaries and requires c++filt to be on the
$PATH. The stack traces are not very pretty. For gcc-compiled binaries, gdb produces far
better results, so both debuggers should be tried.

* On Solaris, pstack can also be used to dump the backtrace of the crashing thread from a
core file. On Linux, pstack only works on running processes, identified by their PID.

* Under Solaris ,
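The notes above can be combined into a small per-platform plan. A minimal Perl sketch, assuming the OS is identified by Perl's `$^O` (`linux`, `solaris`, `MSWin32`) and the debuggers are on the PATH; `shell_quote`, `stack_trace_commands` and `print_all_traces` are hypothetical names. The commands are shell strings because the mdb variant is a pipeline, and Windows is left to the cdb command above.

```perl
use strict;
use warnings;

# Quote a string for a POSIX shell by wrapping it in single quotes.
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

# Hypothetical helper: shell commands that produce stack traces from one
# core file, for an OS name as reported by Perl's $^O. $binary must be the
# path mtr used to start mysqld, not the possibly truncated name from
# gdb's "Core was generated by" line.
sub stack_trace_commands {
    my ($os, $binary, $core) = @_;
    return () if $os eq 'MSWin32';    # use the cdb command instead
    my ($qbin, $qcore) = (shell_quote($binary), shell_quote($core));
    my @cmds = ("gdb -q --batch --eval-command='thread apply all bt' $qbin $qcore");
    if ($os eq 'solaris') {
        # Sun Studio binaries: mdb output demangled by c++filt, then pstack,
        # which on Solaris also reads core files (Linux pstack needs a PID).
        push @cmds, "echo '::stack' | mdb $qcore | c++filt", "pstack $qcore";
    }
    return @cmds;
}

# Hypothetical driver: run every command, since gdb and mdb each work best
# for a different compiler, and copy all output to STDERR.
sub print_all_traces {
    my (@cmds) = @_;
    for my $cmd (@cmds) {
        print STDERR "== $cmd ==\n";
        my $out = `$cmd 2>&1`;
        print STDERR defined $out ? $out : "(could not run: $!)\n";
    }
}
```

Running all commands rather than stopping at the first success follows the note that both debuggers should be tried: gdb may exit cleanly on a Sun Studio binary and still produce a poor trace.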
[8 Oct 2008 12:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55720

2672 Sven Sandberg	2008-10-08
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the
output.
[13 Oct 2008 18:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56137

2680 Sven Sandberg	2008-10-13
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the
output.
[13 Oct 2008 18:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56138

2708 Sven Sandberg	2008-10-13 [merge]
      merge bugfix for BUG#38817 from 5.1-rpl to 6.0-rpl.
[30 Jan 14:28] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463)
(version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers:
6.0.10-alpha) (pib:6)
[30 Jan 16:07] Bugs System
Pushed into 5.1.32 (revid:luis.soares@sun.com-20090129165946-d6jnnfqfokuzr09y) (version
source revid:sven@mysql.com-20081013161430-oshfyr95nwuye830) (merge vers: 5.1.30) (pib:6)
[30 Jan 18:29] Paul DuBois
Test suite changes. No changelog entry needed.
[17 Feb 15:52] Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090217131017-6u8qz1edkjfiobef)
(version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers:
5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 17:40] Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:tomas.ulin@sun.com-20090217134419-5ha6xg4dpedrbmau)
(version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers:
5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 19:16] Bugs System
Pushed into 5.1.32-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090217134216-5699eq74ws4oxa0j)
(version source revid:tomas.ulin@sun.com-20090201210519-vehobc4sy3g9s38e) (merge vers:
5.1.32-ndb-6.2.17) (pib:6)