Bug #38817 please make mtr analyze crashes better
Submitted: 15 Aug 2008 8:44 Modified: 30 Jan 2009 17:29
Reporter: Sven Sandberg
Status: Closed
Category:Tests Severity:S7 (Test Cases)
Version:4.1+ OS:Any
Assigned to: Magnus Blåudd CPU Architecture:Any
Tags: 51rpl, crash, mtr, mysqld, pid, server crash, stack trace

[15 Aug 2008 8:44] Sven Sandberg
Description:
When mysqld crashes in a test, mtr gives very few clues about the error. All the following problems may occur due to a server crash:

 - query X failed: 2013: lost connection to server
 - test case timeout
 - could not sync with master (a call to sync_slave_with_master or sync_with_master failed)
 - could not open connection 'default' (on test startup)
 - failed to start mysqld

The following three suggestions would simplify the analysis of crashes significantly:

 (1) When there is a sign that mysqld may have crashed, check if its pid is still running. Report that it is running, or that it is not running.

 (2) When there is a sign that mysqld may have crashed, look for a core file. Produce a stack trace to stderr.

 (3) For *all* errors, produce a result diff, so that any unexpected results from previous queries are included.
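
Suggestions (1) and (2) could be sketched in shell roughly as follows. This is only an illustration: the function names are hypothetical and not taken from mtr, and the `core*` file name pattern is an assumption, since the real name depends on how the operating system is configured.

```shell
# Hypothetical helpers for illustration; these are not mtr functions.

# Report whether the server process is alive. `kill -0` delivers no signal;
# it only checks that the pid exists and may be signalled by this user, so
# a process owned by another user is reported as "not running" here.
report_pid_state() {
    pid=$1
    if kill -0 "$pid" 2>/dev/null; then
        echo "mysqld (pid $pid) is still running"
    else
        echo "mysqld (pid $pid) is not running"
        return 1
    fi
}

# List candidate core files below a directory. The `core*` name pattern is
# an assumption; the actual name depends on the operating system settings.
find_core_files() {
    find "$1" -type f -name 'core*'
}
```

A caller would pass the pid recorded when the server was started and the server's data directory, for example `report_pid_state "$pid"` followed by `find_core_files "$datadir"`.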

How to repeat:
See, e.g., BUG#15399.

Cf. BUG#38181.
[15 Aug 2008 9:10] Vladislav Vaintroub
Additional nice-to-have:
Dumping the call stack on hangs (e.g. with the help of gdb, dbx or cdb) would be very helpful for analyzing deadlock situations.
[22 Aug 2008 11:58] Sven Sandberg
If you need to prioritize (I know producing a result diff needs a little refactoring), please start with producing a stack trace from the coredump. That's by far the most important thing to add to mtr right now.

Please make it print a stack trace for each running thread (I think that's something like "gdb /path/to/mysqld /path/to/core --eval-command='thread apply all bt' -q"). Think about whether any other debug info would be useful and easy to extract from the core.
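
A non-interactive variant of that gdb call might look like the sketch below. The function name is hypothetical, and `--batch` is added on the assumption that the trace is wanted without gdb stopping at its prompt. The output goes to stderr, as suggested in the description.

```shell
# Hypothetical helper for illustration; this is not an mtr function.
# $1 = path to the mysqld binary, $2 = path to the core file.
print_core_backtrace() {
    binary=$1
    core=$2
    if [ ! -r "$core" ]; then
        echo "core file $core is missing or unreadable" >&2
        return 1
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found; cannot analyze $core" >&2
        return 127
    fi
    # --batch makes gdb exit after running the command instead of waiting
    # at its prompt; -ex is the short form of --eval-command.
    gdb --batch -q -ex 'thread apply all bt' "$binary" "$core" >&2
}
```

Usage would be something like `print_core_backtrace /path/to/mysqld /path/to/core`.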
[6 Sep 2008 6:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53413

2678 Magnus Svensson	2008-09-06
      Bug#38817  please make mtr analyze crashes better
[7 Sep 2008 1:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53438

2688 He Zhenxing	2008-09-07 [merge]
      Auto merge
[4 Oct 2008 7:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55301

2668 Sven Sandberg	2008-10-04
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the output.
[4 Oct 2008 7:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55302

2668 Sven Sandberg	2008-10-04
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the output.
[4 Oct 2008 9:48] Philip Stoev
Here are a few notes based on my experience with the Random Query Generator:

* Using if ($line =~ /Core was generated by `(\S+)/) to determine the name of the binary does not work because gdb truncates the path before printing it on this line. For PushBuild2, the path is already too long, so this method of determining the binary is not reliable. Instead, mtr should use the same binary location that was calculated when the server was started up.

* For Windows, cdb can be used as follows:

my $cdb_cmd = "!sym prompts off; !analyze -v; .ecxr; !for_each_frame dv /t;~*k;q";
'cdb -i "'.$bindir.'" -y "'.$bindir.';srv*C:\\cdb_symbols*http://msdl.microsoft.com/download/symbols"'
.' -z "'.$datadir.'\mysqld.dmp" -lines -c "'.$cdb_cmd.'"';

CDB is part of Debugging Tools for Windows and must be on the $PATH; it is located in %PROGRAMFILES%\Debugging Tools for Windows\cdb. The machine should also have the Windows Symbol Package installed.

The cdb command above is provided by Vlad and prints the stack trace of the crashing thread and also the stack traces of all other threads. The output is pretty verbose, with local variables and such.

* For Solaris's mdb, the command is

echo '::stack' | mdb $core | c++filt

This is useful for Sun Studio-compiled binaries and requires that c++filt is present in the $PATH. The stack traces are not very pretty. For gcc-compiled binaries, gdb produces infinitely better results, so both debuggers should be tried.

* On Solaris, pstack can also be used to dump the backtrace of the crashing thread from a core file. On Linux, pstack only works against processes having a PID.

* Under Solaris ,
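
The Unix-side notes above could be combined into a small selection routine, sketched here. The function name and argument order are hypothetical. The binary path is passed in by the caller, in line with the first note that mtr should reuse the location it computed at server startup. The Windows cdb case is left out, and the printed commands do not quote paths that contain spaces.

```shell
# Hypothetical helper for illustration; this is not an mtr function.
# Print, one per line, the commands worth trying on a core file.
# $1 = OS name as printed by `uname -s`
# $2 = server binary, as recorded when the server was started
# $3 = core file
core_trace_commands() {
    os=$1
    binary=$2
    core=$3
    case "$os" in
        SunOS)
            # Try both debuggers: gdb suits gcc builds, mdb suits
            # Sun Studio builds. pstack also accepts a core file here.
            echo "gdb --batch -q -ex 'thread apply all bt' $binary $core"
            echo "echo '::stack' | mdb $core | c++filt"
            echo "pstack $core"
            ;;
        *)
            # Elsewhere only gdb is listed, because pstack on Linux
            # needs a live pid rather than a core file.
            echo "gdb --batch -q -ex 'thread apply all bt' $binary $core"
            ;;
    esac
}
```

For example, `core_trace_commands "$(uname -s)" /path/to/mysqld /path/to/core` prints the candidate commands for the current host.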
[8 Oct 2008 10:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55720

2672 Sven Sandberg	2008-10-08
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the output.
[13 Oct 2008 16:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56137

2680 Sven Sandberg	2008-10-13
      BUG#38817: please make mtr analyze crashes better
      Post-push fixes making it work on pushbuild's valgrind host, and clarifying the output.
[13 Oct 2008 16:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56138

2708 Sven Sandberg	2008-10-13 [merge]
      merge bugfix for BUG#38817 from 5.1-rpl to 6.0-rpl.
[30 Jan 2009 13:28] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)
[30 Jan 2009 15:07] Bugs System
Pushed into 5.1.32 (revid:luis.soares@sun.com-20090129165946-d6jnnfqfokuzr09y) (version source revid:sven@mysql.com-20081013161430-oshfyr95nwuye830) (merge vers: 5.1.30) (pib:6)
[30 Jan 2009 17:29] Paul DuBois
Test suite changes. No changelog entry needed.
[17 Feb 2009 14:52] Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090217131017-6u8qz1edkjfiobef) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 2009 16:40] Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:tomas.ulin@sun.com-20090217134419-5ha6xg4dpedrbmau) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 2009 18:16] Bugs System
Pushed into 5.1.32-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090217134216-5699eq74ws4oxa0j) (version source revid:tomas.ulin@sun.com-20090201210519-vehobc4sy3g9s38e) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)