Bug #48303 impossible to analyze test failure "Can't connect to MySQL server"
Submitted: 26 Oct 2009 11:43 Modified: 10 Feb 2011 11:58
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:Tools: MTR / mysql-test-run Severity:S7 (Test Cases)
Version:5.1+ OS:Any
Assigned to: Bjørn Munch CPU Architecture:Any
Tags: mtr

[26 Oct 2009 11:43] Sven Sandberg
Description:
We sometimes get the following sporadic failure with no useful debug information:

==== begin test output ====
mysqltest: In included file "./include/master-slave.inc": At line 5: query 'connect  master1,127.0.0.1,root,,test,$MASTER_MYPORT,' failed: 2003: Can't connect to MySQL server on '127.0.0.1' (125)
Failed to read from '/export/home/pb2/test/sb_0-875500-1256328658.52/mysql-5.1.41-solaris10-sparc-test/mysql-test/var-rpl_binlog_row/log/rpl_commit_after_flush.log', errno: 22
==== end test output ====

There are several problems with the error message:

 (1) On the line starting "Failed to read from...":

    (1a) it does not say *who* printed the message (I'm guessing mysqltest)

    (1b) It prints the errno, which can only be interpreted after logging in to the machine and looking in the system headers. It's not easy to login to pb2 machines.

    (1c) It does not say what the program was trying to do when the failure happened.

 (2) On the line starting "mysqltest: In included file...":

    (2a) It does not interpolate variables, so we don't know the value of $MASTER_MYPORT.

    (2b) It does not print the full stack of included files, so it is in general not possible to know where the failure happened. (In this particular case it's possible to know where the failure happened because master-slave.inc is only included in one place, but this is needed for other failures.)

 (3) There is no information that can be used to fix the bug. The following would help:

   (3a) Output from ps -ax
   (3b) Output from df -h
   (3c) Output from free
   (3d) Output from mount
   (3e) Output from netstat -tanp

How to repeat:
https://central.sun.net/http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&...

https://central.sun.net/http://pb2.norway.sun.com/?template=mysql_show_test_failure&test_f...

Suggested fix:
(1a) Every time mysqltest prints something, prefix it with "mysqltest: ".

(1b) Every time mysqltest prints an errno, use perror().

(1c) Every time mysqltest prints an error, explain what mysqltes was trying to do when the error happened.

(2a) Every time mysqltest prints a line from the test file where variables have been interpolated, print the line after interpolation.

(2b) When mysqltest prints the current file and line number, print the full stack.

(3a) - (3e) Execute these commands after every failure in mtr.
[26 Oct 2009 12:00] Sven Sandberg
BUG#48304 is waiting on this bug to be fixed.
[26 Oct 2009 12:39] Bjørn Munch
This error most likely means that the server has died, and the suggestions here would not help much in determining why.

A more fruitful approach IMHO is to ignore this particular mysqltest failure and instead (or in addition) extract relevant log from the server. With the fix of Bug #46007, MTR should in theory do this but for some reason it reacts to the died mysqltest instead, even though the server should have died first.

I will look into having MTR extracting server log also in this case; when mysqltest reports failure to connect or lost connection.
[26 Oct 2009 16:52] Luis Soares
In addition to what Bjorn has pointed out, I think that on each
failure, MTR should provide some information on the
context/environment. So I agree also with Sven's suggestions.

I have observed some failures that required me to search the
build & run logs, just to conclude that memory was exhausted, no
more forks were allowed, disk was full, etc... A common message
on some random failures is:

"fork failed sleep 1 second and redo: Not enough space at
lib/My/SafeProcess/Base.pm line 52."

Some other are not that explicit and/or appear in totally
unrelated places in the build & run log file, w.r.t. failing
test. This helps the developer to rule out environment faults
that may be causing the test to fail.

So as I see it:

  1. MTR should provide information on environment context when a
     failure happens.

  2. MTR should provide relevant mysql related logs.

  3. MTR should provide debug information (stack traces and so
     forth...).

I would say MTR does already a very nice job on Items #2 and #3,
although #2 could be improved in some situations (the ones
mentioned in this bug report for instance, and some more - see
below). Item #1, I can't remember MTR providing feedback on
this (sorry if I forgot or am missing anything about this).

Closing my argument, I would like to add another failure that
provides very limited feedback: 

http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_failure_id=2406230
[28 Oct 2009 9:43] Bjørn Munch
I filed bug #48367 for getting server logs when server dies, see my previous comment.
[10 Feb 2011 11:58] Bjørn Munch
I beleive other bug fixes has taken care of what could be done here. Sometimes the information about why something failed simply does not exist.