Bug #58424 mtr ignores failing mysqltest in the presence of expect file for mysqld
Submitted: 23 Nov 2010 18:20 Modified: 20 Dec 2010 3:08
Reporter: Luis Soares Email Updates:
Status: Closed Impact on me:
None 
Category:Tools: MTR / mysql-test-run Severity:S3 (Non-critical)
Version:5.5 OS:Any
Assigned to: Bjørn Munch

[23 Nov 2010 18:20] Luis Soares
Description:
Some tests that crash and restart the server can end up timing out in
PB2 (solaris10 x86[_64] debug max). The most noticeable example is the
test case binlog_index.test which makes heavy use of DBUG_SUICIDE + mtr
ability to bring the server back up. The failure symptoms are the 
following:

binlog.binlog_index 'mix' [ fail ] timeout after 900 seconds
        Test ended at 2010-11-22 18:03:16

Test case timeout after 900 seconds

== /export/home/pb2/test/sb_2-2566369-1290432150.42/mysql-5.5.8-solaris10-x86_64/mysql-test/var/log/binlog_index.log == 
show warnings;
Level	Code	Message
Warning	1377	a problem with deleting master-bin.000001; consider examining correspondence of your binlog index file to the actual binlog files
Error	1377	Fatal error during log purge
reset master;
# crash_purge_before_update_index
flush logs;
SET SESSION debug="+d,crash_purge_before_update_index";
purge binary logs TO 'master-bin.000002';
ERROR HY000: Lost connection to MySQL server during query
SET @index=LOAD_FILE('MYSQLTEST_VARDIR/mysqld.1/data//master-bin.index');
SELECT @index;
@index
master-bin.000001
master-bin.000002
master-bin.000003

# crash_purge_non_critical_after_update_index
flush logs;
SET SESSION debug="+d,crash_purge_non_critical_after_update_index";

 == /export/home/pb2/test/sb_2-2566369-1290432150.42/mysql-5.5.8-solaris10-x86_64/mysql-test/var/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/export/home/pb2/test/sb_2-2566369-1290432150.42/tmp/zhGPHhyrmh/mysqld.1.sock' (146)

 - saving '/export/home/pb2/test/sb_2-2566369-1290432150.42/mysql-5.5.8-solaris10-x86_64/mysql-test/var/log/binlog.binlog_index-mix/' to '/export/home/pb2/test/sb_2-2566369-1290432150.42/mysql-5.5.8-solaris10-x86_64/mysql-test/var/log/binlog.binlog_index-mix/'
 - found 'core' (0/5)

Trying 'dbx' to get a backtrace
sh: dbx: not found

Trying 'gdb' to get a backtrace
sh: gdb: not found

Retrying test binlog.binlog_index, attempt(2/3)...

worker[1] Trying to dump core for [mysqltest - pid: 1327, winpid: 1327, exit: 256]
worker[1] Trying to dump core for [mysqld.1 - pid: 1333, winpid: 1333]

Looking at the logs one finds that:
1. the server is restarted
2. mysqld has executed the offending statement (the one that causes the crash)
3. mysqltest/mtr have not logged the offending event nor the error in the
   test case log

Execution control seems not to be returned to the test driver once the
connection to mysqld is broken, suggesting that there is something
not allowing mysqltest to acknowledge that the connection is no more.

Finally, this happens very sporadically for main.crash_commit_before and not
that sporadically for binlog.binlog_index .

How to repeat:
See PB2 logs: http://pb2.norway.sun.com/?action=archive_download&archive_id=2568354&pretty=please

OR

1. ssh skade12
2. <debug build mysql>
3. cd mysql-test 
4. while [ $? == 0 ]; do perl mtr main.crash_commit_before; done

Wait until the execution hangs.

Suggested fix:
.
[23 Nov 2010 19:00] Bjørn Munch
I was able to reproduce the problem running binlog.binlog_index.

What actually happens in this test is that one of the "purge" commands succeeds in stead of failing with error 2013. And then mtr picks up the crashed mysqltest *before* it sees mysqld. With verbose logging, we see:

---
worker[1] > Started [mysqld.1 - pid: 11319, winpid: 11319]
worker[1] > Got [mysqltest - pid: 11303, winpid: 11303, exit: 256]
worker[1] > Crash was expected, file '/export/home/tmp/bm/mysql-5.5.8-solaris10-i386/mysql-test/var/tmp/mysqld.1.expect' exists
---

Oops! It restarts mysqld and that seems to work since the previous mysqld has now also died. But then mtr has probably forgotten that mysqltest has died and waits for it until timeout.

So the test *should* have failed:

---
mysqltest: At line 130: query 'purge binary logs TO 'master-bin.000006'' succeeded - should have failed with errno 2013...
---

(It varies from run to run which one failes)
[24 Nov 2010 9:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124817

2952 Bjorn Munch	2010-11-24
      Bug #58424 mtr ignores failing mysqltest in the presence of expect file for mysqld
      If mysqltest dies, mtr waits to see if mysqld dies too within 100ms
      But in that case, it should not care about expected crash
      Fix: jump past the code that checks the expect file
[27 Nov 2010 16:51] Paul Dubois
Changes to test suite. No changelog entry needed.
[5 Dec 2010 12:38] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[16 Dec 2010 22:26] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)
[17 Dec 2010 12:49] Bugs System
Pushed into mysql-5.1 5.1.55 (revid:georgi.kodinov@oracle.com-20101217124435-9imm43geck5u55qw) (version source revid:mats.kindahl@oracle.com-20101201193331-1c07sjno2g7m46ix) (merge vers: 5.1.55) (pib:24)