Bug #38176 rpl_sporadic_master times out sporadically on pushbuild
Submitted: 16 Jul 2008 15:34 Modified: 2 Feb 2009 17:10
Reporter: Sven Sandberg Email Updates:
Status: Can't repeat Impact on me:
None 
Category:Tests: Replication Severity:S7 (Test Cases)
Version:5.1, 6.0 OS:Any (powermacg5)
Assigned to: Alfranio Junior CPU Architecture:Any
Tags: powermacg5, pushbuild, rpl_events, rpl_sporadic_master, sporadic, test failure, timeout

[16 Jul 2008 15:34] Sven Sandberg
Description:
rpl_sporadic_master fails sporadically in pushbuild with the following message:

rpl.rpl_sporadic_master        [ fail ]  timeout

Stopping all servers...
Saving datadirs...
 - saving '/Users/henry/pb/bzr_mysql-5.1-rpl/4/mysql-5.1.26-rc-pb4/mysql-test/var-n_mix-5/mysqld.1'
 - saving '/Users/henry/pb/bzr_mysql-5.1-rpl/4/mysql-5.1.26-rc-pb4/mysql-test/var-n_mix-5/mysqld.2'
Resuming tests...

Retrying, attempt(1/3)...

Restarting all servers

How to repeat:
https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-5.1-rpl&order=4 powermacg5/n_mix

xref links:
http://tinyurl.com/6376p3
http://tinyurl.com/66xsll
http://tinyurl.com/6pp86c

All known failures are on powermacg5. I believe that host is known to be
slow, so one may suspect that the test case merely took a long time.

Suggested fix:
Timeout errors are almost impossible to debug, since no debug information is given.

I'd suggest augmenting mtr so that it forces a stop of the test case, outputs a result diff, outputs the result of "SHOW PROCESSLIST" on all running servers (that can indicate if the servers are overloaded), and outputs the time that mtr waited before timeout.

See, e.g., BUG#38172, BUG#37714 for other bugs that need this info.
[22 Jul 2008 17:17] Sven Sandberg
Also rpl.rpl_events failed with timeout:

https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-bugteam&order=66 'vm-win2003-32-a' Win32 VS2003 -max-nt/rpl_binlog_row
[1 Oct 2008 12:38] Sven Sandberg
Note that, after BUG#38181 was fixed, the bug has happened again in this location:

https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-rpl&order=52 sles10-ia64-a-2/ps_stm_threadpool

(found in the first xref link above). This failure has more debug info than previous ones:

rpl.rpl_sporadic_master                  [ fail ]  timeout after 15 minutes

Test case timeout after 15 minute(s)

 == /dev/shm/var-ps_stm_threadpool-141/6/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Exit with failure! Call 'exit' in script to return with sucess
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info
5	root	localhost:36391	NULL	Sleep	902	NULL	NULL
6	root	localhost	NULL	Query	0	NULL	SHOW PROCESSLIST

Timeout [timer - pid: 27071, exit: 0] expired for running 'analyze-timeout'
 - saving '/dev/shm/var-ps_stm_threadpool-141/6/log/rpl.rpl_sporadic_master/' to '/dev/shm/var-ps_stm_threadpool-141/log/rpl.rpl_sporadic_master/'

Retrying test, attempt(2/3)...

Mtr executed SHOW PROCESSLIST when the timeout occurred, so it seems that mysqld was executing a SLEEP.