Bug #24415 Instance manager test im_daemon_life_cycle fails randomly
Submitted: 18 Nov 2006 20:26
Modified: 7 Mar 2007 21:57
Reporter: Rafal Somla
Status: Closed
Category: Instance Manager
Severity: S3 (Non-critical)
Version: 5.0.32
OS: Linux (linux (debian))
Assigned to: Alexander Nozdrin
CPU Architecture: Any
Tags: rt_q1_2007

[18 Nov 2006 20:26] Rafal Somla
Description:
Test im_daemon_life_cycle fails randomly with the message:

mysqltest: At line 45: query 'START INSTANCE mysqld2' failed: 2002: Can't connect to local MySQL server through socket '/ext/mysql/bkroot/mysql-5.0/mysql-test/var/tmp/im.sock' (111)

Sometimes this error is preceded by the following warning:

mysql-test-run: WARNING: check_expected_crash_and_restart couldn't find an entry for pid: 24316

This was detected while testing main 5.0.32 tree.

This is a known problem which reappears in many bug reports: BUG#21331, BUG#22379, BUG#19362, BUG#15934, to name a few. All merged fixes are up to version 5.0.30. See also BUG#20294 and BUG#22815.

Note that the error message above is different from the errors reported in the other bug entries.

How to repeat:
mysql-test-run.pl im_daemon_life_cycle
[20 Nov 2006 15:51] Valeriy Kravchuk
Thank you for the problem report. Please send the output of

uname -a

and the exact configure command line (or script) you used to build.
[21 Nov 2006 17:35] Rafal Somla
This might be useful: after running the test, the log file (r/im_daemon_life_cycle.log) looks as follows:

SHOW VARIABLES LIKE 'server_id';
Variable_name	Value
server_id	1
SHOW INSTANCES;
instance_name	status
mysqld1	online
mysqld2	offline
Killing the process...
Sleeping...
Success: the process was restarted.
Error: server does not accept connections after 30 seconds.

--------------------------------------------------------------------
-- Test for BUG#12751
--------------------------------------------------------------------
START INSTANCE mysqld2;
[20 Dec 2006 10:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17202

ChangeSet@1.2359, 2006-12-20 11:13:16+01:00, joerg@trift2. +1 -0
  Fix silly typos in the disabling of "im_daemon_life_cycle" (bug#24415).
[22 Jan 2007 17:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18570

ChangeSet@1.2411, 2007-01-22 20:05:57+03:00, anozdrin@alik. +14 -0
  Patch for IM in scope of working on BUG#24415: Instance manager test
  im_daemon_life_cycle fails randomly.
  
  1. Move IM-angel functionality into a separate file, create Angel class.
  2. Be more verbose;
  3. Fix typo in FLUSH INSTANCES implementation;
  4. Polishing.
[8 Feb 2007 20:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19586

ChangeSet@1.2412, 2007-02-08 23:34:32+03:00, anozdrin@alik.opbmk +5 -0
  Fix for BUG#24415: Instance manager test im_daemon_life_cycle fails randomly.
  
  The cause of im_daemon_life_cycle.imtest random failures was the following
  behaviour of some implementations of LINUX threads: let's suppose that a process
  has several threads (in LINUX threads, there is a separate process for each
  thread). When the main process gets killed, the parent receives SIGCHLD before
  all threads (child processes) die. In other words, the parent receives SIGCHLD
  when its child is not completely dead.
  
  In terms of IM, that means that IM-angel receives SIGCHLD when IM-main is not dead
  and still holds some resources. After receiving SIGCHLD, IM-angel restarts
  IM-main, but IM-main failed to initialize, because previous instance (copy) of
  IM-main still holds server socket (TCP-port).
  
  Another problem here was that IM-angel restarted IM-main only if it was killed
  by signal. If it exited with error, IM-angel thought it's intended / graceful
  shutdown and exited itself.
  
  So, when the second instance of IM-main failed to initialize, IM-angel thought
  it's intended shutdown and quit.
  
  The fix is
    1. to change IM-angel so that it restarts IM-main if it exited with error code;
    2. to change IM-main so that it returns proper exit code in case of failure.
  
  The patch is committed to 5.1, because the bug is not critical.
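
For illustration only, the change in restart policy described above can be sketched as a small supervisor loop. This is not the Instance Manager code (which is C++); the names old_should_restart, should_restart and run_angel are hypothetical, and the sketch relies on Python's subprocess convention that a child killed by a signal is reported with a negative return code.

```python
import subprocess
import sys
import time


def old_should_restart(returncode):
    """Policy before the fix (as described): restart only on death by signal."""
    # subprocess reports "killed by signal N" as the return code -N.
    return returncode < 0


def should_restart(returncode):
    """Policy after the fix (as described): also restart on a non-zero exit."""
    # Only a clean exit (code 0) is treated as an intended shutdown.
    return returncode != 0


def run_angel(cmd, max_restarts=5, delay=0.1):
    """Run cmd under supervision and return the list of observed return codes.

    The child is restarted at most max_restarts times (assumed limit,
    added here only so that the sketch always terminates).
    """
    codes = []
    while True:
        rc = subprocess.call(cmd)
        codes.append(rc)
        if not should_restart(rc) or len(codes) > max_restarts:
            return codes
        time.sleep(delay)  # give the previous child time to release resources


if __name__ == "__main__":
    # A child that always fails with exit code 1: the old policy would give up
    # after the first run, the new one keeps restarting up to the limit.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_angel(failing, max_restarts=2, delay=0.01))
```

With the old policy, a child that fails to initialize and exits with an error code is indistinguishable from a graceful shutdown, which is the situation described in the commit message.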
[19 Feb 2007 18:41] Konstantin Osipov
Approved over email with several comments.
[20 Feb 2007 19:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/20217

ChangeSet@1.2418, 2007-02-20 22:31:50+03:00, anozdrin@alik.opbmk +7 -0
  Fix for BUG#24415: Instance manager test im_daemon_life_cycle fails randomly.
  
  The cause of im_daemon_life_cycle.imtest random failures was the following
  behaviour of some implementations of LINUX threads: let's suppose that a
  process has several threads (in LINUX threads, there is a separate process for
  each thread). When the main process gets killed, the parent receives SIGCHLD
  before all threads (child processes) die. In other words, the parent receives
  SIGCHLD, when its child is not completely dead.
  
  In terms of IM, that means that IM-angel receives SIGCHLD when IM-main is not dead
  and still holds some resources. After receiving SIGCHLD, IM-angel restarts
  IM-main, but IM-main failed to initialize, because previous instance (copy) of
  IM-main still holds server socket (TCP-port).
  
  Another problem here was that IM-angel restarted IM-main only if it was killed
  by signal. If it exited with error, IM-angel thought it's intended / graceful
  shutdown and exited itself.
  
  So, when the second instance of IM-main failed to initialize, IM-angel thought
  it's intended shutdown and quit.
  
  The fix is
    1. to change IM-angel so that it restarts IM-main if it exited with error code;
    2. to change IM-main so that it returns proper exit code in case of failure.
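
As a rough illustration of the failure mode described above (the new IM-main cannot initialize while the previous copy still holds the server socket), the following sketch tries to listen on a TCP port that is already held by another socket. It is not taken from the Instance Manager sources; try_listen and exit_code_for are hypothetical helper names, and two sockets in a single process stand in for the old and the new IM-main.

```python
import errno
import socket


def try_listen(port, host="127.0.0.1"):
    """Try to bind and listen on (host, port).

    Returns (socket, 0) on success or (None, errno) on failure.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return s, 0
    except OSError as e:
        s.close()
        return None, e.errno


def exit_code_for(err):
    """Map an initialization error to a process exit code (0 = success)."""
    return 0 if err == 0 else 1


if __name__ == "__main__":
    # "Old" copy: port 0 asks the OS to pick any free port.
    old, _ = try_listen(0)
    port = old.getsockname()[1]

    # "New" copy started while the old one still holds the port.
    new, err = try_listen(port)
    print("second bind failed:", new is None, errno.errorcode.get(err))
    print("exit code the new copy should return:", exit_code_for(err))

    old.close()
```

The point of the second helper is the one made in item 2 of the fix: a failed initialization has to be reported as a non-zero exit code, otherwise the supervising process cannot tell it apart from a normal shutdown.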
[23 Feb 2007 17:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/20480

ChangeSet@1.2419, 2007-02-23 20:24:32+03:00, anozdrin@alik.opbmk +2 -0
  BUG#24415: im_daemon_life_cycle.imtest fails
  
  Fix timeouts. Only test suite is changed.
[7 Mar 2007 21:57] Konstantin Osipov
Internal. No ChangeLog entry is needed.