MySQL Bugs: #44823: mysqltest "disconnect" function returns before completing the disconnect

Bug #44823	mysqltest "disconnect" function returns before completing the disconnect
Submitted:	12 May 2009 13:30	Modified:	6 Apr 2011 13:52
Reporter:	Philip Stoev	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	Tools: MTR / mysql-test-run	Severity:	S3 (Non-critical)
Version:		OS:	Any
Assigned to:	Bjørn Munch	CPU Architecture:	Any

Description:
The "disconnect" function as provided by mysqltest is not sufficiently reliable. It returns before the disconnect is complete at the server, meaning that a follow-up SHOW PROCESSLIST query may still show the connection. Those connections cause random test failures.

A reading of the code shows that mysqltest uses mysql_close() to terminate the connection. In order to fortify that, KILL CONNECTION_ID() will be appled as well.

How to repeat:
see bug http://bugs.mysql.com/bug.php?id=41255

Suggested fix:
 Currently, mysqltest uses mysql_close() to terminate 
the connection, and it does not check the exit value of this code. 
Therefore, we can do the following sequence on "disconnect".

1. mysqltest executes "KILL CONNECTION_ID()"
2. server marks the thread as Killed, removes it from PROCESSLIST and then 
returns to the client
3. mysqltest executes mysql_close()
4. test continues

Based on my reading of the code, by the time #2 is done, the connection will 
no longer show up in PROCESSLIST.

Suggested patch:

      Bug #44823 mysqltest "disconnect" function returns before completing the disconnect

      Matthias reports that after a mysqltest disconnect command, there is a window of time
      during which the connection continues to exist. This causes issues with SHOW PROCESSLIST

      and I_S queries if they follow the disconnect, even across test boundaries. This has
      caused numerous sporadic test failures in the past.

      Fixed by issuing a KILL CONNECTION_ID() for each explicit "disconnect", and for all
      connections at the end of the test. Examination of the code in the server reveals that
      by the time KILL returns, the connection is provably gone and will not show up in
      SHOW PROCESSLIST or I_S.PROCESSLIST.

=== modified file 'client/mysqltest.cc'
--- client/mysqltest.cc 2009-04-28 18:26:31 +0000
+++ client/mysqltest.cc 2009-05-13 07:12:58 +0000
@@ -1070,6 +1070,14 @@
     if (next_con->stmt)
       mysql_stmt_close(next_con->stmt);
     next_con->stmt= 0;
+
+    /*
+      We issue the extra KILL in order to make sure that all connections
+      are truly gone before mysqltest exits and the next test starts.
+    */
+    if (next_con->name_len > 0)
+      mysql_query(&next_con->mysql, "KILL CONNECTION_ID()");
+
     mysql_close(&next_con->mysql);
     if (next_con->util_mysql)
       mysql_close(next_con->util_mysql);
@@ -1154,6 +1162,7 @@

 static void cleanup_and_exit(int exit_code)
 {
+
   free_used_memory();
   my_end(my_end_arg);

@@ -4597,6 +4606,12 @@
     mysql_stmt_close(con->stmt);
   con->stmt= 0;

+  /*
+    We issue the extra KILL in order to make sure the connection is truly
+    gone and removed from SHOW PROCESSLIST by the time we continue
+    with the execution of the script
+  */
+  mysql_query(&con->mysql, "KILL CONNECTION_ID()");
   mysql_close(&con->mysql);

   if (con->util_mysql)

Looks reasonable to me.

Your diff shows a spurious empty line at the start of function cleanup_and_exit(); this should be removed.

Otherwise, OK to push.

I find it weird that any "disconnect" should send a query that does "KILL CONNECTION_ID()" to kill itself before disconnecting from the server.

If a test need to do "SHOW PROCESSLIST" I suggest that it first wait a couple of times(with a wait loop) for the MySQL Server to cleanup any closed connections first. Alternatively the subsequent test should perform more selective SHOW.

Magnus,

The semantics of "disconnect" should be such that the connection is truly gone before disconnect returns. simply calling mysql_close() does not provide this guarantee, therefore I came up with the KILL CONNECTION_ID() hack. I understand it looks weird, however I followed the code path through the server and it appears that by the time KILL returns, the connection is guaranteed to be truly gone.

I agree that future test cases should be written defensively with restrictive INFORMATION_SCHEMA queries, however fixing all exising test cases will be too much of a pain. Furthermore, in order to have a restrictive INFORMATION_SCHEMA query, you need to know the CONNECTION_ID() of all legitimate threads so that you can filter out all the illegitimate ones, etc.

Bjorn, I would still prefer that my original patch with KILL is pushed, or some alternative method to fortify --disconnect is used.

I do agree that tests should be written defensively, but it would still be nice to know that --disconnect does indeed disconnect before returning, and this is not currently the case.

Have tried the suggested fix but found it not to be working, though the example test failed more rarely. Seems like there is no good way to *guarantee* the connection being closed immediately.