Bug #44823 mysqltest "disconnect" function returns before completing the disconnect
Submitted: 12 May 2009 13:30 Modified: 6 Apr 2011 13:52
Reporter: Philip Stoev Email Updates:
Status: Won't fix Impact on me:
None 
Category:Tools: MTR / mysql-test-run Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Bjørn Munch CPU Architecture:Any

[12 May 2009 13:30] Philip Stoev
Description:
The "disconnect" function as provided by mysqltest is not sufficiently reliable. It returns before the disconnect is complete at the server, meaning that a follow-up SHOW PROCESSLIST query may still show the connection. Those connections cause random test failures.

A reading of the code shows that mysqltest uses mysql_close() to terminate the connection. In order to fortify that, KILL CONNECTION_ID() will be appled as well.

How to repeat:
see bug http://bugs.mysql.com/bug.php?id=41255

Suggested fix:
 Currently, mysqltest uses mysql_close() to terminate 
the connection, and it does not check the exit value of this code. 
Therefore, we can do the following sequence on "disconnect".

1. mysqltest executes "KILL CONNECTION_ID()"
2. server marks the thread as Killed, removes it from PROCESSLIST and then 
returns to the client
3. mysqltest executes mysql_close()
4. test continues

Based on my reading of the code, by the time #2 is done, the connection will 
no longer show up in PROCESSLIST.
[13 May 2009 7:17] Philip Stoev
Suggested patch:

      Bug #44823 mysqltest "disconnect" function returns before completing the disconnect

      Matthias reports that after a mysqltest disconnect command, there is a window of time
      during which the connection continues to exist. This causes issues with SHOW PROCESSLIST

      and I_S queries if they follow the disconnect, even across test boundaries. This has
      caused numerous sporadic test failures in the past.

      Fixed by issuing a KILL CONNECTION_ID() for each explicit "disconnect", and for all
      connections at the end of the test. Examination of the code in the server reveals that
      by the time KILL returns, the connection is provably gone and will not show up in
      SHOW PROCESSLIST or I_S.PROCESSLIST.

=== modified file 'client/mysqltest.cc'
--- client/mysqltest.cc 2009-04-28 18:26:31 +0000
+++ client/mysqltest.cc 2009-05-13 07:12:58 +0000
@@ -1070,6 +1070,14 @@
     if (next_con->stmt)
       mysql_stmt_close(next_con->stmt);
     next_con->stmt= 0;
+
+    /*
+      We issue the extra KILL in order to make sure that all connections
+      are truly gone before mysqltest exits and the next test starts.
+    */
+    if (next_con->name_len > 0)
+      mysql_query(&next_con->mysql, "KILL CONNECTION_ID()");
+
     mysql_close(&next_con->mysql);
     if (next_con->util_mysql)
       mysql_close(next_con->util_mysql);
@@ -1154,6 +1162,7 @@

 static void cleanup_and_exit(int exit_code)
 {
+
   free_used_memory();
   my_end(my_end_arg);

@@ -4597,6 +4606,12 @@
     mysql_stmt_close(con->stmt);
   con->stmt= 0;

+  /*
+    We issue the extra KILL in order to make sure the connection is truly
+    gone and removed from SHOW PROCESSLIST by the time we continue
+    with the execution of the script
+  */
+  mysql_query(&con->mysql, "KILL CONNECTION_ID()");
   mysql_close(&con->mysql);

   if (con->util_mysql)
[13 May 2009 10:13] Bjørn Munch
Looks reasonable to me.

Your diff shows a spurious empty line at the start of function cleanup_and_exit(); this should be removed.

Otherwise, OK to push.
[13 May 2009 12:31] Magnus Blåudd
I find it weird that any "disconnect" should send a query that does "KILL CONNECTION_ID()" to kill itself before disconnecting from the server.

If a test need to do "SHOW PROCESSLIST" I suggest that it first wait a couple of times(with a wait loop) for the MySQL Server to cleanup any closed connections first. Alternatively the subsequent test should perform more selective SHOW.
[13 May 2009 13:12] Philip Stoev
Magnus,

The semantics of "disconnect" should be such that the connection is truly gone before disconnect returns. simply calling mysql_close() does not provide this guarantee, therefore I came up with the KILL CONNECTION_ID() hack. I understand it looks weird, however I followed the code path through the server and it appears that by the time KILL returns, the connection is guaranteed to be truly gone.

I agree that future test cases should be written defensively with restrictive INFORMATION_SCHEMA queries, however fixing all exising test cases will be too much of a pain. Furthermore, in order to have a restrictive INFORMATION_SCHEMA query, you need to know the CONNECTION_ID() of all legitimate threads so that you can filter out all the illegitimate ones, etc.
[6 Apr 2010 8:16] Philip Stoev
Bjorn, I would still prefer that my original patch with KILL is pushed, or some alternative method to fortify --disconnect is used.

I do agree that tests should be written defensively, but it would still be nice to know that --disconnect does indeed disconnect before returning, and this is not currently the case.
[6 Apr 2011 13:52] Bjørn Munch
Have tried the suggested fix but found it not to be working, though the example test failed more rarely. Seems like there is no good way to *guarantee* the connection being closed immediately.