Bug #37780 Make KILL reliable (main.kill fails randomly)
Submitted: 1 Jul 2008 17:10
Modified: 12 Dec 2012 21:39
Reporter: Alexander Nozdrin
Status: Closed
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 6.0-BK, 5.4
OS: Any
Assigned to: Assigned Account
CPU Architecture: Any
Tags: disabled, pushbuild, sporadic, test failure, timeout

[1 Jul 2008 17:10] Alexander Nozdrin
Description:
main.kill                      [ fail ]  timeout

How to repeat:
https://intranet.mysql.com/secure/pushbuild/xref.pl?testname=main.kill
[12 Sep 2008 9:27] John Embretsen
This test times out several times per day in Pushbuild. This results in everyone practically ignoring results from this test, and running it becomes close to useless. Please either disable the test or fix as soon as possible.
[2 Dec 2008 18:03] Sven Sandberg
mtr gives some more debug info in 6.0-rpl: the last few queries from the query log, and the output from "SHOW PROCESSLIST":

main.kill                                [ fail ]  timeout after 900 minutes

Test case timeout after 900 seconds

== /dev/shm/var-ps_stm_threadpool-151/log/kill.log == 
# Switching to connection 'default'
kill query ID;
# Switching to connection 'ddl'
ERROR 70100: Query execution was interrupted
# Two kinds of simple ALTER
alter table t1 rename to t2;
# Switching to connection 'default'
kill query ID;
# Switching to connection 'ddl'
ERROR 70100: Query execution was interrupted
alter table t1 disable keys;
# Switching to connection 'default'
kill query ID;
# Switching to connection 'ddl'
ERROR 70100: Query execution was interrupted
# Fast ALTER
alter table t1 alter column i set default 100;
# Switching to connection 'default'
kill query ID;
# Switching to connection 'ddl'

 == /dev/shm/var-ps_stm_threadpool-151/tmp/analyze-timeout-mysqld.1.err ==
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info
245	root	localhost	NULL	Query	0	NULL	SHOW PROCESSLIST

 - saving '/dev/shm/var-ps_stm_threadpool-151/log/main.kill/' to '/dev/shm/var-ps_stm_threadpool-151/log/main.kill/'

Retrying test, attempt(2/3)...
[3 Dec 2008 9:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60474

2718 Horst Hunger	2008-12-03
      Temporary fix for Bug#37780: A short analysis of the test shows that the sleeps need to be replaced. Because "In most cases, it might take some time for the thread to die, because the kill flag is checked only at specific intervals" (user manual), the test has to check carefully for that event and wait for it.
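The idea of replacing fixed sleeps with waiting for the actual event can be illustrated with a minimal Python sketch. It is not the mysqltest implementation (the test itself uses include/wait_condition.inc and, later, Debug Sync points). The helper name wait_for and the worker thread are hypothetical stand-ins for a server thread that needs some time to notice its kill flag.

```python
import threading
import time


def wait_for(condition, timeout=30.0, interval=0.1):
    """Poll condition() until it is true; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Hypothetical stand-in for a server thread that exits once it sees its kill flag.
killed = threading.Event()
worker = threading.Thread(target=killed.wait)
worker.start()

killed.set()  # analogous to issuing KILL

# Instead of sleeping for a fixed time and hoping the thread has died,
# wait until it is really gone (or give up after the timeout).
thread_gone = wait_for(lambda: not worker.is_alive(), timeout=5.0, interval=0.01)
worker.join()
```

A fixed sleep is either too short on a loaded machine or wastes time on a fast one. Polling with a timeout avoids both, at the cost of a small polling delay.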
[17 Dec 2008 14:10] Sven Sandberg
In 6.0-rpl, mtr has recently been improved so that it prints the last few lines from the query log when timeouts happen. This means we have better debug info for this bug now. See this xref: http://tinyurl.com/6x9ay5
[27 Dec 2008 14:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62361

2809 Ingo Struewing	2008-12-27
      Bug#37780 - main.kill fails randomly
      
      The test case main.kill did not work reliably.
      
      The following problems have been identified:
      
      1. A kill signal could get lost if it arrived shortly before a thread
      started reading from the client connection.
      
      2. A kill signal could get lost if it arrived shortly before a thread
      started waiting on a condition variable.
      
      3. The Debug Sync Facility implementation had flaws that prevented safe
      synchronization in some cases.
      
      These problems have been solved as follows. Please see also added code
      comments for more details.
      
      1. There is no safe way to detect when a thread enters the blocking
      state of a read(2) or recv(2) system call, where it can be interrupted
      by a signal. Hence it is not possible to wait for the right moment to
      send a kill signal. To be safe, we need to close the connection. This
      has the disadvantage that we cannot send a status or error message.
      This patch tries to close only the read direction of a socket, which
      avoids this problem but still terminates a read(2) or recv(2). If it
      is not possible to keep the write direction open, we close the
      connection completely to be safe.
      
      2. Before waiting on a condition variable, we register it together with
      a synchronizing mutex in THD::mysys_var. After this, we need to test
      THD::killed again. In some places we tested it only in a loop
      condition before the registration. When THD::killed had been set between
      this test and the registration, we entered waiting without noticing the
      killed flag.
      
      3a. The string value for the DEBUG_SYNC system variable received '\0'
      terminators. It could not be re-used. This was a problem for setting
      sync points in stored procedures/functions. It worked only with the
      first execution of the procedure/function. Using a user variable instead
      of a string literal did not work (e.g. SET DEBUG_SYNC= @a). Fixed by a
      correct retrieval of the set value and ensuring a copy of the string.
      
      3b. A Debug Sync point can time out and it can be killed. In both cases it
      sets the THD::killed flag. When it times out, it reports an error. When
      it is killed, it does not report an error. When it does report an error,
      it must return a TRUE value, so that the calling function does not add
      send_ok(). The detection of time out was based on the killed flag. This
      failed when the sync point was killed. Fixed by making the return value
      depend on the error reporting state of the thread and not on the
      killed flag.
      
      In addition to the above, a re-write of the main.kill test case has been
      done. Most sleeps have been replaced by Debug Sync Facility
      synchronization. The test case run time decreased from over 30 to below
      three seconds. A couple of sync points have been added to the server
      code. The declarations have been moved from mysql_priv.h to the new file
      debug_sync.h. This was required to place sync points in the metadata
      locking code.
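The lost-kill window described in item 2 of the commit message above can be sketched in Python. This is a simplified model, not the server code: Session, current_cond, wait_for_event and the hook parameter are hypothetical names, with Session standing in for THD and current_cond for the registration in THD::mysys_var. The hook forces a kill to arrive exactly between the loop-condition test and the registration, which is the interleaving the commit message describes.

```python
import threading


class Session:
    """Hypothetical stand-in for a THD: a kill flag plus a registered wait."""

    def __init__(self):
        self.killed = False
        self.lock = threading.Lock()  # protects current_cond
        self.current_cond = None      # condition the session waits on, if any

    def kill(self):
        """Set the flag, then wake the session if it has registered a wait."""
        self.killed = True
        with self.lock:
            cond = self.current_cond
        if cond is not None:
            with cond:
                cond.notify_all()


def wait_for_event(session, cond, predicate, hook=None):
    """Return True once predicate() holds, or False if the session is killed."""
    with cond:
        while not predicate():
            if session.killed:           # test in the loop, before registration
                return False
            if hook is not None:
                hook()                   # simulate a KILL arriving right here
            with session.lock:
                session.current_cond = cond   # registration
            try:
                if session.killed:       # re-check after registration
                    return False
                cond.wait()
            finally:
                with session.lock:
                    session.current_cond = None
    return True


session = Session()
cond = threading.Condition()
# The kill arrives after the first test but before the registration.
result = wait_for_event(session, cond, lambda: False, hook=session.kill)
```

Without the second test of session.killed, this call would block in cond.wait() indefinitely: the killer found nothing registered, so it notified nobody, and the waiter had already passed its only check of the flag.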
[29 Dec 2008 10:25] Ingo Strüwing
The patch does not work on Windows. A modified patch will follow.
[11 Jan 2009 18:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62944

2749 Ingo Struewing	2009-01-11
      Bug#37780 - main.kill fails randomly
      
      The test case main.kill did not work reliably.
      
      The following problems have been identified:
      
      1. A kill signal could get lost if it arrived shortly before a thread
      started reading from the client connection.
      
      2. A kill signal could get lost if it arrived shortly before a thread
      started waiting on a condition variable.
      
      3. The Debug Sync Facility implementation had flaws that prevented safe
      synchronization in some cases.
      
      These problems have been solved as follows. Please see also added code
      comments for more details.
      
      1. There is no safe way to detect when a thread enters the blocking
      state of a read(2) or recv(2) system call, where it can be interrupted
      by a signal. Hence it is not possible to wait for the right moment to
      send a kill signal. To be safe, we need to close the connection. This
      has the disadvantage that we cannot send a status or error message.
      This patch tries to close only the read direction of a socket, which
      avoids this problem but still terminates a read(2) or recv(2). If it
      is not possible to keep the write direction open, we close the
      connection completely to be safe.
      
      2. Before waiting on a condition variable, we register it together with
      a synchronizing mutex in THD::mysys_var. After this, we need to test
      THD::killed again. In some places we tested it only in a loop
      condition before the registration. When THD::killed had been set between
      this test and the registration, we entered waiting without noticing the
      killed flag.
      
      3a. The string value for the DEBUG_SYNC system variable received '\0'
      terminators. It could not be re-used. This was a problem for setting
      sync points in stored procedures/functions. It worked only with the
      first execution of the procedure/function. Using a user variable instead
      of a string literal did not work (e.g. SET DEBUG_SYNC= @a). Fixed by a
      correct retrieval of the set value and ensuring a copy of the string.
      
      3b. A Debug Sync point can time out and it can be killed. In both cases it
      sets the THD::killed flag. When it times out, it reports an error. When
      it is killed, it does not report an error. When it does report an error,
      it must return a TRUE value, so that the calling function does not add
      send_ok(). The detection of time out was based on the killed flag. This
      failed when the sync point was killed. Fixed by making the return value
      depend on the error reporting state of the thread and not on the
      killed flag.
      
      In addition to the above, a re-write of the main.kill test case has been
      done. Most sleeps have been replaced by Debug Sync Facility
      synchronization. The test case run time decreased from over 30 to below
      three seconds. A couple of sync points have been added to the server
      code. The declarations have been moved from mysql_priv.h to the new file
      debug_sync.h. This was required to place sync points in the metadata
      locking code.
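Closing only the read direction, as item 1 of the commit message above proposes, can be illustrated with a small Python sketch. It is not the server patch: it uses a local socket pair, and the variable names are hypothetical. It assumes Linux-like socket semantics, where shutting down the read direction makes a pending or subsequent recv() return end-of-stream. Other platforms may behave differently.

```python
import socket
import threading

# server_side plays the server's end of a client connection.
server_side, client_side = socket.socketpair()
received = []


def connection_thread():
    # Waits for the next client command, like a blocking recv(2).
    received.append(server_side.recv(1024))


t = threading.Thread(target=connection_thread, daemon=True)
t.start()

# "KILL": shut down only the read direction. Whether the thread is already
# blocked in recv() or only about to enter it, recv() returns end-of-stream,
# so there is no need to find the right moment.
server_side.shutdown(socket.SHUT_RD)
t.join(timeout=5.0)

# The write direction is still open, so an error message can still be sent.
server_side.sendall(b"ERROR 70100: Query execution was interrupted")
reply = client_side.recv(1024)

server_side.close()
client_side.close()
```

Closing the connection completely would also terminate the read, but then the final sendall() would no longer be possible, which is the disadvantage the commit message mentions.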
[15 Jan 2009 6:38] Bugs System
Pushed into 5.1.31 (revid:joro@sun.com-20090115053147-tx1oapthnzgvs1ro) (version source revid:azundris@mysql.com-20081230114838-cn52tu180wcrvh0h) (merge vers: 5.1.31) (pib:6)
[19 Jan 2009 11:28] Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090115073240-1wanl85vlvw2she1) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 13:06] Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 16:12] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[20 Jan 2009 18:58] Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[2 Feb 2009 8:28] Ingo Strüwing
The proposed close of the read direction is to be investigated in more depth. At the moment the bug doesn't have sufficient priority to do this. Hence back to 'verified' for now.
[13 Feb 2009 17:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66266

2724 Alexander Nozdrin	2009-02-13
      Disable WL#3726-part of main.kill test case in order to avoid random failures.
      Noted in Bug#37780.
      modified:
        mysql-test/r/kill.result
        mysql-test/t/kill.test

=== modified file 'mysql-test/r/kill.result'
--- a/mysql-test/r/kill.result	2008-08-18 05:43:50 +0000
+++ b/mysql-test/r/kill.result	2009-02-13 17:00:42 +0000
@@ -138,107 +138,4 @@ KILL CONNECTION_ID();
 # of close of the connection socket
 SELECT 1;
 Got one of the listed errors
-#
-# Additional test for WL#3726 "DDL locking for all metadata objects"
-# Check that DDL and DML statements waiting for metadata locks can
-# be killed. Note that we don't cover all situations here since it
-# can be tricky to write test case for some of them (e.g. REPAIR or
-# ALTER and other statements under LOCK TABLES).
-#
-drop tables if exists t1, t2, t3;
-create table t1 (i int primary key);
-# Test for RENAME TABLE
-# Switching to connection 'blocker'
-lock table t1 read;
-# Switching to connection 'ddl'
-rename table t1 to t2;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-# Test for DROP TABLE
-drop table t1;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-# Test for CREATE TRIGGER
-create trigger t1_bi before insert on t1 for each row set @a:=1;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-#
-# Tests for various kinds of ALTER TABLE
-#
-# Full-blown ALTER which should copy table
-alter table t1 add column j int;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-# Two kinds of simple ALTER
-alter table t1 rename to t2;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-alter table t1 disable keys;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-# Fast ALTER
-alter table t1 alter column i set default 100;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-# Special case which is triggered only for MERGE tables.
-# Switching to connection 'blocker'
-unlock tables;
-create table t2 (i int primary key) engine=merge union=(t1);
-lock tables t2 read;
-# Switching to connection 'ddl'
-alter table t2 alter column i set default 100;
-# Switching to connection 'default'
-kill query ID;
-# Switching to connection 'ddl'
-ERROR 70100: Query execution was interrupted
-# Test for DML waiting for meta-data lock
-# Switching to connection 'blocker'
-unlock tables;
-drop table t2;
-create table t2 (k int);
-lock tables t1 read;
-# Switching to connection 'ddl'
-rename tables t1 to t3, t2 to t1;
-# Switching to connection 'dml'
-insert into t2 values (1);
-# Switching to connection 'default'
-kill query ID2;
-# Switching to connection 'dml'
-ERROR 70100: Query execution was interrupted
-# Switching to connection 'blocker'
-unlock tables;
-# Switching to connection 'ddl'
-# Test for DML waiting for tables to be flushed
-# Switching to connection 'blocker'
-lock tables t1 read;
-# Switching to connection 'ddl'
-# Let us mark locked table t1 as old
-flush tables;
-# Switching to connection 'dml'
-select * from t1;
-# Switching to connection 'default'
-kill query ID2;
-# Switching to connection 'dml'
-ERROR 70100: Query execution was interrupted
-# Switching to connection 'blocker'
-unlock tables;
-# Switching to connection 'ddl'
-# Cleanup.
-# Switching to connection 'default'
-drop table t3;
-drop table t1;
 set @@global.concurrent_insert= @old_concurrent_insert;

=== modified file 'mysql-test/t/kill.test'
--- a/mysql-test/t/kill.test	2008-08-18 05:43:50 +0000
+++ b/mysql-test/t/kill.test	2009-02-13 17:00:42 +0000
@@ -328,243 +328,247 @@ KILL CONNECTION_ID();
 SELECT 1;
 --connection default
 
---echo #
---echo # Additional test for WL#3726 "DDL locking for all metadata objects"
---echo # Check that DDL and DML statements waiting for metadata locks can
---echo # be killed. Note that we don't cover all situations here since it
---echo # can be tricky to write test case for some of them (e.g. REPAIR or
---echo # ALTER and other statements under LOCK TABLES).
---echo #
---disable_warnings
-drop tables if exists t1, t2, t3;
---enable_warnings
-
-create table t1 (i int primary key);
-connect (blocker, localhost, root, , );
-connect (dml, localhost, root, , );
-connect (ddl, localhost, root, , );
-
---echo # Test for RENAME TABLE
---echo # Switching to connection 'blocker'
-connection blocker;
-lock table t1 read;
---echo # Switching to connection 'ddl'
-connection ddl;
-let $ID= `select connection_id()`;
---send rename table t1 to t2
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and info = "rename table t1 to t2";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
-
---echo # Test for DROP TABLE
---send drop table t1
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "drop table t1";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
-
---echo # Test for CREATE TRIGGER
---send create trigger t1_bi before insert on t1 for each row set @a:=1
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "create trigger t1_bi before insert on t1 for each row set @a:=1";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
-
---echo #
---echo # Tests for various kinds of ALTER TABLE
---echo #
---echo # Full-blown ALTER which should copy table
---send alter table t1 add column j int
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "alter table t1 add column j int";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
-
---echo # Two kinds of simple ALTER
---send alter table t1 rename to t2
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "alter table t1 rename to t2";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
---send alter table t1 disable keys
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "alter table t1 disable keys";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
---echo # Fast ALTER
---send alter table t1 alter column i set default 100
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "alter table t1 alter column i set default 100";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
---echo # Special case which is triggered only for MERGE tables.
---echo # Switching to connection 'blocker'
-connection blocker;
-unlock tables;
-create table t2 (i int primary key) engine=merge union=(t1);
-lock tables t2 read;
---echo # Switching to connection 'ddl'
-connection ddl;
---send alter table t2 alter column i set default 100
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "alter table t2 alter column i set default 100";
---source include/wait_condition.inc
---replace_result $ID ID
-eval kill query $ID;
---echo # Switching to connection 'ddl'
-connection ddl;
---error ER_QUERY_INTERRUPTED
---reap
-
---echo # Test for DML waiting for meta-data lock
---echo # Switching to connection 'blocker'
-connection blocker;
-unlock tables;
-drop table t2;
-create table t2 (k int);
-lock tables t1 read;
---echo # Switching to connection 'ddl'
-connection ddl;
-# Let us add pending exclusive metadata lock on t2
---send rename tables t1 to t3, t2 to t1
---echo # Switching to connection 'dml'
-connection dml;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "rename tables t1 to t3, t2 to t1";
---source include/wait_condition.inc
-let $ID2= `select connection_id()`;
---send insert into t2 values (1)
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "insert into t2 values (1)";
---source include/wait_condition.inc
---replace_result $ID2 ID2
-eval kill query $ID2;
---echo # Switching to connection 'dml'
-connection dml;
---error ER_QUERY_INTERRUPTED
---reap
---echo # Switching to connection 'blocker'
-connection blocker;
-unlock tables;
---echo # Switching to connection 'ddl'
-connection ddl;
---reap
-
---echo # Test for DML waiting for tables to be flushed
---echo # Switching to connection 'blocker'
-connection blocker;
-lock tables t1 read;
---echo # Switching to connection 'ddl'
-connection ddl;
---echo # Let us mark locked table t1 as old
---send flush tables
---echo # Switching to connection 'dml'
-connection dml;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Flushing tables" and
-        info = "flush tables";
---source include/wait_condition.inc
---send select * from t1
---echo # Switching to connection 'default'
-connection default;
-let $wait_condition=
-  select count(*) = 1 from information_schema.processlist
-  where state = "Waiting for table" and
-        info = "select * from t1";
---source include/wait_condition.inc
---replace_result $ID2 ID2
-eval kill query $ID2;
---echo # Switching to connection 'dml'
-connection dml;
---error ER_QUERY_INTERRUPTED
---reap
---echo # Switching to connection 'blocker'
-connection blocker;
-unlock tables;
---echo # Switching to connection 'ddl'
-connection ddl;
---reap
-
---echo # Cleanup.
---echo # Switching to connection 'default'
-connection default;
-drop table t3;
-drop table t1;
-
+###########################################################################
+#
+#
+#
+# --echo #
+# --echo # Additional test for WL#3726 "DDL locking for all metadata objects"
+# --echo # Check that DDL and DML statements waiting for metadata locks can
+# --echo # be killed. Note that we don't cover all situations here since it
+# --echo # can be tricky to write test case for some of them (e.g. REPAIR or
+# --echo # ALTER and other statements under LOCK TABLES).
+# --echo #
+# --disable_warnings
+# drop tables if exists t1, t2, t3;
+# --enable_warnings
+# 
+# create table t1 (i int primary key);
+# connect (blocker, localhost, root, , );
+# connect (dml, localhost, root, , );
+# connect (ddl, localhost, root, , );
+# 
+# --echo # Test for RENAME TABLE
+# --echo # Switching to connection 'blocker'
+# connection blocker;
+# lock table t1 read;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# let $ID= `select connection_id()`;
+# --send rename table t1 to t2
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and info = "rename table t1 to t2";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# 
+# --echo # Test for DROP TABLE
+# --send drop table t1
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "drop table t1";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# 
+# --echo # Test for CREATE TRIGGER
+# --send create trigger t1_bi before insert on t1 for each row set @a:=1
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "create trigger t1_bi before insert on t1 for each row set @a:=1";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# 
+# --echo #
+# --echo # Tests for various kinds of ALTER TABLE
+# --echo #
+# --echo # Full-blown ALTER which should copy table
+# --send alter table t1 add column j int
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "alter table t1 add column j int";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# 
+# --echo # Two kinds of simple ALTER
+# --send alter table t1 rename to t2
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "alter table t1 rename to t2";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# --send alter table t1 disable keys
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "alter table t1 disable keys";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# --echo # Fast ALTER
+# --send alter table t1 alter column i set default 100
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "alter table t1 alter column i set default 100";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# --echo # Special case which is triggered only for MERGE tables.
+# --echo # Switching to connection 'blocker'
+# connection blocker;
+# unlock tables;
+# create table t2 (i int primary key) engine=merge union=(t1);
+# lock tables t2 read;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --send alter table t2 alter column i set default 100
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "alter table t2 alter column i set default 100";
+# --source include/wait_condition.inc
+# --replace_result $ID ID
+# eval kill query $ID;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# 
+# --echo # Test for DML waiting for meta-data lock
+# --echo # Switching to connection 'blocker'
+# connection blocker;
+# unlock tables;
+# drop table t2;
+# create table t2 (k int);
+# lock tables t1 read;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# # Let us add pending exclusive metadata lock on t2
+# --send rename tables t1 to t3, t2 to t1
+# --echo # Switching to connection 'dml'
+# connection dml;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "rename tables t1 to t3, t2 to t1";
+# --source include/wait_condition.inc
+# let $ID2= `select connection_id()`;
+# --send insert into t2 values (1)
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "insert into t2 values (1)";
+# --source include/wait_condition.inc
+# --replace_result $ID2 ID2
+# eval kill query $ID2;
+# --echo # Switching to connection 'dml'
+# connection dml;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# --echo # Switching to connection 'blocker'
+# connection blocker;
+# unlock tables;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --reap
+# 
+# --echo # Test for DML waiting for tables to be flushed
+# --echo # Switching to connection 'blocker'
+# connection blocker;
+# lock tables t1 read;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --echo # Let us mark locked table t1 as old
+# --send flush tables
+# --echo # Switching to connection 'dml'
+# connection dml;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Flushing tables" and
+#         info = "flush tables";
+# --source include/wait_condition.inc
+# --send select * from t1
+# --echo # Switching to connection 'default'
+# connection default;
+# let $wait_condition=
+#   select count(*) = 1 from information_schema.processlist
+#   where state = "Waiting for table" and
+#         info = "select * from t1";
+# --source include/wait_condition.inc
+# --replace_result $ID2 ID2
+# eval kill query $ID2;
+# --echo # Switching to connection 'dml'
+# connection dml;
+# --error ER_QUERY_INTERRUPTED
+# --reap
+# --echo # Switching to connection 'blocker'
+# connection blocker;
+# unlock tables;
+# --echo # Switching to connection 'ddl'
+# connection ddl;
+# --reap
+# 
+# --echo # Cleanup.
+# --echo # Switching to connection 'default'
+# connection default;
+# drop table t3;
+# drop table t1;
+# 
 ###########################################################################
 
 # Restore global concurrent_insert value. Keep in the end of the test file.

[16 Feb 2009 18:08] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090216180446-dl1xovi02kbd2fgn) (version source revid:sergefp@mysql.com-20090216083955-st77hpz1lz3o2wli) (merge vers: 6.0.10-alpha) (pib:6)
[25 Feb 2009 15:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67570

2779 Ingo Struewing	2009-02-25
      Bug#37780 - main.kill fails randomly
      
      The test case main.kill did not work reliably.
      
      The following problems have been identified:
      
      1. A kill signal could get lost if it arrived shortly before a thread
      started reading from the client connection.
      
      2. A kill signal could get lost if it arrived shortly before a thread
      started waiting on a condition variable.
      
      These problems have been solved as follows. Please see also added code
      comments for more details.
      
      1. There is no safe way to detect when a thread enters the blocking
      state of a read(2) or recv(2) system call, where it can be interrupted
      by a signal. Hence it is not possible to wait for the right moment to
      send a kill signal. To be safe, we need to close the connection.
      
      2. Before waiting on a condition variable, we register it together with
      a synchronizing mutex in THD::mysys_var. After this, we need to test
      THD::killed again. In some places we tested it only in a loop
      condition before the registration. If THD::killed was set between
      this test and the registration, we started waiting without noticing
      the killed flag.
      
      In addition to the above, the main.kill test case has been rewritten.
      Most sleeps have been replaced by Debug Sync Facility
      synchronization. The test case run time decreased from over 30 seconds
      to under three seconds. A couple of sync points have been added to the
      server code.
     @ mysql-test/r/kill.result
        Bug#37780 - main.kill fails randomly
        Updated test result.
     @ mysql-test/t/kill.test
        Bug#37780 - main.kill fails randomly
        Re-wrote test case to use Debug Sync points instead of sleeps.
     @ sql/event_queue.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in Event_queue::cond_wait().
     @ sql/lock.cc
        Bug#37780 - main.kill fails randomly
        Moved Debug Sync points behind enter_cond().
        Fixed comments.
     @ sql/mdl.cc
        Bug#37780 - main.kill fails randomly
        Fixed a compiler warning.
        Added Debug Sync points.
     @ sql/slave.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in start_slave_thread().
     @ sql/sql_base.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in close_cached_tables() and
        tdc_wait_for_old_versions().
     @ sql/sql_class.cc
        Bug#37780 - main.kill fails randomly
        Fixed and added comments.
     @ sql/sql_class.h
        Bug#37780 - main.kill fails randomly
        Unconditionally enabled SIGNAL_WITH_VIO_CLOSE with a comment.
     @ sql/sql_parse.cc
        Bug#37780 - main.kill fails randomly
        Added a sync point in do_command().
     @ sql/sql_select.cc
        Bug#37780 - main.kill fails randomly
        Added a sync point in JOIN::optimize().
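
The check-after-registration rule from point 2 of the commit message can be sketched in a few lines of standard C++. This is an illustration only, not the server code: Session and wait_until_ready are hypothetical names, and enter_cond(), exit_cond(), awake() and killed merely borrow the names used in the commit message. It assumes the mutex and condition variable outlive any concurrent awake() call.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the per-thread state (not the real THD).
struct Session {
  std::atomic<bool> killed{false};

  // Record what this thread is about to wait on. In this sketch the
  // caller already holds *m.
  void enter_cond(std::mutex* m, std::condition_variable* c) {
    assert(m != nullptr && c != nullptr);
    std::lock_guard<std::mutex> guard(reg_mutex_);
    wait_mutex_ = m;
    wait_cond_ = c;
  }

  void exit_cond() {
    std::lock_guard<std::mutex> guard(reg_mutex_);
    wait_mutex_ = nullptr;
    wait_cond_ = nullptr;
  }

  // Killer side: set the flag first, then wake the target if it registered.
  void awake() {
    killed.store(true);
    std::mutex* m = nullptr;
    std::condition_variable* c = nullptr;
    {
      std::lock_guard<std::mutex> guard(reg_mutex_);
      m = wait_mutex_;
      c = wait_cond_;
    }
    if (c != nullptr) {
      // While we hold the wait mutex, the target cannot be between its
      // check of `killed` and its call to wait().
      std::lock_guard<std::mutex> lock(*m);
      c->notify_all();
    }
  }

 private:
  std::mutex reg_mutex_;  // protects the two pointers below
  std::mutex* wait_mutex_ = nullptr;
  std::condition_variable* wait_cond_ = nullptr;
};

// Returns true if `ready` became true, false if the wait ended due to a kill.
bool wait_until_ready(Session& s, std::mutex& m, std::condition_variable& cv,
                      const bool& ready) {
  std::unique_lock<std::mutex> lock(m);
  s.enter_cond(&m, &cv);
  // `killed` is tested after registration, so a kill that arrived before
  // enter_cond() (when awake() had nothing to notify) is still noticed.
  while (!ready && !s.killed.load()) {
    cv.wait(lock);
  }
  const bool got_event = ready;
  lock.unlock();
  s.exit_cond();
  return got_event;
}
```

If awake() runs before the waiter has registered, it finds no condition variable to notify. The waiter then sees the flag in its loop condition and returns without blocking. Testing the flag only before enter_cond() would leave exactly the window the commit message describes.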
[13 May 2009 13:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73942

2802 Ingo Struewing	2009-05-13
      Bug#37780 - main.kill fails randomly
      
      The test case main.kill did not work reliably.
      
      The following problems have been identified:
      
      1. A kill signal could get lost if it arrived shortly before a thread
      started reading from the client connection.
      
      2. A kill signal could get lost if it arrived shortly before a thread
      started waiting on a condition variable.
      
      These problems have been solved as follows. Please see also added code
      comments for more details.
      
      1. There is no safe way to detect when a thread enters the blocking
      state of a read(2) or recv(2) system call, where it can be interrupted
      by a signal. Hence it is not possible to wait for the right moment to
      send a kill signal. To be safe, we need to close the connection before
      sending a kill signal. If the signal arrives before the read starts,
      the read fails on the closed connection.
      
      2. Before waiting on a condition variable, we register it together with
      a synchronizing mutex in THD::mysys_var. After this, we need to test
      THD::killed again. In some places we tested it only in a loop
      condition before the registration. If THD::killed was set between
      this test and the registration, we started waiting without noticing
      the killed flag.
      
      In addition to the above, the main.kill test case has been rewritten.
      All sleeps have been replaced by Debug Sync Facility
      synchronization. The test case run time decreased from over 30 seconds
      to under three seconds. A couple of sync points have been added to the
      server code.
     @ include/config-netware.h
        Bug#37780 - main.kill fails randomly
        Removed SIGNAL_WITH_VIO_CLOSE.
     @ include/config-win.h
        Bug#37780 - main.kill fails randomly
        Removed SIGNAL_WITH_VIO_CLOSE.
     @ mysql-test/r/kill.result
        Bug#37780 - main.kill fails randomly
        Updated test result.
     @ mysql-test/t/disabled.def
        Bug#37780 - main.kill fails randomly
        Re-enabled test case.
     @ mysql-test/t/kill.test
        Bug#37780 - main.kill fails randomly
        Re-wrote test case to use Debug Sync points instead of sleeps.
     @ sql/event_queue.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in Event_queue::cond_wait()
        by adding a check after enter_cond().
     @ sql/lock.cc
        Bug#37780 - main.kill fails randomly
        Moved Debug Sync points behind enter_cond().
        Fixed comments.
     @ sql/mdl.cc
        Bug#37780 - main.kill fails randomly
        Removed an unused sync point.
     @ sql/slave.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in start_slave_thread()
        by adding a check after enter_cond().
        Removed SIGNAL_WITH_VIO_CLOSE.
     @ sql/sql_base.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in close_cached_tables() and
        tdc_wait_for_old_versions()
        by adding checks after enter_cond().
     @ sql/sql_class.cc
        Bug#37780 - main.kill fails randomly
        Removed SIGNAL_WITH_VIO_CLOSE.
        Swapped order of kill and close in THD::awake().
        Added comments.
     @ sql/sql_class.h
        Bug#37780 - main.kill fails randomly
        Removed SIGNAL_WITH_VIO_CLOSE.
        Added a comment to THD::killed.
     @ sql/sql_connect.cc
        Bug#37780 - main.kill fails randomly
        Removed SIGNAL_WITH_VIO_CLOSE.
     @ sql/sql_parse.cc
        Bug#37780 - main.kill fails randomly
        Added a sync point in do_command().
     @ sql/sql_select.cc
        Bug#37780 - main.kill fails randomly
        Added a sync point in JOIN::optimize().
     @ vio/viosocket.c
        Bug#37780 - main.kill fails randomly
        Added DBUG_PRINTs.
[28 Jul 2009 17:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79439

2847 Ingo Struewing	2009-07-28
      Bug#37780 - main.kill fails randomly
      
      The test case main.kill did not work reliably.
      
      The following problems have been identified:
      
      1. A kill signal could get lost if it arrived shortly before a thread
      started reading from the client connection.
      
      2. A kill signal could get lost if it arrived shortly before a thread
      started waiting on a condition variable.
      
      These problems have been solved as follows. Please see also added code
      comments for more details.
      
      1. There is no safe way to detect when a thread enters the blocking
      state of a read(2) or recv(2) system call, where it can be interrupted
      by a signal. Hence it is not possible to wait for the right moment to
      send a kill signal. It has been decided not to fix this in the code.
      Instead, the test case repeats the KILL statement until the connection
      terminates.
      
      2. Before waiting on a condition variable, we register it together with
      a synchronizing mutex in THD::mysys_var. After this, we need to test
      THD::killed again. In some places we tested it only in a loop
      condition before the registration. If THD::killed was set between
      this test and the registration, we started waiting without noticing
      the killed flag. Additional checks have been introduced where required.
      
      In addition to the above, a re-write of the main.kill test case has been
      done. All sleeps have been replaced by Debug Sync Facility
      synchronization. A couple of sync points have been added to the server
      code.
      
      To avoid further problems, if the test case fails in spite of the fixes,
      the test case has been added to the "experimental" list for now.
      
      The formerly disabled test case has been re-enabled.
     @ mysql-test/collections/default.experimental
        Bug#37780 - main.kill fails randomly
        Added main.kill.
     @ mysql-test/r/kill.result
        Bug#37780 - main.kill fails randomly
        Updated test result.
     @ mysql-test/t/disabled.def
        Bug#37780 - main.kill fails randomly
        Removed main.kill.
     @ mysql-test/t/kill.test
        Bug#37780 - main.kill fails randomly
        Re-wrote test case to use Debug Sync points instead of sleeps.
     @ sql/event_queue.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in Event_queue::cond_wait()
        by adding a check after enter_cond().
     @ sql/lock.cc
        Bug#37780 - main.kill fails randomly
        Moved Debug Sync points behind enter_cond().
        Fixed comments.
     @ sql/slave.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in start_slave_thread()
        by adding a check after enter_cond().
     @ sql/sql_base.cc
        Bug#37780 - main.kill fails randomly
        Fixed kill detection in close_cached_tables() and
        tdc_wait_for_old_versions()
        by adding checks after enter_cond().
     @ sql/sql_class.cc
        Bug#37780 - main.kill fails randomly
        Swapped order of kill and close in THD::awake().
        Added comments.
     @ sql/sql_class.h
        Bug#37780 - main.kill fails randomly
        Added a comment to THD::killed.
     @ sql/sql_parse.cc
        Bug#37780 - main.kill fails randomly
        Added a sync point in do_command().
     @ sql/sql_select.cc
        Bug#37780 - main.kill fails randomly
        Added a sync point in JOIN::optimize().
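
The workaround in point 1 of the commit message (repeat KILL until the connection terminates) amounts to a bounded retry loop. The sketch below shows the idea in plain C++ with a simulated target. It is not the code from kill.test. kill_until_gone and FlakyTarget are hypothetical names, and the two callbacks stand in for "send KILL" and "is the connection still in the processlist".

```cpp
#include <cassert>
#include <functional>

// Sends a kill request repeatedly until the target is gone or max_attempts
// is exhausted. Returns the number of requests sent, or -1 on giving up.
// A real loop would normally pause between attempts.
int kill_until_gone(const std::function<void()>& send_kill,
                    const std::function<bool()>& is_alive,
                    int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    send_kill();
    if (!is_alive()) {
      return attempt;
    }
  }
  return -1;
}

// Simulated connection that loses the first `kills_to_ignore` kill requests,
// mimicking a signal that arrives just before the thread blocks in read().
struct FlakyTarget {
  explicit FlakyTarget(int n) : kills_to_ignore(n), alive(true) {}

  void kill() {
    if (kills_to_ignore > 0) {
      --kills_to_ignore;  // this request is lost
    } else {
      alive = false;      // this request takes effect
    }
  }

  int kills_to_ignore;
  bool alive;
};
```

With a target that loses its first two requests, the loop succeeds on the third attempt. The attempt limit keeps a target that never dies from hanging the caller.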
[5 Aug 2009 7:51] Lars Thalmann
On Tue, Aug 04, 2009 at 08:13:33PM +0200, Ingo Strüwing wrote:
> The post-review changes were non-trivial, so I asked Konstantin, if he
> wants to have another look. I have no reply so far.

I've unchecked Kostja's review box, since a new review might be needed.
[6 Nov 2009 18:55] Ingo Strüwing
Won't fix. No review for 13 weeks. Someone else can complete it, if required.
[12 Jul 2010 8:07] Dmitry Shulga
Bug#52528 was marked a duplicate of this bug.
[21 Oct 2010 12:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/121513

3263 Davi Arnaut	2010-10-21
      Bug#37780: Make KILL reliable (main.kill fails randomly)
      
      - A prerequisite cleanup patch for making KILL reliable.
      
      The test case main.kill did not work reliably.
      
      The following problems have been identified:
      
      1. A kill signal could get lost if it arrived shortly before a
      thread started reading from the client connection.
      
      2. A kill signal could get lost if it arrived shortly before a
      thread started waiting on a condition variable.
      
      These problems have been solved as follows. Please see also added
      code comments for more details.
      
      1. There is no safe way to detect when a thread enters the
      blocking state of a read(2) or recv(2) system call, where it
      can be interrupted by a signal. Hence it is not possible to wait
      for the right moment to send a kill signal. It has been decided
      not to fix this in the code. Instead, the test case repeats the
      KILL statement until the connection terminates.
      
      2. Before waiting on a condition variable, we register it together
      with a synchronizing mutex in THD::mysys_var. After this, we
      need to test THD::killed again. In some places we tested it only
      in a loop condition before the registration. If THD::killed
      was set between this test and the registration, we started
      waiting without noticing the killed flag. Additional checks have
      been introduced where required.
      
      In addition to the above, a re-write of the main.kill test case has
      been done. All sleeps have been replaced by Debug Sync Facility
      synchronization. A couple of sync points have been added to the
      server code.
      
      To avoid further problems, if the test case fails in spite of
      the fixes, the test case has been added to the "experimental"
      list for now.
      
      - Most of the work on this patch is authored by Ingo Struewing
     @ mysql-test/t/kill.test
        Re-wrote test case to use Debug Sync points instead of sleeps.
     @ sql/event_queue.cc
        Fixed kill detection in Event_queue::cond_wait() by adding a check
        after enter_cond().
     @ sql/lock.cc
        Moved Debug Sync points behind enter_cond().
        Fixed comments.
     @ sql/slave.cc
        Fixed kill detection in start_slave_thread() by adding a check
        after enter_cond().
     @ sql/sql_class.cc
        Swapped order of kill and close in THD::awake().
        Added comments.
     @ sql/sql_class.h
        Added a comment to THD::killed.
     @ sql/sql_parse.cc
        Added a sync point in do_command().
     @ sql/sql_select.cc
        Added a sync point in JOIN::optimize().
[22 Oct 2010 11:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/121677

3172 Davi Arnaut	2010-10-22
      Bug#37780: Make KILL reliable (main.kill fails randomly)
      
      - A prerequisite cleanup patch for making KILL reliable.
      
      The test case main.kill did not work reliably.
      
      The following problems have been identified:
      
      1. A kill signal could get lost if it arrived shortly before a
      thread started reading from the client connection.
      
      2. A kill signal could get lost if it arrived shortly before a
      thread started waiting on a condition variable.
      
      These problems have been solved as follows. Please see also
      added code comments for more details.
      
      1. There is no safe way to detect when a thread enters the
      blocking state of a read(2) or recv(2) system call, where it
      can be interrupted by a signal. Hence it is not possible to
      wait for the right moment to send a kill signal. It has been
      decided not to fix this in the code. Instead, the test case
      repeats the KILL statement until the connection terminates.
      
      2. Before waiting on a condition variable, we register it
      together with a synchronizing mutex in THD::mysys_var. After
      this, we need to test THD::killed again. In some places we
      tested it only in a loop condition before the registration. If
      THD::killed was set between this test and the registration,
      we started waiting without noticing the killed flag. Additional
      checks have been introduced where required.
      
      In addition to the above, a re-write of the main.kill test
      case has been done. All sleeps have been replaced by Debug
      Sync Facility synchronization. A couple of sync points have
      been added to the server code.
      
      To avoid further problems, if the test case fails in spite of
      the fixes, the test case has been added to the "experimental"
      list for now.
      
      - Most of the work on this patch is authored by Ingo Struewing
     @ mysql-test/t/kill.test
        Re-wrote test case to use Debug Sync points instead of sleeps
     @ sql/event_queue.cc
        Fixed kill detection in Event_queue::cond_wait() by adding a check
        after enter_cond().
     @ sql/lock.cc
        Moved Debug Sync points behind enter_cond().
        Fixed comments.
     @ sql/slave.cc
        Fixed kill detection in start_slave_thread() by adding a check
        after enter_cond().
     @ sql/sql_class.cc
        Swapped order of kill and close in THD::awake().
        Added comments.
     @ sql/sql_class.h
        Added a comment to THD::killed.
     @ sql/sql_parse.cc
        Added a sync point in do_command().
     @ sql/sql_select.cc
        Added a sync point in JOIN::optimize().
[22 Oct 2010 12:04] Davi Arnaut
Prerequisite patch queued to mysql-5.5-runtime.
[2 Dec 2010 11:28] Davi Arnaut
Bug#58625 has been closed as a duplicate of this one.
[5 Dec 2010 12:42] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[16 Dec 2010 22:32] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)
[12 Dec 2012 21:39] Paul DuBois
Noted in 5.6.9, 5.7.0 changelogs.

On Mac OS X, KILL could sometimes be unreliable.