MySQL Bugs: #87189: Test main.kill is unstable

Bug #87189	Test main.kill is unstable
Submitted:	26 Jul 2017 2:50	Modified:	4 Aug 2017 13:07
Reporter:	Laurynas Biveinis (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	Tests: Server	Severity:	S3 (Non-critical)
Version:	8.0.2	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	mtr

Description:
main.kill                                w4 [ fail ]
        Test ended at 2017-07-24 06:16:38

CURRENT_TEST: main.kill
mysqltest: At line 317: query 'reap' succeeded - should have failed with errno 1317...

The result from queries just before the failure was:
< snip >
INSERT INTO t2 VALUES(1);
END|
INSERT INTO t1 VALUES(2),(3);
SET DEBUG_SYNC= 'now WAIT_FOR in_sync';
KILL QUERY @id;
ERROR 70100: Query execution was interrupted
SELECT * FROM t1;
f1
0
1
SELECT * FROM t2;
f2
0
SET DEBUG_SYNC = 'RESET';
DROP TABLE t1, t2;
SET DEBUG_SYNC= 'before_join_optimize SIGNAL in_sync';
PREPARE stmt FROM 'EXPLAIN SELECT * FROM t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11,t12,t13,t14,t15,t16,t17,t18,t19,t20,t21,t22,t23,t24,t25,t26,t27,t28,t29,t30,t31,t32,t33,t34,t35,t36,t37,t38,t39,t40 WHERE a1=a2 AND a2=a3 AND a3=a4 AND a4=a5 AND a5=a6 AND a6=a7 AND a7=a8 AND a8=a9 AND a9=a10 AND a10=a11 AND a11=a12 AND a12=a13 AND a13=a14 AND a14=a15 AND a15=a16 AND a16=a17 AND a17=a18 AND a18=a19 AND a19=a20 AND a20=a21 AND a21=a22 AND a22=a23 AND a23=a24 AND a24=a25 AND a25=a26 AND a26=a27 AND a27=a28 AND a28=a29 AND a29=a30 AND a30=a31 AND a31=a32 AND a32=a33 AND a33=a34 AND a34=a35 AND a35=a36 AND a36=a37 AND a37=a38 AND a38=a39 AND a39=a40  ';
EXECUTE stmt;
SET DEBUG_SYNC= 'now WAIT_FOR in_sync';
KILL QUERY @id;
safe_process[20271]: Child process: 20272, exit: 1

How to repeat:
I saw the above failure exactly once on 8.0.2, and no --parallel and side load could repeat it. However, I believe the cause is seen by testcase analysis.

Suggested fix:
The relevant snippet is

(connection con1)
SET DEBUG_SYNC= 'before_join_optimize SIGNAL in_sync';
eval PREPARE stmt FROM 'EXPLAIN SELECT * $from $where';
send EXECUTE stmt;

connection con2;
SET DEBUG_SYNC= 'now WAIT_FOR in_sync';
KILL QUERY @id;
connection con1;
--error 1317
reap;

Note that con1 thread is not stopped at before_join_optimize, but is allowed to proceed while con2 sets up its debug_sync and kills query. Thus there is a race condition in that con1 may fully complete EXECUTE stmt before con2 KILLs, resulting in the failure above.

This can be fixed by stopping con1 until con2 KILL QUERY is processed:

@@ -305,13 +305,14 @@ while ($i)
 }
 
 --enable_query_log
-SET DEBUG_SYNC= 'before_join_optimize SIGNAL in_sync';
+SET DEBUG_SYNC= 'before_join_optimize SIGNAL in_sync WAIT_FOR continue';
 eval PREPARE stmt FROM 'EXPLAIN SELECT * $from $where';
 send EXECUTE stmt;
 
 connection con2;
 SET DEBUG_SYNC= 'now WAIT_FOR in_sync';
 KILL QUERY @id;
+SET DEBUG_SYNC= 'now SIGNAL continue';
 connection con1;
 --error 1317
 reap;

Hello Laurynas,

Thank you for the report.
I could not repeat the issue at my end(Sierra MacOS, Ubuntu 16.04.02/16.10/17.04, Debian 8 and OL7), anything else required(cmake, os etc details) to trigger the issue? Please let me know.

I have built 8.0.2 w/ and w/o -DWITH_ASAN=ON 

cmake . -DENABLE_DOWNLOADS=1 -DWITH_BOOST=./boost/ -DWITH_DEBUG=ON

Thanks,
Umesh

As I wrote in "How to repeat" above, this has a very low occurrence rate. Since I provided testcase analysis with a suggested change, maybe you can verify this as a "testcase improvement request"?

Thank you Laurynas.
Verifying for the testcase improvement as suggested.

Thanks,
Umesh

Posted by developer:
 
Fixed in 8.0.3.

Work was done for test suite. No changelog entry needed.