| Bug #52528 | Protocol desynchronisation between client and server on a DDL workload | ||
|---|---|---|---|
| Submitted: | 1 Apr 2010 15:17 | Modified: | 12 Jul 2010 8:04 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) |
| Version: | mysql-trunk,mysql-trunk-runtime-exp, 5.1, next-mr | OS: | Any |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
[1 Apr 2010 15:46]
Philip Stoev
Last queries from the thread that handged:
152 Query SET SESSION SQL_MODE=''
152 Query SAVEPOINT A
152 Query SAVEPOINT B
152 Query FLUSH TABLE testdb_S . t1_temp1_S
152 Query COMMIT WORK
152 Query UNLOCK TABLES
152 Query FLUSH TABLE testdb_S . t1_view1_N
152 Query ROLLBACK WORK
152 Query COMMIT WORK NO RELEASE
152 Query CREATE TRIGGER testdb_N . tr1_1_N BEFORE DELETE ON t1_base2_N FOR EACH ROW BEGIN REPLACE testdb_N . t1_temp2_N ( `col_int` ) VALUES ( 6 ) ; END
152 Query SELECT * FROM test.table1_int_autoinc INTO OUTFILE '/tmp/gentest18505.tmp'
152 Query LOAD DATA INFILE '/tmp/gentest18505.tmp' INTO TABLE testdb_S . t1_temp1_S
152 Query START TRANSACTION
152 Query SELECT * FROM test.table1_int_autoinc INTO OUTFILE '/tmp/gentest18505.tmp'
152 Query LOAD DATA INFILE '/tmp/gentest18505.tmp' INTO TABLE testdb_N . t1_base1_N
152 Query /* Sequence start */ CREATE FUNCTION testdb_S . f1_2_S () RETURNS INTEGER RETURN ( SELECT MOD( COUNT( DISTINCT `pk` ) , 10 ) FROM testdb_S . t1_view2_S ) ; COMMIT ; SELECT SLEEP( 0.5 * 0.94 * 1 ) ; DROP FUNCTION testdb_S . f1_2_S /* Sequence end */
core,binary and logs:
http://mysql-systemqa.s3.amazonaws.com/var-bug52528.zip
source:
revision-id: kostja@sun.com-20100327120632-xvhcrtay402f8x52
date: 2010-03-27 15:06:32 +0300
build-date: 2010-04-01 18:41:13 +0300
revno: 3138
branch-nick: mysql-trunk-runtime-exp
[13 Apr 2010 10:02]
Philip Stoev
Unfortunately this issue does not lend itself to automatic simplification, even if the RQG command line above reproduces it easily. It is possible that KILL and/or COMMIT|ROLLBACK RELEASE, which cause the RQG to establish a fresh connection, are required to reproduce.
[13 Apr 2010 15:40]
Sveta Smirnova
Bug #48515 can be duplicate of this one.
[21 Apr 2010 8:56]
Sveta Smirnova
See also bug #50399 Bug #19610 was marked as duplicate of this one.
[11 Jul 2010 20:56]
Elena Stepanova
After running some additional tests, I am quite certain that Philip´s initial RQG test and my MTR test used to hit the same problem: when an idle connection got killed, it did not disappear but stayed in 'Killed' forever. The RQG grammar contains KILL QUERY and KILL statements. In a number of failures that I checked, a hanging connection <ID> always appeared after some other connection(s) ran KILL QUERY <ID> followed by KILL <ID>. The race condition apparently is that at the moment of 'KILL' the previous query has to be already killed, but a new one should not start yet. The MTR test was initially designed to kill an idle thread, so it failed deterministically. I have modified it a little to avoid confusion about pass/fail, and to remove explicit sleeps. It can be found below as test case 'Kill Idle'. Both tests fail on 5.5 cmake builds before the change http://lists.mysql.com/commits/104251, and do not fail on builds which contain this change (mid April, depending on the tree). Bug#19610, which is marked as a duplicate of this one, initially reports two part of the problem: first, idle connection do not disappear after wait/interactive_timeout, and second, these connections, while killed manually get stuck in 'Killed'. The second part is exactly the same as we observed. However, while running tests on a SUSE 10.2 box, I could not get the failure on *normal* 5.1 builds, neither on source code from March-April, nor on the current trees. Both the life-after-timeout and the stay-in-killed problems are easily reproducible if I compile 5.1 explicitly with NO_ALARM (and disappear again if I add SIGNAL_WITH_VIO_CLOSE); I'm not sure however it is a valid experiment. Just in case, the test which shows both issues on a 5.1 "NO_ALARM" build can be found below as 'Kill Timed Out' test case. #--------------------- # Kill Idle test case # Should fail on a 5.5 cmake build before http://lists.mysql.com/commits/104251 # and pass after the change connect(con1,localhost,root,,test,,); let $con1_id = `SELECT CONNECTION_ID()`; connection default; let $wait_condition = SELECT command = 'Sleep' FROM information_schema.processlist WHERE id = $con1_id; --source include/wait_condition.inc eval KILL $con1_id; let $wait_timeout = 5; let $wait_condition = SELECT command = 'Killed' AND time > 3 FROM information_schema.processlist WHERE id = $con1_id; --source include/wait_condition.inc if ($success) { SHOW PROCESSLIST; --die # Hanging connection detected } --echo # No hanging connections detected --exit # EOF (Kill Idle test case) #--------------------- # Kill Timed Out test case connect(con1,localhost,root,,test,,); SET @@wait_timeout=3; SET @@interactive_timeout=3; let $con1_id = `SELECT CONNECTION_ID()`; connection default; let $wait_timeout=6; let $wait_condition = SELECT time > 4 FROM information_schema.processlist WHERE id = $con1_id; --source include/wait_condition.inc if (!$success) { --echo # No timed out connections detected, nothing to kill --exit } --echo # Timed out connection detected: SHOW PROCESSLIST; eval KILL $con1_id; let $wait_timeout = 3; let $wait_condition = SELECT time > 6 AND command = 'Killed' FROM information_schema.processlist WHERE id = $con1_id; --source include/wait_condition.inc if (!$success) { --echo # No hanging connections detected --die # Timed out connection, but no hang up } SHOW PROCESSLIST; --die # Timed out and hanging connection detected # EOF (Kill Timed Out test case)
[12 Jul 2010 8:04]
Dmitry Shulga
This is a duplicate of Bug#37780

Description: When executing a RQG DDL/MDL workload, sometimes the server will forever remain waiting for a packet from the client, but the client does not appear to think that it is supposed to send such a packet, and vice versa. The server backtrace looks like this: Thread 3 (Thread 0x7fffc32ac950 (LWP 15171)): #0 0x000000315b00df5b in read () from /lib64/libpthread.so.0 #1 0x0000000000a1755f in vio_read (vio=0x7fffbc1491c8, buf=0x7fffbc0ae598 "\001", size=4) at /1tb/bzr/mysql-trunk/vio/viosocket.c:44 #2 0x0000000000522e62 in my_real_read (net=0x7fffbc1462b8, complen=0x7fffc32abfc8) at /1tb/bzr/mysql-trunk/sql/net_serv.cc:836 #3 0x00000000005234e2 in my_net_read (net=0x7fffbc1462b8) at /1tb/bzr/mysql-trunk/sql/net_serv.cc:1026 #4 0x00000000005b3806 in do_command (thd=0x7fffbc1460b8) at /1tb/bzr/mysql-trunk/sql/sql_parse.cc:656 #5 0x000000000067e584 in do_handle_one_connection (thd_arg=0x7fffbc1460b8) at /1tb/bzr/mysql-trunk/sql/sql_connect.cc:1174 #6 0x000000000067e653 in handle_one_connection (arg=0x7fffbc1460b8) at /1tb/bzr/mysql-trunk/sql/sql_connect.cc:1113 #7 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0 #8 0x000000315a4e627d in clone () from /lib64/libc.so.6 SHOW PROCESSLIST looks like this: *************************** 2. row *************************** Id: 32 User: root Host: localhost:51720 db: test Command: Killed Time: 345 State: Info: NULL As it is visible from SHOW PROCESSLIST, applying KILL did not achieve anything, just flagged the connection as "Killed". How to repeat: This is easily reproducible with the RQG with the following command line: perl runall.pl \ --grammar=conf/runtime/WL5004_sql.yy \ --gendata=conf/runtime/WL5004_data.zz \ --queries=1M \ --duration=120 \ --reporters=Deadlock,ErrorLog,Backtrace,Shutdown \ --mysqld=--innodb-lock-wait-timeout=1 \ --mysqld=--loose-lock-wait-timeout=1 \ --mysqld=--secure-file-priv=/tmp/ \ --engine=MyISAM \ --rows=1 \ --threads=16 \ --mysqld=--transaction-isolation=READ-COMMITTED \ --mysqld=--log-output=table,file \ --mysqld=--innodb_flush_log_at_trx_commit=1 \ --mask-level=1 \ --mysqld=--big-tables \ --mysqld=--open_files_limit=10 \ --mysqld=--table_cache=10 \ --mysqld=--tmp_table_size=1K \ --mask=55166 \ --queries=100000000 \ --basedir=/1tb/bzr/mysql-trunk-runtime-exp \ --seed=time Ignore any output from the test, but notice that the final lines will be # 2010-04-01T18:10:37 Child process completed successfully. # 2010-04-01T18:10:37 Child process completed successfully. but the test will not terminate. The reason is that some RQG testing children still expect packets from mysql, which never arrive.