Bug #52356 | query_cache_debug fails on Linux | ||
---|---|---|---|
Submitted: | 25 Mar 2010 9:16 | Modified: | 7 May 2010 19:51 |
Reporter: | Alexander Nozdrin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
Version: | Celosia (M3), Dahlia (M4) | OS: | Linux |
Assigned to: | Alexander Nozdrin | CPU Architecture: | Any |
Tags: | experimental, pb2, test failure |
[25 Mar 2010 9:16]
Alexander Nozdrin
[25 Mar 2010 10:09]
Valeriy Kravchuk
Fails for me with recent mysql-next-mr tree on 32-bit Ubuntu 8.04 also (built using BUILD/compile-pentium-debug-max): openxs@ubuntu:/home2/openxs/bzr/mysql-next-mr/mysql-test$ ./mtr query_cache query_cache_debug Logging: ./mtr query_cache query_cache_debug 100325 12:03:57 [Note] Buffered information: Performance schema disabled (reason: start parameters). 100325 12:03:57 [Note] Plugin 'FEDERATED' is disabled. 100325 12:03:57 [Note] Plugin 'ndbcluster' is disabled. MySQL Version 5.6.99 Checking supported features... - using ndbcluster when necessary, mysqld supports it - SSL connections supported - binaries are debug compiled Collecting tests... vardir: /home2/openxs/bzr/mysql-next-mr/mysql-test/var Checking leftover processes... - found old pid 14879 in 'mysqld.2.pid', killing it... process did not exist! - found old pid 14684 in 'mysqld.1.pid', killing it... process did not exist! Removing old var directory... Creating var directory '/home2/openxs/bzr/mysql-next-mr/mysql-test/var'... Installing system database... Using server port 60581 ============================================================================== TEST RESULT TIME (ms) ------------------------------------------------------------ worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009 main.query_cache [ pass ] 26630 main.query_cache_debug [ fail ] Test ended at 2010-03-25 12:04:33 CURRENT_TEST: main.query_cache_debug mysqltest: At line 230: query 'DELETE FROM t1' failed: 1689: Wait on a lock was aborted due to a pending exclusive lock The result from queries just before the failure was: < snip > SELECT SQL_NO_CACHE id FROM information_schema.processlist WHERE state='wait_in_query_cache_insert' ORDER BY id ASC LIMIT 1 INTO @thread_id; KILL QUERY @thread_id; SELECT SQL_NO_CACHE id FROM information_schema.processlist WHERE state='wait_in_query_cache_insert' ORDER BY id DESC LIMIT 1 INTO @thread_id; KILL QUERY @thread_id; ** ** Finally signal the DELETE statement on THD1 one last time. ** The stmt will complete the query cache invalidation and return ** cache status to NO_FLUSH_IN_PROGRESS. On the status change ** One signal will be sent to the thread group waiting for executing ** invalidations and a broadcast signal will be sent to the thread ** group holding result set writers. SELECT SQL_NO_CACHE id FROM information_schema.processlist WHERE state='wait_in_query_cache_invalidate2' LIMIT 1 INTO @flush_thread_id; KILL QUERY @flush_thread_id; ** ************************************************************************* ** No tables should be locked =================================== Connection thd2 a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb More results from queries before failure can be found in /home2/openxs/bzr/mysql-next-mr/mysql-test/var/log/query_cache_debug.log - saving '/home2/openxs/bzr/mysql-next-mr/mysql-test/var/log/main.query_cache_debug/' to '/home2/openxs/bzr/mysql-next-mr/mysql-test/var/log/main.query_cache_debug/' Only 2 of 3 completed. mysql-test-run: *** ERROR: Not all tests completed openxs@ubuntu:/home2/openxs/bzr/mysql-next-mr/mysql-test$ uname -a Linux ubuntu 2.6.24-27-generic #1 SMP Fri Mar 12 01:10:31 UTC 2010 i686 GNU/Linux
[10 Apr 2010 8:16]
Alexander Nozdrin
Now it fails on other Linuxes too. Making the test case experimental on Linux due to this bug.
[20 Apr 2010 8:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106073 3015 Alexander Nozdrin 2010-04-20 Patch for Bug#52356: query_cache_debug fails on Linux. The problem was a race condition in the test case: there is a thread, that is doing DELETE FROM t1 in the background. There are two KILL QUERY of that DELETE statement. Finally, another thread is doing another DELETE FROM t1. The problem was that the test case didn't wait for the statement being killed to die happily and tried to execute another DELETE FROM t1, which obviously conflicted with the dying one. The fix is to wait for the dying statement to rest in peace before executing another DELETE FROM t1.
[27 Apr 2010 16:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106705 3015 Alexander Nozdrin 2010-04-27 Patch for Bug#52356: query_cache_debug fails on Linux. There were two problems here: 1. misleading error message 2. abusing KILL QUERY in the test case 1. The server reported "'DELETE FROM t1' failed: 1689: Wait on a lock was aborted due to a pending exclusive lock", while the proper error message should be "'DELETE FROM t1' failed: 1317: Query execution was interrupted". The problem is that the server has two different flags for signalling that a query is being killed: THD::killed and mysys_var::abort. The test case triggers a race: sometimes mysys_var::abort is set earlier than THD::killed. That leads to the following situation: - thr_lock() checks mysys_var::abort and returns error status, since mysys_var::abort is set; - the caller (mysql_lock_tables()) gets an error from thr_lock(), but THD::killed is not set, so it decides that thr_lock() couldn't get a lock due to a pending exclusive lock. This is a known issue with the server and it's not going to be fixed soon. 5.5 differs from 5.1 here as follows: when thr_lock() returns an error: - 5.1 continues trying thr_lock() until success; - 5.5 propagates the error The fix is to check both THD::killed and mysys_var::abort when analyzing error returned from thr_lock(). 2. The test case uses KILL QUERY is a highly concurent environment. The fix is to wait for the dying statement to rest in peace before executing another DELETE FROM t1.
[28 Apr 2010 18:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/106865 3023 Alexander Nozdrin 2010-04-28 Patch for Bug#52356: query_cache_debug fails on Linux. There were two problems here: 1. misleading error message 2. abusing KILL QUERY in the test case 1. The server reported "'DELETE FROM t1' failed: 1689: Wait on a lock was aborted due to a pending exclusive lock", while the proper error message should be "'DELETE FROM t1' failed: 1317: Query execution was interrupted". The problem is that the server has two different flags for signalling that a query is being killed: THD::killed and mysys_var::abort. The test case triggers a race: sometimes mysys_var::abort is set earlier than THD::killed. That leads to the following situation: - thr_lock() checks mysys_var::abort and returns error status, since mysys_var::abort is set; - the caller (mysql_lock_tables()) gets an error from thr_lock(), but THD::killed is not set, so it decides that thr_lock() couldn't get a lock due to a pending exclusive lock. This is a known issue with the server and it's not going to be fixed soon. 5.5 differs from 5.1 here as follows: when thr_lock() returns an error: - 5.1 continues trying thr_lock() until success; - 5.5 propagates the error The fix is to check both THD::killed and mysys_var::abort when analyzing error returned from thr_lock(). 2. The test case uses KILL QUERY is a highly concurent environment. The fix is to wait for the dying statement to rest in peace before executing another DELETE FROM t1.
[30 Apr 2010 12:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/107079 3027 Alexander Nozdrin 2010-04-30 Patch for Bug#52356: query_cache_debug fails on Linux. There were two problems here: 1. misleading error message 2. abusing KILL QUERY in the test case 1. The server reported "'DELETE FROM t1' failed: 1689: Wait on a lock was aborted due to a pending exclusive lock", while the proper error message should be "'DELETE FROM t1' failed: 1317: Query execution was interrupted". The problem is that the server has two different flags for signalling that a query is being killed: THD::killed and mysys_var::abort. The test case triggers a race: sometimes mysys_var::abort is set earlier than THD::killed. That leads to the following situation: - thr_lock() checks mysys_var::abort and returns error status, since mysys_var::abort is set; - the caller (mysql_lock_tables()) gets an error from thr_lock(), but THD::killed is not set, so it decides that thr_lock() couldn't get a lock due to a pending exclusive lock. This is a known issue with the server and it's not going to be fixed soon. 5.5 differs from 5.1 here as follows: when thr_lock() returns an error: - 5.1 continues trying thr_lock() until success; - 5.5 propagates the error 2. The test case uses KILL QUERY is a highly concurent environment. The fix is to wait for the dying statement to rest in peace before executing another DELETE FROM t1.
[4 May 2010 16:14]
Alexander Nozdrin
The patch was approved by Kostja and pushed into trunk-bugfixing.
[7 May 2010 9:20]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100507091908-vqyhpwf2km0aokno) (version source revid:alik@sun.com-20100507091737-12vceffs11elb25g) (merge vers: 6.0.14-alpha) (pib:16)
[7 May 2010 9:22]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100507091655-349gwq21ursz8y4p) (version source revid:alik@sun.com-20100507091655-349gwq21ursz8y4p) (merge vers: 5.5.5-m3) (pib:16)
[7 May 2010 9:23]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100507091823-nzao4h3qosau4tin) (version source revid:alik@sun.com-20100507091720-ib9r8uny2aeazvas) (pib:16)
[7 May 2010 19:51]
Paul DuBois
Noted in 5.5.5, 6.0.14 changelogs. There was a race condition between flags used for signaling that a query was killed, which led to error-reporting and lock-acquisition problems.