| Bug #45066 | FLUSH TABLES WITH READ LOCK deadlocks against LOCK TABLE | ||
|---|---|---|---|
| Submitted: | 25 May 2009 12:21 | Modified: | 7 Mar 2010 12:43 | 
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Locking | Severity: | S3 (Non-critical) | 
| Version: | 5.1 | OS: | Any | 
| Assigned to: | Jon Olav Hauglid | CPU Architecture: | Any | 
   [25 May 2009 12:45]
   Sveta Smirnova        
  Thank you for the report. Verified as described.
   [25 May 2009 14:30]
   Philip Stoev        
  Sorry, I meant CPU usage is at 0%.
   [26 May 2009 15:57]
   Sveta Smirnova        
  Same problem occurs if issue FLUSH TABLES (without read lock).
   [27 May 2009 14:41]
   Philip Stoev        
  mysql> show variables like '%thread%cache%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | thread_cache_size | 0 | +-------------------+-------+ 1 row in set (0.38 sec)
   [27 May 2009 15:56]
   Sveta Smirnova        
  With FLUSH TABLES: mysql> show processlist; +----+------+-----------------+------+---------+------+-------------------+-----------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------------+------+---------+------+-------------------+-----------------------+ | 1 | root | localhost:49328 | test | Sleep | 423 | | NULL | | 3 | root | localhost:49330 | NULL | Query | 0 | NULL | show processlist | | 7 | root | localhost:49334 | test | Query | 416 | Waiting for table | FLUSH TABLES | | 8 | root | localhost:49335 | test | Query | 416 | Waiting for table | LOCK TABLE `BBB` READ | | 9 | root | localhost:49336 | test | Query | 416 | Waiting for table | FLUSH TABLES | +----+------+-----------------+------+---------+------+-------------------+-----------------------+ 5 rows in set (0.00 sec) mysql> show variables like '%thread%cache%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | thread_cache_size | 0 | +-------------------+-------+ 1 row in set (0.00 sec) With FLUSH TABLES WITH READ LOCK: mysql> show processlist; +----+------+-----------------+------+---------+------+-----------------+-----------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------------+------+---------+------+-----------------+-----------------------------+ | 1 | root | localhost:49566 | test | Sleep | 224 | | NULL | | 3 | root | localhost:49568 | NULL | Query | 0 | NULL | show processlist | | 7 | root | localhost:49572 | test | Query | 221 | Flushing tables | FLUSH TABLES WITH READ LOCK | | 8 | root | localhost:49573 | test | Query | 221 | Flushing tables | FLUSH TABLES WITH READ LOCK | | 9 | root | localhost:49574 | test | Query | 221 | Flushing tables | FLUSH TABLES WITH READ LOCK | +----+------+-----------------+------+---------+------+-----------------+-----------------------------+ 5 rows in set (0.00 sec) mysql> show variables like '%thread%cache%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | thread_cache_size | 0 | +-------------------+-------+ 1 row in set (0.00 sec)
   [16 Jun 2009 12:52]
   Christoffer Hall        
  Backtrace bug 45066
Attachment: backtrace_bug45066.txt (text/plain), 15.72 KiB.
   [16 Jun 2009 12:53]
   Christoffer Hall        
  I have uploaded a backtrace of all threads. Not all, I have removed some InnoDB threads that were just sitting there doing nothing. In this case we have two threads doing "flush tables" being deadlocked against a number threads doing "lock tables" (and therefor opening them). This ends up with all of them waiting on COND_refresh and no signal is being sent. In open_table finds a table in the cache, but the refresh_version in the table is different from the version in the table share and open_table goes to sleep on COND_refresh. Bumping the version number is only done by flush tables if I understand it correctly. Somehow a thread that has had a lock on table A has passed through, done what it is supposed to and not signalled COND_refresh. Either that or flush incorrectly went to sleep.
   [16 Jun 2009 14:46]
   Konstantin Osipov        
  The bug is not reproducible in 6.0, FLUSH TABLES under LOCK TABLE <> READ is not allowed in 6.0. Even if you change LOCK TABLE READ to LOCK TABLE WRITE, the bug is still not reproducible in 6.0. reopen_tables() function, which deadlocks, was removed by WL#4284. Fixing it in 5.1 may lead to other deadlocks. Suggest to re-triage for a fix in 6.0/5.4, and close the bug there by adding a test case to the test suite.
   [16 Jun 2009 16:03]
   Konstantin Osipov        
  A note to the implementor: the bug apparently has 4 distinct manifestations: LOCK TABLE READ + FLUSH TABLES LOCK TABLE WRITE + FLUSH TABLES LOCK TABLE READ + FLUSH TABLES WITH READ LOCK LOCK TABLE WRITE + FLUSH TABLES WITH READ LOCK. In other words, use of FLUSH TABLES WITH READ LOCK instead of FLUSH TABLES may lead to a different deadlock stack trace. It is necessary to verify that neither of the above 4 cases is repeatable in 5.4. LOCK TABLE READ + FLUSH TABLES are disallowed in 5.4, so that should be easy to verify. To ensure that there is no regression of that kind, it would be nice to add a test case to the test suite. I am at a loss, however, how to construct a non-concurrent test case that would trigger the deadlock condition, especially since in 6.0 metadata locking has a wholly different implementation.
   [18 Jun 2009 9:33]
   Christoffer Hall        
  I cannot produce deadlocks in any of the four scenarios in 6.0. No deadlocks pop up at all.
   [29 Jun 2009 13:24]
   Christoffer Hall        
  A problem with this test does pop up in 5.4. A freeze happens. This time there is no obvious deadlock from the backtrace in gdb. But frozen it is. Since this is a 5.4 issue I'm unassigning it from from me.
   [2 Jul 2009 8:23]
   Konstantin Osipov        
  Please add this coverage to lock.test or flush.test: lock table t1 read; flush tables; lock table t1 write; flush tables; # # If you allow that combination, you reintroduce bug Bug#45066 # lock table t1 read; flush tables with read lock; lock table t1 write; flush tables with read lock;
   [2 Jul 2009 14:24]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/77793 2822 Jon Olav Hauglid 2009-07-02 Bug #45066 FLUSH TABLES WITH READ LOCK deadlocks against LOCK TABLE Test coverage for combinations of LOCK TABLE READ / WRITE and FLUSH TABLES / FLUSH TABLES WITH READ LOCK added to lock.test. LOCK and FLUSH are executed sequentially from one connection.
   [23 Jul 2009 11:29]
   Alexander Nozdrin        
  Pushed into azalea via azalea-bugfixing.
   [6 Aug 2009 20:15]
   Paul DuBois        
  Noted in 5.4.4 changelog. Execution of FLUSH TABLES or FLUSH TABLES WITH READ LOCK concurrently with LOCK TABLES resulted in deadlock.
   [12 Aug 2009 22:24]
   Paul DuBois        
  Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
   [15 Aug 2009 1:43]
   Paul DuBois        
  Ignore previous comment about 5.4.2.
   [8 Dec 2009 13:18]
   Jon Olav Hauglid        
  Pushed to mysql-next-4284 (5.6.0-beta)
   [16 Feb 2010 16:50]
   Bugs System        
  Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:kostja@sun.com-20091211154405-c9yhiewr9o5d20rq) (merge vers: 6.0.14-alpha) (pib:16)
   [16 Feb 2010 17:00]
   Bugs System        
  Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:kostja@sun.com-20091208135954-h6ipvx9mpzmpuipk) (pib:16)
   [17 Feb 2010 0:38]
   Paul DuBois        
  Noted in 6.0.14 changelog. Setting report to Need Merge pending push of Celosia into release tree.
   [6 Mar 2010 11:04]
   Bugs System        
  Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
   [7 Mar 2010 12:43]
   Paul DuBois        
  Noted in 5.5.3 changelog.

Description: Executing FLUSH TABLE WITH READ LOCK concurrently with LOCK TABLE READ results in a deadlock, where all threads wait forever to obtain a lock. This is problematic for the following resons: 1. No thread ever moves forward and the CPU usage is at 100%. 2. The locks are actually non-conflicting, and there is no reason to not be able to aquire them 3. The lock is not influenced by the table-lock-wait-timeout or any other setting, and as such is never resolved 4. Having this situation is problematic for stress testing using the Random Query Generator. An error message of some sort is required so that the test can move forward. 5. New mysql clients that insist on enumerating tables at startup can not connect -- mysql -A needs to be used to bypass the enumeration and connect to the database. How to repeat: Use the Random Query Generator -- www.launchpad.net/randgen with the following grammar file: query: lock | flush ; lock: LOCK TABLE _table READ; flush: FLUSH TABLES WITH READ LOCK ; $ perl runall.pl --basedir=/build/bzr/mysql-5.1/ --grammar=conf/locking-bug.yy --mysqld=--table-lock-wait-timeout=1 --threads=3 Suggested fix: Ideally, one of the threads should be able to aquire the lock and move forward. Alternatively, one of the threads should get an error message, such as: - lock wait timeout exceeded - deadlock detected - 1223 Can't execute the query because you have a conflicting read lock