Bug #45066 | FLUSH TABLES WITH READ LOCK deadlocks against LOCK TABLE | ||
---|---|---|---|
Submitted: | 25 May 2009 12:21 | Modified: | 7 Mar 2010 12:43 |
Reporter: | Philip Stoev | | |
Status: | Closed | | |
Category: | MySQL Server: Locking | Severity: | S3 (Non-critical) |
Version: | 5.1 | OS: | Any |
Assigned to: | Jon Olav Hauglid | CPU Architecture: | Any |
[25 May 2009 12:21]
Philip Stoev
[25 May 2009 12:45]
Sveta Smirnova
Thank you for the report. Verified as described.
[25 May 2009 14:30]
Philip Stoev
Sorry, I meant CPU usage is at 0%.
[26 May 2009 15:57]
Sveta Smirnova
The same problem occurs when issuing FLUSH TABLES (without READ LOCK).
[27 May 2009 14:41]
Philip Stoev
mysql> show variables like '%thread%cache%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| thread_cache_size | 0     |
+-------------------+-------+
1 row in set (0.38 sec)
[27 May 2009 15:56]
Sveta Smirnova
With FLUSH TABLES:

mysql> show processlist;
+----+------+-----------------+------+---------+------+-------------------+-----------------------+
| Id | User | Host            | db   | Command | Time | State             | Info                  |
+----+------+-----------------+------+---------+------+-------------------+-----------------------+
|  1 | root | localhost:49328 | test | Sleep   |  423 |                   | NULL                  |
|  3 | root | localhost:49330 | NULL | Query   |    0 | NULL              | show processlist      |
|  7 | root | localhost:49334 | test | Query   |  416 | Waiting for table | FLUSH TABLES          |
|  8 | root | localhost:49335 | test | Query   |  416 | Waiting for table | LOCK TABLE `BBB` READ |
|  9 | root | localhost:49336 | test | Query   |  416 | Waiting for table | FLUSH TABLES          |
+----+------+-----------------+------+---------+------+-------------------+-----------------------+
5 rows in set (0.00 sec)

mysql> show variables like '%thread%cache%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| thread_cache_size | 0     |
+-------------------+-------+
1 row in set (0.00 sec)

With FLUSH TABLES WITH READ LOCK:

mysql> show processlist;
+----+------+-----------------+------+---------+------+-----------------+-----------------------------+
| Id | User | Host            | db   | Command | Time | State           | Info                        |
+----+------+-----------------+------+---------+------+-----------------+-----------------------------+
|  1 | root | localhost:49566 | test | Sleep   |  224 |                 | NULL                        |
|  3 | root | localhost:49568 | NULL | Query   |    0 | NULL            | show processlist            |
|  7 | root | localhost:49572 | test | Query   |  221 | Flushing tables | FLUSH TABLES WITH READ LOCK |
|  8 | root | localhost:49573 | test | Query   |  221 | Flushing tables | FLUSH TABLES WITH READ LOCK |
|  9 | root | localhost:49574 | test | Query   |  221 | Flushing tables | FLUSH TABLES WITH READ LOCK |
+----+------+-----------------+------+---------+------+-----------------+-----------------------------+
5 rows in set (0.00 sec)

mysql> show variables like '%thread%cache%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| thread_cache_size | 0     |
+-------------------+-------+
1 row in set (0.00 sec)
[16 Jun 2009 12:52]
Christoffer Hall
Backtrace bug 45066
Attachment: backtrace_bug45066.txt (text/plain), 15.72 KiB.
[16 Jun 2009 12:53]
Christoffer Hall
I have uploaded a backtrace of all threads. Not quite all: I have removed some InnoDB threads that were just sitting there doing nothing. In this case we have two threads doing "flush tables" deadlocked against a number of threads doing "lock tables" (and therefore opening them). All of them end up waiting on COND_refresh, and no signal is ever sent. open_table finds a table in the cache, but the refresh_version in the table differs from the version in the table share, so open_table goes to sleep on COND_refresh. If I understand it correctly, only flush tables bumps the version number. Somehow a thread that held a lock on table A has passed through, done what it is supposed to do, and not signalled COND_refresh. Either that, or flush incorrectly went to sleep.
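To make the wait described above concrete, here is a minimal Python sketch of a thread sleeping on a condition variable until two version numbers agree. It is a toy model under stated assumptions, not the server code: `TableCache`, `table_version`, `share_version`, `release_stale_table` and `demo` are hypothetical names, `threading.Condition` merely stands in for COND_refresh, and the timeout exists only so the example terminates.

```python
import threading


class TableCache:
    """Toy model of the wait described above; not MySQL's real data structures."""

    def __init__(self):
        self.cond = threading.Condition()  # stands in for COND_refresh
        self.share_version = 1             # version recorded in the table share
        self.table_version = 1             # refresh_version of the cached table

    def flush(self):
        # Assumption: only a flush bumps the version, as the comment suggests.
        with self.cond:
            self.share_version += 1

    def release_stale_table(self):
        # A well-behaved holder drops its stale table and wakes the waiters.
        with self.cond:
            self.table_version = self.share_version
            self.cond.notify_all()

    def open_table(self, timeout):
        # Sleeps while the versions differ. The timeout exists only so this
        # sketch terminates; the threads in the report wait indefinitely.
        with self.cond:
            return self.cond.wait_for(
                lambda: self.table_version == self.share_version, timeout
            )


def demo(holder_signals, timeout=0.5):
    """Return True if open_table got through, False if it was still waiting."""
    cache = TableCache()
    cache.flush()
    outcome = []
    opener = threading.Thread(
        target=lambda: outcome.append(cache.open_table(timeout))
    )
    opener.start()
    if holder_signals:
        cache.release_stale_table()
    opener.join()
    return outcome[0]
```

With `demo(True)` the opener gets through; with `demo(False)` nobody reconciles the versions or signals, and the opener is still waiting when the timeout expires, which is the situation the backtrace suggests.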
[16 Jun 2009 14:46]
Konstantin Osipov
The bug is not reproducible in 6.0: FLUSH TABLES under LOCK TABLE <> READ is not allowed in 6.0. Even if you change LOCK TABLE READ to LOCK TABLE WRITE, the bug is still not reproducible in 6.0. The reopen_tables() function, which deadlocks, was removed by WL#4284. Fixing it in 5.1 may lead to other deadlocks. I suggest re-triaging for a fix in 6.0/5.4 and closing the bug there by adding a test case to the test suite.
[16 Jun 2009 16:03]
Konstantin Osipov
A note to the implementor: the bug apparently has 4 distinct manifestations:
LOCK TABLE READ + FLUSH TABLES
LOCK TABLE WRITE + FLUSH TABLES
LOCK TABLE READ + FLUSH TABLES WITH READ LOCK
LOCK TABLE WRITE + FLUSH TABLES WITH READ LOCK
In other words, using FLUSH TABLES WITH READ LOCK instead of FLUSH TABLES may lead to a different deadlock stack trace. It is necessary to verify that none of the above 4 cases is repeatable in 5.4. LOCK TABLE READ + FLUSH TABLES is disallowed in 5.4, so that should be easy to verify. To ensure that there is no regression of that kind, it would be nice to add a test case to the test suite. I am at a loss, however, as to how to construct a non-concurrent test case that would trigger the deadlock condition, especially since in 6.0 metadata locking has a wholly different implementation.
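The four manifestations in the note above are the cross product of two lock modes and two flush variants. As a rough illustration, the Python sketch below builds the statement pairs for each case. The function name `manifestations` and the placeholder table name `t1` are assumptions made for the example, and the code only produces statement text; it does not connect to a server or reproduce the deadlock.

```python
from itertools import product

LOCK_MODES = ("READ", "WRITE")
FLUSH_VARIANTS = ("FLUSH TABLES", "FLUSH TABLES WITH READ LOCK")


def manifestations(table="t1"):
    """Return (lock_statement, flush_statement) pairs for every combination.

    The table name is a placeholder; any existing table would do.
    """
    return [
        (f"LOCK TABLE {table} {mode}", flush)
        for flush, mode in product(FLUSH_VARIANTS, LOCK_MODES)
    ]


if __name__ == "__main__":
    for lock_stmt, flush_stmt in manifestations():
        print(f"{lock_stmt}; {flush_stmt};")
```

Iterating over the flush variant first keeps the pairs in the same order as the list in the note.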
[18 Jun 2009 9:33]
Christoffer Hall
I cannot produce deadlocks in any of the four scenarios in 6.0. No deadlocks pop up at all.
[29 Jun 2009 13:24]
Christoffer Hall
A problem with this test does pop up in 5.4: a freeze happens. This time there is no obvious deadlock in the gdb backtrace, but frozen it is. Since this is a 5.4 issue, I'm unassigning it from me.
[2 Jul 2009 8:23]
Konstantin Osipov
Please add this coverage to lock.test or flush.test:

lock table t1 read;
flush tables;
lock table t1 write;
flush tables;
#
# If you allow that combination, you reintroduce bug Bug#45066
#
lock table t1 read;
flush tables with read lock;
lock table t1 write;
flush tables with read lock;
[2 Jul 2009 14:24]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/77793

2822 Jon Olav Hauglid 2009-07-02
Bug #45066 FLUSH TABLES WITH READ LOCK deadlocks against LOCK TABLE
Test coverage for combinations of LOCK TABLE READ / WRITE and FLUSH TABLES / FLUSH TABLES WITH READ LOCK added to lock.test. LOCK and FLUSH are executed sequentially from one connection.
[23 Jul 2009 11:29]
Alexander Nozdrin
Pushed into azalea via azalea-bugfixing.
[6 Aug 2009 20:15]
Paul DuBois
Noted in 5.4.4 changelog. Execution of FLUSH TABLES or FLUSH TABLES WITH READ LOCK concurrently with LOCK TABLES resulted in deadlock.
[12 Aug 2009 22:24]
Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[15 Aug 2009 1:43]
Paul DuBois
Ignore previous comment about 5.4.2.
[8 Dec 2009 13:18]
Jon Olav Hauglid
Pushed to mysql-next-4284 (5.6.0-beta)
[16 Feb 2010 16:50]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:kostja@sun.com-20091211154405-c9yhiewr9o5d20rq) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 17:00]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:kostja@sun.com-20091208135954-h6ipvx9mpzmpuipk) (pib:16)
[17 Feb 2010 0:38]
Paul DuBois
Noted in 6.0.14 changelog. Setting report to Need Merge pending push of Celosia into release tree.
[6 Mar 2010 11:04]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 12:43]
Paul DuBois
Noted in 5.5.3 changelog.