Bug #45066 FLUSH TABLES WITH READ LOCK deadlocks against LOCK TABLE
Submitted: 25 May 2009 12:21 Modified: 7 Mar 2010 12:43
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Jon Olav Hauglid CPU Architecture:Any

[25 May 2009 12:21] Philip Stoev
Description:
Executing FLUSH TABLE WITH READ LOCK concurrently with LOCK TABLE READ results in a deadlock, where all threads wait forever to obtain a lock. This is problematic for the following resons:

1. No thread ever moves forward and the CPU usage is at 100%.
2. The locks are actually non-conflicting, and there is no reason to not be able to aquire them
3. The lock is not influenced by the table-lock-wait-timeout or any other setting, and as such is never resolved
4. Having this situation is problematic for stress testing using the Random Query Generator. An error message of some sort is required so that the test can move forward.
5. New mysql clients that insist on enumerating tables at startup can not connect -- mysql -A needs to be used to bypass the enumeration and connect to the database.

How to repeat:
Use the Random Query Generator -- www.launchpad.net/randgen

with the following grammar file:

query:
        lock | flush ;

lock:
        LOCK TABLE _table READ;

flush:
        FLUSH TABLES WITH READ LOCK ;

$ perl runall.pl --basedir=/build/bzr/mysql-5.1/ --grammar=conf/locking-bug.yy --mysqld=--table-lock-wait-timeout=1 --threads=3

Suggested fix:
Ideally, one of the threads should be able to aquire the lock and move forward.

Alternatively, one of the threads should get an error message, such as:

- lock wait timeout exceeded
- deadlock detected
- 1223 Can't execute the query because you have a conflicting read lock
[25 May 2009 12:45] Sveta Smirnova
Thank you for the report.

Verified as described.
[25 May 2009 14:30] Philip Stoev
Sorry, I meant CPU usage is at 0%.
[26 May 2009 15:57] Sveta Smirnova
Same problem occurs if issue FLUSH TABLES (without read lock).
[27 May 2009 14:41] Philip Stoev
mysql> show variables like '%thread%cache%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| thread_cache_size | 0     |
+-------------------+-------+
1 row in set (0.38 sec)
[27 May 2009 15:56] Sveta Smirnova
With FLUSH TABLES:

mysql> show processlist;
+----+------+-----------------+------+---------+------+-------------------+-----------------------+
| Id | User | Host            | db   | Command | Time | State             | Info                  |
+----+------+-----------------+------+---------+------+-------------------+-----------------------+
|  1 | root | localhost:49328 | test | Sleep   |  423 |                   | NULL                  | 
|  3 | root | localhost:49330 | NULL | Query   |    0 | NULL              | show processlist      | 
|  7 | root | localhost:49334 | test | Query   |  416 | Waiting for table | FLUSH TABLES          | 
|  8 | root | localhost:49335 | test | Query   |  416 | Waiting for table | LOCK TABLE `BBB` READ | 
|  9 | root | localhost:49336 | test | Query   |  416 | Waiting for table | FLUSH TABLES          | 
+----+------+-----------------+------+---------+------+-------------------+-----------------------+
5 rows in set (0.00 sec)

mysql> show variables like '%thread%cache%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| thread_cache_size | 0     | 
+-------------------+-------+
1 row in set (0.00 sec)

With FLUSH TABLES WITH READ LOCK:

mysql> show processlist;
+----+------+-----------------+------+---------+------+-----------------+-----------------------------+
| Id | User | Host            | db   | Command | Time | State           | Info                        |
+----+------+-----------------+------+---------+------+-----------------+-----------------------------+
|  1 | root | localhost:49566 | test | Sleep   |  224 |                 | NULL                        | 
|  3 | root | localhost:49568 | NULL | Query   |    0 | NULL            | show processlist            | 
|  7 | root | localhost:49572 | test | Query   |  221 | Flushing tables | FLUSH TABLES WITH READ LOCK | 
|  8 | root | localhost:49573 | test | Query   |  221 | Flushing tables | FLUSH TABLES WITH READ LOCK | 
|  9 | root | localhost:49574 | test | Query   |  221 | Flushing tables | FLUSH TABLES WITH READ LOCK | 
+----+------+-----------------+------+---------+------+-----------------+-----------------------------+
5 rows in set (0.00 sec)

mysql> show variables like '%thread%cache%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| thread_cache_size | 0     | 
+-------------------+-------+
1 row in set (0.00 sec)
[16 Jun 2009 12:52] Christoffer Hall
Backtrace bug 45066

Attachment: backtrace_bug45066.txt (text/plain), 15.72 KiB.

[16 Jun 2009 12:53] Christoffer Hall
I have uploaded a backtrace of all threads. Not all, I have removed some InnoDB threads that were just sitting there doing nothing. In this case we have two threads doing "flush tables" being deadlocked against a number threads doing "lock tables" (and therefor opening them). This ends up with all of them waiting on COND_refresh and no signal is being sent.

In open_table finds a table in the cache, but the refresh_version in the table is different from the version in the table share and open_table goes to sleep on COND_refresh. Bumping the version number is only done by flush tables if I understand it correctly. Somehow a thread that has had a lock on table A has passed through, done what it is supposed to and not signalled COND_refresh. Either that or flush incorrectly went to sleep.
[16 Jun 2009 14:46] Konstantin Osipov
The bug is not reproducible in 6.0, FLUSH TABLES under LOCK TABLE <> READ is not allowed in 6.0.
Even if you change LOCK TABLE READ to LOCK TABLE WRITE, the bug is still not reproducible in 6.0.
reopen_tables() function, which deadlocks, was removed by WL#4284.

Fixing it in 5.1 may lead to other deadlocks. 

Suggest to re-triage for a fix in 6.0/5.4, and close the bug there by adding a test case to the test suite.
[16 Jun 2009 16:03] Konstantin Osipov
A note to the implementor:
the bug apparently has 4 distinct manifestations:

LOCK TABLE READ + FLUSH TABLES
LOCK TABLE WRITE + FLUSH TABLES
LOCK TABLE READ + FLUSH TABLES WITH READ LOCK
LOCK TABLE WRITE + FLUSH TABLES WITH READ LOCK.

In other words, use of FLUSH TABLES WITH READ LOCK instead of FLUSH TABLES may lead to a different deadlock stack trace.

It is necessary to verify that neither of the above 4 cases is repeatable in 5.4.

LOCK TABLE READ + FLUSH TABLES are disallowed in 5.4, so that should be easy
to verify. 

To ensure that there is no regression of that kind, it would be nice to add
a test case to the test suite. I am at a loss, however, how to construct a non-concurrent test case that would trigger the deadlock condition, especially
since in 6.0 metadata locking has a wholly different implementation.
[18 Jun 2009 9:33] Christoffer Hall
I cannot produce deadlocks in any of the four scenarios in 6.0. No deadlocks pop up at all.
[29 Jun 2009 13:24] Christoffer Hall
A problem with this test does pop up in 5.4. A freeze happens. This time there is no obvious deadlock from the backtrace in gdb. But frozen it is. Since this is a 5.4 issue I'm unassigning it from from me.
[2 Jul 2009 8:23] Konstantin Osipov
Please add this coverage to lock.test or flush.test:

lock table t1 read;
flush tables;

lock table t1 write;
flush tables;

#
# If you allow that combination, you reintroduce bug Bug#45066 
# 
lock table t1 read;
flush tables with read lock;

lock table t1 write;
flush tables with read lock;
[2 Jul 2009 14:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77793

2822 Jon Olav Hauglid	2009-07-02
      Bug #45066 FLUSH TABLES WITH READ LOCK deadlocks against LOCK TABLE
      
      Test coverage for combinations of LOCK TABLE READ / WRITE and 
      FLUSH TABLES / FLUSH TABLES WITH READ LOCK added to lock.test.
      LOCK and FLUSH are executed sequentially from one connection.
[23 Jul 2009 11:29] Alexander Nozdrin
Pushed into azalea via azalea-bugfixing.
[6 Aug 2009 20:15] Paul Dubois
Noted in 5.4.4 changelog.

Execution of FLUSH TABLES or FLUSH TABLES WITH READ LOCK concurrently
with LOCK TABLES resulted in deadlock.
[12 Aug 2009 22:24] Paul Dubois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[15 Aug 2009 1:43] Paul Dubois
Ignore previous comment about 5.4.2.
[8 Dec 2009 13:18] Jon Olav Hauglid
Pushed to mysql-next-4284 (5.6.0-beta)
[16 Feb 2010 16:50] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:kostja@sun.com-20091211154405-c9yhiewr9o5d20rq) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 17:00] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:kostja@sun.com-20091208135954-h6ipvx9mpzmpuipk) (pib:16)
[17 Feb 2010 0:38] Paul Dubois
Noted in 6.0.14 changelog.

Setting report to Need Merge pending push of Celosia into release tree.
[6 Mar 2010 11:04] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 12:43] Paul Dubois
Noted in 5.5.3 changelog.