Bug #41563 FLUSH TABLES WITH READ LOCK results are different in 6.0 Vs 5.1
Submitted: 17 Dec 2008 16:53  Modified: 29 Dec 2008 11:49
Reporter: Jonathan Miller
Status: Not a Bug
Category: MySQL Server: Backup  Severity: S2 (Serious)
Version: 6.0  OS: Linux
Assigned to: Ingo Strüwing  CPU Architecture: Any

[17 Dec 2008 16:53] Jonathan Miller
Description:
Please use MTR and run the test case below on 5.1 and then on 6.0.

How to repeat:
# 
# Test that DROP TABLES does not wait for an impending FLUSH TABLES 
# WITH READ LOCK 
# 
 
--disable_warnings 
drop table if exists t1; 
--enable_warnings 
create table t1 (i int); 
connect (flush,localhost,root,,test,,); 
connection default; 
--echo connection: default 
lock tables t1 write; 
connection flush; 
--echo connection: flush 
--send flush tables with read lock; 
connection default; 
--echo connection: default 
let $wait_condition= 
  select count(*) = 1 from information_schema.processlist 
  where state = "Flushing tables"; 
--source include/wait_condition.inc 
flush tables; 
let $wait_condition= 
  select count(*) = 1 from information_schema.processlist 
  where state = "Flushing tables"; 
--source include/wait_condition.inc 
drop table t1; 
let $wait_condition= 
  select count(*) = 0 from information_schema.processlist 
  where state = "Flushing tables"; 
--source include/wait_condition.inc 
connection flush; 
--reap 
connection default; 
disconnect flush;
[23 Dec 2008 21:58] Jonathan Miller
Omer, 5.1 stays locked like it should, 6.0 passes straight through.
[24 Dec 2008 4:29] Davi Arnaut
I don't see why this is a backup issue. Nonetheless, if I recall correctly, this test is only present in 6.0 and the associated bug was also only fixed in 6.0.

http://lists.mysql.com/commits/40027
[29 Dec 2008 11:49] Davi Arnaut
Closing as not a bug because if a thread holds a locked table, it should not wait for an impending GRL, as waiting can lead to a deadlock (the GRL thread will wait for the locked table). Furthermore, the deadlock was only fixed in 6.0, and another difference is that the thread state in 6.0 is "Waiting for table". More information about the deadlock issue can be found by looking at the changeset history of the test case.
[29 Dec 2008 19:10] Sheeri Cabral
This actually is a bug, but it's an artifact of http://bugs.mysql.com/bug.php?id=25858
[29 Dec 2008 19:21] Jeremy Zawodny
How do you *NOT* see this as a backup issue?

Lots of backup methods do a FLUSH TABLES WITH READ LOCK and then go about performing their work and expect that no CHANGES take place during that time.

If someone is able to DROP TABLE during a backup, that could lead to data loss and/or inconsistent backups.
[29 Dec 2008 21:28] Jeremy Cole
Ugh.  This is definitely a bug -- it will break nearly every backup script in existence in a subtle and non-obvious way.
[29 Dec 2008 21:38] Davi Arnaut
When I wrote that this is not a backup issue, I was referring to the bug category. The correct category for this bug is Locking, and it has implications far beyond backup.

If FLUSH TABLES WITH READ LOCK is impending (it hasn't completed yet), a connection is only able to drop a table if it already holds an exclusive lock on the table -- and the flush won't succeed until all users relinquish their exclusive locks.

So, could someone explain to me what the bug is? Do people want to backport this bug fix to 5.1? Am I missing something?
[29 Dec 2008 21:50] Davi Arnaut
Simply adding a comment that this is a bug, without explaining what the bug is, serves no purpose. The facts so far are that someone took a test case for a bug that is fixed only in 6.0, ran it on 5.1, and it obviously failed. Furthermore, this scenario only applies when a connection is holding an exclusive lock on a table.

If someone sees a bug here, please explain it in detail so that we can clearly identify what the problem is.
[29 Dec 2008 21:56] Konstantin Osipov
Apparently some are missing that FLUSH TABLES WITH READ LOCK has not yet completed at the point when the table is dropped.

Please correct my understanding of the complaint if it is wrong:

Before, *pending* FLUSH TABLES WITH READ LOCK would block DROP TABLES.
Now it doesn't, provided that the issuer of DROP TABLES has locked the table with LOCK TABLE <name> WRITE;

How can this break "pretty much every backup script out there"?
[29 Dec 2008 23:04] Jonathan Miller
Brian Aker wrote:
> Hi!
>
> This means that Veritas backup and LVM snapshots will no longer work :)
>
> Cheers,
>     -Brian
[30 Dec 2008 1:01] Davi Arnaut
> Omer, 5.1 stays locked like it should, 6.0 passes straight through.

On 5.1 it stays *dead* locked: both DROP TABLES and FLUSH TABLES WITH READ LOCK are in a circular wait and won't succeed. FLUSH TABLES WITH READ LOCK is waiting for t1 (which is locked by another connection), and the connection that holds the exclusive lock on t1 waits for the FLUSH TABLES WITH READ LOCK.
[30 Dec 2008 11:55] Ingo Strüwing
IMHO this bug report describes a bug in 5.1, which has been fixed in 6.0.

The problem in 5.1 is a server lockup on the following sequence:

    con1: LOCK TABLE t1 WRITE
        con2: FLUSH TABLES WITH READ LOCK
    con1: DROP TABLE t1

In both 5.1 and 6.0, FLUSH TABLES WITH READ LOCK takes the global read lock immediately, but the flush part has to wait for the table lock.

In both 5.1 and 6.0, new write locks are already blocked in this situation: neither LOCK TABLE t2 WRITE nor DROP TABLE t2 will go through when started after FLUSH TABLES WITH READ LOCK took the global read lock.

In 5.1, DROP TABLE t1 blocks on the global read lock ==> lockup.
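
As an illustration only (plain Python locks standing in for MySQL's table lock and global read lock, not server internals), the 5.1 circular wait can be sketched like this:

```python
import threading

# Sketch of the 5.1 lock-up:
#   con1 holds t1's write lock (LOCK TABLE t1 WRITE),
#   con2 takes the GRL, then its flush phase waits for the table lock,
#   con1's DROP TABLE then waits on the GRL  ==>  circular wait.

table_lock = threading.Lock()   # stands in for t1's write lock
grl = threading.Lock()          # stands in for the global read lock
grl_taken = threading.Event()
results = {}

def con2_flush():
    grl.acquire()               # FTWRL takes the GRL immediately
    grl_taken.set()
    # flush phase: blocks until t1's write lock is released (it never is)
    results["flush_done"] = table_lock.acquire(timeout=1)

def con1_drop():
    # in 5.1, DROP TABLE t1 blocks on the GRL (held by con2)
    results["drop_done"] = grl.acquire(timeout=1)

table_lock.acquire()            # con1: LOCK TABLE t1 WRITE
flusher = threading.Thread(target=con2_flush)
flusher.start()
grl_taken.wait()                # ensure con2 holds the GRL first
dropper = threading.Thread(target=con1_drop)
dropper.start()
flusher.join()
dropper.join()
print(results)                  # both waits time out: neither side proceeds
```

Neither acquisition succeeds; in the real server there are no timeouts, so both connections hang forever.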

In 6.0, DROP TABLE t1 is allowed to bypass the global read lock because it is the owner of the existing write lock on t1. This is OK because FLUSH TABLES WITH READ LOCK has not reported success yet, so the application cannot know whether it owns the global read lock.

DROP TABLE t1 implicitly releases the write lock on t1 and thus allows the flush part of FLUSH TABLES WITH READ LOCK to proceed and the statement to finish.
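
The 6.0 resolution can be sketched the same way (again plain Python locks, not MySQL internals): the drop bypasses the global read lock because its connection already owns t1's write lock, and releasing that lock lets the pending flush complete.

```python
import threading

# Sketch of the 6.0 behaviour: DROP TABLE t1 bypasses the GRL (con1 owns
# t1's write lock) and implicitly releases that lock, unblocking the flush.

table_lock = threading.Lock()   # stands in for t1's write lock
grl = threading.Lock()          # stands in for the global read lock
grl_taken = threading.Event()
results = {}

def con2_flush():
    grl.acquire()               # FTWRL takes the GRL immediately
    grl_taken.set()
    # flush phase: waits for t1's write lock
    results["flush_done"] = table_lock.acquire(timeout=5)

table_lock.acquire()            # con1: LOCK TABLE t1 WRITE
flusher = threading.Thread(target=con2_flush)
flusher.start()
grl_taken.wait()

# con1: DROP TABLE t1 -- skips waiting on the GRL and implicitly
# releases t1's write lock, which unblocks the flush phase.
table_lock.release()
results["drop_done"] = True

flusher.join()
print(results)                  # both operations succeed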

Applications that wait for FLUSH TABLES WITH READ LOCK to finish and report success before doing backups, snapshots, or whatever, are safe.

Now I wonder why Brian says that "Veritas backup and LVM snapshots will no longer work". Do they start their backup/snapshot work before FLUSH TABLES WITH READ LOCK returns with success?

Regards
Ingo