Bug #32395 Alter table under a impending global read lock causes a server crash
Submitted: 14 Nov 2007 20:04
Modified: 18 Dec 2007 21:35
Reporter: Davi Arnaut (OCA)
Status: Closed
Category: MySQL Server: Locking
Severity: S2 (Serious)
Version: 5.1
OS: Any
Assigned to: Davi Arnaut
CPU Architecture: Any
Tags: ALTER TABLE, flush, global read lock, lock tables, read lock

[14 Nov 2007 20:04] Davi Arnaut
Description:
While a connection holds tables locked with LOCK TABLES, DDL/DML
statements don't check or wait for an impending global read lock.
Waiting could deadlock if one of the tables the connection holds is
write-locked, and ignoring the impending global read lock causes no
consistency problem in that case, because the global read lock
request can't have completed yet (it's waiting for the write-locked
tables).

The problem is that some DDL statements (like ALTER TABLE)
need to momentarily drop the lock, reopen the table and grab
the write lock again (see reopen_tables). When grabbing the
lock again, reopen_tables doesn't pass a flag to mysql_lock_tables
telling it to ignore the impending global read lock, which triggers
an assertion failure because LOCK_open is being held.
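
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, the flag value and the method bodies are hypothetical stand-ins named after the functions in this report, not MySQL source code. It shows why re-acquiring the lock without an "ignore the global read lock" flag trips a check when LOCK_open is held, and why passing the flag avoids it.

```python
# Toy model only: names mirror the report, but none of this is MySQL source code.
IGNORE_GLOBAL_READ_LOCK = 0x1  # hypothetical flag value, chosen for illustration


class ToyServer:
    def __init__(self):
        self.grl_pending = False     # a FLUSH TABLES WITH READ LOCK is waiting
        self.lock_open_held = False  # stands in for the LOCK_open mutex

    def lock_tables(self, flags=0):
        """Stand-in for mysql_lock_tables: returns 'locked' or 'waiting'."""
        must_wait = self.grl_pending and not (flags & IGNORE_GLOBAL_READ_LOCK)
        if must_wait:
            # Waiting for the global read lock while LOCK_open is held is
            # the condition the server-side assertion guards against.
            if self.lock_open_held:
                raise RuntimeError("would wait for GRL with LOCK_open held")
            return "waiting"
        return "locked"

    def reopen_tables(self, flags=0):
        """Stand-in for reopen_tables: re-acquires the lock under LOCK_open."""
        self.lock_open_held = True
        try:
            return self.lock_tables(flags)
        finally:
            self.lock_open_held = False


server = ToyServer()
server.grl_pending = True  # another connection has issued the flush

try:
    server.reopen_tables()  # no flag passed, as in the buggy path
    outcome = "no assertion"
except RuntimeError:
    outcome = "assertion"

# Passing the (hypothetical) flag lets the lock be re-acquired.
fixed = server.reopen_tables(IGNORE_GLOBAL_READ_LOCK)
```

In this model the unflagged call raises, while the flagged call returns "locked", which matches the behaviour the report asks reopen_tables to have.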

See somewhat similar issues: Bug#7823, Bug#9459, Bug#18884

Also, the manual is a bit confusing regarding the global read lock:

FLUSH TABLES WITH READ LOCK:

"Closes all open tables and locks all tables for all databases with a read lock"
..
"FLUSH TABLES WITH READ LOCK acquires a global read lock and not table locks"

How to repeat:
create table t1 (i int);
connect (flush,localhost,root,,test,,);
connection default;
--echo connection: default
lock tables t1 write;
connection flush;
--echo connection: flush
--send flush tables with read lock;
connection default;
--echo connection: default
let $wait_condition=
  select count(*) = 1 from information_schema.processlist
  where state = "Flushing tables";
--source include/wait_condition.inc
alter table t1 add column j int;
unlock tables;
connection flush;
--echo connection: flush
--reap
unlock tables;
connection default;
drop table t1;
disconnect flush;

Suggested fix:
Sort out global read lock semantics and requirements and fix
reopen_tables accordingly.
[15 Nov 2007 2:33] Davi Arnaut
Another smaller test case (slightly different issue):

set session low_priority_updates=1;
create table t1 (i int);
lock tables t1 write;
flush tables with read lock;
[21 Nov 2007 20:49] Davi Arnaut
Last test case (low_priority_updates) reported as Bug#32528
[12 Dec 2007 21:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/39829

ChangeSet@1.2681, 2007-12-12 19:44:14-02:00, davi@mysql.com +11 -0
  Bug#32395 Alter table under a impending global read lock causes a server crash
  
  The problem is that some DDL statements (ALTER TABLE, CREATE
  TRIGGER, FLUSH TABLES, ...) when under LOCK TABLES need to
  momentarily drop the lock, reopen the table and grab the write
  lock again (using reopen_tables). When grabbing the lock again,
  reopen_tables doesn't pass a flag to mysql_lock_tables telling
  it to ignore the impending global read lock, which triggers an
  assertion failure because LOCK_open is being held. Also, dropping
  the lock must not signal to any threads that the table has been
  relinquished (related to the locking/flushing protocol).
  
  The solution is to correct the way the table is reopened
  and the locks grabbed. When reopening the table under
  LOCK TABLES, the table version should be set to 0 so other
  threads have to wait for the table. When grabbing the lock,
  any other flush should be ignored because it's theoretically
  an atomic operation. The chosen solution also fixes a potential
  discrepancy between binlog and GRL (global read lock): table
  placeholders were being ignored, and now a FLUSH TABLES WITH
  READ LOCK will properly wait for tables with open placeholders.
  
  It's also important to mention that this patch doesn't fix
  a potential deadlock if one uses two GRLs under LOCK TABLES
  concurrently.
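
The "set the table version to 0" idea in the commit message can be sketched as a toy model. The comparison rule below (a cached table is usable by other threads only when its version equals the current refresh version) and the helper names are assumptions made for illustration, not MySQL source code. The sketch also assumes the current version is never 0.

```python
# Toy model only: an assumed version-comparison rule, not MySQL source code.
class ToyTable:
    def __init__(self, name, version):
        self.name = name
        self.version = version


def must_wait_for(table, refresh_version):
    """Other threads must wait unless the table's version is current."""
    return table.version != refresh_version


def reopen_under_lock_tables(table):
    """Hypothetical helper: mark the table so other threads keep waiting."""
    table.version = 0
    return table


refresh_version = 7  # arbitrary non-zero current version for the example
t1 = ToyTable("t1", refresh_version)

before = must_wait_for(t1, refresh_version)  # table is current
reopen_under_lock_tables(t1)
after = must_wait_for(t1, refresh_version)   # table now forces a wait
```

Under these assumptions, `before` is False and `after` is True: once the version is reset, other threads in the model treat the table as unavailable until the holder is done with it.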
[16 Dec 2007 11:43] Bugs System
Pushed into 5.1.23-rc
[16 Dec 2007 11:44] Bugs System
Pushed into 6.0.5-alpha
[18 Dec 2007 21:35] Paul DuBois
Noted in 5.1.23, 6.0.5 changelogs.

If a global read lock acquired with FLUSH TABLES WITH READ LOCK was
in effect, executing ALTER TABLE could cause a server crash.