Bug #77390 Manual does not explain a "deadlock" case of online ALTER
Submitted: 17 Jun 2015 12:54 Modified: 12 Apr 2017 16:27
Reporter: Valeriy Kravchuk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any
Tags: alter, innodb, metadata, missing manual

[17 Jun 2015 12:54] Valeriy Kravchuk
Description:
Manual (https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-concurrency.html) says:

"An online DDL statement for an InnoDB table always waits for currently executing transactions that are accessing the table to commit or roll back, because it requires exclusive access to the table for a brief period while the DDL statement is being prepared. Likewise, it requires exclusive access to the table for a brief time before finishing. Thus, an online DDL statement waits for any transactions that are started while the DDL is in progress, and query or modify the table, to commit or roll back before the DDL completes."

From the above one may conclude that if there is some lock set on the table we want to ALTER at the moment we start ALTER, we do nothing but wait for the transaction to complete. In reality, we sometimes may do proceed with online ALTER and then ALTER is blocked at the very end, when exclusive table level lock is requested, while holding the metadata lock on the table ALTERed. This may cause massive locking of other sessions if innodb_lock_wait timeout is large enough.

See "How to repeat" for specific details.

How to repeat:
Create the following tables and add rows:

create table t1(id int primary key, data int) engine=InnoDB;
create table t2(id int primary key, data int, t1_id int, constraint fk_t1
  foreign key(t1_id) references t1(id)) engine=InnoDB;

insert into t1 values (1,1);
insert into t2 values (1,1,1);

Now, in session 1 do:

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t2 values (2,1,1);
Query OK, 1 row affected (0.01 sec)

In session 2 do:

mysql> alter table t1 add column data2 int, algorithm=INPLACE, lock=NONE;

ALTER hangs (as expected), the problem is the stage it hangs on:

mysql> show processlist;
+----+------+-----------------+------+---------+------+-------------------------
-----------------+--------------------------------------------------------------
-----+
| Id | User | Host            | db   | Command | Time | State
                 | Info
     |
+----+------+-----------------+------+---------+------+-------------------------
-----------------+--------------------------------------------------------------
-----+
|  1 | root | localhost:49835 | test | Sleep   |   70 |
                 | NULL
     |
|  2 | root | localhost:49850 | test | Query   |    0 | init
                 | show processlist
     |
|  3 | root | localhost:49853 | test | Query   |    7 | committing alter table t
o storage engine | alter table t1 add column data2 int, algorithm=INPLACE, lock=
NONE |
+----+------+-----------------+------+---------+------+-------------------------
-----------------+--------------------------------------------------------------
-----+
3 rows in set (0.00 sec)

So, it's final stage and it requires X level table lock. If in the first (or any other) session we try to access the table:

update t1 set data = 2 where id=1; 

this will hang waiting for table metadata lock:

mysql> show processlist;
+----+------+-----------------+------+---------+------+-------------------------
-----------------+--------------------------------------------------------------
-----+
| Id | User | Host            | db   | Command | Time | State
                 | Info
     |
+----+------+-----------------+------+---------+------+-------------------------
-----------------+--------------------------------------------------------------
-----+
|  1 | root | localhost:49835 | test | Query   |    3 | Waiting for table metada
ta lock          | update t1 set data = 2 where id=1
     |
|  2 | root | localhost:49850 | test | Query   |    0 | init
                 | show processlist
     |
|  3 | root | localhost:49853 | test | Query   |   30 | committing alter table t
o storage engine | alter table t1 add column data2 int, algorithm=INPLACE, lock=
NONE |
+----+------+-----------------+------+---------+------+-------------------------
-----------------+--------------------------------------------------------------
-----+
3 rows in set (0.00 sec)

and these tow session will hang on each other for the time equals to min(lock_wait_timeout, innodb_lock_wait_timeout). The later is smaller by default and eventually ALTER fails:

mysql> alter table t1 add column data2 int, algorithm=INPLACE, lock=NONE;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql>

The locks involved at InnoDB level are:

LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 81435, not started
MySQL thread id 3, OS thread handle 0x11d8, query id 18 localhost ::1 root committing alter table to storage engine
alter table t1 add column data2 int, algorithm=INPLACE, lock=NONE
---TRANSACTION 81434, ACTIVE 51 sec setting table lock for creating or dropping index
mysql tables in use 1, locked 1
LOCK WAIT 1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 3, OS thread handle 0x11d8, query id 18 localhost ::1 root committing alter table to storage engine
alter table t1 add column data2 int, algorithm=INPLACE, lock=NONE
Trx read view will not see trx with id >= 81436, sees < 81429
------- TRX HAS BEEN WAITING 50 SEC FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `test`.`t1` trx id 81434 lock mode X waiting
------------------
TABLE LOCK table `test`.`t1` trx id 81434 lock mode X waiting
---TRANSACTION 81429, ACTIVE 114 sec
3 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 1, OS thread handle 0x1224, query id 20 localhost ::1 root Waiting for table metadata lock
update t1 set data = 2 where id=1
TABLE LOCK table `test`.`t2` trx id 81429 lock mode IX
TABLE LOCK table `test`.`t1` trx id 81429 lock mode IS
RECORD LOCKS space id 531 page no 3 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 81429 lock mode S locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000000013e12; asc     > ;;
 2: len 7; hex 8f000001c20110; asc        ;;
 3: len 4; hex 80000001; asc     ;;

This case is really hard to debug and it would be nice to explain all kinds of InnoDB and metadata locks set by online ALTER, in order.

Suggested fix:
Please, explain all metadata and InnoDB locks set by online ALTER, with examples and details enough to explain non-trivial cases like the one above. From current manual content

"An online DDL statement for an InnoDB table always waits for currently executing transactions that are accessing the table to commit or roll back, because it requires exclusive access to the table for a brief period while the DDL statement is being prepared"

one may expect that ALTER is locked at the initial stage and thus can NOT hold any blocking metadata lock.
[17 Jun 2015 14:42] MySQL Verification Team
Thank you for the bug report.
[24 Mar 2017 17:22] Daniel Price
Posted by developer:
 
An MDL lock is not taken on the referenced table, permitting the online ALTER TABLE operation to proceed up to the commit phase.

create table t1(id int primary key, data int) engine=InnoDB;
create table t2(id int primary key, data int, t1_id int, 
  constraint fk_t1 foreign key(t1_id) references t1(id)) engine=InnoDB;

insert into t1 values (1,1);
insert into t2 values (1,1,1);

SESSION 1:

start transaction;
insert into t2 values (2,1,1); 

# MDL lock is taken on t2 only. MDL lock is only acquired for the explicitly 
# mentioned table. (See related Bug #11756705)
# IX lock is taken on t2
# IS lock is taken on t1 for FK check 

SESSION 2:

alter table t1 add column data2 int, algorithm=INPLACE, lock=NONE;

# MDL Exclusive lock is taken on t1 briefly for online DDL prepare phase
# MDL lock on t1 is downgraded for the online DDL copy phase
# MDL Exclusive lock taken again on t1 for online DDL commit phase 
# Online DDL commit phase is blocked by the IS lock on t1 taken by SESSION 1. 

SESSION 3:

update t1 set data = 2 where id=1; 

# Blocked by MDL exclusive lock on t1 acquired for the online DDL commit phase
[12 Apr 2017 16:27] Daniel Price
Posted by developer:
 
The documentation has been updated as follows:

https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-concurrency.html

"A case in which an online DDL operation on a table does not wait for a
currently executing transaction to complete can occur when the table is in
a foreign key relationship and a transaction is run explicitly on the
other table in the foreign key relationship. In this case, the transaction
holds an exclusive metadata lock on the table it is updating, but only
holds shared InnoDB table lock (required for foreign key checking) on the
other table. The shared InnoDB table lock permits the online DDL operation
to proceed but blocks the operation at the commit phase, when an exclusive
InnoDB table lock is required. This scenario can result in deadlocks as
other transactions wait for the online DDL operation to commit. (See Bug
#48652, and Bug #77390)"

Thank you for the bug report.