Bug #91549 MDL lock queue order seems to depend on table names
Submitted: 4 Jul 2018 16:29 Modified: 16 Nov 2018 14:07
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.7.22, 8.0.11 OS:Any
Assigned to: CPU Architecture:Any

[4 Jul 2018 16:29] Shane Bester
Description:
The locking testcase differs only in the names of the tables,  
yet the results are fundamentally different.  

It is expected that name of table does not impact the outcome when multiple clients wait on locks.

Repeated on 8.0.11 and today's git.

How to repeat:
Testcase is by Yoshinori-san:

* Case 1:
drop table if exists a, a_old, a_new;
create table if not exists a (id int primary key) engine=innodb;
create table if not exists a_new (id int primary key) engine=innodb;

client 1)
lock table a write, a_new write;

client 2)
insert into a values (1);
-> waiting for locks

client 3)
rename table a to a_old, a_new to a;
-> waiting for locks

client 1)
unlock tables;
-> client 2 and 3 proceed

mysql> select * from a;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)

mysql> select * from a_old;
Empty set (0.00 sec)

-----------------

Case2 with only a different table name!!
---------------

* Case 2:
drop table if exists a, __old_a, __new_a;
create table if not exists a (id int primary key) engine=innodb;
create table if not exists __new_a (id int primary key) engine=innodb;

client 1)
lock table a write, __new_a write;

client 2)
insert into a values (1);
-> waiting for locks

client 3)
rename table a to __old_a, __new_a to a;
-> waiting for locks

client 1)
unlock tables;
-> client 2 and 3 proceed

mysql> select * from a;       <----------------
Empty set (0.00 sec)         

mysql> select * from __old_a; <----------------
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)

Suggested fix:
Clearly document how lock orders will be prioritized and 
fix code if anything need fixing here.
[5 Jul 2018 9:20] MySQL Verification Team
same observed in 5.7.22
[16 Jul 2018 12:22] Dmitry Lenev
Hello!

There is no bug in code here, i.e. the code behaves as designed in both test
cases.

To understand why we get different results in these two test cases, one
needs to take into account 2 points:

1) Metadata locks are acquired one by one. This means that RENAME TABLES statement
   can't request and wait for locks on all table names used in it simultaneously.
   Instead, it will acquire lock on the first* table name, the lock on the
   second** table name and so on.

   *), **) See next point for the meaning of "first" and "second" in this
   context.
   
2) In order to avoid deadlocks between DDL statements, the latter try to acquire
   metadata locks on table names in alphabetical order, for names which are
   are known in advance. So "RENAME TABLES a TO a_old, a_new TO a" acquires
   locks in the following order - 'a', 'a_new', 'a_old'. And "RENAME
   TABLE a TO __old_a, __new_a TO a" acquires locks in different order -
   '__new_a', '__old_a', 'a'.

Taking into account the above we can understand why the second test case
behaves in such a way.

===

drop table if exists a, __old_a, __new_a;
create table if not exists a (id int primary key) engine=innodb;
create table if not exists __new_a (id int primary key) engine=innodb;

client 1)
lock table a write, __new_a write;
# Acquires SNRW lock (shared lock which blocks reads and writes)
# on '__new_a' and then on 'a'.

client 2)
insert into a values (1);
# Blocked waiting for shared read (SR) lock on 'a'

client 3)
rename table a to __old_a, __new_a to a;
# Needs to acquire exclusive (X) lock on '__new_a', '__old_a' and then 'a'
# Blocked waiting on '__new_a', without holding any locks on 'a'

client 1)
unlock tables;
# Locks on 'a' and '__new_a' are realeased.

# In case of 'a', there is only waiting lock request from client 2, which
# is satisfied. So INSERT proceeds until completion.

# In case of '__new_a' the only waiting lock request is from client 3.
# It is satisfied. So RENAME gets lock on '__new_a', '__old_a' and then
# starts to wait for lock on 'a' (since it is owned by INSERT).

# So INSERT completes, adding row to 'a'. Then RENAME is unblocked,
# so 'a' becomes '__old_a', and '__new_a' becomes 'a'.

# Hence the below results:

mysql> select * from a;
# Empty set (0.00 sec)         

mysql> select * from __old_a;
# +----+
# | id |
# +----+
# |  1 |
# +----+
# 1 row in set (0.00 sec)

===

Here are a couple of possible workarounds for this issue:

1) Use naming scheme with suffixes for old and new table versions,
   similarly to how it was done in the first scenario. In this case
   RENAME TABLES will always try to acquire lock on table name 'a'
   before lock on 'a_new'. So there will be both INSERT and RENAME waiting
   for lock on table name 'a' until UNLOCK TABLES time.
   And when UNLOCK TABLE happens RENAME will take precedence over INSERT
   as it tries to acquire exclusive metadata lock/is DDL.

2) Use different prefix than '__old_/__new_' for old and new table
   versions which will get sorted after any reasonable table name
   alphabetically. E.g. '~old_' and '~new_'.
   Again RENAME TABLES will try to acquire lock on table name 'a'
   before locks on '~new_a' and '~old_a'. And the behavior will be
   similar to case 1). Names starting from '~' character will require
   quoting of course.

Having said all the above, I agree that we don't have much details
about how and what metadata locks are acquired by different statements
in our documentation. So I am changing this bug to Documentation issue.
[23 Jul 2018 18:05] Dmitry Lenev
Our documentation doesn't say much about lock priority so here are some clarifications about SR/SW and X locks..

Normally when we reschedule waiting lock requests after some thread releases the lock and there are waiters of both SR/SW and X types, the X lock will be preferred.

However, there is one exception to this rule. In case when max_write_lock_count system variable is set to some non-default, low value like 10, pending request for SR/SW lock might be preferred to pending request for X lock, provided that this SR/SW lock request already had to let pass lock requests of X, SNRW or SNW types (i.e. strong, DDL types) 10 times. In other words we permit pending SW/SR locks to proceed after granting max_write_lock_count X/SNRW/SNW locks while this lock is pending in a row (note that SNRW locks are for example acquired by LOCK TABLES WRITE).

OTOH, since max_write_lock_count main use cases are systems with MyISAM tables with heavy write load or systems where lots of LOCK TABLES WRITE statements are used the above corner case can be probably safely ignored by most users.
[16 Nov 2018 14:07] Paul DuBois
Posted by developer:
 
https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html
now points out that metadata lock acquisition is in lock order
and shows an example (based on your test case) to illustrate
how this can affect transaction outcome.