MySQL Bugs: #79274: CREATE TRIGGER etc is not able to get a SHARED_NO

Bug #79274	CREATE TRIGGER etc is not able to get a SHARED_NO_WRITE metadata lock
Submitted:	13 Nov 2015 15:03	Modified:	25 Jul 2019 11:18
Reporter:	Daniël van Eeden (OCA)	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Locking	Severity:	S2 (Serious)
Version:	5.6, 5.7, 8.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	locks, mdl, Metadata lock

Description:
Situation:
CREATE TRIGGER on a busy table (w/ many writes)
This hangs

To troubleshoot:
select LOCK_TYPE, LOCK_DURATION, LOCK_STATUS, OWNER_THREAD_ID, t.PROCESSLIST_ID, t.PROCESSLIST_USER, t.PROCESSLIST_COMMAND,
t.PROCESSLIST_TIME, LEFT(e.SQL_TEXT,40) from metadata_locks mdl LEFT JOIN threads t on mdl.OWNER_THREAD_ID=t.thread_id
left join events_statements_current e on e.thread_id=mdl.owner_thread_id where mdl.object_name='my_table' order by PROCESSLIST_TIME desc

The problem:
This shows that CREATE TRIGGER has a PENDING SHARED_NO_WRITE metadata lock on that table.
There are multiple other queries with a GRANTED SHARED_WRITE lock on the same table. These queries finished a INSERT/UPDATE/etc.
The script executing the CREATE TRIGGER gets a number of Lock wait timeouts and needs to keep retrying.

The workaround:
Killing the connections which have a SHARED_WRITE lock lets the CREATE TRIGGER continue, but only if you kill all the queries at once
(maybe even multiple times). This is because there will be new connections which took a SHARED_WRITE lock.

How to repeat:
Create a table which is heavily written to (maybe keep connections/transactions open for a few seconds)
Then try to create a trigger on the table.

Suggested fix:
Make it possible to create a trigger on a very active table.

Best solution: Make this so it doesn't need an exclusive lock.

Alternative solution: Make it possible to block new connections from taking a SHARED_WRITE lock while the CREATE TRIGGER is running.

Any reason this bug is not verified?

Hi Daniël,

Sorry for waiting so long for our response.

Actually, SHARED_NO_WRITE MDL lock is highly necessary when executing CREATE or DROP TRIGGER statements. Truly and highly necessary. Also, during the lifetime of 5.7, this lock has been escalated into SHARED_NO_READ_WRITE lock. It was necessary for fixing some bug(s).

However, I do not see this behaviour described anywhere in our documentation, nor do I see any entry in our Revision History (which we used to call ChangeLog).

Hence, would you mind if we convert this bug into Documentation bug and ask for a full and verbose explanation in our Reference Manual(s) and Revision history of why is this behaviour necessary ???

Many thanks in advance .......

Not sure if this is should only be fixed in documentation.

I think we should go for this solution:
Make it possible to block new connections from taking a SHARED_WRITE lock while the CREATE TRIGGER is running.

The problem now is that while CREATE TRIGGER runs there are new connections taking the SHARED_WRITE lock and the the CREATE TRIGGER waits until it hits a timeout.

Hello Daniel,

I read your request, but I do not see it how we can accommodate your request. May be you can help me ??

Due to the design and due to the various critical bugs that occurred in 5.6 and 5.7 lifetime, MDL works with CREATE / DROP TRIGGER in the following fashion....

First, TL_READ_NO_INSERT lock is taken. That is a read-only lock that does not allow for concurrent writes of any kind.

Now, we come to the critical point. At one point in this DDL operation, this lock had to be changed into this metadata lock: MDL_SHARED_NO_READ_WRITE. Short description of this lock:

"
    A shared metadata lock which allows other connections
    to access table metadata, but not data.
    It blocks all attempts to read or update table data, while allowing
    INFORMATION_SCHEMA and SHOW queries.
    A connection holding this kind of lock can read table metadata modify and
    read table data.
"

Hence, if other SQL statements come after first lock, but before the second one, then CREATE / DROP TRIGGER will have to wait that all MDL locks are cleared before it acquires this lock.

It would be possible to move this lock to the beginning of this DDL operation, but that would block so many SELECT queries that it would be a huge performance loss.

If I understood your idea fully, you would like us to block connections of getting established, at all. Or did you mean just to prohibit issuing SQL statements, but the consequences would be the same for both ideas. As far as I recall, we do not have any mechanism to perform such a blocking. That would require a major overhaul of our server and this change might result in the number of the regression bugs.

Please, let me know your opinion on this matter.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I encountered this when attempting to run a schema change with pt-online-schema-change on MySQL 5.7.22. The OSC failed with this error:

DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction [for Statement "CREATE TRIGGER `pt_osc_rescore_B_TABLENAME_del` AFTER DELETE ON `SCHEMANAME`.`TABLENAME` FOR EACH ROW DELETE IGNORE FROM `rescore`.`TABLENAME` WHERE `SCHEMANAME`.`_TABLENAME_new`.`id` <=> OLD.`id`"] at /usr/bin/pt-online-schema-change line 10722.

This basically prevents us from running online schema changes on any heavily written to table. I think there should be a way to specify that creation of the triggers should get higher priority than other incoming queries to the table. I'd expect trigger creation to not take long, so putting it in front of the queue should not hurt performance for other queries too much.

Hi Eduardo,

Thank you very much for your comments.

However, the problem that you are reporting is not related at all to this bug report.

What you are encountering is standard InnoDB's lock wait timeout hitting you. Our Reference Manual contains all the info on that timeout and on the methods on how to diagnose it.

To repeat this
----------------
Setup initial schema:

drop table if exists t;
create table t(a int not null,key(a))engine=innodb;
drop table if exists s;
create table s(a bigint)engine=innodb;
insert into s values(0);

Now run a fake workload with transactions that take up to 1 second each:

mysqlslap -h127.0.0.1 --query="start transaction;insert into t(a) values(floor(rand()*10000));do sleep(rand());commit;" --number-of-queries=10000  --iterations=100000 --create-schema=test --concurrency=100

In another session,  try to drop/create trigger.....

set lock_wait_timeout=30;
select now(6);
drop trigger if exists trg_before_insert;
select now(6);
create trigger trg_before_insert before insert on t for each row update s set a:=a;
select now(6);
drop trigger if exists trg_before_insert;
select now(6);

-- -----------------

The trigger cannot be created even if you have much longer lock_wait_timeout.

------------
mysql>   show processlist;
+-------+------+-----------------+------+---------+------+---------------------------------+-------------------------------------
| Id    | User | Host            | db   | Command | Time | State                           | Info
+-------+------+-----------------+------+---------+------+---------------------------------+-------------------------------------
|   153 | root | localhost:38234 | test | Query   |   49 | Waiting for table metadata lock | create trigger trg_before_insert bef
| 18352 | root | localhost:56437 | test | Query   |    0 | starting                        | show processlist
| 36856 | root | localhost:10458 | NULL | Sleep   |  303 |                                 | NULL
| 37157 | root | localhost:10765 | test | Query   |   48 | Waiting for table metadata lock | insert into t(a) values(floor(rand()
| 37158 | root | localhost:10767 | test | Query   |   48 | Waiting for table metadata lock | insert into t(a) values(floor(rand()
<cut>
+-------+------+-----------------+------+---------+------+---------------------------------+-------------------------------------
103 rows in set (0.00 sec)

mysql>  select now(6),LOCK_TYPE, LOCK_DURATION, LOCK_STATUS, OWNER_THREAD_ID, t.PROCESSLIST_ID, t.PROCESSLIST_USER, t.PROCESSLIST_COMMAND,
    ->   t.PROCESSLIST_TIME, LEFT(e.SQL_TEXT,40) from performance_schema.metadata_locks mdl LEFT JOIN performance_schema.threads t on mdl.OWNER_THREAD_ID=t.thread_id
    ->   left join performance_schema.events_statements_current e on e.thread_id=mdl.owner_thread_id where mdl.object_name='t' order by PROCESSLIST_TIME desc;
+----------------------------+-----------------+---------------+-------------+-----------------+----------------+------------------+---------------------+------------------+----------------------
| now(6)                     | LOCK_TYPE       | LOCK_DURATION | LOCK_STATUS | OWNER_THREAD_ID | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_COMMAND | PROCESSLIST_TIME | LEFT(e.SQL_TEXT,40)
+----------------------------+-----------------+---------------+-------------+-----------------+----------------+------------------+---------------------+------------------+----------------------
| 2019-03-28 08:23:39.357911 | SHARED_NO_WRITE | TRANSACTION   | PENDING     |             178 |            153 | root             | Query               |               73 | create trigger trg_be
| 2019-03-28 08:23:39.357911 | SHARED_WRITE    | TRANSACTION   | PENDING     |           37227 |          37202 | root             | Query               |               72 | insert into t(a) valu
<cut>
| 2019-03-28 08:23:39.357911 | SHARED_WRITE    | TRANSACTION   | PENDING     |           37202 |          37177 | root             | Query               |               72 | insert into t(a) valu
| 2019-03-28 08:23:39.357911 | SHARED_WRITE    | TRANSACTION   | GRANTED     |           18377 |          18352 | root             | Query               |                0 | select now(6),LOCK_TY
+----------------------------+-----------------+---------------+-------------+-----------------+----------------+------------------+---------------------+------------------+----------------------
102 rows in set (0.01 sec)

My expectation is that currently running transactions would finish and the ddl to create trigger should run after say ~2 seconds max.

my above test was on 5.7.25

In my test, what I found is that "lock_wait_timeout" should be higher than "innodb_lock_wait_timeout" in order for CREATE/DROP TRIGGER to succeed.  

Otherwise the innodb lock waits hold up the transactions and the MDL timeout kicks in first.   

So it is debatable whether I've repeated the same thing as reported.

Shane,

You have discovered that "lock_wait_timeout" should be higher than "innodb_lock_wait_timeout" in order for CREATE/DROP TRIGGER to succeed.  

That is both a workaround and a fix for the bug. Simply, on startup or change of the values, there should be a message in the error log and a warning  or error when either of the variables is set during runtime.

Can you point exactly to the documentation you're referring to?

I assume you're talking about this:

https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html and Shared and Exclusive Locks.

The "docs" there seem quite clear but miss mentioning explicitly the situation we're talking about where there's an existing S lock, followed by someone trying to take an X lock.

So far so good, the X locker waits....
but if a third thread comes along  and requests an S lock what seems to be ignored is there's already an "attempt" to take an X lock so I'd read and argue that the new "S" lock must wait.

Our interpretation here and is that this 3rd thread gains an S lock leaving the  2nd thread to continue waiting. Thread one an release it's lock but a new thread can come in and request a new S lock and the process continues.

Basically the "attempt" to take an X lock is not honoured but InnoDB, forcing others to queue up behind the X lock requestor.

This is what we think is happening and as far as I can see this specific case is not documented explicitly at least where I was looking.

If my interpretation is wrong or you're trying to point me at something else that's clear to you please point at the page / section and appropriate text to help clarify.

Thanks.

Let me rephrase what didn't come out quite correctly:

Basically the "attempt" to take an X lock is not honoured by InnoDB, which should force other requestors to queue up behind the X lock requestor, and only be "handled" once that lock request is handled (or it times out)..

Note: we can also argue all around but what actually happens with the current locking, but I think that what seems to be a completely reasonable assumption is that a CREATE or DROP TRIGGER statement should be able to get a lock to make the changes it needs once all previous locks have been released as otherwise you can never be sure that the process will ever succeed.

If these earlier locks are not released in time then I agree the requestor of the lock for CREATE/DROP TRIGGER may suffer a timeout, but we do not think this is what is happening here

HI Simon,

Nice to hear from you again.

Your analysis of the InnoDB locks queueing is quite correct. There are only two points which I would like to bring in here ....

* A thread that asks for the X lock should be actually in the "X lock waiting" status

* Much more important ... What we have here is the mix of InnoDB and MDL locks. DDL operations on tables, views, triggers etc ... also require usage of the MDL locks. This is exactly the situation that we have here ... Your analysis is correct, but you missed to mention waiting on the MDL locks. Hence, that is how we came to the workaround and possible solution.

Last, but not least, this bug is already verified. Do you want us to do anything else, except for copying your comments into our internal bugs database, which is already done.

We are waiting on your feedback.

Thanks in advance.

Hi Sinisa,

Thanks for your response.

I reacted maybe incorrectly as the message between yourself and Shane seemed to me to imply that "this is a configuration issue and if you configure things differently the problem will go away", or that was a possible interpretation.

So clearly you've looked at this in more depth than I have and as you say this is verified but the comment "That is both a workaround and a fix for the bug." seems to imply that it's our problem to ensure the db is configured correctly and we should set things to ensure this problem does not arise.

If that was a misunderstanding then that's fine but it wasn't clear.
So if you agree there's a problem and as you say it's verified there's nothing else to do except hopefully look how the issue can be addressed.

Thank you, very much, Simon for your comments.

Yes, we are here in agreement on all counts. It is also possible that a solution to this problem can be found without changing those timeouts as described.

Based on all this, I have also increased the severity of this bug.

Have a nice weekend !!!

Posted by developer:
 
Hello!

Let me provide some clarifications about this issue.

First of all the type of lock acquired by CREATE TRIGGER/DROP TRIGGER
which times out in the above case is SHARED_NO_WRITE metadata lock
(these locks are documented at https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html
and https://dev.mysql.com/doc/refman/8.0/en/metadata-locks-table.html).
Such locks are not related to InnoDB's S and X locks described at
https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html which
were mentioned above.

SHARED_NO_WRITE is an upgradable shared metadata lock which blocks
all attempts to update table data (which acquire SHARED_WRITE metadata
lock), allowing reads (which acquire SHARED_READ metadata lock).

Pending request for lock of this type will wait for existing acquired
SHARED_WRITE locks on the object to go away, but it will also block all
further requests for SHARED_WRITE lock on the object. There is exception
to that rule if max_write_lock_count system variable is set, but it seems
not to be the case here.

This is can easily be shown using the following test case for mysqltest tool:

CREATE TABLE t1 (i INT);

# Start transaction which will acquire SW lock and thus block SNW
BEGIN;
INSERT INTO t1 VALUES (1);

connect (con1, localhost, root,,);
# Start CREATE TRIGGER which will block trying to acquire SNW
--send CREATE TRIGGER trg BEFORE INSERT ON t1 FOR EACH ROW SET @a:=1

connect (con2, localhost, root,,);
let $wait_condition=
  SELECT COUNT(*) = 1 FROM information_schema.processlist
  WHERE state = "Waiting for table metadata lock" AND
        info = "CREATE TRIGGER trg BEFORE INSERT ON t1 FOR EACH ROW SET @a:=1";
--source include/wait_condition.inc

# Try to execute DML which will request another SW on the table
--send INSERT INTO t1 VALUES (100)

connection default;
let $wait_condition=
  SELECT COUNT(*) = 1 FROM information_schema.processlist
  WHERE state = "Waiting for table metadata lock" AND
        info = "INSERT INTO t1 VALUES (100)";
--source include/wait_condition.inc

# Observe that both CREATE TRIGGER and second INSERT wait
SHOW PROCESSLIST;
# Id     User    Host    db      Command Time    State   Info
# 3      root    localhost       test    Query   0       starting        SHOW PROCESSLIST
# 4      root    localhost       test    Query   0       Waiting for table metadata lock CREATE TRIGGER trg BEFORE INSERT ON t1 FOR EACH ROW SET @a:=1
# 5      root    localhost       test    Query   0       Waiting for table metadata lock INSERT INTO t1 VALUES (100)

# Observe granted SW lock and waiting requests for SNW and SW locks.
SELECT * FROM performance_schema.metadata_locks WHERE object_name='t1';
# OBJECT_TYPE    OBJECT_SCHEMA   OBJECT_NAME     OBJECT_INSTANCE_BEGIN   LOCK_TYPE       LOCK_DURATION   LOCK_STATUS     SOURCE  OWNER_THREAD_ID OWNER_EVENT_ID
# TABLE  test    t1      140293295878096 SHARED_WRITE    TRANSACTION     GRANTED         24      2164
# TABLE  test    t1      140293361961328 SHARED_NO_WRITE TRANSACTION     PENDING         25      45
# TABLE  test    t1      140293160668240 SHARED_WRITE    TRANSACTION     PENDING         26      590

This can be also confirmed by looking at MySQL Server code.
Here is excerpt from sql/mdl.cc:

const MDL_lock::MDL_lock_strategy MDL_lock::m_object_lock_strategy =
{
...
  /**
    Each array in the next group specifies if a particular type of request can
    be satisfied if there is a waiting request for the same lock of a certain
    type. In other words each array specifies a priority matrix for different
    lock types.

    We use each of the arrays depending on whether the number of successively
    granted "piglet" and "hog" lock requests exceed the max_write_lock_count
    threshold. This is necessary to avoid high priority lock requests starving
    out requests with lower priority.

    The first array in the group is used in default situation when both
    MDL_lock::m_piglet_lock_count and MDL_lock::m_hog_lock_count don't exceed
    the threshold.

    A priority matrice specified by it looks like:

       Request  |         Pending requests for lock          |
        type    | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X |
      ----------+--------------------------------------------+
      S         | +   +   +   +    +    +   +    +     +   - |
      SH        | +   +   +   +    +    +   +    +     +   + |
      SR        | +   +   +   +    +    +   +    +     -   - |
      SW        | +   +   +   +    +    +   +    -     -   - |
      SWLP      | +   +   +   +    +    +   -    -     -   - |
      SU        | +   +   +   +    +    +   +    +     +   - |
      SRO       | +   +   +   -    +    +   +    +     -   - |
      SNW       | +   +   +   +    +    +   +    +     +   - |
      SNRW      | +   +   +   +    +    +   +    +     +   - |
      X         | +   +   +   +    +    +   +    +     +   + |

    Invariant [INV2]: for all priority matrices, if A is the set of
    incompatible waiting requests for a given request and B is the set
    of incompatible granted requests for the same request, then A will
    always be a subset of B. This means that moving a lock from waiting
    to granted state doesn't unblock additional requests.
    MDL_lock::reschedule_waiters() code relies on this.
  */
  {
    {
      0,
      MDL_BIT(MDL_EXCLUSIVE),
      0,
      MDL_BIT(MDL_EXCLUSIVE) | MDL_BIT(MDL_SHARED_NO_READ_WRITE),
      // The below 2 lines describe acquiring SHARED_WRITE lock
      MDL_BIT(MDL_EXCLUSIVE) | MDL_BIT(MDL_SHARED_NO_READ_WRITE) |
        MDL_BIT(MDL_SHARED_NO_WRITE),
      MDL_BIT(MDL_EXCLUSIVE) | MDL_BIT(MDL_SHARED_NO_READ_WRITE) |
        MDL_BIT(MDL_SHARED_NO_WRITE) | MDL_BIT(MDL_SHARED_READ_ONLY),
      MDL_BIT(MDL_EXCLUSIVE),
      MDL_BIT(MDL_EXCLUSIVE) | MDL_BIT(MDL_SHARED_NO_READ_WRITE) |
        MDL_BIT(MDL_SHARED_WRITE),
      MDL_BIT(MDL_EXCLUSIVE),
      MDL_BIT(MDL_EXCLUSIVE),
      0
    },
...

So normally acquisition of SHARED_NO_WRITE lock by CREATE TRIGGER/DROP
TRIGGER should not be starved out by stream of concurrent DML.

However, there are nuances.

If your DML transactions which have been started before issuing
CREATE TRIGGER might run longer (because they simply do a lot, or
because block on some locks, or wait for InnoDB or intersubsystem
deadlocks to be resolved) than @@lock_wait_timeout setting
then they can cause CREATE TRIGGER to fail due to timeout.

Because of this, as Shane has pointed out above, it is bad idea to set
@@innodb_lock_wait_timeout to values longer than @@lock_wait_timeout
value in scenarios like this.

I think that before continuing with this bug we need clear confirmation
that in customers case there are no open DML transactions which hold
SHARED_WRITE lock on table in question longer than @@lock_wait_timeout
seconds (for any reason).

Hi Daniel, Simon,

I hope that you have read developer's comments.

Hence, he has the question in the end of his comment, that requires your answer:

-----------------------------------------------------------------------

I think that before continuing with this bug we need clear confirmation
that in customers case there are no open DML transactions which hold
SHARED_WRITE lock on table in question longer than @@lock_wait_timeout
seconds (for any reason).

-----------------------------------------------------------------------

Your feedback should be very much welcome.

Hi Simon,

If it is not a rush for you, it is not a rush for us either. 

Hence, Daniel should take his time.

> I think that before continuing with this bug we need clear confirmation
> that in customers case there are no open DML transactions which hold
> SHARED_WRITE lock on table in question longer than @@lock_wait_timeout
> seconds (for any reason).

I'm fairly sure that we don't have any open DML transactions for more than @lock_wait_timeout.

Daniel,

Thanks a lot.

Your comments have been copied to the internal database.

Let us know if you encounter any instance where value of that variable is not as you expected it to be.

Hi Daniel, Simon,

Your last answer to our question was:

"

I'm fairly sure that we don't have any open DML transactions for more than @lock_wait_timeout.

"

This makes a problem for us to repeat the problem that you are experiencing. If your @@global.lock_wait_timout is larger then @@global.innodb_lock_wait_timeout and if you did not have any DML / DDL transactions longer than lock_wait_timeout, then we are in the problem here. This is so since, we managed to repeat what you experienced only when those two config variables are wrongly set.

Can you clarify the issue for us ????

Thanks a lot in advance.

Hi Daniel,

We have informed you one month ago what is the only scenario under which we can repeat this behaviour. Since you claimed that this scenario is not what you have, our Development has changed the status to the appropriate one.