MySQL Bugs: #87620: Deadlock on get

Bug #87620	Deadlock on get_lock(..., 0)
Submitted:	31 Aug 2017 9:13	Modified:	3 Nov 2017 15:54
Reporter:	Marco Neves	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Locking	Severity:	S3 (Non-critical)
Version:	5.7.5+	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	bug, deadlock, GET_LOCK

Description:
When there is two(+?) concurrent attempts to get a pre-existing user level lock with timeout, it results in a Deadlock error.

The error doesn't happen if the two queries are ran independently, only if they run at the same time.

 

How to repeat:
Setup:

CREATE TABLE `queue` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `consume_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `data` varchar(255) DEFAULT NULL,
  `state` tinyint(3) unsigned DEFAULT '1',
  `retries` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `consume_at` (`consume_at`)
) ENGINE=InnoDB;

We add some records:

for i in $(seq 2 10000); do echo "INSERT INTO queue (data,retries)
VALUES($i, 0);"; done|mysql -u test -p test_locks

### Open two connections (conn1, conn2) and run this commands at the
same time in both (I'm using cssh, but other tools will work):

select id FROM queue2 WHERE consume_at<NOW() and GET_LOCK(id,0);

conn1:
...
5674 rows in set (3.69 sec)

conn2:
...
4326 rows in set (4.65 sec)

(The number of rows per connection will vary, but you get the idea)

## I was expecting this to happen

Now, let's do it again - at this time all the rows in the table are
locked by one of the connection:

select id FROM queue2 WHERE consume_at<NOW() and GET_LOCK(id,0);

conn1:
ERROR 3058 (HY000): Deadlock found when trying to get user-level lock;
try rolling back transaction/releasing locks and restarting lock
acquisition.

conn2:
4326 rows in set (4.14 sec)

## I was not expecting this deadlock - I was expecting each of the
connections to return again the rows they already had locked.

If this queries are rerun again the result will be the same - the
connection that get the deadlock the first time will get it every
time, AFAICT.

!!! I call that **Weird 1**

Running the query in each of the connections independently still
return the right results (i.e. the records they already had locked).

Just for test, let's check metadata_locks:

select count(*) FROM performance_schema.metadata_locks WHERE
OBJECT_TYPE='USER LEVEL LOCK';
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.03 sec)

## I was expecting that as well.

So, let's talk about **weird 2**:

back to both connections (I'm running this simultaneously, but it
doesn't matter anymore):

SELECT RELEASE_ALL_LOCKS();

conn1:
+---------------------+
| RELEASE_ALL_LOCKS() |
+---------------------+
|                6776 |
+---------------------+
1 row in set (0.04 sec)

conn2:
+---------------------+
| RELEASE_ALL_LOCKS() |
+---------------------+
|               12978 |
+---------------------+
1 row in set (0.04 sec)

What we get is - the connection that deadlocks returns a number that
is bigger or equal to the number of rows it managed to lock, and the
connection that does not deadlock will return a number that is as many
times the number of rows as the number of time the query is runned
(simultaneously and otherwise).

Alternative way to get into this issue:

--in both connections:

SELECT id FROM queue2 WHERE consume_at<NOW() and GET_LOCK(id,0) LIMIT
100; -- this works in both

SELECT id FROM queue2 WHERE consume_at<NOW() and GET_LOCK(id,0) LIMIT
100; -- this still works in both - and they return the previously locked rows

SELECT id FROM queue2 WHERE consume_at<NOW() and GET_LOCK(id,0) LIMIT
200; -- one of the connections fails with the deadlock error

SELECT id FROM queue2 WHERE consume_at<NOW() and GET_LOCK(id,0) LIMIT
200; -- one of the connections fails with the deadlock error, but not
always the same one

mpn

Suggested fix:
None of this examples should result in a deadlock - if the timeout is 0+ then I would not expect a deadlock error.

In all the cases used, I would expect to get the rows that were previously locked to be returned - and any extra rows needed to complete the dataset locked and returned if they are not locked.

I mistaken the version - this only affects 5.7.5+, not 5.6.5+.

Hi,
verified as described on 5.7.19

mysql [localhost] {msandbox} (test) > select id FROM queue WHERE consume_at<NOW() and GET_LOCK(id,0);
ERROR 3058 (HY000): Deadlock found when trying to get user-level lock; try rolling back transaction/releasing locks and restarting lock acquisition.
mysql [localhost] {msandbox} (test) > select @@version
    -> ;
+-----------+
| @@version |
+-----------+
| 5.7.19    |
+-----------+
1 row in set (0.00 sec)

Thanks for the submission
Bogdan

Posted by developer:
 
During the reimplementation of GET_LOCK() the semantics was changes a bit, as well as adding performance schema monitoring.

Quoting the 5.7 reference manual here:

GET_LOCK(str,timeout)

Tries to obtain a lock with a name given by the string str, using a timeout of timeout seconds. A negative timeout value means infinite timeout. The lock is exclusive. While held by one session, other sessions cannot obtain a lock of the same name.

Returns 1 if the lock was obtained successfully, 0 if the attempt timed out (for example, because another client has previously locked the name), or NULL if an error occurred (such as running out of memory or the thread was killed with mysqladmin kill).

A lock obtained with GET_LOCK() is released explicitly by executing RELEASE_LOCK() or implicitly when your session terminates (either normally or abnormally). Lock release may also occur with another call to GET_LOCK():

    Before 5.7.5, only a single simultaneous lock can be acquired and GET_LOCK() releases any existing lock.

    In MySQL 5.7.5, GET_LOCK() was reimplemented using the metadata locking (MDL) subsystem and its capabilities were extended. Multiple simultaneous locks can be acquired and GET_LOCK() does not release any existing locks. It is even possible for a given session to acquire multiple locks for the same name. Other sessions cannot acquire a lock with that name until the acquiring session releases all its locks for the name.

    As a result of the MDL reimplementation, locks acquired with GET_LOCK() appear in the Performance Schema metadata_locks table. The OBJECT_TYPE column says USER LEVEL LOCK and the OBJECT_NAME column indicates the lock name. Also, the capability of acquiring multiple locks introduces the possibility of deadlock among clients. When this happens, the server chooses a caller and terminates its lock-acquisition request with an ER_USER_LOCK_DEADLOCK error. This error does not cause transactions to roll back. 

The difference in lock acquisition behavior as of MySQL 5.7.5 can be seen by the following example. Suppose that you execute these statements:

SELECT GET_LOCK('lock1',10);
SELECT GET_LOCK('lock2',10);
SELECT RELEASE_LOCK('lock2');
SELECT RELEASE_LOCK('lock1');

In MySQL 5.7.5 or later, the second GET_LOCK() acquires a second lock and both RELEASE_LOCK() calls return 1 (success). Before MySQL 5.7.5, the second GET_LOCK() releases the first lock ('lock1') and the second RELEASE_LOCK() returns NULL (failure) because there is no 'lock1' to release.

MySQL 5.7.5 and later enforces a maximum length on lock names of 64 characters. Previously, no limit was enforced.

Locks obtained with GET_LOCK() are not released when transactions commit or roll back.

GET_LOCK() can be used to implement application locks or to simulate record locks. Names are locked on a server-wide basis. If a name has been locked within one session, GET_LOCK() blocks any request by another session for a lock with the same name. This enables clients that agree on a given lock name to use the name to perform cooperative advisory locking. But be aware that it also enables a client that is not among the set of cooperating clients to lock a name, either inadvertently or deliberately, and thus prevent any of the cooperating clients from locking that name. One way to reduce the likelihood of this is to use lock names that are database-specific or application-specific. For example, use lock names of the form db_name.str or app_name.str.

If multiple clients are waiting for a lock, the order in which they will acquire it is undefined. Applications should not assume that clients will acquire the lock in the same order that they issued the lock requests.

GET_LOCK() is unsafe for statement-based replication. A warning is logged if you use this function when binlog_format is set to STATEMENT.
Caution

With the capability of acquiring multiple named locks in MySQL 5.7.5, it is possible for a single statement to acquire a large number of locks. For example:

INSERT INTO ... SELECT GET_LOCK(t1.col_name) FROM t1;

These types of statements may have certain adverse effects. For example, if the statement fails part way through and rolls back, locks acquired up to the point of failure will still exist. If the intent is for there to be a correspondence between rows inserted and locks acquired, that intent will not be satisfied. Also, if it is important that locks are granted in a certain order, be aware that result set order may differ depending on which execution plan the optimizer chooses. For these reasons, it may be best to limit applications to a single lock-acquisition call per statement.

A different locking interface is available as either a plugin service or a set of user-defined functions. This interface provides lock namespaces and distinct read and write locks, unlike the interface provided by GET_LOCK() and related functions. For details, see Section 28.3.1, “The Locking Service”.

Hi Ståle Deraas,

I'm aware of the changes on 5.7.5, and the documentation you provided.

This is not an issue about the functionality change, but one specific behavior, which I don't think is expected.

1- I'm assuming that a timeout of 0 results in "get it now or move on returning 0" - which seems to work, except for one very specific case [3].

2- everything works as described in most cases - we can get multiple locks, other sessions don't get the same locks, all good

3- EXCEPT when two connections try to get AT THE SAME TIME, in parallel, the same preexisting locks. Both of the connections need to have some of the preexisting locks, as far as I can reproduce it.

Thanks,

mpn

Posted by developer:
 
Hello!

The problem is that GET_LOCK acquires the lock using MDL_context::acquire_lock() even in case of 0 timeout.
And the latter in cases when it fails to acquire lock immediately will do deadlock detection before
even checking if needs to wait for the lock at all due to timeout being reached. As result deadlocks
can be reported even with 0 timeout.

And indeed ER_LOCK_DEADLOCK is not something which is reasonable expect with 0 timeout (i.e. in case
when no wait should happen at all) unlike ER_LOCK_TIMEOUT.

IMO the problem can be fixed by avoiding deadlock detection in case when MDL_context::acquire_lock()
is called with 0 timeout.

Posted by developer:
 
Fixed in 8.0.4, 9.0.0.

Concurrent calls to GET_LOCK() could cause deadlock, even with a wait
time of 0.

Given the original bug report is in 5.7 are you intending on fixing this in 5.7?
The fix is currently only reported in 8.0/9.0 which doesn't help current users.

Posted by developer:
 
[4 Nov 2017 10:37] Mysqlsys_ww (MYSQLSYS)

Pushed into repo mysql branch mysql-5.7 5.7.21