MySQL Bugs: #101695: Deadlock behavior

Bug #101695	Deadlock behavior
Submitted:	20 Nov 2020 12:09	Modified:	20 Nov 2020 14:17
Reporter:	Niranjan R	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	5.6, 5.7, 8.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	deadlocks, Exclusive lock, unique index

Description:
Weird deadlock behavior where a transaction holds a lock on an index, and the SAME transactions also waits for a lock on the exact SAME index.

Snippet from show engine innodb status:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-11-20 11:45:02 0x7f40dd4f4700
*** (1) TRANSACTION:
TRANSACTION 225349, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 574, OS thread handle 139916881958656, query id 2294 ip-172-31-41-92.ec2.internal 172.31.41.92 root updating
update tlock set name='lmno' where name='pqrs'

*** (1) HOLDS THE LOCK(S):         ---> SAME
RECORD LOCKS space id 5 page no 5 n bits 72 index usname of table `dlock`.`tlock` trx id 225349 lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:         ---> SAME
RECORD LOCKS space id 5 page no 5 n bits 72 index usname of table `dlock`.`tlock` trx id 225349 lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 225348, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 1
MySQL thread id 566, OS thread handle 139916372268800, query id 2293 ip-172-31-41-92.ec2.internal 172.31.41.92 root updating
update tlock set name='pqrs' where name='lmno'

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 5 page no 5 n bits 72 index usname of table `dlock`.`tlock` trx id 225348 lock mode S
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 7778797a; asc wxyz;;
 1: len 4; hex 80000003; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5 page no 5 n bits 72 index usname of table `dlock`.`tlock` trx id 225348 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

*** WE ROLL BACK TRANSACTION (1)

--------------------------
If you look at the Transaction (1) : The transaction is holding an exclusive lock on index and is waiting for a lock on the exact same index.

Is this an expected behavior?

How to repeat:
== Create a table with the following DDL:
CREATE TABLE `tlock` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `phnumber` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `usname` (`name`)
);

== Load it with some data:

mysql> select * from tlock;
+----+------+----------+
| id | name | phnumber |
+----+------+----------+
|  1 | abcd |    98765 |
|  2 | pqrs |    12345 |
|  3 | wxyz |    54321 |
+----+------+----------+
3 rows in set (0.00 sec)

== Create one .SQL file with the following statements
update tlock set name='lmno' where name='pqrs';
update tlock set name='pqrs' where name='lmno';

== Execute this .SQL file from 2 different connections simultaneously.
I made use of a bash script(infinite loop) to create simultaneous connections 

Within few seconds, we can see a lot of deadlocks created.

Tested it on 8.0.22 using the above steps.
Steps to reproduce the issue on 5.6 and 5.7 is a bit different. But I was able to achieve a similar deadlock structure.

Suggested fix:
Removing the unique index 'usname' will ensure that there are no deadlocks

Hi Mr. R,

Thank you for your bug report.

However, this is not a bug.

Actually, 8.0.22 has the proper corrected output, which explains everything:

------- TRX HAS BEEN WAITING 7 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 95 page no 5 n bits 72 index usname of table `test`.`t1` trx id 27713 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

------------------
TABLE LOCK table `test`.`t1` trx id 27713 lock mode IX
RECORD LOCKS space id 95 page no 5 n bits 72 index usname of table `test`.`t1` trx id 27713 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

---TRANSACTION 27711, ACTIVE 12 sec
5 lock struct(s), heap size 1200, 5 row lock(s), undo log entries 2
MySQL thread id 11, OS thread handle 123145335353344, query id 33 localhost sinisa
TABLE LOCK table `test`.`t1` trx id 27711 lock mode IX
RECORD LOCKS space id 95 page no 5 n bits 72 index usname of table `test`.`t1` trx id 27711 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 6c6d6e6f; asc lmno;;
 1: len 4; hex 80000002; asc     ;;

RECORD LOCKS space id 95 page no 4 n bits 72 index PRIMARY of table `test`.`t1` trx id 27711 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 80000002; asc     ;;
 1: len 6; hex 000000006c3f; asc     l?;;
 2: len 7; hex 01000001700d61; asc     p a;;
 3: len 4; hex 70717273; asc pqrs;;
 4: len 4; hex 8000000c; asc     ;;

RECORD LOCKS space id 95 page no 5 n bits 72 index usname of table `test`.`t1` trx id 27711 lock mode S locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 70717273; asc pqrs;;
 1: len 4; hex 80000002; asc     ;;

RECORD LOCKS space id 95 page no 5 n bits 72 index usname of table `test`.`t1` trx id 27711 lock mode S
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 7778797a; asc wxyz;;
 1: len 4; hex 80000003; asc     ;;

------------------------------------------------------------------------------------

This is exactly the output to be expected with your test case. You have the same record twice reported. The cause is simple. Once it is locked for the existed name and second time for the new name. Since both sessions are running both DML statements simultaneously, this is the expected behaviour.

Not a bug.

Hello, 
In the bug report you wrote:

*** (1) HOLDS THE LOCK(S):         ---> SAME
RECORD LOCKS space id 5 page no 5 n bits 72 index usname of table `dlock`.`tlock` trx id 225349 lock_mode X waiting

...

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:         ---> SAME
RECORD LOCKS space id 5 page no 5 n bits 72 index usname of table `dlock`.`tlock` trx id 225349 lock_mode X waiting

and indeed this is the same lock.

But what should not be ignored is the word "waiting" at the end of each of these lines.
This means that the transaction did not yet acquired the X-lock - it's still waiting for it.

The very misleading part of the output is that the section is titled "HOLDS THE LOCK(S)", while in fact not every lock mentioned in this section is actually "held". It's still in the process of acquisition, as evidenced by the word "waiting".
The reason this lock is listed in the section titled "HOLDS THE LOCK(S)" is because for the purpose of deadlock detection algorithm, a request for an X-lock is treated very similarly to already holding an X-lock.
And this is done to correctly model the fact, that transactions which request S-lock, have to wait not only for transactions already holding an X-lock, but also for transactions which merely requested it, but still wait for it to be granted - this is done to avoid the problem of "starvation of writers by readers".
Another reason (perhaps more important for your case) which is a bit similar, is that a transaction which requests an X-lock, has to wait for its turn, giving priority to transaction which already waits for an X-lock (in mysql 5.6 and 5.7 there's simply a FIFO queue, and in 8.0 there is also CATS algorithm, but both algorithms try to ensure that new-coming transaction waits for those already in queue). This again is done to avoid starvation: this time among writers themselves.
So, given that InnoDB forces transactions to wait for a transaction which waits for an X-lock, it might mean that a deadlock is possible, in which one of the edges in the deadlock cycle involves waiting for transaction which still waits for an X-lock.
And the simplest way to model it in the deadlock checking algorithm is to treat the "X-lock being waited for" as "already held".
This later leads to a bit confusing output seen above.
But once you understand the above logic, you can easily interpret the output as saying that:

Transaction (2) already holds an S-lock, but asked for an X-lock, which it can't get granted immediately, because to avoid starving transaction (1) which is waiting for an X-lock, Transaction (2) should give priority to the Transaction (1), thus effectively Transaction (2) waits for Transaction (1), which unfortunately can't get granted the X-lock now, because its request for an X-lock is conflicting with the S-lock already held by Transaction (1).

Here's the cycle:
T2's X-lock request waits for T1's X-lock request, which waits for T2's S-lock to be released.

This could be perhaps solved by changing the implementation of "lock escalation logic": when a T2, which already has S-lock tries to escalate the permission to an X-lock, we could perhaps favour T2 to T1 when they fight over the X-lock, because T1 has to wait for T1 anyway (because of the S-lock), so we are not saving anyone from starving by favouring T1, yet we might cause a deadlock like in your case.
But this idea would be a feature request, not a bug fix, as current implementation is also correct, if suboptimal.

Thank you, Jakub.

The scenario from the bug report rewritten as a mysql-test-run script:

# This is the original bug report from Niranjan R
CREATE TABLE `tlock` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `phnumber` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `usname` (`name`)
);

INSERT INTO tlock VALUES (1,'abcd',98765),(2,'pqrs',12345),(3,'wxyz',54321);
SELECT * FROM tlock;

DELIMITER |;
CREATE PROCEDURE spammer(t INT)
 BEGIN
     WHILE 0<t DO
       SET t = t - 1;
       update tlock set name='lmno' where name='pqrs';
       update tlock set name='pqrs' where name='lmno';
     END WHILE;
 END|
DELIMITER ;|

--connect(C1,localhost,root,,test)
  --send CALL spammer(2000)

--connection default
  CALL spammer(2000);

--connection C1
  --reap

--connection default
--disconnect C1
DROP PROCEDURE spammer;
DROP TABLE tlock;

Note: this fails only sometimes, under congestion, so run it with:
    $ ./mtr --mtr-port-base=31000 --force --retry=0 --max-test-fail=0 --repeat=100 --parallel=20 --testcase-timeout=1 innodb.lock_escalation_101695
You should see several:
    mysqltest: At line 34: Query 'reap' failed.
    ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

A fix for Bug #21356 "Change lock priority so that the transaction holding S-lock gets X-lock first" will improve behaviour in scenarios like this.

Thank you Jakub.