Description:
Hi!
Two simple tables, `member` and `order`, are related to each other through a foreign key (`order` has a MemberId column pointing to the primary key of `member`).
When running an autocommitted UPDATE query on the `member` table concurrently with a simple transaction ending with another UPDATE on the member `table` this can cause deadlocks that are of such a simple kind that they shouldn't happen.
The problem is the foreign key. It causes a shared (S) lock to be aquired for the referenced row the primary key index of the member table. When the final UPDATE of the member table in the transaction is reached, this is escalated into a exclusive (X) lock. If the foreign key is dropped then everything works fine. But you don't want to drop a foreign key in a production environment, of course...
I've tried to skip the RemovedDate condition of the UPDATE queries, but that has no effect.
I've also tried to use another isolation level of the transaction (SERIALIZED) but that didn't help either.
This has been tested with:
MySQL Server 5.6.20 (fails)
MySQL Server 5.6.27 (fails)
MySQL Server 5.7.10 (fails)
Percona XtraDB Cluster 5.6.26-74 (fails)
MariaDB 10.1 (fails really hard)
SQL Server Express 2014 (always succeeds perfectly even though run many times)
A deadlock log entry looks like this:
*** (1) TRANSACTION:
TRANSACTION 2959776709, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 206844, OS thread handle 0x7f2d9bcf3700, query id 237435182 <IP> <USER> updating
UPDATE Member SET LastLoginDate=NOW(), LoginCount=IFNULL(LoginCount,0)+1, LastLoginIP='<IP>' WHERE MemberID=229101 and RemovedDate IS NULL
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1054808 page no 4126 n bits 136 index `PRIMARY` of table `<SCHEMA>`.`member` trx id 2959776709 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 2959776706, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
16 lock struct(s), heap size 2936, 10 row lock(s), undo log entries 3
MySQL thread id 206827, OS thread handle 0x7f2d9876d700, query id 237435203 <IP> <USER> updating
update Member set LastUsedOrderID=6705560 where RemovedDate is null and MemberID=229101
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1054808 page no 4126 n bits 136 index `PRIMARY` of table `<SCHEMA>`.`member` trx id 2959776706 lock mode S locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1054808 page no 4126 n bits 136 index `PRIMARY` of table `<SCHEMA>`.`member` trx id 2959776706 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)
How to repeat:
Table set up:
=============
CREATE TABLE `test`.`member` (
`MemberId` INT NOT NULL AUTO_INCREMENT COMMENT '',
`LastLoginDate` DATETIME NULL COMMENT '',
`LoginCount` INT NULL COMMENT '',
`LastLoginIP` VARCHAR(20) NULL COMMENT '',
`LastUsedOrderId` INT NULL COMMENT '',
`CreationDate` DATETIME NOT NULL COMMENT '',
`RemovedDate` DATETIME NULL COMMENT '',
PRIMARY KEY (`MemberId`) COMMENT '');
CREATE TABLE `test`.`order` (
`OrderId` INT NOT NULL AUTO_INCREMENT COMMENT '',
`MemberId` INT NOT NULL COMMENT '',
PRIMARY KEY (`OrderId`) COMMENT '',
CONSTRAINT `FK_O_MID_M_MID`
FOREIGN KEY (`MemberId`)
REFERENCES `test`.`member` (`MemberId`)
ON DELETE NO ACTION
ON UPDATE NO ACTION);
Test set up:
============
TRUNCATE TABLE `order`;
DELETE FROM member;
-- For a suitable number of test members (~200), do the following
INSERT INTO member (CreationDate) VALUES (NOW()); SELECT LAST_INSERT_ID();
Test structure (expressed as a mix of C# code and SQL statements):
==================================================================
-- One SetLoggedIn is enough to cause a deadlock, but then it happens so rarely that one will have to run it very many times to see it. Doing it this way will cause a deadlock to occur every time you run the program (will attach the full C# code for causing the deadlock). Our production environment causes this problem to occur at least once per hour.
memberIds.ForEach(mid =>
Parallel.Invoke(
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SetLoggedIn(mid),
() => SaveCart(mid),
() => Console.WriteLine(mid)
)
);
-- SetLoggedIn SQL statement
-- -------------------------
UPDATE member SET LastLoginDate=NOW(), LoginCount=IFNULL(LoginCount,0)+1, LastLoginIP='127.0.0.1' WHERE MemberID=?p_MemberId and RemovedDate IS NULL
-- SaveCart SQL statements
-- -----------------------
START TRANSACTION
INSERT INTO `order` (MemberId) VALUES (?p_MemberId); SELECT LAST_INSERT_ID();
-- LastUsedOrderId is set to LAST_INSERT_ID()
UPDATE member SET LastUsedOrderId = ?p_OrderId WHERE MemberId = ?p_MemberId AND RemovedDate IS NULL;
COMMIT
Suggested fix:
Transaction #1 is an autocommitted UPDATE on `member`.
Transaction #2 has 1 action (INSERT) on `order`, then tries to UPDATE the same row of `member`.
It got a Shared lock and wants to escalate it to eXclusive.
I don't know what the best solution is here, I don't even understand why the shared lock is aquired in the first place. Perhaps could the SERIALIZED isolation level let foreign keys conditions aquire exclusive locks immediately?