Bug #111637 Enhancement request: skip locked modifier to skip locked index records
Submitted: 30 Jun 2023 22:38 Modified: 3 Jul 2023 12:57
Reporter: David Rohan Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Locking Severity:S4 (Feature request)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[30 Jun 2023 22:38] David Rohan
Description:
Scaling an application with multiple threads/processes/instances that uses a queue-like table causes deadlocks.  A secondary index is used by queries for work, and updated when work is completed.  When the "skip locked" modifier is used, transaction 1 is performing a query for work and transaction 2 is updating rows, deadlocks occur quite frequently.

With MySQL index record locking on locking reads, T1 locks secondary index records and wants the PK index record lock, while T2 finds rows by PK and updates the secondary index value.  MySQL deletes the secondary index and inserts a new tuple in the index when the value changes.  But T1 has the secondary index record locked.

How to repeat:
Create a table with PK id and secondary index status.  In one transaction, select for update where status=<some value> skip locked matching one or more rows, while at the same time in a second transaction update the value for secondary index on at least one of the rows with secondary index locked by transaction one.

Suggested fix:
Multiple potential ways to fix or improve:

1) Change the way MySQL locks index records before evaluating the where clause.  Presumably it works this way, though, for performance and this change would be detrimental to performance and also potentially most risky.  However, it might be reasonably safe to make this change with a config setting to turn it off/on at user's discretion.
2) Change MySQL to filter locked index records and not just locked data rows when using the new skip locked modifier
3) Use shared locks instead of exclusive locks on index records, and upgrade to exclusive locks if/when necessary
4) Use a mutex or similar mechanism to acquire both SI and PK index locks simultaneously or neither, reducing or eliminating potential for deadlock.
5) Possibly other(s)?
[25 Jan 21:05] Jakub Lopuszanski
Please provide a repeatable steps to reproduce the problem.
Ideally in a form of a script.

I ask for this, because, when I try the simplest things possible, which matches the bug description like:
```
CREATE TABLE `t` (
  `id` int NOT NULL,
  `val` int NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `val` (`val`)
);
INSERT INTO t VALUES (0,0), (10,10);
--connect (con1,localhost,root,,)
  BEGIN;
  # obtain X lock on record in PK
  SELECT * FROM t WHERE id=10 FOR UPDATE;
  # obtain X lock on record in val
  SELECT * FROM t WHERE val=10 FOR UPDATE;
--connect (con2,localhost,root,,)
  BEGIN;
  SELECT * FROM t WHERE val<20 FOR UPDATE SKIP LOCKED;
```
then it works as expected: `sel_set_rec_lock()` in `row_search_mvcc()` returns `DB_SKIP_LOCKED`, and we InnoDB skips over the locked row, and returns the one which isn't locked:
```
+----+-----+
| id | val |
+----+-----+
|  0 |   0 |
+----+-----+
```
So, I guess your scenario is much more complicated.
It will suffice if you demonstrate a scenario in which `SELECT...FOR UPDATE SKIP LOCKED` is waiting for any lock - you don't have to demonstrate a deadlock. That would be already interesting, because SKIP LOCKED should prevent the SELECT from having to wait.
[16 Jul 15:46] Vikram Rathod
I think I'm facing a somewhat similar concern the deadlock appears at random when using SKIP LOCKED 

I have a test script where I was able to reproduce this (as it does not happen all the time I can provide a reproducible step)

https://stackoverflow.com/questions/78746193/random-deadlock-with-skip-update-on-using-ind...
[17 Jul 9:58] MySQL Verification Team
Hi Mr. Rathold,

Thank you for your report.

However, let us inform you kindly that this is a forum for the reports with fully reproducible test cases.

Each of these test cases should consist of the set of SQL statements that always leads to the behaviour that you are reporting.

You have not provided such a test case, yet.

Can't repeat.