Bug #104431 Incorrect data locks listed for INSERT
Submitted: 27 Jul 2021 22:13 Modified: 29 Jul 2021 13:25
Reporter: Daniel Nichter (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:8.0.26 OS:Any
Assigned to: CPU Architecture:Any

[27 Jul 2021 22:13] Daniel Nichter
Description:
In the following case, the INSERT holds a X,REC_NOT_GAP lock, but:

1) The lock is not shown at first
2) The lock is shown incorrectly when another statement blocks on it

For 1, neither performance_schema.data_locks nor SHOW ENGINE INNODB STATUS show the lock; they both incorrectly report zero row locks.

For 2, performance_schema.data_locks and performance_schema.data_lock_waits show incorrect (and impossible) locks; but SHOW ENGINE INNODB STATUS is correct.

How to repeat:
CREATE TABLE `lock_list_bug` (
  `id` int NOT NULL,
  `c` char(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO lock_list_bug VALUES (2, ''), (5, '');

--
-- 1st trx, THD id 58
--
BEGIN;
insert into lock_list_bug values (4, 'c');
-- do not commit

--
-- Incorrect lock list
--
select ENGINE_LOCK_ID, ENGINE_TRANSACTION_ID, THREAD_ID, EVENT_ID, OBJECT_NAME, INDEX_NAME, OBJECT_INSTANCE_BEGIN, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA from performance_schema.data_locks where object_name='lock_list_bug' ORDER BY THREAD_ID\G

*************************** 1. row ***************************
       ENGINE_LOCK_ID: 140322826896056:1151:140323342862704
ENGINE_TRANSACTION_ID: 1614689
            THREAD_ID: 58
             EVENT_ID: 78
          OBJECT_NAME: lock_list_bug
           INDEX_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140323342862704
            LOCK_TYPE: TABLE
            LOCK_MODE: IX
          LOCK_STATUS: GRANTED
            LOCK_DATA: NULL

---TRANSACTION 1614689, ACTIVE 68 sec
1 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 1
MySQL thread id 14, OS thread handle 123145460137984, query id 243 localhost root
TABLE LOCK table `test`.`lock_list_bug` trx id 1614689 lock mode IX

--
-- 2nd trx, THD id 62
--

-- Block on record lock pk id=4
update lock_list_bug set c='2' where id=4;

--
-- Incorrect lock list
--
select ENGINE_LOCK_ID, ENGINE_TRANSACTION_ID, THREAD_ID, EVENT_ID, OBJECT_NAME, INDEX_NAME, OBJECT_INSTANCE_BEGIN, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA from performance_schema.data_locks where object_name='lock_list_bug' ORDER BY THREAD_ID\G
*************************** 1. row ***************************
       ENGINE_LOCK_ID: 140322826896056:1151:140323342862704
ENGINE_TRANSACTION_ID: 1614689
            THREAD_ID: 58
             EVENT_ID: 78
          OBJECT_NAME: lock_list_bug
           INDEX_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140323342862704
            LOCK_TYPE: TABLE
            LOCK_MODE: IX
          LOCK_STATUS: GRANTED
            LOCK_DATA: NULL
*************************** 2. row ***************************
       ENGINE_LOCK_ID: 140322826896848:1151:140323342864688
ENGINE_TRANSACTION_ID: 1614690
            THREAD_ID: 62
             EVENT_ID: 44
          OBJECT_NAME: lock_list_bug
           INDEX_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140323342864688
            LOCK_TYPE: TABLE
            LOCK_MODE: IX
          LOCK_STATUS: GRANTED
            LOCK_DATA: NULL
*************************** 3. row ***************************
       ENGINE_LOCK_ID: 140322826896848:92:4:6:140323301023776
ENGINE_TRANSACTION_ID: 1614690
            THREAD_ID: 62
             EVENT_ID: 44
          OBJECT_NAME: lock_list_bug
           INDEX_NAME: PRIMARY
OBJECT_INSTANCE_BEGIN: 140323301023776
            LOCK_TYPE: RECORD
            LOCK_MODE: X,REC_NOT_GAP
          LOCK_STATUS: WAITING
            LOCK_DATA: 4
*************************** 4. row ***************************
       ENGINE_LOCK_ID: 140322826896056:92:4:6:140323301019168
ENGINE_TRANSACTION_ID: 1614689
            THREAD_ID: 62
             EVENT_ID: 44
          OBJECT_NAME: lock_list_bug
           INDEX_NAME: PRIMARY
OBJECT_INSTANCE_BEGIN: 140323301019168
            LOCK_TYPE: RECORD
            LOCK_MODE: X,REC_NOT_GAP
          LOCK_STATUS: GRANTED
            LOCK_DATA: 4

--
-- Impossible lock wait
--

select * from performance_schema.data_lock_waits\G
*************************** 1. row ***************************
                          ENGINE: INNODB
       REQUESTING_ENGINE_LOCK_ID: 140322826896848:92:4:6:140323301023776
REQUESTING_ENGINE_TRANSACTION_ID: 1614690
            REQUESTING_THREAD_ID: 62
             REQUESTING_EVENT_ID: 44
REQUESTING_OBJECT_INSTANCE_BEGIN: 140323301023776
         BLOCKING_ENGINE_LOCK_ID: 140322826896056:92:4:6:140323301019168
  BLOCKING_ENGINE_TRANSACTION_ID: 1614689
              BLOCKING_THREAD_ID: 62
               BLOCKING_EVENT_ID: 44
  BLOCKING_OBJECT_INSTANCE_BEGIN: 140323301019168
1 row in set (0.00 sec)

--
-- But ENGINE STATUS correct
--
---TRANSACTION 1614690, ACTIVE 29 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1128, 1 row lock(s)
MySQL thread id 18, OS thread handle 123145459531776, query id 246 localhost root updating
update lock_list_bug set c='2' where id=4
------- TRX HAS BEEN WAITING 29 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 92 page no 4 n bits 80 index PRIMARY of table `test`.`lock_list_bug` trx id 1614690 lock_mode X locks rec but not gap waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 128
 0: len 4; hex 80000004; asc     ;;
 1: len 6; hex 00000018a361; asc      a;;
 2: len 7; hex 810000008f0110; asc        ;;
 3: len 1; hex 63; asc c;;

------------------
TABLE LOCK table `test`.`lock_list_bug` trx id 1614690 lock mode IX
RECORD LOCKS space id 92 page no 4 n bits 80 index PRIMARY of table `test`.`lock_list_bug` trx id 1614690 lock_mode X locks rec but not gap waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 128
 0: len 4; hex 80000004; asc     ;;
 1: len 6; hex 00000018a361; asc      a;;
 2: len 7; hex 810000008f0110; asc        ;;
 3: len 1; hex 63; asc c;;

---TRANSACTION 1614689, ACTIVE 132 sec
2 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1
MySQL thread id 14, OS thread handle 123145460137984, query id 243 localhost root
TABLE LOCK table `test`.`lock_list_bug` trx id 1614689 lock mode IX
RECORD LOCKS space id 92 page no 4 n bits 80 index PRIMARY of table `test`.`lock_list_bug` trx id 1614689 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 128
 0: len 4; hex 80000004; asc     ;;
 1: len 6; hex 00000018a361; asc      a;;
 2: len 7; hex 810000008f0110; asc        ;;
 3: len 1; hex 63; asc c;;

In both data_locks and data_lock_waits, the output reports that THD id 62 is both granted and waiting on the same record lock (pk id=4), which is impossible. In reality, THD id 58 holds the record lock, and THD id 62 is waiting.

EVENT_ID is also wrong, but ENGINE_TRANSACTION_ID is correct.

//

A related issue happens when statements are executed in a slightly different order:

1. trx 1: begin; update lock_list_bug set c='' where id between 2 and 5; /* lock the gap */
2. trx 2: begin; insert into lock_list_bug values (4, '') /* blocks */
3. trx 1: commit

Then data_locks shows "X,GAP,INSERT_INTENTION" for the correct THD id (trx 2), but I think that should be "X,REC_NOT_GAP" because the INSERT statement is done, and now it's holding a record lock on pk id=4.
[28 Jul 2021 12:08] MySQL Verification Team
Hi Mr. Nichter,

Thank you for your bug report.

However, what you are reporting is considered the expected behaviour.

First of all, it takes some time for lock upgrade to be triggered. This depends on the other concurrent transactions. 

Next, what you have shown to us is that both P_S and InnoDB Status show the same X lock. If you are trying to report some inconsistence between P_S and InnoDB Status it is not clearly visible in your test case. It is possible that there is a small flaw in P_S, but it could be due to the timing of how are P_S modules designed, thus dependent a lot on the timing. That would also make the expected behaviour.

What you could do is run InnoDB status  and then P_S query immediately thereafter and analyse the differences.
[28 Jul 2021 12:26] Daniel Nichter
Hello.

Let me see if I can clarify the bug here.

This is on a test instance with no other queries running, so there are no issues of contention, timing, etc.

1.
First trx: "insert into lock_list_bug values (4, 'c');" will hold a record lock on pk id=4 (the inserted row) after it finishes executing.

2.
If 1 is correct, then the 1st bug is that neither P_S nor InnoDB status show that this active trx holds this lock. The correct P_S output would list "X,REC_NOT_GAP" on pk id=4. But my example "-- Incorrect lock list" shows that this is not listed; only the table (IX) lock is listed.

3.
But we know that record lock exists because a 2nd trx cannot modify the record.

4.
But upon trying 3 (trying to modify the record held by the first trx), we get the 2nd bug shown in "-- Impossible lock wait": P_S shows the same THD both holding (granted) and waiting for the record lock on pk=4. This is wrong because it's really the first trx that holds the lock, but this is incorrectly reported as the 2nd trx holding the lock.
[28 Jul 2021 12:40] MySQL Verification Team
Hi Mr. Nichter,

Thank you for clearing up the issue.

Your reasoning is simply incorrect.

INSERT does not set X REC_NOT_GAP locks, but it sets gap and insert intention locks. Next, since InnoDB is a MVCC engine, the concurrent transaction can not see that row, before first transactions commits.

UPDATE is the DML that sets the lock that you are looking for, so everything is working correctly.

Not a bug.
[28 Jul 2021 14:24] Daniel Nichter
I might be wrong, but I don't think that I have been proven wrong yet because no replies to this issue have included any specific details proving me wrong. By contrast, the details I have provide clearly demonstrate a bug.

Let me re-clarify.

1.
Yes, I'm well aware that INSERT obtains an insert intention lock, which is a special type of gap lock. But I was careful to write in my previous reply: "will hold a record lock on pk id=4 (the inserted row) **after it finishes executing**." (emphasis mine)

2.
Re https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html :

"each lock the gap between 4 and 7 with insert intention locks **prior to obtaining the exclusive lock on the inserted row**"
(emphasis mine)

"The following example demonstrates a transaction taking an insert intention lock **prior to obtaining an exclusive lock on the inserted record.**"
(emphasis mine)

Based on my understanding of the manual, that implies that an INSERT holds and record lock on the insert row **after** inserting/after the insert intention lock.

3.
Point 2 is further evidenced by the fact that InnoDB status shows exactly that:

---TRANSACTION 1614689, ACTIVE 132 sec
2 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1
MySQL thread id 14, OS thread handle 123145460137984, query id 243 localhost root
TABLE LOCK table `test`.`lock_list_bug` trx id 1614689 lock mode IX
RECORD LOCKS space id 92 page no 4 n bits 80 index PRIMARY of table `test`.`lock_list_bug` trx id 1614689 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 128
 0: len 4; hex 80000004; asc     ;;

Here are 3 proofs from that output:

3.1. "lock_mode X locks rec but not gap" = X,REC_NOT_GAP as I have been saying
3.2. "hex 80000004;" is the insert pk id=4
3.3. "1614689" is the correct BLOCKING_ENGINE_TRANSACTION_ID from data_lock_waits:

select * from performance_schema.data_lock_waits\G
*************************** 1. row ***************************
                          ENGINE: INNODB
       REQUESTING_ENGINE_LOCK_ID: 140322826896848:92:4:6:140323301023776
REQUESTING_ENGINE_TRANSACTION_ID: 1614690
            REQUESTING_THREAD_ID: 62
             REQUESTING_EVENT_ID: 44
REQUESTING_OBJECT_INSTANCE_BEGIN: 140323301023776
         BLOCKING_ENGINE_LOCK_ID: 140322826896056:92:4:6:140323301019168
  BLOCKING_ENGINE_TRANSACTION_ID: 1614689 <-- ** HERE ** --------------------
              BLOCKING_THREAD_ID: 62
               BLOCKING_EVENT_ID: 44
  BLOCKING_OBJECT_INSTANCE_BEGIN: 140323301019168

4.
But the bug is that neither InnoDB status nor P_S show that record lock **until** a 2nd trx/query attempts to lock the record. Only then does the lock conflict cause MySQL to show the record lock held the first trx/query.

5.
Even if, despite 3.1, 3.2, and 3.3, I'm wrong about the INSERT statement holding a record lock on the insert row after insertion, I cannot be wrong about the data_lock_waits immediately above: look at REQUESTING_THREAD_ID/REQUESTING_EVENT_ID and BLOCKING_THREAD_ID/BLOCKING_EVENT_ID: they're the same.
If that's accurate (which it is not), it would mean the 2nd trx/query both holds a rec lock *and* is waiting on the record lock that it holds.

6.
Re "Next, since InnoDB is a MVCC engine, the concurrent transaction can not see that row, before first transactions commits."

Yes, but this is irrelevant; where's why. By "see" I presume you mean SELECT (correct me if I'm wrong). In that case, trx see rows/records based on the trx isolation level. Since I didn't state the trx isolation level, I presume we're all thinking the default: REPEATABLE READ. In which case, a SELECT (without FOR SHARE/UPDATE) would be non-locking/non-blocking thanks to its consistent snapshot. Therefore, irrelevant.

If the trx isolation level was READ COMMITTED, there'd be no consistent snapshot, nor would it see the uncommitted INSERT/trx. Therefore, irrelvant.

And I think we can skip SERIALIZABLE and READ UNCOMMITTED.

But writes always access the current, committed record--if they can lock it. So yes, the UPDATE attempts to lock record pkd id=4, which is correct. In fact, this bug relies on that fact: only when the 2nd trx attempts to lock that record does MySQL begin to show the record lock held by the initial INSERT (point 4); and, moreover, it shows it incorrectly point 5).

--

If I'm wrong, it would be most helpful and clear to cite the data provided in the reproducible test case: thd ID, trx ID, event ID, etc.

If this doesn't prove the bug, then I can prove it another way, but hopefully this is sufficient.
[29 Jul 2021 8:14] Jakub Lopuszanski
Hello,
Let me explain in detail what you are observing and why it is not a bug, but simply how the "implicit locking" and "implicit to explicit" conversion are implemented by InnoDB.
I've tried to touch on implicit locking in 
https://mysqlserverteam.com/innodb-data-locking-part-2-locks/ and
https://mysqlserverteam.com/innodb-data-locking-part-2-5-locks-deeper-dive/
but when I CTRL+F for "implicit" in them, I see, that I could've done a better job of explaining how and why it works in cases like yours (which are actually the most common).

When a transaction inserts a row, InnoDB must ensure that no other transaction will see this new row, until the inserting transaction COMMITs (after all, it could change its mind, and ROLLBACK, or get forced to rollback for some other reason, like a deadlock, and then the new row will never really become a part of the database).
While creating an explicit lock object (a thing you can inspect in the performance_schema.data_locks) is one way to achieve it, it is hardly the only way to prevent others from seeing the new row.
The other mechanism InnoDB uses is implicit locking. An "implicit lock" is not an object in memory, and thus can't be inspected (at least not easily) which is why it doesn't show up in performance_schema.data_locks, nor in the SHOW ENGINE INNODB STATUS output - both of these inspection mechanisms simply iterate over explicit lock objects found in the InnoDB's lock system, and thus they'll not find implicit locks. An "implicit lock", as the name suggests, is something to be deduced, inferred, logically implied by something else.

 
The trick is actually quite simple: as we've said in the beginning - "When a transaction inserts a row, InnoDB must ensure that no other transaction will see this new row".
If you treat this not as a goal, but rather as an axiom, then you can deduce a transaction holds a lock on a row, if you can prove two facts:
1. the row was created by transaction
2. the transaction has not committed yet

Both are easy to check actually:
1. each row in InnoDB has a header, in which there is a TRX_ID field, which contains the ID of the transaction which was the last one to create or modify it (which includes: delete-marking and delete-unmarking, too). In other words, it's easy to see if the row is "authored by TRX_ID".
2. InnoDB knows which transactions are already committed (or rolled back), and which are still active

The main benefit of implicit locks is that .. they're not explicit, so they don't consume memory, and do not involve synchronization costs when "acquiring" them. 

However, both operations, while conceptually easy, have their costs:
1. requires you to already have access to the row in question, so that you can read its bytes (in particular the TRX_ID field), which means not only that you have to have the record's page in the buffer pool, not only have to latch the page, but more importantly: you have to know which record to look at! This last reason is why we don't report implicit locks - in a vast database with billions of records, there's no way to know which of them we should inspect for potential implicit locks. We'd have to have some kind of "hint list" to speed it up, but once you think it through, you'll realise that your "hint list" becomes indistinguishable from "explicit locks", so you'd destroy the whole benefit of not having to store the locks explicitly in memory
2. requires some non-trivial coordination to avoid race conditions (InnoDB needs to ensure that the answer to the question "is TRX_ID still active?" remains valid for the duration of acting upon the answer, without slowing down anyone too much)

This means, that while "acquisition" of implicit lock is for free, checking for them is still somewhat costly, so one can think of them as "optimistic" - if you ain't gonna need them, then you don't pay the cost. And when you need them, it's because you are already going to perform a costly explicit lock acquisition, so you can hide in it the cost of checking for implicit lock.

As explained in the blog series, algorithms such as deadlock detection, and checking for conflict, operate on explicit locks. 
Therefore, whenever a transaction is interested in locking a given row, first thing it has to do, is to check if another transaction already "holds implicit lock" on it, and if it does, then it converts it to an equivalent explicit lock (with LOCK_X|LOCK_REC|LOCK_REC_NOT_GAP flags).
Note that this happens from the thread which is interested in locking this record again - not from the thread which held the implicit lock.
In other words, the transaction which comes later is creating an explicit lock on behalf of the earlier transaction which had the implicit lock on it.
This is why the column THREAD_ID of performance_schema.data_locks shows the thread which was performing the implicit-to-explicit conversion,
and why the ENGINE_TRANSACTION_ID clearly shows the real owner of the explicit lock.
BTW, this part is documented in https://dev.mysql.com/doc/mysql-perfschema-excerpt/8.0/en/performance-schema-data-locks-ta... where you can read:

> ENGINE_TRANSACTION_ID
> The storage engine internal ID of the transaction that requested the lock. 
> This can be considered the owner of the lock, although the lock might still be pending, not actually granted yet  (LOCK_STATUS='WAITING').
> If the transaction has not yet performed any write operation (is still considered read only), the column contains internal data that users should not try to interpret. 
> Otherwise, the column is the transaction ID.
> For InnoDB, to obtain details about the transaction, join this column with the TRX_ID column of the INFORMATION_SCHEMA INNODB_TRX table. 

> THREAD_ID
> The thread ID of the session that created the lock. To obtain details about the thread, join this column with the THREAD_ID column of the Performance Schema threads table.
> THREAD_ID can be used together with EVENT_ID to determine the event during which the lock data structure was created in memory. (This event might have occurred before this particular lock request occurred, if the data structure is used to store multiple locks.) 

I grant you all, that this is not clearly explaining that the two are different concepts, and why they could differ in practice. But still, the docs doesn't lie, when it says that THREAD_ID has *created* the lock, but the ENGINE_TRANSACTION_ID *requested* the lock.
(And there are even more edge cases, like when the explicit lock object was already created earlier for a different row on the same page, and we are reusing it...)

When analysing lock requests, it's better to look at ENGINE_TRANSACTION_ID column only.
(The THREAD_ID column is meant as part of <THREAD_ID,EVENT_ID> pair used to join this with events tables, so that you can reconstruct the timeline and that's it.)

So, to summarize:
- the row IS locked, it's just an implicit lock, not explicit lock
- the reporting facilities only deal with explicit locks, almost by definition
- the implicit locks can get converted to explicit locks when needed, but it's done lazily, from the interested thread
- the ENGINE_TRANSACTION_ID and THREAD_ID serve two different purposes

I hope this clarifies the topic, and convinces you that there is no bug in your database :)
[29 Jul 2021 13:25] Daniel Nichter
Amazing! Thank you, Jakub!

Yes, I'm convinced--proven--that it's not a bug. And I'm happy to be proven wrong because I learned something.

I have a vague memory of InnoDB implicit vs. explicit locks--maybe from your blog posts that I had run across in the past--but a vague memory is not knowledge, so my previous level of understanding wasn't deep enough to make sense of the data.

When the 2nd trx runs into the implicit lock and converts it on behalf of the 1st trx, I wonder if it could do that so that data_locks and data_lock_waits show the new explicit lock as belonging to the 1st trx? (Not just by trx ID, but by thd and event ID, too.) It shows up in InnoDB status this way, but using P_S is far more ideal.

In any case, it makes sense now, once one understands the internal details that you shared here and blogged about. Thanks again!
[29 Jul 2021 13:49] Jakub Lopuszanski
Thank you Daniel for kind words.

As for the events infrastructure - I don't know this area, so I don't know.
One way to interpret your proposal is to somehow "patch the past", which I think is not the best idea if we want to report events timeline. IMHO it would be better to have an entry like "now, trx 17 is creating a lock on behalf of trx 15", rather than "some time ago, trx 15 got implicit lock" and try to somehow inject this into the correct place in the timeline. But, this is just my opinion. At any rate, this would be a separate feature request not directly related to current issue.

Unless you really want to join performance_schema.data_lock with some of the event tables, I advice you to ignore THREAD_ID,EVENT_ID columns.
And if you really want to join them, be aware that this will give you very limited piece of info: who and when has created the `struct lock_t` instance in which the lock resides.
And this is not as useful as you may think not only because of implicit-to-explicit conversion, but more importantly, because a single such struct is reused for many locks on the same page of the same transaction - the struct has a bitmap, as large as the number of records on the page, and we set the bits for which the lock is held. Even though we then report these "locks" as separate entries in the performance_schema.data_locks, it's just an illusion, which you can discover by looking at the OBJECT_INSTANCE_BEGIN column, which will be equal for all of them, meaning they all come from the same object instance (for which you'll find just a single creation event).