Bug #104138 mtr index lock release fail
Submitted: 29 Jun 2021 3:21
Modified: 2 Aug 2021 12:21
Reporter: jinpeng shi
Status: Can't repeat
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any

[29 Jun 2021 3:21] jinpeng shi
Description:
When the B+ tree split condition is met, the mtr holds the SX lock on the index (dict_index_t) and then calls btr_page_split_and_insert, which performs the entire B+ tree split. The process is divided into 8 steps in total, and the B+ tree structure is modified in step 4.
After step 4, btr0btr.cc:2504 calls mtr->memo_release(dict_index_get_lock(cursor->index), MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK); to release the SX lock on the index (the page locks on the tree nodes are not released). The index modification is complete at this point, so locks should be held at the smallest granularity. The page modifications in the remaining steps are done while holding only the page locks.
The problem is at btr0btr.cc:2504. The lock type passed in is MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK. The coder's intention appears to be that if the mtr holds either the X lock or the SX lock on this index at this point, it is released. However, the lock type passed to memo_release must be a single type, such as MTR_MEMO_X_LOCK (128) or MTR_MEMO_SX_LOCK (256), whereas MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK is 384. As a result, when the mtr's Find class actually runs (see code segments B and C below), the index lock is never found and the release fails.
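
To make the arithmetic concrete, here is a minimal standalone sketch. It is not InnoDB code: the constant and function names are made up for illustration, and the values 128 and 256 are simply the ones quoted above. It only shows that an exact equality test against the combined value can match neither single flag.

```cpp
#include <cassert>

// Illustrative stand-ins, using the values quoted in this report.
constexpr unsigned long kMemoXLock = 128;   // stands in for MTR_MEMO_X_LOCK
constexpr unsigned long kMemoSxLock = 256;  // stands in for MTR_MEMO_SX_LOCK

// The combined value passed at the call site described above.
constexpr unsigned long kCombined = kMemoXLock | kMemoSxLock;

// Exact comparison, the same kind of test Find::operator() applies
// to slot->type in code segment C.
constexpr bool exact_match(unsigned long wanted, unsigned long slot_type) {
  return wanted == slot_type;
}

static_assert(kCombined == 384, "128 | 256 is 384");
static_assert(!exact_match(kCombined, kMemoXLock), "384 != 128");
static_assert(!exact_match(kCombined, kMemoSxLock), "384 != 256");
```

A slot stored with a single lock type therefore never compares equal to the combined value.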

The failure to release the index lock does not cause any exception. The index SX lock is simply held longer and is released when its mtr commits. The SX lock on the index is not mutually exclusive with the S lock taken by query threads, so in theory releasing the index lock early to shorten the rw_lock holding time would not improve query performance.

How to repeat:
Definitely reproducible.
[29 Jun 2021 3:26] jinpeng shi
code segment B:

/** Release an object in the memo stack. */
void mtr_t::memo_release(const void *object, ulint type) {
  ut_ad(m_impl.m_magic_n == MTR_MAGIC_N);
  ut_ad(is_active());

  /* We cannot release a page that has been written to in the
  middle of a mini-transaction. */
  ut_ad(!m_impl.m_modifications || type != MTR_MEMO_PAGE_X_FIX);

  Find find(object, type);
  Iterate<Find> iterator(find);

  if (!m_impl.m_memo.for_each_block_in_reverse(iterator)) {
    memo_slot_release(find.m_slot);
  }
}

code segment C:
/** Find specific object */
struct Find {
  /** Constructor */
  Find(const void *object, ulint type)
      : m_slot(), m_type(type), m_object(object) {
    ut_a(object != NULL);
  }

  /** @return false if the object was found. */
  bool operator()(mtr_memo_slot_t *slot) {
    if (m_object == slot->object && m_type == slot->type) {
      m_slot = slot;
      return (false);
    }

    return (true);
  }

  /** Slot if found */
  mtr_memo_slot_t *m_slot;

  /** Type of the object to look for */
  ulint m_type;

  /** The object instance to look for */
  const void *m_object;
};
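
The interaction between code segments B and C can be modelled outside the server. The following is a simplified sketch under stated assumptions: ModelSlot, model_memo_release and model_held_count are hypothetical names, the memo is a plain std::vector rather than InnoDB's block-based memo, and "releasing" a slot is modelled by clearing its object pointer. Only the reverse search and the exact type comparison are taken from the excerpts above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mtr_memo_slot_t: an object plus its lock type.
struct ModelSlot {
  const void *object;
  unsigned long type;
};

// Model of memo_release(): search newest-to-oldest using the same
// exact-equality test as Find::operator(), and release the slot if found.
// Unlike the real function, this returns whether anything was released.
inline bool model_memo_release(std::vector<ModelSlot> &memo,
                               const void *object, unsigned long type) {
  for (auto it = memo.rbegin(); it != memo.rend(); ++it) {
    if (it->object == object && it->type == type) {
      it->object = nullptr;  // stands in for memo_slot_release()
      return true;
    }
  }
  return false;
}

// Number of slots that still reference the given object.
inline std::size_t model_held_count(const std::vector<ModelSlot> &memo,
                                    const void *object) {
  std::size_t n = 0;
  for (const ModelSlot &slot : memo) {
    if (slot.object == object) ++n;
  }
  return n;
}
```

With a memo holding one slot of type 256 for some object, calling model_memo_release with 128 | 256 returns false and the slot stays in place, while calling it with 256 alone releases it.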
[30 Jun 2021 11:56] MySQL Verification Team
Hi Mr. shi,

Thank you for your bug report.

Sincerely, we find that your report is a very interesting one. However, you have not proved your point.

First of all, the code excerpt does not contain any usage of either MTR_MEMO_X_LOCK or MTR_MEMO_SX_LOCK, whether as OR'ed or separate lock types. Hence, you have put out a nice thesis, without any proof.

Next, we would require that your code excerpts are from our 8.0.25 release or from the latest GitHub pull.

But, most of all, we cannot verify this report without a fully repeatable test case: a test case that will unequivocally prove the lock release failure.

Hence, we are waiting for your full feedback.
[1 Jul 2021 1:39] jinpeng shi
Hello MySQL Verification Team:

I think you need to enter debug mode to find the problem.

Please use the following steps to reproduce the issue:
(1) Set a breakpoint at ./storage/innobase/btr/btr0btr.cc:2504 (mysql-8.0.25).

(2) CREATE DATABASE sbtest;

(3) Use sysbench to prepare a table:
time sysbench ./sysbench/oltp_common.lua --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=0 --mysql-db=sbtest --db-driver=mysql --tables=1 --table_size=10 --report-interval=10 --threads=8 --time=120 prepare

(4) Use sysbench to insert a large amount of data into the table, which will trigger B+ tree splits:
time sysbench ./sysbench/oltp_insert.lua --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=0 --mysql-db=sbtest --db-driver=mysql --tables=1 --table_size=10 --report-interval=10 --threads=8 --time=180 run

(5) Execution will stop at the breakpoint.

Once at the breakpoint, stepping through shows that mtr_t::memo_release uses the Iterate<Find> class to look for an index lock slot whose type equals MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK, and no such slot is ever found. memo_slot_release is therefore not called, and the index lock is not released for the remainder of the process. mtr_t::memo_release has no return value and no assertion for this case, so the failure cannot be observed without a debugger.
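
For illustration only, the sketch below shows one way a release routine could accept a combined mask and report whether it released anything, which would make this kind of miss visible to the caller. This is an assumption-laden sketch, not the MySQL implementation and not a proposed patch: SketchSlot and sketch_release_any are hypothetical names, and the memo is again modelled as a plain std::vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a memo slot: an object plus its lock type.
struct SketchSlot {
  const void *object;
  unsigned long type;
};

// Hypothetical variant: type_mask may combine several lock flags. A slot
// matches when it refers to the object and its type shares at least one
// bit with the mask. Returns true if a slot was released, so the caller
// can check or assert on the result.
inline bool sketch_release_any(std::vector<SketchSlot> &memo,
                               const void *object, unsigned long type_mask) {
  for (auto it = memo.rbegin(); it != memo.rend(); ++it) {
    if (it->object == object && (it->type & type_mask) != 0) {
      it->object = nullptr;  // stands in for releasing the slot
      return true;
    }
  }
  return false;
}
```

In this model, a slot of type 256 is released by a call with the mask 128 | 256, and a second call with the same arguments returns false because nothing is left to release.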
[1 Jul 2021 11:20] MySQL Verification Team
Hi,

Can you please reply to all the comments in our previous e-mail?

We need a test case that will result in the error. A multi-threaded test case that will show that lock release has failed, although it should not have.

A test case which is proven by debugging is not an acceptable test case.

However, you can prove your point with an SQL test case that also includes SHOW ENGINE INNODB STATUS output demonstrating the problem. There are several options for the InnoDB storage engine status output that could help us process this report further.

In short, the SQL test case could prove your point with the unnecessary lock being held or with unnecessary deadlock.

So far, your point is not proven.
[2 Aug 2021 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".