Bug #113862 problem with btree latch acquisition when updating histogram
Submitted: 2 Feb 2024 6:22 Modified: 2 Feb 2024 12:11
Reporter: Boy Zhang
Status: Can't repeat
Category: MySQL Server: InnoDB storage engine    Severity: S3 (Non-critical)
Version:    OS: Any
Assigned to:    CPU Architecture: Any

[2 Feb 2024 6:22] Boy Zhang
Description:
During histogram sampling, the index S latch is acquired before the non-leaf pages are traversed and is not released until a range has been scanned. If the cursor is restored during this process, the index S latch is requested again. S latches are not reentrant, so there is something wrong with this locking logic.

static inline void rw_lock_s_lock_func(rw_lock_t *lock, ulint pass,
                                       ut::Location location) {
  /* NOTE: As we do not know the thread ids for threads which have
  s-locked a latch, and s-lockers will be served only after waiting
  x-lock requests have been fulfilled, then if this thread already
  owns an s-lock here, it may end up in a deadlock with another thread
  which requests an x-lock here. Therefore, we will forbid recursive
  s-locking of a latch: the following assert will warn the programmer
  of the possibility of this kind of a deadlock. If we want to implement
  safe recursive s-locking, we should keep in a list the thread ids of
  the threads which have s-locked a latch. This would use some CPU
  time. */

  ut_ad(!rw_lock_own(lock, RW_LOCK_S)); /* see NOTE above */
  ut_ad(!rw_lock_own(lock, RW_LOCK_X));
  ...
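The deadlock that the NOTE above warns about can be illustrated with a small single-threaded model. This is a hypothetical sketch, not InnoDB's rw_lock_t: ToyRwLock, request_s(), request_x() and recursive_s_deadlocks() are invented names, and the model assumes the writer-preference policy the comment describes (once an X request is waiting, new S requests queue behind it).

```cpp
#include <cassert>

// Hypothetical, single-threaded model of a writer-preferring rw-lock.
// It is NOT InnoDB's rw_lock_t; it only mimics the policy from the NOTE:
// once an X request is waiting, new S requests are served after it.
struct ToyRwLock {
  int s_holders = 0;       // number of granted S locks
  bool x_waiting = false;  // an X request is queued
  bool x_held = false;     // an X lock is granted

  // Returns true if the S request is granted now, false if the caller must wait.
  bool request_s() {
    if (x_held || x_waiting) return false;  // S requests wait behind X
    ++s_holders;
    return true;
  }

  // Returns true if the X request is granted now, false if it is queued.
  bool request_x() {
    if (x_held || s_holders > 0) {
      x_waiting = true;  // X must wait for all S holders to leave
      return false;
    }
    x_held = true;
    return true;
  }
};

// Replays "A: S, B: X, A: S again" and reports whether both threads end up
// waiting on each other (A waits behind B's X; B waits for A's first S).
bool recursive_s_deadlocks() {
  ToyRwLock latch;
  bool a_first = latch.request_s();   // thread A (e.g. the sampler) takes S
  bool b_x = latch.request_x();       // thread B asks for X and is queued
  bool a_second = latch.request_s();  // thread A asks for S again (restore path)
  return a_first && !b_x && !a_second && latch.s_holders > 0;
}
```

In this model, recursive_s_deadlocks() returns true, while two back-to-back request_s() calls with no waiting writer are both granted: the hazard only appears when an X request arrives between the two S requests.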

How to repeat:
None; this is a logic issue found by code analysis.

Suggested fix:
Release the index S latch before restoring the cursor, then reacquire it and restore the cursor with (BTR_SEARCH_TREE | BTR_ALREADY_S_LATCHED) as the latch mode.
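The suggested fix can be sketched with another toy model. Everything here is hypothetical: ToyIndexLatch, toy_restore_cursor() and toy_restore_with_fix() are invented names, and TOY_SEARCH_TREE / TOY_ALREADY_S_LATCHED are placeholder flags standing in for BTR_SEARCH_TREE and BTR_ALREADY_S_LATCHED, whose real values are not assumed. The ownership check in s_lock() plays the role of the ut_ad(!rw_lock_own(lock, RW_LOCK_S)) assertion quoted above.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical flag values for illustration only; NOT InnoDB's definitions.
enum ToyLatchMode : unsigned {
  TOY_SEARCH_TREE = 1u << 0,
  TOY_ALREADY_S_LATCHED = 1u << 1,
};

// Toy per-thread view of an index S latch. The throw stands in for the
// debug assertion that forbids recursive S-locking.
struct ToyIndexLatch {
  bool owned_s = false;
  int acquisitions = 0;  // how many times s_lock() succeeded

  void s_lock() {
    if (owned_s) throw std::logic_error("recursive S-lock");
    owned_s = true;
    ++acquisitions;
  }
  void s_unlock() {
    if (!owned_s) throw std::logic_error("S-unlock without ownership");
    owned_s = false;
  }
};

// Hypothetical cursor restore: takes the index S latch itself unless the
// caller's mode says the latch is already held.
void toy_restore_cursor(ToyIndexLatch &latch, unsigned mode) {
  if (!(mode & TOY_ALREADY_S_LATCHED)) latch.s_lock();
  // ... reposition the cursor (omitted in this sketch) ...
}

// Pattern from the suggested fix: drop the latch, take it again explicitly,
// then restore with a mode that tells the callee not to latch a second time.
void toy_restore_with_fix(ToyIndexLatch &latch) {
  latch.s_unlock();
  latch.s_lock();
  toy_restore_cursor(latch, TOY_SEARCH_TREE | TOY_ALREADY_S_LATCHED);
}
```

With the latch already held, calling toy_restore_cursor(latch, TOY_SEARCH_TREE) trips the recursive-lock check, while toy_restore_with_fix(latch) ends with the latch held exactly once.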
[2 Feb 2024 12:11] MySQL Verification Team
Hi Mr. Zhang,

Thank you for your bug report.

We do accept bug reports based on code analysis. However, in that case we require a FULL and very detailed code analysis, with references to the code flow and a description of when the latch is acquired or released and when it should be acquired or released.

You also did not specify the version, nor where exactly in the code this happens.

A full test case would be preferable. If there were an error in the code, it would have observable repercussions, such as a thread waiting on the latch and timing out.

Hence, we cannot continue without such a test case.

Can't repeat.