Bug #118054 Potential optimization in row_ins_sec_index_entry_low(): avoid second btr_cur_search_to_nth_level() under certain conditions
Submitted: 24 Apr 16:23
Modified: 6 May 16:34
Reporter: ZHAO SONG
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S5 (Performance)
OS: Any
CPU Architecture: Any

[24 Apr 16:23] ZHAO SONG
Description:
In the current implementation of row_ins_sec_index_entry_low(), for unique secondary indexes (where duplicates are not allowed), the function performs up to three separate calls to btr_cur_search_to_nth_level():

1. The first call uses PAGE_CUR_LE with entry->n_fields_cmp = entry->n_fields to search for potential duplicates.

2. If a possible duplicate is found (cursor.low_match or cursor.up_match >= index->n_uniq), the function commits the mini-transaction (mtr_commit()).

3. A second call uses PAGE_CUR_GE with entry->n_fields_cmp = index->n_uniq to locate the first entry equal to the key (if any). During this search:

  a. For each record equal to the entry (based on index->n_uniq), an S next-key lock is acquired, regardless of whether the record is delete-marked or not.

  b. If a non-delete-marked record is found that matches entry, a duplicate error is returned.

  c. If the matching record is delete-marked, the function proceeds to the next record, locking each one in turn, until it finds either:

    c.1. Another non-delete-marked equal record (return duplicate error), or

    c.2. A record with a different key, in which case an S gap lock is acquired on that record.

4. A third call is made with PAGE_CUR_LE and entry->n_fields_cmp = entry->n_fields to find the correct insertion position.

I understand the rationale behind the current implementation:
The first call checks whether a duplicate might exist. If no potential duplicate is found, the PAGE_CUR_LE search directly yields the correct insertion position, and the entry can be inserted immediately.
However, if a possible duplicate is detected, a second btr_cur_search_to_nth_level() call is made to iterate over all records equal to the entry and acquire locks on them. This continues until either a non-delete-marked record is found (indicating a duplicate error), or a different key is reached (meaning no real duplicate exists).
In the latter case, a third call to btr_cur_search_to_nth_level() is then issued to locate the proper insert position and perform the actual insert.

However, this involves two additional full B+Tree traversals (the second and third searches) whenever a possible duplicate is detected.
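
To make the control flow concrete, below is a small self-contained C++ model of the logic described above. It is only a sketch, not InnoDB code: Record, search(), and the commented-out lock calls are stand-ins invented for illustration, a sorted std::vector plays the role of the B+Tree, and each call to search() counts as one btr_cur_search_to_nth_level() traversal (the real code positions cursors with PAGE_CUR_LE/PAGE_CUR_GE; a GE search is close enough for this model).

#include <algorithm>
#include <cstdio>
#include <vector>

struct Record {
  int key;             // the unique column (think col_b)
  bool delete_marked;  // true for purge-pending "ghost" entries
};

// Stand-in for one btr_cur_search_to_nth_level() traversal: position on
// the first record whose key is >= the search key.
static std::vector<Record>::iterator search(std::vector<Record> &idx, int key,
                                            int &traversals) {
  ++traversals;
  return std::lower_bound(idx.begin(), idx.end(), key,
                          [](const Record &r, int k) { return r.key < k; });
}

enum class Result { INSERTED, DUPLICATE };

// Mirrors the up-to-three-search structure described above.
static Result insert_unique(std::vector<Record> &idx, int key,
                            int &traversals) {
  // Search 1: detect whether an equal key *might* already exist.
  auto pos = search(idx, key, traversals);
  bool possible_dup = (pos != idx.end() && pos->key == key);

  if (possible_dup) {
    // Search 2: walk every record equal to the entry, taking an S
    // next-key lock on each (modeled as a comment), until either a live
    // record is found (duplicate) or the key changes (gap lock, no dup).
    for (auto it = search(idx, key, traversals);
         it != idx.end() && it->key == key; ++it) {
      /* s_lock_next_key(*it); */
      if (!it->delete_marked) return Result::DUPLICATE;
    }
    /* s_lock_gap(first record past the equal range); */
  }

  // Search 3: re-position (the mtr was committed in between) and insert.
  auto ins = search(idx, key, traversals);
  idx.insert(ins, Record{key, false});
  return Result::INSERTED;
}

int main() {
  // Four delete-marked entries plus one live entry for key 10: a scaled
  // down version of the "How to repeat" scenario below.
  std::vector<Record> idx = {
      {10, true}, {10, true}, {10, true}, {10, true}, {10, false}};
  int traversals = 0;
  Result r = insert_unique(idx, 10, traversals);
  std::printf("%s after %d traversals\n",
              r == Result::DUPLICATE ? "duplicate" : "inserted", traversals);
  // Prints: duplicate after 2 traversals
}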

Proposed Optimization:
In certain scenarios, the second B+Tree search can be skipped.

If the first search identifies a potential duplicate (cursor.low_match/up_match >= index->n_uniq), and the located record is equal to the key and not delete-marked, then the function can directly return a duplicate error after acquiring an S next-key lock on that record. There is no need for the second search, as its sole purpose is to:

  a. Lock that same non-delete-marked record

  b. Return a duplicate error

This optimization eliminates the second call to btr_cur_search_to_nth_level() in such cases.
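
Under the same toy model (reusing Record, search(), and Result from the sketch above), the proposed short-circuit would look roughly as follows. This is a sketch of the idea, not a patch against the actual row_ins_sec_index_entry_low():

static Result insert_unique_optimized(std::vector<Record> &idx, int key,
                                      int &traversals) {
  // Search 1, exactly as before.
  auto pos = search(idx, key, traversals);
  bool possible_dup = (pos != idx.end() && pos->key == key);

  // Proposed short-circuit: search 1 already sits on a record that is
  // equal to the entry and not delete-marked, so the outcome of search 2
  // is known in advance. Lock just that record and report the duplicate,
  // saving one full root-to-leaf traversal.
  if (possible_dup && !pos->delete_marked) {
    /* s_lock_next_key(*pos); */
    return Result::DUPLICATE;
  }

  if (possible_dup) {
    // Unchanged path: the equal range starts with delete-marked records,
    // so the second traversal and its per-record locking still happen.
    for (auto it = search(idx, key, traversals);
         it != idx.end() && it->key == key; ++it) {
      /* s_lock_next_key(*it); */
      if (!it->delete_marked) return Result::DUPLICATE;
    }
    /* s_lock_gap(first record past the equal range); */
  }

  // Search 3: re-position and insert, as before.
  auto ins = search(idx, key, traversals);
  idx.insert(ins, Record{key, false});
  return Result::INSERTED;
}

Whether the fast path applies depends on which record the first PAGE_CUR_LE search positions next to; when it lands beside a delete-marked entry, the existing slow path is kept unchanged, which is the "under certain conditions" qualifier in the bug title.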

Correctness Consideration:
Unlike the original logic, the proposed optimization does not lock all prior delete-marked records equal to entry. However, this appears safe:

  a. For concurrent INSERTs, the S next-key lock on the non-delete-marked record will still block other inserts with the same key.

  b. For concurrent UPDATEs/DELETEs, those threads will attempt to acquire X locks on the same record and will be blocked accordingly.

Therefore, holding locks on the prior delete-marked versions is unnecessary for correctness in this case.
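
Leaving gap locking aside, the two claims can be sanity-checked with a minimal lock-compatibility model (S is compatible with S; any combination involving X conflicts). All names below are invented for illustration and only record locks on the single live record are modeled:

#include <cstdio>
#include <vector>

enum Mode { S, X };

struct LiveRecord {
  std::vector<Mode> held;  // granted locks on this record

  // S is compatible with S; any combination involving X conflicts.
  bool grant(Mode req) {
    for (Mode h : held)
      if (h == X || req == X) return false;  // requester would wait
    held.push_back(req);
    return true;
  }
};

int main() {
  LiveRecord rec;  // the one non-delete-marked record with the key

  // The optimized path: S next-key lock on the live record, then the
  // duplicate error is returned while the lock is still held.
  std::printf("inserter S lock: %d\n", rec.grant(S));  // 1 (granted)

  // (a) A concurrent INSERT of the same key runs its own duplicate
  // check: its S lock is granted, it sees the live record, and it too
  // fails with a duplicate error, so uniqueness is preserved.
  std::printf("concurrent insert S lock: %d\n", rec.grant(S));  // 1

  // (b) A concurrent UPDATE/DELETE needs an X lock on the record, which
  // conflicts with the held S locks, so it blocks as claimed.
  std::printf("concurrent delete X lock: %d\n", rec.grant(X));  // 0 (waits)
}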

Benefit:
By skipping one full B+Tree traversal from root to leaf, this optimization reduces contention on the index tree and improves insert performance in high-concurrency environments where duplicate keys are frequent.

How to repeat:
We can construct an extreme example to demonstrate the described behavior and highlight the opportunity for optimization. The idea is to simulate a situation where purge is delayed, causing the secondary index to retain a large number of delete-marked entries.

Step 1: Create a Table

CREATE TABLE tbl (
  col_a INT NOT NULL,
  col_b INT DEFAULT NULL,
  PRIMARY KEY (col_a),
  UNIQUE KEY col_b (col_b)
) ENGINE=InnoDB;

Step 2: Disable undo purge by adding early-exit logic to srv_do_purge()
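
Any change that stops the purge coordinator from processing history works for this experiment. One option is sketched below; the signature of srv_do_purge() (in storage/innobase/srv/srv0srv.cc) varies across versions, so treat this as illustrative rather than an exact patch. On debug builds, SET GLOBAL innodb_purge_stop_now = ON can serve the same purpose without recompiling.

/* storage/innobase/srv/srv0srv.cc -- illustrative only; match the
   signature to the version being built. */
static ulint srv_do_purge(ulint *n_total_purged) {
  /* Experiment hack: report that nothing needed purging, so delete-marked
     records accumulate in the indexes instead of being removed. */
  *n_total_purged = 0;
  return 0;

  /* ... original function body, now unreachable ... */
}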

Step 3: Generate Delete-Marked Records in Secondary Index (Session A)
In Session A, execute the following loop to insert and immediately delete many rows with the same col_b value, creating thousands of delete-marked entries in the unique secondary index on col_b:

-- Session A, a runnable version of the original pseudocode:
DELIMITER //
CREATE PROCEDURE fill_delete_marks()
BEGIN
  DECLARE i INT DEFAULT 1;
  START TRANSACTION;
  WHILE i <= 9999 DO
    INSERT INTO tbl VALUES (i, 10);
    DELETE FROM tbl WHERE col_a = i;
    SET i = i + 1;
  END WHILE;
  -- Final insert that remains committed
  INSERT INTO tbl VALUES (10000, 10);
  COMMIT;
END//
DELIMITER ;
CALL fill_delete_marks();

At this point, the secondary index on col_b contains thousands of delete-marked entries with value 10, and one valid record.

Step 4: Trigger the Duplicate Detection Path (Session B)
In Session B, start a new transaction and attempt to insert another row with the same col_b value:

BEGIN;
INSERT INTO tbl VALUES (10001, 10);
This will trigger the full duplicate-checking logic inside row_ins_sec_index_entry_low(). Because the index contains many delete-marked records, the function will:

  a. Perform a second call to btr_cur_search_to_nth_level() using PAGE_CUR_GE.

  b. Traverse all delete-marked records with col_b = 10, acquiring an S next-key lock on each one, even though they are logically deleted.

Upon reaching the non-delete-marked record with col_b = 10, it will raise a duplicate entry error.
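
On a MySQL 8.0 client the failure surfaces as the usual duplicate-key error (the key name format differs slightly between versions):

ERROR 1062 (23000): Duplicate entry '10' for key 'tbl.col_b'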

As a result, Session B ends up holding locks on all delete-marked records, which is unnecessary for correctness and negatively affects concurrency.

Suggested fix:
The proposed fix has been detailed in the description section above.
[6 May 16:34] MySQL Verification Team
Thank you for your report.