Description:
In the current implementation of row_ins_sec_index_entry_low(), for unique secondary indexes (where duplicates are not allowed), the function performs up to three separate calls to btr_cur_search_to_nth_level():
1. The first call uses PAGE_CUR_LE with entry->n_fields_cmp = entry->n_fields to search for potential duplicates.
2. If a possible duplicate is found (cursor.low_match or cursor.up_match >= index->n_uniq), the function commits the mini-transaction (mtr_commit()).
3. A second call uses PAGE_CUR_GE with entry->n_fields_cmp = index->n_uniq to locate the first entry equal to the key (if any). During this search:
a. For each record equal to the entry (based on index->n_uniq), an S next-key lock is acquired, regardless of whether the record is delete-marked.
b. If a non-delete-marked record is found that matches entry, a duplicate error is returned.
c. If the matching record is delete-marked, the function proceeds to the next record, locking each one in turn, until it finds either:
c.1. Another non-delete-marked equal record (return duplicate error), or
c.2. A record with a different key, in which case an S gap lock is acquired on that record.
4. A third call is made with PAGE_CUR_LE and entry->n_fields_cmp = entry->n_fields to find the correct insertion position.
I understand the rationale behind the current implementation:
The first call checks whether a duplicate might exist. If no potential duplicate is found, the PAGE_CUR_LE search directly yields the correct insertion position, and the entry can be inserted immediately.
However, if a possible duplicate is detected, a second btr_cur_search_to_nth_level() call is made to iterate over all records equal to the entry and acquire locks on them. This continues until either a non-delete-marked record is found (indicating a duplicate error), or a different key is reached (meaning no real duplicate exists).
In the latter case, a third call to btr_cur_search_to_nth_level() is then issued to locate the proper insert position and perform the actual insert.
However, when a possible duplicate is detected, this scheme performs two additional full B+Tree traversals (the second and third searches) on top of the initial one.
Proposed Optimization:
In certain scenarios, the second B+Tree search can be skipped.
If the first search identifies a potential duplicate (cursor.low_match/up_match >= index->n_uniq), and the record the cursor is positioned on is equal to the key and not delete-marked, then the function can acquire an S next-key lock on that record and directly return a duplicate error. The second search is unnecessary in this case, because its sole purpose would be to:
a. Lock that same non-delete-marked record
b. Return a duplicate error
This optimization eliminates the second call to btr_cur_search_to_nth_level() in such cases.
Correctness Consideration:
Unlike the original logic, the proposed optimization does not lock the delete-marked records equal to the entry that precede the live record. However, this appears safe:
a. For concurrent INSERTs, the S next-key lock on the non-delete-marked record will still block other inserts with the same key.
b. For concurrent UPDATEs/DELETEs, those threads will attempt to acquire X locks on the same record and will be blocked accordingly.
Therefore, holding locks on the prior delete-marked versions is unnecessary for correctness in this case.
Benefit:
By skipping one full B+Tree traversal from root to leaf, this optimization reduces contention on the index tree and improves insert performance in high-concurrency environments where duplicate keys are frequent.
How to repeat:
We can construct an extreme example to demonstrate the described behavior and highlight the opportunity for optimization. The idea is to simulate a situation where purge is delayed, causing the secondary index to retain a large number of delete-marked entries.
Step 1: Create a Table
CREATE TABLE tbl (
  col_a INT NOT NULL,
  col_b INT DEFAULT NULL,
  PRIMARY KEY (col_a),
  UNIQUE KEY col_b (col_b)
) ENGINE=InnoDB;
Step 2: Disable Undo Purge by adding break logic to srv_do_purge()
Step 3: Generate Delete-Marked Records in Secondary Index (Session A)
In Session A, execute the following loop to insert and immediately delete many rows with the same col_b value, creating thousands of delete-marked entries in the unique secondary index on col_b:
-- Runnable version of the pseudocode for Session A:
DELIMITER //
CREATE PROCEDURE gen_delete_marks()
BEGIN
  DECLARE i INT DEFAULT 1;
  START TRANSACTION;
  WHILE i < 10000 DO
    INSERT INTO tbl VALUES (i, 10);
    DELETE FROM tbl WHERE col_a = i;
    SET i = i + 1;
  END WHILE;
  -- Final insert that remains committed
  INSERT INTO tbl VALUES (10000, 10);
  COMMIT;
END//
DELIMITER ;
CALL gen_delete_marks();
At this point, the secondary index on col_b contains thousands of delete-marked entries with value 10, and one live (non-delete-marked) record.
Step 4: Trigger the Duplicate Detection Path (Session B)
In Session B, start a new transaction and attempt to insert another row with the same col_b value:
BEGIN;
INSERT INTO tbl VALUES (10001, 10);
This will trigger the full duplicate-checking logic inside row_ins_sec_index_entry_low(). Because the index contains many delete-marked records, the function will:
a. Perform a second call to btr_cur_search_to_nth_level() using PAGE_CUR_GE.
b. Traverse all delete-marked records with col_b = 10, acquiring an S next-key lock on each one, even though they are logically deleted.
c. Upon reaching the non-delete-marked record with col_b = 10, raise a duplicate-entry error.
As a result, Session B ends up holding locks on all delete-marked records, which is unnecessary for correctness and negatively affects concurrency.
Suggested fix:
The proposed fix has been detailed in the description section above.