Bug #117237 Unexpected Duplicate Error in DDL Row Log Apply
Submitted: 19 Jan 4:13 Modified: 21 Jan 11:41
Reporter: Yichang SONG (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine    Severity: S2 (Serious)
Version: 8.0.40, 8.4.3, 9.1.0    OS: Any
Assigned to:    CPU Architecture: Any
Tags: Contribution

[19 Jan 4:13] Yichang SONG
Description:
I am reporting a potential bug in MySQL's handling of inconsistencies between leaf and non-leaf B+Tree nodes during online DDL that creates a unique key (UK). Specifically, the row log apply phase does not handle these inconsistencies precisely, so insertions that are not actually duplicates can fail with unexpected duplicate-key errors.

The issue arises when the first record on a leaf page is optimistically deleted: the upper-level nodes are not updated accordingly, so the new first record on the leaf page may no longer be consistent with the separator records in the upper-level nodes.

During the duplicate check for insertions into the clustered index (see row_ins_duplicate_error_in_clust()), even if the search finds cursor->low_match >= n_unique, the record is not treated as a duplicate when page_rec_is_infimum(rec) is true. I believe the reason is that in this case the leaf page actually contains no duplicate record: the separator values in the upper-level node are not guaranteed to be consistent with the state of the leaf page, so the low_match value does not accurately reflect the leaf page's contents.

Furthermore, the row log apply process does not handle this situation precisely (refer to row0log.cc:3321). This can lead to a scenario where the states of the leaf and non-leaf nodes are inconsistent, and an insertion that should not be considered a duplicate may unexpectedly result in a duplicate error.

How to repeat:
I have developed a test case to reproduce this bug. Kindly review the attached file below.

Suggested fix:
Review the logic that decides whether the inserted entry is a duplicate.

row0log.cc:3318
```
      case ROW_OP_INSERT:
        if (dict_index_is_unique(index) &&
            (cursor.up_match >= dict_index_get_n_unique(index) ||
             cursor.low_match >= dict_index_get_n_unique(index)) &&
            (!index->n_nullable || !dtuple_contains_null(entry))) {
        duplicate:
```

We can see row0log.cc:3216
```
  if (cursor.low_match >= dict_index_get_n_unique(index) &&
      !page_rec_is_infimum(btr_cur_get_rec(&cursor))) {
```
This code logic is fine.
[19 Jan 4:16] Yichang SONG
testcase

Attachment: bug117237.test (application/octet-stream, text), 1.06 KiB.

[20 Jan 6:59] MySQL Verification Team
Hello Yichang Song,

Thank you for the report and test case.

regards,
Umesh
[21 Jan 5:46] Yichang SONG
I reproduced another scenario, related to up_match >= uk_num with rec_get_next == supremum (the record after the cursor position is the page supremum).

See the following testcase and bugfix.
[21 Jan 5:46] Yichang SONG
testcase and bugfix

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug117237.diff (application/octet-stream, text), 4.83 KiB.

[21 Jan 6:18] MySQL Verification Team
Thank you for your Contribution.

regards,
Umesh
[21 Jan 9:16] Yichang SONG
As the reporter of both Bug #116502 and Bug #117237, I would like to highlight a critical connection between these two issues.

Both bugs stem from edge cases in uniqueness validation logic during concurrent DDL operations, specifically when:

NULL values are involved in unique keys (Bug #116502).

B+Tree structural inconsistencies arise between leaf and non-leaf nodes during online DDL (Bug #117237).

For Bug #116502, the original fix included checks for NULL values (dtuple_contains_null(entry)) to avoid false uniqueness violations. However, further analysis from Bug #117237 reveals that infimum/supremum pseudo-records in B+Tree pages can also lead to incorrect up_match/low_match values.

In my opinion, a common check for determining whether an upcoming insert violates the unique constraint in an online-DDL B+Tree is:

```
  if (((btr_cur->up_match >= dict_index_get_n_unique(index) &&
        !page_rec_is_supremum(page_rec_get_next(btr_cur_get_rec(btr_cur)))) ||
       (btr_cur->low_match >= dict_index_get_n_unique(index) &&
        !page_rec_is_infimum(btr_cur_get_rec(btr_cur)))) &&
      (!index->n_nullable || !dtuple_contains_null(entry))) {
  DUPLICATE:
    error = DB_DUPLICATE_KEY;
  }
```

P.S.
In my stress-testing environment, creating a new unique key index frequently fails with duplicate-key errors. After thorough analysis, I identified and reported Bug #117237 as the root cause. During testing, I also observed that the scenarios behind Bug #116502 are affected by infimum/supremum (inf/sup) pseudo-records in B+Tree structures as well, leading to DDL failures. However, due to the extreme complexity of Bug #116502's reproduction scenario, I was unable to construct a test case that reliably reproduces it. In summary, I urge the team to carefully evaluate the common underlying issues shared by Bug #116502 and Bug #117237.
[21 Jan 11:41] Yichang SONG
Update:
The supremum-related condition in the current patch
```
(btr_cur->up_match >= dict_index_get_n_unique(index) &&
 !page_rec_is_supremum(page_rec_get_next(btr_cur_get_rec(btr_cur))))
```
is inappropriate.
The standard duplicate check in row_ins_scan_sec_index_for_duplicate() is correct, but it is heavy. The unexpected duplicate error in the supremum corner case therefore still exists, and we may have to tolerate it.