Bug #113468 | The lock for unmatched record should be released but not in READ-COMMITTED | ||
---|---|---|---|
Submitted: | 20 Dec 2023 4:07 | Modified: | 20 Dec 2023 13:52 |
Reporter: | shengchun cao | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | lock |
[20 Dec 2023 4:07]
shengchun cao
[20 Dec 2023 13:52]
MySQL Verification Team
Hi Mr. cao, Thank you very much for your bug report. We were able to repeat it with 8.0.35. This is now a verified bug in 8.0 and 8.2.
[20 Dec 2023 18:34]
Jakub Lopuszanski
Hi, nice analysis! While I see how the patch you proposed solves your particular use case, it is unfortunately introducing a severe correctness bug. The reason why we distinguish DB_SUCCESS_LOCKED_REC from DB_SUCCESS is because we need to be sure that a new lock request was granted to satisfy this particular read of the row from this particular scan as opposed to a situation where the current transaction was already granted this lock request earlier, say because of some earlier statement. Consider a transaction which is similar to yours, but additionally does a select for *existing* row before asking for *non-exiting* row: ``` set transaction isolation level read committed; BEGIN; select * from t2 where i1 = 15 and i2 = 25 for share; select * from t2 where i1 = 15 and i2 = 20 for share; ``` In such case, the first SELECT might obtain the S lock on <i1=15,id=5>, and then the later SELECT will release it! That would be a bug, because the transaction should keep the S lock on the row until it commits, as it was needed for the first SELECT. For this fix to be correct, I think we would have to somehow ensure that it only applies to the case where previously we had DB_LOCK_WAIT. So, one tempting thing to do would be to add something like: ``` case DB_LOCK_WAIT: if (trx->releases_non_matching_rows()) { prebuilt->new_rec_lock[row_prebuilt_t::LOCK_PCUR] = true; } ``` however, please note that this case, as of today, has a comment warning devs to not do that: ``` case DB_LOCK_WAIT: /* Lock wait for R-tree should already be handled in sel_set_rtr_rec_lock() */ ut_ad(!dict_index_is_spatial(index)); /* Never unlock rows that were part of a conflict. */ prebuilt->new_rec_lock.reset(); ut_a(!use_semi_consistent); goto lock_wait_or_error; ``` It even tries to clear prebuilt->new_rec_lock. This comment was added 13 years ago in a fix for "Bug#53674: InnoDB: Error: unlock row could not find a 4 mode lock on the record" I don't know what was the justification for this particular rule. I can't find any discussion of it neither in the bug: https://bugs.mysql.com/bug.php?id=53674 nor in the review board: https://rb.mysql.oraclecorp.com/rb/r/368/ What I do know, though, is that the code in this area was buggy, and confused. And for DB_LOCK_WAIT it used to check for ROW_READ_TRY_SEMI_CONSISTENT flag and in such case it released the lock manually, in particular this is how the code looked like in 2022 before commit d59ea3342252: ``` case DB_LOCK_WAIT: /* Lock wait for R-tree should already be handled in sel_set_rtr_rec_lock() */ ut_ad(!dict_index_is_spatial(index)); /* Never unlock rows that were part of a conflict. */ prebuilt->new_rec_lock.reset(); if (UNIV_LIKELY(prebuilt->row_read_type != ROW_READ_TRY_SEMI_CONSISTENT) || unique_search || index != clust_index) { goto lock_wait_or_error; } ut_a(trx->allow_semi_consistent()); /* The following call returns 'offsets' associated with 'old_vers' */ row_sel_build_committed_vers_for_mysql( clust_index, prebuilt, rec, &offsets, &heap, &old_vers, need_vrow ? &vrow : nullptr, &mtr); /* Check whether it was a deadlock or not, if not a deadlock and the transaction had to wait then release the lock it is waiting on. */ DEBUG_SYNC_C("semi_consistent_read_would_wait"); err = lock_trx_handle_wait(trx); switch (err) { case DB_SUCCESS: /* The lock was granted while we were searching for the last committed version. Do a normal locking read. */ offsets = rec_get_offsets(rec, index, offsets, ULINT_UNDEFINED, UT_LOCATION_HERE, &heap); goto locks_ok; case DB_DEADLOCK: goto lock_wait_or_error; case DB_LOCK_WAIT: ut_ad(!dict_index_is_spatial(index)); err = DB_SUCCESS; break; default: ut_error; } if (old_vers == nullptr) { /* The row was not yet committed */ goto next_rec; } did_semi_consistent_read = true; rec = old_vers; prev_rec = rec; ut_d(prev_rec_debug = row_search_debug_copy_rec_order_prefix( pcur, index, prev_rec, &prev_rec_debug_n_fields, &prev_rec_debug_buf, &prev_rec_debug_buf_size)); break; ``` In particular note the call to lock_trx_handle_wait(trx) which basically cancels the waiting lock. So, perhaps we cleared the new_rec_lock array, because we knew we will remove the lock by using lock_trx_handle_wait(trx)? But then in d59ea3342252 I've simplified this, to simply use SELECT_SKIP_LOCKED, so that the wait never really happens for semi-consistent reads. So, perhaps, today, we could remove the: - /* Never unlock rows that were part of a conflict. */ - prebuilt->new_rec_lock.reset(); ? I don't know. This would require a quite long investigation. The area of semi-consistent read, and this related (but separate) mechanism of releasing locks is a very complicated and it is very easy to introduce a bug here. Also, note that in your scenario the use_semi_consistent evaluates to false (because row_read_type is not ROW_READ_TRY_SEMI_CONSISTENT, and also because index is notclust_index). For example, note this cautionary comment: ``` if (!same_user_rec && trx->releases_non_matching_rows()) { /* Since we were not able to restore the cursor on the same user record, we cannot use row_prebuilt_t::try_unlock() to unlock any records, and we must thus reset the new rec lock info. Since in lock0lock.cc we have blocked the inheriting of gap X-locks, we actually do not have any new record locks set in this case. Note that if we were able to restore on the 'same' user record, it is still possible that we were actually waiting on a delete-marked record, and meanwhile it was removed by purge and inserted again by some other user. But that is no problem, because in rec_loop we will again try to set a lock, and new_rec_lock_info in trx will be right at the end. */ prebuilt->new_rec_lock.reset(); } ``` This suggests that one has to take care of issues like "lock inheritance", delete-marked records, etc. The way it is problematic is that by the time the transaction thread wakes up, it might turn out that the lock it got acquired is actually on a delete-marked record, or moved to gap, or removed by purge, or something (I am not saying that any of this options are *really* possible, only that in general they need to be considered and tideously proven not to be a problem) All I can say now, is that your patch is wrong, and figuring out a correct patch is very difficult.
[8 Jan 2024 9:56]
MySQL Verification Team
Thank you, Jacub .......