Bug #115107 | Optimization in row_purge_poss_sec Function for Undo Purge Process | ||
---|---|---|---|
Submitted: | 23 May 2024 18:54 | Modified: | 17 Oct 2024 10:01 |
Reporter: | Zongzhi Chen (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 5.7, 8.0 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | innodb, performance |
[23 May 2024 18:54]
Zongzhi Chen
[24 May 2024 9:55]
MySQL Verification Team
Hi Mr. Chen, Thank you for your bug report. This is now a verified performance improvement bug in 8.0 and higher.
[24 May 2024 21:01]
Zongzhi Chen
Yet another optimization can be added to the code:

```
switch (search_result) {
  case ROW_FOUND:
    /* Before attempting to purge a record, check
    if it is safe to do so. */
    if (row_purge_poss_sec(node, index, entry)) {
      btr_cur_t *btr_cur = pcur.get_btr_cur();

      /* Only delete-marked records should be purged. */
      if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
                                dict_table_is_comp(index->table))) {
        ib::error(ER_IB_MSG_1008)
            << "tried to purge non-delete-marked record"
               " in index " << index->name << " of table "
            << index->table->name << ": tuple: " << *entry
            << ", record: "
            << rec_index_print(btr_cur_get_rec(btr_cur), index);
        pcur.close();

        ut_d(ut_error);
        ut_o(goto func_exit_no_pcur);
      }
```

As the comment says, only delete-marked records should be purged. However, the record may be non-delete-marked, for example in this case:

```
insert into t values(1, 2);
update t set c2 = 3 where c1 = 1;
update t set c2 = 4 where c1 = 1;
update t set c2 = 3 where c1 = 1;
```

The secondary index records then look like this: (2, 1, delete-marked), (3, 1, non-delete-marked), (4, 1, delete-marked). When purging the update undo of the statement "update t set c2 = 4 where c1 = 1;", the secondary record (3, 1) is non-delete-marked.

If the secondary record is non-delete-marked, the code goes into the ib::error(ER_IB_MSG_1008) path unless row_purge_poss_sec() returns false. However, row_purge_poss_sec() does not check whether the record is delete-marked, so how can it know that it should return false when the secondary record is non-delete-marked?

Truth be told, the code is really hard to understand. Here is my best understanding of what it is doing:

If the secondary record is delete-marked, then row_purge_poss_sec() => row_vers_old_has_index_entry() needs to build the previous versions to check whether the secondary index entry refers to any non-delete-marked version of a clustered index record whose DB_TRX_ID is newer than the purge view; if it does not, row_purge_poss_sec() returns true.

If the secondary record is non-delete-marked, then in row_vers_old_has_index_entry() the secondary index record naturally matches the clustered index record, so that function returns true.

So the net result is: if the secondary record is delete-marked but refers to a non-delete-marked version of the clustered index record, or if the secondary record is non-delete-marked, row_vers_old_has_index_entry() returns true and row_purge_poss_sec() returns false, and the code does not go into the error path.

However, calling row_purge_poss_sec() is a bit expensive. It needs to call row_purge_reposition_pcur() to find the clustered index record that corresponds to this secondary record, which has to traverse the B-tree via btr_cur_search_to_nth_level(), and then it calls row_vers_old_has_index_entry() to walk the old versions. So adding a simple check can avoid the cost of calling row_purge_poss_sec():

```
rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()), dict_table_is_comp(index->table))
```

Combined with the purge-view filter optimization, the code looks like:

```
diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc
index 1d22e6b6328..379d07fdb1d 100644
--- a/storage/innobase/row/row0purge.cc
+++ b/storage/innobase/row/row0purge.cc
@@ -371,7 +371,9 @@ index tree. Does not try to buffer the delete.
   which cannot be purged yet, requires its existence. If some requires,
   we should do nothing. */
-  if (row_purge_poss_sec(node, index, entry)) {
+  if (rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()), dict_table_is_comp(index->table)) &&
+          purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+                                          index->table->name) || row_purge_poss_sec(node, index, entry)) {
     /* Remove the index record, which should have been
     marked for deletion. */
     if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
@@ -499,7 +501,10 @@ if possible.
     case ROW_FOUND:
       /* Before attempting to purge a record, check
       if it is safe to do so. */
-      if (row_purge_poss_sec(node, index, entry)) {
+      if (rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()),
+                               dict_table_is_comp(index->table)) &&
+          (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+                                           index->table->name) || row_purge_poss_sec(node, index, entry))) {
         btr_cur_t *btr_cur = pcur.get_btr_cur();
         /* Only delete-marked records should be purged. */
```

In my test case, oltp_read_write with 100 tables, QPS improved from 224969 to 247828, and the purge speed also increased.
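To make the intended evaluation order easier to follow, here is a condensed sketch of the check as a standalone predicate, following the second hunk of the diff above. The wrapper name and its placement are hypothetical, and it assumes the usual row0purge.cc context (purge_sys, btr_pcur_t, etc.); this is an illustration of the short-circuit order, not the actual patch.

```
/* Hypothetical wrapper, for illustration only: it restates the condition in
the second hunk of the diff above. Returns true when the secondary index
record positioned by pcur may be purged. */
static bool row_purge_sec_possible_fast(purge_node_t *node, dict_index_t *index,
                                        const dtuple_t *entry, btr_pcur_t &pcur) {
  const rec_t *rec = btr_cur_get_rec(pcur.get_btr_cur());

  /* Cheapest test first: a record that is not delete-marked must never be
  purged, so the expensive checks can be skipped entirely. */
  if (!rec_get_deleted_flag(rec, dict_table_is_comp(index->table))) {
    return false;
  }

  /* If every transaction that could have modified this page is already
  visible to the purge view, no version newer than the purge view can still
  need this entry, so the clustered-index dive can be skipped. */
  if (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(rec)),
                                      index->table->name)) {
    return true;
  }

  /* Otherwise fall back to the full check, which repositions on the
  clustered index and walks the old versions. */
  return row_purge_poss_sec(node, index, entry);
}
```

The point of the ordering is that the cheapest test (the delete-mark bit, already on the page) runs first, the page-level purge-view test runs second, and the clustered-index dive only runs when both cheaper tests are inconclusive.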
[27 May 2024 10:11]
MySQL Verification Team
Thank you Mr. Chen, for your new contribution.
[28 May 2024 16:21]
Jakub Lopuszanski
Impressive analysis! I think the later version of the patch, in which you first check whether the secondary index record is delete-marked at all, is probably correct.

This made me wonder: why wasn't it written the way you propose from the beginning? After all, as you've said, it's much cheaper to test for the delete mark first. I can see that before 2014 the code first called row_purge_poss_sec, and if it said we should remove the record, then we had a debug assert that it is not delete-marked, but in release mode we would gladly remove a non-delete-marked record.

The fix for Bug#18631496 : MULTIPLE PURGE THREADS - CORRUPTION - ERROR IN SEC INDEX ENTRY UPDATE changed that to use an if:

```
  if (row_purge_poss_sec(node, index, entry)) {
    /* Remove the index record, which should have been
    marked for deletion. */
-   ut_ad(REC_INFO_DELETED_FLAG
-         & rec_get_info_bits(btr_cur_get_rec(btr_cur),
-                             dict_table_is_comp(index->table)));
+   if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
+                             dict_table_is_comp(index->table))) {
+     fputs("InnoDB: tried to purge sec index entry not"
+           " marked for deletion in\n"
+           "InnoDB: ", stderr);
+     dict_index_name_print(stderr, NULL, index);
+     fputs("\n"
+           "InnoDB: tuple ", stderr);
+     dtuple_print(stderr, entry);
+     fputs("\n"
+           "InnoDB: record ", stderr);
+     rec_print(stderr, btr_cur_get_rec(btr_cur), index);
+     putc('\n', stderr);
+
+     ut_ad(0);
+
+     goto func_exit;
+   }

    btr_cur_pessimistic_delete(&err, FALSE, btr_cur, 0, RB_NONE, &mtr);
```

so it is no longer causing a crash, nor corruption. The commit offers no test case, and there's none in the RB. The bug report has some non-deterministic test cases. But it seems that nobody has ever clearly understood how exactly it can happen that `row_purge_poss_sec` returns `true` for a record which is not delete-marked. Perhaps this is the reason we decided to add an error message, so that if it happens again we can try to investigate?

So, if we apply your proposal, it will *hide* the problem, which we have never solved, which sounds a bit bad. Thus, I think, we should first try to use some of the non-deterministic tests that are listed in

Bug#18631496 : MULTIPLE PURGE THREADS - CORRUPTION - ERROR IN SEC INDEX ENTRY UPDATE
Bug 19138298 - RECORD IN INDEX WAS NOT FOUND ON ROLLBACK, TRYING TO INSERT
Bug#18652315 ABORT IN PURGE, ERROR IN SEC INDEX ENTRY DEL UNDO, FK

and see if any of them can produce the error message. Then we could try to find the root cause. Then fix it. Then I think we can use your patch.

An alternative approach would be to do something like:

```
if (rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()),
                         dict_table_is_comp(index->table))) {
  if (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))))
      || row_purge_poss_sec(node, index, entry)) {
    ....delete the row
  }
} else {
  // in debug mode assert that we wouldn't try to purge
  ut_ad(!row_purge_poss_sec(node, index, entry));
}
```

so that we could get the best performance in release mode, while also keeping an eye on the problem in debug mode, so that if the problem appears in our CI we will know.

One more thing to verify in your patch is the issue of purge_sys->latch, which needs to be acquired at least in S mode to access purge_sys->view. If it is not already acquired by this thread (I think it is not), then we should make sure we acquire it for the shortest possible duration and don't violate the latching order (`./mtr --mysqld=--innodb_sync_debug=1`).

Anyway, great idea, thanks!
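On the purge_sys->latch point, a minimal sketch of that pattern could look like the following. It assumes the rw_lock_s_lock()/rw_lock_s_unlock() primitives on purge_sys->latch (their exact signatures differ between server versions) and the helper name is invented, so treat it as pseudocode for "latch only around the view read", not a drop-in change.

```
/* Sketch only (hypothetical helper). It assumes the rw_lock_s_lock() /
rw_lock_s_unlock() primitives on purge_sys->latch; their exact signatures
vary between versions. The intent is to keep the S-latched section as short
as possible around the read of purge_sys->view. */
static bool row_purge_view_sees_page(const rec_t *rec, dict_index_t *index) {
  const trx_id_t max_trx_id = page_get_max_trx_id(page_align(rec));

  rw_lock_s_lock(&purge_sys->latch);
  const bool visible =
      purge_sys->view.changes_visible(max_trx_id, index->table->name);
  rw_lock_s_unlock(&purge_sys->latch);

  /* Nothing expensive (no B-tree dive, no undo log walk) happens while the
  latch is held, which also makes latching-order violations less likely. */
  return visible;
}
```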
[29 May 2024 10:22]
MySQL Verification Team
Thank you, Jakub.
[3 Oct 2024 8:25]
Jakub Lopuszanski
Posted by developer:

Even though the patch for this bug, inspired by this contribution, got approved in code review, the final testing revealed one big problem with this approach. It turns out that in some situations InnoDB does not update the page_max_trx_id of a secondary index page, even though the page *is* modified.

It is difficult to identify the whole scope of this problem, i.e. to compactly describe under what circumstances it can happen. This is because the logic of bumping page_max_trx_id is not tightly coupled with the logic of modifying secondary index pages, so one has to tediously look at all the places which modify a secondary page and try to find a call to a method which updates page_max_trx_id somewhere around it. Needless to say, even if one did such an analysis, it would be quite error prone and not future proof: there is nothing preventing a developer from adding new code which modifies a secondary index page, or removing code which bumps page_max_trx_id, as the two are pretty much independent actions.

"The reason" why they are independent seems to be that "conceptually" page_max_trx_id is a part of the locking system, as opposed to a part of the "B-tree page modification system". In other words, it was meant as a heuristic to quickly determine whether an implicit lock is being held on a secondary index record without having to perform an expensive dive to the primary key and the undo log chain. This in turn means that places which do not care about locking at all in our code base often do not care about page_max_trx_id either.

One such place, thankfully caught in testing, is the mysql.innodb_ddl_log table, which tracks "a todo list of cleanup steps needed if DDL operation fails". This table *is* undo logged, thus we do run purge for it. However, modifications to this table are done using the BTR_NO_LOCKING_FLAG flag (see DDL_Log_Table::insert and DDL_Log_Table::remove). When this flag is in use, the locking system is not involved at all and/or some of the lock system calls return immediately without bumping page_max_trx_id. There are many other places which use BTR_NO_LOCKING_FLAG.

One may wonder how this can be correct. After all, page_max_trx_id seems to be used not only for determining implicit locks, but also as a heuristic in MVCC scans, to avoid dives to the primary key. I think the answer must be that it is because we use BTR_NO_LOCKING_FLAG only for tables which are never accessed in parallel from more than one transaction, which is perhaps proved case-by-case by referring to various properties ("temporary tables are seen just by 1 session") or mutexes ("persistent table metadata is being modified under protection of a global mutex"). Honestly, I don't know the full list of use cases, nor have I seen a comprehensive proof of correctness.

Anyway, the bottom line is that the part of the heuristic which tries to rely on page_max_trx_id to infer that no (undo-logged) modifications have happened might err on the unsafe side.

This contribution also contains another heuristic: looking at the delete-mark flag. I think this part is fine. Perhaps we will push just this part.

One way to take advantage of the `page_max_trx_id` heuristic would be to force any operation on a secondary index page to bump it. I think this shouldn't be a big problem for performance - the page in question is already in the BP and X-latched, and the redo log record to describe it is just a few bytes. However, this looks like a rather big change in code, and thus will have to wait.
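As a rough illustration of the "force every secondary index page modification to bump it" idea, a hedged sketch is below. It assumes page_update_max_trx_id() as the bump primitive (it appears to write only when the supplied id exceeds the stored value, which makes an unconditional call safe); the hook name and where it would be called from are hypothetical.

```
/* Hypothetical hook, for illustration only: the idea is that every code path
that modifies a secondary index page would call something like this, instead
of relying on the lock system to bump PAGE_MAX_TRX_ID as a side effect. */
static void sec_page_modified(buf_block_t *block, page_zip_des_t *page_zip,
                              trx_id_t modifying_trx_id, mtr_t *mtr) {
  /* The caller already holds the page X-latched inside the mini-transaction,
  so the extra cost is an in-page write plus a few bytes of redo. The bump
  primitive is assumed to keep the maximum of the stored and supplied ids,
  so calling it unconditionally does not lose information. */
  page_update_max_trx_id(block, page_zip, modifying_trx_id, mtr);
}
```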
[3 Oct 2024 9:18]
MySQL Verification Team
Thank you Jakub for this detailed analysis.
[17 Oct 2024 10:01]
Zongzhi Chen
First of all, I acknowledge that the optimization of the delete-mark flag should indeed provide some performance improvement, and I believe the other aspects should also work as expected. If there is an mtr failure, we can investigate which mtr failed to pinpoint the issue.

However, I have a question. Currently, in row_search_mvcc(), the function lock_sec_rec_cons_read_sees() uses max_trx_id for visibility filtering, which is similar to the proposed logic here. If this logic has an issue, then row_search_mvcc() would encounter the same problem. So I investigated why row_search_mvcc() can use max_trx_id for visibility filtering without encountering issues.

After grepping through the code, I summarized that the use of BTR_NO_LOCKING_FLAG mainly appears in the following four scenarios:

1. In DDL operations, since the pages involved in the DDL process are not accessed by other transactions, BTR_NO_LOCKING_FLAG can be used when inserting data.
2. During transaction rollback, since the corresponding record lock has already been acquired during the normal execution phase, there is no need to reacquire the record lock when rolling back the insertion. Thus, BTR_NO_LOCKING_FLAG can be used.
3. In B-tree update operations, specifically in the function row_upd_sec_index_entry_low(), since the record lock is already acquired during the search phase, there is no need to acquire the record lock again during the update phase. Therefore, BTR_NO_LOCKING_FLAG can be used during the actual update.
4. In the case of temporary tables, since only the current statement accesses the table, no locking is needed, and BTR_NO_LOCKING_FLAG can be used.

I understand that in these scenarios, BTR_NO_LOCKING_FLAG is used mainly because the page won't be accessed by other transactions, or because the record lock has already been acquired earlier, so there's no need to acquire it again. In theory, the same issues could arise in row_search_mvcc(), because all these scenarios prevent concurrent access to the table. If this is the case, I believe that undo purge could follow a similar logic. If there are concerns, we could assign a very large default value to the max_trx_id field when generating a new page in the B-tree, which should resolve the issue.
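For that last suggestion, a hedged sketch of one possible reading of "seed new pages with a large max_trx_id" is below. The helper name is hypothetical, the seed id is passed in as a parameter because the exact trx_sys accessor differs between versions, and page_update_max_trx_id() is assumed as the primitive that writes PAGE_MAX_TRX_ID.

```
/* Hypothetical sketch of "seed new pages with a large max_trx_id": write a
deliberately high PAGE_MAX_TRX_ID into a freshly created secondary index
page so the purge-view heuristic errs on the conservative side. */
static void sec_page_seed_max_trx_id(buf_block_t *new_block,
                                     page_zip_des_t *page_zip,
                                     trx_id_t seed_trx_id, mtr_t *mtr) {
  /* While the purge view has not advanced past seed_trx_id, a purge thread
  consulting page_get_max_trx_id() will skip the fast path and fall back to
  the full row_purge_poss_sec() check for records on this page. */
  page_update_max_trx_id(new_block, page_zip, seed_trx_id, mtr);
}
```

The trade-off is that the fast path stays disabled for records on such a page until the purge view advances past the seed id, which is the conservative behavior intended by the suggestion.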
[17 Oct 2024 10:42]
MySQL Verification Team
Hi Mr. Chen, Our Development team is the only one who can approve or disapprove a fix.