Bug #115107 Optimization in row_purge_poss_sec Function for Undo Purge Process
Submitted: 23 May 18:54 Modified: 24 May 21:01
Reporter: Zongzhi Chen (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.7, 8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: innodb, performance

[23 May 18:54] Zongzhi Chen
Description:
During the undo purge process, in the function `row_purge_poss_sec()`, as per the comment:

> "Determines if it is possible to remove a secondary index entry.  
> Removal is possible if the secondary index entry does not refer to any not delete marked version of a clustered index record where DB_TRX_ID is newer than the purge view."

If the purge view is newer than the secondary index page's `max_trx_id`, then it shouldn't need to build the previous version because the secondary index entry's `DB_TRX_ID` is older than the page's `max_trx_id`. If the purge view is newer than the page's `max_trx_id`, it is also absolutely larger than the secondary index entry's `DB_TRX_ID`.

This logic is similar to what `lock_sec_rec_cons_read_sees()` does in `row_search_mvcc()`.

**Code Diff:**

```diff
-  if (row_purge_poss_sec(node, index, entry)) {
+  if (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+        index->table->name) || row_purge_poss_sec(node, index, entry)) {
     /* Remove the index record, which should have been
     marked for deletion. */
     if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
@@ -499,7 +500,8 @@ if possible.
     case ROW_FOUND:
       /* Before attempting to purge a record, check
       if it is safe to do so. */
-      if (row_purge_poss_sec(node, index, entry)) {
+      if (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+            index->table->name) || row_purge_poss_sec(node, index, entry)) {
         btr_cur_t *btr_cur = pcur.get_btr_cur();

         /* Only delete-marked records should be purged. */
```

**Test Case Results:**

In a simple `oltp_update_index` test case:

- **Before Optimization:**
  - `Innodb_trx_history_list_len`: 34,873,334
  - Time taken: 1,311 seconds
  - Average: 26,600 undo logs per second

- **After Optimization:**
  - `Innodb_trx_history_list_len`: 32,273,298
  - Time taken: 1,030 seconds
  - Average: 31,333 undo logs per second

This shows about an 18% improvement in performance.

I believe that if there is an older purge view blocking the purge, the optimization would be even more significant.

How to repeat:
1. Use a transaction (trx) to block the undo purge.
2. Run the `sysbench oltp_update_index` test case.
3. Measure the purge performance before applying the optimization.
4. Apply the code changes as shown in the diff.
5. Run the `sysbench oltp_update_index` test case again.
6. Measure the purge performance after applying the optimization.
7. Compare the results to see the improvement.

Suggested fix:
the code
```
-  if (row_purge_poss_sec(node, index, entry)) {
+  if (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+        index->table->name) || row_purge_poss_sec(node, index, entry)) {
     /* Remove the index record, which should have been
     marked for deletion. */
     if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
@@ -499,7 +500,8 @@ if possible.
     case ROW_FOUND:
       /* Before attempting to purge a record, check
       if it is safe to do so. */
-      if (row_purge_poss_sec(node, index, entry)) {
+      if (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+            index->table->name) || row_purge_poss_sec(node, index, entry)) {
         btr_cur_t *btr_cur = pcur.get_btr_cur();

         /* Only delete-marked records should be purged. */

```
[24 May 9:55] MySQL Verification Team
Hi Mr. Chen,

Thank you for your bug report.

This is now a verified performance improvement bug in 8.0 and higher.
[24 May 21:01] Zongzhi Chen
Yet another optimization can add to the code.

```
  switch (search_result) {
    case ROW_FOUND:
      /* Before attempting to purge a record, check
      if it is safe to do so. */
      if (row_purge_poss_sec(node, index, entry))) {
        btr_cur_t *btr_cur = pcur.get_btr_cur();

        /* Only delete-marked records should be purged. */
        if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
                                  dict_table_is_comp(index->table))) {
          ib::error(ER_IB_MSG_1008)
              << "tried to purge non-delete-marked"
                 " record"
                 " in index "
              << index->name << " of table " << index->table->name
              << ": tuple: " << *entry << ", record: "
              << rec_index_print(btr_cur_get_rec(btr_cur), index);
          pcur.close();
          ut_d(ut_error);
          ut_o(goto func_exit_no_pcur);
        }
```

as the comment said, only delete-marked record should be purged. However, this record may be non-delete-marked, such as in this case:

```
insert into t values(1, 2);
update t set c2 = 3 where c1 = 1;
update t set c2 = 4 where c1 = 1;
update t set c2 = 3 where c1 = 1;																			
```

Then the secondary index record is looks like this:
(2, 1, delete-marked), (3, 1, non-delete-marked)(4, 1, delete-marked)

when purging the update undo of this sql "update t set c2 = 4 where c1 = 1;"
the secondary record (3, 1) is non-delete-marked.

If the secondary record is non-delete-marked, the code go into the ib:error(ER_IB_MSG_1008) if this function row_purge_poss_sec() not return false. However, in function row_purge_poss_sec() doesn't check whether the record is delete of not, how could row_purge_poss_sec() know it should return false if the secondary record is non-delete-marked?

True be told, the code is really hard to understand.
Here's my best understanding of what this code is doing.
if the secondary record is delete-marked, then row_purge_poss_sec() => row_vers_old_has_index_entry() need to build previous version to check if the secondary index entry does not refer to any not delete marked version of a clustered index record where DB_TRX_ID is newer than the purge view. if not then return true.
if the secondary record is non-delete-marked, in row_vers_old_has_index_entry() function, the secondary index record is naturally equal to the cluster index record. then this function return true.

Then the result is that if the secondary record is delete marked and refer to non-delete-marked version of a clustered index or the record is non-delete-marked, row_vers_old_has_index_entry() return true, and row_purge_poss_sec() will return false.
Then the code will not go into the error path.

However, calling function row_purge_poss_sec() is a bit expensive. In function row_purge_poss_sec(), it need call row_purge_reposition_pcur() to find the cluster index record that relate to this secondary record, this function need to travel the btree and calling btr_cur_search_to_nth_level() to find it. And then calling row_vers_old_has_index_entry() to find 

So adding a simple check can avoid the consumption of calling function row_purge_poss_sec()
rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()), dict_table_is_comp(index->table))

Combine the optimization of purge view filter, the code looks like:

```
diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc
index 1d22e6b6328..379d07fdb1d 100644
--- a/storage/innobase/row/row0purge.cc
+++ b/storage/innobase/row/row0purge.cc
@@ -371,7 +371,9 @@ index tree.  Does not try to buffer the delete.
   which cannot be purged yet, requires its existence. If some requires,
   we should do nothing. */

-  if (row_purge_poss_sec(node, index, entry)) {
+  if (rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()), dict_table_is_comp(index->table)) &&
+      purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+        index->table->name) || row_purge_poss_sec(node, index, entry)) {
     /* Remove the index record, which should have been
     marked for deletion. */
     if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
@@ -499,7 +501,10 @@ if possible.
     case ROW_FOUND:
       /* Before attempting to purge a record, check
       if it is safe to do so. */
-      if (row_purge_poss_sec(node, index, entry)) {
+      if (rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()),
+            dict_table_is_comp(index->table)) &&
+          (purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur()))),
+            index->table->name) || row_purge_poss_sec(node, index, entry))) {
         btr_cur_t *btr_cur = pcur.get_btr_cur();

         /* Only delete-marked records should be purged. */
```

in my test case, oltp_read_write with 100 table, the qps improve from 224969 to 247828, and the purge speed also increased.
[27 May 10:11] MySQL Verification Team
Thank you Mr. Chen, for your new contribution.
[28 May 16:21] Jakub Lopuszanski
Impressive analysis!
I think the later version of the patch in which you first check if the secondary index record is delete marked at all is probably correct.
This made me wonder, why wasn't it written in the way you propose from the beginning?
After all, as you've said, it's much cheaper to test for delete-mark first.

I can see that before 2014 the code was first calling row_purge_poss_sec, and if it said we should remove the record, then 
- we had a debug assert that it is not delete marked
- but in release mode we would gladly remove a non-delete marked error
The fix for Bug#18631496 : MULTIPLE PURGE THREADS - CORRUPTION - ERROR IN SEC INDEX ENTRY UPDATE
changed that to use an if:
```
        if (row_purge_poss_sec(node, index, entry)) {
                /* Remove the index record, which should have been
                marked for deletion. */
-               ut_ad(REC_INFO_DELETED_FLAG
-                     & rec_get_info_bits(btr_cur_get_rec(btr_cur),
-                                         dict_table_is_comp(index->table)));
+               if (!rec_get_deleted_flag(btr_cur_get_rec(btr_cur),
+                                         dict_table_is_comp(index->table))) {
+                       fputs("InnoDB: tried to purge sec index entry not"
+                             " marked for deletion in\n"
+                             "InnoDB: ", stderr);
+                       dict_index_name_print(stderr, NULL, index);
+                       fputs("\n"
+                             "InnoDB: tuple ", stderr);
+                       dtuple_print(stderr, entry);
+                       fputs("\n"
+                             "InnoDB: record ", stderr);
+                       rec_print(stderr, btr_cur_get_rec(btr_cur), index);
+                       putc('\n', stderr);
+
+                       ut_ad(0);
+
+                       goto func_exit;
+               }

                btr_cur_pessimistic_delete(&err, FALSE, btr_cur, 0,
                                           RB_NONE, &mtr);
```
so it is no longer causing a crash, nor corruption.
The commit offers no test case, and there's none in the RB. The bug report has some non-deterministic test cases.
But, it seems that nobody ever understood clearly how exactly it can happen that `row_purge_poss_sec` results `true` for a record which is not delete marked.
Perhaps this is the reason we've decided to add an error message, so that if it happens again we can try to investigate?
So, if we apply your proposal, then it will *hide* the problem, which we have never solved, which sounds a bit bad.
Thus, I think, we should first try to use some of the non-deterministic tests that are listed in
 Bug#18631496 : MULTIPLE PURGE THREADS - CORRUPTION - ERROR IN SEC INDEX ENTRY UPDATE
 Bug 19138298 - RECORD IN INDEX WAS NOT FOUND ON ROLLBACK, TRYING TO INSERT
 Bug#18652315 	ABORT IN PURGE, ERROR IN SEC INDEX ENTRY DEL UNDO, FK 
and see if any of them can produce the error message.
Then we would could try to find the root cause.
Then fix it.
Then I think we can use your patch.

Alternative approach would be to do something like:
```
if(rec_get_deleted_flag(btr_cur_get_rec(pcur.get_btr_cur()), dict_table_is_comp(index->table)){ 
  if(purge_sys->view.changes_visible(page_get_max_trx_id(page_align(btr_cur_get_rec(pcur.get_btr_cur())))) ||
     row_purge_poss_sec(node, index, entry)){
     ....delete the row
  }
}else{
  // in debug mode assert that we wouldn't try to purge
  ut_ad(!row_purge_poss_sec(node, index, entry));
}
```
so that we could get best performance in release mode, while also keeping an eye on the problem in debug mode, so that if the problem appears in our CI we will know.

One more thing to verify in your patch is the issue of purge_sys->latch which needs to be acquired at least in S mode, to access purge_sys->view.
If it is not already acquired by this thread (I think it is not) then we should make sure we acquire it for the shortest possible duration and don't violate latching order (`./mtr --mysqld=--innodb_sync_debug=1`)

Anyway, great idea, thanks!
[29 May 10:22] MySQL Verification Team
Thank you, Jakub.