Bug #110091 optimize for innodb recovery
Submitted: 16 Feb 2023 5:27    Modified: 20 Feb 2023 13:58
Reporter: alex xing (OCA)
Status: Verified    Impact on me: None
Category: MySQL Server: InnoDB storage engine    Severity: S5 (Performance)
Version: 8.0.32    OS: Any
Assigned to:    CPU Architecture: Any
Tags: Contribution

[16 Feb 2023 5:27] alex xing
Description:
For a page that is already in the buffer pool, there is no need to search the redo hash again during the apply phase of InnoDB recovery.
Current logic:
1. First check whether the page is in the buffer pool.
2. If it is in the buffer pool, search the redo hash (recv_get_rec) and apply synchronously.
3. If it is not in the buffer pool, issue a read-ahead and apply asynchronously.
My optimization point:
For step 2, we can save the recv_addr and pass it to the apply function (recv_recover_page_in_bufferpool), avoiding the second hash search.

How to repeat:
Just read the code.

Suggested fix:
We can save the recv_addr and pass it to the apply function (recv_recover_page_in_bufferpool), as in the patch below.
In fact, recv_recover_page_in_bufferpool and recv_recover_page_func have a lot in common, and much of the logic can be reused. The code I've given is not very elegant, but it illustrates the idea.
[16 Feb 2023 5:27] alex xing
A simple patch to describe the optimization.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: optimize.patch (text/plain), 8.95 KiB.

[16 Feb 2023 5:53] MySQL Verification Team
Hello Alex Xing,

Thank you for the report and contribution.

regards,
umesh
[17 Feb 2023 9:00] Jakub Lopuszanski
Thank you for contributing to MySQL!

Can you share the results of any performance tests you have performed to measure the benefit of this optimisation?

I ask because I believe the case where the page is already in the buffer pool is very rare. In particular, our code coverage tools show this code is never executed during our test runs.
```
1121      350837 : static void recv_apply_log_rec(recv_addr_t *recv_addr) {
1122      350837 :   if (recv_addr->state == RECV_DISCARDED) {
1123        1821 :     ut_a(recv_sys->n_addrs > 0);
1124        1821 :     --recv_sys->n_addrs;
1125        1821 :     return;
1126             :   }
1127             : 
1128      349016 :   bool found;
1129      349016 :   const page_id_t page_id(recv_addr->space, recv_addr->page_no);
1130             : 
1131      349016 :   const page_size_t page_size =
1132      349016 :       fil_space_get_page_size(recv_addr->space, &found);
1133             : 
1134      698032 :   if (!found || recv_sys->missing_ids.find(recv_addr->space) !=
1135      349016 :                     recv_sys->missing_ids.end()) {
1136             :     /* Tablespace was discarded or dropped after changes were
1137             :     made to it. Or, we have ignored redo log for this tablespace
1138             :     earlier and somehow it has been found now. We can't apply
1139             :     this redo log out of order. */
1140             : 
1141           0 :     recv_addr->state = RECV_PROCESSED;
1142             : 
1143           0 :     ut_a(recv_sys->n_addrs > 0);
1144           0 :     --recv_sys->n_addrs;
1145             : 
1146             :     /* If the tablespace has been explicitly deleted, we
1147             :     can safely ignore it. */
1148             : 
1149           0 :     if (recv_sys->deleted.find(recv_addr->space) == recv_sys->deleted.end()) {
1150           0 :       recv_sys->missing_ids.insert(recv_addr->space);
1151             :     }
1152             : 
1153      349016 :   } else if (recv_addr->state == RECV_NOT_PROCESSED) {
1154       51470 :     mutex_exit(&recv_sys->mutex);
1155             : 
1156       51470 :     if (buf_page_peek(page_id)) {
1157           0 :       mtr_t mtr;
1158             : 
1159           0 :       mtr_start(&mtr);
1160             : 
1161           0 :       buf_block_t *block;
1162             : 
1163           0 :       block =
1164           0 :           buf_page_get(page_id, page_size, RW_X_LATCH, UT_LOCATION_HERE, &mtr);
1165             : 
1166           0 :       buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
1167             : 
1168           0 :       recv_recover_page(false, block);
1169             : 
1170           0 :       mtr_commit(&mtr);
1171             : 
1172           0 :     } else {
1173       51470 :       recv_read_in_area(page_id);
1174             :     }
1175             : 
1176       51470 :     mutex_enter(&recv_sys->mutex);
1177             :   }
1178             : }
```
This intuitively makes sense to me (and I even wanted to remove this code some time ago).
The redo log records are processed in batches, and before each batch the BP is empty. 
Processing of a batch starts by iterating over the page_id -> records map, and trying to process each page_id.
If the page is not in the BP, we use `recv_read_in_area` to issue asynchronous reads for this page and for neighbouring pages which also have changes to be applied.
`recv_read_in_area` also adds a "placeholder" for each such page to the BP and changes the recv_addr->state for these pages to `RECV_BEING_READ`.
This means that even though `buf_page_peek()` would return true for such a neighbour, the `if (recv_addr->state == RECV_NOT_PROCESSED)` branch will not be taken for these neighbouring pages:
they will either still be `RECV_BEING_READ`, or, once they are read in, `buf_page_io_complete()` run by the io thread will call `recv_recover_page(true, (buf_block_t *)bpage)`, which fully recovers the page and moves its state to RECV_BEING_PROCESSED and finally RECV_PROCESSED.

As you can see, my current mental model is that this code cannot be reached at all, so if you can show a test which demonstrates that it can be executed, I'd be grateful.
Even more so, if you can show a case in which it becomes performance bottleneck.
[20 Feb 2023 13:58] alex xing
Thanks for your reply!
First, how do I download the code coverage tools you mentioned? I'd like to study them.
Second, if a page has been read ahead but redo apply hasn't started yet, the patch might help, although as you said this scenario is rare.
I haven't run crash recovery tests yet, but I have been working on redo replication lately, and there this optimization is really useful.