Bug #116195 Undo log scan at startup can cause high downtime if large transaction was open
Submitted: 22 Sep 2024 9:37 Modified: 24 Sep 2024 15:30
Reporter: Shankar Iyer Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:8.0,9.0,5.7 OS:Any
Assigned to: CPU Architecture:Any

[22 Sep 2024 9:37] Shankar Iyer
Description:
During restart recovery after a mysqld crash, InnoDB scans the undo log for resurrecting table locks (trx_resurrect_locks()). A transaction or DML that updated millions of rows but had not yet committed at the time of the crash, can result in the undo log growing to GBs. MySQL is not accessible during this time and the downtime could touch seconds/minutes. 

The code in trx_undo_get_prev_rec_from_prev_page() does a single page/single threaded read at a time. With networked storage volumes in the Cloud, single threaded synchronous read can result in low bandwidth usage due to high latency. The below 2 log extracts show the downtime in current MySQL 9.0 and the significantly reduced downtime with the tested patch :-

Current MySQL 9.0 - Undo scan takes around 140 seconds.
-------------------------------------------------------
2024-09-21T17:08:47.877965Z 0 [System] [MY-010116] [Server] /usr/local/mysql/bin/mysqld (mysqld 9.0.1) starting as process 2861382
2024-09-21T17:08:49.299560Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 42 ms.
2024-09-21T17:08:49.299759Z 1 [Note] [MY-014017] [InnoDB] Transaction ID: 5905 found for resurrecting updates
2024-09-21T17:08:49.299910Z 1 [Note] [MY-014018] [InnoDB] Identified table ID: 1065 to acquire lock
2024-09-21T17:09:19.300296Z 1 [Note] [MY-014020] [InnoDB] Records read: 27838850 - Pages read: 66759
2024-09-21T17:09:49.300304Z 1 [Note] [MY-014020] [InnoDB] Records read: 57529667 - Pages read: 137960
2024-09-21T17:10:19.300571Z 1 [Note] [MY-014020] [InnoDB] Records read: 87837227 - Pages read: 210640
2024-09-21T17:10:49.300730Z 1 [Note] [MY-014020] [InnoDB] Records read: 115513100 - Pages read: 277009
2024-09-21T17:11:09.388664Z 1 [Note] [MY-014021] [InnoDB] Total records resurrected: 134217728 - Total pages read: 321734 - Total tables acquired: 1
2024-09-21T17:11:09.390356Z 1 [Note] [MY-014023] [InnoDB] Resurrected 1 transactions doing updates.
2024-09-21T17:11:09.390472Z 1 [Note] [MY-013023] [InnoDB] 1 transaction(s) which must be rolled back or cleaned up in total 134217728 row operations to undo
2024-09-21T17:11:09.390495Z 1 [Note] [MY-013024] [InnoDB] Trx id counter is 6401
2024-09-21T17:11:09.392586Z 1 [Note] [MY-012255] [InnoDB] Removed temporary tablespace data file: "ibtmp1"
2024-09-21T17:11:09.990084Z 0 [System] [MY-010931] [Server] /usr/local/mysql/bin/mysqld: ready for connections. Version: '9.0.1'  socket: '/tmp/mysql.sock'  port: 0  Source distribution.

Reduced downtime with patch - undo log scan takes 25 seconds compared to 140 seconds
--------------------------------------------------------------
2024-09-21T16:20:20.836581Z 0 [System] [MY-010116] [Server] /usr/local/mysql/bin/mysqld (mysqld 9.0.1) starting as process 2859670
2024-09-21T16:20:22.346639Z 1 [Note] [MY-013776] [InnoDB] Parallel initialization of rseg complete
2024-09-21T16:20:22.346742Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 53 ms.
2024-09-21T16:20:22.346821Z 1 [Note] [MY-014017] [InnoDB] Transaction ID: 5392 found for resurrecting updates
2024-09-21T16:20:22.347139Z 1 [Note] [MY-014018] [InnoDB] Identified table ID: 1065 to acquire lock
2024-09-21T16:20:47.385364Z 1 [Note] [MY-014021] [InnoDB] Total records resurrected: 134217728 - Total pages read: 321734 - Total tables acquired: 1
2024-09-21T16:20:47.387068Z 1 [Note] [MY-014023] [InnoDB] Resurrected 1 transactions doing updates.
2024-09-21T16:20:47.387151Z 1 [Note] [MY-013023] [InnoDB] 1 transaction(s) which must be rolled back or cleaned up in total 134217728 row operations to undo
2024-09-21T16:20:47.387167Z 1 [Note] [MY-013024] [InnoDB] Trx id counter is 5889
...
2024-09-21T16:20:48.167851Z 0 [System] [MY-010931] [Server] /usr/local/mysql/bin/mysqld: ready for connections. Version: '9.0.1'  socket: '/tmp/mysql.sock'  port: 0  Source distribution.

Filesystem cache readahead is not effective in this situation because the undo log is being read in the "reverse" order and also the System tablespace is opened in O_DIRECT mode in typical production deployments.

How to repeat:
1. Turn OFF auto-commit 

2. Run a UPDATE (or INSERT/DELETE) that updates millions of rows in a table

3. Kill mysqld when the UPDATE has completed but not committed. Or mysqld can be killed when the UPDATE has update'd few millions of rows (generated enough undo)

4. Clear file system cache (to simulate failover to new VM or if O_DIRECT is disabled).

5. Restart mysqld. 

6. Result : mysqld could spend seconds/minutes in table lock resurrection depending on how much undo was generated by active transaction(s) at time of crash.

mysql> set autocommit=off;
Query OK, 0 rows affected (0.00 sec)

mysql> update orders set status = 'B';

Query OK, 134217728 rows affected (23 min 24.14 sec)
Rows matched: 134217728  Changed: 134217728  Warnings: 0

---TRANSACTION 5905, ACTIVE 1452 sec
844139 lock struct(s), heap size 102457464, 135061866 row lock(s), undo log entries 134217728
MySQL thread id 1157, OS thread handle 130597883205312, query id 25 localhost root

mysql> select space,name,allocated_size from innodb_tablespaces;
+------------+------------------+----------------+
| space      | name             | allocated_size |
+------------+------------------+----------------+
| 4294967294 | mysql            |       26218496 |
| 4294967293 | innodb_temporary |       12582912 |
| 4294967279 | innodb_undo_001  |     5318381568 |
| 4294967151 | innodb_undo_002  |       16777216 |
|          1 | sys/sys_config   |         114688 |
|          2 | test/demo1       |         114688 |
|          3 | test/orders      |    14000599040 |
+------------+------------------+----------------+
7 rows in set (0.00 sec)

<-------- Kill mysqld now

2024-09-21T17:08:47.877965Z 0 [System] [MY-010116] [Server] /usr/local/mysql/bin/mysqld (mysqld 9.0.1) starting as process 2861382
2024-09-21T17:08:49.299560Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 42 ms.
2024-09-21T17:08:49.299759Z 1 [Note] [MY-014017] [InnoDB] Transaction ID: 5905 found for resurrecting updates
2024-09-21T17:08:49.299910Z 1 [Note] [MY-014018] [InnoDB] Identified table ID: 1065 to acquire lock
2024-09-21T17:09:19.300296Z 1 [Note] [MY-014020] [InnoDB] Records read: 27838850 - Pages read: 66759
2024-09-21T17:09:49.300304Z 1 [Note] [MY-014020] [InnoDB] Records read: 57529667 - Pages read: 137960
2024-09-21T17:10:19.300571Z 1 [Note] [MY-014020] [InnoDB] Records read: 87837227 - Pages read: 210640
2024-09-21T17:10:49.300730Z 1 [Note] [MY-014020] [InnoDB] Records read: 115513100 - Pages read: 277009
2024-09-21T17:11:09.388664Z 1 [Note] [MY-014021] [InnoDB] Total records resurrected: 134217728 - Total pages read: 321734 - Total tables acquired: 1
2024-09-21T17:11:09.390356Z 1 [Note] [MY-014023] [InnoDB] Resurrected 1 transactions doing updates.
2024-09-21T17:11:09.390472Z 1 [Note] [MY-013023] [InnoDB] 1 transaction(s) which must be rolled back or cleaned up in total 134217728 row operations to undo
2024-09-21T17:11:09.390495Z 1 [Note] [MY-013024] [InnoDB] Trx id counter is 6401
2024-09-21T17:11:09.392586Z 1 [Note] [MY-012255] [InnoDB] Removed temporary tablespace data file: "ibtmp1"
2024-09-21T17:11:09.990084Z 0 [System] [MY-010931] [Server] /usr/local/mysql/bin/mysqld: ready for connections. Version: '9.0.1'  socket: '/tmp/mysql.sock'  port: 0  Source distribution.

Suggested fix:
Use the principle of prefetch/readahead plus concurrent reads to effectively use the bandwidth of networked storage volumes in the Cloud.

--- a/storage/innobase/trx/trx0undo.cc
+++ b/storage/innobase/trx/trx0undo.cc
@@ -169,20 +169,28 @@ static trx_undo_rec_t *trx_undo_get_prev_rec_from_prev_page(
     return (nullptr);
   }

   space = page_get_space_id(undo_page);

   bool found;
   const page_size_t &page_size = fil_space_get_page_size(space, &found);

   ut_ad(found);

+  if (((prev_page_no + 1) % FSP_EXTENT_SIZE) == 0) /* Reading a new extent from end */
+  {
+    for(page_no_t i = 0; i < FSP_EXTENT_SIZE; i++) {
+      buf_read_page_background(page_id_t(space, (prev_page_no - i)),
+                               page_size, false);
+    }
+  }
+
   buf_block_t *block =
       buf_page_get(page_id_t(space, prev_page_no), page_size,
                    shared ? RW_S_LATCH : RW_X_LATCH, UT_LOCATION_HERE, mtr);

   buf_block_dbg_add_level(block, SYNC_TRX_UNDO_PAGE);

   prev_page = buf_block_get_frame(block);

   return (trx_undo_page_get_last_rec(prev_page, page_no, offset));
 }
[23 Sep 2024 10:21] MySQL Verification Team
HI Mr Iyer,

Thank you for your bug report.

However, we have to try to repeat the behaviour with and without a patch.

Hence, send us a full SQL test case ...... including CREATE TABLE and INSERT's ...... you can use sysbench for that purpose, if necessary.

Also, let us know when we should kill the daemon exactly ........

Hence, we need a fully repeatable test case.
[24 Sep 2024 0:58] Marc Reilly
To repro:

env:
- r6i.4xlarge
- 800G GP3 storage
- CNF, not sure it really matters apart from logging for NOTE messages. 
```
innodb_redo_log_capacity=10G
innodb_buffer_pool_size=10G
innodb-buffer-pool-load-at-startup=OFF
log_error_verbosity=3

```

generate sysbench table with 100m rows
 ./sysbench /usr/local/share/sysbench/oltp_write_only.lua --table-size=2000000 --mysql-db=test --tables=5 --threads=5 prepare

Use 3 terminals/sessions:

session 1;
sysbench /usr/local/share/sysbench/oltp_update_index.lua --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test --mysql-host=localhost --mysql-socket=/home/ec2-user/mysql-server/bld/mysql-test/var/tmp/mysqld.1.sock --threads=100 --rand-type=uniform --db-ps-mode=disable --tables=2 --forced-shutdown --table-size=100000000 --report-interval=1 --time=60 run

Session 2:
begin;
select * from test.sbtest1 limit 1;
update sbtest2 set pad='jdnbckjanbckjabsckjasbcjasbcjkabadvc';
# dont commit

select trx_state,trx_lock_structs,trx_lock_memory_bytes,trx_rows_locked,trx_rows_modified from information_schema.innodb_trx\G
*************************** 1. row ***************************
            trx_state: RUNNING
     trx_lock_structs: 1369865
trx_lock_memory_bytes: 154787960
      trx_rows_locked: 101369864
    trx_rows_modified: 100000000

$ ls -lahrt 90-datadir-dirty/undo_00*
-rw-r-----. 1 ec2-user ec2-user 8.9G Sep 24 00:05 90-datadir-dirty/undo_002
-rw-r-----. 1 ec2-user ec2-user  32M Sep 24 00:05 90-datadir-dirty/undo_001

Terminal 3 once session 2 completes update:
sudo kill -9 `pidof mysqld`
sudo sync; 
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"

Now start mysql

======
9.0.1 - NO PATCH
321 seconds
=======
2024-09-24T00:06:14.090823Z 1 [Note] [MY-012535] [InnoDB] Apply batch completed!
2024-09-24T00:06:40.337898Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
2024-09-24T00:06:40.338603Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
2024-09-24T00:06:40.339695Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
2024-09-24T00:06:40.339739Z 1 [Note] [MY-011980] [InnoDB] GTID recovery trx_no: 229464
2024-09-24T00:06:40.839240Z 1 [Note] [MY-013776] [InnoDB] Parallel initialization of rseg complete
2024-09-24T00:06:40.839280Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 499 ms.
2024-09-24T00:06:40.839308Z 1 [Note] [MY-014017] [InnoDB] Transaction ID: 196797 found for resurrecting updates
2024-09-24T00:06:40.840479Z 1 [Note] [MY-014018] [InnoDB] Identified table ID: 1070 to acquire lock
2024-09-24T00:07:10.839626Z 1 [Note] [MY-014020] [InnoDB] Records read: 13473673 - Pages read: 77882
2024-09-24T00:07:40.839740Z 1 [Note] [MY-014020] [InnoDB] Records read: 22791972 - Pages read: 131745
2024-09-24T00:08:10.839788Z 1 [Note] [MY-014020] [InnoDB] Records read: 31994707 - Pages read: 184940
2024-09-24T00:08:40.840256Z 1 [Note] [MY-014020] [InnoDB] Records read: 41131875 - Pages read: 237756
2024-09-24T00:09:10.840405Z 1 [Note] [MY-014020] [InnoDB] Records read: 50016463 - Pages read: 289112
2024-09-24T00:09:40.840682Z 1 [Note] [MY-014020] [InnoDB] Records read: 59531982 - Pages read: 344115
2024-09-24T00:10:10.840947Z 1 [Note] [MY-014020] [InnoDB] Records read: 68732295 - Pages read: 397296
2024-09-24T00:10:40.841725Z 1 [Note] [MY-014020] [InnoDB] Records read: 77966170 - Pages read: 450671
2024-09-24T00:11:10.842185Z 1 [Note] [MY-014020] [InnoDB] Records read: 86953520 - Pages read: 502621
2024-09-24T00:11:40.842579Z 1 [Note] [MY-014020] [InnoDB] Records read: 94717760 - Pages read: 547501
2024-09-24T00:12:02.574074Z 1 [Note] [MY-014021] [InnoDB] Total records resurrected: 100000000 - Total pages read: 577895 - Total tables acquired: 1
2024-09-24T00:12:02.575541Z 1 [Note] [MY-014023] [InnoDB] Resurrected 1 transactions doing updates.
2024-09-24T00:12:02.575613Z 1 [Note] [MY-013023] [InnoDB] 1 transaction(s) which must be rolled back or cleaned up in total 100000000 row operations to undo
2024-09-24T00:12:02.575623Z 1 [Note] [MY-013024] [InnoDB] Trx id counter is 229889

======
9.0.1 - PATCH
185 seconds
=======
2024-09-24T00:14:22.956041Z 1 [Note] [MY-012535] [InnoDB] Apply batch completed!
2024-09-24T00:14:49.324140Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
2024-09-24T00:14:49.325380Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
2024-09-24T00:14:49.326940Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
2024-09-24T00:14:49.327014Z 1 [Note] [MY-011980] [InnoDB] GTID recovery trx_no: 229464
2024-09-24T00:14:49.795611Z 1 [Note] [MY-013776] [InnoDB] Parallel initialization of rseg complete
2024-09-24T00:14:49.795647Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 468 ms.
2024-09-24T00:14:49.796118Z 1 [Note] [MY-014017] [InnoDB] Transaction ID: 196797 found for resurrecting updates
2024-09-24T00:14:49.797440Z 1 [Note] [MY-014018] [InnoDB] Identified table ID: 1070 to acquire lock
2024-09-24T00:15:19.796305Z 1 [Note] [MY-014020] [InnoDB] Records read: 19529538 - Pages read: 112887
2024-09-24T00:15:49.796634Z 1 [Note] [MY-014020] [InnoDB] Records read: 35073242 - Pages read: 202735
2024-09-24T00:16:19.796816Z 1 [Note] [MY-014020] [InnoDB] Records read: 50611929 - Pages read: 292554
2024-09-24T00:16:49.797064Z 1 [Note] [MY-014020] [InnoDB] Records read: 66172933 - Pages read: 382502
2024-09-24T00:17:19.797256Z 1 [Note] [MY-014020] [InnoDB] Records read: 81735148 - Pages read: 472457
2024-09-24T00:17:49.797524Z 1 [Note] [MY-014020] [InnoDB] Records read: 97294768 - Pages read: 562397
2024-09-24T00:17:54.970497Z 1 [Note] [MY-014021] [InnoDB] Total records resurrected: 100000000 - Total pages read: 577895 - Total tables acquired: 1
2024-09-24T00:17:54.971637Z 1 [Note] [MY-014023] [InnoDB] Resurrected 1 transactions doing updates.
2024-09-24T00:17:54.971691Z 1 [Note] [MY-013023] [InnoDB] 1 transaction(s) which must be rolled back or cleaned up in total 100000000 row operations to undo
2024-09-24T00:17:54.971700Z 1 [Note] [MY-013024] [InnoDB] Trx id counter is 229889
[24 Sep 2024 0:58] Marc Reilly
======
8.4.2 - no PATCH
346 seconds
=======
2024-09-24T00:41:29.987262Z 1 [Note] [MY-012535] [InnoDB] Apply batch completed!
2024-09-24T00:41:30.088280Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
2024-09-24T00:41:30.089526Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
2024-09-24T00:41:30.090534Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
2024-09-24T00:41:30.090575Z 1 [Note] [MY-011980] [InnoDB] GTID recovery trx_no: 141309
2024-09-24T00:41:30.283803Z 1 [Note] [MY-013776] [InnoDB] Parallel initialization of rseg complete
2024-09-24T00:41:30.283837Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 193 ms.
2024-09-24T00:41:30.283863Z 1 [Note] [MY-014017] [InnoDB] Transaction ID: 103761 found for resurrecting updates
2024-09-24T00:41:30.283877Z 1 [Note] [MY-014018] [InnoDB] Identified table ID: 1070 to acquire lock
2024-09-24T00:42:00.284008Z 1 [Note] [MY-014020] [InnoDB] Records read: 8129300 - Pages read: 46990
2024-09-24T00:42:30.284406Z 1 [Note] [MY-014020] [InnoDB] Records read: 17251590 - Pages read: 99720
2024-09-24T00:43:00.284578Z 1 [Note] [MY-014020] [InnoDB] Records read: 26085143 - Pages read: 150781
2024-09-24T00:43:30.284725Z 1 [Note] [MY-014020] [InnoDB] Records read: 34698467 - Pages read: 200569
2024-09-24T00:44:00.284999Z 1 [Note] [MY-014020] [InnoDB] Records read: 43218371 - Pages read: 249817
2024-09-24T00:44:30.285106Z 1 [Note] [MY-014020] [InnoDB] Records read: 51949335 - Pages read: 300285
2024-09-24T00:45:00.285527Z 1 [Note] [MY-014020] [InnoDB] Records read: 60759360 - Pages read: 351210
2024-09-24T00:45:30.285652Z 1 [Note] [MY-014020] [InnoDB] Records read: 69414377 - Pages read: 401239
2024-09-24T00:46:00.286081Z 1 [Note] [MY-014020] [InnoDB] Records read: 77817333 - Pages read: 449811
2024-09-24T00:46:30.286579Z 1 [Note] [MY-014020] [InnoDB] Records read: 86644658 - Pages read: 500836
2024-09-24T00:47:00.286772Z 1 [Note] [MY-014020] [InnoDB] Records read: 95372162 - Pages read: 551284
2024-09-24T00:47:16.761724Z 1 [Note] [MY-014021] [InnoDB] Total records resurrected: 100000000 - Total pages read: 577895 - Total tables acquired: 1
2024-09-24T00:47:16.762840Z 1 [Note] [MY-014023] [InnoDB] Resurrected 1 transactions doing updates.
2024-09-24T00:47:16.762892Z 1 [Note] [MY-013023] [InnoDB] 1 transaction(s) which must be rolled back or cleaned up in total 100000000 row operations to undo

======
8.4.2 - PATCH
192 seconds
=======
2024-09-24T00:48:52.335807Z 1 [Note] [MY-012535] [InnoDB] Apply batch completed!
2024-09-24T00:48:52.436728Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
2024-09-24T00:48:52.437852Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
2024-09-24T00:48:52.438925Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
2024-09-24T00:48:52.438963Z 1 [Note] [MY-011980] [InnoDB] GTID recovery trx_no: 141309
2024-09-24T00:48:52.617756Z 1 [Note] [MY-013776] [InnoDB] Parallel initialization of rseg complete
2024-09-24T00:48:52.617781Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 178 ms.
2024-09-24T00:48:52.617804Z 1 [Note] [MY-014017] [InnoDB] Transaction ID: 103761 found for resurrecting updates
2024-09-24T00:48:52.617816Z 1 [Note] [MY-014018] [InnoDB] Identified table ID: 1070 to acquire lock
2024-09-24T00:49:22.618117Z 1 [Note] [MY-014020] [InnoDB] Records read: 15951495 - Pages read: 92205
2024-09-24T00:49:52.618139Z 1 [Note] [MY-014020] [InnoDB] Records read: 31514056 - Pages read: 182162
2024-09-24T00:50:22.618265Z 1 [Note] [MY-014020] [InnoDB] Records read: 47056203 - Pages read: 272001
2024-09-24T00:50:52.618943Z 1 [Note] [MY-014020] [InnoDB] Records read: 62614612 - Pages read: 361934
2024-09-24T00:51:22.620199Z 1 [Note] [MY-014020] [InnoDB] Records read: 78180460 - Pages read: 451910
2024-09-24T00:51:52.620498Z 1 [Note] [MY-014020] [InnoDB] Records read: 93744578 - Pages read: 541876
2024-09-24T00:52:04.641167Z 1 [Note] [MY-014021] [InnoDB] Total records resurrected: 100000000 - Total pages read: 577895 - Total tables acquired: 1
2024-09-24T00:52:04.642310Z 1 [Note] [MY-014023] [InnoDB] Resurrected 1 transactions doing updates.
2024-09-24T00:52:04.642360Z 1 [Note] [MY-013023] [InnoDB] 1 transaction(s) which must be rolled back or cleaned up in total 100000000 row operations to undo
2024-09-24T00:52:04.642370Z 1 [Note] [MY-013024] [InnoDB] Trx id counter is 141569
[24 Sep 2024 1:00] Marc Reilly
Correction to sysbench gen command:
generate sysbench table with 100m rows should be
 ./sysbench /usr/local/share/sysbench/oltp_write_only.lua --table-size=100000000 --mysql-db=test --tables=2 --threads=2 prepare
[24 Sep 2024 6:21] Shankar Iyer
Thank you Mark! 

Just summarizing the storage read throughput across all logs :-

My original unpatch'ed 9.0.1
---------------------------
321734 pages x 16384 = 5.27GB in 140 seconds = 37 MB/sec

My 9.01 with optimization patch
-------------------------------
321734 pages x 16384 = 5.27GB in 25 seconds = 210 MB/sec

Mark - ======9.0.1 - NO PATCH 321 seconds=======
------------------------------ 
577895 x 16384 = 9.4GB in 321 seconds = 29 MB/sec

Mark - ====== 9.0.1 - PATCH 185 seconds =======
------------------------------ 
577895 x 16384 = 9.4GB in 185 seconds = 51 MB/sec

Mark - ======8.4.2 - no PATCH 346 seconds =======
-------------------------------
 
577895 x 16384 = 9.4GB in 346 seconds = 27 MB/sec

Mark - ======8.4.2 - PATCH 192 seconds =======
-------------------------------
577895 x 16384 = 9.4GB in 192 seconds = 50 MB/sec

The improvement in undo log scan time and corresponding reduction in downtime will depend on :-

1. VM type & storage volume bandwidth/IOPS guaranteed by Cloud provider.

2. Is MySQL is built with native AIO? If MySQL is not built with AIO, the number of concurrent page reads will be limited by the number of background read threads.
[24 Sep 2024 12:03] MySQL Verification Team
HI Mr. Iyer and Reilly,

Sorry, we could not reproduce the problem.

We followed your instructions to the last letter. 

We only used our official binaries, without any patches added.

On our machine , we changed the configuration as you indicated, while all other settings were used as they are.

On 8.0.39 we had recovery in 35 seconds, on 8.4.2 we had recovery of 33 seconds and on 9.0.1 we had a recovery of 29 seconds.

We can't repeat your problem.

Can't repeat.
[24 Sep 2024 13:29] Shankar Iyer
>On 8.0.39 we had recovery in 35 seconds, on 8.4.2 we had recovery of 33 seconds and on 9.0.1 we had a recovery of 29 seconds.

If this was tried with the Sysbench schema test case from the earlier comment : 9.4 GB read in ~30 seconds indicates 300 MB/s storage throughput. Can you let know if you were trying on instance with local SSD? Single threaded/single page synchronous read on a ultra-low latency, local NVME SSD on-prem could show good throughput. Again, if a outage of ~30 seconds can be reduced to 15 seconds by the patch, that will be still a win.

The problem reported in this ticket is more acute in Cloud and with network attached storage. Typically latencies are as high as 500us to 2ms. Thus the single threaded/single page read through reaches on 30MB-50MB/sec.
[24 Sep 2024 13:31] Shankar Iyer
Correction : Thus the single threaded/single page read throughput reaches only 30MB-50MB/sec and thus it will take time to read 5GB or 10GB or more.
[24 Sep 2024 14:19] MySQL Verification Team
Hi,

No, that is not the case here. 

What we have is InnoDB configured with  100 % ACID operation, hence there is very little work  to be done after it is killed.
[24 Sep 2024 15:30] Shankar Iyer
> What we have is InnoDB configured with  100 % ACID operation, hence there is very little work  to be done after it is killed.

Interesting. If there were no active, uncommitted transactions AND there were no dirty pages after the last checkpoint, MySQL will certainly have very little work to do for undo recovery and redo recovery. Can you quickly check if you saw messages like '1 transaction(s) which must be rolled back' in your log?

The below commit added progress reporting for undo log scan, 1 message per 30 seconds. I am guessing that was done because customers reported MySQL spending time in recovery without indicating what is going on. The patch in this current ticket is precisely for the same situation and reduces downtime.

https://github.com/mysql/mysql-server/commit/6337405f71b59ffbd7ef234ebb04ce4a641c8fbb
[24 Sep 2024 16:02] MySQL Verification Team
Hi,

Yes, we saw a couple of those messages.

But, all data and logs were fully synced on the disk and undo logs were applied very fast, due to the way we configure numbers and sizes of redo and undo logs. Some of us also use separate  filesystems especially for those logs. Some even use different type of the filesystem.

There are many details on how to optimise the application of the undo logs, which we can not share.