Bug #109831 Fail to start if prepared XA transactions is larger than xarecover_st::len
Submitted: 29 Jan 12:13 Modified: 30 Jan 5:34
Reporter: zhai weixiang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: XA transactions Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[29 Jan 12:13] zhai weixiang
Description:
As we use external XA transaction on production, and we found a case that the crash recovery loops in function xa::recovery::recover_one_ht

the error log looks like this:

2023-01-29T19:56:06.450836+08:00 0 [Note] [MY-013032] [InnoDB] Starting recovery for XA transactions...
2023-01-29T19:56:07.306047+08:00 0 [Note] [MY-013035] [InnoDB] 65536 transactions in prepared state after recovery
2023-01-29T19:56:07.348717+08:00 0 [Note] [MY-010224] [Server] Found 65536 prepared transaction(s) in InnoDB
2023-01-29T19:56:07.403739+08:00 0 [Note] [MY-013032] [InnoDB] Starting recovery for XA transactions...
2023-01-29T19:56:08.261776+08:00 0 [Note] [MY-013035] [InnoDB] 65536 transactions in prepared state after recovery
2023-01-29T19:56:08.304281+08:00 0 [Note] [MY-010224] [Server] Found 65536 prepared transaction(s) in InnoDB
2023-01-29T19:56:08.358430+08:00 0 [Note] [MY-013032] [InnoDB] Starting recovery for XA transactions...
2023-01-29T19:56:09.214465+08:00 0 [Note] [MY-013035] [InnoDB] 65536 transactions in prepared state after recovery
2023-01-29T19:56:09.261115+08:00 0 [Note] [MY-010224] [Server] Found 65536 prepared transaction(s) in InnoDB
2023-01-29T19:56:09.320649+08:00 0 [Note] [MY-013032] [InnoDB] Starting recovery for XA transactions...
2023-01-29T19:56:10.181868+08:00 0 [Note] [MY-013035] [InnoDB] 65536 transactions in prepared state after recovery
2023-01-29T19:56:10.227623+08:00 0 [Note] [MY-010224] [Server] Found 65536 prepared transaction(s) in InnoDB
2023-01-29T19:56:10.295218+08:00 0 [Note] [MY-013032] [InnoDB] Starting recovery for XA transactions...

The recovered XA transaction is 71184, which is larger than info->len in xa::recovery::recover_one_ht(65536)

Let's check how info->len is determined:

in function ha_recover:

```
  for (info.len = MAX_XID_LIST_SIZE;
       info.list == nullptr && info.len > MIN_XID_LIST_SIZE; info.len /= 2) {
    info.list = new (std::nothrow) XA_recover_txn[info.len];
  }
```

It means the info.len could be 128 which in lack of memory.

And let's check the code in function xa::recovery::recover_one_ht:

It'll get prepared transactions from engine layer, suppose we have 200 XA transactions in prepared state. since the info.len is 128, it only gets 128 trxs from engine and add to xa cache.

It'll loop again, but actually the prepared XA transaction is not removed from trx list in innodb, it'll gain same 128 transactions, then loop and loop forever...

How to repeat:
In case to easily repeat the case, let's slightly change the code:

```
diff --git a/mysql-8.0.32/sql/xa.cc b/mysql-8.0.32/sql/xa.cc
index 63491b4..5db15bf 100644
--- a/mysql-8.0.32/sql/xa.cc
+++ b/mysql-8.0.32/sql/xa.cc
@@ -308,11 +308,16 @@ int ha_recover(Xid_commit_list *commit_list, Xa_state_list *xa_list) {
     tc_heuristic_recover = TC_HEURISTIC_RECOVER_ROLLBACK;  // forcing ROLLBACK
     info.dry_run = false;
   }
-
+#if 0
   for (info.len = MAX_XID_LIST_SIZE;
        info.list == nullptr && info.len > MIN_XID_LIST_SIZE; info.len /= 2) {
     info.list = new (std::nothrow) XA_recover_txn[info.len];
   }
+#else
+  info.len = 4;
+  info.list = new (std::nothrow) XA_recover_txn[info.len];
+#endif
+
   if (!info.list) {
     LogErr(ERROR_LEVEL, ER_SERVER_OUTOFMEMORY,
            static_cast<int>(info.len * sizeof(XID)));
```

Let's hardcode info.len to 4, and then prepare 5 xa transactions:

root@(none) 07:49:02>xa recover;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            3 |            0 | 567  |
|        1 |            3 |            0 | 456  |
|        1 |            3 |            0 | 345  |
|        1 |            3 |            0 | 234  |
|        1 |            3 |            0 | 123  |
+----------+--------------+--------------+------+
5 rows in set (0.00 sec)

then kill the server and restart, check error log:

2023-01-29T11:50:13.437808Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437813Z 0 [Note] [MY-013035] [InnoDB] 4 transactions in prepared state after recovery
2023-01-29T11:50:13.437818Z 0 [Note] [MY-010224] [Server] Found 4 prepared transaction(s) in InnoDB
2023-01-29T11:50:13.437825Z 0 [Note] [MY-013032] [InnoDB] Starting recovery for XA transactions...
2023-01-29T11:50:13.437830Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328007 in prepared state after recovery
2023-01-29T11:50:13.437835Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437841Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328005 in prepared state after recovery
2023-01-29T11:50:13.437846Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437852Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328003 in prepared state after recovery
2023-01-29T11:50:13.437858Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437864Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328001 in prepared state after recovery
2023-01-29T11:50:13.437869Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437874Z 0 [Note] [MY-013035] [InnoDB] 4 transactions in prepared state after recovery
2023-01-29T11:50:13.437879Z 0 [Note] [MY-010224] [Server] Found 4 prepared transaction(s) in InnoDB
2023-01-29T11:50:13.437886Z 0 [Note] [MY-013032] [InnoDB] Starting recovery for XA transactions...
2023-01-29T11:50:13.437891Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328007 in prepared state after recovery
2023-01-29T11:50:13.437896Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437902Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328005 in prepared state after recovery
2023-01-29T11:50:13.437907Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437914Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328003 in prepared state after recovery
2023-01-29T11:50:13.437919Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows
2023-01-29T11:50:13.437925Z 0 [Note] [MY-013033] [InnoDB] Transaction 497328001 in prepared state after recovery
2023-01-29T11:50:13.437930Z 0 [Note] [MY-013034] [InnoDB] Transaction contains changes to 1 rows

It loops forever and can't startup the instance

Suggested fix:
tag the transaction that has been scanned and ignore it in next loop
[29 Jan 12:14] zhai weixiang
a simple fix, not fully tested

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: recover_trx.diff (application/octet-stream, text), 1.81 KiB.

[29 Jan 12:31] zhai weixiang
btw, our production is using 8.0.18, and the example above is using 8.0.32 which has same issue.
[30 Jan 5:34] MySQL Verification Team
Hello zhai,

Thank you for the report and contribution.

regards,
Umesh