MySQL Bugs: #106628: Tencent Cloud-Native Database team: trx_t with PREPARED state in trx

Bug #106628	Tencent Cloud-Native Database team: trx_t with PREPARED state in trx_pools
Submitted:	3 Mar 2022 7:30	Modified:	3 Mar 2022 8:58
Reporter:	hao wu (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	8.0.28	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	Contribution

Description:
We found a suitation that trx_t object may be in TRX_STATE_PREPARED state in trx_pools

row_commit_for_mysql:

  case TRX_STATE_PREPARED:
    trx->op_info = "committing";

    /* For GTID persistence we need update undo segment. */ 
    db_err = trx_undo_gtid_add_update_undo(trx, false, false);
    if (db_err != DB_SUCCESS) {
      return (db_err);
    }

If undo log allocate failed in trx_undo_gtid_add_update_undo because of out of disk space or some reason, the error will be silenced in release version of mysqld

and returned, leaving trx with TRX_STATE_PREPARED state. And later will invoke trx_free_for_background, put back the trx_t object into trx_pools. If the trx_t object reused by other readers, system will probably crash, for example in trx_start_if_not_started_low

How to repeat:
See description

Suggested fix:
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 6773e200cbb..bf5a3b19186 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -2297,6 +2297,12 @@ dberr_t trx_commit_for_mysql(trx_t *trx) /*!< in/out: transaction */
/* For GTID persistence we need update undo segment. */
db_err = trx_undo_gtid_add_update_undo(trx, false, false);
if (db_err != DB_SUCCESS) {
+  trx->error_state = DB_SUCCESS;
+  trx_rollback_to_savepoint(trx, NULL);
+  trx->error_state = DB_SUCCESS;
return (db_err);
}

fix.patch

Attachment: fix.patch (application/octet-stream, text), 619 bytes.

Hello hao wu,

Thank you for the report and contribution.

regards,
Umesh

Thank you very much for your patch contribution, we appreciate it!

In order for us to continue the process of reviewing your contribution to MySQL, please send us a signed copy of the Oracle Contributor Agreement (OCA) as outlined in https://oca.opensource.oracle.com

Signing an OCA needs to be done only once and it's valid for all other Oracle governed Open Source projects as well.

Getting a signed/approved OCA on file will help us facilitate your contribution - this one, and others in the future.  

Please let me know, if you have any questions.

Thank you for your interest in MySQL.

fix.patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: fix.patch (application/octet-stream, text), 619 bytes.

Posted by developer:
 
Here's a testcase to reproduce the issue. 
Unfortunately, by default ./mtr has a limit of `--max-connections=128`, which can't be changed from the level of *-master.opt file (it is a limit of the test-runner itself set with the same name as the mysqld variable, which *can* and *must* be changed from *-master.opt file).
The test requires at least ~260 connections to be spawned to exhaust the number of available Undo Slots.
There are UNIV_PAGE_SIZE/16 Undo Slots in each Rollback Segment.
There are --innodb_rollback_segments Rollback Segments in each Undo Tablespace.
There are 2 redo-logged Undo Tablespaces by default.
So, using the minimal page size, --innodb_page_size=4k, w get 512 Undo Slots.
Each transaction may have two redo-logged Undo Logs attached: one for INSERTs, the other for other operations (aka "Update Undo Log").
So, to exhaust 512 all Undo Slots, we need at least 256 transactions.

mysql-test/suite/innodb/t/too_many_trxs-master.opt:
```
--initialize=--innodb_page_size=4k
--initialize=--innodb_rollback_segments=1
--max-connections=1000
--gtid_mode=ON_PERMISSIVE
```

mysql-test/suite/innodb/t/too_many_trxs-master.opt:
```
CREATE TABLE t (id INT PRIMARY KEY);
SELECT @@innodb_page_size, @@innodb_rollback_segments, @@max_connections;
# 4K / 16 per Undo Slots per Rollback Segment
# 1 Rollback Segment per Undo Tablespace
# 2 default redo-logged Undo Tablespaces
# 4k / 16 * 1 * 2 = 512
# We gonna create 300 connections, alloc 1 Insert Undo Log first, which
# should succeed even if there are some background transactions, and
# then for each of them alloc also 1 Update Undo Log, which should fail
--let $n=300
--let $i=0
while($i < $n)
{
   --echo Connecting $i-th client and assigning Insert Undo Segment
   --connect (con$i,localhost,root,,)
     --eval XA START 'trx$i'
     --eval INSERT INTO t VALUES ($i)
     # Should not see uncommitted inserts of others
     SELECT COUNT(*) FROM t;
   --inc $i
}
--let $i=0
while($i < $n)
{
   --echo Preparing $i-th client and allocating Update Undo Segment
   --connection con$i
#     --eval DELETE FROM t WHERE id=$i
     --eval XA END 'trx$i'
     --eval XA PREPARE 'trx$i'
   --inc $i
}
--let $i=0
while($i < $n)
{
   --echo Committing $i-th client
   --connection con$i
     --eval XA COMMIT 'trx$i'
   --disconnect con$i
   --inc $i
}
--connection default
SELECT COUNT(*) FROM t;
DROP TABLE t;
```

Posted by developer:
 
The test above shows what happens if the Undo Slots are exhausted during XA PREPARE, while the bug is about XA COMMIT.
Both are buggy.

In case of XA PREPARE, it looks like MYSQL_BIN_LOG::process_commit_stage_queue ignores the return value from trx_coordinator::set_prepared_in_tc_in_engines(head, all), which seems to be 177 if DB_TOO_MANY_CONCURRENT_TRXS is reported from trx_undo_gtid_add_update_undo.
So, the client sees just:
```
mysql> XA PREPARE 'b';
Query OK, 0 rows affected (5 min 51.57 sec)
```
so not even a warning, while in the error log there's:
```
2023-08-04T11:41:52.968777Z 8 [ERROR] [MY-011980] [InnoDB] Could not allocate undo segment slot for persisting GTID. DB Error: 47
```
I'd say it's a bug in Server layer, so I'll file it separately.

To reproduce scenario where Undo Slots are exhausted during COMMIT (as opposed to PREPARE) we need to use
XA COMMIT .. ONE PHASE, instead. 
This might be a bit more tricky, because as we commit the transactions, their undo logs get available for reuse by the others, due to the Undo Segment caching machanism.
So to keep the Undo Slots occupied, we need to start a new transaction in the place on the committed one, and ensure it reuses the Undo Segment:
```
--let $i=0
while($i < $n)
{
   --echo Preparing $i-th client and allocating Update Undo Segment
   --connection con$i
     --eval XA END 'trx$i'
     --eval XA COMMIT 'trx$i' ONE PHASE
     --eval XA BEGIN 'trx$i'
     --eval DELETE FROM t WHERE id=$i
   --inc $i
}
```

This seems to fail in the way consistent with the bug report: on an assert that a transaction which we try to start, should not be in prepared state, but it is:
```
Preparing 211-th client and allocating Update Undo Segment
XA END 'trx211';
XA COMMIT 'trx211' ONE PHASE;
XA BEGIN 'trx211';
DELETE FROM t WHERE id=211    ;
safe_process[2853]: Child process: 2854, exit: 1
[...]
2023-08-04T11:59:05.903174Z 221 [ERROR] [MY-011980] [InnoDB] Could not allocate undo segment slot for persisting GTID. DB Error: 47
2023-08-04T11:59:05.903268Z 221 [ERROR] [MY-013183] [InnoDB] Assertion failure: trx0trx.cc:3367 thread 140155122390784
[...]
Thread 1 (Thread 0x7f7868489700 (LWP 4024)):
#0  0x00007f7b44178aa1 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000fd2f7d in handle_fatal_signal () at /home/jlopusza/local-drive/mysql/sql/signal_handler.cc:230
#2  <signal handler called>
#3  0x00007f7b42466387 in raise () from /lib64/libc.so.6
#4  0x00007f7b42467a78 in abort () from /lib64/libc.so.6
#5  0x0000000000c795ad in my_server_abort () at /home/jlopusza/local-drive/mysql/sql/signal_handler.cc:286
#6  0x00000000020adfca in my_abort () at /home/jlopusza/local-drive/mysql/mysys/my_init.cc:261
#7  0x00000000022c28e2 in ut_dbg_assertion_failed(char const*, char const*, unsigned long) () at /home/jlopusza/local-drive/mysql/storage/innobase/ut/ut0dbg.cc:99
#8  0x00000000022baf9c in trx_start_if_not_started_xa_low () at /home/jlopusza/local-drive/mysql/storage/innobase/trx/trx0trx.cc:3367
#9  0x000000000224b9e8 in trx_start_if_not_started () at /home/jlopusza/local-drive/mysql/storage/innobase/include/trx0trx.h:1304
#10 row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long) () at /home/jlopusza/local-drive/mysql/storage/innobase/row/row0sel.cc:4731
#11 0x00000000021103e7 in ha_innobase::index_read () at /home/jlopusza/local-drive/mysql/storage/innobase/handler/ha_innodb.cc:10272
#12 0x00000000010e4061 in index_read_map () at /home/jlopusza/local-drive/mysql/sql/handler.h:5490
#13 handler::ha_index_read_map () at /home/jlopusza/local-drive/mysql/sql/handler.cc:3256
#14 0x00000000010e4f76 in handler::read_range_first () at /home/jlopusza/local-drive/mysql/sql/handler.cc:7344
#15 0x00000000020df423 in ha_innobase::read_range_first(key_range const*, key_range const*, bool, bool) () at /home/jlopusza/local-drive/mysql/storage/innobase/handler/ha_innodb.cc:10760
#16 0x00000000010e5545 in handler::multi_range_read_next () at /home/jlopusza/local-drive/mysql/sql/handler.cc:6505
#17 0x00000000010e1ad8 in handler::ha_multi_range_read_next () at /home/jlopusza/local-drive/mysql/sql/handler.cc:6443
#18 0x00000000012f7eb8 in IndexRangeScanIterator::Read () at /home/jlopusza/local-drive/mysql/sql/range_optimizer/index_range_scan.cc:365
#19 0x000000000134d825 in Sql_cmd_delete::delete_from_single_table(THD*) () at /home/jlopusza/local-drive/mysql/sql/sql_delete.cc:591
#20 0x0000000000ec3864 in Sql_cmd_dml::execute(THD*) () at /home/jlopusza/local-drive/mysql/sql/sql_select.cc:788
#21 0x0000000000e6a6f3 in mysql_execute_command(THD*, bool) () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:4803
#22 0x0000000000e6e504 in dispatch_sql_command(THD*, Parser_state*) () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:5453
#23 0x0000000000e70f36 in dispatch_command(THD*, COM_DATA const*, enum_server_command) () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:2118
#24 0x0000000000e71a77 in do_command () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:1459
#25 0x0000000000fc3988 in handle_connection () at /home/jlopusza/local-drive/mysql/sql/conn_handler/connection_handler_per_thread.cc:303
#26 0x0000000002729a45 in pfs_spawn_thread () at /home/jlopusza/local-drive/mysql/storage/perfschema/pfs.cc:3043
#27 0x00007f7b44173ea5 in start_thread () from /lib64/libpthread.so.0
#28 0x00007f7b4252eb2d in clone () from /lib64/libc.so.6
```
This is an `ut_error` in `trx_start_if_not_started_xa_low` after the switch(trx->state) has seen `TRX_STATE_PREPARED` or `TRX_STATE_COMMITTED_IN_MEMORY`.

Posted by developer:
 
Please note that above was for Release build. In Debug build we crash in `trx_undo_gtid_add_update_undo(..)`:
```
  /* In rare cases we might find no available update undo segment for insert
  only transactions. It is still fine to return error at prepare stage.
  Cannot do it earlier as GTID information is not known before. Keep the
  debug assert to know if it really happens ever. */
  if (db_err != DB_SUCCESS) {
    trx->persists_gtid = false;
    ib::error(ER_IB_CLONE_GTID_PERSIST)
        << "Could not allocate undo segment"
        << " slot for persisting GTID. DB Error: " << db_err;
    ut_d(ut_error); <--------------------------------------------------- HERE
  }
```
In Release we continue hoping that somebody will handle db_err.