Bug #106628 | Tencent Cloud-Native Database team: trx_t with PREPARED state in trx_pools | ||
---|---|---|---|
Submitted: | 3 Mar 2022 7:30 | Modified: | 3 Mar 2022 8:58 |
Reporter: | hao wu (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 8.0.28 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution |
[3 Mar 2022 7:30]
hao wu
[3 Mar 2022 7:31]
hao wu
fix.patch
Attachment: fix.patch (application/octet-stream, text), 619 bytes.
[3 Mar 2022 8:58]
MySQL Verification Team
Hello hao wu, Thank you for the report and contribution. regards, Umesh
[3 Mar 2022 8:58]
MySQL Verification Team
Thank you very much for your patch contribution, we appreciate it! In order for us to continue the process of reviewing your contribution to MySQL, please send us a signed copy of the Oracle Contributor Agreement (OCA) as outlined in https://oca.opensource.oracle.com Signing an OCA needs to be done only once and it's valid for all other Oracle governed Open Source projects as well. Getting a signed/approved OCA on file will help us facilitate your contribution - this one, and others in the future. Please let me know, if you have any questions. Thank you for your interest in MySQL.
[4 Mar 2022 1:26]
hao wu
fix.patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: fix.patch (application/octet-stream, text), 619 bytes.
[4 Aug 2023 11:28]
Jakub Lopuszanski
Posted by developer: Here's a testcase to reproduce the issue. Unfortunately, by default ./mtr has a limit of `--max-connections=128`, which can't be changed from the level of *-master.opt file (it is a limit of the test-runner itself set with the same name as the mysqld variable, which *can* and *must* be changed from *-master.opt file). The test requires at least ~260 connections to be spawned to exhaust the number of available Undo Slots. There are UNIV_PAGE_SIZE/16 Undo Slots in each Rollback Segment. There are --innodb_rollback_segments Rollback Segments in each Undo Tablespace. There are 2 redo-logged Undo Tablespaces by default. So, using the minimal page size, --innodb_page_size=4k, w get 512 Undo Slots. Each transaction may have two redo-logged Undo Logs attached: one for INSERTs, the other for other operations (aka "Update Undo Log"). So, to exhaust 512 all Undo Slots, we need at least 256 transactions. mysql-test/suite/innodb/t/too_many_trxs-master.opt: ``` --initialize=--innodb_page_size=4k --initialize=--innodb_rollback_segments=1 --max-connections=1000 --gtid_mode=ON_PERMISSIVE ``` mysql-test/suite/innodb/t/too_many_trxs-master.opt: ``` CREATE TABLE t (id INT PRIMARY KEY); SELECT @@innodb_page_size, @@innodb_rollback_segments, @@max_connections; # 4K / 16 per Undo Slots per Rollback Segment # 1 Rollback Segment per Undo Tablespace # 2 default redo-logged Undo Tablespaces # 4k / 16 * 1 * 2 = 512 # We gonna create 300 connections, alloc 1 Insert Undo Log first, which # should succeed even if there are some background transactions, and # then for each of them alloc also 1 Update Undo Log, which should fail --let $n=300 --let $i=0 while($i < $n) { --echo Connecting $i-th client and assigning Insert Undo Segment --connect (con$i,localhost,root,,) --eval XA START 'trx$i' --eval INSERT INTO t VALUES ($i) # Should not see uncommitted inserts of others SELECT COUNT(*) FROM t; --inc $i } --let $i=0 while($i < $n) { --echo Preparing $i-th client and allocating Update Undo Segment --connection con$i # --eval DELETE FROM t WHERE id=$i --eval XA END 'trx$i' --eval XA PREPARE 'trx$i' --inc $i } --let $i=0 while($i < $n) { --echo Committing $i-th client --connection con$i --eval XA COMMIT 'trx$i' --disconnect con$i --inc $i } --connection default SELECT COUNT(*) FROM t; DROP TABLE t; ```
[4 Aug 2023 12:04]
Jakub Lopuszanski
Posted by developer: The test above shows what happens if the Undo Slots are exhausted during XA PREPARE, while the bug is about XA COMMIT. Both are buggy. In case of XA PREPARE, it looks like MYSQL_BIN_LOG::process_commit_stage_queue ignores the return value from trx_coordinator::set_prepared_in_tc_in_engines(head, all), which seems to be 177 if DB_TOO_MANY_CONCURRENT_TRXS is reported from trx_undo_gtid_add_update_undo. So, the client sees just: ``` mysql> XA PREPARE 'b'; Query OK, 0 rows affected (5 min 51.57 sec) ``` so not even a warning, while in the error log there's: ``` 2023-08-04T11:41:52.968777Z 8 [ERROR] [MY-011980] [InnoDB] Could not allocate undo segment slot for persisting GTID. DB Error: 47 ``` I'd say it's a bug in Server layer, so I'll file it separately. To reproduce scenario where Undo Slots are exhausted during COMMIT (as opposed to PREPARE) we need to use XA COMMIT .. ONE PHASE, instead. This might be a bit more tricky, because as we commit the transactions, their undo logs get available for reuse by the others, due to the Undo Segment caching machanism. So to keep the Undo Slots occupied, we need to start a new transaction in the place on the committed one, and ensure it reuses the Undo Segment: ``` --let $i=0 while($i < $n) { --echo Preparing $i-th client and allocating Update Undo Segment --connection con$i --eval XA END 'trx$i' --eval XA COMMIT 'trx$i' ONE PHASE --eval XA BEGIN 'trx$i' --eval DELETE FROM t WHERE id=$i --inc $i } ``` This seems to fail in the way consistent with the bug report: on an assert that a transaction which we try to start, should not be in prepared state, but it is: ``` Preparing 211-th client and allocating Update Undo Segment XA END 'trx211'; XA COMMIT 'trx211' ONE PHASE; XA BEGIN 'trx211'; DELETE FROM t WHERE id=211 ; safe_process[2853]: Child process: 2854, exit: 1 [...] 2023-08-04T11:59:05.903174Z 221 [ERROR] [MY-011980] [InnoDB] Could not allocate undo segment slot for persisting GTID. DB Error: 47 2023-08-04T11:59:05.903268Z 221 [ERROR] [MY-013183] [InnoDB] Assertion failure: trx0trx.cc:3367 thread 140155122390784 [...] Thread 1 (Thread 0x7f7868489700 (LWP 4024)): #0 0x00007f7b44178aa1 in pthread_kill () from /lib64/libpthread.so.0 #1 0x0000000000fd2f7d in handle_fatal_signal () at /home/jlopusza/local-drive/mysql/sql/signal_handler.cc:230 #2 <signal handler called> #3 0x00007f7b42466387 in raise () from /lib64/libc.so.6 #4 0x00007f7b42467a78 in abort () from /lib64/libc.so.6 #5 0x0000000000c795ad in my_server_abort () at /home/jlopusza/local-drive/mysql/sql/signal_handler.cc:286 #6 0x00000000020adfca in my_abort () at /home/jlopusza/local-drive/mysql/mysys/my_init.cc:261 #7 0x00000000022c28e2 in ut_dbg_assertion_failed(char const*, char const*, unsigned long) () at /home/jlopusza/local-drive/mysql/storage/innobase/ut/ut0dbg.cc:99 #8 0x00000000022baf9c in trx_start_if_not_started_xa_low () at /home/jlopusza/local-drive/mysql/storage/innobase/trx/trx0trx.cc:3367 #9 0x000000000224b9e8 in trx_start_if_not_started () at /home/jlopusza/local-drive/mysql/storage/innobase/include/trx0trx.h:1304 #10 row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long) () at /home/jlopusza/local-drive/mysql/storage/innobase/row/row0sel.cc:4731 #11 0x00000000021103e7 in ha_innobase::index_read () at /home/jlopusza/local-drive/mysql/storage/innobase/handler/ha_innodb.cc:10272 #12 0x00000000010e4061 in index_read_map () at /home/jlopusza/local-drive/mysql/sql/handler.h:5490 #13 handler::ha_index_read_map () at /home/jlopusza/local-drive/mysql/sql/handler.cc:3256 #14 0x00000000010e4f76 in handler::read_range_first () at /home/jlopusza/local-drive/mysql/sql/handler.cc:7344 #15 0x00000000020df423 in ha_innobase::read_range_first(key_range const*, key_range const*, bool, bool) () at /home/jlopusza/local-drive/mysql/storage/innobase/handler/ha_innodb.cc:10760 #16 0x00000000010e5545 in handler::multi_range_read_next () at /home/jlopusza/local-drive/mysql/sql/handler.cc:6505 #17 0x00000000010e1ad8 in handler::ha_multi_range_read_next () at /home/jlopusza/local-drive/mysql/sql/handler.cc:6443 #18 0x00000000012f7eb8 in IndexRangeScanIterator::Read () at /home/jlopusza/local-drive/mysql/sql/range_optimizer/index_range_scan.cc:365 #19 0x000000000134d825 in Sql_cmd_delete::delete_from_single_table(THD*) () at /home/jlopusza/local-drive/mysql/sql/sql_delete.cc:591 #20 0x0000000000ec3864 in Sql_cmd_dml::execute(THD*) () at /home/jlopusza/local-drive/mysql/sql/sql_select.cc:788 #21 0x0000000000e6a6f3 in mysql_execute_command(THD*, bool) () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:4803 #22 0x0000000000e6e504 in dispatch_sql_command(THD*, Parser_state*) () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:5453 #23 0x0000000000e70f36 in dispatch_command(THD*, COM_DATA const*, enum_server_command) () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:2118 #24 0x0000000000e71a77 in do_command () at /home/jlopusza/local-drive/mysql/sql/sql_parse.cc:1459 #25 0x0000000000fc3988 in handle_connection () at /home/jlopusza/local-drive/mysql/sql/conn_handler/connection_handler_per_thread.cc:303 #26 0x0000000002729a45 in pfs_spawn_thread () at /home/jlopusza/local-drive/mysql/storage/perfschema/pfs.cc:3043 #27 0x00007f7b44173ea5 in start_thread () from /lib64/libpthread.so.0 #28 0x00007f7b4252eb2d in clone () from /lib64/libc.so.6 ``` This is an `ut_error` in `trx_start_if_not_started_xa_low` after the switch(trx->state) has seen `TRX_STATE_PREPARED` or `TRX_STATE_COMMITTED_IN_MEMORY`.
[4 Aug 2023 12:22]
Jakub Lopuszanski
Posted by developer: Please note that above was for Release build. In Debug build we crash in `trx_undo_gtid_add_update_undo(..)`: ``` /* In rare cases we might find no available update undo segment for insert only transactions. It is still fine to return error at prepare stage. Cannot do it earlier as GTID information is not known before. Keep the debug assert to know if it really happens ever. */ if (db_err != DB_SUCCESS) { trx->persists_gtid = false; ib::error(ER_IB_CLONE_GTID_PERSIST) << "Could not allocate undo segment" << " slot for persisting GTID. DB Error: " << db_err; ut_d(ut_error); <--------------------------------------------------- HERE } ``` In Release we continue hoping that somebody will handle db_err.