Bug #110430 XA COMMIT may not persist GTID into undo header
Submitted: 20 Mar 7:11 Modified: 28 Mar 5:19
Reporter: Libing Song (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[20 Mar 7:11] Libing Song
Description:
After the server recovered from a crash, XA COMMIT will not persist GTID into undo with an error “[InnoDB] Could not persist GTID as space for GTID is not allocated.“, if crash happens just before set PREPARED_IN_TC flag. If crash happens, it may cause inconsistency between GTID and data.

It will hit an assertion failure in debug mode.

How to repeat:
--source include/have_debug.inc
--source include/have_binlog_format_row.inc
--source include/master-slave.inc

CREATE TABLE t1(c1 INT PRIMARY KEY, c2 varchar(100));

--source include/rpl_connection_slave.inc
--source include/stop_slave_io.inc
--source include/rpl_connection_master.inc

--echo #
--echo # Crash happened on XA PREPARE
--echo ##############################
--echo #
XA START "xid1";
INSERT INTO t1 VALUES(1, "insert1"),(2, "insert1"),(3, "insert1");
INSERT INTO t1 VALUES(4, "insert2");
UPDATE t1 SET c2 = "update1" WHERE c1 > 2;
DELETE FROM t1 WHERE c1 = 2;
XA END "xid1";
--source include/expect_crash.inc
SET debug = "d,crash_commit_after_log";
--error 2013 # Lost connection
XA PREPARE "xid1";

--let $rpl_server_number = 1
--source include/rpl_start_server.inc
XA RECOVER;
XA COMMIT "xid1";
SELECT * FROM t1;

Suggested fix:
The reason is that undo->m_gtid_storage is not initialized correctly after server restart. The undo header did have reserved space for GTIDs before crash. But since GTID of XA PREPARE was not written into undo header yet. m_gtid_storage is initialized as NONE which implies that there is no space for GTID. Thus XA COMMIT skips the step to persist its GTID into undo header.
[20 Mar 9:51] MySQL Verification Team
Hello Libing,

Thank you for the report and feedback.

regards,
Umesh
[27 Mar 9:57] Sven Sandberg
Posted by developer:
 
Thanks for the bug report, Libing! However, I only see expected behavior in the test case, and no crash in debug mode:

[...]
SET debug = "d,crash_commit_after_log";
XA PREPARE "xid1";
ERROR HY000: Lost connection to MySQL server during query
include/rpl_start_server.inc [server_number=1]
XA RECOVER;
formatID	gtrid_length	bqual_length	data
1	4	0	xid1
XA COMMIT "xid1";
SELECT * FROM t1;
c1	c2
1	insert1
3	update1
4	update1

From the output above, it looks like the transaction was successfully recovered to prepared state, and then XA RECOVER reported the transaction as expected, and then XA COMMIT succeeded and made the changes persistent as expected.

Which 8.0.X version are you using? Please make sure you use the latest, and paste the observed output from the test case. Thank you!

(FWIW, I think rpl_gtid_xa_prepare_failure_before_prepare_2nd_phase.test is an equivalent scenario.)
[28 Mar 0:38] Libing Song
Hi Sven,

I tested on 8.0.32. rpl_gtid_xa_prepare_failure_before_prepare_2nd_phase.test could not repeat the bug. Since there is only a INSERT into transaction. You need a DELETE or UPDATE to make an UPDATE undo log if you want to repeat the bug with rpl_gtid_xa_prepare_failure_before_prepare_2nd_phase.test.
[28 Mar 0:38] Libing Song
Hi Sven,

I tested on 8.0.32. rpl_gtid_xa_prepare_failure_before_prepare_2nd_phase.test could not repeat the bug. Since there is only a INSERT into transaction. You need a DELETE or UPDATE to make an UPDATE undo log if you want to repeat the bug with rpl_gtid_xa_prepare_failure_before_prepare_2nd_phase.test.
[28 Mar 0:42] Libing Song
Fix for the bug.

Attachment: bug110430.diff (application/octet-stream, text), 8.53 KiB.

[28 Mar 0:42] Libing Song
Fix for the bug.

Attachment: bug110430.diff (application/octet-stream, text), 8.53 KiB.

[28 Mar 0:43] Libing Song
Fix for the bug.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug110430.diff (application/octet-stream, text), 8.53 KiB.

[30 Mar 2:49] huahua xu
Hi Sven,

I had repeated the bug on 8.0.32.

2023-03-27T02:50:06.879608Z 8 [ERROR] [MY-011980] [InnoDB] Could not persist GTID as space for GTID is not allocated.
2023-03-27T02:50:06.881108Z 8 [ERROR] [MY-013183] [InnoDB] Assertion failure: trx0undo.cc:660 thread 16152
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2023-03-27T02:50:06Z UTC - mysqld got exception 0x16 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x20cd86e0160
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
7ff7525f0d03    mysqld.exe!my_print_stacktrace()[stacktrace.cc:429]
7ff7507a29e6    mysqld.exe!print_fatal_signal()[signal_handler.cc:158]
7ff7507a2dcf    mysqld.exe!my_server_abort()[signal_handler.cc:269]
7ff7525a9e6a    mysqld.exe!my_abort()[my_init.cc:263]
7ff752b8d16d    mysqld.exe!ut_dbg_assertion_failed()[ut0dbg.cc:99]
7ff752fa6cb5    mysqld.exe!trx_undo_gtid_set()[trx0undo.cc:663]
7ff752bfec7b    mysqld.exe!trx_write_serialisation_history()[trx0trx.cc:1594]
7ff752bf4655    mysqld.exe!trx_commit_low()[trx0trx.cc:2140]
7ff752bf43f0    mysqld.exe!trx_commit()[trx0trx.cc:2217]
7ff752bf4d9b    mysqld.exe!trx_commit_for_mysql()[trx0trx.cc:2427]
7ff75276d9a4    mysqld.exe!innobase_commit_low()[ha_innodb.cc:5681]
7ff752798aa4    mysqld.exe!innobase_commit_by_xid()[ha_innodb.cc:20082]
7ff75026d06e    mysqld.exe!`anonymous namespace'::commit_one_ht()[tc_log.cc:189]
7ff7502da037    mysqld.exe!plugin_foreach_with_mask()[sql_plugin.cc:2760]
7ff7502d9a6e    mysqld.exe!plugin_foreach_with_mask()[sql_plugin.cc:2774]
7ff750268d87    mysqld.exe!trx_coordinator::commit_detached_by_xid()[tc_log.cc:115]
7ff75026906a    mysqld.exe!trx_coordinator::commit_in_engines()[tc_log.cc:144]
7ff752008578    mysqld.exe!`anonymous namespace'::finish_transaction_in_engines()[binlog.cc:11560]
7ff751fea341    mysqld.exe!MYSQL_BIN_LOG::process_commit_stage_queue()[binlog.cc:8517]
7ff751feba56    mysqld.exe!MYSQL_BIN_LOG::ordered_commit()[binlog.cc:9037]
7ff751fefdcf    mysqld.exe!MYSQL_BIN_LOG::commit()[binlog.cc:8281]
7ff750a36c2b    mysqld.exe!Sql_cmd_xa_commit::process_detached_xa_commit()[sql_xa_commit.cc:205]
7ff750a36206    mysqld.exe!Sql_cmd_xa_commit::trans_xa_commit()[sql_xa_commit.cc:88]
7ff750a360bd    mysqld.exe!Sql_cmd_xa_commit::execute()[sql_xa_commit.cc:63]
7ff7506db638    mysqld.exe!mysql_execute_command()[sql_parse.cc:4688]
7ff7506d49f6    mysqld.exe!dispatch_sql_command()[sql_parse.cc:5322]
7ff7506e01b6    mysqld.exe!dispatch_command()[sql_parse.cc:2040]
7ff7506dd75e    mysqld.exe!do_command()[sql_parse.cc:1439]
7ff7502b40ae    mysqld.exe!handle_connection()[connection_handler_per_thread.cc:302]
7ff753559a6f    mysqld.exe!pfs_spawn_thread()[pfs.cc:2988]
7ff7525c5cfc    mysqld.exe!win_thread_start()[my_thread.cc:73]
7fff3ffd542c    ucrtbased.dll!_register_onexit_function()
7fffab2a7034    KERNEL32.DLL!BaseThreadInitThunk()
7fffab802651    ntdll.dll!RtlUserThreadStart()

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (20cd86e71b0): XA COMMIT "xid1"
Connection ID (thread ID): 8
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Additionally, the XA transaction will recover 'successfully' after mysqld restart again, but its both Prepare and Commit gtid informations are missed in the undo header.