Bug #120125 xid_t::operator< breaks std::map ordering; binlog XA recovery may rollback prepared XA
Submitted: 23 Mar 2:43
Reporter: Zhang JiYang Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: XA transactions Severity:S2 (Serious)
Version:8.4 8.0.44 OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog, crash recovery, xa

[23 Mar 2:43] Zhang JiYang
Description:
[Summary]
xid_t::operator< violates strict weak ordering when XA format_id differs, so std::map<XID,…> used during binlog crash recovery can fail find(xid) for an inserted key; a prepared external XA that is already in the binlog may be rolled back incorrectly.

[Description]
Externally coordinated XA state recovered from the binary log is kept in std::map keyed by XID with std::less<XID> (sql/handler.h, Xa_state_list::list; populated in sql/binlog/recovery.cc, Binlog_recovery::m_external_xids). That relies on xid_t::operator< in sql/xa.cc.

The implementation returns true when this->get_format_id() < rhs.get_format_id() but does not return false when this->get_format_id() > rhs.get_format_id(). It then compares gtrid_length / bqual_length / payload. That breaks strict weak ordering (e.g. transitivity), so the map ordering is invalid and find() may miss an element that was inserted.

During crash recovery, xa::recovery::recover_one_external_trx() (sql/xa/recovery.cc) calls xa_list->find(xa_trx.id). If the lookup incorrectly returns NOT_FOUND while the binlog recorded XA PREPARE, the engine rolls the transaction back, contradicting the binlog.

How to repeat:
Requirements: Debug build (--with-debug / CMAKE_BUILD_TYPE=Debug), binary logging enabled, row binlog format for this suite combination. No custom code injection: uses existing CONDITIONAL_SYNC_POINT_FOR_TIMESTAMP("before_set_prepared_in_tc") in sql/tc_log.cc and standard MTR kill_mysqld.inc.

Add the following MTR files under mysql-test/suite/binlog/t/ and mysql-test/suite/binlog/r/ (or attach them to the bug).

Run:

cd mysql-test
./mysql-test-run.pl --suite=binlog --do-test=binlog_xa_external_xids_map_ordering
With the buggy comparator: the row variant fails on the first assertion (no matching error-log note for Successfully prepared 1 XA transaction; InnoDB recovery logs rolling back 1 XA). After fixing operator< as below, the test passes against the recorded .result.

File: suite/binlog/t/binlog_xa_external_xids_map_ordering.test

# ==== Purpose ====
#
# Reproducer for broken strict weak ordering of xid_t::operator< when
# format_id differs: std::map m_external_xids can contain an XID key but
# find() fails (transitivity violated by comparing format_id only on one
# side). Binlog recovery then treats a prepared-in-binlog XA as absent and
# rolls it back.
#
# Setup uses three external XIDs (A,B,C) with:
#   A: format 1, gtrid length 50
#   B: format 2, gtrid length 5
#   C: format 1, gtrid length 10
# so that A<B and B<C in the buggy comparator but C<A.
#
# A and B are fully committed so only C remains prepared after crash.
# Crash at before_set_prepared_in_tc for C (binlog already has XA PREPARE).
#
# Expected recovery: one prepared XA (C), error log reports
# "Successfully prepared 1 XA transaction", two committed rows visible.
# Requires xid_t::operator< to order format_id before gtrid_length (see sql/xa.cc).
#
# ==== References ====
#
# xid_t::operator< in sql/xa.cc
# binlog::Binlog_recovery::m_external_xids in sql/binlog/recovery.cc
#
--source include/not_valgrind.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/have_log_bin.inc
--source include/have_binlog_format_row.inc
let $messages = Found .* prepared XA transactions
.*Checksum mismatch in datafile.*;
--source include/suppress_messages.inc
CALL mtr.add_suppression("Statement is unsafe because it is being used inside a XA transaction");
CREATE TABLE t1 (c1 INT PRIMARY KEY);
--let $xid_a = `SELECT CONCAT("X'", REPEAT('aa',50), "',X'',1")`
--let $xid_b = `SELECT CONCAT("X'", REPEAT('bb',5), "',X'',2")`
--let $xid_c = `SELECT CONCAT("X'", REPEAT('cc',10), "',X'',1")`
--connect(con1, localhost, root,,)
--connection con1
--eval XA START $xid_a
INSERT INTO t1 VALUES (1);
--eval XA END $xid_a
--eval XA PREPARE $xid_a
--eval XA COMMIT $xid_a
--connect(con2, localhost, root,,)
--connection con2
--eval XA START $xid_b
INSERT INTO t1 VALUES (2);
--eval XA END $xid_b
--eval XA PREPARE $xid_b
--eval XA COMMIT $xid_b
--connect(con3, localhost, root,,)
--connection con3
--eval XA START $xid_c
INSERT INTO t1 VALUES (3);
--eval XA END $xid_c
--let $auxiliary_connection = default
--let $statement_connection = con3
--let $statement = XA PREPARE $xid_c
--let $sync_point = before_set_prepared_in_tc
--source include/execute_to_conditional_timestamp_sync_point.inc
--source include/kill_mysqld.inc
--source extra/xa_crash_safe_tests/cleanup_connection.inc
--source include/start_mysqld.inc
--let $assert_select = Successfully prepared 1 XA transaction
--source extra/xa_crash_safe_tests/assert_recovery_message.inc
--let $expected_prepared_xa_count = 1
--source extra/xa_crash_safe_tests/assert_xa_recover.inc
--let $expected_row_count = 2
--source extra/xa_crash_safe_tests/assert_row_count.inc
--eval XA ROLLBACK $xid_c
DROP TABLE t1;

Suggested fix:
[Root cause]
In sql/xa.cc, xid_t::operator< must define a strict weak ordering for use in std::map. When this->format_id > rhs.format_id, the code must not fall through to gtrid_length. Counterexample with the buggy comparator:

A: format_id=1, gtrid_len=50
B: format_id=2, gtrid_len=5
C: format_id=1, gtrid_len=10
Then A < B (1<2), B < C (after format check, 5<10), but C < A (10<50) while A < C is false → ordering is not transitive; a std::map can store all three keys yet find(C) fails (reproducible in a small standalone C++ program using the same comparison logic).

[Impact / Severity]
Wrong crash recovery: binlog indicates XA PREPARE / TC state PREPARED_IN_TC, but the server may roll back the prepared XA.
Data loss relative to binlog and inconsistent XA RECOVER vs binlog.
Trigger requires multiple external XAs with different format_id and specific gtrid lengths, plus a crash in the window after binlog flush, before set_prepared_in_tc in engines (same class as existing WL#11300 crash-safe XA tests).
Severity: S2 (Serious) — incorrect, silent recovery path.
[Suggested fix]
After the first format_id comparison, add the symmetric branch so format_id is totally ordered before gtrid_length / bqual_length / memcmp:

bool xid_t::operator<(const xid_t &rhs) const {
  if (this->get_format_id() < rhs.get_format_id()) return true;
  if (rhs.get_format_id() < this->get_format_id()) return false;
  if (this->get_gtrid_length() < rhs.get_gtrid_length()) {
    return true;
  }
  // ... remainder unchanged ...
}