Description:
Error code 1399 is not getting skipped with parameter "slave skip errors".
Last_SQL_Errno: 1399 Last_SQL_Error: Error in Xid_log_event: Commit could not be completed, 'XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state'
Configured master slave binlog replication on versions 5.7.38 and 5.7.40 .
When the replica encountered the error 1399, its not getting skip based on the parameter slave-skip-errors=1399 following with a mysql restart.
But the error can be skipped with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; SLAVE START;
Tested on versions 5.7.38, 5.7.40.
Note: The main concern is not here finding why the 1399 error triggers but why the error 1399 is not getting skipped based on the parameter slave-skip-errors in cnf file.
How to repeat:
Repro steps:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=378263522945796&parent=EXT...
https://bugs.mysql.com/bug.php?id=99045
https://bugs.mysql.com/bug.php?id=86819
https://bugs.mysql.com/bug.php?id=83295
I generated below errors on the replica and all the errros 1032, 1062, 1397, 3227 are getting skipped based on the slave-skip-errors parameter from cnf file, but error code 1399 is not getting skipped.
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table test.test1; Can't find record in 'test1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000004, end_log_pos 1066
Last_SQL_Errno: 1062
Could not execute Write_rows event on table
Last_Errno: 1397
Last_Error: Error 'XAER_NOTA: Unknown XID' on query. Default database: 'test'. Query: 'XA ROLLBACK X'31',X'',1'
Skip_Counter: 0
Last_Errno: 3227
Last_Error: Error in Xid_log_event: Commit could not be completed, 'The use of replication filters with XA transactions is not supported, and can lead to an undefined state in the replication slave.'
Last_SQL_Errno: 1399
Last_SQL_Error: Error 'XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state' on query. Default database: 'test'. Query: 'XA END X'31',X'',1'
From the error log post I got 1399 on replica,
[ERROR] Slave SQL for channel '': Error 'XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state' on query. Default database: 'test'. Query: 'XA END X'31',X'',1', Error_code: 1399
[Warning] Slave: XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state Error_code: 1399
[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 3799.
Post then added slave-skip-errors=1399 on replica cnf file following with a reboot.
During the engine start up on replica,
[Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[Warning] Slave SQL for channel '': If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
[Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000005' at position 3799, relay log '/var/log/mysql/mysql-relay-bin.000021' position: 699
[Note] Slave SQL for channel '': Could not execute Query event. Detailed error: XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state;, Error_code: 1399
[ERROR] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 'XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state', Error_code: 1399
[Warning] Slave: XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state Error_code: 1399
[Warning] Slave: XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state Error_code: 1399
[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 3884.
[Note] Slave I/O thread for channel '': connected to master 'slave_user@IP:3306',replication started in log 'mysql-bin.000005' at position 3921
[Note] Event Scheduler: Loaded 0 events
[Note] /usr/sbin/mysqld: ready for connections.
The error still exists on replica post reboot as well and the parameter slave-skip-errors=1399 is not applying to skip the error.
Last_SQL_Errno: 1399
Last_SQL_Error: Error in Xid_log_event: Commit could not be completed, 'XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state'
Replicate_Ignore_Server_Ids:
The error can be skipped with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
Ask: Why the error code is not getting skipped with slave skip errors parameter but can be skipped with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;