Bug #109620 Slave_skip_errors not skipping the error 1399 on MySQL 5.7.38,5.7.39,5.7.40
Submitted: 13 Jan 2023 10:11 Modified: 10 Jan 2024 17:19
Reporter: sravan gogana Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.38, 5.7.39, 5.7.40 OS:Red Hat
Assigned to: CPU Architecture:Any

[13 Jan 2023 10:11] sravan gogana
Description:
Error code 1399 is not getting skipped with parameter "slave skip errors".

Last_SQL_Errno: 1399 Last_SQL_Error: Error in Xid_log_event: Commit could not be completed, 'XAER_RMFAIL: The command cannot be executed when global transaction is in the NON-EXISTING state'

Configured master slave binlog replication on versions 5.7.38 and 5.7.40 .
When the replica encountered the error 1399, its not getting skip based on the parameter slave-skip-errors=1399 following with a mysql restart.

But the error can be skipped with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; SLAVE START; 

Tested on versions 5.7.38, 5.7.40.

Note: The main concern is not here finding why the 1399 error triggers but why the error 1399 is not getting skipped based on the parameter slave-skip-errors in cnf file.

How to repeat:
Repro steps:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=378263522945796&parent=EXT...

https://bugs.mysql.com/bug.php?id=99045                                                                     
https://bugs.mysql.com/bug.php?id=86819
https://bugs.mysql.com/bug.php?id=83295

I generated below errors on the replica and all the errros 1032, 1062, 1397, 3227 are getting skipped based on the slave-skip-errors parameter from cnf file, but error code 1399 is not getting skipped.

Last_SQL_Errno: 1032
 Last_SQL_Error: Could not execute Update_rows event on table test.test1; Can't find record in 'test1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000004, end_log_pos 1066

Last_SQL_Errno: 1062
Could not execute Write_rows event on table

 Last_Errno: 1397
 Last_Error: Error 'XAER_NOTA: Unknown XID' on query. Default database: 'test'. Query: 'XA ROLLBACK X'31',X'',1'
                 Skip_Counter: 0

Last_Errno: 3227
Last_Error: Error in Xid_log_event: Commit could not be completed, 'The use of replication filters with XA transactions is not supported, and can lead to an undefined state in the replication slave.'

Last_SQL_Errno: 1399
Last_SQL_Error: Error 'XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state' on query. Default database: 'test'. Query: 'XA END X'31',X'',1'

From the error log post I got 1399 on replica,
[ERROR] Slave SQL for channel '': Error 'XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state' on query. Default database: 'test'. Query: 'XA END X'31',X'',1', Error_code: 1399
[Warning] Slave: XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state Error_code: 1399
[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 3799.

Post then added slave-skip-errors=1399 on replica cnf file following with a reboot.

During the engine start up on replica,

[Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[Warning] Slave SQL for channel '': If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
[Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000005' at position 3799, relay log '/var/log/mysql/mysql-relay-bin.000021' position: 699
[Note] Slave SQL for channel '': Could not execute Query event. Detailed error: XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state;, Error_code: 1399
[ERROR] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 'XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state', Error_code: 1399
[Warning] Slave: XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state Error_code: 1399
[Warning] Slave: XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state Error_code: 1399
[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 3884.
[Note] Slave I/O thread for channel '': connected to master 'slave_user@IP:3306',replication started in log 'mysql-bin.000005' at position 3921
[Note] Event Scheduler: Loaded 0 events
[Note] /usr/sbin/mysqld: ready for connections.

The error still exists on replica post reboot as well and the parameter slave-skip-errors=1399 is not applying to skip the error.

 Last_SQL_Errno: 1399
               Last_SQL_Error: Error in Xid_log_event: Commit could not be completed, 'XAER_RMFAIL: The command cannot be executed when global transaction is in the  NON-EXISTING state'
  Replicate_Ignore_Server_Ids: 

The error can be skipped with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; 

Ask: Why the error code is not getting skipped with slave skip errors parameter but can be skipped with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
[13 Jan 2023 10:18] sravan gogana
5.7.38, 5.7.39, 5.7.40
[13 Jan 2023 13:50] MySQL Verification Team
Hi Mr. gogana,

Thank you for your bug report.

However, this bug report is a duplicate of the following one:

https://bugs.mysql.com/bug.php?id=99045

If you disagree with this conclusion, then send us your original test case which will be significantly different from all the sources that you have cited .....

Duplicate ......
[13 Jan 2023 14:15] sravan gogana
Hello,
I understand the cause for the error.
But the ask here is why the slave-skip-errors parameter is not skipping error 1399 ?
[13 Jan 2023 14:20] sravan gogana
Hello,
I'm not sure why you set the report status as duplicate as the ask is here on why only the 1399 error (1032,1062, and few other error codes skips fine) is not skipping when we set the slave-skip-errors=1399 in the mysql cnf file following with a reboot.
[19 Jan 2023 15:19] sravan gogana
Hello Team,
Can this be verified please.

The ask here is why the slave-skip-errors parameter is not skipping error 1399 ?
[10 Jan 2024 17:19] sravan gogana
Any update please
[14 Jan 2024 11:38] MySQL Verification Team
Please follow up the progres on the main report that this is duplicate of:
https://bugs.mysql.com/bug.php?id=99045