Bug #94884 slave breaks with error 1778 due to an incomplete transaction
Submitted: 3 Apr 8:49 Modified: 3 Jun 11:02
Reporter: Christian Roser Email Updates:
Status: Need Feedback Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.23, 5.7.25 OS:Debian
Assigned to: Bogdan Kecman CPU Architecture:x86
Tags: error_1778

[3 Apr 8:49] Christian Roser
Description:
I had a slave crashing due to an incomplete transaction in relaylog. It looks like it's the same error like described in #81119 except for that I'm not using multi source replication.

Error:
                   Last_Errno: 1778
                   Last_Error: Error 'Cannot execute statements with implicit commit inside a transaction when @@SESSION.GTID_NEXT == 'UUID:NUMBER'.' on query. Default database: 'xxxxx'. Query: 'BEGIN'

First relay log:
# at 64130581
#190403  0:59:50 server id 183837207  end_log_pos 64130473 CRC32 0x5ba8b612     GTID    last_committed=59709    sequence_number=59710   rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '42e63d21-e2ab-11e8-a1fd-001a4a15002a:7190468'/*!*/;
# at 64130646
#190403  0:59:50 server id 183837207  end_log_pos 64130549 CRC32 0x64c4cd4e     Query   thread_id=4155606       exec_time=0     error_code=0
SET TIMESTAMP=1554245990/*!*/;
SET @@session.pseudo_thread_id=4155606/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1434451976/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 64130722
#190403  0:59:50 server id 183837207  end_log_pos 64130614 CRC32 0xb244b533     Table_map: `xxxxx`.`tbl` mapped to number 1901827
# at 64130787
#190403  0:59:50 server id 183837207  end_log_pos 64155060 CRC32 0x97ce3a0a     Delete_rows: table id 1901827 flags: STMT_END_F
### DELETE FROM `xxxxx`.`tbl`
### WHERE
###
...
# at 64155233
#190320  7:07:29 server id 183829015  end_log_pos 64155280 CRC32 0xbcd10c19     Rotate to relay-bin.000095  pos: 4
ROLLBACK /* added by mysqlbinlog */ /*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

relay log rotation happened here

Next relay log:
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#190320  7:07:29 server id 183829015  end_log_pos 123 CRC32 0x99cb57b6  Start: binlog v 4, server v 5.7.23-log created 190320  7:07:29
# This Format_description_event appears in a relay log and was generated by the slave thread.
# at 123
#190320  7:07:29 server id 183829015  end_log_pos 194 CRC32 0xf469853e  Previous-GTIDs
# 42e63d21-e2ab-11e8-a1fd-001a4a15002a:7699-7190467
# at 194
#700101  1:00:00 server id 183837207  end_log_pos 0 CRC32 0xc4feb34e    Rotate to mysql-bin.000040  pos: 4
# at 241
#190403  0:23:45 server id 183837207  end_log_pos 123 CRC32 0xbb52840d  Start: binlog v 4, server v 5.7.23-log created 190403  0:23:45
# at 360
#190320  7:07:29 server id 0  end_log_pos 407 CRC32 0x7dcf5cb7  Rotate to mysql-bin.000040  pos: 234
# at 407
#190320  7:07:29 server id 0  end_log_pos 454 CRC32 0x541815ba  Rotate to mysql-bin.000040  pos: 64130408
# at 454
#190403  0:59:50 server id 183837207  end_log_pos 64130473 CRC32 0x5ba8b612     GTID    last_committed=59709    sequence_number=59710   rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '42e63d21-e2ab-11e8-a1fd-001a4a15002a:7190468'/*!*/;
# at 519
#190403  0:59:50 server id 183837207  end_log_pos 64130549 CRC32 0x64c4cd4e     Query   thread_id=4155606       exec_time=0     error_code=0
SET TIMESTAMP=1554245990/*!*/;
SET @@session.pseudo_thread_id=4155606/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1434451976/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 595
#190403  0:59:50 server id 183837207  end_log_pos 64130614 CRC32 0xb244b533     Table_map: `xxxxx`.`tbl` mapped to number 1901827
# at 660
#190403  0:59:50 server id 183837207  end_log_pos 64155060 CRC32 0x97ce3a0a     Delete_rows: table id 1901827 flags: STMT_END_F
### DELETE FROM `xxxxx`.`tbl`
### WHERE
...
# at 25106
#190403  0:59:50 server id 183837207  end_log_pos 64155137 CRC32 0x759ae2fe     Query   thread_id=4155606       exec_time=0     error_code=0
SET TIMESTAMP=1554245990/*!*/;
COMMIT
/*!*/;

How can it happen that relay logs are rotated during an open transaction?

Even though it should have been fixed in 5.7.19 it still occurs from time to time in more modern versions (verified for 5.7.23)

How to repeat:
unfortunately I don't know

Suggested fix:
unfortunately I don't know
[4 Apr 12:23] Bogdan Kecman
Hi,

I cannot reproduce this issue with 5.7.23 nor 5.7.25. Can you give us the config files and any other info about your load so we can try to reproduce the problem.

thanks
Bogdan
[8 Apr 9:23] Christian Roser
The load on the system is very moderate, it's used as a webhosting database and serves applications like wordpress for example. It has around 700 databases and 400-500 queries per second.
[10 Apr 4:22] Christian Roser
Same issue occured twice since the bug was opened. Both times it was at 5.7.25
[17 Apr 20:34] Bogdan Kecman
Hi,

The root cause of BUG#81119 was not the use of multi-source replication, but GTID skipping a partial transaction. I am still analyzing your issue and will return back with some answers.

Thanks
Bogdan
[23 Apr 13:14] Christian Roser
Thanks a lot for the feedback so far. Btw.. it happened twice since my last comment.

regards
Christian
[3 Jun 11:02] Bogdan Kecman
Hi,

I am still not able to reproduce this on 5.7.26.

The dev team asked for GTID executed on the slave at the time of the error, to know if the offending transaction you mentioned was already applied or not.

Thanks
Bogdan