Bug #78211 ER_GTID_NEXT_TYPE_UNDEFINED_GROUP ON BINLOGLESS SLAVE AFTER IO THREAD RECONNECT
Submitted: 25 Aug 2015 16:22
Modified: 27 Nov 2015 17:19
Reporter: João Gramacho
Status: Closed
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.7
OS: Any
Assigned to:
CPU Architecture: Any

[25 Aug 2015 16:22] João Gramacho
Description:
On reconnection, the IO thread retrieves the interrupted transaction again in full, starting from its beginning.

The SQL thread is supposed to roll back the partial transaction (the events retrieved before the reconnection) and start applying it again from the next relay log file. The problem is that the SQL thread fails to roll back the partially executed transaction when the slave is configured not to use binary logging.
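
To illustrate the expected behaviour, here is a minimal, self-contained C++ model of an applier that discards a partial transaction when a new GTID event arrives while a transaction is still open. It is only a sketch: the `Ev`, `Applier` and `replay_reconnect_scenario` names are hypothetical and do not correspond to MySQL's real classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified event kinds; not MySQL's real Log_event types.
enum class Ev { Gtid, Begin, Row, Commit };

// Toy applier: tracks one open transaction and the rows it has committed.
struct Applier {
  bool in_trx = false;
  std::vector<std::string> pending;    // rows applied but not yet committed
  std::vector<std::string> committed;  // rows made durable by a commit
  int rollbacks = 0;

  void apply(Ev ev, const std::string &row = "") {
    switch (ev) {
      case Ev::Gtid:
        // A GTID event while a transaction is open means the previous
        // relay log ended mid-transaction: discard the partial work.
        if (in_trx) {
          pending.clear();
          in_trx = false;
          ++rollbacks;
        }
        break;
      case Ev::Begin:
        in_trx = true;
        break;
      case Ev::Row:
        pending.push_back(row);
        break;
      case Ev::Commit:
        assert(in_trx);  // a commit only makes sense inside a transaction
        committed.insert(committed.end(), pending.begin(), pending.end());
        pending.clear();
        in_trx = false;
        break;
    }
  }
};

// Replays the scenario from the description: a partial transaction in the
// first relay log, then the same transaction retrieved in full after the
// IO thread reconnects.
Applier replay_reconnect_scenario() {
  Applier a;
  // Relay log 1: the connection drops before the commit is received.
  a.apply(Ev::Gtid);
  a.apply(Ev::Begin);
  a.apply(Ev::Row, "r1");
  // Relay log 2: the whole transaction is retrieved again.
  a.apply(Ev::Gtid);
  a.apply(Ev::Begin);
  a.apply(Ev::Row, "r1");
  a.apply(Ev::Row, "r2");
  a.apply(Ev::Commit);
  return a;
}
```

Calling `replay_reconnect_scenario()` from a `main` leaves the model with one rollback and rows `r1` and `r2` committed exactly once. In terms of this model, the bug corresponds to the `in_trx` test in the `Gtid` branch evaluating to false on a binlogless slave, so the rollback is skipped.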

How to repeat:
Apply the following diff:
========================

diff --git a/mysql-test/suite/rpl/t/rpl_mts_execute_partial_trx_with_auto_pos_on-slave.opt b/mysql-test/suite/rpl/t/rpl_mts_execute_partial_trx_with_auto_pos_on-slave.opt
index 0367a27..c24f91c 100644
--- a/mysql-test/suite/rpl/t/rpl_mts_execute_partial_trx_with_auto_pos_on-slave.opt
+++ b/mysql-test/suite/rpl/t/rpl_mts_execute_partial_trx_with_auto_pos_on-slave.opt
@@ -1 +1,3 @@
 --slave-transaction-retries=0
+--disable-log-bin
+--disable-log-slave-updates

And run the following mtr test case:
===================================
--mem --debug --mysqld=--enforce-gtid-consistency --mysqld=--log-slave-updates --mysqld=--gtid-mode=on rpl_mts_execute_partial_trx_with_auto_pos_on

Suggested fix:
This is just a suggestion; it has not been extensively tested:

diff --git a/sql/log_event.cc b/sql/log_event.cc
index 36f10ab..18dd4a7 100644
--- a/sql/log_event.cc
+++ b/sql/log_event.cc
@@ -13008,7 +13008,8 @@ int Gtid_log_event::do_apply_event(Relay_log_info const *rli)
       the partial transaction being logged with the GTID on the slave,
       causing data corruption on replication.
     */
-    if (thd->get_transaction()->is_active(Transaction_ctx::SESSION))
+    if (thd->get_transaction()->is_active(Transaction_ctx::SESSION) ||
+        thd->server_status & SERVER_STATUS_IN_TRANS)
     {
       /* This is not an error (XA is safe), just an information */
       rli->report(INFORMATION_LEVEL, 0,

[27 Nov 2015 17:19] David Moss
The following was noted in the 5.7.11 change log:
When a slave was configured with log_bin=OFF, the applier (SQL) thread was failing to correctly roll back partial transactions left in the relay log. The fix ensures that on reconnection, the applier thread correctly rolls back a partial transaction and starts applying it again from the next relay log file.
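
For readers comparing the two conditions in the reporter's suggested fix, the following standalone C++ sketch models them side by side. It is not the server code and not necessarily the fix that shipped in 5.7.11: `SessionModel`, `kStatusInTrans` and both function names are hypothetical stand-ins. The state shown for the binlogless slave (session transaction not reported as active while the in-transaction status flag is set) is an assumption inferred from the suggested diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the SERVER_STATUS_IN_TRANS status bit.
constexpr std::uint32_t kStatusInTrans = 1u << 0;

// Toy model of the two pieces of session state the suggested diff inspects.
struct SessionModel {
  // Models thd->get_transaction()->is_active(Transaction_ctx::SESSION).
  bool session_trx_active;
  // Models thd->server_status.
  std::uint32_t server_status;
};

// Condition before the suggested change: only the transaction context.
bool needs_rollback_original(const SessionModel &s) {
  return s.session_trx_active;
}

// Condition after the suggested change: also honour the status flag.
bool needs_rollback_suggested(const SessionModel &s) {
  return s.session_trx_active || (s.server_status & kStatusInTrans) != 0;
}
```

With `SessionModel{false, kStatusInTrans}` (the assumed binlogless state), `needs_rollback_original` returns false and `needs_rollback_suggested` returns true, while an idle session `SessionModel{false, 0}` triggers neither.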