Bug #78281 Big insert to MyISAM table causing SQL thread stop under 5.6 GTID
Submitted: 31 Aug 2015 2:45 Modified: 31 Aug 2015 6:47
Reporter: Fungo Wang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6.16, 5.6.26 OS:Red Hat
Assigned to: CPU Architecture:Any
Tags: GTID, relay_log_info_repository, replication

[31 Aug 2015 2:45] Fungo Wang
Description:
When we using row binlog format, and do a big insert on master, the corresponding row events are too big to fit in ONE relay log on slave, i.e. split into TWO(or more) relay logs. The SQL thread complains with error 1837 as below:

Last_Errno: 1837
               Last_Error: When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is '01985921-4f84-11e5-9408-a0d3c1f20ae4:10003'.

there are 4 requirements to repeat:
1. gtid_mode=ON
2. binlog_format=ROW
3. big update to MyISAM table
4. relay_log_info_repository=TABLE

How to repeat:
//test case as below
--source include/master-slave.inc
--source include/have_binlog_format_row.inc

connection master;
create table base(id int, name varchar(3000)) engine = MyISAM;
create table t1(id int, name varchar(3000)) engine = MyISAM;

let $i=0;
--disable_query_log
while ($i<10000)
{
  inc $i;
  eval INSERT INTO base VALUES ($i, repeat('go for fun! ', 100));
}
--enable_query_log
--sync_slave_with_master

## create a big insert statement
connection master;
insert into t1 select * from base;

drop table base;
drop table t1;

## uncomment and connect to slave to see error
## sleep 300;

--sync_slave_with_master
--source include/rpl_end.inc

// master.opt
--enforce_gtid_consistency=ON
--gtid_mode=ON
--binlog_format=ROW
--log-slave-updates

// slave.opt
--enforce_gtid_consistency=ON
--gtid_mode=ON
--binlog_format=ROW
--log-slave-updates
--max_relay_log_size=5M
--master_info_repository=TABLE
--relay_log_info_repository=TABLE

Suggested fix:
I think the reason is below stack:

(gdb) bt
#0  MYSQL_BIN_LOG::commit (this=0x20859c0, thd=0x2b65c0000a50, all=false) at /path/to/mysql-server/sql/binlog.cc:6480
#1  0x0000000000666461 in ha_commit_trans (thd=0x2b65c0000a50, all=false, ignore_global_read_lock=true) at /path/to/mysql-server/sql/handler.cc:1436
#2  0x0000000000da37cd in Rpl_info_table_access::close_table (this=0x2350ba0, thd=0x2b65c0000a50, table=0x2555930, backup=0x2b65bc03f4e0, error=false) at /path/to/mysql-server/sql/rpl_info_table_access.cc:163
#3  0x0000000000da0e5e in Rpl_info_table::do_flush_info (this=0x24a8ed0, force=true) at /path/to/mysql-server/sql/rpl_info_table.cc:238
#4  0x0000000000d81c38 in Rpl_info_handler::flush_info (this=0x24a8ed0, force=true) at /path/to/mysql-server/sql/rpl_info_handler.h:92
#5  0x0000000000d8c37e in Relay_log_info::flush_info (this=0x2540c50, force=true) at /path/to/mysql-server/sql/rpl_rli.cc:2032
#6  0x0000000000d16f84 in MYSQL_BIN_LOG::purge_first_log (this=0x2541390, rli=0x2540c50, included=false) at /path/to/mysql-server/sql/binlog.cc:4086
#7  0x0000000000d777ba in next_event (rli=0x2540c50) at /path/to/mysql-server/sql/rpl_slave.cc:7794
#8  0x0000000000d65782 in exec_relay_log_event (thd=0x2b65c0000a50, rli=0x2540c50) at /path/to/mysql-server/sql/rpl_slave.cc:4176
#9  0x0000000000d6f627 in handle_slave_sql (arg=0x234e550) at /path/to/mysql-server/sql/rpl_slave.cc:6115
#10 0x000000000139c7f2 in pfs_spawn_thread (arg=0x2b65b8008d00) at /path/to/mysql-server/storage/perfschema/pfs.cc:1860
#11 0x00002b6561dda851 in start_thread () from /lib64/libpthread.so.0
#12 0x00002b65629a067d in clone () from /lib64/libc.so.6
(gdb) p cache_mngr->stmt_cache.is_binlog_empty()
$4 = false
(gdb) p stuff_logged
$5 = true

Before the SQL thread rotate to next relay log, MYSQL_BIN_LOG::purge_first_log is called. Under TABLE repository configuration, mysqld try to do a commit, and the stmt_cache is not empty, a binlog commit happens which consumes the GTID_NEXT,but the insert statement is not finished yet. After SQL thread rotates to next relay log, and trying to replay the next half insert statement, GTID_NEXT is already undefined, so complains no valid GTID.

Maybe SQL thread should not call Relay_log_info::flush_info when purging relay log.
[31 Aug 2015 6:47] MySQL Verification Team
Hello Fungo Wang,

Thank you for the report and test case.
Observed this with 5.6.26 build.

2015-08-31 09:11:32 7602 [ERROR] Slave SQL: When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable
 manual page for detailed explanation. Current @@SESSION.GTID_NEXT is '1f2b42b8-4fa7-11e5-94ed-0010e05f3e06:10003'. Error_code: 1837
2015-08-31 09:11:32 7602 [Warning] Slave: When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable m
anual page for detailed explanation. Current @@SESSION.GTID_NEXT is '1f2b42b8-4fa7-11e5-94ed-0010e05f3e06:10003'. Error_code: 1837
2015-08-31 09:11:32 7602 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 14850515

Thanks,
Umesh
[4 Apr 2016 13:18] monty solomon
We got a similar error after converting a MyISAM table to InnoDB on a slave and performing a master/slave swap. The error occurred on all of the slaves that had the MyISAM table.

Unlike the original reporter,  binlog_format=MIXED and the table update was not big.

               Last_SQL_Errno: 1837
               Last_SQL_Error: Error 'When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'e7888981-f2dd-11e5-bd37-0ea2a07d0529:9'.' on query. Default database: 'Email'. Query: 'INSERT IGNORE INTO eventDetails ...
[4 Apr 2016 16:27] monty solomon
Was this issue fixed as part of bug # 68525?