Bug #76959 Gaps in Retrieved_Gtid_Set while no gaps in Executed_Gtid_Set
Submitted: 6 May 2015 17:48 Modified: 7 Sep 2015 14:36
Reporter: Sveta Smirnova (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6.24 OS:Any
Assigned to: CPU Architecture:Any

[6 May 2015 17:48] Sveta Smirnova
Description:
Under certain circumstances it is possible that Retrieved_Gtid_Set on slave will contain gaps while actually no gaps will be in Executed_Gtid_Set and slave binary logs.

This happens when slave rotates relay log in such a way that last event contains record which sets GTID_NEXT, then following log contains few GTIDs, then slave restarts. After restart GTIDs for Retrieved_Gtid_Set executed wrongly and there are phantom gaps.

How to repeat:
Modify sources as follow (this is against MySQL from GitHub):

diff --git a/sql/binlog.cc b/sql/binlog.cc
index a02b987..9adb8bd 100644
--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -5057,6 +5057,7 @@ bool MYSQL_BIN_LOG::after_append_to_relay_log(Master_info *mi)
         DBUG_EVALUATE_IF("rotate_slave_debug_group", 500, max_size))
     {
       error= new_file_without_locking(mi->get_mi_description_event());
+ DBUG_EXECUTE_IF("gtid_next_bug", if (0 == max_size) {max_size = 1073741824;});
     }
   }

@@ -5092,7 +5093,7 @@ bool MYSQL_BIN_LOG::append_event(Log_event* ev, Master_info *mi)
 }

-bool MYSQL_BIN_LOG::append_buffer(const char* buf, uint len, Master_info *mi)
+bool MYSQL_BIN_LOG::append_buffer(const char* buf, uint len, Master_info *mi, Log_event_type event_type)
 {
   DBUG_ENTER("MYSQL_BIN_LOG::append_buffer");

@@ -5101,6 +5102,8 @@ bool MYSQL_BIN_LOG::append_buffer(const char* buf, uint len, Master_info *mi)
   DBUG_ASSERT(is_relay_log);
   mysql_mutex_assert_owner(&LOCK_log);

+ DBUG_EXECUTE_IF("gtid_next_bug", if (GTID_LOG_EVENT == event_type) {max_size=0; fprintf(stderr, "executing\n");});
+
   // write data
   bool error= false;
   if (my_b_append(&log_file,(uchar*) buf,len) == 0)
diff --git a/sql/binlog.h b/sql/binlog.h
index 38efe64..4ecda34 100644
--- a/sql/binlog.h
+++ b/sql/binlog.h
@@ -607,7 +607,7 @@ public:
   bool is_query_in_union(THD *thd, query_id_t query_id_param);

 #ifdef HAVE_REPLICATION
- bool append_buffer(const char* buf, uint len, Master_info *mi);
+ bool append_buffer(const char* buf, uint len, Master_info *mi, Log_event_type event_type);
   bool append_event(Log_event* ev, Master_info *mi);
 private:
   bool after_append_to_relay_log(Master_info *mi);
diff --git a/sql/rpl_slave.cc b/sql/rpl_slave.cc
index 30e85fc..b60e916 100644
--- a/sql/rpl_slave.cc
+++ b/sql/rpl_slave.cc
@@ -7051,7 +7051,7 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len)
   else
   {
     /* write the event to the relay log */
- if (likely(rli->relay_log.append_buffer(buf, event_len, mi) == 0))
+ if (likely(rli->relay_log.append_buffer(buf, event_len, mi, event_type) == 0))
     {
       mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos);
       DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos()));

Run attached test case.

You will see in the end:

Retrieved_Gtid_Set 6030e94d-f416-11e4-bfd7-28b2bd7442fd:1-3:5-6
Executed_Gtid_Set 6030e94d-f416-11e4-bfd7-28b2bd7442fd:1-6

Suggested fix:
Either change way how GTIDs are calculated after restart (preferable) or check if replication does not write GTID_NEXT event as last event in the relay log file or both (because relay log can be corrupted in such a way).
[6 May 2015 17:48] Sveta Smirnova
test case

Attachment: rpl_issue52049.test (application/octet-stream, text), 724 bytes.

[6 May 2015 17:49] Sveta Smirnova
option file for master

Attachment: rpl_issue52049-master.opt (application/octet-stream, text), 103 bytes.

[6 May 2015 17:51] Sveta Smirnova
option file for slave

Attachment: rpl_issue52049-slave.opt (application/octet-stream, text), 102 bytes.

[6 May 2015 17:53] Sveta Smirnova
See also https://bugs.launchpad.net/mysql-server/+bug/1452397
[7 May 2015 6:53] MySQL Verification Team
Hello Sveta,

Thank you for the report and test case.
Observed this with 6.2.4.

Thanks,
Umesh
[7 May 2015 6:57] MySQL Verification Team
test results

Attachment: 76959.txt (text/plain), 12.82 KiB.

[7 Sep 2015 14:36] Jon Stephens
Documented fix in the MySQL 5.6.27 changelog as follows:

    Under certain circumstances it was possible for
    Retrieved_Gtid_Set on the slave to contain gaps while no gaps
    appeared in Executed_Gtid_Set or the slave's binary logs. This
    could happen when slave rotated the relay log in such a way that
    the last event of this log contained the record which set
    GTID_NEXT, and was then restarted after reading GTIDs from the
    following log. Following the restart, Retrieved_Gtid_Set
    contained GTIDs which were executed incorrectly as well as
    spurious or "phantom" gaps.

    The fix for this problem adds a GTID to Retrieved_Gtid_Set
    before writing the event to the relay log, rather than after. If
    for some reason writing to relay log fails, the GTID is removed
    from Retrieved_Gtid_Set.

Closed.