| Bug #76959 | Gaps in Retrieved_Gtid_Set while no gaps in Executed_Gtid_Set | ||
|---|---|---|---|
| Submitted: | 6 May 2015 17:48 | Modified: | 7 Sep 2015 14:36 |
| Reporter: | Sveta Smirnova (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
| Version: | 5.6.24 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[6 May 2015 17:48]
Sveta Smirnova
test case
Attachment: rpl_issue52049.test (application/octet-stream, text), 724 bytes.
[6 May 2015 17:49]
Sveta Smirnova
option file for master
Attachment: rpl_issue52049-master.opt (application/octet-stream, text), 103 bytes.
[6 May 2015 17:51]
Sveta Smirnova
option file for slave
Attachment: rpl_issue52049-slave.opt (application/octet-stream, text), 102 bytes.
[6 May 2015 17:53]
Sveta Smirnova
See also https://bugs.launchpad.net/mysql-server/+bug/1452397
[7 May 2015 6:53]
MySQL Verification Team
Hello Sveta, Thank you for the report and test case. Observed this with 6.2.4. Thanks, Umesh
[7 May 2015 6:57]
MySQL Verification Team
test results
Attachment: 76959.txt (text/plain), 12.82 KiB.
[7 Sep 2015 14:36]
Jon Stephens
Documented fix in the MySQL 5.6.27 changelog as follows:
Under certain circumstances it was possible for
Retrieved_Gtid_Set on the slave to contain gaps while no gaps
appeared in Executed_Gtid_Set or the slave's binary logs. This
could happen when slave rotated the relay log in such a way that
the last event of this log contained the record which set
GTID_NEXT, and was then restarted after reading GTIDs from the
following log. Following the restart, Retrieved_Gtid_Set
contained GTIDs which were executed incorrectly as well as
spurious or "phantom" gaps.
The fix for this problem adds a GTID to Retrieved_Gtid_Set
before writing the event to the relay log, rather than after. If
for some reason writing to relay log fails, the GTID is removed
from Retrieved_Gtid_Set.
Closed.

Description: Under certain circumstances it is possible that Retrieved_Gtid_Set on slave will contain gaps while actually no gaps will be in Executed_Gtid_Set and slave binary logs. This happens when slave rotates relay log in such a way that last event contains record which sets GTID_NEXT, then following log contains few GTIDs, then slave restarts. After restart GTIDs for Retrieved_Gtid_Set executed wrongly and there are phantom gaps. How to repeat: Modify sources as follow (this is against MySQL from GitHub): diff --git a/sql/binlog.cc b/sql/binlog.cc index a02b987..9adb8bd 100644 --- a/sql/binlog.cc +++ b/sql/binlog.cc @@ -5057,6 +5057,7 @@ bool MYSQL_BIN_LOG::after_append_to_relay_log(Master_info *mi) DBUG_EVALUATE_IF("rotate_slave_debug_group", 500, max_size)) { error= new_file_without_locking(mi->get_mi_description_event()); + DBUG_EXECUTE_IF("gtid_next_bug", if (0 == max_size) {max_size = 1073741824;}); } } @@ -5092,7 +5093,7 @@ bool MYSQL_BIN_LOG::append_event(Log_event* ev, Master_info *mi) } -bool MYSQL_BIN_LOG::append_buffer(const char* buf, uint len, Master_info *mi) +bool MYSQL_BIN_LOG::append_buffer(const char* buf, uint len, Master_info *mi, Log_event_type event_type) { DBUG_ENTER("MYSQL_BIN_LOG::append_buffer"); @@ -5101,6 +5102,8 @@ bool MYSQL_BIN_LOG::append_buffer(const char* buf, uint len, Master_info *mi) DBUG_ASSERT(is_relay_log); mysql_mutex_assert_owner(&LOCK_log); + DBUG_EXECUTE_IF("gtid_next_bug", if (GTID_LOG_EVENT == event_type) {max_size=0; fprintf(stderr, "executing\n");}); + // write data bool error= false; if (my_b_append(&log_file,(uchar*) buf,len) == 0) diff --git a/sql/binlog.h b/sql/binlog.h index 38efe64..4ecda34 100644 --- a/sql/binlog.h +++ b/sql/binlog.h @@ -607,7 +607,7 @@ public: bool is_query_in_union(THD *thd, query_id_t query_id_param); #ifdef HAVE_REPLICATION - bool append_buffer(const char* buf, uint len, Master_info *mi); + bool append_buffer(const char* buf, uint len, Master_info *mi, Log_event_type event_type); bool append_event(Log_event* ev, Master_info *mi); private: bool after_append_to_relay_log(Master_info *mi); diff --git a/sql/rpl_slave.cc b/sql/rpl_slave.cc index 30e85fc..b60e916 100644 --- a/sql/rpl_slave.cc +++ b/sql/rpl_slave.cc @@ -7051,7 +7051,7 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len) else { /* write the event to the relay log */ - if (likely(rli->relay_log.append_buffer(buf, event_len, mi) == 0)) + if (likely(rli->relay_log.append_buffer(buf, event_len, mi, event_type) == 0)) { mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos); DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos())); Run attached test case. You will see in the end: Retrieved_Gtid_Set 6030e94d-f416-11e4-bfd7-28b2bd7442fd:1-3:5-6 Executed_Gtid_Set 6030e94d-f416-11e4-bfd7-28b2bd7442fd:1-6 Suggested fix: Either change way how GTIDs are calculated after restart (preferable) or check if replication does not write GTID_NEXT event as last event in the relay log file or both (because relay log can be corrupted in such a way).