Bug #55231 | COM_BINLOG_DUMP needs to accept 64-bit positions else slaves can break | ||
---|---|---|---|
Submitted: | 13 Jul 2010 20:40 | Modified: | 24 Nov 2010 18:32 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | In progress | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.48, 5.7.22 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[13 Jul 2010 20:40]
Shane Bester
[13 Jul 2010 20:42]
MySQL Verification Team
workaround is to reduce size of transactions so they never cause binlog to get larger than 4G.
[15 Jul 2010 14:12]
MySQL Verification Team
workaround is to set a lower max_binlog_cache_size, around this size: (4*1024*1024*1024) - max_binlog_size maybe some overhead bytes can be deducted too.
[15 Jul 2010 22:42]
Davi Arnaut
Also, please, let's ensure that the max_allowed_packet error message is not used for unrelated problems.. it's just plain confusing.
[15 Jul 2010 22:43]
Davi Arnaut
Also, oh.. the irony: /* TODO: The following has to be changed to an 8 byte integer */ pos = uint4korr(packet);
[19 Oct 2010 1:59]
Roel Van de Paar
Also note that maximum allowed value for max_binlog_size is 1G, but that server allows creating larger binlogs to prevent spreading transactions across binlogs.
[19 Oct 2010 23:56]
Roel Van de Paar
Note that besides a stop/start, this issue can potentially be created by a network glitch; 100910 08:09:10 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
[28 Oct 2010 14:03]
Sven Sandberg
It's possible to fix this bug and keep backward and forward compatibility if we use a new server command code instead of COM_BINLOG_DUMP. We need to do this: - In the master, add COM_BINLOG_DUMP_64BIT. It is exactly like COM_BINLOG_DUMP except it uses 8 bytes for the position instead of 4. - In the master, add the variable "@@GLOBAL.HAVE_64_BIT_BINLOG_POSITIONS" with the constant value 1. - In the slave startup code (get_master_version_and_clock() or similar function), send "SHOW VARIABLES LIKE 'HAVE_64_BIT_BINLOG_POSITIONS'" to the master to determine if COM_BINLOG_DUMP_64BIT is supported. If it is supported, send 64 bit position. Otherwise, send 32 bit position. - I checked interfaces that we expose. CHANGE MASTER, START SLAVE UNTIL, SHOW SLAVE STATUS, SHOW MASTER STATUS, SHOW BINARY LOGS, and the plugin interface all use 64 bits. master.info and relay-log.info use a string, so we just need to make it write and parse 64 bit numbers. We'll have to ensure that WL#2775 and WL#3656 use 64 bits. So I think all external interfaces are ok. Then, we have to fix the internal interfaces. I found the following in next-mr by grepping 'pos' in rpl_slave.cc, rpl_mi.cc and rpl_rli.cc: in handle_slave_sql: sql_print_information("'SQL_SLAVE_SKIP_COUNTER=%ld' executed at " "relay_log_file='%s', relay_log_pos='%ld', master_log_name='%s', " "master_log_pos='%ld' and new position at " "relay_log_file='%s', relay_log_pos='%ld', master_log_name='%s', " "master_log_pos='%ld' ", (ulong) saved_skip, saved_log_name, (ulong) saved_log_pos, saved_master_log_name, (ulong) saved_master_log_pos, rli->get_group_relay_log_name(), (ulong) rli->get_group_relay_log_pos(), rli->get_group_master_log_name(), (ulong) rli->get_group_master_log_pos()); in process_io_rotate: DBUG_PRINT("info", ("master_log_pos: '%s' %lu", mi->get_master_log_name(), (ulong) mi->get_master_log_pos())); in queue_binlog_ver_1_event and queue_binlog_ver_3_event: DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos())); in queue_event: DBUG_PRINT("info", ("master_log_pos: %lu, event originating from %u server, ignored", (ulong) mi->get_master_log_pos(), uint4korr(buf + SERVER_ID_OFFSET))); DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos())); in next_event: DBUG_PRINT("info", ("assertion skip %lu file pos %lu event relay log pos %lu file %s\n", (ulong) rli->slave_skip_counter, (ulong) my_b_tell(cur_log), (ulong) rli->get_event_relay_log_pos(), rli->get_event_relay_log_name())); DBUG_PRINT("info", ("next_event group master %s %lu group relay %s %lu event %s %lu\n", rli->get_group_master_log_name(), (ulong) rli->get_group_master_log_pos(), rli->get_group_relay_log_name(), (ulong) rli->get_group_relay_log_pos(), rli->get_event_relay_log_name(), (ulong) rli->get_event_relay_log_pos())); In two places ^^^ in change_master: DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos())); in two places ^^^ in relay_log_info::wait_for_pos: DBUG_PRINT("info",("group_master_log_name: '%s' pos: %lu", group_master_log_name, (ulong) group_master_log_pos)); in Relay_log_info::inc_group_relay_log_pos: DBUG_PRINT("info", ("log_pos: %lu group_master_log_pos: %lu", (long) log_pos, (long) group_master_log_pos)); in master_info::flush_info: DBUG_PRINT("enter",("master_pos: %lu", (ulong) master_log_pos));
[12 Nov 2010 10:38]
Andrei Elkin
Consider Bug#55247 as related.
[24 Nov 2010 18:39]
Sven Sandberg
This is more complicated than indicated above. We also have to: - Extend the binlog format so that the end_log_pos of the common-header can be 64 bits. This is probably doable, we just change the common-header size to 23 bytes instead of 19. The end_log_pos will be split up in two pieces: first the low 4 bytes, then other header fields, and lastly the high 4 bytes. One possible problem is that the formats of Format_description_log_event and Rotate_log_event are supposed to be fixed. This is probably OK for Format_description_log_event because it can only appear in the beginning of the binlog, so it does not need the extra bits for the end_log_pos. It is unclear how this limitation affects Rotate_log_event - that remains to be investigated. - Extend mysqlbinlog --read-from-remote-server so that it first checks the capabilities of the server, then executes either COM_BINLOG_DUMP or COM_BINLOG_DUMP_64BIT accordingly.
[22 Feb 2011 21:04]
James Day
For older production versions I assume it's easy to change the error message, so please do that even if a full fix is too risky for a particular version.
[31 Aug 2018 4:33]
MySQL Verification Team
Bug #92250 marked as duplicate of this one
[10 Sep 2018 11:54]
MySQL Verification Team
Bug#92346 is set as duplicate of this one
[23 Apr 2019 5:01]
MySQL Verification Team
Bug #95074 marked as duplicate of this one
[1 Dec 2020 15:57]
MySQL Verification Team
https://bugs.mysql.com/bug.php?id=101795 is marked as a duplicate of this bug
[5 May 2021 15:40]
MySQL Verification Team
Bug #95418 is marked as duplicate of this one
[23 Feb 2022 20:22]
MySQL Verification Team
Bug #106536 is marked as duplicate of this one
[17 Dec 2023 17:31]
Yuriy Tabolin
Sadly, the 13-year bug is still actual. Got it on mysql 8.0.26 2023-12-17T14:20:47.369937Z 133875 [ERROR] [MY-013118] [Repl] Slave I/O for channel '': Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: log_file_name mysql80-bin.172113 log_pos 772469444, Error_code: MY-013118 2023-12-17T14:20:47.370663Z 133875 [ERROR] [MY-013122] [Repl] Slave I/O for channel '': Relay log write failure: could not queue event from master, Error_code: MY-013122 The binlog file size is 11GB (usually it is 1GB). After that, the replica was stuck and couldn't replicate further from the master. And trying to read from binlog from master causes mysql master to crash. mysql> show binlog events in 'mysql80-bin.172113' from 3663667491; ERROR 1220 (HY000): Error when executing command SHOW BINLOG EVENTS: Event too big mysql> show binlog events in 'mysql80-bin.172113' from 3663667490; ERROR 1220 (HY000): Error when executing command SHOW BINLOG EVENTS: Event too big mysql> show binlog events in 'mysql80-bin.172113'; ERROR 2013 (HY000): Lost connection to MySQL server during query mysql> \q Is there any workaround for how I can fix my replica? Maybe I can split big binlog file somehow and apply it to replica? I read this and the duplicate issues mentioned here and haven't found anything to help.
[18 Dec 2023 16:27]
Yuriy Tabolin
I've tried two things: 1. setting GTID_ONLY = 1 on the replica (I had to upgrade it to 8.0.27) doesn't help, I got the same error "Unexpected master's heartbeat data: heartbeat is not compatible with local info" 2. setting MASTER_HEARTBEAT_PERIOD = 400000 (just dome big value) helps, I've managed to launch my replica and it started to chase source.