Description:
Executing rpl.rpl_row_event_max_size 'row' on mysqld with checksum capability
of wl#2540 revealed an issue of reading an event not from the beginning of event buffer.
While rep+2, next-mr trees can't see the problem due to specific design of
rpl.rpl_row_event_max_size, the checksum-aware master stops at reading event to computed incorrect checksum.
Such not-from-the-beginning reading is possible when an event is from the hot log,
and due to convention to treat all but LOG_READ_MEM as LOG_READ_EOF the dump thread repeats reading through Log_event::read_log_event(&log, packet, (pthread_mutex_t*) 0)) and does this the 2nd time from advanced position left after the first failed attempt.
By the way the test designed the 2nd read is supposed to return an "event" that
has iccidentially suitable size (< max_packet_size), and that - being nohow a valid event - is sent over to slave.
Another way to prove the issue is to catch Show Slave Status info at the end
of the test run which shows along expected
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master'
the following which is *not* expected:
=====================================V
Last_SQL_Errno: 1594
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
How to repeat:
Easy way:
take 5.1-rep+2 branch
apply the patch
--- mysql-test/suite/rpl/t/rpl_row_event_max_size.test 2010-06-18 11:10:53 +0000
+++ mysql-test/suite/rpl/t/rpl_row_event_max_size.test 2010-07-04 17:44:12 +0000
@@ -71,5 +71,6 @@ call mtr.add_suppression("Found invalid
# Show slave last IO errno
drop table t1;
+select sleep(300);
let slave_io_errno= 1236;
source include/wait_for_slave_io_error.inc;
run ./mtr rpl.rpl_row_event_max_size
to test with SSS the Last_SQL_Err* of the description.
Harder one:
run ./mtr --manual-gdb rpl.rpl_row_event_max_size
and watch execution of Log_event::read_log_event() on the master size
and read_event() on the slave io thread side.
Compare sizes of read-out events with ones that mysqlbinlog master-bin.000001
shows up as well.
The partial read happens when master reaches the group marked as
# at 8098563
The size of the rows-event is
end_log_pos 11098724 - end_log_pos 8098678 >
max(current_thd->variables.max_allowed_packet,
opt_binlog_rows_event_max_size + MAX_LOG_EVENT_HEADER)
(fixes of bug#49931 yet more complicated the matter)
and the event # at 8098678 is read out twice as described.
Suggested fix:
Improve logics of double-reading to never attempt to read from the advanced position of the first read.
Also, reconsider carefully fixes of bug#49931, and necessarily
logics of rpl.rpl_row_event_max_size.