Bug #54988 mysql_binlog_send attempts to read events partly
Submitted: 4 Jul 2010 17:56 Modified: 1 Oct 2010 15:47
Reporter: Andrei Elkin Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.1+ OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: binlog, dump thread, replication

[4 Jul 2010 17:56] Andrei Elkin
Description:
Executing rpl.rpl_row_event_max_size 'row' on mysqld with checksum capability
of wl#2540 revealed an issue of reading an event not from the beginning of event buffer.

While rep+2, next-mr trees can't see the problem due to specific design of 
rpl.rpl_row_event_max_size, the checksum-aware master stops at reading event to computed incorrect checksum.

Such not-from-the-beginning reading is possible when an event is from the hot log,
and due to convention to treat all but LOG_READ_MEM as LOG_READ_EOF the dump thread repeats reading through Log_event::read_log_event(&log, packet, (pthread_mutex_t*) 0)) and does this the 2nd time from advanced position left after the first failed attempt.
By the way the test designed the 2nd read is supposed to return an "event" that
has iccidentially suitable size (< max_packet_size), and that - being nohow a valid event - is sent over to slave.

Another way to prove the issue is to catch Show Slave Status info at the end
of the test run which shows along expected
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master'

the following which is *not* expected:
=====================================V

               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

How to repeat:
Easy way: 
take 5.1-rep+2 branch
apply the patch

--- mysql-test/suite/rpl/t/rpl_row_event_max_size.test	2010-06-18 11:10:53 +0000
+++ mysql-test/suite/rpl/t/rpl_row_event_max_size.test	2010-07-04 17:44:12 +0000
@@ -71,5 +71,6 @@ call mtr.add_suppression("Found invalid 
 
 # Show slave last IO errno
 drop table t1;
+select sleep(300);
 let slave_io_errno= 1236;
 source include/wait_for_slave_io_error.inc;

run ./mtr rpl.rpl_row_event_max_size 

to test with SSS the Last_SQL_Err* of the description.

Harder one:

run ./mtr --manual-gdb rpl.rpl_row_event_max_size 

and watch execution of Log_event::read_log_event() on the master size
and read_event() on the slave io thread side.
Compare sizes of read-out events with ones that mysqlbinlog master-bin.000001
shows up as well.
The partial read happens when master reaches the group marked as
# at 8098563
The size of the rows-event is
end_log_pos 11098724 - end_log_pos 8098678 > 

 max(current_thd->variables.max_allowed_packet,
                     opt_binlog_rows_event_max_size + MAX_LOG_EVENT_HEADER)

 (fixes of bug#49931 yet more complicated the matter)

and the event # at 8098678 is read out twice as described.

Suggested fix:
Improve logics of double-reading to never attempt to read from the advanced position of the first read.

Also, reconsider carefully fixes of bug#49931, and necessarily 
logics of rpl.rpl_row_event_max_size.
[1 Oct 2010 15:47] Andrei Elkin
Bug #55675 is the parent for this bug. Coincidently the parent and his child were
discovered within few days interval, ... and by very different tools.
[29 Nov 2010 11:11] Bugs System
Pushed into mysql-trunk 5.6.1-m5 (revid:alexander.nozdrin@oracle.com-20101129111021-874if2qsp0i8d5ze) (version source revid:alexander.nozdrin@oracle.com-20101129111021-874if2qsp0i8d5ze) (merge vers: 5.6.1-m5) (pib:23)