MySQL Bugs: #27583: slave sql fails to read from iocache when slave got stopped at pos==4

Bug #27583	slave sql fails to read from iocache when slave got stopped at pos==4
Submitted:	2 Apr 2007 8:40	Modified:	24 Apr 2007 0:51
Reporter:	Andrei Elkin	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.1.18	OS:	Any
Assigned to:	Mats Kindahl	CPU Architecture:	Any
Tags:	relay_log

Description:
test failed

rpl_start_stop_slave
rpl_row_create_table

with similar logs:

070401  4:33:55 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:11220',replication started in log 'FIRST' at position 4
070401  4:33:55 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '/dev/shm/var-ps_row-122/log/slave-relay-bin.000001' position: 4
070401  4:33:55 [Note] next log '/dev/shm/var-ps_row-122/log/slave-relay-bin.000002' is currently active
070401  4:33:55 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 4
070401  4:33:55 [Note] next log '/dev/shm/var-ps_row-122/log/slave-relay-bin.000003' is currently active
070401  4:33:55 [Note] Error reading relay log event: slave SQL thread was killed
070401  4:33:56 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 4, relay log '/dev/shm/var-ps_row-122/log/slave-relay-bin.000002' position: 150
070401  4:33:56 [ERROR] Slave SQL thread: I/O error reading event(errno: -1  cur_log->error: 5)
070401  4:33:56 [ERROR] Error reading relay log event: Aborting slave SQL thread because of partial event read
070401  4:33:56 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
070401  4:33:56 [ERROR] Slave: Unknown error, Error_code: 1105

Consequently tests could not complete.

How to repeat:
use similar to pb env (e.g sapsrv1) and execute (copied&pasted from pb's invocation)

./mysql-test-run.pl --tmpdir=/dev/shm/pbtmp-ps_row-122 --vardir=/dev/shm/var-ps_row-122 --timer --force --ps-protocol --mysqld=--binlog-format=row
Logging: ./mysql-test-run.pl --tmpdir=/dev/shm/pbtmp-ps_row-122 --vardir=/dev/shm/var-ps_row-122 --timer --force --ps-protocol --mysqld=--binlog-format=row rpl_start_stop_slave

Suggested fix:
Consider first the reason of failing of read by slave sql iocache.

The issue with 5.0 tests does not relate.
The problem occures when slave io gets interrupted with STOP SLAVE having
received the first 4 bytes from master.
After START SLAVE SQL_THREAD, the latter gets wrong beyond the relay log values 
to read from in _my_b_read().

To get to the situation:

<#rep>
mats, first, i found out that the alg :) to reproduce does not affect 5.0 nor 5.1 w/o bug23171 ...

 the alg in the following:
 
 1. (gdb) br queue_event
 2. change master ...
 3. start slave;
 4. (gdb) fin
 5. (gdb) br Log_event::read_log_event(st_io_cache*, st_safe_mutex_t*, Format_description_log_event const*) at log_event.cc:814
 6. stop slave;
 7. (gdb) cont
 8. start slave sql_thread
 9. (gdb) watch how _my_b_read()  calculates wrong pos_in_file

As I There is great chance

just never mind on the last clause :)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/24069

ChangeSet@1.2545, 2007-04-09 19:26:51+03:00, aelkin@dsl-hkibras1-ff1dc300-249.dhcp.inet.fi +1 -0
  Bug #27583 slave sql fails to read from iocache when slave got stopped at pos==4
  
  showed up in set of tests rpl_start_stop_slave.test,  rpl_row_create_table.test, rpl000017.test
  as failure of slave sql thread to read from iocache when slave got stopped having read only first 4 bytes
  from the master binlog.
  
  Appeared to be forgotten assignment to rpl->group_relay_log_name. If the group pos is changed at that particur
  place a similar change should happen to the group log name in step.
  After the fix slave might safely stop and relay log info will consists of correct pair 
  (relay log_name, relay log_pos).
  The correct relay log info is the guarantee that slave will restart w/o the problem.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/24262

ChangeSet@1.2548, 2007-04-11 11:10:14+02:00, mats@romeo.(none) +1 -0
  BUG#27583 (slave sql fails to read from iocache when slave got
  stopped at pos==4):
  
  Submitting patch on behalf of Andrei, who discovered the problem
  and provided the patch.
  
  An update of the group relay log coordinates when rotating forgot to
  update the group relay log name and only updated the group relay log
  position (and group master log name and position).
  
  This patch adds code to update the group relay log *name* as well as
  the position

Pushed into 5.1.18-beta

Noted in 5.1.18 changelog.

Group relay log rotation updated only the log position and not the
name, causing the slave to stop.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29176

ChangeSet@1.2494, 2007-06-20 13:21:16+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +1 -0
  Bug #27583 slave sql fails to read from iocache when slave got stopped at pos==4
  
  forgotten merge with 5.0. There can be some bugs waiting for this fix in 5.0 like Bug@29232

Pushed into 5.0.46

Pushed into 5.1.20-beta