Bug #21474 There is a rotation before the last table_map
Submitted: 7 Aug 2006 8:23 Modified: 26 Oct 2006 3:11
Reporter: Sveta Smirnova Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:5.1 OS:
Assigned to: Mats Kindahl CPU Architecture:Any

[7 Aug 2006 8:23] Sveta Smirnova
Description:
As wrote Wei Li in http://lists.mysql.com/internals/33779:

"there is a rotation before the last table_map
event and the last Row_log_event. During the rotation,
flush_relay_log_info() will be called and group_relay_log_name/pos
will be updated unexpectedly."

>> #060804 11:08:29 server id 1 end_log_pos 20229 Table_map:
>> `test`.`t1` mapped to number 16
>> BINLOG '
>> HY3TRBMBAAAAJwAAAAVPAAAAABAAAAAAAAAABHRlc3QAAnQxAAED
>> ';
>> # at 20415
>> #060804 11:08:29 server id 1 end_log_pos 20263 Write_rows: table
>> id 16
>> BINLOG '
>> HY3TRBQBAAAAIgAAACdPAAAQABAAAAAAAAEAAf/9DgIAAA==
>> ';
>> # at 20449
>> #060804 11:08:29 server id 1 end_log_pos 20302 Table_map:
>> `test`.`t1` mapped to number 16
>>
>> BINLOG '
>> HY3TRBMBAAAAJwAAAE5PAAAAABAAAAAAAAAABHRlc3QAAnQxAAED
>> ';
>> # at 20488
>> #060804 11:21:17 server id 2 end_log_pos 20537 Rotate to
>> slave-relay-bin.000004 pos: 4

How to repeat:
insert into foo values (...);
insert into foo values (...);
[7 Aug 2006 14:07] Miguel Solorzano
Bug: http://bugs.mysql.com/bug.php?id=21479 was marked as duplicate of
this one.
[6 Oct 2006 20:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13282

ChangeSet@1.2345, 2006-10-06 22:03:12+02:00, mats@romeo.(none) +1 -0
  BUG#21474 (There is a rotation before the last table map):
  Removing code to step the group log position and just stepping
  the event log position.  If the group log position were stepped
  one time too many, it might be that the group starts at a position
  that is not possible, e.g., at a Rows_log_event, or between an
  Intvar_log_event and the following associated Query_log_event.
[10 Oct 2006 13:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13394

ChangeSet@1.2345, 2006-10-10 15:18:21+02:00, mats@romeo.(none) +2 -0
  BUG#21474 (There is a rotation before the last table map):
  Removing code to step the group log position and just stepping
  the event log position.  If the group log position were stepped
  one time too many, it might be that the group starts at a position
  that is not possible, e.g., at a Rows_log_event, or between an
  Intvar_log_event and the following associated Query_log_event.
[11 Oct 2006 9:54] Lars Thalmann
Pushed into 5.1.12.

-----------------

Guilhem described the problem like this (slightly edited by Lars):

- Master has a Table_map and a Rows event to send to slave (the Rows
  event is about the table id defined in the Table_map). This can also
  apply to a transaction (some "BEGIN" query event, then some "INSERT",
  then some "COMMIT").

- Master sends the Table_map, slave receives it, writes it to relay
  log, notices that relay log is too big (option --max-relay-log-size):
  creates a new relay log;

- Master sends the Rows event, slave receives it, writes to 2nd relay
  log. So we have on slave:

  - first relay log:
    Table_map event
    Rotate to 2nd relay log (this event has server id of the slave)

  - second relay log:
    Format_description event (this event has server id of the slave)
    Rows event

Slave SQL thread comes to executing the Table_map, then the Rotate
would cause inc_group_relay_log_pos() to be called, thus recording in
relay-log.info that the last group starts at the position of the
Rotate.

If now slave is stopped, it will restart from the Rotate, come to the
Rows, and wonder for what table this table id is. In other words, the
Rows is logically part of the same group as the Table_map, and the
Rotate got in the way.

-----------------

The fix is making sure that skipped events due to same server id do
not increase the group position.
[11 Oct 2006 9:57] Lars Thalmann
Note that it is not required that table maps and rows events that
belong together are in the same relay log file.

What is required is that the group position (the position stored in
relay_log.info) is always correct.
[11 Oct 2006 10:10] Lars Thalmann
See also BUG#23171
[11 Oct 2006 17:40] Wei Li
I think the fix should apply to all cases that might call flush_relay_log_info().  It is possible that some other events might call flush_relay_log_info()

For example, Format_description_log_event::exec_event() also makes the call.
[12 Oct 2006 8:47] Mats Kindahl
This is entirely correct and the second patch do not step the group log position nor flush the relay log info to disk when executing Format_description_log_event.

Although this solution eliminates the immediate potential for failure, the solution is not entirely satisfactory. For that reason, we created BUG#23171, which will implement a complete solution so that skipping and execution will update relay log information in an identical manner.
[26 Oct 2006 3:11] Paul Dubois
Noted in 5.1.12 changelog.