Bug #56912 binlog_killed fails sporadically on mysql-next-mr-bugfixing codebase.
Submitted: 21 Sep 2010 23:24 Modified: 15 Nov 2010 19:36
Reporter: Luis Soares Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Replication Severity:S3 (Non-critical)
Version:mysql-next-mr-bugfixing, 5.6.99 OS:Any
Assigned to: Luis Soares CPU Architecture:Any

[21 Sep 2010 23:24] Luis Soares
Description:
binlog_killed test fails sporadically on next-mr-bugfixing codebase recently.

I have noticed it mostly on mysql-next-mr-bufixing tree.

binlog.binlog_killed 'stmt'              [ fail ]
        Test ended at 2010-09-17 15:58:36

CURRENT_TEST: binlog.binlog_killed
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1835008, event_type: -109
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688, event_type: 116
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688, event_type: 116
--- /export/home3/pb2/build/sb_2-None-1284723696.84/mysql-next-mr-bugfixing-gcov/mysql-test/suite/binlog/r/binlog_killed.result	2010-09-17 14:43:29.000000000 +0300
+++ /export/home3/pb2/build/sb_2-None-1284723696.84/mysql-next-mr-bugfixing-gcov/mysql-test/suite/binlog/r/binlog_killed.reject	2010-09-17 16:58:36.000000000 +0300
@@ -9,7 +9,7 @@
 insert into t2 values (null, null), (null, get_lock("a", 10));
 select @result /* must be zero either way */;
 @result
-0
+2
 select RELEASE_LOCK("a");
 RELEASE_LOCK("a")
 1

mysqltest: Result content mismatch

How to repeat:
Look at PB2: http://tinyurl.com/2477lot

List of failures: http://tinyurl.com/26fgnfh

Suggested fix:
.
[26 Sep 2010 23:03] Luis Soares
Analysis
========

  - Test description 
   
    Roughly, the failing test asserts that a killed session
    either inserts rows in a table and writes the event to the
    binary log or it does neither of the two.

    Techincally, the assertion is done by: 

      1. selecting the row count from the table and storing the
         value in $row;
  
      2. dumping the contents of the binary log from position 175
         into a file and then parsing the file searching for the
         'error_code' field and for the insert statement. If
         there is a match one is return for either, otherwise
         zero is returned. The values are stored in $error_code
         and $insert_binlogged variables.

      3. subtracting to $row the values of $error_code and
         $insert_binlogged:

         set @result= $rows- $error_code - $insert_binlogged;

         If no rows were inserted on the table, then nothing must
         have been binlogged: 

            set @result= 0 - 0 - 0 = 0

         On the other hand, if the two rows were inserted on the
         table, then

            set @result= 2 - 1 - 1 = 0

         The final result must be zero either way. 

  - Problem Description
  
    The test fails with:

 insert into t2 values (null, null), (null, get_lock("a", 10));
 select @result /* must be zero either way */;
 @result
-0
+2

    This suggests that there is something wrong with killing and
    binlogging on-going statements. However, from mysqlbinlog
    output we also have this:

 CURRENT_TEST: binlog.binlog_killed
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1835008, event_type:
-109
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688,
event_type: 116
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688,
event_type: 116

    This shows that something wrong happened while dumping the
    contents of the binary log (see step #2 of the assertion
    procedure above).

    Looking at the contents of the binary log, when the test
    fails, we can find:

/*!*/;
# at 177
#100924 18:31:48 server id 1  end_log_pos 205 	Intvar
SET INSERT_ID=1/*!*/;
# at 205
#100924 18:31:48 server id 1  end_log_pos 329 	Query	thread_id=4	exec_time=0	error_code=0
use test/*!*/;
SET TIMESTAMP=1285349508/*!*/;
insert into t2 values (null, null), (null, get_lock("a", 10))

     Now... From the test case, we find that the dump is
     performed from position 175, which is wrong and should be 177.

   - Some observations

     1. The test fails sporadically because the dump only fails
        when the kill command is slow enough to allow the insert
        to succeed. In that case, the events will be written to
        the binary log and the dump will fail. OTOH, if the kill
        succeeds before the connection is able to execute the
        insert statement, then the dump succeeds and consequently
        the assertion succeeds.

     2. The format description event grew two bytes recently
        because it has been added two new event types to the
        post_header_len, which do shift the positions by two
        bytes. The new events were added in revision
        dao-gang.qu@sun.com-20100707064850-sja6v94mepw470al :

-  
+
+  /*
+    In some situations, it is necessary to send over ignorable
+    data to the slave: data that a slave can handle in case there
+    is code for handling it, but which can be ignored if it is not
+    recognized.
+  */
+  IGNORABLE_LOG_EVENT= 28,
+  ROWS_QUERY_LOG_EVENT= 29,
+

        and then the post_header_len grew two bytes - checking
        the FD event constructor we have:

    number_of_event_types= LOG_EVENT_TYPES;
    /* we'll catch my_malloc() error in is_valid() */
    post_header_len=(uint8*) my_malloc(number_of_event_types*sizeof(uint8),
                                       MYF(0));

Conclusion
==========

  1. The test fails sporadically because the assertion checks for
     the atomicity, either: (a) we insert into the table and
     write the corresponding event to the binary log, or (b) we
     don't insert and don't write. When (a) happens, it fails.
 
  2. The FD event has grown 2 bytes recently, and that explains
     why the assertion (a), because the parsing of the binary log
     is done based on the start position of the insert event.

  3. The immediate fix would be to change the mysqlbinlog
     parameter from --start-position=175 to --start-position=177.
[26 Sep 2010 23:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/119129

3298 Luis Soares	2010-09-27
      BUG#56912: binlog_killed fails sporadically on
                 mysql-next-mr-bugfixing codebase.
      
      Two new event types were added recently. This made the FD event
      to grow 2 bytes, thence the event positions in the binary log
      shifted two bytes as well. Given that sometimes the assertion in
      the binlog_killed test relies on absolute binary log positions,
      it will try to read events from a current obsolete position (175,
      while the new one is 177).
      
      We fix this by making the test to read the current position from
      SHOW MASTER STATUS into $start_pos and then issuing mysqlbinlog
      with --start-position=$start_pos instead of a hardcoded absolute
      position. This will also make the dump to read the BEGIN event
      that precedes the actual INSERT event (whereas before it would
      not), but this should not affect the assertion correctness.
[30 Sep 2010 11:52] Luis Soares
Queued in mysql-next-mr-bugfixing:
http://pb2.norway.sun.com/web.py?template=push_details&push=1612070
[2 Oct 2010 18:13] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:11] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[15 Nov 2010 19:36] Jon Stephens
Affects testing only; nothing to document for end users.

Closed.