Bug #38262 Replication failure when comparing integer column to string literal
Submitted: 21 Jul 2008 14:37 Modified: 7 Apr 2009 15:23
Reporter: Philip Stoev Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S2 (Serious)
Version:6.0 BZR OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Triage: Triaged: D2 (Serious)

[21 Jul 2008 14:37] Philip Stoev
Description:
When executing the queries from the test below, the slave will end up with one less row than the master, that is, the INSERT will not be replicated.

This appears to be caused by an update statement of the form

UPDATE t1 SET pk = 10 WHERE int_nokey = 'z';

How to repeat:
--source include/master-slave.inc
--disable_abort_on_error
--source include/have_innodb.inc

SET BINLOG_FORMAT='ROW';

--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings

CREATE TABLE t1 (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `varchar_nokey` varchar(3) NOT NULL,
  PRIMARY KEY (`pk`)
) ENGINE=InnoDB;

INSERT INTO t1 VALUES (1,8,'c'), (2,0,' '), (3,0,' ');

UPDATE t1 SET pk = 10 WHERE int_nokey = 'z';
INSERT INTO t1 ( varchar_nokey ) VALUES ('a' );

SELECT * FROM t1;
--save_master_pos
--connection slave
--sync_with_master
SELECT * FROM t1;
[21 Jul 2008 19:11] Sveta Smirnova
Thank you for the report.

Verified as described.

Version 6.0 fails with " Could not execute Write_rows event on table test.t1; Field 'int_nokey' doesn't have a default value, Error_code: 1364; handler error HA_ERR_ROWS_EVENT_APPLY; the event's master log master-bin.000001, end_log_pos 764"

Workaround: use MIXED BINLOG_FORMAT.
[15 Aug 2008 13:20] Mats Kindahl
Not reproducible on 5.1. However, it seems to trigger BUG#32709 without having to exhaust them memory of the server.
[30 Mar 2009 13:33] Andrei Elkin
Taking over as the issue most probably the same as  Bug #38173.
[7 Apr 2009 13:38] Luis Soares
Analysis
========

  The analysis is based on the test case provided on the bug report
  and its description is split into three sections: i) test; ii)
  results; iii) conclusions. Results were obtained in three runs: RUN-1)
  normal; RUN-2) using gdb; RUN-3) using strict_all_tables mode.

Test Case
=========

I have used a test case based on the one provided in the bug report.
I just added some extra commands to be able to run it smoothly on mtr,
but at its core the test remains the same.

  1 source include/master-slave.inc;
  2 source include/have_binlog_format_row.inc;
  3 
  4 disable_abort_on_error;
  5 connection slave;
  6 source include/have_innodb.inc;
  7 connection master;
  8 source include/have_innodb.inc;
  9 
 10 disable_warnings;
 11 DROP TABLE IF EXISTS t1;
 12 enable_warnings;
 13 
 14 CREATE TABLE t1 (`pk` int(11) NOT NULL AUTO_INCREMENT, 
 15                  `int_nokey` int(11) NOT NULL, 
 16                  `varchar_nokey` varchar(3) NOT NULL, 
 17                  PRIMARY KEY (`pk`)) ENGINE=InnoDB;
 18 
 19 INSERT INTO t1 VALUES (1,8,'c'),
 20                       (2,0,' '),
 21                       (3,0,' ');
 22 
 23 UPDATE t1 SET pk = 10 WHERE int_nokey = 'z'; 
 24 INSERT INTO t1 ( varchar_nokey ) VALUES ('a' );
 25 
 26 SELECT * FROM t1;
 27 sync_slave_with_master;
 28 SELECT * FROM t1;
 29 
 30 connection master;
 31 DROP TABLE t1;
 32 sync_slave_with_master;
 33 
 34 exit;

I have then run it:
  RUN-1: simple run and colleted results
  RUN-2: simple run with gdb
  RUN-3: simple run with strict_all_tables on master

Results
=======

  * RUN-1

    I have observed the following:

    1. The UPDATE (line 23) does not get replicated as it fails on the
    master:
        
        UPDATE t1 SET pk = 10 WHERE int_nokey = 'z';
        ERROR 23000: Duplicate entry '10' for key 'PRIMARY'

     2. The INSERT (line 24) is executed in the master, however it
     produces warnings:
        
        INSERT INTO t1 ( varchar_nokey ) VALUES ('a' );
        Warnings:
        Warning	1364	Field 'int_nokey' doesn't have a default value

     3. Test fails ultimately because the slave cannot apply the
     changes:

        "(...) Could not execute Write_rows event on table test.t1;
        Field 'int_nokey' doesn't have a default value (...)"

     => Punch line: It seems that the failure is triggered by the
     insert at line (24).

  * RUN-2

     For this run I used mtr --gdb and set a breakpoint (b
     Write_rows_log_event::do_exec_row)

     I have observed the following:

       1.There are three exec_rows called. This should be replication
       of changes from first insert (lines: 19,20,21)
   
       2. There is a fourth exec_rows called. This should be
       replication of changes from second insert (line 24). Stepping
       into the method I end up in prepare_record:

(gdb) where
#0  prepare_record (table=0x45cd368, cols=0x44b1c58, width=3, check=true) at rpl_record.cc:404
#1  0x00000000008831f0 in Rows_log_event::write_row (this=0x44b1bf8, rli=0x42116c8, overwrite=false) at log_event.cc:8383
#2  0x0000000000883e11 in Write_rows_log_event::do_exec_row (this=0x44b1bf8, rli=0x42116c8) at log_event.cc:8584
#3  0x000000000088875e in Rows_log_event::do_apply_event (this=0x44b1bf8, rli=0x42116c8) at log_event.cc:7333
#4  0x0000000000940aee in Log_event::apply_event (this=0x44b1bf8, rli=0x42116c8) at log_event.h:1079
#5  0x0000000000938a44 in apply_event_and_update_pos (ev=0x44b1bf8, thd=0x45f4598, rli=0x42116c8, skip=true) at slave.cc:2030
#6  0x0000000000939f8d in exec_relay_log_event (thd=0x45f4598, rli=0x42116c8) at slave.cc:2173
#7  0x000000000093acec in handle_slave_sql (arg=0x42101e0) at slave.cc:2857
#8  0x00007feae51e03ea in start_thread () from /lib/libpthread.so.0
#9  0x00007feae4152cbd in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()

         At this point, inside prepare_record there is a check for
         default values that fails, returning:
       
           error = HA_ERR_ROWS_EVENT_APPLY;
 
         The name of the field being checked and fails the default
         values check is:

           (gdb) p f->field_name
           $2 = 0x3fbb79c "int_nokey"

      => Punch line: gdb seems to confirm that the failure is
      triggered by the INSERT (line 24), which when applied on the
      slave does not have a default value for int_nokey. The
      difference from the slave to the master execution is that the
      slave is checking for default values regardless of the sql_mode
      (STRICT vs NOT STRICT), unlike the master that executes in
      normal (NOT STRICT).

  * RUN-3

     For this run I set on the master STRICT_ALL_TABLES and repeat the
     test case.

     I have observed the following:

         1. The test now succeeds, but that is because the last INSERT
         (line 24) is not replicated. The master just throws an error
         while executing it, but since the test case runs with
         disable_abort_on_error; (line 4) the test ends up succeeding:

               INSERT INTO t1 ( varchar_nokey ) VALUES ('a' );
               ERROR HY000: Field 'int_nokey' doesn't have a default value

      => Punch line: With master and slave operating with similar
      behavior the problem does not reveal itself.

CONCLUSIONS
===========

  1. This bug seems not related to the comparison of integer column to
  string literal.
  2. Given the above analysis, this bug seems DUPLICATE of BUG#38173.
  3. Mixed-mode may not exhibit this problem because execution path
  for some statements (safe ones) will be handled as statement events
  and not as row events, thence, they will be executed with the
  correct mode (most likely the same as in the master).
[7 Apr 2009 15:23] Andrei Elkin
Luis, analysis is utmost convincing, agreed.

Thanks, Andrei