Bug #38262 | Replication failure when comparing integer column to string literal | ||
---|---|---|---|
Submitted: | 21 Jul 2008 14:37 | Modified: | 7 Apr 2009 15:23 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Row Based Replication ( RBR ) | Severity: | S2 (Serious) |
Version: | 6.0 BZR | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
[21 Jul 2008 14:37]
Philip Stoev
[21 Jul 2008 19:11]
Sveta Smirnova
Thank you for the report. Verified as described. Version 6.0 fails with " Could not execute Write_rows event on table test.t1; Field 'int_nokey' doesn't have a default value, Error_code: 1364; handler error HA_ERR_ROWS_EVENT_APPLY; the event's master log master-bin.000001, end_log_pos 764" Workaround: use MIXED BINLOG_FORMAT.
[15 Aug 2008 13:20]
Mats Kindahl
Not reproducible on 5.1. However, it seems to trigger BUG#32709 without having to exhaust them memory of the server.
[30 Mar 2009 13:33]
Andrei Elkin
Taking over as the issue most probably the same as Bug #38173.
[7 Apr 2009 13:38]
Luis Soares
Analysis ======== The analysis is based on the test case provided on the bug report and its description is split into three sections: i) test; ii) results; iii) conclusions. Results were obtained in three runs: RUN-1) normal; RUN-2) using gdb; RUN-3) using strict_all_tables mode. Test Case ========= I have used a test case based on the one provided in the bug report. I just added some extra commands to be able to run it smoothly on mtr, but at its core the test remains the same. 1 source include/master-slave.inc; 2 source include/have_binlog_format_row.inc; 3 4 disable_abort_on_error; 5 connection slave; 6 source include/have_innodb.inc; 7 connection master; 8 source include/have_innodb.inc; 9 10 disable_warnings; 11 DROP TABLE IF EXISTS t1; 12 enable_warnings; 13 14 CREATE TABLE t1 (`pk` int(11) NOT NULL AUTO_INCREMENT, 15 `int_nokey` int(11) NOT NULL, 16 `varchar_nokey` varchar(3) NOT NULL, 17 PRIMARY KEY (`pk`)) ENGINE=InnoDB; 18 19 INSERT INTO t1 VALUES (1,8,'c'), 20 (2,0,' '), 21 (3,0,' '); 22 23 UPDATE t1 SET pk = 10 WHERE int_nokey = 'z'; 24 INSERT INTO t1 ( varchar_nokey ) VALUES ('a' ); 25 26 SELECT * FROM t1; 27 sync_slave_with_master; 28 SELECT * FROM t1; 29 30 connection master; 31 DROP TABLE t1; 32 sync_slave_with_master; 33 34 exit; I have then run it: RUN-1: simple run and colleted results RUN-2: simple run with gdb RUN-3: simple run with strict_all_tables on master Results ======= * RUN-1 I have observed the following: 1. The UPDATE (line 23) does not get replicated as it fails on the master: UPDATE t1 SET pk = 10 WHERE int_nokey = 'z'; ERROR 23000: Duplicate entry '10' for key 'PRIMARY' 2. The INSERT (line 24) is executed in the master, however it produces warnings: INSERT INTO t1 ( varchar_nokey ) VALUES ('a' ); Warnings: Warning 1364 Field 'int_nokey' doesn't have a default value 3. Test fails ultimately because the slave cannot apply the changes: "(...) Could not execute Write_rows event on table test.t1; Field 'int_nokey' doesn't have a default value (...)" => Punch line: It seems that the failure is triggered by the insert at line (24). * RUN-2 For this run I used mtr --gdb and set a breakpoint (b Write_rows_log_event::do_exec_row) I have observed the following: 1.There are three exec_rows called. This should be replication of changes from first insert (lines: 19,20,21) 2. There is a fourth exec_rows called. This should be replication of changes from second insert (line 24). Stepping into the method I end up in prepare_record: (gdb) where #0 prepare_record (table=0x45cd368, cols=0x44b1c58, width=3, check=true) at rpl_record.cc:404 #1 0x00000000008831f0 in Rows_log_event::write_row (this=0x44b1bf8, rli=0x42116c8, overwrite=false) at log_event.cc:8383 #2 0x0000000000883e11 in Write_rows_log_event::do_exec_row (this=0x44b1bf8, rli=0x42116c8) at log_event.cc:8584 #3 0x000000000088875e in Rows_log_event::do_apply_event (this=0x44b1bf8, rli=0x42116c8) at log_event.cc:7333 #4 0x0000000000940aee in Log_event::apply_event (this=0x44b1bf8, rli=0x42116c8) at log_event.h:1079 #5 0x0000000000938a44 in apply_event_and_update_pos (ev=0x44b1bf8, thd=0x45f4598, rli=0x42116c8, skip=true) at slave.cc:2030 #6 0x0000000000939f8d in exec_relay_log_event (thd=0x45f4598, rli=0x42116c8) at slave.cc:2173 #7 0x000000000093acec in handle_slave_sql (arg=0x42101e0) at slave.cc:2857 #8 0x00007feae51e03ea in start_thread () from /lib/libpthread.so.0 #9 0x00007feae4152cbd in clone () from /lib/libc.so.6 #10 0x0000000000000000 in ?? () At this point, inside prepare_record there is a check for default values that fails, returning: error = HA_ERR_ROWS_EVENT_APPLY; The name of the field being checked and fails the default values check is: (gdb) p f->field_name $2 = 0x3fbb79c "int_nokey" => Punch line: gdb seems to confirm that the failure is triggered by the INSERT (line 24), which when applied on the slave does not have a default value for int_nokey. The difference from the slave to the master execution is that the slave is checking for default values regardless of the sql_mode (STRICT vs NOT STRICT), unlike the master that executes in normal (NOT STRICT). * RUN-3 For this run I set on the master STRICT_ALL_TABLES and repeat the test case. I have observed the following: 1. The test now succeeds, but that is because the last INSERT (line 24) is not replicated. The master just throws an error while executing it, but since the test case runs with disable_abort_on_error; (line 4) the test ends up succeeding: INSERT INTO t1 ( varchar_nokey ) VALUES ('a' ); ERROR HY000: Field 'int_nokey' doesn't have a default value => Punch line: With master and slave operating with similar behavior the problem does not reveal itself. CONCLUSIONS =========== 1. This bug seems not related to the comparison of integer column to string literal. 2. Given the above analysis, this bug seems DUPLICATE of BUG#38173. 3. Mixed-mode may not exhibit this problem because execution path for some statements (safe ones) will be handled as statement events and not as row events, thence, they will be executed with the correct mode (most likely the same as in the master).
[7 Apr 2009 15:23]
Andrei Elkin
Luis, analysis is utmost convincing, agreed. Thanks, Andrei