Bug #43667 | Maria replication failure in row-based/mixed-mode replication | ||
---|---|---|---|
Submitted: | 16 Mar 2009 8:34 | Modified: | 24 Apr 2009 13:13 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Row Based Replication ( RBR ) | Severity: | S2 (Serious) |
Version: | 6.0.10 | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
[16 Mar 2009 8:34]
Philip Stoev
[16 Mar 2009 9:32]
Philip Stoev
Test case for bug 43667
Attachment: bug43667.zip (application/x-zip-compressed, text), 4.94 KiB.
[16 Mar 2009 9:38]
Philip Stoev
To reproduce, unpack the ZIP in mysql-test/suite/bug43667 and then run: MTR_VERSION=1 perl mysql-test-run.pl --start-and-exit --mysqld=--innodb rpl_alter mysql -uroot --socket=var/tmp/slave.sock -e 'change master to master_port=9306 , master_host="127.0.0.1" , master_user="root"' mysql -uroot --socket=var/tmp/slave.sock -e 'start slave' engine_type=Maria MTR_VERSION=1 perl mysql-test-run.pl --stress --extern --user=root --socket=var/tmp/master.sock --stress-init-file=bug43667_init.txt --stress-test-file=bug43667_run.txt --stress-suite=bug43667 --stress-test-duration=60000 --stress-threads=50 Please disregard all output from the test and the contents of the var/stress directory. Instead, issue SHOW SLAVE STATUS on the slave to detect the issue. I have tried to simplify the test case as much as possible. The problem was also observed with --stress-threads=10 . The slave will lag behind the master, so the issue will become visible about 30 seconds into the test. Please let me know if a better test case is required.
[17 Apr 2009 17:34]
Luis Soares
Analysis ======== Conducted with tool provided in the bug report. Verified behavior as mentioned. Observation =========== * Slave stops at position 12271980 with: Could not execute Delete_rows event on table test.insdel1_tbl; Can't find record in 'insdel1_tbl', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 12271980, Error_code: 1032 * Event at start position = position stopped: ../client/mysqlbinlog -v --start-position=12271980 var/log/master-bin.000001 | less (...) ### DELETE FROM test.insdel1_tbl ### WHERE ### @1=895 => We can find that it is a DELETE where @1=895 ... * Checking for inserts into test.insdel1_tbl before position = position stopped: => Several * Checking for inserts into insdel1_tbl, WHERE @1=895, before position = position stopped and 895 "key" appears as well: ../client/mysqlbinlog -v --stop-position=12271980 var/log/master-bin.000001 | grep -A 2 "INSERT INTO test.insdel1_tbl" | grep 895 => n/a Conclusion ========== Binlog event ordering seems different from execution history, thence slave will replay row events in wrong order (DELETE before INSERT), causing the slave to fail by not finding the row to be deleted.
[17 Apr 2009 17:36]
Luis Soares
Regarding the observation above: The following INSERT appears in the binlog *after* the DELETE event mentioned above. ### INSERT INTO test.insdel1_tbl ### SET ### @1=895 ### @2=1239987496 ### @3='root@localhost' ### @4='f2433a1a-2b70-11de-b0ab-002185344e3a' ### @5=19 ### @6='Going to test RBR for MySQL'
[17 Apr 2009 17:52]
Luis Soares
# Test case showing that with non-transactional storage engines and # autocommit=off, some events are logged in different order regarding # execution. This results in different execution histories on master # and slave. # In what follows, although an unsafe UPDATE statement (UUID) is # performed in table t1 (in connection conn2), before the DELETE in # (connection conn1) on the same table, these are ordered differently # when binlogging. # The binlogging of the unsafe UPDATE event (row event) does not obey # the execution order, but instead it obeys the commit order. This # makes the DELETE event to appear before the unsafe UPDATE, because # conn2 commits first than conn1. # Also, note that apparentely, only row events get the BEGIN and # COMMIT transaction boundaries. source include/have_log_bin.inc; source include/have_maria.inc; connection default; CREATE TABLE t1 (a text) engine=Maria; INSERT INTO t1 VALUES ("1"); connect (conn1, 127.0.0.1,root,,); connect (conn2, 127.0.0.1,root,,); echo ***** conn1: begin; connection conn1; SET AUTOCOMMIT= 0; BEGIN; echo ***** conn2: begin and update t1; connection conn2; SET AUTOCOMMIT= 0; BEGIN; UPDATE t1 SET a = 'a'; UPDATE t1 SET a = UUID(); SELECT * FROM t1; echo ***** conn1: delete from t1 and commit; connection conn1; SELECT * FROM t1; DELETE FROM t1; COMMIT; echo ***** conn2: commit; connection conn2; COMMIT; disconnect conn1; disconnect conn2; connection default; DROP TABLE t1; source include/show_binlog_events.inc; exit;
[17 Apr 2009 17:54]
Luis Soares
What follows is the dump of the binlog for the execution above for: === RBR mode master-bin.000001 # Query # # use `test`; CREATE TABLE t1 (a text) engine=Maria master-bin.000001 # Query # # use `test`; BEGIN master-bin.000001 # Table_map # # table_id: # (test.t1) master-bin.000001 # Write_rows # # table_id: # flags: STMT_END_F master-bin.000001 # Query # # use `test`; COMMIT master-bin.000001 # Query # # use `test`; BEGIN master-bin.000001 # Table_map # # table_id: # (test.t1) master-bin.000001 # Delete_rows # # table_id: # flags: STMT_END_F master-bin.000001 # Query # # use `test`; COMMIT master-bin.000001 # Query # # use `test`; BEGIN master-bin.000001 # Table_map # # table_id: # (test.t1) master-bin.000001 # Update_rows # # table_id: # flags: STMT_END_F master-bin.000001 # Table_map # # table_id: # (test.t1) master-bin.000001 # Update_rows # # table_id: # flags: STMT_END_F master-bin.000001 # Query # # use `test`; COMMIT master-bin.000001 # Query # # use `test`; DROP TABLE t1 == Mixed Mode master-bin.000001 # Query # # use `test`; CREATE TABLE t1 (a text) engine=Maria master-bin.000001 # Query # # use `test`; INSERT INTO t1 VALUES ("1") master-bin.000001 # Query # # use `test`; UPDATE t1 SET a = 'a' master-bin.000001 # Query # # use `test`; UPDATE t1 SET a = UUID() master-bin.000001 # Query # # use `test`; DELETE FROM t1 master-bin.000001 # Query # # use `test`; DROP TABLE t1 == Statement Mode show binlog events from <binlog_start>; Log_name Pos Event_type Server_id End_log_pos Info master-bin.000001 # Query # # use `test`; CREATE TABLE t1 (a text) engine=Maria master-bin.000001 # Query # # use `test`; INSERT INTO t1 VALUES ("1") master-bin.000001 # Query # # use `test`; UPDATE t1 SET a = 'a' master-bin.000001 # Query # # use `test`; DELETE FROM t1 master-bin.000001 # Query # # use `test`; BEGIN master-bin.000001 # Table_map # # table_id: # (test.t1) master-bin.000001 # Update_rows # # table_id: # flags: STMT_END_F master-bin.000001 # Query # # use `test`; COMMIT master-bin.000001 # Query # # use `test`; DROP TABLE t1
[17 Apr 2009 17:55]
Luis Soares
This seems a duplicate of BUG#40278.
[17 Apr 2009 22:58]
Luis Soares
Correction in the test case comments, *conn1* commits before *conn2*: # This makes the DELETE event to appear before the unsafe UPDATE, because # *conn1* commits first than *conn2*.
[24 Apr 2009 13:14]
Luis Soares
As analysis shows, this is duplicate of BUG#40278.