MySQL Bugs: #60955: Error executing trigger on slave results in replication event replay

Bug #60955	Error executing trigger on slave results in replication event replay
Submitted:	22 Apr 2011 19:35	Modified:	14 Jun 2011 10:26
Reporter:	Justin Tolmer	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.1.56	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	replication, triggers

Description:
If the replication SQL thread hits an error while executing a trigger initiated by a change to a non-transactional table (I didn't test transactional tables), then the replication SQL thread may replay an event when the error condition is resolved and the SQL thread restarted, resulting in data differences between the master and slave.

Found by examining at mysql-test/suite/rpl/t/rpl_slave_grp_exec.test which refers to WL#3734 which either doesn't exist or is not publicly visible.

How to repeat:
Run this modified version of the above mentioned test.

--source include/have_innodb.inc
--source include/have_binlog_format_mixed_or_statement.inc
--source include/master-slave.inc
--echo

# Create tables and data
--echo *** Preparing data ***
--connection master
CREATE TABLE t1 (a INT NOT NULL, b INT) ENGINE=MyISAM;
CREATE TABLE t2 LIKE t1;
CREATE TABLE t3 LIKE t1;

DELIMITER |;
CREATE TRIGGER tr1 BEFORE UPDATE ON t1
  FOR EACH ROW BEGIN
    UPDATE t2 SET b=b+1 WHERE a=NEW.a;
  END|
CREATE TRIGGER tr2 AFTER UPDATE ON t1
  FOR EACH ROW BEGIN
    UPDATE t3 SET b=b+1 WHERE a=NEW.a;
  END|
DELIMITER ;|
--echo

# Test non-transactional group with MyISAM tables w/o PK.
# Data for t1,t2 should be replicated for SBR even t3 
# doesn't exist on slave
--echo *** Test non-transactional group w/o PK ***

--connection master
INSERT INTO t3 VALUES(1, 0);
INSERT INTO t2 VALUES(1, 0);
INSERT INTO t1 VALUES(1, 0);
--sync_slave_with_master

RENAME TABLE t3 TO t3_bak;

--connection master
UPDATE t1 SET b = b+1 WHERE a = 1;
SELECT * FROM t1 ORDER BY a;
SELECT * FROM t2 ORDER BY a;
SELECT * FROM t3 ORDER BY a;

--connection slave
# 1146 = ER_NO_SUCH_TABLE
--let $slave_sql_errno= 1146
--source include/wait_for_slave_sql_error.inc
SHOW TABLES LIKE 't%';
if (`SELECT @@BINLOG_FORMAT = 'ROW'`) {
  SELECT * FROM t1 ORDER BY a;
  SELECT * FROM t2 ORDER BY a;
}
if (!`SELECT @@BINLOG_FORMAT = 'ROW'`) {
  SELECT * FROM t1 ORDER BY a;
  SELECT * FROM t2 ORDER BY a;
}

--source include/stop_slave_io.inc
RENAME TABLE t3_bak TO t3;
--source include/start_slave.inc

--connection master
SELECT * FROM t1 WHERE a = 1;
let b_from_master= query_get_value(SELECT * FROM t1 WHERE a = 1, b, 1);
--sync_slave_with_master
SELECT * FROM t1 WHERE a = 1;
let b_from_slave= query_get_value(SELECT * FROM t1 WHERE a = 1, b, 1);

if (`SELECT $b_from_master != $b_from_slave`)
{
  die "Contents of table t1 differ on master and slave";
}

# Clean up
--echo *** Clean up ***
--connection master
DROP TABLE t1,t2,t3;
--sync_slave_with_master

# End of 5.1 test
--source include/rpl_end.inc

Thank you for the bug report. Verified with current mysql-5.1 on Mac OS X:

...
CURRENT_TEST: main.bug60955
mysqltest: At line 71: "Contents of table t1 differ on master and slave"

The result from queries just before the failure was:
< snip >
SHOW TABLES LIKE 't%';
Tables_in_test (t%)
t1
t2
t3_bak
SELECT * FROM t1 ORDER BY a;
a	b
1	1
SELECT * FROM t2 ORDER BY a;
a	b
1	1
include/stop_slave_io.inc
RENAME TABLE t3_bak TO t3;
include/start_slave.inc
SELECT * FROM t1 WHERE a = 1;
a	b
1	1
SELECT * FROM t1 WHERE a = 1;
a	b
1	2
...

Analysis
========

In the reported case, the slave is changed and put out of sync
deliberately by the user. Then a statement changing non-transactional
tables directly and indirectly (through triggers) is applied on the
master and replicated to the slave.

When the statement is applied on the slave, two tables (t1 and t2),
out of the three that the statement changes, are indeed updated, but
not the third one (t3), which had been renamed by the user (to t3_bak)
making the statement execution fail. Thus the slave stops with errors:

110614 12:54:33 [ERROR] Slave SQL: Error 'Table 'test.t3' doesn't
exist' on query. Default database: 'test'. Query: 'UPDATE t1 SET b =
b+1 WHERE a = 1', Error_code: 1146 110614 12:54:33 [Warning] Slave:
Table 'test.t3' doesn't exist Error_code: 1146

110614 12:54:33 [ERROR] Error running query, slave SQL thread
aborted. Fix the problem, and restart the slave SQL thread with "SLAVE
START". We stopped at log 'master-bin.000001' position 1053

Since this statement changes non-transactional tables, the changes
made to t1 and t2 remain in-place - they are not rolled back.  Once
the user renames t3_bak to t3 and restarts the slave thread, it starts
applying from the position pointing at the statement that had failed,
i.e., it assumes that the user had fixed all problems. This causes
some updates to be applied twice.

Given that these are non-transactional tables, the user should have
recovered manually the data after renaming t3_bak to t3 (for example,
by either: 1. undoing changes to t1 and t2 and restart the slave;
or 2. applying missing changes to t3 and start the slave by skipping
the offending statement).

Finally, transactional tables (e.g., InnoDB) will not exhibit this
problem because they rollback the changes to t1 and t2 when the
statement fails in the first place. Thus after renaming back t3_bak,
the slave can just start from the same position!

Conclusion
==========

  C1. Given analysis, closing as not a bug.

Appendix
========

A1. Test case used for conducting analysis

## BUG#12402818 
##   Test case used for conducting analysis
##

--source include/have_innodb.inc
--source include/have_binlog_format_mixed_or_statement.inc
--source include/master-slave.inc

--connection slave
--echo *** SLAVE ***
call mtr.add_suppression("Table 'test.t3'.*, Error_code: 1146"); 

# Create tables and data
--connection master
--echo *** MASTER ***
CREATE TABLE t1 (a INT NOT NULL, b INT) ENGINE=MyISAM;

# INFO: uncomment line below to test with transactional tables
# ALTER TABLE t1 ENGINE = InnoDB;
CREATE TABLE t2 LIKE t1;
CREATE TABLE t3 LIKE t1;

# one trigger - tr1 - inserts data in table t2 *before* 
# t1 is updated
DELIMITER |;
CREATE TRIGGER tr1 BEFORE UPDATE ON t1
  FOR EACH ROW BEGIN
    UPDATE t2 SET b=b+1 WHERE a=NEW.a;
  END|

# the other trigger - tr2 - inserts data in table t2 *after*
# t1 is updated
CREATE TRIGGER tr2 AFTER UPDATE ON t1
  FOR EACH ROW BEGIN
    UPDATE t3 SET b=b+1 WHERE a=NEW.a;
  END|
DELIMITER ;|

INSERT INTO t3 VALUES(1, 0);
INSERT INTO t2 VALUES(1, 0);
INSERT INTO t1 VALUES(1, 0);
--sync_slave_with_master
--echo *** SLAVE ***

# inserts were replicated, now rename t3 to some other name
RENAME TABLE t3 TO t3_bak;

# update t1 to fire both triggers
--connection master
--echo *** MASTER ***
UPDATE t1 SET b = b+1 WHERE a = 1;
SELECT * FROM t1 ORDER BY a;
SELECT * FROM t2 ORDER BY a;
SELECT * FROM t3 ORDER BY a;

# wait for the slave to stop because it cannot update t3 *after*
# updating t1 (t1 is a MyISAM table, so changes can't be rolled back)
--connection slave
--echo *** SLAVE ***
# 1146 = ER_NO_SUCH_TABLE
--let $slave_sql_errno= 1146
--source include/wait_for_slave_sql_error.inc
# stop the slave
--source include/stop_slave_io.inc

# rename the table back to t3
RENAME TABLE t3_bak TO t3;

SELECT * FROM t1 ORDER BY a;
SELECT * FROM t2 ORDER BY a;
SELECT * FROM t3 ORDER BY a;

# restart the slave
--source include/start_slave.inc
--connection master
--sync_slave_with_master

# slave will try to execute the failed statement again
--let $diff_tables= master:test.t1, slave:test.t1
--source include/diff_tables.inc

--let $diff_tables= master:test.t2, slave:test.t2
--source include/diff_tables.inc

--let $diff_tables= master:test.t3, slave:test.t3
--source include/diff_tables.inc

# Clean up
--connection master
--echo *** MASTER ***
DROP TABLE t1,t2,t3;
--sync_slave_with_master
--echo *** SLAVE ***

# End of 5.1 test
--source include/rpl_end.inc