MySQL Bugs: #42165: Slave retries incompletely rolled back transaction

Bug #42165	Slave retries incompletely rolled back transaction
Submitted:	16 Jan 2009 16:45	Modified:	19 Feb 2009 9:50
Reporter:	Sergei Golubchik	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
if a committed transaction is replicated and then aborted on slave for local reasons (deadlock or lock wait timeout) it will be retried up to slave-transaction-retries times. This is the case even if the transaction could not be rolled back completely (mixed InnoDB+MyISAM transaction). The latter is a bug, a slave should not try to re-execute the transaction if the rollback was incomplete.

How to repeat:
for example, do - on the slave - something blocking (make sure innodb_table isn't empty):

SELECT * FROM innodb_table FOR UPDATE;

and then replicate, for example

BEGIN; INSERT myisam_table VALUES (1); UPDATE innodb_table SET a=1; COMMIT;

this will succeed on the master and time out on the slave. It'll keep on retrying  slave-transaction-retries times. When it'll give out completely there will be many 1's in the myisam_table.

Suggested fix:
don't retry in this case

Thank you for the report.

I can not repeat described behavior:

1. Make replication sandbox
2. master [localhost] {msandbox} ((none)) > use test
Database changed
master [localhost] {msandbox} (test) > create table t1(f1 int) engine=innodb;
Query OK, 0 rows affected (0.21 sec)

master [localhost] {msandbox} (test) > create table t2(f1 int);
Query OK, 0 rows affected (0.43 sec)

master [localhost] {msandbox} (test) > insert into t1 values(1);
Query OK, 1 row affected (0.09 sec)

3. slave1 [localhost] {msandbox} (test) > begin; 
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (test) > select * from t1 for update;
+------+
| f1   |
+------+
|    1 | 
+------+
1 row in set (0.00 sec)

4. master [localhost] {msandbox} (test) > begin;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > INSERT t2 VALUES (1); UPDATE t1 SET f1=2;
Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

master [localhost] {msandbox} (test) > commit;

5. slave1 [localhost] {msandbox} (test) > select * from t2;
+------+
| f1   |
+------+
|    1 | 
+------+
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} (test) > select * from t1 for update;
+------+
| f1   |
+------+
|    1 | 
+------+
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (test) > select * from t1;
+------+
| f1   |
+------+
|    2 | 
+------+
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} (test) > select * from t2;
+------+
| f1   |
+------+
|    1 | 
+------+
1 row in set (0.00 sec)

Please provide repeatable test case.

A suggestion to implement the suggested fix:
currently in slave.cc when we find an error which should cause a retry, we first rollback the transaction (and then retry), this rollback is:
            end_trans(thd, ROLLBACK);
(in exec_relay_log_event()).
We could check if this end_trans() raised the ER_WARNING_NOT_COMPLETE_ROLLBACK warning (you can see this warning in action now if you do
BEGIN; INSERT INTO myisamtable; ROLLBACK;). And if yes, then don't retry, because the transaction could not be rolled back, so retrying would lead to duplicate effects.

Sveta, try this:

=== modified file 'mysql-test/extra/rpl_tests/rpl_deadlock.test'
--- mysql-test/extra/rpl_tests/rpl_deadlock.test 2007-06-21 12:39:40 +0000
+++ mysql-test/extra/rpl_tests/rpl_deadlock.test 2009-01-17 06:51:57 +0000
@@ -18,6 +18,7 @@
 # requiring 'unique' for the timeout part of the test
 eval CREATE TABLE t3 (a INT  UNIQUE) ENGINE=$engine_type;
 eval CREATE TABLE t4 (a INT) ENGINE=$engine_type;
+create table t5 (a int) engine=myisam;
 show variables like 'slave_transaction_retries';
 sync_slave_with_master;
 
@@ -41,6 +42,7 @@
  dec $1;
 }
 enable_query_log;
+insert into t5 values(1);
 insert into t1 values(1);
 commit;
 save_master_pos;
@@ -78,6 +80,9 @@
 show slave status;
 --horizontal_results
 
+show variables like 'slave_transaction_retries';
+select * from t5;
+
 # 2) Test lock wait timeout
 
 stop slave;

=============================================

Apply it and run the test:

./mtr rpl.rpl_deadlock_innodb

when it'll fail, you see

show variables like 'slave_transaction_retries';
Variable_name   Value
slave_transaction_retries       2
select * from t5;
a
1
1

Sergei,

which tree are you using? With latest 5.0-main I get:

Variable_name   Value
slave_transaction_retries       10
select * from t5;
a
1

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".