Bug #77826 Transaction applied partially on a slave with relay_log_info_repository=table
Submitted: 24 Jul 2015 8:19 Modified: 28 Jul 2015 12:25
Reporter: Przemyslaw Malkowski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.6.25, 5.6.26 OS:Any
Assigned to: CPU Architecture:Any
Tags: relay_log_info_repository

[24 Jul 2015 8:19] Przemyslaw Malkowski
Description:
A transaction may be executed partially(!) on a slave running with relay_log_info_repository=TABLE, when a slave SQL thread is blocked for more then innodb_lock_wait_timeout. Instead of re-trying just a timeouted statement it re-tries the whole transaction. As by default the innodb_rollback_on_timeout is OFF, this leads to data inconsistency. Happens for both statement and ROW format.

The workaround for that is switching innodb_rollback_on_timeout to ON, or using relay_log_info_repository=FILE.
Tested on: 5.6.25-log MySQL Community Server (GPL)

How to repeat:
In slave's my.cnf, add:
relay_log_info_repository           = TABLE

--master:
use test
create table t1 (id int auto_increment primary key, a varchar(50));
create table t2 (id int auto_increment primary key, a varchar(50));
insert into t1 values (null,"aa"),(null,"bb"),(null,"cc");
insert into t2 values (null,"AA"),(null,"BB"),(null,"CC");

-- slave:
begin; select * from t2 for update;

-- master:
begin;
insert into t1 values (10,"dd");
insert into t2 values (10,"DD");
commit;

As a result, the slave ends up with error:
2015-07-24 09:57:53 14775 [Warning] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'insert into t2 values (10,"DD")', Error_code: 1205
2015-07-24 09:57:53 14775 [ERROR] Slave SQL: Error 'Duplicate entry '10' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t1 values (10,"dd")', Error_code: 1062
2015-07-24 09:57:53 14775 [Warning] Slave: Duplicate entry '10' for key 'PRIMARY' Error_code: 1062
2015-07-24 09:57:53 14775 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000002' position 946

And inconsistent data comparing to the master:
--master:
master [localhost] {msandbox} (test) > select * from t1; select * from t2;
+----+------+
| id | a    |
+----+------+
|  1 | aa   |
|  2 | bb   |
|  3 | cc   |
| 10 | dd   |
+----+------+
4 rows in set (0.00 sec)

+----+------+
| id | a    |
+----+------+
|  1 | AA   |
|  2 | BB   |
|  3 | CC   |
| 10 | DD   |
+----+------+
4 rows in set (0.00 sec)

--slave:
slave1 [localhost] {msandbox} (test) > select * from t1; select * from t2;
+----+------+
| id | a    |
+----+------+
|  1 | aa   |
|  2 | bb   |
|  3 | cc   |
| 10 | dd   |
+----+------+
4 rows in set (0.00 sec)

+----+------+
| id | a    |
+----+------+
|  1 | AA   |
|  2 | BB   |
|  3 | CC   |
+----+------+
3 rows in set (0.00 sec)

Suggested fix:
Fix the replication to be ACID compliant also when the relay_log_info_repository=TABLE
[28 Jul 2015 12:25] MySQL Verification Team
Hello Przemyslaw,

Thank you for the report and test case.
Observed this with 5.6.26 GA builds.

Thanks,
Umesh
[28 Jul 2015 12:26] MySQL Verification Team
5.6.26 test result

Attachment: 77826.5_6_26.results (application/octet-stream, text), 14.41 KiB.

[13 Oct 2015 6:41] Marcin Szumilak
Can we have innodb_rollback_on_timeout enabled by default in 5.6.28?