Bug #74177 --slave-preserve-commit-order causes slave to deadlock and break for some querie
Submitted: 1 Oct 2014 13:51 Modified: 26 Jun 2015 16:01
Reporter: Kristian Nielsen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.5, 5.7.6 OS:Any
Assigned to: CPU Architecture:Any

[1 Oct 2014 13:51] Kristian Nielsen
Description:
Now that MySQL 5.7.5 has the --slave-preserve-commit-order=1 feature, it also
gets the same bug that the similar MariaDB parallel replication has. This was
discussed extensively, here are some links:

  https://mariadb.atlassian.net/browse/MDEV-5914
  https://mariadb.atlassian.net/browse/MDEV-5941
  http://lists.askmonty.org/pipermail/commits/2014-July/006318.html
  https://lists.launchpad.net/maria-developers/msg07480.html

The basic problem is that there are cases where transactions T1 and T2 can
group commit together on the master, but still conflict on the slave. Then
with --slave-preserve-commit-order=1, we can end up with T1 waiting for a row
lock held by T2, but T2 waiting for T1 to commit. This results in a deadlock
(that is not detected), and eventually a lock wait timeout and slave failure.

One example is:

CREATE TABLE t4 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t4 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
UPDATE t4 SET b=NULL WHERE a=6;
DELETE FROM t4 WHERE b <= 3;

If the UPDATE runs first it does not block the DELETE, but if the DELETE runs
first it will block the UPDATE. There are more cases discussed in above links,
another important case is if the optimiser decides to use a different
execution plan on the slave than on the master (eg. index lookup vs. full
table scan).

How to repeat:
Here is a test case:

rpl_knielsen-slave.opt:

-----------------------------------------------------------------------
--slave-parallel-workers=6 --slave-parallel-type='logical_clock'  --log-slave-updates
--slave-transaction-retries=5 --slave-preserve-commit-order=1
-----------------------------------------------------------------------

rpl_knielsen.test:

-----------------------------------------------------------------------
--source include/have_debug.inc
--source include/master-slave.inc

--source include/rpl_connection_master.inc
CREATE TABLE t1 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
CREATE TABLE t2 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t2 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
CREATE TABLE t3 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t3 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
CREATE TABLE t4 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t4 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
CREATE TABLE t5 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t5 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);
CREATE TABLE t6 (a INT PRIMARY KEY, b INT, KEY b_idx(b)) ENGINE=InnoDB;
INSERT INTO t6 VALUES (1,NULL), (2,2), (3,NULL), (4,4), (5, NULL), (6, 6);

--source include/sync_slave_sql_with_master.inc
--source include/rpl_connection_slave.inc

--source include/rpl_connection_master.inc
# Make the following transactions have same commit parent. So they can be applied
# parallel on slave.
SET @save_debug = @@GLOBAL.debug;
SET GLOBAL debug = "d,set_commit_parent_100";

--source include/rpl_connection_slave.inc
LOCK TABLES t1 WRITE, t2 WRITE, t3 WRITE, t4 WRITE, t5 WRITE, t6 WRITE;

--source include/rpl_connection_master.inc
UPDATE t1 SET b=NULL WHERE a=6;
DELETE FROM t1 WHERE b <= 3;
INSERT INTO t2 VALUES (7, NULL);
DELETE FROM t2 WHERE b <= 3;
UPDATE t3 SET b=NULL WHERE a=6;
DELETE FROM t3 WHERE b <= 3;
INSERT INTO t4 VALUES (7, NULL);
DELETE FROM t5 WHERE b <= 3;
UPDATE t5 SET b=NULL WHERE a=6;
DELETE FROM t5 WHERE b <= 3;
INSERT INTO t6 VALUES (7, NULL);
DELETE FROM t6 WHERE b <= 3;

--source include/save_master_pos.inc

--source include/rpl_connection_slave.inc
let $wait_condition= SELECT count(*) = 6 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE State = 'Waiting for table metadata lock';
--source include/wait_condition.inc

UNLOCK TABLES;
--source include/sync_slave_sql.inc

--source include/rpl_connection_master.inc
SHOW BINLOG EVENTS;
--source include/rpl_connection_slave.inc
SELECT * FROM t1 ORDER BY a;
SELECT * FROM t2 ORDER BY a;
SELECT * FROM t3 ORDER BY a;
SELECT * FROM t4 ORDER BY a;
SELECT * FROM t5 ORDER BY a;
SELECT * FROM t6 ORDER BY a;

--source include/rpl_connection_master.inc
SET GLOBAL debug = @save_debug;

DROP TABLE t1, t2, t3, t4, t5, t6;
--source include/sync_slave_sql_with_master.inc

--source include/rpl_end.inc
-----------------------------------------------------------------------

The test case does not use debug_sync, so might need to be run a couple times
for the race to trigger.

Suggested fix:
The problem is essentially the same as in MariaDB 10, so probably a similar fix is needed as in the patch given in the link above.
[3 Dec 2014 13:23] MySQL Verification Team
Hello Kristian Nielsen,

Thank you for the report and test case.
With the provided test case, observed a timeout on slave and failure.

Thanks,
Umesh
[3 Dec 2014 13:23] MySQL Verification Team
// 5.7.6

[umshastr@hod03]/export/umesh/mysql-5.7.6/mysql-test: ./mtr --debug rpl_knielsen
Logging: ./mtr  --debug rpl_knielsen
MySQL Version 5.7.6
Checking supported features...
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/export/umesh/mysql-5.7.6/mysql-test/var'...
Installing system database...

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

connection slave;
ERROR: sync_slave_sql.inc failed on connection 'slave'
ERROR: use_gtids=''
ERROR: _saved_gtids=''
ERROR: _saved_file='master-bin.000001'
ERROR: _saved_pos='5536'
ERROR: timeout='300'
ERROR: result='-1'
ERROR: error type: Timeout after 300 seconds.
main.rpl_knielsen                        [ fail ]
        Test ended at 2014-12-03 13:10:40

CURRENT_TEST: main.rpl_knielsen
mysqltest: [Warning] /export/umesh/mysql-5.7.6/bin/mysqltest: Option 'debug=d:t:A,/export/umesh/mysql-5.7.6/mysql-test/var/log/mysqltest.trace' used, but is disabled
mysqltest: In included file ./include/sync_slave_sql.inc at line 136:
included from /export/umesh/mysql-5.7.6/mysql-test/t/rpl_knielsen.test at line 52:
At line 135: Error in sync_with_master.inc

 - the logfile can be found in '/export/umesh/mysql-5.7.6/mysql-test/var/log/main.rpl_knielsen/rpl_knielsen.log'
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.000 of 310 seconds executing testcases

Completed: Failed 1/1 tests, 0.00% were successful.

Failing test(s): main.rpl_knielsen

## build used

// build used

commit: 1982dd7ab813e3cdb28d3bb119cc7b706cf01b15
date: 2014-11-27 13:27:39 +0300
build-date: 2014-11-27 12:38:45 +0100
short: 1982dd7
branch: mysql-trunk

MySQL source 5.7.6
[3 Dec 2014 13:28] MySQL Verification Team
// In the second attempt extracted transaction details from slave
//

------------
TRANSACTIONS
------------
Trx id counter 1368
Purge done for trx's n:o < 1364 undo n:o < 0 state: running but idle
History list length 24
Total number of lock structs in row lock hash table 10
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421154230976672, not started
0 lock struct(s), heap size 1160, 0 row lock(s)
---TRANSACTION 421154230969904, not started
0 lock struct(s), heap size 1160, 0 row lock(s)
---TRANSACTION 421154230967648, not started
0 lock struct(s), heap size 1160, 0 row lock(s)
---TRANSACTION 421154230966520, not started
0 lock struct(s), heap size 1160, 0 row lock(s)
---TRANSACTION 1367, ACTIVE 29 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 10, OS thread handle 139679249843968, query id 267 updating
UPDATE t3 SET b=NULL WHERE a=6
------- TRX HAS BEEN WAITING 29 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 22 page no 4 n bits 80 index b_idx of table `test`.`t3` trx id 1367 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000002; asc     ;;

------------------
---TRANSACTION 1366, ACTIVE 29 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 6, OS thread handle 139679250908928, query id 268 updating
UPDATE t1 SET b=NULL WHERE a=6
------- TRX HAS BEEN WAITING 29 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 20 page no 4 n bits 80 index b_idx of table `test`.`t1` trx id 1366 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000002; asc     ;;

------------------
---TRANSACTION 1333, ACTIVE (PREPARED) 244 sec
3 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 1
MySQL thread id 9, OS thread handle 139679250110208, query id 239 Waiting for its turn to commit.
---TRANSACTION 1332, ACTIVE (PREPARED) 244 sec
3 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 139679249577728, query id 243 Waiting for its turn to commit.
---TRANSACTION 1331, ACTIVE (PREPARED) 244 sec
3 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 1
MySQL thread id 7, OS thread handle 139679250642688, query id 235 Waiting for its turn to commit.
---TRANSACTION 1330, ACTIVE (PREPARED) 244 sec
1 lock struct(s), heap size 1160, 0 row lock(s), undo log entries 1
MySQL thread id 8, OS thread handle 139679250376448, query id 237 Waiting for its turn to commit.
[22 May 2015 12:13] Marko Mäkelä
Posted by developer:
 
The scenario in the bug description demonstrates that InnoDB record locking is not symmetric.

First of all, we should keep in mind that InnoDB does not have a concept of table row locks.
Instead, it has two primary kinds of locks: table locks, and index record locks.

We have a possible conflict between two single-statement transactions: UPDATE and DELETE.
Internally, both transactions are implemented as UPDATE inside InnoDB.
We will first x-lock the matching index records, and then update them. Let me show it in detail:

The two single-statement transactions that we have are:

T1: UPDATE t4 SET b=NULL WHERE a=6;
T2: DELETE FROM t4 WHERE b <= 3;

T1 would do the following:
1. Look up a=6 in the clustered index.
2. X-lock the matching clustered index record (a,b)=(6,6).
3. X-lock the matching secondary index record (b,a)=(6,6).
4. Write an update_undo record covering the changes in 5,6,7.
5. Update the clustered index record (6,6) to (6,NULL).
6. Delete-mark the secondary index record (6,6).
7. Insert the secondary index record (6,NULL).

Steps 2,3,4 may be in a different order.

T2 could do the following:
1. Search for b<=3 in the secondary index.
This might not match any recs locked or modified by T1.
2. X-lock the matching records in the secondary index.
Note: Bug#3300 introduced the sloppy 'semi-consistent' semantics that non-matching rows will not be locked!
3. X-lock the corresponding records in the clustered index.
4. For each match:
4.1. write update_undo records for the DELETE operation
4.2. delete-mark the clustered index record
4.3. delete-mark the secondary index record
[22 May 2015 12:19] Marko Mäkelä
Posted by developer:
 
The asymmetry in my example is that the UPDATE will not block the DELETE, but the DELETE would block the UPDATE.

The DELETE is only matching the record (2,2).

Before the UPDATE and the DELETE, the internal order of records in the secondary index (b,a) would be (NULL,1),(NULL,3),(NULL,5),(2,2),(4,4),(6,6).

Internally, the secondary index record inserted by the UPDATE is (NULL,6), and it would be ordered right before the (2,2) that was probably record-and-gap locked by the DELETE.

Maybe this asymmetry would go away (and the UPDATE would block the DELETE) if the fix of Bug#3300 was disabled?
[26 Jun 2015 16:01] David Moss
The following was added to the 5.7.8 changelog:
When using a multi-threaded slave with slave_preserve_commit_order=1, certain combinations of transactions being applied in parallel could cause a deadlock and stop the slave responding. The fix introduces a check for such deadlocks, so that when a transaction needs to wait for another transaction to release a row lock, InnoDB checks if there is a deadlock caused by the commit order. If it finds a deadlock caused by the commit order, it sets a deadlock flag for the slave worker which is holding the row lock. Then the worker rolls back its transaction and tries again.