Bug #95572 Rollback of big transaction due to conflict remains in hung state
Submitted: 30 May 2019 3:01 Modified: 4 Sep 2019 21:18
Reporter: Narendra Singh Chauhan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.17 OS:Any
Assigned to: CPU Architecture:Any

[30 May 2019 3:01] Narendra Singh Chauhan
Description:
Scenario: In GR setup {M1, M2},
M1> Begin a big transaction, but within the limit of group_replication_transaction_size_limit. With GTID_NEXT lets say: UUID:10
M2> Execute a statement. With same GTID_NEXT: UUID:10
M1> commit;

Expectation: M1 Must rollback the transaction stating that: "The requested GTID '88888888-aaaa-aaaa-aaaa-888888888888:10' was already used, the transaction will rollback.".

Observation: The rollback command remains hung. Reason: "Difficult to find free blocks in the buffer pool".
And, "INNODB MONITOR OUTPUT" fills the error log with details like:-
====
RW-LOCK: 0x9bdadf0
Locked: thread 139898762602240 file btr0cur.cc line 836  SX-LOCK
RW-LOCK: 0x9cf67b8
Locked: thread 139898762602240 file fsp0fsp.cc line 3395  X-LOCK
Locked: thread 139898762602240 file fsp0fsp.cc line 3395  X-LOCK
Locked: thread 139898762602240 file fsp0fsp.cc line 3395  X-LOCK
...
...
Locked: thread 139898762602240 file fsp0fsp.cc line 3061  X-LOCK
RW-LOCK: 0x7f3cba7ffed0
Locked: thread 139898762602240 file lob0index.cc line 100  S-LOCK
RW-LOCK: 0x7f3cba7ffd98
Locked: thread 139898762602240 file lob0index.cc line 100  X-LOCK
...
...
====

mysqld.1.err:-
===============
.....
.....
2019-05-29T13:02:53.907585Z 18 [ERROR] [MY-011443] [Repl] Plugin group_replication reported: 'The requested GTID '88888888-aaaa-aaaa-aaaa-888888888888:10' was already used, the transaction will rollback.'
2019-05-29T13:05:25.710196Z 13 [Warning] [MY-011959] [InnoDB] Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing the buffer pool size. It is also possible that in your Unix version fsync is very slow, or completely frozen inside the OS kernel. Then upgrading to a newer version of your operating system may help. Look at the number of fsyncs in diagnostic info below. Pending flushes (fsync) log: 0; buffer pool: 1. 2829 OS file reads, 43819 OS file writes, 4574 OS fsyncs. Starting InnoDB Monitor to print further diagnostics to the standard output.
.....
.....

Find attached MTR testcase output with the bug.

How to repeat:
Tested on 8.0.18 and 8.0.17

Steps to repro:-
================
Run command:-
./mtr group_replication.gr_test_big_transaction_rollback_hangs.test \
 --big-test

Where,
$ cat gr_test_big_transaction_rollback_hangs.test
###########################################################################
--source include/big_test.inc
--let $group_replication_group_name= 88888888-aaaa-aaaa-aaaa-888888888888
--source include/have_group_replication_plugin.inc
--let $rpl_skip_group_replication_start= 1
--let $rpl_group_replication_single_primary_mode=1
--source include/group_replication.inc

--let $rpl_connection_name= server1
--source include/rpl_connection.inc
--source include/start_and_bootstrap_group_replication.inc

--let $rpl_connection_name= server2
--source include/rpl_connection.inc
--source include/start_group_replication.inc

--let $rpl_connection_name= server1
--source include/rpl_connection.inc
CREATE TABLE tab1 (a INT PRIMARY KEY NOT NULL, b LONGTEXT, c LONGTEXT);
INSERT INTO tab1 VALUES (1, 'a', 'a');

set profiling=1;
SET @@SESSION.GTID_NEXT= '88888888-aaaa-aaaa-aaaa-888888888888:10';
BEGIN;
INSERT INTO tab1 VALUES (2, repeat('a',60000000), repeat('b',60000000));
--sleep 1

--echo
--let $rpl_connection_name= server2
--source include/rpl_connection.inc
SET @@SESSION.GTID_NEXT= '88888888-aaaa-aaaa-aaaa-888888888888:10';
FLUSH RELAY LOGS;
SET @@SESSION.GTID_NEXT= 'automatic';

SELECT @@global.gtid_executed;

--echo
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
--echo ## Expect transaction to rollback.
--echo ## But, issue here.. this hangs.
--error ER_TRANSACTION_ROLLBACK_DURING_COMMIT
COMMIT;
SET @@SESSION.GTID_NEXT= 'automatic';

SHOW PROFILES;
SELECT @@global.gtid_executed;

--die "OK TO STOP HERE"
[20 Jun 2019 6:20] Narendra Singh Chauhan
Posted by developer:
 
Verified on 5.7.27. I can see that plugin is able to instruct to rollback the transaction successfully.

Run command:- 
$ ./mtr group_replication.gr_test_big_transaction_rollback_hangs.test \
 --big-test --mysqld=--max_allowed_packet=67108864

Do changes in three lines in the testcase s{include{../inc{:-
--source include/have_group_replication_plugin.inc
--source include/group_replication.inc--source
--source include/start_and_bootstrap_group_replication.inc
TO:
--source ../inc/have_group_replication_plugin.inc
--source ../inc/group_replication.inc--source
--source ../inc/start_and_bootstrap_group_replication.inc

MTR testcase output:-
=====================
include/group_replication.inc
Warnings:
Note	####	Sending passwords in plain text without SSL/TLS is extremely insecure.
Note	####	Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[connection server1]
[connection server1]
include/start_and_bootstrap_group_replication.inc
[connection server2]
include/start_group_replication.inc
[connection server1]
CREATE TABLE tab1 (a INT PRIMARY KEY NOT NULL, b LONGTEXT, c LONGTEXT);
INSERT INTO tab1 VALUES (1, 'a', 'a');
set profiling=1;
Warnings:
Warning	1287	'@@profiling' is deprecated and will be removed in a future release.
SET @@SESSION.GTID_NEXT= '88888888-aaaa-aaaa-aaaa-888888888888:10';
BEGIN;
INSERT INTO tab1 VALUES (2, repeat('a',60000000), repeat('b',60000000));

[connection server2]
SET @@SESSION.GTID_NEXT= '88888888-aaaa-aaaa-aaaa-888888888888:10';
FLUSH RELAY LOGS;
SET @@SESSION.GTID_NEXT= 'automatic';
SELECT @@global.gtid_executed;
@@global.gtid_executed
88888888-aaaa-aaaa-aaaa-888888888888:1-4:10

[connection server1]
## Expect transaction to rollback.
## But, issue here.. this hangs.
COMMIT;
ERROR HY000: Plugin instructed the server to rollback the current transaction.  << As EXPECTED transaction is rolled-back.
SET @@SESSION.GTID_NEXT= 'automatic';
SHOW PROFILES;
Query_ID	Duration	Query
1	0.00081450	SHOW WARNINGS
2	0.00225200	SET @@SESSION.GTID_NEXT= '88888888-aaaa-aaaa-aaaa-888888888888:10'
3	0.00170800	BEGIN
4	9.83360500	INSERT INTO tab1 VALUES (2, repeat('a',60000000), repeat('b',60000000))
5	28.89524825	COMMIT
6	0.00015275	SET @@SESSION.GTID_NEXT= 'automatic'
Warnings:
Warning	1287	'SHOW PROFILES' is deprecated and will be removed in a future release. Please use Performance Schema instead
SELECT @@global.gtid_executed;
@@global.gtid_executed
88888888-aaaa-aaaa-aaaa-888888888888:1-4:10
[100%] group_replication.gr_test_big_transaction_rollback_hangs [ fail ]
[4 Sep 2019 21:18] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 8.0.19 release, and here's the changelog entry:

Rollback of an INSERT operation that inserted a LOB value larger than a
buffer pool caused a deadlock.