Description:
Whether it's an XA transaction or a commit with Binlog enabled, it will trigger the InnoDB transaction to complete the commit using the 2PC protocol. When an InnoDB transaction needs to `prepare`, under the `READ_UNCOMMITTED` and `READ_COMMITTED` isolation levels, the InnoDB transaction will release the READ GAP LOCK it holds in advance.
Within the `lock_trx_release_read_locks` function, locks are released under two scenarios: optimistic and pessimistic. Optimistic release allows five attempts; if all attempts fail, it enters a pessimistic release loop, which continues indefinitely until all locks held by the current transaction have been processed:
```cpp
void lock_trx_release_read_locks(trx_t *trx, bool only_gap) {
const size_t MAX_FAILURES = 5;
// optimistic
for (size_t failures = 0; failures < MAX_FAILURES; ++failures) {
if (locksys::try_release_read_locks_in_s_mode(trx, only_gap)) {
return;
}
std::this_thread::yield();
}
// pessimistic
while (!locksys::try_release_read_locks_in_x_mode(trx, only_gap)) {
std::this_thread::yield();
}
}
```
However, a 1s timeout is set during pessimistic release. If this transaction holds many locks (and they are not gap locks) and does not complete processing within 1 second, it will start processing from the beginning again, over and over. This will cause the transaction to get stuck here indefinitely!
```cpp
static bool try_release_read_locks_in_x_mode(trx_t *trx,
bool only_gap) {
Global_exclusive_latch_guard guard{UT_LOCATION_HERE};
const auto started_at = std::chrono::steady_clock::now();
trx_mutex_enter_first_of_two(trx);
for (auto lock : trx->lock.trx_locks.removable()) {
// If we don't process all the Locks within 1 second,
// we'll have to start all over again!
if (MAX_CS_DURATION < std::chrono::steady_clock::now() - started_at) {
trx_mutex_exit(trx);
return false;
}
lock_release_read_lock(lock, only_gap);
}
trx_mutex_exit(trx);
return true;
}
```
How to repeat:
I wrote a test case modeled after `lock_trx_release_read_locks_in_x_mode.test` to simulate this problem. I also slightly modified the `try_release_read_locks_in_x_mode` function to simulate the slow processing speed during iteration. All modifications are as follows:
```
diff --git a/mysql-test/t/bug_release_locks.test b/mysql-test/t/bug_release_locks.test
new file mode 100644
index 00000000000..60195cc2b60
--- /dev/null
+++ b/mysql-test/t/bug_release_locks.test
@@ -0,0 +1,49 @@
+--source include/have_debug_sync.inc
+--source include/count_sessions.inc
+
+# keep in sync with MAX_FAILURES defined in lock_trx_release_read_locks()
+--let MAX_FAILURES=5
+--let i=0
+while($i<=$MAX_FAILURES)
+{
+ --eval CREATE TABLE t$i (id INT PRIMARY KEY) ENGINE=InnoDB
+ --inc $i
+}
+INSERT INTO t0 (id) VALUES (1);
+
+--connect (c0, localhost, root,,)
+ SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
+ XA START 'x';
+ # create at least MAX_FAILURES implicit locks
+ --let i=1
+ while($i<=$MAX_FAILURES)
+ {
+ --eval INSERT INTO t$i (id) VALUES (1);
+ --inc $i
+ }
+ # create at least 1 explicit lock
+ SELECT * FROM t0 WHERE id=1 FOR UPDATE;
+ XA END 'x';
+ SET DEBUG='+d,simulate_release_many_locks';
+ SET DEBUG_SYNC='try_relatch_trx_and_shard_and_do_noted_expected_version
+ SIGNAL c0_noted_expected_version
+ WAIT_FOR c0_can_go
+ EXECUTE 5';
+ --send XA PREPARE 'x'
+
+--let i=1
+while($i<=$MAX_FAILURES)
+{
+ --connect (c$i, localhost, root,,)
+ BEGIN;
+ SET DEBUG_SYNC = 'now WAIT_FOR c0_noted_expected_version';
+ --eval SET DEBUG_SYNC='lock_wait_will_wait SIGNAL c0_can_go'
+ --send_eval SELECT * FROM t$i FOR SHARE
+
+ --inc $i
+}
+
+--connection c0
+ --reap
+ XA COMMIT 'x';
+
diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
index 0efa1f4e139..bb3fe9bc29d 100644
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -4098,6 +4098,9 @@ It is called during XA prepare to release locks early.
DEBUG_SYNC_C("lock_trx_release_read_locks_in_x_mode_will_release");
lock_release_read_lock(lock, only_gap);
+
+ fprintf(stderr, "trx_locks.count = %ld\n", trx->lock.trx_locks.get_length());
+ DBUG_EXECUTE_IF("simulate_release_many_locks", sleep(1););
}
trx_mutex_exit(trx);
```
Compiling and running the test cases reveals that connection `c0` is stuck in an endless loop of `lock_trx_release_read_locks`. Furthermore, `tailf mysqld.1.err` shows that the number of locks to be processed remains unchanged:
```
$tailf mysqld.1.err
trx_locks.count = 12
trx_locks.count = 12
trx_locks.count = 12
trx_locks.count = 12
trx_locks.count = 12
trx_locks.count = 12
...
```
Suggested fix:
I think pessimistic release also needs a limit on the number of failures. Once the maximum number of failures is exceeded, the timeout limit should be ignored.
Description: Whether it's an XA transaction or a commit with Binlog enabled, it will trigger the InnoDB transaction to complete the commit using the 2PC protocol. When an InnoDB transaction needs to `prepare`, under the `READ_UNCOMMITTED` and `READ_COMMITTED` isolation levels, the InnoDB transaction will release the READ GAP LOCK it holds in advance. Within the `lock_trx_release_read_locks` function, locks are released under two scenarios: optimistic and pessimistic. Optimistic release allows five attempts; if all attempts fail, it enters a pessimistic release loop, which continues indefinitely until all locks held by the current transaction have been processed: ```cpp void lock_trx_release_read_locks(trx_t *trx, bool only_gap) { const size_t MAX_FAILURES = 5; // optimistic for (size_t failures = 0; failures < MAX_FAILURES; ++failures) { if (locksys::try_release_read_locks_in_s_mode(trx, only_gap)) { return; } std::this_thread::yield(); } // pessimistic while (!locksys::try_release_read_locks_in_x_mode(trx, only_gap)) { std::this_thread::yield(); } } ``` However, a 1s timeout is set during pessimistic release. If this transaction holds many locks (and they are not gap locks) and does not complete processing within 1 second, it will start processing from the beginning again, over and over. This will cause the transaction to get stuck here indefinitely! ```cpp static bool try_release_read_locks_in_x_mode(trx_t *trx, bool only_gap) { Global_exclusive_latch_guard guard{UT_LOCATION_HERE}; const auto started_at = std::chrono::steady_clock::now(); trx_mutex_enter_first_of_two(trx); for (auto lock : trx->lock.trx_locks.removable()) { // If we don't process all the Locks within 1 second, // we'll have to start all over again! if (MAX_CS_DURATION < std::chrono::steady_clock::now() - started_at) { trx_mutex_exit(trx); return false; } lock_release_read_lock(lock, only_gap); } trx_mutex_exit(trx); return true; } ``` How to repeat: I wrote a test case modeled after `lock_trx_release_read_locks_in_x_mode.test` to simulate this problem. I also slightly modified the `try_release_read_locks_in_x_mode` function to simulate the slow processing speed during iteration. All modifications are as follows: ``` diff --git a/mysql-test/t/bug_release_locks.test b/mysql-test/t/bug_release_locks.test new file mode 100644 index 00000000000..60195cc2b60 --- /dev/null +++ b/mysql-test/t/bug_release_locks.test @@ -0,0 +1,49 @@ +--source include/have_debug_sync.inc +--source include/count_sessions.inc + +# keep in sync with MAX_FAILURES defined in lock_trx_release_read_locks() +--let MAX_FAILURES=5 +--let i=0 +while($i<=$MAX_FAILURES) +{ + --eval CREATE TABLE t$i (id INT PRIMARY KEY) ENGINE=InnoDB + --inc $i +} +INSERT INTO t0 (id) VALUES (1); + +--connect (c0, localhost, root,,) + SET TRANSACTION ISOLATION LEVEL READ COMMITTED; + XA START 'x'; + # create at least MAX_FAILURES implicit locks + --let i=1 + while($i<=$MAX_FAILURES) + { + --eval INSERT INTO t$i (id) VALUES (1); + --inc $i + } + # create at least 1 explicit lock + SELECT * FROM t0 WHERE id=1 FOR UPDATE; + XA END 'x'; + SET DEBUG='+d,simulate_release_many_locks'; + SET DEBUG_SYNC='try_relatch_trx_and_shard_and_do_noted_expected_version + SIGNAL c0_noted_expected_version + WAIT_FOR c0_can_go + EXECUTE 5'; + --send XA PREPARE 'x' + +--let i=1 +while($i<=$MAX_FAILURES) +{ + --connect (c$i, localhost, root,,) + BEGIN; + SET DEBUG_SYNC = 'now WAIT_FOR c0_noted_expected_version'; + --eval SET DEBUG_SYNC='lock_wait_will_wait SIGNAL c0_can_go' + --send_eval SELECT * FROM t$i FOR SHARE + + --inc $i +} + +--connection c0 + --reap + XA COMMIT 'x'; + diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index 0efa1f4e139..bb3fe9bc29d 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ -4098,6 +4098,9 @@ It is called during XA prepare to release locks early. DEBUG_SYNC_C("lock_trx_release_read_locks_in_x_mode_will_release"); lock_release_read_lock(lock, only_gap); + + fprintf(stderr, "trx_locks.count = %ld\n", trx->lock.trx_locks.get_length()); + DBUG_EXECUTE_IF("simulate_release_many_locks", sleep(1);); } trx_mutex_exit(trx); ``` Compiling and running the test cases reveals that connection `c0` is stuck in an endless loop of `lock_trx_release_read_locks`. Furthermore, `tailf mysqld.1.err` shows that the number of locks to be processed remains unchanged: ``` $tailf mysqld.1.err trx_locks.count = 12 trx_locks.count = 12 trx_locks.count = 12 trx_locks.count = 12 trx_locks.count = 12 trx_locks.count = 12 ... ``` Suggested fix: I think pessimistic release also needs a limit on the number of failures. Once the maximum number of failures is exceeded, the timeout limit should be ignored.