| Bug #95863 | MTS ordered commit may cause deadlock | ||
|---|---|---|---|
| Submitted: | 18 Jun 2019 14:59 | Modified: | 23 Jul 2020 8:41 |
| Reporter: | lalit Choudhary | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
| Version: | 5.7.26,8.0.16, 5.7.27 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[19 Jun 2019 7:18]
lalit Choudhary
Also, reproduciable with 8.0.16 MySQL Community Server
[19 Jun 2019 9:09]
MySQL Verification Team
Hello Lalit, Thank you for the report and test case. Verified as described with 5.7.26 build. regards, Umesh
[22 Oct 2019 11:57]
Mikhail Izioumtchenko
I wonder if there is a workaround for this? We hit this bug fairly often
[22 Oct 2019 12:15]
Mikhail Izioumtchenko
we are seeing this also with 8.0.18
[23 Jul 2020 8:41]
lalit Choudhary
Try slave_preserve_commit_order=off
[27 Jul 2020 10:52]
Mikhail Izioumtchenko
slave_preserve_commit_order=off might work but we are of the opinion that we must preserve commit order on some of the replication chains.
[21 Sep 2020 12:03]
Phong Dinh
Hi, Bug still reproducible with mysql 8.0.20 and in my test it is for `FLUSH TABLES WITH READ LOCK' command on Slave.
[21 Sep 2020 12:04]
Phong Dinh
Below is the test case.
(1) master-slave, enable multithreaded replication.
[mysqld]
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=16
slave_preserve_commit_order=ON
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
log-slave-updates
enforce-gtid-consistency
(2) write data on master by sysbench
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=10000 --tables=300 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox22001.sock --threads=15 --time=900 --report-interval=1 --events=0 --db-driver=mysql prepare
(3) execute sql on slave continually,sql:
FLUSH TABLE WITH READ LOCK; UNLOCK TABLES;
And the output of show processlist on Slave
slave1 [localhost:21022] {msandbox} ((none)) > show processlist;
+----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 242 | Waiting on empty queue | NULL |
| 8 | system user | connecting host | NULL | Connect | 242 | Waiting for master to send event | NULL |
| 9 | system user | | NULL | Query | 42 | Waiting for slave workers to process their queues | NULL |
| 10 | system user | | NULL | Query | 102 | Waiting for global read lock | NULL |
| 11 | system user | | NULL | Query | 102 | Waiting for commit lock | NULL |
| 12 | system user | | NULL | Query | 102 | Waiting for preceding transaction to commit | NULL |
| 13 | system user | | NULL | Query | 102 | Waiting for global read lock | NULL |
| 14 | system user | | NULL | Query | 102 | Waiting for preceding transaction to commit | NULL |
| 15 | system user | | NULL | Query | 102 | Waiting for preceding transaction to commit | NULL |
| 16 | msandbox | localhost | NULL | Query | 42 | Waiting for commit lock | flush tables with read lock |
| 17 | msandbox | localhost | NULL | Query | 0 | starting | show processlist |
+----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+
11 rows in set (0.00 sec)
[11 Oct 2020 19:10]
Vinicius Malvestio Grippa
Repeatable on version 5.7.29
[31 Aug 2021 9:12]
MySQL Verification Team
Has anybody seen this hangup in 8.0.23 or newer, which got this WL implemented: https://dev.mysql.com/worklog/task/?id=13574 ??
[11 Feb 2022 9:16]
Cristi Coteț
Yes, with: Server version: 8.0.27 MySQL Community Server - GPL CentOS 7: Linux [hostname-removed] 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux We have 3 instances: primary is on 8.0.26 and 2 secondary on version 8.0.27. Both secondary instances reached this status at the same time. It's the first time it happened (uptime 99 days+) +----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+ | Id | User | Host | db | Command | Time | State | Info +----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+ | 5 | system user | connecting host | | Connect | 8636193 | Waiting for source to send event | 6 | system user | | | Query | 45685 | Waiting for replica workers to process their queues | 7 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 8 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 9 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 10 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 11 | event_scheduler | localhost | | Daemon | 8636193 | Waiting on empty queue | 69691040 | user1-removed | [hostname5-removed]:50933 | db1-remo | Sleep | 3381 | 69691043 | user1-removed | [hostname5-removed]:50934 | db1-remo | Sleep | 3381 | 69719712 | xxxrouter-user | [hostname4-removed]:52692 | | Sleep | 0 | 69719713 | xxxrouter-user | [hostname1-removed]:57952 | | Sleep | 0 | 69719715 | xxxrouter-user | [hostname3-removed]:50540 | | Sleep | 0 | 69719716 | xxxrouter-user | [hostname2-removed]:42338 | | Sleep | 0 | 69719717 | xxxrouter-user | [hostname5-removed]:48568 | | Sleep | 0 | 69719718 | root | [hostname1-removed]:57994 | | Query | 0 | init | show full processlist +----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+ Uptime: 8636199 Threads: 15 Questions: 6960946764 Slow queries: 776 Opens: 274408 Flush tables: 3 Open tables: 4285 Queries per second avg: 806.019
[11 Feb 2022 16:18]
Ting-Lin Chu
8.0.26 also impacted this bug.
mysql> show processlist;
+---------+----------------------------+--------------------+--------------------+---------+---------+---------------------------------
--------------------+----------------------------------+
| Id | User | Host | db | Command | Time | State
| Info |
+---------+----------------------------+--------------------+--------------------+---------+---------+---------------------------------
--------------------+----------------------------------+
| 6 | event_scheduler | localhost | NULL | Daemon | 6766815 | Waiting on empty queue
| NULL |
| 10 | system user | | NULL | Connect | 6766815 | waiting for handler commit
| Group replication applier module |
| 13 | system user | | NULL | Query | 27964 | Waiting for replica workers to p
rocess their queues | NULL |
| 14 | system user | | NULL | Query | 27964 | Waiting for preceding transactio
n to commit | NULL |
| 15 | system user | | NULL | Query | 27964 | Waiting for preceding transactio
n to commit | NULL |
| 16 | system user | | NULL | Query | 27964 | Waiting for preceding transactio
n to commit | NULL |
| 17 | system user | | NULL | Query | 27964 | Waiting for preceding transactio
n to commit | NULL
Show engine innodb status
---TRANSACTION 24093172627, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 177, OS thread handle 139965536835328, query id 6080407878 Waiti
ng for preceding transaction to commit
---TRANSACTION 24093172626, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 176, OS thread handle 139965669926656, query id 6080407875 Waiti
ng for preceding transaction to commit
---TRANSACTION 24093172625, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 174, OS thread handle 139966133860096, query id 6080407872 Waiti
ng for preceding transaction to commit
---TRANSACTION 24093172623, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 175, OS thread handle 139965670328064, query id 6080407868 Waiti
ng for preceding transaction to commit
--------
can't stop group replication, client hang
mysql> stop group_replication;
also can't stop mysql service, need using kill -9 to terminate mysql process.
[19 Jun 2022 20:33]
Scott Nemes
The severity of this bug should be higher. We are currently hitting this bug on a daily basis, which completely locks up the replica, and requires the replica to be killed and restarted. Running 8.0.26.
[28 Jun 2022 15:34]
MySQL Verification Team
Hi All, Subsequent to my last message where I wrote: > > Has anybody seen this hangup in 8.0.23 or newer, which got this WL implemented: > https://dev.mysql.com/worklog/task/?id=13574 ?? > There has been another fix in 8.0.28. Bug#32891221 REPLICA HANGS WITH REPLICA_PRESERVE_COMMIT_ORDER ON https://bugs.mysql.com/bug.php?id=103636 So please try newer versions. Thanks! -- sbester
[3 Nov 2022 17:50]
Perry Harrington
Was this bug fixed in 5.7? Sven stated it was part of WL#13574 or "subsequent bugfixes". I see the reference to a subsequent bug fixed in 8.0.28, but I wonder if the fix was also applied to 5.7, since it was in the affected versions? Thanks, --Perry

Description: slave1 [localhost] {msandbox} ((none)) > slave1 [localhost] {msandbox} ((none)) > show processlist; +----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+ | 23 | system user | | NULL | Connect | 236 | Waiting for master to send event | NULL | | 24 | system user | | NULL | Connect | 36 | Waiting for dependent transaction to commit | NULL | | 25 | system user | | NULL | Connect | 54 | Waiting for commit lock | NULL | | 26 | system user | | NULL | Connect | 53 | Waiting for preceding transaction to commit | NULL | | 27 | system user | | NULL | Connect | 53 | Waiting for an event from Coordinator | NULL | | 28 | system user | | NULL | Connect | 54 | Waiting for an event from Coordinator | NULL | | 29 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 30 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 31 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 32 | system user | | NULL | Connect | 56 | Waiting for an event from Coordinator | NULL | | 33 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 34 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 35 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 36 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 37 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 38 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 39 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 40 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 41 | msandbox | localhost | NULL | Query | 36 | Waiting for commit lock | set global super_read_only=ON | | 42 | msandbox | localhost | NULL | Query | 0 | starting | show processlist | +----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+ 20 rows in set (0.00 sec) SHOW SLAVE STATUS\G Slave_SQL_Running_State: Waiting for dependent transaction to commit How to repeat: (1) master-slave, enable multithreaded replication. [mysqld] slave_parallel_type=LOGICAL_CLOCK slave_parallel_workers=16 slave_preserve_commit_order=ON master-info-repository=table relay-log-info-repository=table gtid_mode=ON log-slave-updates enforce-gtid-consistency (2) write data on master by sysbench sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=10000 --tables=300 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox22001.sock --threads=15 --time=900 --report-interval=1 --events=0 --db-driver=mysql prepare (3) execute sql on slave continually,sql: set global super_read_only=OFF; set global super_read_only=ON; when deadlock occurs, show processlist as below: slave1 [localhost] {msandbox} ((none)) > slave1 [localhost] {msandbox} ((none)) > show processlist; +----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+ | 23 | system user | | NULL | Connect | 236 | Waiting for master to send event | NULL | | 24 | system user | | NULL | Connect | 36 | Waiting for dependent transaction to commit | NULL | | 25 | system user | | NULL | Connect | 54 | Waiting for commit lock | NULL | | 26 | system user | | NULL | Connect | 53 | Waiting for preceding transaction to commit | NULL | | 27 | system user | | NULL | Connect | 53 | Waiting for an event from Coordinator | NULL | | 28 | system user | | NULL | Connect | 54 | Waiting for an event from Coordinator | NULL | | 29 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 30 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 31 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 32 | system user | | NULL | Connect | 56 | Waiting for an event from Coordinator | NULL | | 33 | system user | | NULL | Connect | 55 | Waiting for an event from Coordinator | NULL | | 34 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 35 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 36 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 37 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 38 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 39 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 40 | system user | | NULL | Connect | 236 | Waiting for an event from Coordinator | NULL | | 41 | msandbox | localhost | NULL | Query | 36 | Waiting for commit lock | set global super_read_only=ON | | 42 | msandbox | localhost | NULL | Query | 0 | starting | show processlist | +----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+ 20 rows in set (0.00 sec)