Bug #95863 | MTS ordered commit may cause deadlock | ||
---|---|---|---|
Submitted: | 18 Jun 2019 14:59 | Modified: | 23 Jul 2020 8:41 |
Reporter: | lalit Choudhary | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.7.26,8.0.16, 5.7.27 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[18 Jun 2019 14:59]
lalit Choudhary
[19 Jun 2019 7:18]
lalit Choudhary
Also, reproduciable with 8.0.16 MySQL Community Server
[19 Jun 2019 9:09]
MySQL Verification Team
Hello Lalit, Thank you for the report and test case. Verified as described with 5.7.26 build. regards, Umesh
[22 Oct 2019 11:57]
Mikhail Izioumtchenko
I wonder if there is a workaround for this? We hit this bug fairly often
[22 Oct 2019 12:15]
Mikhail Izioumtchenko
we are seeing this also with 8.0.18
[23 Jul 2020 8:41]
lalit Choudhary
Try slave_preserve_commit_order=off
[27 Jul 2020 10:52]
Mikhail Izioumtchenko
slave_preserve_commit_order=off might work but we are of the opinion that we must preserve commit order on some of the replication chains.
[21 Sep 2020 12:03]
Phong Dinh
Hi, Bug still reproducible with mysql 8.0.20 and in my test it is for `FLUSH TABLES WITH READ LOCK' command on Slave.
[21 Sep 2020 12:04]
Phong Dinh
Below is the test case. (1) master-slave, enable multithreaded replication. [mysqld] slave_parallel_type=LOGICAL_CLOCK slave_parallel_workers=16 slave_preserve_commit_order=ON master-info-repository=table relay-log-info-repository=table gtid_mode=ON log-slave-updates enforce-gtid-consistency (2) write data on master by sysbench sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=10000 --tables=300 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox22001.sock --threads=15 --time=900 --report-interval=1 --events=0 --db-driver=mysql prepare (3) execute sql on slave continually,sql: FLUSH TABLE WITH READ LOCK; UNLOCK TABLES; And the output of show processlist on Slave slave1 [localhost:21022] {msandbox} ((none)) > show processlist; +----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 242 | Waiting on empty queue | NULL | | 8 | system user | connecting host | NULL | Connect | 242 | Waiting for master to send event | NULL | | 9 | system user | | NULL | Query | 42 | Waiting for slave workers to process their queues | NULL | | 10 | system user | | NULL | Query | 102 | Waiting for global read lock | NULL | | 11 | system user | | NULL | Query | 102 | Waiting for commit lock | NULL | | 12 | system user | | NULL | Query | 102 | Waiting for preceding transaction to commit | NULL | | 13 | system user | | NULL | Query | 102 | Waiting for global read lock | NULL | | 14 | system user | | NULL | Query | 102 | Waiting for preceding transaction to commit | NULL | | 15 | system user | | NULL | Query | 102 | Waiting for preceding transaction to commit | NULL | | 16 | msandbox | localhost | NULL | Query | 42 | Waiting for commit lock | flush tables with read lock | | 17 | msandbox | localhost | NULL | Query | 0 | starting | show processlist | +----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+ 11 rows in set (0.00 sec)
[11 Oct 2020 19:10]
Vinicius Malvestio Grippa
Repeatable on version 5.7.29
[31 Aug 2021 9:12]
MySQL Verification Team
Has anybody seen this hangup in 8.0.23 or newer, which got this WL implemented: https://dev.mysql.com/worklog/task/?id=13574 ??
[11 Feb 2022 9:16]
Cristi Coteț
Yes, with: Server version: 8.0.27 MySQL Community Server - GPL CentOS 7: Linux [hostname-removed] 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux We have 3 instances: primary is on 8.0.26 and 2 secondary on version 8.0.27. Both secondary instances reached this status at the same time. It's the first time it happened (uptime 99 days+) +----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+ | Id | User | Host | db | Command | Time | State | Info +----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+ | 5 | system user | connecting host | | Connect | 8636193 | Waiting for source to send event | 6 | system user | | | Query | 45685 | Waiting for replica workers to process their queues | 7 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 8 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 9 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 10 | system user | | | Query | 45685 | Waiting for preceding transaction to commit | 11 | event_scheduler | localhost | | Daemon | 8636193 | Waiting on empty queue | 69691040 | user1-removed | [hostname5-removed]:50933 | db1-remo | Sleep | 3381 | 69691043 | user1-removed | [hostname5-removed]:50934 | db1-remo | Sleep | 3381 | 69719712 | xxxrouter-user | [hostname4-removed]:52692 | | Sleep | 0 | 69719713 | xxxrouter-user | [hostname1-removed]:57952 | | Sleep | 0 | 69719715 | xxxrouter-user | [hostname3-removed]:50540 | | Sleep | 0 | 69719716 | xxxrouter-user | [hostname2-removed]:42338 | | Sleep | 0 | 69719717 | xxxrouter-user | [hostname5-removed]:48568 | | Sleep | 0 | 69719718 | root | [hostname1-removed]:57994 | | Query | 0 | init | show full processlist +----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+ Uptime: 8636199 Threads: 15 Questions: 6960946764 Slow queries: 776 Opens: 274408 Flush tables: 3 Open tables: 4285 Queries per second avg: 806.019
[11 Feb 2022 16:18]
Ting-Lin Chu
8.0.26 also impacted this bug. mysql> show processlist; +---------+----------------------------+--------------------+--------------------+---------+---------+--------------------------------- --------------------+----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +---------+----------------------------+--------------------+--------------------+---------+---------+--------------------------------- --------------------+----------------------------------+ | 6 | event_scheduler | localhost | NULL | Daemon | 6766815 | Waiting on empty queue | NULL | | 10 | system user | | NULL | Connect | 6766815 | waiting for handler commit | Group replication applier module | | 13 | system user | | NULL | Query | 27964 | Waiting for replica workers to p rocess their queues | NULL | | 14 | system user | | NULL | Query | 27964 | Waiting for preceding transactio n to commit | NULL | | 15 | system user | | NULL | Query | 27964 | Waiting for preceding transactio n to commit | NULL | | 16 | system user | | NULL | Query | 27964 | Waiting for preceding transactio n to commit | NULL | | 17 | system user | | NULL | Query | 27964 | Waiting for preceding transactio n to commit | NULL Show engine innodb status ---TRANSACTION 24093172627, ACTIVE (PREPARED) 28709 sec 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3 MySQL thread id 177, OS thread handle 139965536835328, query id 6080407878 Waiti ng for preceding transaction to commit ---TRANSACTION 24093172626, ACTIVE (PREPARED) 28709 sec 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3 MySQL thread id 176, OS thread handle 139965669926656, query id 6080407875 Waiti ng for preceding transaction to commit ---TRANSACTION 24093172625, ACTIVE (PREPARED) 28709 sec 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3 MySQL thread id 174, OS thread handle 139966133860096, query id 6080407872 Waiti ng for preceding transaction to commit ---TRANSACTION 24093172623, ACTIVE (PREPARED) 28709 sec 5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3 MySQL thread id 175, OS thread handle 139965670328064, query id 6080407868 Waiti ng for preceding transaction to commit -------- can't stop group replication, client hang mysql> stop group_replication; also can't stop mysql service, need using kill -9 to terminate mysql process.
[19 Jun 2022 20:33]
Scott Nemes
The severity of this bug should be higher. We are currently hitting this bug on a daily basis, which completely locks up the replica, and requires the replica to be killed and restarted. Running 8.0.26.
[28 Jun 2022 15:34]
MySQL Verification Team
Hi All, Subsequent to my last message where I wrote: > > Has anybody seen this hangup in 8.0.23 or newer, which got this WL implemented: > https://dev.mysql.com/worklog/task/?id=13574 ?? > There has been another fix in 8.0.28. Bug#32891221 REPLICA HANGS WITH REPLICA_PRESERVE_COMMIT_ORDER ON https://bugs.mysql.com/bug.php?id=103636 So please try newer versions. Thanks! -- sbester
[3 Nov 2022 17:50]
Perry Harrington
Was this bug fixed in 5.7? Sven stated it was part of WL#13574 or "subsequent bugfixes". I see the reference to a subsequent bug fixed in 8.0.28, but I wonder if the fix was also applied to 5.7, since it was in the affected versions? Thanks, --Perry