Bug #95863 MTS ordered commit may cause deadlock
Submitted: 18 Jun 2019 14:59 Modified: 23 Jul 2020 8:41
Reporter: lalit Choudhary Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.26,8.0.16, 5.7.27 OS:Any
Assigned to: CPU Architecture:Any

[18 Jun 2019 14:59] lalit Choudhary
Description:
slave1 [localhost] {msandbox} ((none)) > slave1 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
| Id | User        | Host      | db   | Command | Time | State                                       | Info                          |
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
| 23 | system user |           | NULL | Connect |  236 | Waiting for master to send event            | NULL                          |
| 24 | system user |           | NULL | Connect |   36 | Waiting for dependent transaction to commit | NULL                          |
| 25 | system user |           | NULL | Connect |   54 | Waiting for commit lock                     | NULL                          |
| 26 | system user |           | NULL | Connect |   53 | Waiting for preceding transaction to commit | NULL                          |
| 27 | system user |           | NULL | Connect |   53 | Waiting for an event from Coordinator       | NULL                          |
| 28 | system user |           | NULL | Connect |   54 | Waiting for an event from Coordinator       | NULL                          |
| 29 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 30 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 31 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 32 | system user |           | NULL | Connect |   56 | Waiting for an event from Coordinator       | NULL                          |
| 33 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 34 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 35 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 36 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 37 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 38 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 39 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 40 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 41 | msandbox    | localhost | NULL | Query   |   36 | Waiting for commit lock                     | set global super_read_only=ON |
| 42 | msandbox    | localhost | NULL | Query   |    0 | starting                                    | show processlist              |
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
20 rows in set (0.00 sec)

SHOW SLAVE STATUS\G

      Slave_SQL_Running_State: Waiting for dependent transaction to commit

How to repeat:
(1) master-slave, enable multithreaded replication.

[mysqld]
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=16
slave_preserve_commit_order=ON
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
log-slave-updates
enforce-gtid-consistency

(2) write data on master by sysbench

sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=10000 --tables=300 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox22001.sock --threads=15 --time=900 --report-interval=1 --events=0 --db-driver=mysql prepare

(3) execute sql on slave continually,sql:
set global super_read_only=OFF; set global super_read_only=ON;

when deadlock occurs, show processlist as below:

slave1 [localhost] {msandbox} ((none)) > slave1 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
| Id | User        | Host      | db   | Command | Time | State                                       | Info                          |
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
| 23 | system user |           | NULL | Connect |  236 | Waiting for master to send event            | NULL                          |
| 24 | system user |           | NULL | Connect |   36 | Waiting for dependent transaction to commit | NULL                          |
| 25 | system user |           | NULL | Connect |   54 | Waiting for commit lock                     | NULL                          |
| 26 | system user |           | NULL | Connect |   53 | Waiting for preceding transaction to commit | NULL                          |
| 27 | system user |           | NULL | Connect |   53 | Waiting for an event from Coordinator       | NULL                          |
| 28 | system user |           | NULL | Connect |   54 | Waiting for an event from Coordinator       | NULL                          |
| 29 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 30 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 31 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 32 | system user |           | NULL | Connect |   56 | Waiting for an event from Coordinator       | NULL                          |
| 33 | system user |           | NULL | Connect |   55 | Waiting for an event from Coordinator       | NULL                          |
| 34 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 35 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 36 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 37 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 38 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 39 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 40 | system user |           | NULL | Connect |  236 | Waiting for an event from Coordinator       | NULL                          |
| 41 | msandbox    | localhost | NULL | Query   |   36 | Waiting for commit lock                     | set global super_read_only=ON |
| 42 | msandbox    | localhost | NULL | Query   |    0 | starting                                    | show processlist              |
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
20 rows in set (0.00 sec)
[19 Jun 2019 7:18] lalit Choudhary
Also, reproduciable with 8.0.16 MySQL Community Server
[19 Jun 2019 9:09] MySQL Verification Team
Hello Lalit,

Thank you for the report and test case.
Verified as described with 5.7.26 build.

regards,
Umesh
[22 Oct 2019 11:57] Mikhail Izioumtchenko
I wonder if there is a workaround for this? We hit this bug fairly often
[22 Oct 2019 12:15] Mikhail Izioumtchenko
we are seeing this also with 8.0.18
[23 Jul 2020 8:41] lalit Choudhary
Try slave_preserve_commit_order=off
[27 Jul 2020 10:52] Mikhail Izioumtchenko
slave_preserve_commit_order=off might work but we are of the opinion that we must preserve commit order on some of the replication chains.
[21 Sep 2020 12:03] Phong Dinh
Hi,

Bug still reproducible with mysql 8.0.20  and in my test  it is for `FLUSH TABLES WITH READ LOCK' command on Slave.
[21 Sep 2020 12:04] Phong Dinh
Below is the test case.

(1) master-slave, enable multithreaded replication.
[mysqld]
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=16
slave_preserve_commit_order=ON
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
log-slave-updates
enforce-gtid-consistency
(2) write data on master by sysbench
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=10000 --tables=300 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox22001.sock --threads=15 --time=900 --report-interval=1 --events=0 --db-driver=mysql prepare
(3) execute sql on slave continually,sql:
FLUSH TABLE WITH READ LOCK; UNLOCK TABLES;

And the output of show processlist on Slave

slave1 [localhost:21022] {msandbox} ((none)) > show processlist;
+----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+
| Id | User            | Host            | db   | Command | Time | State                                             | Info                        |
+----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+
|  5 | event_scheduler | localhost       | NULL | Daemon  |  242 | Waiting on empty queue                            | NULL                        |
|  8 | system user     | connecting host | NULL | Connect |  242 | Waiting for master to send event                  | NULL                        |
|  9 | system user     |                 | NULL | Query   |   42 | Waiting for slave workers to process their queues | NULL                        |
| 10 | system user     |                 | NULL | Query   |  102 | Waiting for global read lock                      | NULL                        |
| 11 | system user     |                 | NULL | Query   |  102 | Waiting for commit lock                           | NULL                        |
| 12 | system user     |                 | NULL | Query   |  102 | Waiting for preceding transaction to commit       | NULL                        |
| 13 | system user     |                 | NULL | Query   |  102 | Waiting for global read lock                      | NULL                        |
| 14 | system user     |                 | NULL | Query   |  102 | Waiting for preceding transaction to commit       | NULL                        |
| 15 | system user     |                 | NULL | Query   |  102 | Waiting for preceding transaction to commit       | NULL                        |
| 16 | msandbox        | localhost       | NULL | Query   |   42 | Waiting for commit lock                           | flush tables with read lock |
| 17 | msandbox        | localhost       | NULL | Query   |    0 | starting                                          | show processlist            |
+----+-----------------+-----------------+------+---------+------+---------------------------------------------------+-----------------------------+
11 rows in set (0.00 sec)
[11 Oct 2020 19:10] Vinicius Malvestio Grippa
Repeatable on version 5.7.29
[31 Aug 2021 9:12] MySQL Verification Team
Has anybody seen this hangup in 8.0.23 or newer,  which got this WL implemented:
https://dev.mysql.com/worklog/task/?id=13574 ??
[11 Feb 2022 9:16] Cristi Coteț
Yes, with: Server version: 8.0.27 MySQL Community Server - GPL
CentOS 7: Linux [hostname-removed] 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

We have 3 instances: primary is on 8.0.26 and 2 secondary on version 8.0.27. Both secondary instances reached this status at the same time.
It's the first time it happened (uptime 99 days+)

+----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+
| Id       | User            | Host                            | db       | Command | Time    | State                                               | Info
+----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+
| 5        | system user     | connecting host                 |          | Connect | 8636193 | Waiting for source to send event
| 6        | system user     |                                 |          | Query   | 45685   | Waiting for replica workers to process their queues
| 7        | system user     |                                 |          | Query   | 45685   | Waiting for preceding transaction to commit
| 8        | system user     |                                 |          | Query   | 45685   | Waiting for preceding transaction to commit
| 9        | system user     |                                 |          | Query   | 45685   | Waiting for preceding transaction to commit
| 10       | system user     |                                 |          | Query   | 45685   | Waiting for preceding transaction to commit
| 11       | event_scheduler | localhost                       |          | Daemon  | 8636193 | Waiting on empty queue
| 69691040 | user1-removed   | [hostname5-removed]:50933       | db1-remo | Sleep   | 3381
| 69691043 | user1-removed   | [hostname5-removed]:50934       | db1-remo | Sleep   | 3381
| 69719712 | xxxrouter-user  | [hostname4-removed]:52692       |          | Sleep   | 0
| 69719713 | xxxrouter-user  | [hostname1-removed]:57952       |          | Sleep   | 0
| 69719715 | xxxrouter-user  | [hostname3-removed]:50540       |          | Sleep   | 0
| 69719716 | xxxrouter-user  | [hostname2-removed]:42338       |          | Sleep   | 0
| 69719717 | xxxrouter-user  | [hostname5-removed]:48568       |          | Sleep   | 0
| 69719718 | root            | [hostname1-removed]:57994       |          | Query   | 0       | init                                                | show full processlist
+----------+-----------------+---------------------------------+----------+---------+---------+-----------------------------------------------------+-----------------------+
Uptime: 8636199  Threads: 15  Questions: 6960946764  Slow queries: 776  Opens: 274408  Flush tables: 3  Open tables: 4285  Queries per second avg: 806.019
[11 Feb 2022 16:18] Ting-Lin Chu
8.0.26 also impacted this bug.

mysql> show processlist;
+---------+----------------------------+--------------------+--------------------+---------+---------+---------------------------------
--------------------+----------------------------------+
| Id      | User                       | Host               | db                 | Command | Time    | State
                    | Info                             |
+---------+----------------------------+--------------------+--------------------+---------+---------+---------------------------------
--------------------+----------------------------------+
|       6 | event_scheduler            | localhost          | NULL               | Daemon  | 6766815 | Waiting on empty queue
                    | NULL                             |
|      10 | system user                |                    | NULL               | Connect | 6766815 | waiting for handler commit
                    | Group replication applier module |
|      13 | system user                |                    | NULL               | Query   |   27964 | Waiting for replica workers to p
rocess their queues | NULL                             |
|      14 | system user                |                    | NULL               | Query   |   27964 | Waiting for preceding transactio
n to commit         | NULL                             |
|      15 | system user                |                    | NULL               | Query   |   27964 | Waiting for preceding transactio
n to commit         | NULL                             |
|      16 | system user                |                    | NULL               | Query   |   27964 | Waiting for preceding transactio
n to commit         | NULL                             |
|      17 | system user                |                    | NULL               | Query   |   27964 | Waiting for preceding transactio
n to commit         | NULL   

Show engine innodb status

---TRANSACTION 24093172627, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 177, OS thread handle 139965536835328, query id 6080407878 Waiti
ng for preceding transaction to commit
---TRANSACTION 24093172626, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 176, OS thread handle 139965669926656, query id 6080407875 Waiti
ng for preceding transaction to commit
---TRANSACTION 24093172625, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 174, OS thread handle 139966133860096, query id 6080407872 Waiti
ng for preceding transaction to commit
---TRANSACTION 24093172623, ACTIVE (PREPARED) 28709 sec
5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 3
MySQL thread id 175, OS thread handle 139965670328064, query id 6080407868 Waiti
ng for preceding transaction to commit
--------

can't stop group replication, client hang

mysql> stop group_replication;

also can't stop mysql service, need using kill -9 to terminate mysql process.
[19 Jun 2022 20:33] Scott Nemes
The severity of this bug should be higher. We are currently hitting this bug on a daily basis, which completely locks up the replica, and requires the replica to be killed and restarted. Running 8.0.26.
[28 Jun 2022 15:34] MySQL Verification Team
Hi All,

Subsequent to my last message where I wrote:

>
> Has anybody seen this hangup in 8.0.23 or newer,  which got this WL implemented:
> https://dev.mysql.com/worklog/task/?id=13574 ??
>

There has been another fix in 8.0.28.  
Bug#32891221 REPLICA HANGS WITH REPLICA_PRESERVE_COMMIT_ORDER ON
https://bugs.mysql.com/bug.php?id=103636

So please try newer versions.
Thanks!
--
sbester
[3 Nov 2022 17:50] Perry Harrington
Was this bug fixed in 5.7?  Sven stated it was part of WL#13574 or "subsequent bugfixes".

I see the reference to a subsequent bug fixed in 8.0.28, but I wonder if the fix was also applied to 5.7, since it was in the affected versions?

Thanks,

--Perry