Bug #87747 With parallel replication enabled, the slave's SQL thread gets stuck under a heavy transaction load
Submitted: 13 Sep 2017 9:27 Modified: 28 Sep 2017 4:24
Reporter: sandy sandy
Status: Can't repeat
Category: MySQL Server Severity: S2 (Serious)
Version: mysql5.7.19 OS: Any
Assigned to: MySQL Verification Team CPU Architecture: Any
Tags: mysql, parallel, SQL thread stuck

[13 Sep 2017 9:27] sandy sandy
Description:
Three-node master-slave replication with semi-synchronous replication and parallel replication enabled. After I use the sqltest tool to push 10000 SQL statements to the master, the slave's SQL thread gets stuck.
I'm replicating one database using LOGICAL_CLOCK:
mysql> show variables like 'slave_parallel%';
+------------------------+---------------+
| Variable_name          | Value         |
+------------------------+---------------+
| slave_parallel_type    | LOGICAL_CLOCK |
| slave_parallel_workers | 16            |
+------------------------+---------------+
Only one thread is active, others just wait:
mysql> show processlist;
+-----+-------------+-----------------+--------------------+---------+------+---------------------------------------------+------------------+
| Id  | User        | Host            | db                 | Command | Time | State                                       | Info             |
+-----+-------------+-----------------+--------------------+---------+------+---------------------------------------------+------------------+
|  14 | system user |                 | NULL               | Connect | 3690 | Waiting for master to send event            | NULL             |
|  15 | system user |                 | NULL               | Connect | 3038 | Waiting for dependent transaction to commit | NULL             |
|  16 | system user |                 | NULL               | Connect | 3029 | Waiting for an event from Coordinator       | NULL             |
|  17 | system user |                 | NULL               | Connect | 3029 | Waiting for an event from Coordinator       | NULL             |
|  18 | system user |                 | NULL               | Connect | 3029 | Waiting for an event from Coordinator       | NULL             |
|  19 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  20 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  21 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  22 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  23 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  24 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  25 | system user |                 | NULL               | Connect | 3029 | Waiting for an event from Coordinator       | NULL             |
|  26 | system user |                 | NULL               | Connect | 3029 | Waiting for an event from Coordinator       | NULL             |
|  27 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  28 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  29 | system user |                 | NULL               | Connect | 3029 | System lock                                 | NULL             |
|  30 | system user |                 | NULL               | Connect | 3030 | Waiting for an event from Coordinator       | NULL             |
|  31 | system user |                 | NULL               | Connect | 3030 | Waiting for an event from Coordinator       | NULL             |
|  53 | ebaseomm    | localhost: | performance_schema | Sleep   |   50 |                                             | NULL             |
| 318 | root        | localhost       | NULL               | Query   |    0 | starting                                    | show processlist |
+-----+-------------+-----------------+--------------------+---------+------+---------------------------------------------+------------------+
The slave's status:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Slave_SQL_Running_State: Waiting for dependent transaction to commit
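
When triaging a stall like this, it may help to look at the coordinator and each worker individually rather than only at the processlist. The queries below are a minimal sketch, assuming performance_schema is enabled on the slave (the default in 5.7); they were not part of the original report and only read from the standard replication tables.

```sql
-- Coordinator state for each replication channel.
SELECT CHANNEL_NAME, THREAD_ID, SERVICE_STATE,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_applier_status_by_coordinator;

-- Per-worker state: the last transaction each worker has seen, plus any error.
SELECT WORKER_ID, THREAD_ID, SERVICE_STATE, LAST_SEEN_TRANSACTION,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_applier_status_by_worker
 ORDER BY WORKER_ID;
```

Running the second query twice, a few seconds apart, and comparing LAST_SEEN_TRANSACTION would show whether any worker is still making progress.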

How to repeat:
Set up three-node master-slave replication with semi-synchronous replication and parallel replication enabled, then push 10000 SQL statements to the master. The slave's SQL thread gets stuck.
[14 Sep 2017 14:08] MySQL Verification Team
Hi,

> Three node master-slave replication
> the SQL thread of the slave is jammed;

Can you give more data on the setup, please?

You are using:

Master -> Slave
  |
  V
Slave

or:

Master -> Slave -> Slave

Can you please share the config from all three servers?

Thanks
Bogdan
[15 Sep 2017 1:36] sandy sandy
1. Three-node master-slave replication
I am using:

Master -> Slave
  |
Slave
2. Configuration file contents
[mysqld1]
bind_address =127.0.0.1
port     = 5581
socket   = /home/mysql17/bin/mysql1.sock
datadir  = /home/mysql17/data/data
innodb_data_home_dir = /home/mysql17/data/data
innodb_log_group_home_dir = /home/mysql17/data/redo
log-error=/home/mysql17/log/mysqld1.log
pid-file=/home/mysql17/bin/mysqld1.pid
secure_file_priv=
show_compatibility_56 = ON

#replication options
server-id=18
log-bin=../binlog/log-bin
relay-log=../relaylog/relay-bin
max_binlog_size=10485760
binlog_format=ROW
log-slave-updates=1

#semi_sync options on master-host
rpl_semi_sync_master_enabled = ON
rpl_semi_sync_master_timeout = 10000
rpl_semi_sync_master_trace_level = 64

#semi_sync option on slave-host
rpl_semi_sync_slave_enabled = ON
rpl_semi_sync_slave_trace_level = 32

gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=16
relay_log_recovery=ON

3. Every time I push a large amount of data, show slave status reports Slave_SQL_Running_State: Waiting for dependent transaction to commit. Even after a long wait, the SQL thread is still stuck there and no longer replays the relay log.
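
One way to narrow down what the workers in "System lock" are waiting on is to check for open InnoDB transactions and row-lock waits on the slave. This is only a sketch, assuming the tables involved are InnoDB and the sys schema that ships with 5.7 is installed; it does not by itself confirm a cause.

```sql
-- Open InnoDB transactions, oldest first. trx_mysql_thread_id can be
-- compared with the Id column of SHOW PROCESSLIST to spot applier workers.
SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query
  FROM information_schema.innodb_trx
 ORDER BY trx_started;

-- Row-lock waits: which connection is blocked and which one is blocking it.
SELECT wait_started, locked_table, waiting_pid, blocking_pid, blocking_query
  FROM sys.innodb_lock_waits;
```

If both result sets are empty while the SQL thread still reports "Waiting for dependent transaction to commit", the stall is probably not an InnoDB row-lock wait.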
[28 Sep 2017 4:24] MySQL Verification Team
Hi,

I tried generating different load (my own app, sysbench..) and I was not able to reproduce this.

Can you reproduce this on your system using sysbench, for example?

all best
Bogdan