Description:
use sysbench 1.0.13 (using bundled LuaJIT 2.1.0-beta2) to test MTS replication,when test with parameters below:
sysbench --test=/root/benchSQL/oltp_legacy/oltp.lua --oltp_tables_count=10 --oltp-table-size=200000 --mysql-user=root   --mysql-socket=/data/test/data/mysql.sock --threads=128 --time=120 --events=0 --oltp-test-mode=complex run
the os7.0 quickly hang on with "Waiting for dependent transaction to commit"., and lag increase :
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 421
                  Master_UUID: 471d803d-3bb6-11e8-8f86-28a6dbbf5ebf
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for dependent transaction to commit
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 471d803d-3bb6-11e8-8f86-28a6dbbf5ebf:1892412-2959062
            Executed_Gtid_Set: 471d803d-3bb6-11e8-8f86-28a6dbbf5ebf:1-1893240
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
the processlist look like this:
+------+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+
| Id   | User        | Host      | db   | Command | Time  | State                                       | Info             |
+------+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+
|  100 | system user |           | NULL | Connect | 75381 | Waiting for master to send event            | NULL             |
|  101 | system user |           | NULL | Connect | 75375 | Waiting for dependent transaction to commit | NULL             |
|  102 | system user |           | NULL | Connect | 76614 | Waiting for an event from Coordinator       | NULL             |
|  103 | system user |           | NULL | Connect | 76614 | Waiting for an event from Coordinator       | NULL             |
|  104 | system user |           | NULL | Connect | 76614 | Waiting for an event from Coordinator       | NULL             |
|  105 | system user |           | NULL | Connect | 76614 | System lock                                 | NULL             |
|  106 | system user |           | NULL | Connect | 76614 | Waiting for an event from Coordinator       | NULL             |
|  107 | system user |           | NULL | Connect | 76614 | Waiting for an event from Coordinator       | NULL             |
|  108 | system user |           | NULL | Connect | 76614 | Waiting for an event from Coordinator       | NULL             |
|  109 | system user |           | NULL | Connect | 76614 | Waiting for an event from Coordinator       | NULL             |
| 7641 | root        | localhost | NULL | Query   |     0 | starting                                    | show processlist |
+------+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+
11 rows in set (0.00 sec)
i can't stop replication with command "stop slave" even can't stop mysql unless use kill -9 (mysql will quickly restart itself quickly). when restart mysql, then start replication again , it replicate several trx then hang on again with the same reason .
if i restart mysql which hang on use "kill -9" , disable MTS before start replication, then it goes normal.
firstly i guess maybe it's a msyql version bug ,so i replace 5.7.21 official rpm to test, then it repeats again. then i try to change centos 7.0.1406(3.10.0-123.el7.x86_64) to  centos 7.4.1708(3.10.0-693.el7.x86_64) , and both version on os7.4 run MTS replication normally.
even i copy the fault data, rpm , my.cnf on os7.0 to os7.4, then restart replication , "Waiting for dependent transaction to commit" never cause hang on under os7.4
finally, i run MTS on os7.0 to make it repeat the error , kill the mysql, upgrade 7.0 to 7.4 , retest it and never found hang on again on 7.4
so i confuse whether it's a mysql's bug or system's bug 
my my.cnf:
[mysql]
[mysqld]
#Innodb
innodb_log_file_size = 2G
innodb_log_files_in_group = 3
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 2
innodb_read_io_threads = 32
innodb_write_io_threads = 32
innodb_thread_concurrency = 256
innodb_commit_concurrency = 256
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_max_dirty_pages_pct=20
innodb_flush_neighbors = 0
innodb_open_files = 65535
innodb_buffer_pool_size = 32G
innodb_flush_method=O_DIRECT
innodb_lock_wait_timeout=60
#server
autocommit=on
character-set-server = utf8
disable-partition-engine-check
explicit_defaults_for_timestamp=true
general-log=0
interactive_timeout = 28800
join_buffer_size = 1M
lock_wait_timeout=60
log-output = FILE
log_timestamps=SYSTEM
long_query_time = 0.2
lower_case_table_names = 1
max_allowed_packet=1G
max_connections = 8000
max_heap_table_size = 128k
max_prepared_stmt_count = 16382
net_buffer_length=32k
open_files_limit = 65535
query_cache_type = 0
read_rnd_buffer_size = 128k
skip_name_resolve
slow_launch_time = 1
slow-query-log
sort_buffer_size = 2M
table_definition_cache=4000
table_open_cache = 65535 
thread_cache_size = 500
default_time_zone='+8:00'
tmp_table_size = 32M
transaction-isolation = READ-COMMITTED
wait_timeout=28800
#replication
skip-slave-start
expire_logs_days = 30
gtid-mode=ON
enforce-gtid-consistency
log-bin = mysql-bin
log-bin-index = mysql-bin.index
binlog-format = ROW
log_slave_updates
sync_binlog = 1000
relay-log = relay-bin
relay-log-index = relay-bin.index
binlog_cache_size = 32k
binlog_rows_query_log_events=on
binlog_row_image=FULL
port = 3306
server_id = 2
datadir = /data/test/data
socket = /data/test/data/mysql.sock
pid-file = /data/test/data/mysql.pid
tmpdir = /data/test/tmp
ssl-ca = /data/test/data/ca.pem
ssl-cert = /data/test/data/server-cert.pem
ssl-key = /data/test/data/server-key.pem
log-error = /data/test/logs/error.log
general_log_file = /data/test/logs/general.log
slow_query_log_file = /data/test/logs/slow.log
#####
slave-parallel-type=LOGICAL_CLOCK
slave-parallel-workers=8
master_info_repository=TABLE
relay_log_info_repository=TABLE
relay_log_recovery=ON
slave_preserve_commit_order = ON
How to repeat:
make a mts replicaiton mysql ,then run  sysbench
sysbench --test=/root/benchSQL/oltp_legacy/oltp.lua --oltp_tables_count=10 --oltp-table-size=200000 --mysql-user=root   --mysql-socket=/data/test/data/mysql.sock --threads=128 --time=120 --events=0 --oltp-test-mode=complex run
  
 
 
 
 
 
 
 
Description: use sysbench 1.0.13 (using bundled LuaJIT 2.1.0-beta2) to test MTS replication,when test with parameters below: sysbench --test=/root/benchSQL/oltp_legacy/oltp.lua --oltp_tables_count=10 --oltp-table-size=200000 --mysql-user=root --mysql-socket=/data/test/data/mysql.sock --threads=128 --time=120 --events=0 --oltp-test-mode=complex run the os7.0 quickly hang on with "Waiting for dependent transaction to commit"., and lag increase : Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 421 Master_UUID: 471d803d-3bb6-11e8-8f86-28a6dbbf5ebf Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 471d803d-3bb6-11e8-8f86-28a6dbbf5ebf:1892412-2959062 Executed_Gtid_Set: 471d803d-3bb6-11e8-8f86-28a6dbbf5ebf:1-1893240 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: the processlist look like this: +------+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+ | 100 | system user | | NULL | Connect | 75381 | Waiting for master to send event | NULL | | 101 | system user | | NULL | Connect | 75375 | Waiting for dependent transaction to commit | NULL | | 102 | system user | | NULL | Connect | 76614 | Waiting for an event from Coordinator | NULL | | 103 | system user | | NULL | Connect | 76614 | Waiting for an event from Coordinator | NULL | | 104 | system user | | NULL | Connect | 76614 | Waiting for an event from Coordinator | NULL | | 105 | system user | | NULL | Connect | 76614 | System lock | NULL | | 106 | system user | | NULL | Connect | 76614 | Waiting for an event from Coordinator | NULL | | 107 | system user | | NULL | Connect | 76614 | Waiting for an event from Coordinator | NULL | | 108 | system user | | NULL | Connect | 76614 | Waiting for an event from Coordinator | NULL | | 109 | system user | | NULL | Connect | 76614 | Waiting for an event from Coordinator | NULL | | 7641 | root | localhost | NULL | Query | 0 | starting | show processlist | +------+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+ 11 rows in set (0.00 sec) i can't stop replication with command "stop slave" even can't stop mysql unless use kill -9 (mysql will quickly restart itself quickly). when restart mysql, then start replication again , it replicate several trx then hang on again with the same reason . if i restart mysql which hang on use "kill -9" , disable MTS before start replication, then it goes normal. firstly i guess maybe it's a msyql version bug ,so i replace 5.7.21 official rpm to test, then it repeats again. then i try to change centos 7.0.1406(3.10.0-123.el7.x86_64) to centos 7.4.1708(3.10.0-693.el7.x86_64) , and both version on os7.4 run MTS replication normally. even i copy the fault data, rpm , my.cnf on os7.0 to os7.4, then restart replication , "Waiting for dependent transaction to commit" never cause hang on under os7.4 finally, i run MTS on os7.0 to make it repeat the error , kill the mysql, upgrade 7.0 to 7.4 , retest it and never found hang on again on 7.4 so i confuse whether it's a mysql's bug or system's bug my my.cnf: [mysql] [mysqld] #Innodb innodb_log_file_size = 2G innodb_log_files_in_group = 3 innodb_autoinc_lock_mode = 2 innodb_flush_log_at_trx_commit = 2 innodb_read_io_threads = 32 innodb_write_io_threads = 32 innodb_thread_concurrency = 256 innodb_commit_concurrency = 256 innodb_io_capacity=2000 innodb_io_capacity_max=4000 innodb_max_dirty_pages_pct=20 innodb_flush_neighbors = 0 innodb_open_files = 65535 innodb_buffer_pool_size = 32G innodb_flush_method=O_DIRECT innodb_lock_wait_timeout=60 #server autocommit=on character-set-server = utf8 disable-partition-engine-check explicit_defaults_for_timestamp=true general-log=0 interactive_timeout = 28800 join_buffer_size = 1M lock_wait_timeout=60 log-output = FILE log_timestamps=SYSTEM long_query_time = 0.2 lower_case_table_names = 1 max_allowed_packet=1G max_connections = 8000 max_heap_table_size = 128k max_prepared_stmt_count = 16382 net_buffer_length=32k open_files_limit = 65535 query_cache_type = 0 read_rnd_buffer_size = 128k skip_name_resolve slow_launch_time = 1 slow-query-log sort_buffer_size = 2M table_definition_cache=4000 table_open_cache = 65535 thread_cache_size = 500 default_time_zone='+8:00' tmp_table_size = 32M transaction-isolation = READ-COMMITTED wait_timeout=28800 #replication skip-slave-start expire_logs_days = 30 gtid-mode=ON enforce-gtid-consistency log-bin = mysql-bin log-bin-index = mysql-bin.index binlog-format = ROW log_slave_updates sync_binlog = 1000 relay-log = relay-bin relay-log-index = relay-bin.index binlog_cache_size = 32k binlog_rows_query_log_events=on binlog_row_image=FULL port = 3306 server_id = 2 datadir = /data/test/data socket = /data/test/data/mysql.sock pid-file = /data/test/data/mysql.pid tmpdir = /data/test/tmp ssl-ca = /data/test/data/ca.pem ssl-cert = /data/test/data/server-cert.pem ssl-key = /data/test/data/server-key.pem log-error = /data/test/logs/error.log general_log_file = /data/test/logs/general.log slow_query_log_file = /data/test/logs/slow.log ##### slave-parallel-type=LOGICAL_CLOCK slave-parallel-workers=8 master_info_repository=TABLE relay_log_info_repository=TABLE relay_log_recovery=ON slave_preserve_commit_order = ON How to repeat: make a mts replicaiton mysql ,then run sysbench sysbench --test=/root/benchSQL/oltp_legacy/oltp.lua --oltp_tables_count=10 --oltp-table-size=200000 --mysql-user=root --mysql-socket=/data/test/data/mysql.sock --threads=128 --time=120 --events=0 --oltp-test-mode=complex run