Description:
The slave mysqld process sometimes hangs if there are many slave worker threads and under big stress.
The symptom is that the slave process sometimes hangs. Using pstack to print the stacks of each threads of the mysql slave process, I can see all slave worker threads are either waiting for more jobs to work on, or are waiting as a follower for the leader thread to complete. And since the slave sql thread has to wait before prior transactions to commit, it can not give more transactions to the waiting worker threads to work on, so the slave process hangs.
So I saw the code of Stage_manager::signal_done() function and found that the mysql_cond_broadcast(&m_cond_done) is called after mysql_mutex_unlock(&m_lock_done), but according to the doc of pthread_cond_broadcast, it should be called before the mutex is released:
In this page, http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_broadcast.html
Notice this sentence: "however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal()."
So I made such changes and then the hang never happened again in many following test runs. Patch is attached.
How to repeat:
I set up a simple binlog replication with one slave replicating from one master, the config files for them are as below. The slave and master simply runs on one computer server (running a linux variant of centos, Linux TENCENT64.site 3.10.0-1-tlinux2-0014.tl2 #1 SMP Wed Mar 11 11:43:03 CST 2015 x86_64 x86_64 x86_64 GNU/Linux) with 24 CPUs, massive memory and fast SSD storage.
I use sysbench to do stress testing, sysbench parameters: ./sysbench --num-threads=200 --test=./tests/db/oltp.lua --oltp-table-size=100000 --oltp_auto_inc=off --oltp-tables-count=10 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3338 --mysql-user=root --mysql-password=root --mysql-db=test --max-time=3600 --max-requests=0 --mysql-engine-trx=yes run
Slave config file:
[mysqld]
datadir=/mysql_data/mysql-instances/d/datadir
port=3339
tmpdir=/mysql_data/mysql-instances/d/tmpdir
socket=/mysql_data/mysql-instances/d/mysql.sock
server-id=13
innodb_flush_log_at_trx_commit=1
sync_binlog=1
log-bin=binlog
slave_parallel_type=logical_clock
slave_parallel_workers=128
gtid-mode=ON
enforce-gtid-consistency
relay-log=relay_log
log_slave_updates
Master config file:
[mysqld]
datadir=/mysql_data/mysql-instances/c/datadir
port=3338
tmpdir=/mysql_data/mysql-instances/c/tmpdir
socket=/mysql_data/mysql-instances/c/mysql.sock
server-id=12
innodb_flush_log_at_trx_commit=1
sync_binlog=1
log-bin=binlog
slave_parallel_type=logical_clock
max_connections=255
gtid-mode=ON
enforce-gtid-consistency
Suggested fix:
In Stage_manager::signal_done() function, signal all follower threads before releasing the 'm_lock_done' mutex.
Patch as below:
diff --git a/sql/binlog.cc b/sql/binlog.cc
index e8ddabd..ef34f35 100644
--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -1950,8 +1950,8 @@ void Stage_manager::signal_done(THD *queue)
mysql_mutex_lock(&m_lock_done);
for (THD *thd= queue ; thd ; thd = thd->next_to_commit)
thd->get_transaction()->m_flags.pending= false;
- mysql_mutex_unlock(&m_lock_done);
mysql_cond_broadcast(&m_cond_done);
+ mysql_mutex_unlock(&m_lock_done);
}
#ifndef DBUG_OFF