Bug #78006 IO replication threads hang on master
Submitted: 10 Aug 2015 7:27 Modified: 10 Aug 2015 8:37
Reporter: Manoj Bhalerao
Status: Can't repeat
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.6.12 OS:CentOS (6.2)
Assigned to: CPU Architecture:Any

[10 Aug 2015 7:27] Manoj Bhalerao
Description:
We have a MySQL replication hierarchy set up as follows:

db01   - Master
srch02 - Master + Slave of db01
srch01 - slave of srch02
srch03 - slave of srch02

We have hit the following issues a few times.

1) Replication runs smoothly for a few days, then the slave servers suddenly receive error 2013 (lost connection to master during query). When this happens, the IO process on the master remains in the 'init' state, the slaves start creating new IO threads every hour, and all of the processes on the master that represent slave IO threads remain in the 'init' state forever. 'Show slave status' on the slave shows both Slave_IO and Slave_SQL as running, but the master log positions do not change at all. After a few hours, the master has many such processes (depending on the number of hours and slaves) and starts showing high CPU utilization. Stopping the slave does not stop these processes on the master, and neither does killing them on the master: 'Show processlist' shows them as 'Killed', but CPU utilization does not come down and stays high forever. The only remaining option is to restart the MySQL server on the master, which also takes a long time under these circumstances.

2) The problem above happens suddenly, so I don't have reproduction steps for it. However, a similar issue occurred recently when the slave servers lost their connection to the master (srch02) because the master was shut down. The slaves retried connecting to the master many times and, according to the log, eventually succeeded once the master was up again. Even so, the slaves kept creating new threads every hour, which caused high CPU utilization on the master; the processes on the master were hung, doing nothing useful. In this particular case the slaves remained at the same master log position even after a MySQL restart on the slave or a slave restart.
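
The symptom in 1) (both threads reported as running while the master log position never moves) can be checked mechanically by comparing two 'Show slave status' snapshots taken some time apart. The following is only a minimal sketch, not part of the original report: the function name `replication_stalled` is hypothetical, and each snapshot is assumed to be a plain dict keyed by the usual `SHOW SLAVE STATUS` column names.

```python
def replication_stalled(earlier, later):
    """Return True if both slave threads report 'Yes' in the later snapshot
    but the I/O thread's read position is unchanged since the earlier one.

    Both arguments are assumed to be dicts built from SHOW SLAVE STATUS rows.
    """
    threads_running = (
        later.get("Slave_IO_Running") == "Yes"
        and later.get("Slave_SQL_Running") == "Yes"
    )
    # Position of the I/O thread in the master's binary log.
    position_before = (earlier.get("Master_Log_File"), earlier.get("Read_Master_Log_Pos"))
    position_after = (later.get("Master_Log_File"), later.get("Read_Master_Log_Pos"))
    return threads_running and position_before == position_after


if __name__ == "__main__":
    first = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Master_Log_File": "mysql-bin.000042",  # illustrative value
        "Read_Master_Log_Pos": 1000,            # illustrative value
    }
    second = dict(first)  # nothing changed between the two samples
    print(replication_stalled(first, second))
```

An unchanged position is only a hint: a master with no write activity produces the same result, so the check is meaningful only while the master is known to be writing to its binary log.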

How to repeat:
For problem 2) above, the steps were:

1. Add two slaves for a master server.
2. Keep the system running for a few days with heavy read/write activity.
3. Stop MySQL on the master.
4. Restart the master server.
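
After the steps above, the pile-up of replication threads on the master can be counted from 'Show processlist' output. This is a rough sketch under stated assumptions, not something from the report: the helper names are hypothetical, each row is assumed to be a dict with the standard `Command` and `Host` columns, and the master-side replication threads are assumed to appear with a `Command` value beginning with `Binlog Dump`.

```python
from collections import Counter


def dump_threads_per_host(rows):
    """Count master-side replication (dump) threads per client host.

    `rows` is assumed to be a list of dicts from SHOW PROCESSLIST.
    """
    counts = Counter()
    for row in rows:
        if str(row.get("Command", "")).startswith("Binlog Dump"):
            # Host is normally reported as "host:port"; keep only the host part.
            host = str(row.get("Host", "")).rsplit(":", 1)[0]
            counts[host] += 1
    return counts


def hosts_with_duplicate_dumps(rows):
    """Return the hosts that hold more than one dump thread, sorted by name."""
    return sorted(host for host, n in dump_threads_per_host(rows).items() if n > 1)


if __name__ == "__main__":
    # Illustrative rows only; the ports and thread mix are made up.
    sample = [
        {"Command": "Binlog Dump", "Host": "srch01:51234", "State": "init"},
        {"Command": "Binlog Dump", "Host": "srch01:51890", "State": "init"},
        {"Command": "Binlog Dump", "Host": "srch03:40022", "State": "init"},
        {"Command": "Query", "Host": "localhost", "State": ""},
    ]
    print(hosts_with_duplicate_dumps(sample))
```

A healthy slave normally holds a single dump thread on its master, so a host that appears more than once is a candidate for the hang described in this report.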
[10 Aug 2015 7:28] Manoj Bhalerao
Attached the error log from one of the slave servers.
[10 Aug 2015 8:37] MySQL Verification Team
Hi Manoj,

Thank you for taking the time to report a problem.
However, version 5.6.12 is very old, and many bugs have been fixed since. Please upgrade to the current version, 5.6.26, retest with it, and let us know if the problem persists. You can download a new version from http://www.mysql.com/downloads/

If you are able to reproduce the bug with one of the latest versions, please change the version on this bug report to the version you tested and change the status back to "Open".  Again, thank you for your continued support of MySQL.

Thanks,
Umesh