Description:
When the IO_THREAD is killed on the master, somehow the slave loses state i.e. it does not seem to maintain a connection to the master and times out based on slave_net_timeout.
How to repeat:
Setup a simple master-slave, for testing set slave_net_timeout on slave to 10seconds or shorter then, kill the corresponding IO_THREAD connection on the master. With log_warnings = 2 on the master you should see frequent reconnects of the IO_THREAD.
On the slave:
node2 [localhost] {msandbox} ((none)) > show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 55311
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 4205
Relay_Log_File: mysql_sandbox55312-relay-bin.000002
Relay_Log_Pos: 2302
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 4205
Relay_Log_Space: 2471
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 101
1 row in set (0.00 sec)
node2 [localhost] {msandbox} ((none)) > set global slave_net_timeout = 10;
Query OK, 0 rows affected, 1 warning (0.00 sec)
node2 [localhost] {msandbox} ((none)) >
On the master:
node1 [localhost] {msandbox} ((none)) > set global log_warnings = 2;
Query OK, 0 rows affected (0.00 sec)
node1 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| 6 | system user | | NULL | Connect | 325 | Waiting for master to send event | NULL |
| 7 | system user | | NULL | Connect | 325 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 8 | rsandbox | localhost:44899 | NULL | Binlog Dump | 325 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 9 | msandbox | localhost | NULL | Query | 0 | NULL | show processlist |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)
node1 [localhost] {msandbox} ((none)) > kill 8;
Query OK, 0 rows affected (0.00 sec)
node1 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| 6 | system user | | NULL | Connect | 337 | Waiting for master to send event | NULL |
| 7 | system user | | NULL | Connect | 337 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 9 | msandbox | localhost | NULL | Query | 0 | NULL | show processlist |
| 10 | rsandbox | localhost:44906 | NULL | Binlog Dump | 2 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)
node1 [localhost] {msandbox} ((none)) > \q
Bye
[revin@forge rcsandbox_5_5_31]$ tail -f node1/data/msandbox.err
130521 23:15:33 [Note] Server socket created on IP: '127.0.0.1'.
130521 23:15:33 [Note] Event Scheduler: Loaded 0 events
130521 23:15:33 [Note] /wok/bin/mysql/5.5.31/bin/mysqld: ready for connections.
Version: '5.5.31-log' socket: '/tmp/mysql_sandbox55311.sock' port: 55311 MySQL Community Server (GPL)
130521 23:15:36 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox55311-relay-bin' to avoid this problem.
130521 23:15:36 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='55312', master_log_file='', master_log_pos='4'.
130521 23:15:36 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './mysql_sandbox55311-relay-bin.000001' position: 4
130521 23:15:36 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:55312',replication started in log 'FIRST' at position 4
130521 23:21:11 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
130521 23:21:21 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
130521 23:21:31 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
130521 23:21:41 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
^C
[revin@forge rcsandbox_5_5_31]$
Suggested fix:
Restarting the slave in this case fixes the problem.