Bug #47721 Reboot master, slave doesn't drop tcp connection to master. Breaking replication
Submitted: 29 Sep 2009 17:32 Modified: 29 Sep 2009 20:28
Reporter: Robert Bubon
Status: Not a Bug
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1.38 OS:Linux (CentOS 5.3 2.6.18-128.7.1.el5)
Assigned to: CPU Architecture:Any
Tags: reboot, replication, tcp connection

[29 Sep 2009 17:32] Robert Bubon
Description:
Background: 
I'm testing a mysql-mmm-2.0.9 Master-Master setup and found this problem. I do not believe this is an MMM-induced problem.

The test:
Host t1 is the writer and is rebooted.
Host t2 becomes the new writer.
The replication thread on host t2 fails to drop the TCP connection to the master.
If you stop and start the slave on t2, all is well.
Nothing in the t2.err log relates to this problem.

!!!! About 90 minutes later I discovered that the slave finally did drop the stale connection and opened a new one to the master.

Problem: Why did t2 not drop the TCP connection to t1 when t1 rebooted?

Evidence: Netstat information after reboot of t1

[root@t2 ~]# netstat -a | grep tcp | grep mysql
tcp        0      0 *:mysql                     *:*                         LISTEN
tcp        0      0 t2.comet.ucar.edu:39041     t1.comet.ucar.edu:mysql     ESTABLISHED
tcp        0      0 t2.comet.ucar.edu:mysql     t1.comet.ucar.edu:45189     ESTABLISHED

In the above, the t2 server never dropped its TCP connection to t1 after t1 was rebooted; look at port 39041. The rebooted t1 server does not show this connection, as seen here:

[root@t1 ~]# netstat -a | grep tcp | grep mysql
tcp        0      0 *:mysql                     *:*                         LISTEN
tcp        0      0 t1.comet.ucar.edu:45189     t2.comet.ucar.edu:mysql     ESTABLISHED

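MySQL shows that replication is broken: t1 has moved on to binary-log.000007, while t2 still reports binary-log.000006 with both slave threads running and no errors.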

# t1 master status
mysql> show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| binary-log.000007 |      106 |              |                  |
+-------------------+----------+--------------+------------------+

# t2 slave status
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 128.117.110.74
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binary-log.000006
          Read_Master_Log_Pos: 1369000
               Relay_Log_File: t2-relay-bin.000031
                Relay_Log_Pos: 1369146
        Relay_Master_Log_File: binary-log.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1369000
              Relay_Log_Space: 1369444
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)

Note:
There is no iptables running. 
Compiled with
./configure --prefix=/usr/local/mysql --enable-thread-safe-client

How to repeat:
Set up a Linux master and slave.
As root on the master, type 'reboot'.
Monitor the MySQL TCP connections with

netstat -a | grep tcp | grep mysql

[29 Sep 2009 20:28] Sveta Smirnova
Thank you for the report.

It looks like you are using the default slave-net-timeout; see also http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#option_mysqld_slave-...
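
A minimal sketch of what that suggestion could look like on t2, assuming the stale connection is indeed governed by slave_net_timeout (the documented default in 5.1 is 3600 seconds). The value 60 is only an illustrative choice, not one taken from this report, and restarting the slave threads is assumed to be needed so that the I/O thread picks up the new value:

```sql
-- Check the value currently in effect on the slave (t2).
SHOW VARIABLES LIKE 'slave_net_timeout';

-- Illustrative value only: after 60 seconds without data from the master,
-- the slave treats the connection as broken and tries to reconnect.
SET GLOBAL slave_net_timeout = 60;

-- Assumption: the I/O thread applies the timeout when it connects,
-- so restart the slave threads for the change to take effect.
STOP SLAVE;
START SLAVE;
```

To keep the setting across server restarts, the same option can be placed in the [mysqld] section of my.cnf as slave-net-timeout=60.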