Bug #28674 mysql replication fails when network connection restarted/disturbed
Submitted: 25 May 2007 11:12
Modified: 7 Jul 2007 11:50
Reporter: kris B
Status: No Feedback
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 5.0.16
OS: Linux (Redhat 9)
Assigned to: Assigned Account
CPU Architecture: Any

[25 May 2007 11:12] kris B
Description:

I configured master-master replication on two machines, and it worked fine for some time. When the network connection between the two boxes was dropped and then restored, the slave side took much longer than expected to sync data after reconnecting. The master-connect-retry parameter is set to its default value.

Although the problem does not reproduce every time the network is disturbed, restarting the sync process is sometimes a problem.

Please help in resolving this issue.

Thanks in advance.

The MySQL config files and the errors I got are below:

Log file:
----------
070503 15:45:24 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.16-standard-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edition - Standard (GPL) 

070503 15:45:24 [Note] Slave SQL thread initialized, starting replication in log 'interceptor-bin.000001' at position 376111, relay log './interceptor-relay-bin.000002' position: 241

------------> This is where the sync process stops
070503 15:45:27 [ERROR] Slave I/O thread: error connecting to master 'sqluser2@192.168.150.230:3306': Error: 'Lost connection to MySQL server during query'  errno: 2013  retry-time: 60  retries: 86400

070503 15:46:27 [Note] Slave I/O thread: connected to master 'sqluser2@192.168.150.230:3306',  replication started in log 'interceptor-bin.000001' at position 376111
-----------------------------------------------
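The error line above carries the connection target, the client error number, and the retry settings in a fixed layout. A minimal Python sketch for pulling those fields out of the error log, assuming the layout of that single sample line holds for other occurrences (the name parse_io_error and the regular expression are illustrative and not part of MySQL):

```python
import re

# Pattern inferred from the one "error connecting to master" line in this
# report; other server versions may format the line differently.
IO_ERROR_RE = re.compile(
    r"^(?P<stamp>\d{6} +\d{1,2}:\d{2}:\d{2}) \[ERROR\] Slave I/O thread: "
    r"error connecting to master "
    r"'(?P<user>[^@']+)@(?P<host>[^:']+):(?P<port>\d+)': "
    r"Error: '(?P<message>.*)'\s+errno: (?P<errno>\d+)\s+"
    r"retry-time: (?P<retry_time>\d+)\s+retries: (?P<retries>\d+)"
)


def parse_io_error(line):
    """Return the fields of a slave I/O connect error line, or None."""
    match = IO_ERROR_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared directly.
    for key in ("port", "errno", "retry_time", "retries"):
        fields[key] = int(fields[key])
    return fields


# The error line from the log above, copied verbatim.
SAMPLE = (
    "070503 15:45:27 [ERROR] Slave I/O thread: error connecting to master "
    "'sqluser2@192.168.150.230:3306': Error: 'Lost connection to MySQL "
    "server during query'  errno: 2013  retry-time: 60  retries: 86400"
)

if __name__ == "__main__":
    print(parse_io_error(SAMPLE))
```

For the sample line this yields errno 2013 and a retry-time of 60, which matches the 60-second gap between the error at 15:45:27 and the reconnect at 15:46:27.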

my.cnf file:
Primary Master config file:
----------------------------
[mysqld]
server-id=100
log-bin=/var/lib/mysql/bin.log
log-bin-index=/var/lib/mysql/log-bin.index
relay-log=/var/lib/mysql/relay.log
relay-log-index=/var/lib/mysql/relay-log.index

log-slave-updates
replicate-same-server-id=0
auto_increment_increment=1
auto_increment_offset=1
skip-slave-start
log-slow-queries
log-slow-admin-statements
log-error=/var/log/interceptor-mysql.log

master_host=192.168.150.24
master_user=sqluser1
master_password=password
report_host=192.168.150.23

Secondary Master config file:
---------------------------

[mysqld]
server-id=20
log-bin=/var/lib/mysql/bin.log
log-bin-index=/var/lib/mysql/log-bin.index
relay-log=/var/lib/mysql/relay.log
relay-log-index=/var/lib/mysql/relay-log.index

log-slave-updates
replicate-same-server-id=0
auto_increment_increment=1
auto_increment_offset=1
skip-slave-start
log-slow-queries
log-slow-admin-statements
log-error=/var/log/interceptor-mysql.log

master_host=192.168.150.24
master_user=sqluser2
master_password=password
report_host=192.168.150.23

master_host=192.168.150.23
master_user=sqluser2
master_password=password
report_host=192.168.150.24

How to repeat:
Establish master-master replication between two boxes with the specified configuration. Assume the two boxes are named primary and secondary. Populate the primary master with a large amount of data (for example, add more than 5000 records). The same data is replicated to the secondary master.

Now disconnect the network connection. On the secondary master, delete the previously added records and add the same number of records or more. Reconnect the network connection between the two boxes.

Now check the primary master. After waiting for the "master-connect-retry" time or longer, the data is still not replicating from the secondary master to the primary master.
Note: "show slave status" shows Slave_IO_Running and Slave_SQL_Running as Yes.

Although this does not happen every time we follow the steps above, the problem occurs frequently.
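The symptom described in these steps (both threads reported as running while no data arrives) can be checked by comparing two "show slave status" samples taken some time apart. A rough Python sketch, assuming each sample has already been fetched into a dict keyed by the SHOW SLAVE STATUS column names; the function name replication_stalled is hypothetical, and the result is only meaningful if the master is known to have written new events between the two samples:

```python
def replication_stalled(before, after):
    """Compare two SHOW SLAVE STATUS samples taken some time apart.

    Returns True when both threads report "Yes" in the later sample but
    neither the read nor the executed master log coordinates have moved.
    """
    running = (
        after.get("Slave_IO_Running") == "Yes"
        and after.get("Slave_SQL_Running") == "Yes"
    )
    # Coordinates the I/O thread has read up to on the master.
    read_keys = ("Master_Log_File", "Read_Master_Log_Pos")
    # Coordinates the SQL thread has executed up to.
    exec_keys = ("Relay_Master_Log_File", "Exec_Master_Log_Pos")
    read_moved = any(before.get(k) != after.get(k) for k in read_keys)
    exec_moved = any(before.get(k) != after.get(k) for k in exec_keys)
    return running and not read_moved and not exec_moved


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Master_Log_File": "interceptor-bin.000001",
        "Read_Master_Log_Pos": 376111,
        "Relay_Master_Log_File": "interceptor-bin.000001",
        "Exec_Master_Log_Pos": 376111,
    }
    print(replication_stalled(sample, dict(sample)))
```

A True result here only flags a candidate stall; it does not identify the cause.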
[28 May 2007 10:00] Sveta Smirnova
Thank you for the report.

However, version 5.0.16 is quite old, and many replication bugs have been fixed since then. Please upgrade to the current version, 5.0.41, try again, and let us know the results.
[28 May 2007 13:56] kris B
We ship 5.0.16 with our product, so changing the version may affect other modules as well. Before deciding to upgrade to 5.0.41, I want to know why this happens.

Thanks in advance.
[7 Jun 2007 11:50] Sveta Smirnova
Thank you for the feedback.

>Before taking decision to upgrade to 5.0.41, i want to know the why this happen?

The error messages in the initial post could indicate the same problem as bug #26489 or bug #25737. But in that case the Slave_IO_Running and Slave_SQL_Running values would not be "Yes".

Please provide the value of Seconds_Behind_Master on the primary master.
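As a rough guide to reading that value, here is a small Python sketch. It assumes the value has been fetched from "show slave status" with SQL NULL mapped to None; the function name describe_lag and the wording of the messages are illustrative only. The interpretation follows the documented behaviour that the field compares the SQL thread against events the I/O thread has already read, so a value of 0 does not by itself prove that the I/O thread is keeping up with the master:

```python
def describe_lag(seconds_behind_master):
    """Give a hedged reading of a Seconds_Behind_Master value."""
    if seconds_behind_master is None:
        # NULL: SQL thread not running, or I/O thread not running/connected.
        return "unknown: SQL thread stopped or I/O thread not connected"
    if seconds_behind_master == 0:
        # The SQL thread has applied everything the I/O thread has read.
        return ("SQL thread caught up with I/O thread; "
                "I/O thread may still lag the master")
    return "SQL thread is %d s behind the events read so far" % seconds_behind_master


if __name__ == "__main__":
    for value in (None, 0, 120):
        print(value, "->", describe_lag(value))
```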
[7 Jul 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".