Bug #1563 Replication terminates due to slave process time count being wildly wrong
Submitted: 15 Oct 2003 12:09 Modified: 7 Feb 2004 11:50
Reporter: [ name withheld ] Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:4.0.16-0 OS:Linux (Redhat Linux 9.0)
Assigned to: Dmitry Lenev CPU Architecture:Any

[15 Oct 2003 12:09] [ name withheld ]
Description:
Setting up replication between 2 machines, both running same versions of Redhat and the Mysql rpms downloaded from mysql.com.
The replication disconnects every 60 seconds.

How to repeat:
Doing a "show processlist" on the slave shows thread number 2 like this :

|  2 | system user |           | NULL | Connect | 4294967295 | Has read all relay log; waiting for the I/O slave thread to update it

The time field is wildly wrong and behaves erratically.
Doing a few quick "show processlist" shows the time counter going from 1,2 etc.
sometimes further, but ultimately jumping to an enormous number at which time it
obviously disconnects.
This results in an error on the slave like this  :

031015 21:07:15  Slave: received 0 length packet from server, apparent master shutdown: 
031015 21:07:15  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'xxxxx-bin.002' position 140752

The slave then reconnects and updates/replicates correctly, after which the timeout happens again with the enormous "time"-number.
[16 Oct 2003 7:42] [ name withheld ]
Just thought I'd add some machine specs :
IBM Aptiva with AMD-K6 at 233MHz, 192MB SDRAM, Seagate ST32122A.
[22 Oct 2003 4:57] [ name withheld ]
Does the same on 4.0.16.

|  1 | system user |           | NULL | Connect | 37   | Waiting for master to send event                                      | NULL             |
|  2 | system user |           | NULL | Connect | 33   | Has read all relay log; waiting for the I/O slave thread to update it | NULL             |
|  3 | xxxxx       | localhost | NULL | Query   | 0    | NULL                                                                  | show processlist |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show processlist;
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time       | State                                                                 | Info             |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect | 38         | Waiting for master to send event                                      | NULL             |
|  2 | system user |           | NULL | Connect | 4294967291 | Has read all relay log; waiting for the I/O slave thread to update it | NULL             |
|  3 | xxxxx       | localhost | NULL | Query   | 0          | NULL                                                                  | show processlist |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show processlist;
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time       | State                                                                 | Info             |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect | 40         | Waiting for master to send event                                      | NULL             |
|  2 | system user |           | NULL | Connect | 4294967293 | Has read all relay log; waiting for the I/O slave thread to update it | NULL             |
|  3 | xxxxx       | localhost | NULL | Query   | 0          | NULL                                                                  | show processlist |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show processlist;
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time       | State                                                                 | Info             |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect | 41         | Waiting for master to send event                                      | NULL             |
|  2 | system user |           | NULL | Connect | 4294967294 | Has read all relay log; waiting for the I/O slave thread to update it | NULL             |
|  3 | xxxxx       | localhost | NULL | Query   | 0          | NULL                                                                  | show processlist |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show processlist;
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time       | State                                                                 | Info             |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect | 42         | Waiting for master to send event                                      | NULL             |
|  2 | system user |           | NULL | Connect | 4294967295 | Has read all relay log; waiting for the I/O slave thread to update it | NULL             |
|  3 | xxxxx       | localhost | NULL | Query   | 0          | NULL                                                                  | show processlist |
+----+-------------+-----------+------+---------+------------+-----------------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show processlist;
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                                                                 | Info             |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect | 43   | Waiting for master to send event                                      | NULL             |
|  2 | system user |           | NULL | Connect | 0    | Has read all relay log; waiting for the I/O slave thread to update it | NULL             |
|  3 | xxxxx       | localhost | NULL | Query   | 0    | NULL                                                                  | show processlist |
+----+-------------+-----------+------+---------+------+-------------------------------------------------------
[22 Oct 2003 4:58] [ name withheld ]
The .err file in /var/lib/mysql on the slave machine repeats this message every minute :

031022 13:55:42  Slave: received 0 length packet from server, apparent master shutdown: 
031022 13:55:42  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'xxxxx-bin.018' position 23393
031022 13:56:43  Slave: received 0 length packet from server, apparent master shutdown: 
031022 13:56:43  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'xxxxx-bin.018' position 30511
[10 Dec 2003 4:43] Robert Belgau
I have a similiar problem. I running on a Windows platform. My slave runs for quite a long time before it starts disconnecting. An then, whereas you are showing every minuite, mine is every hour. I have let it run for approximately 3 days before stopping and starting the slave (service) to clear the problem. The error log show that the slave disconnected at approximate 13:00 of Friday through 9:00 on Monday, every hour. I've included a sample of  a recent error log below:

 031210  2:56:52  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
031210  2:56:52  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'derserver-bin.023' position 1717687
031210  3:57:14  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
031210  3:57:14  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'derserver-bin.023' position 1717687
031210  4:57:36  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
031210  4:57:36  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'derserver-bin.023' position 1717687
031210  5:57:58  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
031210  5:57:58  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'derserver-bin.023' position 1717687
031210  6:58:20  Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
031210  6:58:20  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'derserver-bin.023' position 1717687

I ran versions 4.0.12 - and 4.0.13 for almost a full year without this problem
[18 Dec 2003 2:49] [ name withheld ]
Does the same with 4.0.17.
Changed replication slave to a PII 450.
[30 Jan 2004 14:20] Dmitry Lenev
Hi, Sigurd!

Beg your pardon for such a long silence on my side. Does the problem that you had described still happens ?

I was investigating this case for some time now and was able to repeat similar behavior only using very artifical methods: I have to rewind my clock to 
reproduce wild time counts in processlist and have to start two slave servers with THE SAME SERVER ID to be partially able to mimic disconnection ... 

While I'll continue further investigation could you please provide some more information just for excluding such strange (but still possible) cases ?
Could you check that time on your slave goes ok ? (without backward jumps)
Just look at the output of simple bash command:
while true; do date; sleep 1; done 
for some time.

Also it would be great if you can exclude presence of two slave servers for one master (with same server id).

If you have nothing similar to situation I have described it would be great to obtain more information about configuration of your master and slave(s?).

It would be great if you will provide their configuration files, exceprts of error logs from slave and master and if it is possible excerpt from general query log on master (for the period of time where the problem occurs).

Thank you!
[30 Jan 2004 17:29] [ name withheld ]
Hi !

Ok, did the BASH time test for five minutes with no artificial jumps.
Time is reset every morning at 05.45 using ntpdate.

During the same time, the .err log on the slave reports :
040131  2:20:50  Slave: received 0 length packet from server, apparent master shutdown: 
040131  2:20:50  Slave I/O thread: Failed reading log event, reconnecting to retry, log 'linuxen-bin.220' position 12424690
.. every minute.

I have attached the slave cnf file, the master cnf file and the mysqld.log from the master from the same time period.

There are no other slaves on the network.

Let me know if there's something else you need !

Regards,
Sigurd
[30 Jan 2004 17:31] [ name withheld ]
Forgot to mention that the master.err file is empty for the same period, i.e. no error messages from the master in the same 5 minute period.
[30 Jan 2004 18:47] Dmitry Lenev
Thank you for this bug report. To properly diagnose the problem, we
need a backtrace to see what is happening behind the scenes. To
find out how to generate a backtrace, please read
http://www.mysql.com/doc/en/Making_trace_files.html

Once you have generated a backtrace, please submit it to this bug
report and change the status back to 'Open'. Thank you for helping
us make our products better.

Additional info:

Thank you very much Sigurd!!!

One more question: who is that "guy" that according to your master logs 
connecting from 192.168.1.123 and doing Binlog dump command once per minute?
(Isn't it mysqlbinlog ? Or something else ? Which version of mysqlbinlog or of
something else is it ?)

Thank you once again for your response!
[30 Jan 2004 19:01] Dmitry Lenev
Sorry, Sigurd! Please ignore those words about backtrace (have choosen wrong dropdown item).

But I am still very interested in identfying program which connects from 192.168.1.123 host! Looks very suspicious.
[31 Jan 2004 3:08] [ name withheld ]
Hi !

Your suspicions were right.
The other computer connecting was another replication machine running an older
version of MySQL on a Windows machine.
Stopping that resulted in no more disconnects.

I should have paid more attention to the query logs on the master I guess, instead of staring blindly at the slave logs :)

Thanks for your help !

Regards,
Sigurd
[5 Feb 2004 14:56] Dmitry Lenev
Hi once again, Sigurd!

So we confirmed that this wild time counts are not related to those disconnections so I'm decreasing severity for this bug.

But I believe you still experience those wild time counts ? Could you please do me a favor and check if time on your master is several (may be 2 or 3) seconds ahead of time on your slave ?

Thank you!
[6 Feb 2004 1:23] [ name withheld ]
Hi !

The time counts seem to be ok now :
  1 | system user |           | NULL | Connect | 174758 | Waiting for master to send event                                      | NULL             |
|  2 | system user |           | NULL | Connect | 693    | Has read all relay log; waiting for the I/O slave thread to update it | NULL             |

There have also been no disconnects after the older mysql replication slave was removed.

Sorry for the inconvenience !

Regards,
Sigurd Holter
[7 Feb 2004 11:50] Dmitry Lenev
I was able to repeat those wild time counts only if time on master is a bit ahead of time on slave (and code inspection have confirmed this). And judging from your logs I think you have the same problem.

But in this situation those wild time counts are expected and documented behaviour
(see http://www.mysql.com/doc/en/Replication_FAQ.html and for more info). 

So I am closing this report with Not A Bug diagnosis.