Bug #41634 Agent seems to fail to reconnect to database (and web ui events shown unclearly)
Submitted: 19 Dec 2008 12:10 Modified: 6 Mar 2009 20:10
Reporter: Simon Mudd (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Enterprise Monitor: Agent Severity:S3 (Non-critical)
Version:2.0.0.7111 OS:Linux (CentOS 4 x86_64)
Assigned to: CPU Architecture:Any

[19 Dec 2008 12:10] Simon Mudd
Description:
The agent log reports this:

[root@bc07bprdb-01 agent]# tail -f mysql-monitor-agent.log 
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 162752' has was for old session-id: 1229447376604.132, current session-id: 1229525379133.74. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 162753' has was for old session-id: 1229447376604.132, current session-id: 1229525379133.74. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 165157' has was for old session-id: 1229447376604.132, current session-id: 1229525379133.74. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 165195' has was for old session-id: 1229447376604.132, current session-id: 1229525379133.74. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 166723' has was for old session-id: 1229447376604.132, current session-id: 1229525379133.74. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 166725' has was for old session-id: 1229447376604.132, current session-id: 1229525379133.74. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 166726' has was for old session-id: 1229447376604.132, current session-id: 1229525379133.74. Trashing it.
2008-12-18 10:55:00: (critical) agent_mysqld.c:600: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3306, socket = ''): Can't connect to MySQL server on '127.0.0.1' (111) (mysql-errno = 2003)
2008-12-18 10:56:00: (critical) agent_mysqld.c:600: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3306, socket = ''): Can't connect to MySQL server on '127.0.0.1' (111) (mysql-errno = 2003)
2008-12-18 11:04:10: (critical) agent_mysqld.c:600: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3306, socket = ''): Can't connect to MySQL server on '127.0.0.1' (111) (mysql-errno = 2003)

[root@bc07bprdb-01 agent3]# /opt/mysql/enterprise/agent/etc/init.d/mysql-monitor-agent restart
Shutting down MySQL Enterprise agent service....           [  OK  ]
Starting MySQL Enterprise agent service...                 [  OK  ]

The server was rebooted yesterday for a memory upgrade.

[root@bc07bprdb-01 agent]# uptime
 12:59:39 up 1 day,  1:55,  1 user,  load average: 0.77, 1.14, 1.17
[root@bc07bprdb-01 agent3]# mysql -e 'show status like "UPTIME"'
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Uptime        | 93352 | 
+---------------+-------+
[root@bc07bprdb-01 agent]#

The restart script did not indicate the agent was not running so I assume it was. However the merlin server indicates that the server was not reachable.
I'll enclose a screen dump to show this.

How to repeat:
N/A

Suggested fix:
Please add a recover message so it's clear in the log that the problem has been resolved. If I can repeat this I will but I'm not sure I'll be able.
[19 Dec 2008 12:19] Simon Mudd
change Synopsis to be more precise
[19 Dec 2008 12:47] Simon Mudd
Same issue with another server. Same symptoms.

Server rebooted for new memory to be added.

# uptime
 13:38:37 up 1 day, 33 min,  1 user,  load average: 0.14, 0.04, 0.01
# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 53904
Server version: 5.0.68-enterprise-gpl-log MySQL Enterprise Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

root@master [(none)]> show status like 'uptime';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Uptime        | 88388 | 
+---------------+-------+
1 row in set (0.00 sec)

root@master [(none)]> exit
Bye

# tail /opt/mysql/enterprise/agent/mysql-monitor-agent.log 
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 86125' has was for old session-id: 1229447376672.143, current session-id: 1229525379213.93. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 86126' has was for old session-id: 1229447376672.143, current session-id: 1229525379213.93. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 88662' has was for old session-id: 1229447376672.143, current session-id: 1229525379213.93. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 88700' has was for old session-id: 1229447376672.143, current session-id: 1229525379213.93. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 89910' has was for old session-id: 1229447376672.143, current session-id: 1229525379213.93. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 89912' has was for old session-id: 1229447376672.143, current session-id: 1229525379213.93. Trashing it.
2008-12-17 15:49:43: (critical) network-io.c:843: response to 'command: collect_data, task-id: 89913' has was for old session-id: 1229447376672.143, current session-id: 1229525379213.93. Trashing it.
2008-12-18 12:00:00: (critical) agent_mysqld.c:600: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3306, socket = ''): Can't connect to MySQL server on '127.0.0.1' (111) (mysql-errno = 2003)
2008-12-18 12:39:01: (critical) agent_mysqld.c:600: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3306, socket = ''): Can't connect to MySQL server on '127.0.0.1' (111) (mysql-errno = 2003)
2008-12-18 13:05:25: (critical) agent_mysqld.c:600: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3306, socket = ''): Can't connect to MySQL server on '127.0.0.1' (111) (mysql-errno = 2003)

The agent IS running.

# /opt/mysql/enterprise/agent/etc/init.d/mysql-monitor-agent start
Starting MySQL Enterprise agent service...(already running)[  OK  ]
[22 Dec 2008 16:54] Diego Medina
Hi Simon,

Thanks for your bug report. Looking at your log, I see that you get the error about not being able to connect to the mysql server a few times, but the agent actually tries to reconnect to the mysql server once every minute if there is a problem.

What I think happened in your case is that there were "some" connectivity problems, but the agent recovered. I say this because on the last log you posted, you get the error more or less every 30 minutes.

I tried reproducing the bug but once I restarted the mysql server, the agent was able to connect to it.

You have a valid point when asking for more event entries. I will file a separate feature request for it.

Thank you.
[22 Dec 2008 19:41] Diego Medina
See http://bugs.mysql.com/bug.php?id=41679 for the feature request
[23 Jan 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[23 Jan 2009 7:43] Simon Mudd
You close this ticket saying it is a duplicate of 41679 which I wrote afterwards.
If so please close the ticket and link the 2 bugs together.

The agent failing to notify of a correct reconnection to the database is still a problem in 2.0.2.7133 IIRC so the bug and the feature request are 2 different things.

Therefore I'm not sure it's appropriate to close this ticket.
[6 Feb 2009 7:55] Simon Mudd
After recent upgrade to server 2.0.4.7138 and agent 2.0.4.7139 we still have the same issue.

This is mainly manifested after a host/server restart.
- the agent comes up before the mysqld server, so the mysql instance is NOT running
- the mysqld comes up and works fine.
- the agent does not realise (does not try again to check?) that the server is up and running.
- hence reporting shown on the merlin server indicates that the mysql instance is not running when it is.

This bug is shown as non-critical, but for successful monitoring it is critical as it provides misleading "mysql down" indications to the users of merlin.

It thus wastes our time, and may lead us to ignore a real warning of the same type.

Therefore please consider increasing the severity of this bug.
[18 Feb 2009 18:54] Diego Medina
This is related to http://bugs.mysql.com/bug.php?id=42581
[6 Mar 2009 20:10] Mark Matthews
Treating this as a duplicate of Bug#42581 - since fixing that bug, makes this issue go away, other than the feature request of Bug#41679