Bug #59439 Agent should reconnect on error 2013 (lost connecition during query)
Submitted: 12 Jan 2011 8:51 Modified: 9 Jan 2015 16:38
Reporter: Daniël van Eeden Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Enterprise Monitor: Agent Severity:S2 (Serious)
Version:2.3.1.2044 OS:Any
Assigned to: Jan Kneschke CPU Architecture:Any

[12 Jan 2011 8:51] Daniël van Eeden
Description:
Agent logs:
2011-01-11 17:16:00: (critical) agent_mysqld.c:512: [127.0.0.1:3306] mysql_real_query("SELECT value FROM mysql.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Please GRANT SELECT ON mysql.inventory TO 'monitoragent'@'%';
This necessary for correct execution of the agent. Shutting down now.

Server error log:
110111 17:16:21 [Warning] Aborted connection 10 to db: 'unconnected' user: 'monitoragent' host: 'localhost' (Got an error reading communication packets)
110111 17:16:21 [Warning] Aborted connection 280340 to db: 'unconnected' user: 'monitoragent' host: 'localhost' (Got an error reading communication packets)

The agent and server are on the same host

How to repeat:
Force error 2013.

Expected: agent reconnects

Actual: agent stops

Suggested fix:
if ($errorno == 2013) {
  reconnect();
}
[12 Jan 2011 9:33] Andrii Nikitin
Verified looking at logs.

Agent must never stop monitoring on any condition except manual shutdown.
[4 Feb 2011 13:25] Enterprise Tools JIRA Robot
Mark Leith writes: 
Assigning to Jan to take a look at, escalation requested by Support.
[18 Feb 2011 12:28] Andrii Nikitin
The bug happens if query to mysql.inventory is being executed longer than 30
seconds. (Probably some java timeout is involved.)

How to repeat:

1. Stop monitored mysql instance
2. wait 2 minutes so agent disconnects from monitored mysql instance
3. Start monitored mysql instance and immediatelly lock mysql.inventory
table:

LOCK TABLES mysql.inventory WRITE;

4. Agent will shutdown in next 2 minutes showing "error 2013" while reading
mysql.inventory table
[14 Mar 2011 13:53] Mark Matthews
Andrii,

No java involved here. These queries are being executed by libmysql in the agent.
[14 Mar 2011 14:11] Andrii Nikitin
Hi Mark,

Yes, not sure why I mentioned Java here.
I found race conditions in agent source code, hope it will be fixed soon.
[22 Apr 2011 22:18] Marcos Palacios
Verified fixed on 2.3.2.2053

(critical) agent_mysqld.c:712: [127.0.0.1:5132] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 5132, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', (critical) last message repeated 2 times
(critical) agent_mysqld.c:712: [127.0.0.1:5132] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 5132, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 61 (mysql-errno = 2013)
[20 May 2011 12:32] MySQL Verification Team
I still see the same problem with 2.3.2.2054 and 2.3.3 as well. Using the steps Andrii mentioned above to simulate the problem.

Agent should try and reconnect and not shut down. The error for not connecting and for not being able to read the mysql.inventory table if it locked is 'similar'. The original bug report is for not being able to read mysql.inventory and shutting down.

2011-05-19 17:15:04: (critical) MySQL Monitor Agent 2.3.2.2054 started.
2011-05-19 17:15:04: (critical) network-io.c:324: successfully reconnected to dashboard at http://agent@localhost:18080/heartbeat
2011-05-19 17:15:04: (critical) agent_mysqld.c:734: successfully connected to database at 127.0.0.1:3307 as user agent (with password: YES)
2011-05-19 17:20:01: (critical) agent_mysqld.c:712: [127.0.0.1:3307] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port 
= 3307, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (mysql-errno = 2013)
2011-05-19 17:21:01: (critical) agent_mysqld.c:712: [127.0.0.1:3307] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port 
= 3307, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (mysql-errno = 2013)
2011-05-19 17:22:01: (critical) agent_mysqld.c:712: [127.0.0.1:3307] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port 
= 3307, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (mysql-errno = 2013)
2011-05-19 17:23:01: (critical) agent_mysqld.c:734: successfully connected to database at 127.0.0.1:3307 as user agent (with password: YES)
2011-05-19 17:23:31: (critical) agent_mysqld.c:523: [127.0.0.1:3307] mysql_real_query("SELECT value FROM mysql.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the mysql.inventory table.
This is necessary for correct execution of the agent. Shutting down now.
[20 May 2011 14:52] Simon Mudd
A similar issue also seen in 2.3.3 and earlier (since 2.1 as reported IIRC):

2011-05-19 19:09:25: (critical) network-io.c:324: successfully reconnected to dashboard at https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat
2011-05-19 23:47:24: (critical) network-io.c:273: curl_easy_perform('https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat') failed: couldn't connect to host (curl-error = 'Couldn't connect to server' (7), os-error = 'Connection timed out' (110))
2011-05-19 23:47:27: (critical) network-io.c:324: successfully reconnected to dashboard at https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat
2011-05-20 06:40:11: (critical) network-io.c:273: curl_easy_perform('https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat') failed: couldn't connect to host (curl-error = 'Couldn't connect to server' (7), os-error = 'Connection timed out' (110))
2011-05-20 06:40:14: (critical) network-io.c:324: successfully reconnected to dashboard at https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat
2011-05-20 08:36:02: (critical) agent_mysqld.c:745: successfully connected to database at 127.0.0.1:3306 as user agent_user (with password: YES)
2011-05-20 08:36:40: (critical) agent_mysqld.c:745: successfully connected to database at 127.0.0.1:3306 as user agent_user (with password: YES)
2011-05-20 08:37:10: (critical) agent_mysqld.c:530: [127.0.0.1:3306] mysql_real_query("SELECT value FROM mysql.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the mysql.inventory table.
This is necessary for correct execution of the agent. Shutting down now.
##
## puppet to the rescue...
## 
2011-05-20 08:58:59: (critical) MySQL Monitor Agent 2.3.3.2061 started.
[4 Nov 2012 23:22] Craig Castle-Mead
Same issue with 2.3.7.2103

2012-11-02 10:56:05: (critical) exception received from server: Internal error: known items are required for list_instances processing
2012-11-02 22:57:04: (critical) exception received from server: Internal error: known items are required for list_instances processing
2012-11-02 23:46:11: (critical) agent_mysqld.c:774: successfully connected to database at /sql_data/data51/mysql.sock as user root (with password: YES)
2012-11-02 23:46:21: (critical) agent_mysqld.c:550: [unix:/sql_data/data51/mysql.sock] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table.
This is necessary for correct execution of the agent. Shutting down now.
2012-11-02 23:47:55: (critical) network-io.c:282: curl_easy_perform('http://agent@HOSTNAME:18080/heartbeat') failed: Operation timed out after 120001 milliseconds with 0 bytes received (curl-error = 'Timeout was reached' (28))
2012-11-02 23:49:55: (critical) network-io.c:282: curl_easy_perform('http://agent@HOSTNAME:18080/heartbeat') failed: Operation timed out after 120002 milliseconds with 0 bytes received (curl-error = 'Timeout was reached' (28))
[8 Apr 2013 15:11] Aftab Khan
I think it should retrying after N number of attempts before it gives up:

[code]
2013-04-06 08:50:20: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:50:30: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uui
d'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table.
This is necessary for correct execution of the agent.
2013-04-06 08:50:30: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:50:40: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uui
d'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table.
This is necessary for correct execution of the agent.
2013-04-06 08:50:40: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:50:50: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uui
d'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table.
This is necessary for correct execution of the agent.
2013-04-06 08:50:50: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:51:00: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uui
d'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table.
This is necessary for correct execution of the agent.
2013-04-06 08:51:00: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:51:10: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uui
d'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table.
This is necessary for correct execution of the agent.
[/code]

The problem we are seeing due to this behavior, and had no choice to stop mysql agent permanently, as it often cause MySQL to stop responding and we had to force restart MySQL server i.e. kill -9. MySQL error log shows following:
[code]
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 9679315, signal count 10074229
--Thread 47475361331520 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore:
Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1
waiters flag 1
--Thread 47474958076224 has waited at lock/lock0lock.c line 5489 for 18.00 seconds the semaphore:
Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1
waiters flag 1
--Thread 47474026314048 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore:
Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1
waiters flag 1
--Thread 47474025912640 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore:
Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1
waiters flag 1
--Thread 47474490321216 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore:
Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1
waiters flag 1
--Thread 47474489116992 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore:
Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1
waiters flag 1
[/code]