Bug #59439 | Agent should reconnect on error 2013 (lost connection during query) | | |
---|---|---|---|
Submitted: | 12 Jan 2011 8:51 | Modified: | 9 Jan 2015 16:38 |
Reporter: | Daniël van Eeden | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Enterprise Monitor: Agent | Severity: | S2 (Serious) |
Version: | 2.3.1.2044 | OS: | Any |
Assigned to: | Jan Kneschke | CPU Architecture: | Any |
[12 Jan 2011 8:51]
Daniël van Eeden
[12 Jan 2011 9:33]
Andrii Nikitin
Verified by looking at the logs. The agent must never stop monitoring under any condition except a manual shutdown.
[4 Feb 2011 13:25]
Enterprise Tools JIRA Robot
Mark Leith writes: Assigning to Jan to take a look; escalation requested by Support.
[18 Feb 2011 12:28]
Andrii Nikitin
The bug happens if the query against mysql.inventory takes longer than 30 seconds to execute. (Probably some Java timeout is involved.) How to repeat:
1. Stop the monitored mysql instance.
2. Wait 2 minutes so the agent disconnects from the monitored mysql instance.
3. Start the monitored mysql instance and immediately lock the mysql.inventory table: LOCK TABLES mysql.inventory WRITE;
4. The agent will shut down within the next 2 minutes, showing "error 2013" while reading the mysql.inventory table.
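For illustration only (this is not the agent's code): a minimal libmysqlclient sketch of the failure mode described above. With a client-side read timeout of roughly 30 seconds, a query against a WRITE-locked mysql.inventory table surfaces as error 2013 (CR_SERVER_LOST, "Lost connection to MySQL server during query"). The host, port, user, password, and the 30-second value below are assumptions, not values taken from the agent.
[code]
/* Reproduction sketch, assuming libmysqlclient and placeholder credentials.
 * In another session, run:  LOCK TABLES mysql.inventory WRITE;
 * then start this program and wait for the read timeout to expire. */
#include <mysql.h>
#include <stdio.h>

int main(void)
{
    MYSQL *conn = mysql_init(NULL);
    unsigned int read_timeout = 30;   /* seconds; assumed, per the comment above */

    /* Give up waiting for the result set after read_timeout seconds. */
    mysql_options(conn, MYSQL_OPT_READ_TIMEOUT, &read_timeout);

    if (!mysql_real_connect(conn, "127.0.0.1", "agent", "password",
                            NULL, 3306, NULL, 0)) {
        fprintf(stderr, "connect failed: %u %s\n",
                mysql_errno(conn), mysql_error(conn));
        return 1;
    }

    /* The query the agent issues at startup, per the logs in this report. */
    if (mysql_query(conn,
            "SELECT value FROM mysql.inventory WHERE name = 'uuid'")) {
        /* With the table locked, the client gives up after the read timeout
         * and reports errno 2013 here. */
        fprintf(stderr, "query failed: %u %s\n",
                mysql_errno(conn), mysql_error(conn));
    } else {
        MYSQL_RES *res = mysql_store_result(conn);
        if (res)
            mysql_free_result(res);
    }

    mysql_close(conn);
    return 0;
}
[/code]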
[14 Mar 2011 13:53]
Mark Matthews
Andrii, no Java is involved here. These queries are executed by libmysql in the agent.
[14 Mar 2011 14:11]
Andrii Nikitin
Hi Mark, yes, I'm not sure why I mentioned Java here. I found race conditions in the agent source code; I hope they will be fixed soon.
[22 Apr 2011 22:18]
Marcos Palacios
Verified fixed on 2.3.2.2053.
(critical) agent_mysqld.c:712: [127.0.0.1:5132] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 5132, socket = ''): Lost connection to MySQL server at 'reading initial communication packet',
(critical) last message repeated 2 times
(critical) agent_mysqld.c:712: [127.0.0.1:5132] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 5132, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 61 (mysql-errno = 2013)
[20 May 2011 12:32]
MySQL Verification Team
I still see the same problem with 2.3.2.2054 and with 2.3.3 as well, using the steps Andrii mentioned above to simulate the problem. The agent should try to reconnect and not shut down. The error for not connecting and the error for not being able to read the mysql.inventory table when it is locked are 'similar'. The original bug report is about not being able to read mysql.inventory and shutting down.
2011-05-19 17:15:04: (critical) MySQL Monitor Agent 2.3.2.2054 started.
2011-05-19 17:15:04: (critical) network-io.c:324: successfully reconnected to dashboard at http://agent@localhost:18080/heartbeat
2011-05-19 17:15:04: (critical) agent_mysqld.c:734: successfully connected to database at 127.0.0.1:3307 as user agent (with password: YES)
2011-05-19 17:20:01: (critical) agent_mysqld.c:712: [127.0.0.1:3307] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3307, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (mysql-errno = 2013)
2011-05-19 17:21:01: (critical) agent_mysqld.c:712: [127.0.0.1:3307] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3307, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (mysql-errno = 2013)
2011-05-19 17:22:01: (critical) agent_mysqld.c:712: [127.0.0.1:3307] agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 3307, socket = ''): Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (mysql-errno = 2013)
2011-05-19 17:23:01: (critical) agent_mysqld.c:734: successfully connected to database at 127.0.0.1:3307 as user agent (with password: YES)
2011-05-19 17:23:31: (critical) agent_mysqld.c:523: [127.0.0.1:3307] mysql_real_query("SELECT value FROM mysql.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the mysql.inventory table. This is necessary for correct execution of the agent. Shutting down now.
[20 May 2011 14:52]
Simon Mudd
A similar issue also seen in 2.3.3 and earlier (since 2.1 as reported IIRC):
2011-05-19 19:09:25: (critical) network-io.c:324: successfully reconnected to dashboard at https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat
2011-05-19 23:47:24: (critical) network-io.c:273: curl_easy_perform('https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat') failed: couldn't connect to host (curl-error = 'Couldn't connect to server' (7), os-error = 'Connection timed out' (110))
2011-05-19 23:47:27: (critical) network-io.c:324: successfully reconnected to dashboard at https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat
2011-05-20 06:40:11: (critical) network-io.c:273: curl_easy_perform('https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat') failed: couldn't connect to host (curl-error = 'Couldn't connect to server' (7), os-error = 'Connection timed out' (110))
2011-05-20 06:40:14: (critical) network-io.c:324: successfully reconnected to dashboard at https://merlin_agent@merlin-server.subdomain.example.com:443/heartbeat
2011-05-20 08:36:02: (critical) agent_mysqld.c:745: successfully connected to database at 127.0.0.1:3306 as user agent_user (with password: YES)
2011-05-20 08:36:40: (critical) agent_mysqld.c:745: successfully connected to database at 127.0.0.1:3306 as user agent_user (with password: YES)
2011-05-20 08:37:10: (critical) agent_mysqld.c:530: [127.0.0.1:3306] mysql_real_query("SELECT value FROM mysql.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the mysql.inventory table. This is necessary for correct execution of the agent. Shutting down now.
##
## puppet to the rescue...
##
2011-05-20 08:58:59: (critical) MySQL Monitor Agent 2.3.3.2061 started.
[4 Nov 2012 23:22]
Craig Castle-Mead
Same issue with 2.3.7.2103:
2012-11-02 10:56:05: (critical) exception received from server: Internal error: known items are required for list_instances processing
2012-11-02 22:57:04: (critical) exception received from server: Internal error: known items are required for list_instances processing
2012-11-02 23:46:11: (critical) agent_mysqld.c:774: successfully connected to database at /sql_data/data51/mysql.sock as user root (with password: YES)
2012-11-02 23:46:21: (critical) agent_mysqld.c:550: [unix:/sql_data/data51/mysql.sock] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table. This is necessary for correct execution of the agent. Shutting down now.
2012-11-02 23:47:55: (critical) network-io.c:282: curl_easy_perform('http://agent@HOSTNAME:18080/heartbeat') failed: Operation timed out after 120001 milliseconds with 0 bytes received (curl-error = 'Timeout was reached' (28))
2012-11-02 23:49:55: (critical) network-io.c:282: curl_easy_perform('http://agent@HOSTNAME:18080/heartbeat') failed: Operation timed out after 120002 milliseconds with 0 bytes received (curl-error = 'Timeout was reached' (28))
[8 Apr 2013 15:11]
Aftab Khan
I think it should retry N times before it gives up (see the sketch after the error log below):
[code]
2013-04-06 08:50:20: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:50:30: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table. This is necessary for correct execution of the agent.
2013-04-06 08:50:30: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:50:40: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table. This is necessary for correct execution of the agent.
2013-04-06 08:50:40: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:50:50: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table. This is necessary for correct execution of the agent.
2013-04-06 08:50:50: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:51:00: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table. This is necessary for correct execution of the agent.
2013-04-06 08:51:00: (critical) agent_mysqld.c:1231: successfully connected to database at 127.0.0.1:3306 as user agent (with password: YES)
2013-04-06 08:51:10: (critical) agent_mysqld.c:1349: [127.0.0.1:3306] mysql_real_query("SELECT value FROM `mysql`.inventory WHERE name = 'uuid'") failed: Lost connection to MySQL server during query (errno=2013)
Could not get this instance's uuid from the `mysql`.inventory table. This is necessary for correct execution of the agent.
[/code]
Because of this behavior we had no choice but to stop the MySQL agent permanently, as it often caused MySQL to stop responding and we had to force-restart the MySQL server, i.e. kill -9.
The MySQL error log shows the following:
[code]
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 9679315, signal count 10074229
--Thread 47475361331520 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore: Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1 waiters flag 1
--Thread 47474958076224 has waited at lock/lock0lock.c line 5489 for 18.00 seconds the semaphore: Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1 waiters flag 1
--Thread 47474026314048 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore: Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1 waiters flag 1
--Thread 47474025912640 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore: Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1 waiters flag 1
--Thread 47474490321216 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore: Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1 waiters flag 1
--Thread 47474489116992 has waited at lock/lock0lock.c line 3898 for 18.00 seconds the semaphore: Mutex at 0x3c44118 created file srv/srv0srv.c line 945, lock var 1 waiters flag 1
[/code]
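For illustration only: a minimal sketch, not taken from the agent sources, of the bounded-retry behavior suggested above. On a failure such as error 2013 the client reconnects and retries the uuid query a fixed number of times before giving up, instead of shutting down after the first failure. fetch_uuid, MAX_ATTEMPTS, RETRY_DELAY, and the connection parameters are hypothetical names and values.
[code]
/* Hypothetical bounded-retry sketch for the mysql.inventory uuid read,
 * assuming libmysqlclient; MAX_ATTEMPTS and RETRY_DELAY are made-up values. */
#include <mysql.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_ATTEMPTS 5
#define RETRY_DELAY  10   /* seconds between attempts */

/* Returns 0 and fills uuid on success, non-zero after MAX_ATTEMPTS failures. */
static int fetch_uuid(const char *host, unsigned int port,
                      const char *user, const char *pass,
                      char *uuid, size_t uuid_len)
{
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        MYSQL *conn = mysql_init(NULL);

        if (mysql_real_connect(conn, host, user, pass, NULL, port, NULL, 0) &&
            mysql_query(conn,
                "SELECT value FROM mysql.inventory WHERE name = 'uuid'") == 0) {
            MYSQL_RES *res = mysql_store_result(conn);
            MYSQL_ROW row = res ? mysql_fetch_row(res) : NULL;
            if (row && row[0]) {
                snprintf(uuid, uuid_len, "%s", row[0]);
                mysql_free_result(res);
                mysql_close(conn);
                return 0;                       /* success */
            }
            if (res)
                mysql_free_result(res);
        }

        fprintf(stderr, "attempt %d/%d failed: %u %s\n",
                attempt, MAX_ATTEMPTS, mysql_errno(conn), mysql_error(conn));
        mysql_close(conn);

        /* Treat errors such as 2013 (CR_SERVER_LOST) as transient:
         * back off and try again instead of shutting the agent down. */
        if (attempt < MAX_ATTEMPTS)
            sleep(RETRY_DELAY);
    }
    return -1;                                  /* give up after MAX_ATTEMPTS */
}

int main(void)
{
    char uuid[64];
    /* Placeholder connection parameters. */
    if (fetch_uuid("127.0.0.1", 3306, "agent", "password", uuid, sizeof uuid) == 0)
        printf("uuid = %s\n", uuid);
    else
        fprintf(stderr, "giving up after %d attempts\n", MAX_ATTEMPTS);
    return 0;
}
[/code]
With something along these lines, a transient lock on mysql.inventory would delay the agent's startup rather than terminate it.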