Bug #41068 | Agent runs out of filedescriptors, does not recover | ||
---|---|---|---|
Submitted: | 27 Nov 2008 11:09 | Modified: | 27 Feb 2009 11:19 |
Reporter: | Kay Roepke | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Enterprise Monitor: Agent | Severity: | S1 (Critical) |
Version: | 2.0.0.7102 | OS: | Any |
Assigned to: | MC Brown | CPU Architecture: | Any |
[27 Nov 2008 11:09]
Kay Roepke
[2 Dec 2008 17:20]
Gary Whizin
1. Update the docs to explain how the user can bump the limit at the OS level.
2. The agent should try to increase the limit at startup (like the MySQL server does) and add a message-level log entry either way.
[17 Feb 2009 20:44]
Diego Medina
Verified fixed on 2.0.5.7144. Using debug log level I see:
(debug) chassis.c:1091: current RLIMIT_NOFILE = 256 (hard: 9223372036854775807)
(debug) chassis.c:1095: trying to set new RLIMIT_NOFILE = 8192 (hard: 9223372036854775807)
(debug) chassis.c:1103: set new RLIMIT_NOFILE = 8192 (hard: 9223372036854775807)
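For readers unfamiliar with the mechanism behind the debug output above, here is a minimal sketch of how a process can try to raise its own soft file descriptor limit at startup using the POSIX getrlimit()/setrlimit() calls. This is not the agent's actual chassis.c code. The helper name raise_nofile_limit and its return convention are assumptions made for illustration, and the log messages only imitate the wording seen in the thread.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not taken from the agent source).
 * Tries to raise the soft RLIMIT_NOFILE to `wanted`, capped at the hard limit.
 * Returns 0 if the soft limit is at least the (capped) target afterwards,
 * -1 if getrlimit() or setrlimit() fails. */
int raise_nofile_limit(rlim_t wanted)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        perror("getrlimit(RLIMIT_NOFILE)");
        return -1;
    }
    fprintf(stderr, "current RLIMIT_NOFILE = %llu (hard: %llu)\n",
            (unsigned long long)lim.rlim_cur,
            (unsigned long long)lim.rlim_max);

    /* An unprivileged process cannot raise the soft limit past the hard limit. */
    if (wanted > lim.rlim_max)
        wanted = lim.rlim_max;

    /* Nothing to do if the soft limit is already high enough. */
    if (lim.rlim_cur >= wanted)
        return 0;

    lim.rlim_cur = wanted;
    fprintf(stderr, "trying to set new RLIMIT_NOFILE = %llu (hard: %llu)\n",
            (unsigned long long)lim.rlim_cur,
            (unsigned long long)lim.rlim_max);

    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        /* Log the failure so the operator knows the OS limit must be raised. */
        perror("setrlimit(RLIMIT_NOFILE)");
        return -1;
    }
    fprintf(stderr, "set new RLIMIT_NOFILE = %llu (hard: %llu)\n",
            (unsigned long long)lim.rlim_cur,
            (unsigned long long)lim.rlim_max);
    return 0;
}
```

A caller would invoke raise_nofile_limit(8192) early in startup, before opening sockets, and keep running with the old limit if it returns -1. Some platforms apply an additional system-wide ceiling even when the hard limit is reported as unlimited, so the call can fail there and the limit then has to be raised at the OS level.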
[27 Feb 2009 11:19]
Tony Bedford
An entry was added to the 2.0.5 changelog:
In some circumstances the agent/proxy ran out of file descriptors, causing secondary failures. It could not recover from that state. The relevant part of the log file is shown here:
2008-11-27 11:11:00: (critical) last message repeated 2 times
2008-11-27 11:11:00: (critical) job_collect_os.c:411: sigar_cpu_info_list_get() failed
2008-11-27 11:11:00: (critical) job_collect_os.c:445: sigar_cpu_list_get() failed
2008-11-27 11:11:00: (critical) job_collect_os.c:411: sigar_cpu_info_list_get() failed
2008-11-27 11:11:00: (critical) job_collect_os.c:445: sigar_cpu_list_get() failed
2008-11-27 11:11:00: (critical) job_collect_os.c:411: sigar_cpu_info_list_get() failed
2008-11-27 11:11:00: (critical) job_collect_os.c:445: sigar_cpu_list_get() failed
2008-11-27 11:11:00: (critical) job_collect_os.c:411: sigar_cpu_info_list_get() failed
2008-11-27 11:11:00: (critical) job_collect_os.c:445: sigar_cpu_list_get() failed
2008-11-27 11:11:30: (critical) network-socket.c.292: socket(127.0.0.1:3306) failed: Too many open files (24)
2008-11-27 11:11:30: (critical) proxy-plugin.c.1532: Cannot connect, all backends are down.
2008-11-27 11:20:22: (critical) last message repeated 4 times
2008-11-27 11:20:22: (critical) network-io.c:215: curl_easy_perform('https://user:password@merlin-dashboard:443/heartbeat') failed: SSL connection timeout (curl-error = 'Timeout was reached' (28), os-error = 'Connection refused' (111))