Bug #23943 | Heat Chart too slow to show "down" server | ||
---|---|---|---|
Submitted: | 3 Nov 2006 14:31 | Modified: | 14 Dec 2006 4:20 |
Reporter: | Carsten Segieth | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Enterprise Monitor: Server | Severity: | S2 (Serious) |
Version: | 1.0.0 | OS: | Any (All) |
Assigned to: | Darren Oldag | CPU Architecture: | Any |
Tags: | heat chart, mer100 readme, up/down status |
[3 Nov 2006 14:31]
Carsten Segieth
[3 Nov 2006 14:45]
Carsten Segieth
agent log (log-level = message)
Attachment: mysql-service-agent.3307.zip (application/zip, text), 23.63 KiB.
[3 Nov 2006 14:46]
Carsten Segieth
Here are the times; the corresponding agent log is already attached:

14:30:00 stopped the agent (by mistake, but also a good test)
14:31:56 dashboard shows dead agent ----- 1:56 min
14:32:15 started the agent
14:32:32 dashboard shows running agent ----- 0:17 min --> excellent time
14:33:00 stopped the MySQL server
14:35:15 dashboard shows dead server ----- 2:15 min
14:35:50 started the MySQL server
14:38:19 dashboard shows running server ----- 2:29 min
14:39:59 apache, tomcat, mysql log files and dumps saved, see https://intranet.mysql.com/~csegieth/merlin/VMXP2_*_2006-11-03-14.39.59,59.zip
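For anyone re-running this test, the delays quoted above can be recomputed from the raw timestamps. This is only a small sketch: the event labels and the helper name `delay_seconds` are made up for illustration, and the times are copied from the comment above.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# (label, time of the action, time the dashboard reflected it),
# copied from the timeline in this comment. Labels are illustrative only.
events = [
    ("agent down", "14:30:00", "14:31:56"),
    ("agent up", "14:32:15", "14:32:32"),
    ("server down", "14:33:00", "14:35:15"),
    ("server up", "14:35:50", "14:38:19"),
]


def delay_seconds(action: str, shown: str) -> int:
    """Seconds between an action and the dashboard showing it (same day)."""
    delta = datetime.strptime(shown, FMT) - datetime.strptime(action, FMT)
    return int(delta.total_seconds())


delays = {label: delay_seconds(action, shown) for label, action, shown in events}

for label, secs in delays.items():
    print(f"{label}: {secs // 60}:{secs % 60:02d} min")
```

Both server transitions take more than two minutes to appear, while the agent restart shows up within seconds.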
[11 Dec 2006 16:45]
Darren Oldag
The architecture of the advisor evaluation still had some cruft from when it was not "datum callback" based. Fixes:

1) Always update when a new datum is received. Then check frequency and consistency to decide whether to evaluate on THIS callback.
2) Base the eval time on the timestamp of the Datum that caused the evaluation. There were 'missing' evaluations due to the uncertainty of thread scheduling and the use of the system time as the evaluation time.

With these fixes in, the FIRST time the server gets a down indication from the agent, it will evaluate as such and show the server down. Likewise, it will do the same for server up. There is the potential to speed up the server.* status data collections, too, but right now the collection interval is still one minute.

NOTE: additional fix (for grins): honor the 'active' column from AgentMonitoring so a properly shut down AGENT will register as down instantaneously. It falls back to checking the heartbeat interval just in case the agent disappeared in a non-normal fashion. I had implemented this fix before in another patch, but that patch was never approved because it was mixed with something else. This seems like the proper place and time to do it.
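The two fixes and the agent check described above could be sketched roughly as follows. This is not the actual Enterprise Monitor code: the names `Datum`, `Advisor`, `on_datum`, and `agent_is_up` are hypothetical, the "consistency" check is left out because the comment does not describe it, and the data item name in the usage lines is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datum:
    """Hypothetical collected value; timestamp is the collection time in seconds."""
    name: str
    value: int
    timestamp: float


class Advisor:
    """Sketch of a datum-callback based evaluation (assumed structure)."""

    def __init__(self, frequency: float) -> None:
        self.frequency = frequency              # minimum seconds between evaluations
        self.latest: Optional[Datum] = None
        self.last_eval_time: Optional[float] = None
        self.server_up: Optional[bool] = None

    def on_datum(self, datum: Datum) -> bool:
        """Handle a new datum; return True if an evaluation ran on this callback."""
        # Fix 1: always record the new datum first.
        self.latest = datum
        # Fix 2: decide whether an evaluation is due using the datum timestamp,
        # not the system clock, so thread scheduling jitter cannot skip one.
        due = (self.last_eval_time is None
               or datum.timestamp - self.last_eval_time >= self.frequency)
        if not due:
            return False
        self.last_eval_time = datum.timestamp
        self.server_up = datum.value == 1       # assumption: 1 means reachable
        return True


def agent_is_up(active: bool, last_heartbeat: float, now: float,
                heartbeat_interval: float) -> bool:
    """Honor the 'active' flag first, then fall back to the heartbeat age."""
    if not active:
        return False                            # clean shutdown: down immediately
    return now - last_heartbeat <= heartbeat_interval


advisor = Advisor(frequency=60.0)
advisor.on_datum(Datum("server.reachable", 1, 0.0))    # first datum: evaluates, up
advisor.on_datum(Datum("server.reachable", 0, 60.0))   # first "down" datum: evaluates, down
```

Because the evaluation time comes from the datum itself, two data items collected exactly one interval apart are always treated as one interval apart, regardless of when the callback thread happens to run.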
[14 Dec 2006 4:20]
Bill Weber
Verified fixed in build 1.0.1.4391.